
After a week of DeepSeek freakout, doubts and mysteries remain

The hard lessons learned from the DeepSeek models may ultimately help U.S. AI companies and speed progress toward human-level AI.


Welcome to AI Decoded, Fast Company’s weekly newsletter that breaks down the most important news in the world of AI. You can sign up to receive this newsletter every week here.

After a week of DeepSeek freakout, doubts and mysteries remain

The Chinese company DeepSeek sent shockwaves through the AI and investment communities this week as people learned that it created state-of-the-art AI models using far less computing power and capital than anyone thought possible. The company then showed its work in published research papers and by making its models available to other developers. This raised two burning questions: Has the U.S. lost its edge in the AI race? And will we really need as many expensive AI chips as we’ve been told?

How much computing power did DeepSeek really use? 

DeepSeek claimed it trained its most recent model for about $5.6 million, and without the most powerful AI chips (the U.S. barred Nvidia from selling its top-end H100 graphics processing units in China, so DeepSeek made do with 2,048 of the less powerful H800s). But the information it provided in research papers about its costs and methods is incomplete. “The $5 million refers to the final training run of the system,” points out Oregon State University AI/robotics professor Alan Fern in a statement to Fast Company. “In order to experiment with and identify a system configuration and mix of tricks that would result in a $5M training run, they very likely spent orders of magnitude more.” He adds that, based on the available information, it’s impossible to replicate DeepSeek’s $5.6 million training run.

How exactly did DeepSeek do so much with so little?

DeepSeek appears to have pulled off some legitimate engineering innovations to make its models less expensive to train and run. But the techniques it used, such as mixture-of-experts architecture and chain-of-thought reasoning, are well known in the AI world and widely used by the major AI research labs.
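To make the mixture-of-experts idea concrete, here is a minimal, illustrative sketch in Python. It is not DeepSeek's implementation, just the general pattern: a router scores many small expert networks and only the top-k actually run for each token, so most of the model's parameters sit idle on any given forward pass.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_forward(token, experts, router_weights, k=2):
    """Route one token through only the top-k experts.

    experts: list of callables, each standing in for a small feed-forward net.
    router_weights: matrix mapping the token to one score per expert.
    Compute per token stays small even when the total parameter
    count across all experts is enormous.
    """
    scores = softmax(router_weights @ token)       # one score per expert
    top_k = np.argsort(scores)[-k:]                # indices of the k best experts
    # Weighted sum of the chosen experts' outputs; the rest never run.
    return sum(scores[i] * experts[i](token) for i in top_k)

# Toy usage: 8 random linear "experts" over a 16-dimensional token.
rng = np.random.default_rng(0)
experts = [(lambda W: (lambda t: W @ t))(rng.normal(size=(16, 16))) for _ in range(8)]
router = rng.normal(size=(8, 16))
out = moe_forward(rng.normal(size=16), experts, router, k=2)
```

Chain-of-thought reasoning is the complementary trick on the output side: rather than answering directly, the model is prompted or trained to emit intermediate reasoning steps before committing to a final answer.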

The innovations are described only at a high level in the research papers, so it’s not easy to see how DeepSeek put its own spin on them. “Maybe there was one main trick or maybe there were lots of things that were just very well engineered all over,” says Robert Nishihara, cofounder of the AI run-time platform Anyscale. Many of DeepSeek’s innovations grew from having to use less powerful GPUs (Nvidia H800s instead of H100s) because of the Biden Administration’s chip bans.

“Being resource limited forces you to come up with new innovative efficient methods,” Nishihara says. “That’s why grad students come up with a lot of interesting stuff with far less resources—it’s just a different mindset.”

What innovation is likely to influence other AI labs the most?

As Anthropic’s Jack Clark points out in a recent blog post, DeepSeek was able to use a large model, DeepSeek-V3 (roughly 671 billion parameters), to teach a smaller R1 model to be a reasoning model (like OpenAI’s o1) with a surprisingly small amount of training data and no human supervision. V3 generated 800,000 annotated text samples showing questions and the chains of thought it followed to answer them, Clark writes.

DeepSeek showed that after processing the samples for a time, the smaller R1 model spontaneously began to “think” about its answers, explains Andrew Jardine, head of go-to-market at Adaptive ML. “You just say ‘here’s my problem—create some answers to that problem’ and then based on the answers that are correct or incorrect, you give it a reward [a binary signal that means “good”] and say ‘try again,’ and eventually it starts going ‘I’m not sure; let me try this new angle or approach’ or ‘that approach wasn’t the right one, let me try this other one’ and it just starts happening on its own.” There’s some real magic there. DeepSeek’s researchers called it an “aha moment.”
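A heavily simplified sketch of the loop Jardine describes, in Python; every function name here is a hypothetical placeholder rather than DeepSeek's actual code. The point is that the only supervision signal is a binary reward on the final answer, with no human labeling the reasoning steps in between.

```python
def reinforcement_loop(model, problems, check_answer, steps=1000):
    """Outcome-only reinforcement learning, as described above (assumed interfaces).

    model.generate(p)      -> a candidate answer, including its chain of thought
    check_answer(p, a)     -> True/False; the only supervision signal
    model.update(p, a, r)  -> nudge the policy toward rewarded outputs
    """
    for _ in range(steps):
        for problem in problems:
            answer = model.generate(problem)               # attempt the problem
            reward = 1.0 if check_answer(problem, answer) else 0.0
            model.update(problem, answer, reward)          # reinforce or discourage
```

Behaviors like backtracking (“that approach wasn’t the right one”) are never programmed in; under this kind of loop they emerge because they tend to produce correct, and therefore rewarded, answers.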

Why haven’t U.S. AI companies already been doing what DeepSeek did?

“How do you know they haven’t?” asks Jardine. “We don’t have visibility into exactly the techniques that are being used by Google and OpenAI; we don’t know exactly how efficient the training approaches are.” That’s because those U.S. AI labs don’t describe their techniques in research papers or release the weights of their models, as DeepSeek did. “There’s a lot of reason to believe they do have at least some of these efficiency methods already.” It should come as no surprise if OpenAI’s next reasoning model, o3, is less compute-intensive, more cost-effective, and faster than DeepSeek’s models.

Is Nvidia stock still worth 50X earnings?

Nvidia provides up to 95% of the advanced AI chips used to research, train, and run frontier AI models. The company’s stock lost 17% of its value on Monday when investors interpreted DeepSeek’s research results as a signal that fewer expensive Nvidia chips would be needed in the future than previously anticipated. Meta’s Yann LeCun says Monday’s sell-off grew from a “major misunderstanding about AI infrastructure investments.”

The Turing Award winner says that while DeepSeek showed that frontier models could be trained with fewer GPUs, the chips’ main job in the future will be inference—the reasoning work the model does when it’s responding to a user’s question or problem. (Actually, DeepSeek did find a novel way of compressing context window data so that less compute is needed during inference; a rough sketch of the idea follows below.) He says that as AI systems process more data, and more kinds of data, during inference, the computing costs will continue to increase. As of Wednesday night, the stock had not recovered.
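For a rough sense of how compressing cached context data trims inference compute, here is an illustrative low-rank key/value cache in Python. It is loosely inspired by, and far simpler than, the multi-head latent attention DeepSeek describes in its papers; every dimension and weight below is made up for illustration.

```python
import numpy as np

d_model, d_latent = 1024, 64      # cache 64 numbers per token instead of 2 x 1024

rng = np.random.default_rng(1)
W_down = rng.normal(size=(d_latent, d_model)) / np.sqrt(d_model)    # compress
W_up_k = rng.normal(size=(d_model, d_latent)) / np.sqrt(d_latent)   # rebuild keys
W_up_v = rng.normal(size=(d_model, d_latent)) / np.sqrt(d_latent)   # rebuild values

cache = []   # one small latent vector per past token

def store(hidden):
    """Cache a compressed latent instead of full key and value vectors."""
    cache.append(W_down @ hidden)

def attend(query):
    """Attend over the compressed cache, rebuilding keys and values on the fly."""
    latents = np.stack(cache)                   # (n_tokens, d_latent)
    keys = latents @ W_up_k.T                   # (n_tokens, d_model)
    values = latents @ W_up_v.T
    scores = keys @ query / np.sqrt(d_model)    # scaled dot-product attention
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ values

# Toy usage: cache five past tokens, then attend with a query vector.
for _ in range(5):
    store(rng.normal(size=d_model))
print(attend(rng.normal(size=d_model)).shape)   # (1024,)
```

The memory saved per token translates directly into longer contexts, larger batches, or cheaper serving on the same hardware.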

Did DeepSeek use OpenAI models to help train its own models?

Nobody knows for sure, and AI experts remain divided on the question. The Financial Times reported Wednesday that OpenAI believes it has seen evidence that DeepSeek used content generated by OpenAI models to train its own models, which would violate OpenAI’s terms of service. The technique, called distillation, saves time and money by feeding the outputs of larger, smarter models to smaller models to teach them how to handle specific tasks.
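The recipe is simple enough to sketch in a few lines of Python; the function names below are hypothetical placeholders, not OpenAI's or DeepSeek's real interfaces.

```python
def distill(teacher, student, prompts, finetune):
    """Output-level distillation as described above (assumed interfaces).

    teacher(prompt)            -> a worked answer from the larger model
    finetune(student, pairs)   -> ordinary supervised fine-tuning
    The student never sees the teacher's weights, only its outputs,
    which is why the barrier here is terms of service, not technology.
    """
    pairs = [(p, teacher(p)) for p in prompts]   # harvest the teacher's outputs
    return finetune(student, pairs)              # train the student to imitate them
```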

We’ve just experienced a moment when the open-source world produced some models that equaled the current closed-source offerings in performance. The real cost of developing the DeepSeek models remains an open question. But in the long run the AI companies that can marshal the most cutting-edge chips and infrastructure will very likely have the advantage as fewer performance gains can be wrung from pretraining and more computing power is applied at inference, when the AI must reason toward its answers. So the answers to the two burning questions raised above are “probably not” and “likely yes.”

The DeepSeek breakthroughs could be good news for Apple

The problem of finding truly useful real-world applications for AI is becoming more pressing as the cost of developing models and building infrastructure mounts. One big hope is that powerful AI models will become so small and efficient that they can run on devices like smartphones and AR glasses. DeepSeek’s engineering breakthroughs to create cheaper and less compute-hungry models may breathe new life into research on small models that live on edge devices.

“Dramatically decreased memory requirements for inference make edge inference much more viable, and Apple has the best hardware for exactly that,” says tech analyst Ben Thompson in a recent Stratechery newsletter. “Apple Silicon uses unified memory, which means that the CPU, GPU, and NPU (neural processing unit) have access to a shared pool of memory; this means that Apple’s high-end hardware actually has the best consumer chip for inference.”

Stability AI founder Emad Mostaque says that reasoning models like OpenAI’s o1 and DeepSeek’s R1 will run on smartphones by next year, performing PhD-level tasks on only 20 watts of electricity, roughly the power draw of the human brain.

OpenAI releases an AI agent for government workers

OpenAI this week announced a new AI tool called ChatGPT Gov that’s designed specifically for use by U.S. government agencies. Since sending sensitive government data out through an API to an OpenAI server presents obvious privacy and security problems, ChatGPT Gov can be hosted within an agency’s own private cloud environment.
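As an illustration of that deployment pattern, and not OpenAI's documented ChatGPT Gov interface, the standard OpenAI Python client can be pointed at a compatible endpoint inside a private network; the URL, key, and model name below are placeholders.

```python
from openai import OpenAI

# Hypothetical agency-internal endpoint: requests resolve inside the
# private cloud boundary instead of traveling to OpenAI's public API.
client = OpenAI(
    base_url="https://llm.internal.example.gov/v1",  # placeholder URL
    api_key="internal-credential",                    # placeholder credential
)

response = client.chat.completions.create(
    model="gpt-4o",  # assumed model name; the actual offering may differ
    messages=[{"role": "user", "content": "Summarize this policy memo."}],
)
print(response.choices[0].message.content)
```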

“[W]e see enormous potential for these tools to support the public sector in tackling complex challenges—from improving public health and infrastructure to strengthening national security,” OpenAI writes in a blog post. The Biden Administration in 2023 directed government agencies to find productive and safe ways to use new generative AI technology (Trump recently revoked the executive order).

The Department of Homeland Security, for example, built its own AI chatbot, which is now used by thousands of DHS workers. OpenAI says 90,000 users within federal, state, and local government offices have already used the company’s ChatGPT Enterprise product.


ABOUT THE AUTHOR

Mark Sullivan is a senior writer at Fast Company, covering emerging tech, AI, and tech policy. Before coming to Fast Company in January 2016, Sullivan wrote for VentureBeat, Light Reading, CNET, Wired, and PCWorld.
