Why AI can’t replace science

We shouldn’t overstate the impact of machine learning on the scientific process.

The scientific revolution has increased our understanding of the world immensely and improved our lives immeasurably. Now, many argue that science as we know it could be rendered passé by artificial intelligence. Way back in 2008, in an article titled “The End of Theory: The data deluge makes the scientific method obsolete,” Chris Anderson, then the editor-in-chief of Wired magazine, argued:

Petabytes allow us to say: “Correlation is enough.” We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot. 

Since then, the chorus has gotten louder. In 2023, for example, Eric Schmidt, a former Google CEO, wrote:

AI can rewrite the scientific process. We can build a future where AI-powered tools will both save us from mindless and time-consuming labor and also lead us to creative inventions and discoveries, encouraging breakthroughs that would otherwise take decades.

Today, AI is increasingly being integrated into scientific discovery to accelerate research, helping scientists generate hypotheses, design experiments, gather and interpret large datasets, and write papers. But the reality is that science and AI have little in common, and AI is unlikely to make science obsolete. The core of science is theoretical models that anyone can use to make reliable descriptions and predictions. Thus, Paul Samuelson wrote that

science is public knowledge, reproducible knowledge. When Robert Adams wrote an MIT thesis on the accuracy of different forecasting methods, he found that ‘being Sumner Slichter’ was apparently one of the best methods known at that time. This was a scientific fact, but a sad scientific fact. For Slichter could not and did not pass on his art to an assistant or to a new generation of economists. It died with him, if indeed it did not slightly predecease him. What we hope to get by scientific breakthrough is a way of substituting for men of genius men of talent and even just run-of-the-mill men. That is the sense in which science is public, reproducible knowledge.

The core of AI, in contrast, is data mining: as Anderson noted, ransacking large databases for statistical patterns on the premise that “correlation is enough.” If anything, public knowledge is viewed as hindering an unfettered search for statistical patterns.

However, without an underlying causal explanation, we don’t know whether a discovered pattern is a meaningful reflection of an underlying causal relationship or meaningless serendipity. Tests with fresh data can expose a pattern as coincidental, but there is an essentially unlimited number of patterns that can be discovered, and many coincidental patterns and spurious correlations will survive repeated testing and retesting.

For example, if we calculate the pairwise correlations among one million variables, each of which is nothing more than randomly generated numbers, we can expect nearly 8,000 correlations to be statistically significant both in the initial tests and through five rounds of re-testing. In practice, there are far more than one million variables, and algorithms are not restricted to pairwise correlations. In addition, there are often not enough data for the multiple rounds of retesting needed to show just how many data-mined patterns are coincidental.
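As a rough back-of-the-envelope check of that figure (assuming the conventional 5% significance threshold and treating the initial test plus five retests as six independent chances for a purely coincidental correlation to pass), the arithmetic looks like this:

```python
# Back-of-the-envelope check: one million noise variables, pairwise correlations,
# an initial significance test plus five rounds of retesting at the 5% level.
# Every "significant" correlation here is a false positive by construction.
from math import comb

n_variables = 1_000_000
alpha = 0.05                      # conventional significance threshold
n_pairs = comb(n_variables, 2)    # about 5e11 pairwise correlations

# Initial test plus five retests = 6 chances for a coincidence to keep passing.
survivors = n_pairs * alpha ** 6
print(f"{n_pairs:,} pairs -> roughly {survivors:,.0f} spurious survivors")
# 499,999,500,000 pairs -> roughly 7,812 spurious survivors
```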

We ultimately need expert opinion in order to discard obviously coincidental patterns a priori and identify plausible causal models that can be tested and retested, ideally with randomized controlled trials. Without this, as we are too often painfully reminded, all we have is correlation—which is often fleeting and useless.

Two of Schmidt’s examples of AI rewriting the scientific process involve large language models (LLMs). His first example:

Artificial intelligence is already transforming how some scientists conduct literature reviews. Tools like PaperQA and Elicit harness LLMs to scan databases of articles and produce succinct and accurate summaries of the existing literature—citations included. 

We now know that LLM literature reviews are unreliable. In May of 2023, two months before Schmidt’s article was published, a credulous lawyer submitted to a Manhattan court a legal brief that had been largely written by ChatGPT. When asked about the fake citations it had included in the filing, ChatGPT obliged by generating fake details of fake cases. The judge was familiar with the relevant precedents and rebuked (and later fined) the lawyer for submitting a brief that was full of “bogus judicial decisions . . . bogus quotes and bogus internal citations.” That, in a nutshell, is the problem with relying on LLMs for literature reviews and other factual information. If you know the facts, you don’t need an LLM. If you don’t know the facts, you can’t trust an LLM.

Schmidt’s second example:

Once the literature review is complete, scientists form a hypothesis to be tested. LLMs at their core work by predicting the next word in a sentence, building up to entire sentences and paragraphs. This technique makes LLMs uniquely suited to scaled problems intrinsic to science’s hierarchical structure and could enable them to predict the next big discovery in physics or biology.

Identifying statistical patterns in text that can be used to predict a sequence of words is not at all like looking at scientific progress and predicting the next big discovery. Not knowing what words mean or how words relate to the real world, LLMs are prone to generating confident garbage.
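To make that distinction concrete, here is a deliberately tiny sketch of next-word prediction (a simple bigram counter, not a real LLM, and the sample corpus is made up): the model learns only which words tend to follow which, with no notion of what any of the words refer to.

```python
# A toy next-word predictor: count which word follows which in a tiny corpus,
# then always emit the most frequent successor. It mimics "predicting the next
# word" as pure pattern-matching, with no understanding of meaning.
from collections import Counter, defaultdict

corpus = (
    "the protein folds into a stable structure "
    "the protein binds the target "
    "the model predicts the structure"
).split()

successors = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    successors[current][nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word`, or None if never seen."""
    return successors[word].most_common(1)[0][0] if successors[word] else None

word, sentence = "the", ["the"]
for _ in range(5):
    word = predict_next(word)
    if word is None:
        break
    sentence.append(word)

print(" ".join(sentence))   # e.g. "the protein folds into a stable"
```

The output looks fluent, but the model has no idea what a protein is; fluency in predicting words is not the same as understanding, let alone discovery.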

There have been several media-friendly reports of AI-powered scientific breakthroughs, but the details seldom justify the headlines. For instance, a 2020 Nature article was titled “‘It will change everything’: DeepMind’s AI makes gigantic leap in solving protein structures.” The subtitle claimed that “Google’s deep-learning program for determining the 3D shapes of proteins stands to transform biology, say scientists.” A 2021 follow-up paper in Nature was titled “Highly accurate protein structure prediction with AlphaFold.” A 2022 Guardian article gushed that “Success of AlphaFold program could have huge impact on global problems such as famine and disease.”

The basic argument is that proteins are the basis of life and that their 3D structure largely determines their function. DeepMind’s AlphaFold can rapidly predict those structures. As the Guardian article put it, “Since then, it has been crunching through the genetic codes of every organism that has had its genome sequenced, and predicting the structures of the hundreds of millions of proteins they collectively contain.” In 2023, another Nature article reported: “Tool from Google DeepMind predicts nearly 400,000 stable substances, and an autonomous system learns to make them in the lab.”

As with LLMs, what these AI systems do is amazing but claims about the implications are exaggerated. In two Science op-eds, Derek Lowe, a researcher who has worked on several drug discovery projects, wrote that “it doesn’t make as much difference to drug discovery as many stories and press releases have had it” because “protein structure determination simply isn’t a rate-limiting step in drug discovery.” As Lowe argues:

It’s important to realize that the new protein computational tools do not make all these into solved problems. Not even close. They clear out a lot of obstacles so that we can get to these problems more easily and more productively, for sure, but they do not solve them once we get up to the actual rock faces in our particular gold mines.

The CEO of the AI-powered drug company Verseon was more blunt: “People are saying, AI will solve everything. They give you fancy words. We’ll ingest all of this longitudinal data and we’ll do latitudinal analysis. It’s all garbage. It’s just hype.”

The real test is whether new products and services are developed faster and cheaper with AI than without it. In a 2024 Science op-ed, Lowe examined drugs that were purportedly designed by AI and concluded that none of them can be classified as “target discovered by AI.”

Jennifer Listgarten, a professor of electrical engineering and computer science and a principal investigator at the Center for Computational Biology at the University of California, Berkeley, said that protein structure prediction was “the only challenge in biology, or possibly in all the sciences, that [DeepMind] could have tackled so successfully.” First, the problem of protein structure prediction is easily defined quantitatively. Second, there were sufficient existing data for training a complex, supervised model. And third, it was “possible to assess the accuracy of the results by way of held-out proteins whose structures were already known.” She continued: “Very few problems in the sciences are lucky enough to have all of these characteristics.”
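For readers unfamiliar with the term, the “held-out” evaluation Listgarten mentions is standard supervised-learning practice. The sketch below (with synthetic numbers standing in for real proteins and their known structures) shows the basic idea: fit a model on one part of the data and score it only on examples withheld from training.

```python
# A minimal sketch of held-out evaluation: fit a model on a training split and
# measure its error only on examples it never saw. (Synthetic data stands in
# for real protein sequences and their known structures.)
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                       # 200 examples, 5 features each
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + rng.normal(scale=0.1, size=200)    # targets with a little noise

X_train, y_train = X[:150], y[:150]                 # used to fit the model
X_test, y_test = X[150:], y[150:]                   # held out, used only for scoring

w_hat, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)   # fit on training data
test_mse = np.mean((X_test @ w_hat - y_test) ** 2)          # accuracy on held-out data
print(f"held-out mean squared error: {test_mse:.4f}")
```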

Two research professors in the Materials Research Lab at UC Santa Barbara analyzed Google’s 2023 Nature paper that claimed AI can discover useful new materials and concluded “that it promises more than it delivers” in that it

is not particularly useful to experimentalists such as ourselves because it offers an overwhelming number of predictions (2.2 million, of which nearly 400,000 are believed to be stable), many of which do not appear to be very novel. These are chemical compounds rather than materials because they have no demonstrated functionality or utility at this point.

The impact of AI on our lives may be enormous, but it will not necessarily be positive. One of the biggest harms of ChatGPT and other LLMs so far has been the pollution of the Internet with disinformation and scams. Let’s hope that AI doesn’t pollute science too.
