The State of AI in Drug Development

Published

June 15, 2023

Note from the future (2025): this is an article I wrote as EvE was just coming into being, in response to a request from our founding CEO to summarize the state of AI in drug development. Consider it a time capsule from 2023 – ages ago in the AI world!

The Magical Ingredients

The notable, broadly recognized progress of AI capabilities in the last few years is the result of three necessary ingredients coming together to produce seemingly magical leaps forward in language, image, and code generation, among other things.

Methods + Compute + Data = Magic🔮.

Methods

Neural networks are the class of methods powering modern advances, but they have been around since the 1940s. Loosely modeled on the human brain, neural networks are built from layers that begin as a naive structure but learn the essential patterns in the training data, capturing them in arbitrarily complex ways that can function as a black box.

After multiple cycles of favor and disfavor (mostly caused by computational limitations), neural networks re-emerged in the 2010s in the form of “deep learning” – the term for a neural network with more than three layers – and began to consistently demonstrate superior performance on a diverse set of tasks.

There are many types of neural networks. Some only move in one direction (“feed forward”), some circle back through layers (“recurrent”), some pool local information (“convolutional”), and so forth. Different types are suited to different tasks. The application of neural networks to any particular problem requires a series of judgment calls about how to design and train the system.
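
To make “layers that learn” concrete, here is a minimal sketch in Python/NumPy (my illustration, not drawn from any particular system) of a feed-forward network with one hidden layer learning XOR by gradient descent:

```python
# Minimal feed-forward network: one hidden layer, trained by gradient
# descent to learn XOR. A toy illustration -- real systems use frameworks
# like PyTorch and vastly larger architectures.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)  # naive starting weights
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

for step in range(5000):
    h = np.tanh(X @ W1 + b1)       # hidden layer activations
    out = sigmoid(h @ W2 + b2)     # network output
    grad_out = (out - y) / len(X)  # gradient of cross-entropy loss
    grad_h = (grad_out @ W2.T) * (1 - h ** 2)  # backpropagate through tanh
    W2 -= 0.5 * h.T @ grad_out
    b2 -= 0.5 * grad_out.sum(axis=0)
    W1 -= 0.5 * X.T @ grad_h
    b1 -= 0.5 * grad_h.sum(axis=0)

print(out.round(2))  # converges toward [[0], [1], [1], [0]]
```

The “learning” is nothing more than repeated small weight adjustments; depth comes from stacking more such layers.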

Applying deep learning in the early 2010s produced meaningful advances in image processing tasks, but not in language. Language processing approaches were growing increasingly complex, attempting to encode what humans knew about language directly into the architecture, and performance was underwhelming.

Then in the mid-2010s, a team at Google Brain working on language translation discovered the benefits of a different architecture and published the seminal paper Attention Is All You Need (2017). Known as the transformer, this approach backed off from encoding linguistic complexity into the architecture and instead gave the network more flexibility to learn patterns itself. A critical feature of the architecture is that it is easily parallelized, making it possible to process much larger amounts of data. This essentially represented a tradeoff in which the model was responsible for more of the “intelligence” but was given much more data from which to learn. It turned out to be a very good tradeoff, leading to the impressive language models of the early 2020s.

This 2021 blog post from Dale Markowitz is a very accessible high-level introduction to transformers. It describes the key elements of transformers without getting into anything highly technical, and summarizes the broad arc of their impact.
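
As a companion to that overview, here is a minimal NumPy sketch (my addition, simplified to a single head with no learned projections) of scaled dot-product attention, the core operation of the transformer:

```python
# Scaled dot-product attention: every token attends to every other token.
# Single head, no learned projection matrices -- a simplified sketch.
import numpy as np

def attention(Q, K, V):
    """Q, K, V: (num_tokens, dim) arrays of query/key/value vectors."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # token-to-token affinity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over tokens
    return weights @ V                              # weighted mix of values

rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 16))        # 5 tokens, 16-dim embeddings
out = attention(tokens, tokens, tokens)  # self-attention
print(out.shape)  # (5, 16): each token now reflects its full context
```

Because the whole computation reduces to a few matrix multiplications, it parallelizes naturally on GPUs – the property that let transformers ingest far larger datasets.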

What turns out to be magical about the transformer architecture is that it excels at many tasks well beyond language translation. At the moment it is the golden ticket to progress that has not yet run out, and most of the AI developments since 2021 involve transformers. Expertise is still required to determine how best to apply them to new areas, but for the time being they seem to be broadly applicable across domains. No doubt at some unknown point in the future progress will stall, awaiting another advance in methods.

Compute

The success of the massive data + massive parallelization approach was only possible due to the availability of massive compute capability, specifically graphics processing units (GPUs). GPUs initially rose to prominence for gaming; in the 2010s they began to be used for deep learning, and then experienced additional demand from crypto mining. The recent AI hype and its demand for GPUs made GPU supplier NVIDIA a trillion-dollar company in May 2023.

We can assume that compute capability is domain-agnostic. It is expensive, however, so the business model of a given endeavor will determine willingness to pay.

Data

The raw material required to power the AI machine is data. Massive amounts of data. The most prominent AI models were trained on data harvested from publicly available sources – writing on the internet, digitized books, digitized art, and code posted to GitHub. Humanity had already produced the data, and the internet made it machine-ready. Recent experience has demonstrated that sheer volume of data is critical, with iterative improvements in models largely correlating with increases in the size of training datasets.

Many of the new models fall into a category dubbed “foundation models”. Instead of being trained for a specific task, they are allowed to learn from data in a very general way that produces broadly applicable capabilities. These foundation models can then be used as a base from which specific models can be trained using small, specific datasets (a process known as “fine-tuning”). Large language models (LLMs) like GPT are examples of foundation models. In this sense, datasets are being repurposed across applications to some degree. Regardless, data is by far the most domain-specific of the three ingredients.
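
As a hedged sketch of what fine-tuning looks like mechanically (the “backbone” below is a toy stand-in for a real foundation model, and the dataset is random):

```python
# Fine-tuning in miniature: keep a "pretrained" backbone frozen and train
# only a small task-specific head on a small labeled dataset.
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 64))
backbone.requires_grad_(False)  # freeze the general-purpose weights

head = nn.Linear(64, 2)         # small task-specific classifier

X = torch.randn(100, 32)        # a small, specific (here: random) dataset
y = torch.randint(0, 2, (100,))

opt = torch.optim.Adam(head.parameters(), lr=1e-3)
for epoch in range(20):
    logits = head(backbone(X))  # reuse the frozen representations
    loss = nn.functional.cross_entropy(logits, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```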

So, in assessing the AI-readiness and potential of various arenas, we should consider the availability of large-scale, high-quality data first, methods second, and (willingness to pay for) compute third.

Fertile Ground for AI?

Does the hypothesis that methods, compute, and data are the requirements for AI success hold for biology and drug development? There is some positive evidence.

Protein Folding

The most prominent AI triumph in the scientific realm is the AlphaFold protein folding model. The original AlphaFold model from Google DeepMind performed well, but AlphaFold2 (2021) – in which the transformer architecture was applied – did substantially better. Notably, it out-competed other transformer-based approaches, in a testament to the importance of expertise in applying methods to specific domains and problems.

This 2022 Frontiers in Bioinformatics article reviews the developments in protein folding models, along with the strengths and weaknesses of the various approaches.

The key ingredient of data was available for the protein folding application due to the vast numbers of sequenced proteins, of which a non-trivial portion have solved structures available. AlphaFold was able to take advantage of this “labeled” structural data while also learning patterns from the sequences without structures. This represents an important difference between current AI approaches and more traditional applications of machine learning. Traditionally, this type of prediction model would be limited to the ~150,000 proteins with available structures, which would then need to be split between training and validation. The ability of AI systems to learn effectively from unlabeled data (through “unsupervised” and “self-supervised” processes) provides ways to make use of data without being entirely bottlenecked by expensive or impossible labeling processes.
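
As an illustration of how unlabeled sequences can supervise themselves, here is a small sketch in the spirit of masked-prediction training (the sequence and helper function are invented for illustration):

```python
# Self-supervision sketch: turn an unlabeled protein sequence into a
# training example by masking residues and making them prediction targets.
# No solved structure ("label") is required.
import random

def masked_example(sequence, mask_rate=0.15, seed=None):
    rng = random.Random(seed)
    masked, targets = list(sequence), {}
    for i, residue in enumerate(sequence):
        if rng.random() < mask_rate:
            masked[i] = "?"       # hide this residue from the model
            targets[i] = residue  # ...and make it the prediction target
    return "".join(masked), targets

seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # hypothetical sequence
inp, targets = masked_example(seq, seed=0)
print(inp)      # the sequence with some residues hidden
print(targets)  # positions the model must reconstruct
```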

A layperson’s description of how AlphaFold2 works can be found in this blog post.

Quickly following AlphaFold came another type of protein folding model based on large language model approaches. These models rely only on the protein sequence itself (no multiple sequence alignment), treating amino acids like words and proteins like sentences. ESMFold (from Meta AI, formerly Facebook Research) is an example of this type of model. While less accurate than AlphaFold2, it is much less computationally intensive, and thus can be applied at metagenomic scale.

These models have provided an abundance of readily available structures that researchers are putting to fruitful use. AlphaFold2 alone has taken the structural coverage of human proteins from 48% to 76%.

A few other key ingredients

In addition to the big three ingredients outlined previously, also important for AlphaFold2’s success were:

  • The right cross-disciplinary team in the right working environment
  • A problem with strong patterns to learn

It’s not clear where critical domain-specific AI developments will happen in the future. The ability to attract, assemble, and support the necessary team has seemed possible only under the wing of major technology companies to date. As AI becomes more mainstream, that may change, but academia and pharma/techbio companies both have some substantial challenges in this regard.

Thanks to evolution and the nature of protein folding itself (with secondary and tertiary structures), the type of patterns AI systems are good at learning are present in the protein folding problem. Not all problems will be as well suited, and consequently may require even larger amounts of data and/or novel or hybrid methods.

Limitations and root causes

Protein folding models are AI success stories, but they have weaknesses of particular relevance to drug development. For example, AlphaFold2 has trouble predicting longer loops, which are important because of their surface exposure. It also struggles to predict structures with ligands, DNA/RNA complexes, or post-translational modifications. Many of these challenges trace back to the nature of the training data itself: structures sourced from the Protein Data Bank (PDB) can represent many contexts and conditions, and these are not accounted for in the model itself. Other challenges may be due to the complexity of the problem. AlphaFold2 does well at predicting shorter loops (under 20 residues); longer, more flexible loops and unstructured regions of proteins are inherently harder to predict.

What’s Next?

Protein folding shows us that AI progress can carry over into biological applications. But does that mean we should expect a near-term rapid acceleration of progress in drug development and therapeutics as a result of AI?

The Bull Case

The bull case for AI is that the biological world is an excellent fit for the nature of AI, and that we’ve just reached the point where data generation at the necessary scale is possible. DeepMind CEO Demis Hassabis has argued that biology can be seen as a complex information processing system, making it ripe to be decoded by AI systems that can theoretically learn patterns at a scale that humans cannot.

This Century of Biology blog post lays out the big-picture (and VC-style optimistic) case for why biology is a great AI target, much better than other applications in the physical world.

The nature of data in the biological realm is that it can’t be scraped from existing sources the way training data for LLMs has been scraped from the internet. It requires instruments, measurement systems, and physical substrates. However, there is optimism that now is the time we are ready to generate the type of data that’s needed, at least in several key areas. The ability to sequence, synthesize, and edit DNA at reasonable cost and speed is enabling data generation on a new scale. Whole genome sequencing (WGS) for millions of people is within reach, potentially creating the type of massive datasets necessary for AI success.

One optimistic voice is Daphne Koller, CEO of Insitro and a well-respected AI researcher. She believes a few publicly available datasets like the UK Biobank are fit for ML use, but that most data will need to be specifically generated. This McKinsey article has a brief summary of her perspective. A broader, richer conversation is in this Bio Eats World podcast episode. Another such voice is Jakob Uszkoreit, an author of the Google Brain transformers paper, now working on applications of AI to RNA as CEO of Inceptive. He discusses his perspective in this Bio Eats World podcast episode.

One path to success could be through the compounding effects of a growing number of specific AI models. As we have models that are very good at a particular task (like AlphaFold is for protein structure prediction), these could then be connected in a modular way to become more than the sum of their parts. This could help us tackle the unfathomable complexity of the biological world one piece at a time, while getting the benefits of chaining capabilities together in a way that can still result in a fully machine-driven system.
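
A hedged sketch of that modular idea follows – every function here is a hypothetical placeholder, not a real API:

```python
# Modular chaining: each stage is a specialized model, and the pipeline
# composes them. All functions are hypothetical stand-ins.

def predict_structure(sequence: str) -> dict:
    """Stand-in for a structure predictor such as AlphaFold2."""
    return {"sequence": sequence, "structure": "..."}

def predict_binding(structure: dict, compound: str) -> float:
    """Stand-in for a binding-affinity model (largely unsolved today)."""
    return float(len(compound) % 7)  # dummy score, for illustration only

def rank_candidates(sequence: str, compounds: list[str]) -> list[str]:
    structure = predict_structure(sequence)
    # Chain the specialized models: structure first, then binding.
    return sorted(compounds, key=lambda c: predict_binding(structure, c),
                  reverse=True)

print(rank_candidates("MKTAYIAK", ["CCO", "c1ccccc1", "CC(=O)O"]))
```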

Another path could be to amass huge amounts of heterogeneous data and impose less of the structure that represents processes as we currently see them. This tradeoff has served us well recently, and we have generally underestimated the complexity of patterns AI can learn. Perhaps viewing protein folding as a modular capability is not as good as allowing a system to learn from observations that reflect the context in which proteins operate. It’s possible that large numbers of whole genome sequences paired with digital medical records could power such a system. In the UK, where WGS efforts have been most extensive to date (UK Biobank) and the NHS holds extensive medical records in a unified system, the ability to attempt this may be within reach.

This Nature article reviews the many steps of the drug discovery process on which GPU computing and deep learning capabilities are having a positive impact.

Additionally, advances in lab automation capabilities may unlock the ability to generate and use data on a fundamentally different scale. A virtuous cycle of data collection that feeds back into the models and accelerates their performance could be possible (and some companies are attempting this approach today). These “closed-loop” systems could operate at a speed and scale inconceivable when human scientists must always be in the loop, leading to an inflection point in progress.
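
A minimal sketch of such a loop, with a simulated “lab” standing in for real instruments (the model and oracle are toys; real systems would use far richer models and experimental designs):

```python
# Closed-loop sketch: the model proposes experiments, a simulated lab
# "runs" them, and the results feed back into training.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

def run_experiment(x):  # simulated lab: hidden linear effect plus noise
    return x @ np.array([1.5, -2.0, 0.5]) + rng.normal(0, 0.1, len(x))

X = rng.normal(size=(10, 3))  # small seed dataset
y = run_experiment(X)
model = Ridge().fit(X, y)

for cycle in range(5):
    pool = rng.normal(size=(1000, 3))  # candidate experiments
    picks = pool[np.argsort(model.predict(pool))[-10:]]  # most promising
    X = np.vstack([X, picks])
    y = np.append(y, run_experiment(picks))  # new data from the "lab"
    model = Ridge().fit(X, y)  # retrain: the loop closes
```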

Even if the progress driven by AI is more incremental than exponential for some time, it has the potential to improve performance at many steps of the drug development process. In an industry with long cycles and high failure rates, AI could lead to substantial business gains even before it becomes truly transformative.

The Bear Case

The bear case for AI is that the biological world is orders of magnitude more complex than anything to which AI has been successfully applied to date, and our ability to collect data on it is highly imperfect and expensive. The extent of data collection required would be beyond the capacity of any given entity, and the incentives don’t promise sufficient payoff in any single area to justify even a consortium of players making such a tremendous investment. More advances akin to those we’ve seen in protein folding are likely for specific cases, and will drive incremental progress. But the breadth of progress required to see transformational impact at the level of human health may require many more step-change advances in AI capabilities, which could take decades.

Hints that the current wave of progress could solve some hard problems in biology but leave the many very hard problems untouched come, again, from protein folding models. Despite the capabilities of AlphaFold, we are unable to model what we might care most about for drug development – binding with compounds. One can argue that compounds are much less amenable to wrangling by current AI systems because they:

  • have more complex constructions than sequences of nucleic or amino acids
  • have more possible individual building blocks
  • have no single convenient representation akin to an amino acid sequence
  • lack the constraint that the biological world provides for proteins, where what actually exists narrows the total space of possibilities to a more tractable subset

In this 2023 article, Derek Lowe makes the argument that ligand-binding predictions are a much harder problem than protein folding predictions, and that we lack the necessary data.

This 2023 Century of Biology blog post explores the arenas of tangible success and failure-to-date of AI systems in biological applications. The hypothesis put forth is that we’re showing substantial progress in modality companies, but not in target discovery endeavors, because the ripeness of the problems for AI is fundamentally different.

While the hypothetical number of protein sequences is 20^(sequence length of your choosing), the universe of proteins that exist in the world is much smaller, providing at least an initial fruitful constraint on the problem space. The possible chemical space is comparatively enormous. Furthermore, the specification of the problem for drugs is less clear than for protein folding. Predicting a protein structure from an amino acid sequence is a well specified problem for which there are “right” answers available. Identifying drug candidates is a more complex problem that has to be broken down into components. One step could be identifying compounds that bind to a particular target. This is well specified and can be validated experimentally. However, this is one of many steps necessary to produce an effective drug. The data needed to validate the therapeutic value of drugs is extremely slow and expensive to collect in clinical trials, making the feedback loop quite challenging.
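
Some back-of-the-envelope numbers behind that contrast (the specific figures are commonly cited estimates, included only for scale, not from this article):

```python
# Orders of magnitude: hypothetical protein sequence space vs. nature's
# catalog vs. drug-like chemical space. All figures are rough estimates.
from math import log10

print(f"20^100 ≈ 10^{100 * log10(20):.0f} hypothetical 100-residue proteins")
print(f"~10^{log10(2.5e8):.0f} protein sequences cataloged in databases "
      "(a huge, free constraint on the search space)")
print(f"~10^{log10(1e60):.0f} estimated drug-like small molecules "
      "(no comparable natural catalog to constrain the search)")
```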

This article by Verseon CEO Adityo Prakash (not an unbiased voice) speculates about the difficulties and possibilities of applying AI and computational methods to the chemical space.

So, compounds in chemical space may be a very hard problem. Target discovery is also likely to be a hard problem, as it requires modeling more complexity than AI systems have succeeded at to date. Problems like protein folding are certainly challenging, but have the specificity of task and clear measurement of success that AI systems thrive on. Target discovery does not.

This 2022 McKinsey article explores the promise and challenges of health data platforms.

Then there is the very, very hard problem of clinical trials. If progress in identifying potential therapeutics rapidly accelerates, it will still be bottlenecked by the speed and massive expense of clinical trials (which currently make up ~70% of R&D costs). Unless the odds of success in clinical studies increase, the flow of candidates may exceed our ability to move them through this critical step.

Another challenge is managing medical records data in a way that makes it a useful input for AI learning. Reams of potentially valuable data exist in digital format, but the barriers to making them accessible are substantial.

My Take

If I were forced to make predictions…

Next Five Years

  • Progress in end-to-end drug discovery will be incremental rather than transformational.
    • Success rates in the clinic for “AI drugs” will be similar to those of the standard process, but candidates will reach the clinical phase more quickly and will involve more novel compounds and disease areas than pharma companies have seen. This will fall short of the hype, but will be enough to motivate continued partnerships with and investment in AI-centric companies.
    • But if the current set of drugs in clinical trials has a lower success rate than the existing process, we’re likely to spend at least five years in the “trough of disillusionment”, with greatly reduced investment. It may then take a major development from a tech company to motivate another round of attempts.
  • We’ll see transformational progress outside the area of small molecule drugs, based on genomics and/or biologics. These are the areas that seem best suited to major AI impact using the existing methods and formula.

Ten Years Out

  • “Closed loop” drug development systems will start to bear fruit, based on the level of scale that automated experimentation has enabled. The impact of these will depend on how much confidence they can generate that their compounds will have high success in the clinic.
  • A combination of methodological and computing innovations will make exploration of chemical space more tractable, leading to the ability to design truly novel drugs.
  • Models will emerge that take genomics inputs and predict disease states and therapeutic responsiveness, largely ignoring the mechanisms in between. Their progress will be based on the use of AI to process medical records data into an anonymized and usable form across systems, which may require enabling legal/regulatory changes.

If R&D becomes more efficient with a new set of approaches, what will established pharma companies do?

  • Continue to partner with techbio companies?
  • Try to bring this capability in-house? (which I would expect to have a high rate of failure due to incompatible cultures and ways of operating)
  • Get out-competed by new companies who increasingly have the ability to bring the drugs they discover to market on their own?

How are companies using AI today?

A number of companies have been founded in the last decade with the goal of applying ML/AI to drug development. Several of these now have compounds in clinical trials, but it’s rather early to assess their success. No “AI-designed” drug has yet achieved FDA approval.

This Nature article reflects on the state of this industry and summarizes the current clinical status of compounds from this set of companies.

Most of these companies seem to operate in a general pattern involving:

  • A “full-stack” / “closed-loop” approach in which they develop (or acquire) multiple capabilities with the promise of a virtuous cycle of data generation that accelerates a flywheel of progress
  • A proprietary software/data/algorithms platform that integrates their components and adds their AI/ML/compute secret sauce
  • Partnerships with pharmaceutical companies

The nature of the IP portfolios varies, with some companies (e.g., BenevolentAI) appearing to have a more classic pharma focus on compounds and specific therapeutic areas, and others holding IP on ML/AI components and computing/automation methods related to the “closed loop” capabilities. Generally, these companies seem to acknowledge the importance of generating large amounts of fit-for-ML proprietary data, and of integrating several steps of the drug discovery process rather than mastering a single area – a fundamentally different type of effort from DeepMind’s AlphaFold work.