Protein Structure Prediction Explained

From the protein folding problem to AlphaFold, Boltz, and co-folding models

Proteins are molecular machines. Their function is determined by their 3D shape, which is determined by their amino acid sequence. Predicting this shape from sequence alone — the "protein folding problem" — was one of biology's grand challenges for over 50 years.

Why Structure Matters for Drug Discovery

Most drugs work by binding to proteins and altering their function. To design a drug that binds to a specific protein, you need to know the protein's 3D structure — particularly the shape of the binding site where the drug will attach. Without structural information, drug design is largely trial and error.

Experimental methods for determining protein structure — X-ray crystallography, cryo-electron microscopy (cryo-EM), and NMR spectroscopy — are accurate but slow. A single structure can take months to years to solve, and some proteins resist experimental characterization entirely (notably membrane proteins and disordered regions).

AlphaFold

In 2020, DeepMind's AlphaFold2 demonstrated that deep learning could predict protein structures from amino acid sequences with accuracy comparable to experimental methods for many targets. It won CASP14, the field's biennial blind prediction assessment, by a wide margin.

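AlphaFold2 uses a neural network module called the Evoformer, which processes multiple sequence alignments (MSAs), comparisons of the target sequence against evolutionary relatives, to extract structural information encoded by evolution. Residues that touch in the folded protein tend to mutate together, so correlated changes between alignment columns hint at spatial proximity. A separate structure module then converts the Evoformer's output directly into 3D atomic coordinates, rather than predicting inter-residue distances and angles and assembling them afterward, as the first AlphaFold did.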
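
To make the idea of "structural information encoded by evolution" concrete, here is a minimal sketch of a classical co-variation measure: mutual information between two alignment columns. This is not how AlphaFold2 works internally (the Evoformer learns its own representation with attention); it only illustrates the kind of signal an MSA carries. The function name and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    col_i = Counter(seq[i] for seq in msa)
    col_j = Counter(seq[j] for seq in msa)
    pairs = Counter((seq[i], seq[j]) for seq in msa)

    mi = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        p_a = col_i[a] / n
        p_b = col_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Toy alignment (made-up sequences, not real proteins).
# Columns 1 and 3 change together (K with E, R with D);
# columns 0 and 2 never change at all.
toy_msa = [
    "AKLE",
    "AKLE",
    "ARLD",
    "ARLD",
]

print(column_mutual_information(toy_msa, 1, 3))  # co-varying pair
print(column_mutual_information(toy_msa, 0, 2))  # conserved pair
```

The co-varying pair scores high and the fully conserved pair scores zero, because a column that never changes carries no information about its partner. Real alignments need corrections for phylogenetic bias and small sample sizes, which this sketch ignores.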

In 2024, Google DeepMind and Isomorphic Labs released AlphaFold3, which extended the approach to predict the structures of complexes between proteins and other molecules: other proteins, DNA, RNA, and small-molecule ligands. This is directly relevant to drug discovery, where understanding how a drug binds to its target is the central design question.

Boltz and Co-Folding

Boltz, developed by MIT researchers, is an open-source alternative to AlphaFold3 focused on "co-folding": predicting the structure of a protein and its bound small molecule together, rather than docking the molecule into a fixed protein structure. Boltz-1 reported accuracy approaching AlphaFold3's on protein-ligand complex prediction. Boltz-2 added the ability to predict binding affinity (how tightly the drug binds), not just binding pose (where and in what orientation it binds), an important distinction for drug design.
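
To give "how tightly the drug binds" a number, the sketch below uses the standard thermodynamic relation between a dissociation constant and binding free energy, ΔG = RT ln(Kd), with Kd expressed relative to a 1 M standard state. This is textbook thermodynamics, not Boltz-2's output format or method; the function name and the example Kd values are illustrative assumptions.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol) from Kd in mol/L."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)


# Hypothetical binders: a weaker (micromolar) and a tighter (nanomolar) one.
for label, kd in [("1 uM binder", 1e-6), ("1 nM binder", 1e-9)]:
    print(f"{label}: {binding_free_energy(kd):.2f} kcal/mol")
```

At room temperature each tenfold improvement in Kd corresponds to about 1.4 kcal/mol of additional binding free energy. Two compounds can therefore share nearly the same pose and still differ substantially in affinity.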

Other Models

RoseTTAFold, from the Baker lab at the University of Washington, is an open-source model that uses a "three-track" architecture processing 1D sequence, 2D distance maps, and 3D coordinates simultaneously. RoseTTAFold All-Atom (RFAA) extended this to model proteins alongside small molecules, nucleic acids, and metal ions.
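
The three levels of representation are related to one another. As a minimal sketch, the code below derives a 2D distance map and a contact map from 3D coordinates for a toy chain. The coordinates are made up, the 8 Å contact cutoff is a common convention assumed here, and none of this reflects RoseTTAFold's actual internal features.

```python
import math


def distance_map(coords):
    """Pairwise Euclidean distances between residues (an N x N nested list)."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def contact_map(coords, cutoff=8.0):
    """True where two residues lie within `cutoff` angstroms of each other."""
    return [[d <= cutoff for d in row] for row in distance_map(coords)]


# 1D: sequence (one letter per residue). Hypothetical 5-residue peptide.
sequence = "MAGSK"

# 3D: one made-up C-alpha position per residue, in angstroms.
toy_ca = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (3.8, 3.8, 0.0),
    (0.0, 3.8, 0.0),
    (20.0, 0.0, 0.0),
]

# 2D: the distance map summarizes the 3D geometry as residue pairs.
for residue, row in zip(sequence, distance_map(toy_ca)):
    print(residue, " ".join(f"{d:5.1f}" for d in row))
```

The distance map does not change when the molecule is rotated or translated, which is one reason pairwise representations are convenient for neural networks. The 3D track is still needed to resolve details the map cannot capture, such as handedness.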

ESMFold, from Meta AI, takes a different approach: instead of using multiple sequence alignments, it uses a protein language model (ESM-2) trained on millions of sequences. This makes it much faster than AlphaFold2 (seconds vs. minutes per structure) at the cost of some accuracy, particularly for proteins with few evolutionary relatives.

OpenFold is an open-source reimplementation of AlphaFold2 built for training on custom data, enabling researchers to fine-tune structure prediction for specific protein families.

Impact on Drug Discovery

Before AlphaFold, roughly 35% of human proteins had an experimentally determined structure, and many of those structures covered only part of the sequence. Now, predicted structures are available for nearly every human protein, although prediction confidence varies from region to region. This has enabled structure-based drug design for targets that previously had no structural information. The key remaining challenges are accuracy for protein-ligand complexes, prediction of protein dynamics (proteins are not static), and modeling of disordered regions that lack a fixed structure.
