Antibody Design · Biologics

AI-Driven Antibody and Biologics Design

From traditional hybridoma screening to de novo computational antibody generation

9 min read

Antibodies are one of the most successful classes of therapeutic molecules. As of the mid-2020s, over 100 antibody-based therapeutics have been approved by the FDA, representing a major share of the biopharmaceutical market. Despite this success, traditional antibody discovery is slow and labor-intensive. AI is beginning to transform how therapeutic antibodies are discovered and optimized.

What Antibodies Are and How They Work

Antibodies are large Y-shaped proteins produced by the immune system. Each antibody binds to a specific molecular target (its antigen) with high affinity and selectivity. The binding interface is formed primarily by six hypervariable loops called complementarity-determining regions (CDRs) — three on the heavy chain (CDR-H1, H2, H3) and three on the light chain (CDR-L1, L2, L3). CDR-H3 is the most variable and typically the most important for binding specificity.
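Which residues count as a CDR depends on the numbering scheme. As a minimal sketch, assume the variable domain has already been numbered with the IMGT scheme, for example by a tool such as ANARCI (this code does not call it). The CDRs can then be read off by position range. Under IMGT, CDR1 spans positions 27 to 38, CDR2 spans 56 to 65, and CDR3 spans 105 to 117 on both heavy and light chains. The input format and the function name extract_cdrs are hypothetical.

```python
# IMGT CDR boundaries (inclusive), shared by heavy and light chains.
IMGT_CDRS = {"CDR1": (27, 38), "CDR2": (56, 65), "CDR3": (105, 117)}

def extract_cdrs(numbered):
    """Collect CDR residues from an IMGT-numbered variable domain.

    numbered: list of (position, insertion_code, residue) in sequence order.
    Insertions such as 111A share their integer position, so long CDR3
    loops with insertion codes are still captured.
    """
    cdrs = {name: "" for name in IMGT_CDRS}
    for pos, _insertion, residue in numbered:
        for name, (start, end) in IMGT_CDRS.items():
            if start <= pos <= end:
                cdrs[name] += residue
    return cdrs

# Hypothetical IMGT-numbered fragment: (position, insertion code, residue).
numbered = [(26, "", "S"), (27, "", "G"), (38, "", "Y"), (39, "", "M"),
            (56, "", "I"), (65, "", "T"), (105, "", "A"), (111, "A", "R"),
            (117, "", "D"), (118, "", "W")]
print(extract_cdrs(numbered))  # {'CDR1': 'GY', 'CDR2': 'IT', 'CDR3': 'ARD'}
```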

The enormous diversity of antibodies in the immune system arises from V(D)J recombination of gene segments and from somatic hypermutation, which together generate a vast repertoire of possible antibody sequences. This diversity is what allows the immune system to recognize virtually any foreign molecule, and it is also what makes antibody discovery a needle-in-a-haystack problem.
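A back-of-the-envelope calculation shows how segment recombination alone already yields a large repertoire. The gene-segment counts below are rough, illustrative figures for human heavy and kappa light chains. They are assumed for this sketch, not taken from the article.

```python
from math import prod

# Rough, illustrative counts of functional gene segments (assumed values).
heavy_segments = {"V": 40, "D": 23, "J": 6}
kappa_segments = {"V": 40, "J": 5}

heavy_combos = prod(heavy_segments.values())  # one V, one D, one J per heavy chain
kappa_combos = prod(kappa_segments.values())  # one V, one J per kappa light chain
pairings = heavy_combos * kappa_combos        # any heavy chain with any light chain

print(heavy_combos, kappa_combos, pairings)   # 5520 200 1104000
```

This counts only segment choice and chain pairing. Imprecise joining at segment junctions and somatic hypermutation multiply the number of distinct sequences by many further orders of magnitude.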

Traditional Antibody Discovery

The classical approach to finding therapeutic antibodies is hybridoma technology, developed in the 1970s. A mouse is immunized with the target antigen, and antibody-producing B cells are harvested from its spleen. These cells are fused with myeloma cells to create immortalized cell lines (hybridomas), which are then screened for clones producing antibodies with the desired binding properties. The resulting mouse antibodies must then be "humanized", typically by grafting their CDRs onto human framework regions, to reduce immunogenicity in patients.
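The CDR-grafting idea behind humanization can be sketched as string assembly. The three mouse CDRs are kept and placed between the four framework regions (FR1 to FR4) of a human acceptor sequence. This is a minimal sketch with a hypothetical function name and toy sequences. Practical humanization also chooses the acceptor carefully and often back-mutates selected framework positions to recover affinity; neither step is modeled here.

```python
def graft_cdrs(donor_cdrs, acceptor_frameworks):
    """Assemble FR1-CDR1-FR2-CDR2-FR3-CDR3-FR4.

    donor_cdrs: three CDR strings from the mouse (donor) antibody.
    acceptor_frameworks: four framework strings from a human acceptor.
    """
    donor_cdrs = list(donor_cdrs)
    if len(donor_cdrs) != 3 or len(acceptor_frameworks) != 4:
        raise ValueError("expected 3 CDRs and 4 framework regions")
    # Pair each framework with the CDR that follows it; FR4 is last.
    return "".join(fr + cdr for fr, cdr in zip(acceptor_frameworks, donor_cdrs + [""]))

# Toy segments for illustration only.
print(graft_cdrs(["GFT", "ISG", "ARD"], ["EVQ", "WVR", "RFT", "WGQ"]))
# EVQGFTWVRISGRFTARDWGQ
```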

Phage display, developed in the late 1980s, provides an alternative: large libraries of antibody fragments are displayed on the surface of bacteriophages, and binding clones are selected by panning against the target. This is faster and does not require animal immunization. Single B-cell sequencing, a more recent approach, directly sequences antibodies from individual B cells of immunized animals or human donors.
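A deliberately simplified model shows why a few rounds of panning can be enough. This model is an assumption of the sketch, not a claim from the article. In each round, a binding clone is retained with probability r_binder and a non-binding clone with a much smaller background probability r_background. Survivors are then amplified back to the same library size. Under these assumptions, the odds of picking a binder multiply by r_binder / r_background every round.

```python
def enrich(f0, r_binder, r_background, rounds):
    """Binder fraction after each round of a toy panning model.

    f0: starting fraction of binders in the library.
    r_binder / r_background: per-round retention probabilities (assumed constant).
    Amplification is modeled as renormalizing survivors to a fraction.
    """
    fractions = [f0]
    f = f0
    for _ in range(rounds):
        kept_binders = f * r_binder
        kept_background = (1 - f) * r_background
        f = kept_binders / (kept_binders + kept_background)
        fractions.append(f)
    return fractions

# One binder per million clones; binders kept 1000x more often than background.
fractions = enrich(1e-6, r_binder=0.1, r_background=1e-4, rounds=3)
print([f"{f:.3g}" for f in fractions])
```

With these toy numbers, the binder fraction climbs from one in a million to about 0.1% after one round, roughly half after two, and about 99.9% after three.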

All these methods involve extensive experimental screening followed by iterative optimization — a process that typically takes 12–18 months from target to lead candidate.

Computational Structure Prediction for Antibodies

Predicting antibody structure is essential for understanding binding and guiding optimization. While the framework regions of antibodies are structurally conserved, the CDR loops — especially CDR-H3 — are highly variable and difficult to predict. General protein structure prediction models like AlphaFold2 perform well on antibody frameworks but struggle with CDR-H3 loops due to their extreme sequence and structural diversity.
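Accuracy on loops such as CDR-H3 is usually reported as the root-mean-square deviation (RMSD) between predicted and experimental atom positions, computed after superimposing the two structures, often on the framework. The sketch below computes only the metric itself. It assumes the coordinates are already superimposed and atom-matched, and it omits the optimal superposition step (for example, the Kabsch algorithm).

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) coordinates.

    Assumes both lists describe the same atoms in the same order and have
    already been superimposed (e.g. aligned on framework atoms).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

# Each of two atoms is displaced by 1 angstrom.
print(rmsd([(0, 0, 0), (0, 0, 0)], [(1, 0, 0), (0, 1, 0)]))  # 1.0
```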

IgFold, developed at Johns Hopkins University, is a fast antibody structure prediction model that uses language model embeddings (from AntiBERTy, an antibody-specific language model) to predict antibody structures directly from sequence without multiple sequence alignments. It can produce structures in seconds, enabling high-throughput computational analysis of large antibody repertoires.

AbLang is an antibody-specific language model trained on sequences from the Observed Antibody Space (OAS) database, which contains hundreds of millions of antibody sequences from immunogenomics studies. AbLang learns contextual representations of antibody sequences that can be used for residue prediction, humanization, and property prediction.
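An antibody language model such as AbLang predicts a masked or missing residue from its learned sequence context. The sketch below swaps in a much cruder technique to show only the input and output of residue restoration. It fills each masked position with the most common residue at that column across aligned reference sequences. It does not use AbLang or its API, and the function name and mask character are hypothetical.

```python
from collections import Counter

def restore_masked(query, aligned_refs, mask="*"):
    """Fill masked positions with the most frequent residue in that column.

    A per-column frequency profile is only a stand-in for a language model,
    which would condition on the surrounding residues of the query itself.
    Assumes query and references are aligned to the same length.
    """
    restored = []
    for i, residue in enumerate(query):
        if residue != mask:
            restored.append(residue)
            continue
        column = Counter(ref[i] for ref in aligned_refs if ref[i] not in ("-", mask))
        restored.append(column.most_common(1)[0][0] if column else mask)
    return "".join(restored)

refs = ["EVQLV", "EVQLQ", "QVQLV"]  # toy aligned framework fragments
print(restore_masked("*VQL*", refs))  # EVQLV
```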

Generative Models for De Novo Antibody Design

The frontier of the field is de novo computational antibody design — generating entirely new antibody sequences optimized for a given target without starting from an experimentally discovered lead. Several approaches are being pursued:

Diffusion models have been adapted for antibody CDR design. These models can generate CDR loop sequences and structures conditioned on a target epitope (the surface patch on the antigen where binding occurs), effectively designing the binding interface from scratch. DiffAb, published in 2022, demonstrated this approach for CDR-H3 design.
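The noising half of a diffusion model is simple to write down. The sketch below applies the standard DDPM forward process to toy CDR backbone coordinates:

x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * epsilon, with Gaussian epsilon.

It uses a linear beta schedule, with the 1e-4 to 0.02 range assumed as a common default. This is not DiffAb's code. DiffAb also diffuses residue types and orientations, and the actual design is done by the learned, antigen-conditioned denoising network that reverses this process, which is omitted here.

```python
import math
import random

def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """alpha_bar_t = prod over s <= t of (1 - beta_s), with a linear beta schedule."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def noise_coords(x0, alpha_bar, rng):
    """Sample x_t from q(x_t | x_0) = N(sqrt(alpha_bar) * x_0, (1 - alpha_bar) * I)."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [tuple(a * c + s * rng.gauss(0.0, 1.0) for c in point) for point in x0]

rng = random.Random(0)
cdr_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]  # toy C-alpha trace
alpha_bars = linear_alpha_bars(1000)
slightly_noisy = noise_coords(cdr_ca, alpha_bars[10], rng)      # close to the input
almost_pure_noise = noise_coords(cdr_ca, alpha_bars[-1], rng)   # structure washed out
```

Early steps barely perturb the loop. By the final step alpha_bar is close to zero, so the coordinates are essentially Gaussian noise, which is the starting point a trained model would denoise from.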

Language model-based generation uses large models trained on antibody sequence databases to generate novel antibody sequences. By conditioning on desired properties — target binding, low immunogenicity, high stability — these models can propose sequences that are likely to be functional.
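One simple way to steer generation toward desired properties is generate-then-filter. Many candidate sequences are sampled, each is scored with property predictors, and only those that clear every threshold are kept. This technique is swapped in for illustration and does not describe how any particular model conditions its output. In the sketch, the two scorers are hypothetical toys standing in for learned predictors: a tyrosine-fraction "binding" proxy, and a check for the Asn-Gly (NG) motif, a known deamidation liability. The function and property names are assumptions.

```python
def filter_by_properties(candidates, scorers, thresholds):
    """Keep candidates whose every scored property meets its threshold,
    ranked by the sum of their scores (highest first)."""
    kept = []
    for seq in candidates:
        scores = {name: score(seq) for name, score in scorers.items()}
        if all(scores[name] >= cutoff for name, cutoff in thresholds.items()):
            kept.append((seq, scores))
    kept.sort(key=lambda item: sum(item[1].values()), reverse=True)
    return kept

# Hypothetical toy scorers standing in for learned property predictors.
scorers = {
    "binding": lambda s: s.count("Y") / len(s),            # toy proxy only
    "developability": lambda s: 0.0 if "NG" in s else 1.0,  # flags Asn-Gly motif
}
thresholds = {"binding": 0.2, "developability": 1.0}

candidates = ["ARDYYGMDV", "ARNGYYFDY", "ARDLLAMDV"]  # e.g. sampled CDR-H3s
for seq, scores in filter_by_properties(candidates, scorers, thresholds):
    print(seq, scores)
```

Only the first candidate survives. The second fails the NG check, and the third has no tyrosines.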

Companies Advancing AI Antibody Design

Absci combines generative AI with a high-throughput wet-lab platform for antibody discovery. Their approach uses deep learning models to design antibody variants, then tests them experimentally at scale using their proprietary cell-free and cell-based expression systems. This tight integration of computation and experiment — the design-build-test-learn cycle — accelerates iteration.
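The design-build-test-learn cycle can be sketched as a loop. In each round, a batch of variants of the current best sequence is designed, then built and tested (here, scored by a function standing in for the wet-lab assay), and the results are used to choose the next starting point. This is a minimal, greedy sketch under stated assumptions. The "learn" step here is only selection of the best measured sequence, whereas a real platform would typically retrain a model on all measurements. The toy_assay function is hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def dbtl(seed, assay, rounds=5, batch=20, rng=None):
    """Greedy design-build-test-learn loop.

    design: propose random single-point mutants of the current best sequence
    build/test: score every design with `assay` (stand-in for experiments)
    learn: carry the best measured sequence into the next round
    """
    if rng is None:
        rng = random.Random(0)
    best, best_score = seed, assay(seed)
    history = [best_score]
    for _ in range(rounds):
        designs = []
        for _ in range(batch):
            i = rng.randrange(len(best))
            designs.append(best[:i] + rng.choice(AMINO_ACIDS) + best[i + 1:])
        for seq in designs:
            score = assay(seq)
            if score > best_score:
                best, best_score = seq, score
        history.append(best_score)
    return best, history

def toy_assay(seq):
    # Hypothetical stand-in for a binding measurement: rewards tryptophans.
    return seq.count("W")

best, history = dbtl("AAAAAAAAAA", toy_assay, rounds=5, batch=20)
print(best, history)
```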

Generate:Biomedicines applies generative models broadly to protein and antibody design, using diffusion-based architectures to generate novel protein sequences with desired structural and functional properties. Their Chroma platform generates protein designs with controllable properties.

Chai Discovery has developed Chai-1, a multi-modal foundation model for molecular structure prediction that handles proteins, small molecules, and nucleic acids. The model is relevant to antibody design through its ability to predict antibody-antigen complex structures.
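Once a model has predicted an antibody-antigen complex, a common first analysis is to list the antigen residues at the interface (the predicted epitope). The sketch below works on plain coordinates and does not use Chai-1 or its API. It applies a simple distance criterion: an antigen residue counts as interface if any of its atoms lies within a cutoff of any antibody atom. The 5 angstrom default and the dictionary input format are assumptions. The brute-force search is fine for a sketch, but a spatial index would be used for full structures.

```python
import math

def interface_residues(antibody, antigen, cutoff=5.0):
    """Antigen residue ids with any atom within `cutoff` angstroms of the antibody.

    antibody, antigen: dicts mapping residue id -> list of (x, y, z) atom coordinates.
    """
    antibody_atoms = [atom for atoms in antibody.values() for atom in atoms]
    epitope = set()
    for res_id, atoms in antigen.items():
        for atom in atoms:
            if any(math.dist(atom, other) <= cutoff for other in antibody_atoms):
                epitope.add(res_id)
                break
    return epitope

# Toy complex: one antibody atom, three antigen residues.
antibody = {"H100": [(0.0, 0.0, 0.0)]}
antigen = {"A1": [(3.0, 0.0, 0.0)],
           "A2": [(10.0, 0.0, 0.0)],
           "A3": [(20.0, 0.0, 0.0), (0.0, 4.9, 0.0)]}
print(sorted(interface_residues(antibody, antigen)))  # ['A1', 'A3']
```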

Nanobodies and Alternative Scaffolds

Nanobodies (also called VHH antibodies or single-domain antibodies) are derived from the heavy-chain-only antibodies found naturally in camelids (llamas, alpacas, camels). They are much smaller than conventional antibodies (about 15 kDa versus 150 kDa), can access binding sites that conventional antibodies cannot reach, and are easier to engineer and manufacture. Their simpler structure — a single domain with three CDR loops — also makes them more tractable for computational design. Caplacizumab, a nanobody approved for thrombotic thrombocytopenic purpura, demonstrated the clinical viability of the format.
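The size difference follows from residue counts and the common rule of thumb of about 110 Da per amino acid residue. The residue counts below are approximate values assumed for illustration. The sketch uses about 125 residues for a VHH domain, and roughly 450 per heavy chain and 215 per light chain for an IgG with two of each.

```python
AVG_RESIDUE_DA = 110  # rule-of-thumb average mass per residue, in daltons

def approx_mass_kda(n_residues):
    """Approximate protein mass in kDa from its residue count."""
    return n_residues * AVG_RESIDUE_DA / 1000

vhh_kda = approx_mass_kda(125)                # one VHH domain (assumed length)
igg_kda = approx_mass_kda(2 * 450 + 2 * 215)  # two heavy + two light chains (assumed)
print(vhh_kda, igg_kda)
```

This gives roughly 14 kDa versus 146 kDa. Glycosylation of the IgG heavy chains adds a few more kDa, bringing the IgG close to the 150 kDa figure above.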

The Validation Loop

Computationally designed antibodies must still be validated experimentally. Key measurements include binding affinity (typically by surface plasmon resonance or bio-layer interferometry), specificity, thermal stability, expression yield, aggregation propensity, and ultimately, activity in functional assays and animal models. The gap between computational design and experimental validation is narrowing but remains significant — even the best generative models produce candidates where only a fraction bind the target with therapeutic-grade affinity. The field is converging on integrated workflows that combine computational design with rapid experimental screening to close this loop efficiently.
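Binding measured by SPR or BLI is commonly summarized as the equilibrium dissociation constant K_D = k_off / k_on, where a lower K_D means tighter binding. The fraction of designs that clear an affinity bar is then a simple hit rate. This is a minimal sketch. The 1 nM threshold is an assumed, illustrative cutoff, not a definition of therapeutic-grade affinity, and the function names are hypothetical.

```python
def kd_from_rates(k_on, k_off):
    """K_D = k_off / k_on, in M when k_on is in 1/(M*s) and k_off is in 1/s."""
    return k_off / k_on

def hit_rate(kds_molar, threshold_molar=1e-9):
    """Fraction of designs whose K_D is at or below the threshold."""
    return sum(kd <= threshold_molar for kd in kds_molar) / len(kds_molar)

kd = kd_from_rates(k_on=1e5, k_off=1e-4)
print(f"{kd:.1e} M")                              # 1.0e-09 M, i.e. 1 nM
print(hit_rate([5e-10, 2e-9, 8e-10, 1e-6]))      # 0.5
```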
