AI-Driven Antibody and Biologics Design
From traditional hybridoma screening to de novo computational antibody generation
Antibodies are one of the most successful classes of therapeutic molecules. As of the mid-2020s, over 100 antibody-based therapeutics have been approved by the FDA, representing a major share of the biopharmaceutical market. Despite this success, traditional antibody discovery is slow and labor-intensive. AI is beginning to transform how therapeutic antibodies are discovered and optimized.
What Antibodies Are and How They Work
Antibodies are large Y-shaped proteins produced by the immune system. Each antibody binds to a specific molecular target (its antigen) with high affinity and selectivity. The binding interface is formed primarily by six hypervariable loops called complementarity-determining regions (CDRs) — three on the heavy chain (CDR-H1, H2, H3) and three on the light chain (CDR-L1, L2, L3). CDR-H3 is the most variable and typically the most important for binding specificity.
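In practice, CDRs are found by first numbering the variable-domain sequence with a standard scheme and then reading off fixed position ranges. The sketch below assumes the IMGT scheme (CDR1 = positions 27-38, CDR2 = 56-65, CDR3 = 105-117, the same ranges for heavy and light chains). It also assumes the residues have already been numbered by a numbering tool such as ANARCI. The function and type names are hypothetical. Other schemes, such as Kabat and Chothia, draw the boundaries differently, so the same sequence can yield slightly different CDRs.

```python
from typing import Dict, List, Tuple

# IMGT CDR boundaries (inclusive); identical for heavy and light chains.
IMGT_CDRS: Dict[str, Tuple[int, int]] = {
    "CDR1": (27, 38),
    "CDR2": (56, 65),
    "CDR3": (105, 117),
}

# One numbered residue: (IMGT position, insertion code or "", amino acid).
NumberedResidue = Tuple[int, str, str]


def extract_cdrs(numbered: List[NumberedResidue]) -> Dict[str, str]:
    """Return CDR sequences from an IMGT-numbered variable domain.

    Assumes residues are listed in sequence order, as numbering tools emit
    them. Insertions (e.g. 111A in a long CDR3) share their base position,
    so they land in the same CDR window.
    """
    cdrs: Dict[str, List[str]] = {name: [] for name in IMGT_CDRS}
    for position, _insertion, aa in numbered:
        if aa == "-":  # gap placeholder in an aligned numbering
            continue
        for name, (start, end) in IMGT_CDRS.items():
            if start <= position <= end:
                cdrs[name].append(aa)
    return {name: "".join(res) for name, res in cdrs.items()}


# Toy input (hypothetical residues, heavily truncated for brevity).
toy = [(26, "", "S"), (27, "", "G"), (28, "", "F"), (38, "", "Y"),
       (39, "", "M"), (56, "", "I"), (65, "", "T"), (105, "", "A"),
       (111, "", "R"), (111, "A", "D"), (117, "", "Y"), (118, "", "W")]
print(extract_cdrs(toy))  # {'CDR1': 'GFY', 'CDR2': 'IT', 'CDR3': 'ARDY'}
```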
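The enormous diversity of antibodies in the immune system arises from two mechanisms. V(D)J recombination assembles each antibody's variable region from a combination of gene segments. Somatic hypermutation then introduces point mutations into the variable regions of activated B cells. Together these generate a vast repertoire of possible antibody sequences. This diversity is what allows the immune system to recognize virtually any foreign molecule. It is also what makes antibody discovery a needle-in-a-haystack problem.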
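A rough calculation shows the scale of the search problem. The sketch multiplies approximate counts of functional human V, D, and J gene segments to get the number of heavy/light combinations produced by recombination alone. These counts are illustrative textbook-style figures, not exact values, and real counts vary between individuals and references. Junctional diversity and somatic hypermutation multiply the result much further. The sketch also computes the raw sequence space of a single CDR loop over the 20 standard amino acids. The loop length of 15 in the usage line is an arbitrary example.

```python
from math import prod

# Approximate numbers of functional human gene segments (assumption:
# illustrative values; exact counts differ between individuals and sources).
HEAVY = {"V": 40, "D": 23, "J": 6}
KAPPA = {"V": 40, "J": 5}
LAMBDA = {"V": 30, "J": 4}


def combinatorial_pairs() -> int:
    """Heavy x light combinations from V(D)J recombination alone."""
    heavy = prod(HEAVY.values())                          # 40 * 23 * 6 = 5,520
    light = prod(KAPPA.values()) + prod(LAMBDA.values())  # 200 + 120 = 320
    return heavy * light


def cdr_sequence_space(length: int, alphabet: int = 20) -> int:
    """Number of distinct sequences for one loop of the given length."""
    return alphabet ** length


print(f"{combinatorial_pairs():,}")     # 1,766,400
print(f"{cdr_sequence_space(15):.2e}")  # 3.28e+19
```

Recombination alone yields on the order of a million heavy/light pairings. The unconstrained sequence space of one 15-residue loop, about 3 x 10^19, already exceeds what any experimental library can sample.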
Traditional Antibody Discovery
The classical approach to finding therapeutic antibodies is hybridoma technology, developed in the 1970s. A mouse is immunized with the target antigen, and antibody-producing B cells are harvested from its spleen. These cells are fused with myeloma cells to create immortalized cell lines (hybridomas), which are then screened for clones producing antibodies with the desired binding properties. Antibodies found this way are mouse proteins, so they must then be "humanized" to reduce immunogenicity, the risk that patients mount an immune response against the drug. Humanization is typically done by grafting the CDRs onto human framework regions.
Phage display, introduced in the mid-1980s and applied to antibody fragments around 1990, provides an alternative. Large libraries of antibody fragments are displayed on the surface of bacteriophages. Binding clones are then enriched through repeated rounds of binding to the immobilized target, washing, and amplification, a process called panning. This is faster than hybridoma screening and does not require animal immunization. Single B-cell sequencing, a more recent approach, directly sequences the paired heavy- and light-chain genes of individual B cells from immunized animals or human donors.
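A toy enrichment model shows why a few rounds of panning are usually enough. The sketch makes three assumptions. Each round retains a true binder with probability `p_binder`. Each round retains a non-binder with a much smaller background probability `p_background`. Retained phage are amplified uniformly between rounds. Real selections also suffer from amplification bias and target-unrelated binders, which this model ignores. The function name and the parameter values in the usage lines are hypothetical.

```python
def binder_fraction(initial_fraction: float, p_binder: float,
                    p_background: float, rounds: int) -> float:
    """Fraction of true binders in the phage pool after `rounds` of panning.

    Uniform amplification rescales both populations equally, so only the
    per-round retention probabilities affect the final ratio.
    """
    binders = initial_fraction * p_binder ** rounds
    others = (1.0 - initial_fraction) * p_background ** rounds
    return binders / (binders + others)


# Hypothetical library: 1 binder per million clones, 1000-fold selectivity per round.
for r in range(4):
    print(r, f"{binder_fraction(1e-6, 0.1, 1e-4, r):.3g}")
```

With these made-up numbers, the binder fraction climbs from one in a million to roughly one in a thousand after one round. It reaches about one half after the second round and over 99% after the third.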
All these methods involve extensive experimental screening followed by iterative optimization — a process that typically takes 12–18 months from target to lead candidate.
Computational Structure Prediction for Antibodies
Predicting antibody structure is essential for understanding binding and guiding optimization. While the framework regions of antibodies are structurally conserved, the CDR loops — especially CDR-H3 — are highly variable and difficult to predict. General protein structure prediction models like AlphaFold2 perform well on antibody frameworks but struggle with CDR-H3 loops due to their extreme sequence and structural diversity.
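One common way to measure how well a model predicts CDR-H3 takes two steps. First, superimpose the predicted and experimental structures on their framework residues. Then compute the root-mean-square deviation (RMSD) over matching loop atoms, typically backbone or C-alpha atoms. The sketch below implements only the second step. It assumes the coordinates are already framework-aligned and matched one-to-one. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def loop_rmsd(predicted: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """RMSD over matched atoms, in the coordinates' units (usually angstroms).

    No fitting is done here on purpose: superimposing on the loop itself
    would hide errors in where the loop sits relative to the framework.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two equal-length, non-empty coordinate lists")
    squared = sum(math.dist(p, r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(predicted))


# Toy example: every predicted atom is displaced 5 units from its reference.
print(loop_rmsd([(3.0, 4.0, 0.0)] * 3, [(0.0, 0.0, 0.0)] * 3))
```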
IgFold, developed at Johns Hopkins University, is a fast antibody structure prediction model that uses language model embeddings (from AntiBERTy, an antibody-specific language model) to predict antibody structures directly from sequence without multiple sequence alignments. It can produce structures in seconds, enabling high-throughput computational analysis of large antibody repertoires.
AbLang is an antibody-specific language model trained on sequences from the Observed Antibody Space (OAS) database, which contains hundreds of millions of antibody sequences from immune repertoire sequencing studies. AbLang learns contextual representations of antibody sequences. These representations can be used to restore missing or ambiguous residues, to support humanization, and as features for property prediction.
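A much simpler stand-in than AbLang can illustrate the residue-prediction task. The sketch below fills each masked position (`*`) with the most frequent residue at that position in a set of pre-aligned sequences. This is not AbLang's method or API. A transformer language model conditions each prediction on the whole surrounding sequence, not on a single alignment column. The function name, mask character, and toy sequences are assumptions.

```python
from collections import Counter
from typing import Sequence


def restore_residues(query: str, aligned: Sequence[str], mask: str = "*") -> str:
    """Replace masked positions with the column-wise most common residue."""
    restored = []
    for i, aa in enumerate(query):
        if aa != mask:
            restored.append(aa)
            continue
        # Count residues seen at this alignment column, skipping gaps.
        column = Counter(s[i] for s in aligned if i < len(s) and s[i] != "-")
        if not column:
            raise ValueError(f"no aligned residues at position {i}")
        restored.append(column.most_common(1)[0][0])
    return "".join(restored)


# Toy aligned fragments of a heavy-chain N-terminus (hypothetical data).
repertoire = ["EVQLV", "EVQLL", "QVQLV"]
print(restore_residues("*VQL*", repertoire))  # EVQLV
```

A per-column model like this cannot capture correlations between positions. Learning those correlations is exactly what antibody language models are trained to do.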
Generative Models for De Novo Antibody Design
The frontier of the field is de novo computational antibody design — generating entirely new antibody sequences optimized for a given target without starting from an experimentally discovered lead. Several approaches are being pursued:
Diffusion models have been adapted for antibody CDR design. These models can generate CDR loop sequences and structures conditioned on a target epitope (the surface patch on the antigen where binding occurs), effectively designing the binding interface from scratch. DiffAb, published in 2022, demonstrated this approach. It jointly generated CDR sequences and structures, including the hard-to-model CDR-H3 loop, conditioned on the structure of the antigen.
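A diffusion model pairs a fixed forward process, which gradually corrupts the data with Gaussian noise, with a learned network that reverses it. The sketch below shows only the forward (noising) half, applied to CDR C-alpha coordinates. It uses the DDPM closed form x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, with the linear beta schedule from the original DDPM paper (1e-4 to 0.02 over 1,000 steps). This is a simplification. DiffAb also diffuses residue orientations and amino-acid types. The epitope-conditioned denoising network, where the actual design happens, is not shown. Function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Coord = Tuple[float, float, float]


def linear_alpha_bars(T: int = 1000, beta_start: float = 1e-4,
                      beta_end: float = 0.02) -> List[float]:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    alpha_bars, running = [], 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noise_coords(x0: List[Coord], t: int, alpha_bars: List[float],
                 rng: random.Random) -> List[Coord]:
    """Sample x_t ~ q(x_t | x_0) directly, without simulating steps 0..t-1."""
    a = alpha_bars[t]
    keep, spread = math.sqrt(a), math.sqrt(1.0 - a)
    return [tuple(keep * c + spread * rng.gauss(0.0, 1.0) for c in atom)
            for atom in x0]


schedule = linear_alpha_bars()
loop = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]  # toy C-alpha trace
slightly_noisy = noise_coords(loop, 10, schedule, random.Random(0))
almost_pure_noise = noise_coords(loop, 999, schedule, random.Random(0))
```

At the last step, the signal weight sqrt(alpha_bar) is below 0.01, so the loop geometry is essentially erased. Generation runs the learned reverse process from such noise back to a structure.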
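Language model-based generation uses large models trained on antibody sequence databases to propose novel antibody sequences. These models learn what natural, well-formed antibody sequences look like. They can additionally be conditioned on, or filtered for, desired properties such as target binding, low immunogenicity, and high stability. The resulting candidates are more likely to be functional and worth testing experimentally.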
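The generation step itself is usually autoregressive sampling. The model assigns probabilities to the next residue given the residues so far. A temperature parameter trades diversity against staying close to the training distribution. The sketch below uses a toy bigram model, trained on a few hypothetical CDR-H3 strings, as a stand-in for a large antibody language model. It is not the method of any particular published model, and it does not implement property conditioning.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List

START, END = "^", "$"


def train_bigram(seqs: List[str]) -> Dict[str, Counter]:
    """Count residue-to-residue transitions, with start and end tokens."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for s in seqs:
        tokens = [START, *s, END]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def sample(counts: Dict[str, Counter], rng: random.Random,
           temperature: float = 1.0, max_len: int = 30) -> str:
    """Draw one sequence; temperature < 1 sharpens, > 1 flattens the choices."""
    out, prev = [], START
    while len(out) < max_len:
        tokens = list(counts[prev])
        # p(token) proportional to count ** (1 / T), i.e. softmax(log(count) / T)
        weights = [counts[prev][tok] ** (1.0 / temperature) for tok in tokens]
        nxt = rng.choices(tokens, weights=weights)[0]
        if nxt == END:
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)


model = train_bigram(["ARDYYGSSFDY", "ARGGYDFDY", "AKDRGYFDY"])  # hypothetical
rng = random.Random(0)
candidates = {sample(model, rng, temperature=0.8) for _ in range(10)}
```

Where property conditioning is used, it is commonly added through extra model inputs or by scoring sampled candidates with separate predictors and keeping the best.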
Companies Advancing AI Antibody Design
Absci combines generative AI with a high-throughput wet-lab platform for antibody discovery. Their approach uses deep learning models to design antibody variants, then tests them experimentally at scale using their proprietary cell-free and cell-based expression systems. This tight integration of computation and experiment, known as the design-build-test-learn cycle, speeds up iteration. Each round of experimental data feeds back into the models that propose the next round of designs.
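The design-build-test-learn cycle can be sketched as a loop. In this sketch, "design" proposes random single-point mutants of the current best sequence. "Build and test" is a hypothetical `assay` callable standing in for the wet lab. "Learn" is reduced to keeping the best-scoring variant, whereas a real platform would retrain or update its generative model on all new measurements. None of these names correspond to Absci's actual system.

```python
import random
from typing import Callable, List, Optional, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def design(parent: str, n: int, rng: random.Random) -> List[str]:
    """Propose n random single-point mutants of the parent sequence."""
    variants = []
    for _ in range(n):
        pos = rng.randrange(len(parent))
        new_aa = rng.choice(AMINO_ACIDS.replace(parent[pos], ""))
        variants.append(parent[:pos] + new_aa + parent[pos + 1:])
    return variants


def dbtl(seed: str, assay: Callable[[str], float], rounds: int = 5,
         batch: int = 20, rng: Optional[random.Random] = None) -> Tuple[str, float]:
    """Greedy design-build-test-learn loop; returns (best sequence, score)."""
    rng = rng or random.Random(0)
    best, best_score = seed, assay(seed)
    for _ in range(rounds):
        results = [(assay(v), v) for v in design(best, batch, rng)]  # build + test
        score, variant = max(results)                               # simplified "learn"
        if score > best_score:
            best, best_score = variant, score
    return best, best_score


# Toy stand-in for an assay: number of tyrosines (purely illustrative).
print(dbtl("AAAAAAAA", lambda s: s.count("Y"), rounds=10, batch=30))
```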
Generate:Biomedicines applies generative models broadly to protein and antibody design, using diffusion-based architectures to generate novel proteins with desired structural and functional properties. Their Chroma model is a diffusion model over protein structures and sequences. It can be conditioned on constraints such as symmetry, shape, and secondary structure to produce designs with controllable properties.
Chai Discovery has developed Chai-1, a multi-modal foundation model for molecular structure prediction that handles proteins, small molecules, and nucleic acids. The model is relevant to antibody design through its ability to predict antibody-antigen complex structures.
Nanobodies and Alternative Scaffolds
Nanobodies (also called VHH antibodies or single-domain antibodies) are derived from the heavy-chain-only antibodies found naturally in camelids (llamas, alpacas, camels). They are much smaller than conventional antibodies (about 15 kDa versus about 150 kDa for a full IgG). They can reach binding sites, such as recessed clefts, that conventional antibodies cannot access, and they are easier to engineer and manufacture. Their simpler structure, a single domain with three CDR loops, also makes them more tractable for computational design. Caplacizumab, a nanobody approved for acquired thrombotic thrombocytopenic purpura, demonstrated the clinical viability of the format.
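The size difference follows from a common rule of thumb of about 110 Da per amino acid residue. The sketch below assumes roughly 120 residues for a VHH domain. For a conventional antibody, it assumes two heavy chains of about 450 residues and two light chains of about 214 residues each. These lengths are approximate, and glycosylation is ignored.

```python
AVG_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average mass per residue


def approx_mass_kda(n_residues: int) -> float:
    """Rough protein mass in kDa from residue count alone."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0


vhh = approx_mass_kda(120)                # one single-domain nanobody
igg = approx_mass_kda(2 * 450 + 2 * 214)  # two heavy + two light chains
print(f"VHH ~{vhh:.0f} kDa, IgG ~{igg:.0f} kDa, ratio ~{igg / vhh:.0f}x")
```

The estimate of roughly 13 kDa versus 146 kDa lands close to the quoted 15 and 150 kDa figures, about a tenfold difference.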
The Validation Loop
Computationally designed antibodies must still be validated experimentally. Key measurements include binding affinity (typically by surface plasmon resonance or bio-layer interferometry), specificity, thermal stability, expression yield, aggregation propensity, and ultimately, activity in functional assays and animal models. The gap between computational design and experimental validation is narrowing but remains significant. Even the best generative models produce candidate sets in which only a fraction bind the target with therapeutic-grade affinity. The field is converging on integrated workflows that combine computational design with rapid experimental screening to close this loop efficiently.
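Surface plasmon resonance and bio-layer interferometry usually report binding kinetics. The equilibrium dissociation constant follows from them as K_D = k_off / k_on, where a lower K_D means tighter binding. The sketch computes K_D, then the hit rate of a design campaign against an affinity cutoff. The 1 nM threshold and the example values are hypothetical, since what counts as therapeutic-grade affinity depends on the target and mechanism.

```python
from typing import Optional, Sequence


def dissociation_constant(k_on: float, k_off: float) -> float:
    """K_D in molar, from k_on in 1/(M*s) and k_off in 1/s."""
    return k_off / k_on


def hit_rate(kd_values: Sequence[Optional[float]], threshold_m: float = 1e-9) -> float:
    """Fraction of designs whose K_D meets the cutoff; None = no binding detected."""
    hits = sum(1 for kd in kd_values if kd is not None and kd <= threshold_m)
    return hits / len(kd_values)


kd = dissociation_constant(k_on=1e5, k_off=1e-4)
print(f"K_D = {kd * 1e9:.2f} nM")                 # K_D = 1.00 nM
print(hit_rate([5e-10, 2e-8, None, None, 1e-9]))  # 0.4
```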