Virtual Screening and Molecular Docking
How computational methods sift through billions of molecules to find drug candidates
Virtual screening is the computational counterpart to experimental high-throughput screening. Instead of physically testing millions of compounds against a target protein, virtual screening uses computational models to predict which molecules are most likely to bind. This allows researchers to prioritize a small number of candidates for experimental testing, dramatically reducing cost and time.
Structure-Based vs. Ligand-Based Approaches
Virtual screening falls into two broad categories. Structure-based virtual screening uses the 3D structure of the target protein to evaluate how well each candidate molecule fits into its binding site. This requires a known protein structure (from experiment or prediction). Ligand-based virtual screening does not need a protein structure — instead, it identifies new candidates by finding molecules that are chemically similar to known active compounds. Ligand-based methods include pharmacophore modeling and similarity searching using molecular fingerprints.
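As a rough illustration of fingerprint-based similarity searching, the sketch below ranks a small library by Tanimoto similarity to a known active. It assumes fingerprints are stored as plain Python sets of "on" bit indices. The compound names, the bit values, and the 0.5 threshold are invented for the example; in practice the fingerprints would come from a cheminformatics toolkit rather than being written by hand.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def similarity_search(query, library, threshold=0.5):
    """Return (name, similarity) pairs at or above the threshold, most similar first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    kept = [(name, sim) for name, sim in scored if sim >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical fingerprints: each set holds the indices of bits that are set.
known_active = {1, 4, 7, 9, 12}
library = {
    "cmpd_A": {1, 4, 7, 9, 13},   # shares 4 bits out of 6 distinct bits
    "cmpd_B": {2, 3, 5},          # shares no bits with the query
    "cmpd_C": {1, 4, 7, 9, 12},   # identical to the query
}

hits = similarity_search(known_active, library, threshold=0.5)
for name, sim in hits:
    print(f"{name}: {sim:.2f}")
```

Here cmpd_C and cmpd_A pass the threshold and cmpd_B is dropped. The threshold is a tuning choice: a lower value retrieves more diverse chemistry at the cost of more inactive compounds.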
Traditional Molecular Docking
Molecular docking is the most widely used structure-based virtual screening method. A docking program places a small molecule into the binding site of a protein, samples many possible orientations and conformations (poses), and scores each one to estimate binding affinity. The goal is to predict both the correct binding pose and the strength of the interaction.
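The sample-and-score loop can be sketched with a deliberately simplified toy: a rigid two-atom "ligand" in 2D, a four-atom "pocket", random rigid-body poses, and an invented contact-counting score. None of this reflects how any real docking program is implemented. Real tools also sample internal ligand conformations and use far more elaborate search and scoring. The function names and distance cutoffs are assumptions made for illustration.

```python
import math
import random


def transform(ligand, angle, dx, dy):
    """Rotate a 2D ligand about the origin by `angle`, then translate it by (dx, dy)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + dx, s * x + c * y + dy) for x, y in ligand]


def toy_score(pose, pocket):
    """Invented score (lower is better): penalize clashes, reward close contacts."""
    total = 0.0
    for lx, ly in pose:
        for px, py in pocket:
            d = math.hypot(lx - px, ly - py)
            if d < 0.5:
                total += 10.0   # atoms overlap: steric clash
            elif d < 1.5:
                total -= 1.0    # atoms near each other: favorable contact
    return total


def dock(ligand, pocket, n_samples=2000, seed=0):
    """Random search: sample rigid-body poses and keep the best-scoring one."""
    rng = random.Random(seed)
    best_pose, best_score = None, math.inf
    for _ in range(n_samples):
        pose = transform(
            ligand,
            rng.uniform(0.0, 2.0 * math.pi),
            rng.uniform(-3.0, 3.0),
            rng.uniform(-3.0, 3.0),
        )
        score = toy_score(pose, pocket)
        if score < best_score:
            best_pose, best_score = pose, score
    return best_pose, best_score


ligand = [(-0.5, 0.0), (0.5, 0.0)]
pocket = [(-1.0, 0.0), (1.0, 0.0), (0.0, -1.0), (0.0, 1.0)]
best_pose, best_score = dock(ligand, pocket)
print(best_score, best_pose)
```

The structure is the point, not the numbers: a search procedure proposes poses, a scoring function evaluates each one, and the best-scoring pose is reported as the prediction.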
Widely used docking tools include AutoDock and its faster variant AutoDock Vina, which combine stochastic search algorithms with empirical scoring functions. Glide, part of the Schrödinger Suite, uses a hierarchical filtering approach: first a rough shape fit, then a more detailed energy evaluation. GOLD employs a genetic algorithm to explore conformational space. Most of these tools date from the 1990s and 2000s and remain workhorses of computational drug discovery.
Scoring Functions and Their Limitations
The accuracy of docking is bottlenecked by its scoring functions — the mathematical models that estimate how tightly a molecule will bind. Classical scoring functions fall into three categories: force-field-based (modeling physics of molecular interactions), empirical (fitting parameters to experimental binding data), and knowledge-based (derived from statistical analysis of known protein-ligand crystal structures). All of these are approximations, and none reliably ranks compounds by true binding affinity. Scoring functions tend to produce many false positives: molecules predicted to bind well that do not actually bind in experiments.
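To make the empirical category concrete, here is a minimal sketch of a score built as a weighted sum of interaction counts. The feature names and weights are invented for illustration and are not taken from any real docking program. In an actual empirical scoring function the weights would be fitted to experimental binding data.

```python
from dataclasses import dataclass


@dataclass
class PoseFeatures:
    h_bonds: int               # protein-ligand hydrogen bonds in the pose
    lipophilic_contacts: int   # close nonpolar atom pairs
    rotatable_bonds: int       # ligand flexibility, used as an entropy penalty


# Illustrative weights only (arbitrary units, lower score = predicted tighter binding).
WEIGHTS = {"h_bonds": -1.0, "lipophilic_contacts": -0.2, "rotatable_bonds": 0.3}


def empirical_score(f: PoseFeatures) -> float:
    """Weighted sum of interaction terms, the general shape of an empirical scoring function."""
    return (
        WEIGHTS["h_bonds"] * f.h_bonds
        + WEIGHTS["lipophilic_contacts"] * f.lipophilic_contacts
        + WEIGHTS["rotatable_bonds"] * f.rotatable_bonds
    )


small = PoseFeatures(h_bonds=3, lipophilic_contacts=10, rotatable_bonds=2)
large = PoseFeatures(h_bonds=3, lipophilic_contacts=30, rotatable_bonds=6)
print(f"small: {empirical_score(small):.1f}")
print(f"large: {empirical_score(large):.1f}")
```

With these made-up weights, the larger ligand scores better purely because it makes more contacts, even though both form the same number of hydrogen bonds. That is one simple way additive scoring terms can produce false positives.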
ML-Based Docking and Scoring
Machine learning is being applied to improve both pose prediction and scoring. DiffDock, developed at MIT, reframes docking as a generative modeling problem. Instead of exhaustively sampling poses, DiffDock uses a diffusion model to generate likely binding poses directly, having learned from thousands of experimentally determined protein-ligand complexes. Its authors reported improved pose prediction accuracy over traditional methods, most notably in blind docking, where the binding site is not specified in advance.
EquiBind, also from MIT, uses SE(3)-equivariant graph neural networks to predict binding poses in a single forward pass, making it orders of magnitude faster than traditional docking — though at some cost to accuracy. These fast methods are particularly useful for initial filtering when screening very large libraries.
ML-based scoring functions trained on experimental binding affinity data (such as those from the PDBbind database) can improve ranking accuracy compared to classical scoring functions, though generalization to novel protein families remains a challenge.
Ultra-Large Library Screening
The scale of virtual screening has expanded enormously. The Enamine REAL library contains billions of make-on-demand compounds that can be readily synthesized. Screening libraries of this size with traditional docking is computationally prohibitive — docking a single molecule takes seconds to minutes, and billions of molecules would take years of compute time. Approaches to tackle this include hierarchical filtering (dock a subset, then expand around hits), GPU-accelerated docking, and ML surrogate models that approximate docking scores at a fraction of the computational cost. Recursion and other companies have invested heavily in infrastructure for ultra-large-scale virtual screening.
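The compute-time claim can be checked with back-of-the-envelope arithmetic. The sketch below assumes a library of one billion molecules, one second of docking per molecule, and perfect parallel scaling. All three are illustrative assumptions, and the function name is hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def docking_years(n_molecules, seconds_per_molecule, n_cores=1, docked_fraction=1.0):
    """Wall-clock years to dock a fraction of a library spread evenly over n_cores."""
    total_seconds = n_molecules * docked_fraction * seconds_per_molecule
    return total_seconds / n_cores / SECONDS_PER_YEAR


library_size = 1_000_000_000  # assumed library size

# One core, every molecule docked: about 31.7 years.
single_core = docking_years(library_size, 1.0)

# 10,000 cores with ideal scaling: a little over one day.
cluster = docking_years(library_size, 1.0, n_cores=10_000)

# Same cluster, but a cheap prefilter passes only 1% of the library to docking.
triaged = docking_years(library_size, 1.0, n_cores=10_000, docked_fraction=0.01)

print(f"single core: {single_core:.1f} years")
print(f"10k cores:   {cluster * 365:.2f} days")
print(f"10k cores, 1% docked: {triaged * 365 * 24 * 60:.1f} minutes")
```

At the slower end of the range (minutes per molecule) the single-core figure grows by roughly two orders of magnitude. This is why hierarchical filtering and surrogate models aim to shrink the fraction of the library that is ever fully docked.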
Practical Considerations and Failure Modes
Virtual screening is a funnel, not an oracle. A typical campaign might virtually screen millions of molecules, select the top few hundred for experimental testing, and find that a small percentage of those are confirmed active — a "hit rate" that, while low in absolute terms, is far more efficient than random screening. Common failure modes include poor protein structure quality (especially for flexible or disordered binding sites), inadequate treatment of water molecules in the binding site, and the tendency of scoring functions to favor larger molecules that make more contacts regardless of actual binding affinity. Experienced practitioners combine virtual screening with medicinal chemistry intuition and use multiple orthogonal methods to reduce false positives.
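One common way to counter the size bias is to normalize the score by molecular size, for example with ligand efficiency (predicted binding energy divided by the number of heavy atoms). The sketch below uses invented scores and atom counts for two hypothetical compounds and treats the docking score as if it were a binding energy in kcal/mol, which is itself an approximation.

```python
def ligand_efficiency(score_kcal_per_mol, heavy_atoms):
    """Ligand efficiency: negated score per heavy atom (higher is better)."""
    return -score_kcal_per_mol / heavy_atoms


# Hypothetical candidates: (name, docking score in kcal/mol, heavy-atom count).
candidates = [
    ("large_cmpd", -10.0, 40),
    ("small_cmpd", -7.5, 20),
]

# Raw score favors the larger molecule (more negative is better).
by_raw_score = sorted(candidates, key=lambda c: c[1])

# Per-atom efficiency favors the smaller molecule.
by_efficiency = sorted(
    candidates, key=lambda c: ligand_efficiency(c[1], c[2]), reverse=True
)

print([name for name, _, _ in by_raw_score])
print([name for name, _, _ in by_efficiency])
```

The two rankings disagree, which is the useful signal: a compound that ranks well only on raw score may owe its position to size rather than to specific, high-quality interactions.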