## Introduction

Screening chemical libraries using biophysical assays has long been the dominant approach to discover new chemotypes for chemical biology and drug discovery. High-throughput screening (HTS) of libraries of 500,000 to 3 million molecules has been used since the 1990s1, and multiple drugs have had their origins in this technique2. While the libraries physically screened in HTS were an enormous expansion on those used by classical, pre-molecular pharmacology3, they nevertheless represent only a tiny fraction of possible ‘drug-like’ molecules4. DNA-encoded libraries5, where molecules are synthesized on DNA that encodes their chemistry, begin to address this problem by offering investigators libraries of 108 molecules, sometimes more, in a single, highly compact format; and multiple such libraries can be used in a single campaign. However, as DNA-encoded libraries are restricted to reactions on DNA, reaction chemistries are limited to aqueous solutions, thereby limiting the type of chemical reactions and subsequent chemical libraries available with this technology6.

Computational approaches using virtual libraries are an attractive way to explore a much larger chemical space7. Large numbers of molecules—certainly into the tens of billions, and likely many more—may be enumerated in a virtual library. Naturally, very few of these compounds can ever be actually synthesized because of time, cost and storage limitations, but one can imagine a computational method to prioritize those that should be pursued. In practice, this idea has had two limitations that have prevented wide-scale adoption: the virtual libraries have rarely been carefully curated for true synthetic accessibility8, and there were well-founded concerns that computational methods, such as molecular docking, were not accurate enough to prioritize true hits within this large space9.

In the last several years, however, two advances have at least partly addressed these problems:

First, several vendors and academic laboratories have introduced ‘make-on-demand’ libraries based on relatively simple two- or three-component reactions where the final product is readily purified in high yields10. At Enamine, a pioneer in this area, >140 reactions may be used to synthesize products from among >120,000 distinct and often highly stereogenic building blocks, leading to a remarkably diverse and, critically, pragmatically accessible library of currently over 29 billion molecules11.

Second, structure-based molecular docking, for all of its problems, has proven able to prioritize among these ultra-large libraries, if not at the tens of billion molecule level, then at the 0.1–1.4 billion molecule level12,13,14, finding unusually potent and selective molecules against several unrelated targets (Table 1). Indeed, simulations and proof-of-concept experiments suggest that, at least for now, as the libraries get bigger, docking results and experimental molecular efficacies improve12.

If docking ultra-large libraries brings new opportunities, it also brings new challenges. Docking tests the fit of each library molecule in a protein binding site in a process that often involves sampling hundreds-of-thousands to millions of possible configurations. Each molecule is scored for fit using one of several different scoring functions15,16,17,18. To be feasible for a billion-molecule library on moderately sized computer clusters (e.g., 500–1,000 cores), this calculation must consume not much more than 1 s/molecule/core (1 ms/configuration).

This need for speed means that the calculation cannot afford the level of detail and number of interaction terms that would be necessary to achieve chemical accuracy. For instance, docking typically undersamples conformational states, ignores important terms (e.g., ligand strain) and approximates terms that it does include (e.g., fixed potentials)19,20.

Owing to these approximations and neglected terms, docking energies have known errors, and the method cannot even reliably rank order molecules from a large library screen21,22. What it can hope to do, however, is separate a tiny fraction of plausible ligands from the much larger number of library molecules that are unlikely to bind a target. This level of prioritization is enhanced with the careful implementation of best-practice guidelines and controls. It is the goal of this essay to provide investigators with such best practices and control calculations for ultra-large library docking, though they equally apply to modest-sized library docking (Table 1). These will not ensure the success of a prospective docking campaign—the only true test of the method—but they may eliminate some of the more common reasons for failure.

We begin by describing protocols and controls that can be used across docking programs and that are general to the field (Fig. 1). There are by now multiple widely used and effective docking programs23,24,25,26,27,28,29,30,31, employing different strategies for sampling ligand orientations and ligand conformations in the protein site, for handling protein flexibility, and for scoring ligand fit once they have been docked. Notwithstanding these differences, there are strategies and controls that may be used across docking programs, including how to prepare protein sites for docking calculations, benchmarking controls to investigate whether one can identify known ligands from among a large library, and controls to investigate whether one’s calculations contain biases towards particular types of interactions. Since the true success of a virtual screen is the experimental confirmation of docking hits, we also propose a set of control assays to validate initial in vitro results. We then turn to protocols that are specific to the docking program we use in our own laboratory, DOCK3.7—these necessarily get into fine details, and will be of most interest to investigators wanting to use this particular program.

### Any structure-based campaign begins with a suitable target site

The most promising starting point for a virtual screening campaign is typically a high-resolution ligand-bound structure. Ligand-bound (holo) structures usually outperform ligand-free (apo) structures as the geometries of the binding pocket are better defined in the bound state than in the unbound state32,33. If there is no available holo structure, tools such as SphGen34, SiteMap35 and FTMap36 can be used to identify potential ligand binding sites.

Generally, small, enclosed binding pockets that well complement a ligand perform better than large, flat and solvent-exposed binding sites typical of protein–peptide or protein–protein interactions. For instance, neurotransmitter G-protein-coupled receptor (GPCR) orthosteric sites, such as the β2 adrenergic, D4 dopamine, histamine H1 and A2a adenosine receptors37,38,39,40 typically have higher hit rates and more potent docking hits than do peptide receptors like the CXCR4 receptor41, and these often perform better than the more open sites typical of soluble enzymes like β-lactamase12,42. In most cases, targeting protein–protein interaction surfaces, outside of a few that are well defined43, often leads to disappointing results.

### Modifying the high-resolution protein structure

It is not always a good idea to use the structure exactly as it was found in the database.

• Dealing with mutations. For stability, crystallization and other biochemical reasons, high-resolution protein structures are sometimes determined in a mutant form; such mutations should be reverted to the wild type especially if they are within the ligand site to be targeted. Missing side chains and loops in the experimental structures should be added as well if they are close to the binding site, while those that are weakly defined by the experimental observables (e.g., low occupancy, high displacement parameters (B values), poor electron density) should be examined critically

• What about water molecules? When the resolution permits, water molecules can also be included in the target preparation, often treating them as nondisplaceable parts of the protein structure. Typically, water molecules enclosed in the targeted binding pocket or involved in interactions between the co-crystallized ligand should be considered as they may determine side-chain conformations or offer additional hydrogen-bonding sites. Some docking programs will allow water to be switched on or off during the docking44 or include other implicit solvent terms45

• Buffer components. Buffer components such as PEG and salts are likely specific to the crystallization conditions, and should be removed

• Cofactors. Cofactors like heme or metal ions should be considered if they are involved in ligand recognition

• Hydrogen atoms. Due to the resolution limits of most experimentally determined structures, hydrogen atoms are often unresolved. The position of many hydrogen atoms can be readily modeled according to holonomic constraints (e.g., backbone amide hydrogen atoms). For those that are not, such as hydroxyl protons on serine and tyrosine residues, imidazole hydrogens on histidine, and the adjustment of frequently erroneous terminal amide groups for glutamine and asparagine residues46, programs like Reduce47 (default in DOCK3.7), Maestro (Schrödinger)48, PropKa49 or Chimera50 can be used to protonate the target of interest. While these automated protocols usually produce reasonable protonation states, special care should be taken for the residues within the ligand binding pocket or residues that form a catalytic/enzymatic site. Lastly, although less frequently encountered, glutamic and aspartic acids can adopt protonated forms under specific circumstances51

• In summary, the protonation of the protein structure is critical for the success of docking to more accurately depict the Van der Waals (VDW) surface and dipole moments of the binding pocket

#### Homology modeling

When no experimental structure has been determined for the target protein, structural models can be generated if a template structure with high sequence identity is known. Common programs used for homology modeling include Modeller52, Rosetta53, ICM27 and I-Tasser54. These two principles can improve the chances of success:

• Typically, the higher the sequence identity between the target and the template, the better the accuracy of the model55. Particular focus should be given to identity within the target binding pocket; if there is a choice, choose the template that has the highest identity in the binding pocket

• Incorporation of a ligand during the modeling process or ligand-steered homology modeling approaches56,57 will help prevent the pocket from collapsing inward, and will better orient the side chains of binding residues32,58,59

When it is unknown how a ligand binds within the pocket, orthogonal experimental data can guide the modeling such as iterating between docking and modeling60,61. In the case of MRGPRX262 and GPR6863, for instance, the authors predicted multiple binding poses of a known active ligand and used mutagenesis and binding assays to test these predictions. A binding mode and receptor structure were identified that was consistent with the mutagenesis data and used for subsequent preparation in the prospective screen. Despite many difficulties, homology models have been successful in identifying novel ligands from prospective docking campaigns41,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76; though it is also true that, given the choice, most investigators will prefer to use a well-determined experimental structure.

#### Control calculations

Docking undersamples ligand–protein configurations and conformations, and its scoring of these configurations for fit remains highly approximate. Unlike methods like quantum mechanics, or certain lattice calculations in statistical mechanics, docking has surrendered ‘ground truth’ to be able to pragmatically search among a large and growing chemical space. Accordingly, control calculations are critical to the success of a docking campaign. As with experimental controls, they do not ensure prospective success, but they do guard against obvious sources of failure, and can help one understand where things have gone wrong if they do. Through a key control, we assess whether the prepared binding pocket and docking parameters can prioritize known ligands over presumed inactive molecules. In an optimized binding pocket, these known actives should rank higher against a background of decoy molecules in a retrospective screen, and reasonable poses should be predicted.

As it is more likely to know true actives than true inactives, it is common practice to use property-matched decoys77, which are compounds that have similar physical properties as the actives, but unrelated topologies, and so are presumed inactive. The DUDE-Z server (tldr.docking.org) was built specifically to generate decoys for a given list of ligands by matching the following physical properties: molecular weight, hydrophobicity (LogP), charge, the number of rotatable bonds, and the number of hydrogen bond donors and acceptors78.

The performance of the parameterized binding pocket to enrich known ligands over decoys can be evaluated by receiver-operator characteristic (ROC) curves, quantifying the true positive rate as a function of false positive rate (Fig. 2)17,77,79,80,81. The area under the curve (AUC) is a well-regarded metric to monitor the performance of a virtual screen by a single number17 (Fig. 2a). The log transformation of the false positive rate enhances the effect of early enrichment for true positives17 (Fig. 2b). This is important because, in a docking campaign with hundreds of millions of molecules, only compounds ranked within the top 0.1% are often closely evaluated (see ranks in Table 1). For example, if a retrospective docking challenge shows that known actives are only identified starting around the tenth percentile, novel actives may be missed in a prospective screen. In this setting, higher LogAUC values correspond to better discrimination between actives and inactives and provide a sanity check on the ability of the docking parameters to identify actives.

A second criterion is the pose fidelity of the docked ligands to their experimental structures. The validity of predicted binding poses can be assessed qualitatively by visual inspection of reported key interactions between protein and ligand or, in the best case, quantitatively by calculating root mean square deviations between predicted and experimentally determined poses82. During pocket modeling and docking parameter optimization, one will often insist that the retrospective controls lead to both high LogAUC values and good pose fidelity; often there will be some trade-off between the two. While calibrating the scoring functions, it is also important to monitor the contributions of each energy term to the total score and ensure they match the properties of the binding pocket. If one term dominates, the scoring may have been overoptimized to that term, while other protein–ligand interactions are underweighted, leading to dominance by a certain type of molecule. For instance, if the docking score of a polar solvent-exposed pocket is inappropriately dominated by VDW energies, large molecules may score high due to nonspecific surface contacts with the target protein.

In addition to property-matched decoys, other chemical matter can be used to evaluate different aspects of the docking model (Fig. 3). A test set including molecules with extreme physical properties (Extrema set), such as a wide variety of different net charges, can be screened to measure whether the docking model succeeds in prioritizing molecules with net charges corresponding to known actives78. If there is a difference in the charge of top-ranked compounds from the extrema set and the charges of the known ligands, the scoring may be biased. In another useful control experiment, a small fraction of the purchasable chemical space (e.g., readily available ‘in-stock’ compounds from multiple vendors) representing the characteristics of the ultra-large make-on-demand screening library can be docked against the protein model. This control serves two purposes: to test whether the positive control molecules remain among the highest scored compounds and to examine if intriguing novel compounds rise to the top of the rank-ordered docking list. It may even be fruitful to purchase and experimentally test a few promising and structurally diverse in-stock compounds as it could inform the docker if the binding pocket model is likely to find hits in the ultra-large library screen. If top ranked compounds do not form expected interactions, further optimization may be beneficial.

Another control, if available, are true inactives from a previous discovery campaign. These can be used as a background against known actives to provide a ‘real-world’ benchmark of the performance of the system. Lastly, for a protein that has little or no known chemical matter (i.e., reported ligands), enrichment calculations with known actives against a background of property-matched decoys may be impossible. Here, the docking parameters can be calibrated by docking ‘Extrema’, which challenges the docking with extremes of physical properties, and ‘in-stock’ compound sets, which probe how the docking will perform on a representative subset of the library78,83. It remains true that, without known ligands as positive controls, one is at a substantial disadvantage at setting up docking campaigns, increasing the risk of failure.

#### Prospective screen

Once the docking model is calibrated, large libraries of molecules can be virtually screened against the target protein. For this virtual screen, it makes sense to focus on compounds that are readily available for testing. The ZINC20 database (http://zinc20.docking.org/) enumerates over 14 billion commercially available chemical products, of which ~700 million are available with calculated 3D conformer libraries ready for docking. Most of the enumerated compounds belong to the make-on-demand libraries of Enamine and WuXi. Further, ZINC20 allows one to preselect subsets of molecules for docking, reducing computation time. For instance, ZINC20 allows users to download ready-to-dock subsets of molecules within user-defined ranges of molecular weight (MWT), LogP and net charge, as well as predefined sets such as fragments (MWT ≤ 250 amu) or lead-like molecules (250 ≤ MWT < 350 amu; LogP ≤ 3.5). The result of a prospective screen is a list of molecules rank-ordered by docking score.

#### Hit-picking

A well-controlled docking calculation can concentrate likely ligands among the top-ranked molecules. But even if it was able to do so among the top 0.1% of the ranked library, in a screen of 1 billion molecules this would still leave 1 million molecules to consider, and with the errors inherent in docking, many of these will be false positives. Accordingly, we rarely pick the top N-ranked compounds by docking to test experimentally, but rather will use additional filters to identify promising hits within the top scoring 300,000–1,000,000 molecules. These filters can catch problematic features missed by the primary docking function, ensure dissimilarity to known ligands and promote diversity among the prioritized compounds.

Compounds may be filtered for both positive and negative features84 (Table 2). For instance, one may insist that a docked orientation has favored interactions with key residues. Conversely, molecules with strained conformations should be discarded. Molecules with unsatisfied, buried hydrogen-bond donors and acceptors may also be deprioritized. Compounds with metabolic liabilities85,86 or that closely resemble colloidal aggregators87 can also be filtered out, despite otherwise favorable scores. Further, as closely related compounds will likely dock in similar poses with similar score, we typically cluster compounds by 2D structure similarity after all other filters have been used and only select the best scoring cluster representative for testing. In such a large dataset, clustering can be computationally expensive, so reduced scaffold clustering such as Bemis–Murcko88 analysis is useful to efficiently parse the compounds. In principle, many of these filters could be included directly in the scoring functions, but balancing them against other scoring terms can demand extensive optimization and will likely increase the compute time, which will become a hindrance for ultra-large library docking. Lastly, visual examination of the docked poses has been useful for selection of compounds to purchase. Following the criteria in Table 2, we typically inspect up to 5,000 compounds visually after applying automated filtering and clustering steps.

#### Experiments to test docking hits

The success of a docking campaign is ultimately measured by its ability to reveal novel chemotypes that can be shown to experimentally bind to the target, typically in binding or functional assays (Fig. 4). However, common artifacts should be controlled for: chemotypes likely to interfere with specific assays89 (e.g., the controversial90 pan assay interference chemotypes), covalent adducts, redox cyclers91 and aggregators87,92.

Among the most common of these mechanisms is colloidal aggregation, which can account for >90% of all primary hits89,93. This aggregates sequester proteins with little selectivity, partially denaturing and inhibiting them32,92,94,95,96, occasionally even activating them97,98, a common problem both in HTS and also in docking screens92. Aggregators tend to have high LogP values and limited aqueous solubility, so we prefer to dock and test molecules with LogP ≤ 3.5. Chemical stability or reactivity can also contribute to experimental artifacts, and reactive scaffolds should be avoided. The ZINC20 database allows users to select screening libraries in the lead-like chemical space (i.e., low LogP) and exclude reactive compounds9,99. Unless controlled for in the experimental setting (Box 1), these artifacts can lead to false positives.

Once convinced that one’s hits are not artifactual, more detailed testing on-target is warranted. For all targets, this involves determining concentration response curves, typically with 7–12 points at half-log intervals, with the transition being sampled over at least two log orders of concentration (Fig. 4). For enzymes, determining a true Ki will illuminate mechanism, though this can be laborious and it might only make sense to do this for characteristic lead molecules. For receptors such as GPCRs and ion channels, functional assays are typically performed to determine if the new ligand is acting as an agonist, an antagonist or an allosteric modulator. For GPCRs, initial screening assays may differ from secondary confirmatory assays, and showing that a molecule is active in more than one assay is often quite useful.

Before initial hits are advanced for optimization, it is important for make-on-demand libraries to confirm the identity of the hit compound. Limitations of virtual library enumeration, chemical synthesis and downstream purification can result in mismatches between the in silico predicted compound and the in vitro tested compound. For example, in the screen against the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) macrodomain protein, a number of purine derivatives with an N9 substitution were requested but N3-substituted compounds were synthesized instead83. Accordingly, it is worth fully characterizing promising compounds before costly lead optimization is undertaken; we ourselves have made the mistake of designing analogs based on an in silico starting scaffold that was different than the true hit delivered from the chemists. At a minimum, the identity and purity of the compounds should be determined. High-resolution mass spectrometry or quantitative 1D proton nuclear magnetic resonance spectroscopy can be used to detect gross variation between the tested sample and the expected molecule; these data can be obtained from most vendors, or the experiments could be performed independently.

The most direct experiment to test a docking pose prediction is the determination of an experimental protein structure in complex with the docking hit (Fig. 4). Such structures illuminate crucial details of ligand recognition, including adjustments of binding site residues or the docking hit in the binding pocket12,100,101. The key question to answer with high-resolution structures is whether docking worked for the right reasons, i.e., does the predicted binding mode agree with an experimental structure? Previous studies suggest that hits from virtual screening generally compare fairly well with experimental high-resolution structures; i.e., key anchor points are predicted correctly12,83,102,103. However, as docking is typically performed against rigid protein structures, conformational changes, especially in flexible loops, will complicate pose prediction104. Water-mediated protein–ligand interactions are generally difficult to explore de novo45,105, though experimentally determined and structurally conserved water molecules can be included in virtual screens100. Lastly, there are few cases where docking hits showing on-target activity revealed binding positions at unexpected and nontargeted subsites. For example, the β2 adrenergic receptor allosteric modulator AS408 was predicted to bind to an extracellular subpocket but crystallographically determined to bind to the membrane-facing surface of the receptor106. In contrast, the pose of an inverse agonist also identified at the β2 adrenergic receptor from in silico docking39 was confirmed by X-ray crystallography107, demonstrating the variability in outcomes even at the same target.

If the protein target can be readily expressed and purified in milligram quantities, crystallography and cryo electron microscopy can guide hit discovery early in the campaign. In campaigns against targets where high-resolution structures are more challenging to obtain (e.g., for transmembrane receptors), an experimental protein–ligand complex might only be achievable for the most potent hit compound. Nonetheless, it is usually worth the effort to confirm the predicted binding site and ligand pose, or to identify unexpected interactions between the discovered hit and the target106.

#### Next steps: selecting analogs for hit to lead optimization

In the fortunate event that the virtual screen was successful and docking hits were confirmed by different experiments, the newly obtained scaffolds are blueprints for exploring structure–activity relationships and lead optimization.

A concern about molecules from make-on-demand libraries is that they will initially be delivered as racemic or stereomeric mixtures since the production of pure stereoisomers requires more sophisticated synthesis routes. If confirmed hits were observed from stereomeric mixtures, the measured potency of the mixture may be artifactually lower if only one stereoisomer is active at the target (e.g., the concentration of the active stereoisomer is less than the total mixture concentration). Purified stereoisomers can typically be purchased from the make-on-demand compound supplier or separated in-house.

Hit optimization can be performed in several ways. Synthetic chemistry groups may obtain synthesis routes of the parent compound from the supplier allowing the generation of medicinal chemistry-inspired analog series108. Alternatively, chemoinformatic tools can be employed to virtually search the purchasable chemical space for structurally related scaffolds or identifying molecules with a common substructure to the parent hit compound (analog-by-catalog). Searching for similar molecules within the Enamine REAL space can be conducted using the SmallWorld (sw.docking.org) search engine99. Molecules with an identical substructure as the parent compound can be found using Arthor (arthor.docking.org)99. Additionally, the supplier of the parent compound might be able to provide a collection of molecules resembling the hit scaffold. While we cannot provide clear guidance or a best-practice protocol for hit-to-lead optimization, analog-by-catalog of docking hits obtained for the melatonin 1 (MT1) receptor and AmpC led to compounds with improved potencies compared with the initial parent compounds12,13. Currently, we recommend subtle changes to the starting compound or modifications to test particular interactions as suggested from the docked pose.

It is our hope that the above guidelines will be useful for outlining a docking campaign from start to finish. As shown, the prospective docking step is actually only one small component of the overall pipeline (Fig. 1). Control calculations using retrospective datasets and docking setup optimization make up the bulk of the process. Hit picking from the prospective screen requires careful perusal of the data, and experimental design of the confirmatory assays is critical for defining success of the campaign. We ourselves find that these guidelines insulate against the more common sources of failure in large library docking campaigns. Lastly, we want to mention that docking campaigns against protein targets without experimental structure, i.e., requiring homology modelling, or without known active chemical matter for retrospective control calculations are particularly risky and should not be initiated naively.

This concludes our general guidelines for large library docking. In the last section, we turn to a detailed protocol to set up, optimize and prospectively screen a target of interest using DOCK3.7, though any docking program can use the controls presented.

While the protocol should be general for most proteins, we provide example data from a recent campaign against the MT1 receptor13, a target particularly well suited for docking: a crystal structure had been determined; the orthosteric pocket is compact and almost completely encloses the ligands, simplifying the biophysics; many such ligands exist for retrospective calculations and for optimization; and in vitro assays to test docking hits were well established. We note that the MT1 receptor is ideal for large library docking, and so the achieved hit rate, and the hit potencies, were unusually high. Still, the docking optimization strategies below have been useful against a wide spectrum of targets, including the Nsp3 macrodomain of SARS-CoV-283 and highly solvated pockets as in β-lactamase12, and were developed against the spectrum of targets in the DUD-E benchmark77, ranging from ion channels to kinases to soluble enzymes. It should be clear that the optimization strategies sketched remain rooted in retrospective controls, and so will work best against targets with precedented ligands, and against targets with well-formed and readily liganded binding sites.

### Docking campaigns with DOCK3.7 and ZINC20

In DOCK3.7, ligands are placed in the target pocket by mapping ligand atoms onto predefined hot-spots, so-called matching spheres. Matching spheres are generated from the coordinates of heavy atoms from an input bound ligand structure, if available, and supplemented with coordinates based on the negative image of the binding site generated from the program SphGen34. Ligand rigid fragment (e.g., rings) are mapped onto matching spheres using a bipartite graph algorithm79,109. Different ligand conformations and orientations are sampled in the binding pocket using precalculated 3D conformer libraries (flexibases, db2 files)9,99. During conformer library building, each molecule is divided into its rigid fragments, and different conformations of rigid fragment substituents are generated.

Ligand poses are evaluated using a physics-based scoring function (Escore) combining VDW, electrostatic (ES) and ligand desolvation (lig_desol) energy terms (Equation 1):

$$E_{score} = E_{VDW} + E_{ES} + E_{lig\_desol}$$
(1)

In order to allow rapid scoring of new poses (typically 1,000 poses per molecule per second), contributions of the target protein pocket are mapped onto pregenerated scoring grids. The VDW surface of the binding pocket is converted into a scoring grid by ChemGrid15. Electrostatic potentials within the binding pocket are estimated by numerically solving the Poisson–Boltzmann equation using QNIFFT16,110. Context-dependent desolvation energy scoring grids, both polar and apolar, are generated by the Solvmap program17. Thereby, ligand desolvation energies are computed as the sum of the atomic desolvation of each ligand atom scaled by the extent to which it is buried within the binding site. Atomic desolvation energies for ligand atoms are calculated during ligand building using AMSOL111 and are included in the ligand conformer library. Both, the electrostatic and ligand desolvation scoring grids depend on the dielectric boundary between the low dielectric protein (εr = 2) and high dielectric solvent (εr = 80) environments. Consequently, these scoring grids can be fine-tuned by modulating the protein–solvent dielectric interface (see below Steps 41–44).

The following protocol (Fig. 5) prepares a starting structure (Steps 1–4), generates the scoring grids and matching spheres (Steps 5–10), and collects a set of control ligands for model evaluation (Steps 11–33). Control ligands are docked retrospectively (Steps 34–58) to determine the model’s ability to identify actives from a pool of inactives. Optimization of sampling (Steps 59–64) and scoring (Steps 65–77) is evaluated with the same controls. Model biases are evaluated using an Extrema control set (Steps 78–88) to examine for charge preferences in the scoring grids and with a small-scale in-stock screen (Steps 89–97) to ensure a computationally expensive large-scale prospective screen (Steps 98–103) is likely to yield interesting chemical matter. Hit picking (Steps 104–108), though critical to success of these prospective experiments, is beyond the scope of the protocol as it is highly user- and target-specific, and we suggest referring to Table 2 for insight. Additionally, we caution against regarding hits from a screen as true positives until common artifacts are ruled out (Box 1). All controls in silico and in vitro are applicable to any docking program or computer-aided drug discovery campaign. The details of grid preparation and modification are DOCK3.7-specific, but the principles are transferrable to other software.

## Materials

### Software

• DOCK3.7.5: apply for a license from http://dock.docking.org/Online_Licensing/dock_license_application.html. Licenses are free for nonprofit academic research. Once your application is approved, you will be directed to a download for the source code. The code should run without issue on most Linux environments, but can be optimized by recompiling with gfortran if needed. Questions related to installation can be addressed to dock_fans@googlegroups.com

• Python: the current code uses python2.7, which will also need to be installed. Additional python dependencies that are required for running scripts can be found in the file $DOCKBASE/install/environ/python/requirements.txt • AMSOL: free academic licenses can be obtained from https://comp.chem.umn.edu/amsol/ • (Optional) 3D ligand building software: if interested in 3D ligand building in-house (not necessary for this protocol), licenses will also need to be obtained for ChemAxon (https://chemaxon.com/), OpenEye Omega (https://www.eyesopen.com/omega) and Corina (https://www.mn-am.com/products/corina). We note that, for many campaigns, 3D molecular structures with all necessary physical properties may be downloaded directly from ZINC20. This tutorial makes use of a webserver for 3D ligand building that is suitable for small control sets. Please apply for an account at https://tldr.docking.org/, which is free of charge • Chimera: this application is recommended for grid visualization with the VolumeViewer feature (see Troubleshooting). The ViewDock feature, also within Chimera, is useful to examine the docking results ### Equipment • Hardware: initial tests of this tutorial (grid preparation and small retrospective screens) can be performed on a single workstation, but more intensive docking will need access to a high-performance computing cluster (we regularly use 1,000 cores) • Queuing system: DOCK3.7 comes with submission scripts for SGE, Slurm and PBS job schedulers. If using a different scheduler, scripts may need to be adapted for your specific computing cluster • Protein structure: structures can be downloaded from www.rcsb.org or generated by the user in the case of homology models • Example data: an example set of files used in this protocol including ligand and decoy sets, default docking grids and optimized docking grids can be downloaded from http://files.docking.org/dock/mt1_protocol.tar.gz. The example dataset uses the MT1 structure (PDB: 6ME3) co-crystallized with 2-phenylmelatonin ### Equipment setup #### Define the pathway to critical software The environment will need to be defined in order to run the following commands correctly. The two most important paths that need to be defined are to DOCKBASE and python. We provide an environment script in the example dataset that will need to be modified for a user’s settings. Always run the following command with updated paths specific to your file directory before working with DOCK3.7: source env.csh ## Procedure ### Section 1: set up binding pocket for docking calculations Timing 30 min to 1 h 1. 1 Download the structure from the PDB or use in-house structures. Preference should be given to structures of high resolution and with a ligand bound in the site that will be docked into. Apo structures are useable, but tend to yield poorer results due to unconstrained binding site geometries32. 2. 2 Visualize the binding pocket in PyMOL or Chimera to decide on the relevance of protein cofactors. • Delete components from the structure that do not contribute to ligand binding such as lipids, water molecules and buffer components • Fusion proteins engineered for protein stability that are near the binding pocket should be removed, and the resulting chain break should be either capped or the loop remodeled • Protein cofactors such as heme or ions should also be kept or discarded depending on their relevance to ligand binding in the pocket • Additionally, examine the residues in the binding pocket for incomplete sidechain, multiple rotamers or engineered mutations, and revert or rebuild as necessary • It is helpful to examine the pocket with the electron density if available for making these decisions. For the example MT1, we deleted the fusion protein, capped non-native termini and reverted two mutations in the binding pocket (6me3_cleaned.pdb) 3. 3 Save the rec.pdb file. This file will contain any component that will remain static during the docking such as structural waters and cofactors. 4. 4 Save the xtal-lig.pdb file. This file will contain the atoms of the bound ligand, which will be used to generate the matching spheres that guide ligand sampling in the pocket. 5. 5 Generate the scoring grids and matching spheres with blastermaster from these two inputs (rec.pdb and xtal-lig.pdb)$DOCKBASE/proteins/blastermaster/blastermaster.py --addhOptions=" -HIS -FLIPs " –v

Troubleshooting

6. 6

Inspect the output, in particular the rec.crg.pdb file located in the new working directory to ensure protonation of polar residues and side chain flips of glutamine and asparagine side chains look accurate. Examine histidine tautomers (HID, HIE and HIP) as well. If everything looks as it should, proceed to Step 10; otherwise, proceed to Step 7.

#### Generating a rec.crg.pdb file manually

1. 7

If the automatically protonated rec.crg.pdb file does not fit to the expected protonation state of key residues, the rec.crg.pdb file can be generated manually using various protein modeling software packages including Rosetta112, Chimera50 or Maestro113. This may include manually flipping side chain rotamers and setting the pH value for pKa calculation of charged residues. After the modeling step is completed, new rec.pdb, rec.crg.pdb and xtal-lig.pdb files need to be provided to blastermaster. To be compatible with the united atom AMBER force field, rec.crg.pdb should only contain polar hydrogen atoms, and explicit histidine tautomer names. rec.pdb and xtal-lig.pdb only need to contain heavy atom coordinates. In Step 8, we provide a script that produces the required files.

2. 8

Assuming the protein modeling software resulted in a fully protonated protein–ligand complex, protein atoms should be listed as ATOM records, while ligand atoms should be listed as HETATM records. Further, ensure that cofactors are given the correct heading in the PDB file (i.e., static metal ions should be given ATOM records if they are to be included in the scoring grids).

bash $DOCKBASE/proteins/protein_prep/prep.sh protonated_input.pdb$PWD

The outputs are new rec.pdb and xtal-lig.pdb files, as well as a rec.crg.pdb file inside a working directory. HID, HIE and HIP naming is generated automatically from the protonation state of the HIS residues and again should be checked for accuracy.

Note: this script requires a path to the Chimera binary file and may need to be modified by the user.

3. 9

Generate scoring grids and matching spheres from the new files (rec.pdb, xtal-lig.pdb, and working/rec.crg.pdb), and examine the outputs as in Step 6:

$DOCKBASE/proteins/blastermaster/blastermaster.py --addNOhydrogensflag –v #### Checking the files 1. 10 Check that all files are generated. At the end of a successful blastermaster run, the directory should contain an INDOCK file and two directories: working and dockfiles. The working directory contains all the intermediate files used for grid generation. The dockfiles directory contains the scoring grid files and matching spheres file. The INDOCK file contains all parameters to control the docking program, such as sampling options and location of input files, i.e., docking grids and 3D ligand conformer libraries (see the INDOCK Guide in Supplementary Information). In our experience, the automated grid generation with blastermaster.py successfully produces reliable docking parameters; however, nonstandard amino acids or particular atom types require additional information and adjustments of force field parameters. Instructions on how to check and adopt the grid generation are provided in the Troubleshooting section below and in the Blastermaster Guide (Supplementary Information). For the provided example files generated for MT1, the protein–ligand complex was modeled using the automated preparation pipeline in Maestro, Schrodinger48. ### Section 2: collect and build the control ligand set Timing Minutes to hours for ligand building, depending on the number of molecules 1. 11 Collect the known actives (positive controls) for retrospective analysis. For a given target, these can be found in the scientific literature, patent literature or public databases such as IUPHAR/BPS114, ChEMBL115 or ZINC9,99,116, or available in-house. #### Curate the actives list #### Critical While it may be possible to find dozens of actives, it is likely that many come from the same chemical series. For a rigorous control analysis, redundant (i.e., highly similar) compounds should be clustered and the most potent compound selected. 1. 12 Sort all knowns by potency. 2. 13 Cluster these based on 2D similarity using any preferred method. We suggest calculating clusters based on ECFP4 Tanimoto similarities with a cutoff of 0.35. 3. 14 Use the most potent compound from each cluster as the representative of that scaffold. It is best to refine these actives to a set that is representative of the chemical space intended for prospective screening (i.e., with limits on molecular weight) so that the retrospective analysis will match the prospective aims. 4. 15 Find out whether the known ligands of the target are neutral or charged as one criteria of the docking parameter calibration is the ability to score ligands with corresponding charges well. 5. 16 Save a final list of actives’ SMILES as ligands.smi. Typically, 10–30 diverse actives represent a good control set. For targets with less than this value, the controls are still useful but may not be as informative. For the example of MT1 we extracted a set of 28 agonist and antagonists from the IUPHAR/BPS database13. #### Build the 3D conformer library of the actives 1. 17 Go to the ‘build3d’ application on tldr.docking.org79. 2. 18 Add the ligands.smi file to the ‘Input’ section. 3. 19 Select ‘db2’ for the file type. 4. 20 Click ‘Go’ to submit. 5. 21 Download the build3d_results.tar.gz file, and move it to your work directory. 6. 22 Decompress the tar file: tar xvfz build3d_results.tar.gz 7. 23 The results are in a build3d_results/ directory with two subdirectories failed/ and finished/ indicating the build status of the compounds. In example data provided, all ligands were built to completion. 8. 24 To build a split database index file, or path file to the ligands, use the following command: ls -d$PWD/build3d_results/finished/*.db2.gz > actives.sdi

9. 25

To generate a list of actives’ IDs:

ls build3d_results/finished/*0.db2.gz | awk -F"/" '{print $NF}' | awk -F"_" '{print$1}' > actives.id

#### Curate and build the property-matched decoy set

1. 26

Go to the ‘dudez’ application on tldr.docking.org78.

2. 27

Submit the ligands.smi file to the input section.

3. 28

Click ‘Go’ to start the calculation. The ZINC database will be scanned for compounds that match the following six properties of the input ligands: molecular weight, LogP, charge, number of rotatable bonds, number of hydrogen bond donors, and number of hydrogen bond acceptors. Each compound will then be compared with the actives for 2D similarity, with similar compounds being discarded. A final set of 50 decoys per input active will be calculated.

4. 29

5. 30

Extract the files:

tar xvfz decoys_dudez.tar.gz

The files will live in a new directory new_decoys.

6. 31

To obtain the split database index file of the decoys:

ls -d $PWD/newdecoys/decoys/*.db2.gz > decoys.sdi 7. 32 To obtain a list of all the decoys’ IDs: awk ‘{print$2}’ newdecoys/decoys.smi > decoys.id

8. 33

Collect output files. At this point, four files should be generated: actives.sdi, actives.id, decoys.sdi and decoys.id.

### Section 3: run retrospective docking calculations to test the binding pocket parameters

Timing Minutes to hours depending on number of molecules and compute cores

1. 34

In the directory where blastermaster was run (i.e., contains the INDOCK file and dockfiles directory), copy over the ligand and decoys .sdi files and .id files.

2. 35

Combine the two .sdi files:

cat actives.sdi ligands.sdi » controls.sdi

3. 36

Check values in the INDOCK file

• Set the atom_maximum to 100. This value is a hard cutoff such that if a ligand has more than this number of heavy atoms it will not be docked. For retrospective calculations, we want all ligands to be docked and scored regardless of size, so we use a large value here

• Set the mol2_maximum_cutoff to 100. This value is a cutoff for saving 3D coordinates of docked poses. If a ligand scores worse (i.e., more positive) than this cutoff, the pose will not be saved. Setting a low value is useful in large-scale prospective screens to save on disk space and computation time, but for retrospective screens we want all information saved for analysis

4. 37

Set up the docking directory

$DOCKBASE/docking/setup/setup_db2_zinc15_file_number.py ./ controls controls.sdi 1 count This script will separate the controls.sdi file into one directory called controls0000. For this first calculation, the size of the sdi file is manageable on a single core, though splitting it will be faster (i.e., 10). For control calculations with 10,000–100,000 ligands and for large-scale prospective screens, the sdi file should be split into multiple directories and the jobs distributed over multiple cores. A dirlist file is generated that lists all of the split directories prepared for docking. 5. 38 Dock the control set. At this point, it is possible to submit the data to a computer cluster or do the calculation on a single computer. 1. (A) Docking without submitting to a cluster 1. (i) Move into the controls0000 directory. You will find INDOCK (pointing to the dockfiles/ directory) and a split_database_index file. These are all the inputs that the DOCK program needs to run the calculation. 2. (ii) Run the docking calculation:$DOCKBASE/docking/DOCK/bin/dock.csh

Troubleshooting

3. (iii)

The output files from the docking program are OUTDOCK, listing docking scores of successfully docked molecules, computational performance or potential error messages, and a zipped mol2 file (test.mol2.gz) containing the 3D poses of docked molecules.

4. (iv)

Move back into the directory that contains dirlist when done.

2. (B)

Docking with submitting to a cluster

1. (i)

In the directory that contains the dirlist (created in the previous step), run one of the following commands depending on cluster architecture:

• SGE: $DOCKBASE/docking/submit/submit.csh • Slurm:$DOCKBASE/docking/submit/submit_slurm.csh

• PBS: $DOCKBASE/docking/submit/submit_pbs.csh #### Critical step If you use another cluster scheduler, adopt these scripts to your needs. 6. 39 When the job has completed, the last line of the OUTDOCK file should contain an elapsed time. If this is not the last line of the OUTDOCK file, an error has occurred (see Troubleshooting) and the following script will not run. Troubleshooting 7. 40 Extract the scores of all docked compounds python$DOCKBASE/analysis/extract_all_blazing_fast.py ./ dirlist extract_all.txt 100

The arguments are the dirlist, the name of the file to be written (extract_all.txt) and the max energy to be kept (this value should match the mol2_maximum_cutoff value in the INDOCK file). The important output file for future analysis is the extract_all.sort.uniq.txt file, containing the rank-ordered list of docked compounds with only the highest score for each compound.

8. 41

Get the poses of the docked compounds

python $DOCKBASE/analysis/getposes_blazing_faster.py ./ extract_all.sort.uniq.txt 10000 poses.mol2 test.mol2.gz The arguments are the path where the docking is located (./), the name of the extract_all.sort.uniq.txt file, the number of poses to get (10000, i.e., set to larger than the number of compounds docked for retrospective calculations since we want to get all), the name of the output mol2 file (poses.mol2), and the name of the input mol2 files (test.mol2.gz), containing 3D coordinates of predicted poses from the docking calculation (located in the controls0000 directory). 9. 42 Get the poses of just the actives python$DOCKBASE/analysis/collect_mol2.py actives.id poses.mol2 actives.mol2

The arguments are the file containing active IDs, the pose file containing all of the compounds both actives and decoys, and the name of the output file to be written.

10. 43

Calculate the LogAUC early enrichment

python $DOCKBASE/analysis/enrich.py -i . -l actives.id -d decoys.id Inputs are the working directory (‘.’ if in the directory where the extract_all file is located) and the two files containing the IDs of actives and decoys. The output is a roc.txt and roc_own.txt file. Both files report an AUC and LogAUC and contain the grid points for plotting receiver operator curve (ROC) plots. The roc.txt file is calculated over all inputs in the ID files, and the roc_own.txt file is calculated only over the compounds that successfully docked. 11. 44 To generate a plot (roc_own.png) of the roc_own.txt file: python$DOCKBASE/analysis/plots.py -i . -l actives.id -d decoys.id

12. 45

Calculate the charge distribution by DOCK score

python $DOCKBASE/analysis/get_charges_from_poses.py poses.mol2 charges Inputs are the poses files and the name of the output file to be written. The output charges file is a list of DOCK scores and the charge of the ligand with that score. 13. 46 To generate a plot of total DOCK score by ligand charge (charge_distributions_vs_energy.png) run: python$DOCKBASE/analysis/plot_charge_distribution.py charges

14. 47

Plot the contribution of each score term in the energy function outlined in Equation 1 (energy_distributions.png):

python $DOCKBASE/analysis/plot_energy_distributions.py extract_all.sort.uniq.txt 15. 48 Collate the output files. The important outputs of these steps are: actives.mol2, roc_own.png, energy_distributions.png and charge_distributions_vs_energy.png. #### Evaluate the docked poses of the actives 1. 49 Open the rec.pdb and xtal-lig.pdb in Chimera, and prepare the visualization of the binding pocket to the user’s preference. 2. 50 In the Tools tab, select Structure/Binding Analysis and click ViewDock. 3. 51 In the menu, navigate to your directory and open actives.mol2. 4. 52 For file type, select any of the DOCK options. This will open a window (ViewDock) to navigate through all of the molecules in this file such as in Fig. 6a. 5. 53 Under the ‘Column’ tab, several details about the docked ligands can be listed, e.g., Total Energy, Electrostatic or Van der Waals terms. 6. 54 In the Tools tab, under Structure/Binding Analysis, ‘Find HBonds’ can be used to identify hydrogen bonds between the protein and ligand molecules. 7. 55 Evaluate the docked poses, asking the following questions: • Do the ligands occupy the part of the pocket as expected? • Do they make the types of interactions anticipated from what is known about the ligands and the pocket? • Are there aberrant or unsatisfied interactions? For example, in MT1, it is known from the crystal structure that ligands can form hydrogen bonds with Asn162 and Gln181. Issues with binding poses are usually a sampling problem, but scoring terms such as electrostatics can influence specific interactions • If many actives do not dock, is there a reasonable explanation (e.g., selected controls are larger than the pocket volume as has been observed when attempting to dock antagonists into agonist pockets)? Optimization of sampling is performed with a matching sphere scan as detailed in Steps 59–64 #### Evaluate the enrichment and scoring metrics 1. 56 Evaluate the docking parameters’ ability to discriminate actives from decoys. The roc_own.png file shows the plot of the rate of finding actives as a function of decoys. The AUC is indicative of this discriminatory power (Fig. 6b). A positive LogAUC value is a sign that actives are enriched over decoys, a value near 0 represents random selection, and a negative value demonstrates that the model prefers decoys over actives. This may be a result of poor poses, improper scoring or both. It is best to optimize poses before pushing forward on scoring discrimination. Even for good LogAUC values, it is important to evaluate the poses as in Step 55 as it is possible to get good scoring actives that do not dock in the correct pose. 2. 57 Examine the energy contributions of the docked poses. In the energy_distribution.png file (Fig. 6c), the total DOCK scores for the docked poses are broken down into the main components of the scoring function: electrostatics, VDW and polar ligand desolvation. Do the contributions of the various score terms match the properties of the binding pocket? In the MT1 pocket, the VDW interactions dominate because of its largely hydrophobic nature. For targets forming salt bridges to ligands, electrostatic terms should at least be balanced with VDW scores. If the balance does not match with what is expected for the pocket, the strength of the scoring terms can be modified in Steps 65–68. 3. 58 Evaluate the charge preference of the docking parameters. If the electrostatics term dominates in Step 57, the scoring function will likely prefer highly charged ligands. This can be measured in the charge_distributions_vs_energy.png plot (Fig. 6d), which shows the charge state of the top scoring molecules. DUDE-Z decoys should have charges matching the active ligands. However, in property-unmatched decoys such as the Extrema set78 (Steps 78–88) and in-stock set (Steps 89–97), it is important to ensure the docking is not biased toward charge extremes, which can suggest an overweighting of electrostatic terms; this can be addressed in Steps 65–68. ### Section 4: optimize poses by modifying matching spheres Timing Minutes to build; ≤1 h to dock 1. 59 Create a new directory matching_sphere_scan/ that contains the INDOCK file and working/ and dockfiles/ directory from the previous directory. 2. 60 Run the following script to create sets of matching spheres in which the matching spheres that do not map to atoms in the original xtal-lig.pdb file are perturbed as in Fig. 7a: (~1 min) python$DOCKBASE/proteins/optimization/scramble-matching-spheres.py -i $PWD/dockfiles -o$PWD -s 0.5 -n 50

The arguments are: -i, the path to the dockfiles/ directory; -o, the path to where new docking directories should be created; -s, the maximum distance to move a matching sphere (0.5 recommended); -n, the number of perturbed sphere combinations to create. For a single receptor target, we recommend 50–100 sets of sphere sets.

3. 61

Each new directory should contain an INDOCK file and dockfiles/ directory. Check that the matching_spheres.sph file in the dockfiles/ directory is unique to each new directory created.

4. 62

For each directory, copy over the controls.sdi file.

5. 63

Run the docking calculation and extract the poses as in Steps 37–48.

6. 64

Examine the new actives.mol2 files and LogAUC values from the different matching sphere sets. Ideally, a set of spheres will increase the LogAUC and improve the binding poses of the actives. For MT1, the LogAUC increased from ~2 to 5. While not a large change, there was improved placement of the flexible components of the active ligands, which suggests better sampling and pose identification in a prospective screen.

If the poses did not improve during this matching sphere scan, it may indicate that there is a problem with the binding pocket model, the ligand set, the sampling of ligand conformations, or the placement of the crystallographic matching spheres. Examining each of these can help improve the binding poses in this first control calculation. Alternatively, the scoring function needs to be optimized to score the expected pose better (see Steps 65–68).

Troubleshooting

### Section 5: optimize ligand scoring by modulating the protein–water dielectric interface (adding a layer of dielectric boundary spheres)

Timing 15 min on cluster; 20–30 min per radius on local machine

1. 65

Create a new directory with the INDOCK file and working/ and dockfiles/ directories from the best matching sphere set (or the default settings if no matching sphere scan was run).

2. 66

Generate the boundary modifying spheres (Fig. 7b)

$DOCKBASE/proteins/optimization/boundary-sphere-scan.sh -b$PWD -v

This version of the script will run on an SGE cluster. Modify the submission script and scheduler as needed for different cluster architecture. To run on a local machine, use the script:

$DOCKBASE/proteins/optimization/boundary-sphere-scan_no_cluster.sh -b$PWD -v

3. 67

Combine the different radii boundary spheres (5 min):

python $DOCKBASE/proteins/optimization/combine-grids.py -b$PWD

This script generates a set of combinations of boundary modifying spheres across both ligand desolvation and low dielectric spheres. These directories have everything needed to run individual retrospective calculations except for the controls.sdi file that will need to be copied into each one. Repeat the retrospective calculations from Steps 37–48.

4. 68

Following retrospective calculations across the combinations of dielectric boundary sphere radii, select a combination that ideally improves LogAUC (Fig. 6b) and helps to balance the score distributions in the energy_distribution.png plot (Fig. 6c). If there are a number of equally performing combinations, subsequent control calculations can help identify the best set for your system (see Steps 78–97). In the case of MT1, we chose electrostatic spheres of radius 1.9 and desolvation spheres of radius 0.1 as they increased LogAUC and enhanced the charge distribution of compounds to favor neutrals to match known MT1 ligands (Fig. 6d).

### Section 6: (optional): polarize (or depolarize) residues to effect electrostatics

Timing 15 min

#### Critical

If after these scans a particular interaction is still missed or erroneously captured, it may be necessary to modify the partial charges of a particular residue. For example, in the case of MT1, the docking scores are largely dominated by VDW interactions (Fig. 6c). However, two residues Gln181 and Asn162 form hydrogen bonds with the known ligands. For these specific interactions, a global modification to the dielectric boundary (Steps 65–68) may not be sufficient and, instead, a local modification to partial charges as outlined in Steps 69–77 may enhance favorable scores for these interactions. For targets where this is not necessary, proceed to Step 78.

1. 69

Make a new directory for polarizing residues. Copy over rec.pdb, xtal-lig.pdb, working/rec.crg.pdb, working/amb.crg.oxt and working/prot.table.ambcrg.ambH from the previous docking setup.

2. 70

In the rec.crg.pdb file, rename the residues to be polarized with a unique three-letter amino acid code. For example, in the MT1 receptor, we chose to polarize GLN181 and ASN162 to enhance the polar interactions between actives and these hydrogen bonding residues. Accordingly, we renamed GLN181 as GLD and ASN162 as ASM (see Fig. 8).

sed -i 's/GLN A 181/GLD A 181/g' rec.crg.pdb sed -i 's/ASN A 162/ASM A 162/g' rec.crg.pdb

#### Critical

The two files that read in partial charges are the amb.crg.oxt and prot.table.ambcrg.ambH files located withing the working/ directory. These files contain default partial charges for all amino acids and some special residues such as ions. Make the following changes in both amb.crg.oxt and prot.table.ambcrg.ambH files.

1. 71

Create new sections for the polarized residues in the two files by copying the standard residue partial charges.

2. 72

Rename the polarized residues to match the three-letter codes used in the rec.crg.pdb file. The names are all capitalized in the prot.table.ambcrg.ambH file as shown in Fig. 8 and all lowercase in the amb.crg.oxt file.

3. 73

Redistribute the charge around the atom of interest, making sure that the net charge remains the same. We suggest testing modifications of 0.2 or 0.4 charge units for a given atom. For example, for an ASN-to-ASM change, where we want to enhance the electronegativity of the sidechain carbonyl (OD1), increase the charge by 0.4 units, resulting in a change from −0.470 to −0.870. As −0.4 charge was added to the residue, +0.4 must be distributed to other atoms in the residue to maintain the net charge. One option is to add +0.2 to each of the sidechain amide hydrogens (HD21 and HD22) as in Fig. 8, though the charge could have been distributed to the backbone amide or another atom not involved in binding.

#### Running the grid generation

1. 74

Before running the grid generation, check that the following files are present: rec.pdb, xtal-lig.pdb, the modified amb.crg.oxt and prot.table.ambcrg.ambH files, and a working/ directory with the modified rec.crg.pdb file.

2. 75

Run blastermaster with the following command:

$DOCKBASE/proteins/blastermaster/blastermaster.py --addNOhydrogensflag --chargeFile=amb.crg.oxt --vdwprottable=prot.table.ambcrg.ambH –v Troubleshooting 3. 76 Rerun control calculations as in Step 37–48. When examining the new poses, consider the following questions: • Does the polarized residue promote or discourage the desired interaction? • Are the new interactions scored more favorably? 4. 77 Rerun the dielectric boundary modifying sphere scan in Steps 65–68. We recommend doing this because the new polarized residue(s) will have altered the electrostatics grid. It may be necessary to test different combinations and strengths (0.2, 0.4) of the polarized residues to identify the best-performing set of parameters that enhances correct interactions, improves poses and ideally improves LogAUC. However, due to the modification to the electrostatic potential introduced by these changes, it is important to ensure these improvements do not come at the cost of biasing the screen towards overly charged molecules. This bias can be checked in the next section Steps 78–88. ### Section 7: Extrema charge control calculations Timing 2–3 h for compound building #### Generate the Extrema set #### Critical Compounds in the extrema set come from the area of chemical space occupied by the known actives (molecular weight and LogP) but distinct charges (−2, −1, 0, +1, +2). These decoys will be used to measure biases introduced into the scoring function owing to modifications to the electrostatic potential. 1. 78 On tldr.docking.org, navigate to the ‘extrema’ application. 2. 79 Upload the SMILES of your active ligands used in section II to the webpage. 3. 80 Select 500 or 1,000 molecules per charge, and click the ‘Go’ button. 4. 81 After a few minutes, a results file will be available to download and extract. Within the file should be five SMILES files grouped by charge. Extract the SMILES file from the download: unzip extrema_results.zip 5. 82 Build the extrema set. 6. 83 For each SMILES file, submit the file to the ‘build3d’ application on tldr.docking.org. Depending on the number of ligands in each file, and the activity on the website, each file should take 2–3 h to build. 7. 84 When all sets are built and downloaded, combine the path to all compounds in a split database index file along with the known actives. #### Run the extrema set 1. 85 Navigate to a directory containing the INDOCK file and dockfiles/ directory that you want to test. The dockfiles/ directory should contain the set of matching spheres and the dielectric boundary sphere combination selected from the previous steps. 2. 86 Prepare the docking run. In the previous steps, the control ligand set was small enough to run in a single run. This control set often has 10,000+ compounds, and it is best to split this into multiple jobs. To do this run:$DOCKBASE/docking/setup/setup_zinc15_file_number.py ./ extrema extrema.sdi 100 count

3. 87

Submit the docking calculations over a cluster as in Step 38B.

4. 88

Repeat the analysis in Steps 39–48 with a focus on the charge_distributions_vs_energy.png plot (Fig. 6d). An optimal set of docking parameters will have good LogAUC with early enrichment, and the top-ranking compounds will contain charges that match the charges of the known actives. Sometimes multiple boundary-modifying sphere combinations will be tested in this Extrema calculation to find a set that enhances the scoring preference for compounds similar to knowns.

### Section 8: run a ‘small’-scale in-stock screen

1. 89

To collect an in-stock screening library, go to http://zinc20.docking.org/tranches/home/.

2. 90

At the top of the tranche viewer (Fig. 9), select ‘3D’ representation, set Purchasability to ‘In-stock’, and select the charge states of interest to your target (i.e., only 0). Next, select the molecular weight range you want to screen. Finally, select a LogP range tailored to your target. If no specific LogP range is desired, we recommend LogP ≤ 3.5 to avoid insoluble compounds that might aggregate and yield false positives in the experimental assay.

3. 91

When only the tranches of interest are selected, click the download button in the upper right corner of the screen. Select ‘DOCK37’ as the download format and ‘cURL’ or ‘WGET’ as the download method. When done, click ‘Download’ to obtain a text file with all the cURL or wget commands needed to download the set of selected tranches.

4. 92

Prepare a split database index file with the path to all of the newly downloaded compounds.

5. 93

Split the docking run over 1,000 jobs to efficiently run the screen as the number of compounds is usually ~1–4 million.

$DOCKBASE/docking/setup/setup_zinc15_file_number.py ./ instock instock.sdi 1000 count 6. 94 Run the screen on a cluster as in Step 38B. 7. 95 Run the analysis of the screen to generate a LogAUC. It should be on par or better than LogAUC values seen previously, as the set of decoy molecules grows in size and departs further from the set of known ligands. 8. 96 Test out hit picking strategies. Use any metric for filtering hits (Table 2), and visually examine the hits that pass the selected filters. • Are the compounds forming the expected interactions with the binding pocket? • Are they sampling in the correct region of the pocket? If there are any issues, determine if it is a sampling or scoring problem and alter the previous steps (matching spheres or dielectric boundary modifying spheres, respectively) to optimize. If the top poses capture expected interactions, then it is worth moving into a large-scale prospective screen. 9. 97 Choose settings for the large-scale docking experiment. If multiple docking parameters were tested in this section (i.e., section 8), one method for selecting the best settings for large-scale docking is to choose the parameters that yield the greatest number of viable compounds after all of the post-docking filters have been applied. This will ensure a large number of hits in the large-scale campaign. Alternatively, one can purchase hits from the different screens and see which setup leads to more true actives in an experimental setting. Even if just a few low-potency hits are obtained at this point, it may suggest which interactions are critical for activity in the pocket. ### Section 9: large-scale docking Timing 1–5 d, depending on number of compounds and compute nodes 1. 98 Create a new directory for the large-scale campaign with the final set of parameters, i.e., the INDOCK file and the dockfiles/ directory. 2. 99 Change the mol2_maximum_cutoff in the INDOCK file to a value likely to eliminate ~90% of the docked molecules to save on disk storage as compounds with total scores worse than a certain value (more positive) are unlikely to be considered hits from the large screen and should not be saved (use the results of the in-stock screen to help inform on the best mol2_maximum_cutoff values to use). 3. 100 Obtain the ZINC20 library. The ZINC20 library contains a prebuilt 3D conformer library of nearly 700 million fully protonated and tautomerized compounds (i.e., protomers). The library is largely built on Enamine’s readily accessible library, which consists of molecules that have a >80% likelihood of synthesis in one or two reaction steps11. This library can be downloaded from https://files.docking.org/3D/ or using the tranche browser as in Fig. 9 and consists of ~60 TB of data. 4. 101 Select properties of the ZINC20 library for the screen such as molecular weight, charge and LogP. We recommend screening compounds separately by MWT range (i.e., fragments ≤ 250 amu) as larger compounds often score better due to their ability to form additional interactions within the binding pocket117. To obtain a split database index (sdi) file for a predownloaded ZINC20 database, use the ‘DOCK Database Index’ download format in the tranche viewer (Fig. 9) and provide a path to the downloaded database in the ‘ZINC DB Root’ field. 5. 102 Split the docking campaign over 1,000 or more jobs. Typically, ~100 million compounds can be screened per day on an academic 1,000 core cluster of recent vintage (as of 2020).$DOCKBASE/docking/setup/setup_zinc15_file_number.py ./ largescaledocking zinc20library.sdi 1000 count

6. 103

Run the screen on a cluster as in Step 38B.

### Section 10: hit picking

Timing Days to weeks

1. 104

Extract the scores of the top compounds as in Step 40 with the maximum score cutoff set to the mol2_maximum_cutoff set in Step 99

python $DOCKBASE/analysis/extract_all_blazing_fast.py ./dirlist extract_all.txt -50 2. 105 Collect between 300,000 and 1,000,000 poses from the best-scoring compounds. python$DOCKBASE/analysis/getposes_blazing_faster.py ./ extract_all.sort.uniq.txt 1000000 poses.mol2 test.mol2.gz

3. 106

Use any number of post-docking filters including but not limited to those described in Table 2.

4. 107

From the compounds that remain after filtering, visually examine up to 5,000 molecules depending on how viable the docked compounds look. We recommend that visual examination is done by more than one person, because of the different knowledge base that each individual brings to the selection process (i.e., a medicinal chemist will prioritize different features over a molecular biologist and vice versa)19. This step should be done before prioritizing compounds for purchase, as we have found human-picked compounds have better efficacy than compounds obtained from fully automated hit-picking12. By using post-docking filters as in Step 106 prior to visual examination, one can search more deeply through the list of top ranked molecules to identify molecules for testing.

5. 108

Choose 50–200 compounds for wet-lab testing depending on price, capability of testing and confidence of success as informed from retrospective calculations.

## Troubleshooting

### Structure preparation

Alternative conformations or missing side chains should be completely modeled before starting the grid generation (blastermaster.py). Otherwise, any superfluous or missing atoms will result in erroneous VDW surface or partial charge calculations (see below).

### Blastermaster

#### Section 1, Step 5

All input and output files as well as options and flags of the blastermaster program are listed in the Blastermaster Guide (see Supplementary Information). Of important note is that blastermaster requires protein and ligand structures to be provided as PDB files called rec.pdb (and working/rec.crg.pdb in case a protonated structure is used) and xtal-lig.pdb. Alternative file names will not be recognized in the default settings.

#### Section 1, Step 10

If blastermaster does not successfully finish the grid generation, log files in the working directory will yield the required information to backtrace the error (see Blastermaster Guide in Supplementary Information). We suggest running blastermaster in the verbose mode (-v) as it allows to easily detect at which step the program failed.

Most common errors are related to unknown atom or residue types in rec.pdb (and/or rec.crg.pdb). Missing parameters for specific atom types can be added to the following parameter files:

$DOCKBASE/proteins/defaults/radii (required for surface calculation by the dms program)$DOCKBASE/proteins/defaults/amb.crg.oxt (partial charges for qnifft) $DOCKBASE/proteins/defaults/vdw.siz (Van der Waals radii for qnifft)$DOCKBASE/proteins/defaults/prot.table.ambcrg.ambH (atom typing for chemgrid) $DOCKBASE/proteins/defaults/vdw.parms.amb.mindock (Van der Waals parameters for minimizer) In our experience, most protein input structures can be converted into complete docking grids with the default parameters provided in DOCKBASE. For some modifications such as capped termini or structural waters, parameters are included in the provided files (e.g., amb.crg.oxt) but may use slightly different atom names compared with default names from modeling programs such as Chimera, PyMol or Maestro. In these cases, we suggest adapting the corresponding atom names in the input protein structure files (rec.pdb, rec.crg.pdb) rather than adding more atom types to the default parameter files. Common naming errors occur for disulfide bonded cysteines (CYX in prot.table.ambcrg.ambH) and water atoms (HOH, TIP, WAT and SPC in prot.table.ambcrg.ambH). The qnifft program may crash if too many atoms (>50,000) are present in rec.crg.pdb. While we do rarely encounter this problem, particularly large proteins or other molecular complexes may have to be reduced in atom number (e.g., removing fusion proteins or distal monomers). Typically, protein segments >20 Å away from the binding pocket are unlikely to influence the resulting docking grids. In certain cases, blastermaster may be able to calculate all grids, even though some parameters might be missing. For example, if nonstandard amino acids are used, the protonation state as well as partial charges or VDW surfaces may not be computed correctly, but will not cause blastermaster to crash. To check if all force field parameters were assigned to the protein structure correctly, we suggest visual inspection of docking grids (Chimera). The scoring grids (vdw.vdw, trim.electrostatics.phi, as well as ligand.desolv.heavy and ligand.desolv.hydrogen) are found in the dockfiles directory. The following scripts can be run in the dockfiles directory to convert the grids into dx files for visualization. Convert VDW grid: python$DOCKBASE/proteins/grid_visualization/create_VDW_DX.py

Input: vdw.vdw, vdw.bump (default)

Output: vdw.dx (default)

Convert electrostatics grid:

python $DOCKBASE/proteins/grid_visualization/create_ES_DX.py trim.electrostatics.phi trim.electrostatics.dx Input: trim.electrostatics.phi Output: trim.electrostatics.dx Convert desolvation grid: python$DOCKBASE/proteins/grid_visualization/create_LigDeSolv_DX.py

Input: ligand.desolv.heavy (default)

Input: ligdesolv.dx (default)

To visualize the grids, first open the rec.crg.pdb file in Chimera. The dx files can be opened with the Volume Viewer application. The vdw.dx file should resemble the surface of the receptor. The ligdesolv.dx file demonstrates various solvation levels of the pocket (the grid itself is a continuum). By changing the level of the representation, the volume should fill the pocket illustrating the ligand moving from a less solvated state to the most solvated state. For the trim.electrostatics.dx file, it is best to set the minimum and maximum levels to −100 and 100, respectively. Then, set the colors to red for −100 and blue for 100. This now shows regions of positive partial charges (blue) and negative partial charges (red). For all grids, ensure that the representations match what is expected for the binding pocket and that there are no regions of missing grid points. If there are obvious holes in any of the grids, in close proximity to the ligand, certain residue types were not identified correctly and corresponding parameters could not be assigned.

#### Section 4, Step 64

If ligands dock in poses far from the binding pocket, visualize the matching spheres to ensure they occupy the same area as the xtal-lig. The matching_spheres.sph file in the dockfiles directory contains the set of matching spheres. To view this file in a protein visualization application, they first must be converted to a pdb file.

csh \$DOCKBASE/proteins/showsphere/doshowsph.csh matching_spheres.sph 1 matching_spheres.pdb

Input: matching_spheres.sph

Output: matching_spheres.pdb

The output file, matching_spheres.pdb, can be visualized with the receptor and ligand in either Chimera or PyMOL. The matching spheres should align to the ligand’s heavy atoms, and additional random spheres should be scattered around the ligand as in Fig. 7a.

If many ligands fail to dock (>50%), a number of issues may be at fault. It is common for 10–20% of ligands to fail to dock in retrospective calculations because of mismatches between the pocket and ligand properties. In the OUTDOCK file, two common statements may indicate these issues: ‘Grids too small’ and ‘Skip size’. The ‘Grids too small’ statement indicates that the ligands do not fit in the binding pocket. The ‘Skip size’ statement is used when the number of atoms in the ligand exceeds the atom_maximum value in the INDOCK file. Adjusting this number to a higher value will ensure that larger ligands will be docked and scored. If the OUTDOCK file shows many compounds getting scored but the poses are not being saved, the mol2_maximum_cutoff value may be too stringent. Relaxing this term will save more poses, but be careful to not go too high as the size of the output files will correspondingly increase and may be an issue for disk storage capacity. Additional INDOCK parameters can be adjusted according to the INDOCK Guide available in the Supplementary Information.

## Anticipated results

At the end of this protocol, a receptor will have been converted into a dock-readable binding pocket, the system optimized on retrospective control calculations, and the system screened prospectively against a large chemical library. While this protocol has been specific to a single target that yielded successful results13, we have equally applied this protocol to a number of targets with general success demonstrating its broad applicability12,83,118. We have attempted to curate all of the relevant steps, though expert intuition brought by each user for their target will result in a different selection of final hits for purchase. For these intuition-based steps, we recommend multiple people review the data before making the final decision regarding which compounds to buy.

If none of the purchased compounds demonstrates activity at the target of interest as determined experimentally, there remain several reasons for failure, including:

• The calculations were performed using the wrong conformation of the binding pocket

• The compound library did not contain enough examples of the class of compound most likely to dock

• Retrospective optimization was not possible due to lack of known chemical matter

• The experimental structure contained ambiguities inside chain or loop placement due to poor electron density or refinement errors

• Incorrect or low-quality homology model

For example, targets that bind dianions or dications will suffer from much smaller library sizes, while targets that bind large protein ligands may be difficult to modulate with small molecules. As in experimental biology, no set of retrospective controls will ever guarantee prospective success, and the many approximations in docking ensure that we still measure success by hit rates.

Nevertheless, these controls and optimization steps can reduce obvious sources of failure, and allow one to better design a subsequent campaign. The experience of the field by now suggests that the approach is likely enough to succeed to be worth the investment, while the novel ligands that result can bring genuinely new biological insight to a field, a longstanding goal of the structure-based enterprise.

### Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.