DNA repair glycosylase hNEIL1 triages damaged bases via competing interaction modes

DNA glycosylases must distinguish the sparse damaged sites from the vast expanse of normal DNA bases. However, our understanding of the nature of nucleobase interrogation is still limited. Here, we show that hNEIL1 (human endonuclease VIII-like 1) captures base lesions via two competing states of interaction: an activated state that commits catalysis and base excision repair, and a quarantine state that temporarily separates and protects the flipped base via auto-inhibition. The relative dominance of the two states depends on key residues of hNEIL1 and chemical properties (e.g. aromaticity and hydrophilicity) of flipped bases. Such a DNA repair mechanism allows hNEIL1 to recognize a broad spectrum of DNA damage while keeps potential gratuitous repair in check. We further reveal the molecular basis of hNEIL1 activity regulation mediated by post-transcriptional modifications and provide an example of how exquisite structural dynamics serves for orchestrated enzyme functions.

3', where Z stands for A, G or C) were purchased from Sangon Biotech without any further purification. All the oligonucleotides were validated by MALDI-TOF-MS.

Generation of Gh-and Sp1-containing oligo DNAs
The Gh-and Sp1-containing oligo DNAs were obtained by oxidation of oligomers containing 8-oxoguanine base referring to a previous study with minor modifications 1 .
Briefly, to generate the Gh-containing oligo DNA products, 5 nmol 8-oxo-dG oligomers were firstly added into ddH2O and incubated at 4 o C for 30 min. Then 90 nmol Na2IrCl6 (SIGMA-ALDRICH) were titrated into the sample in a 500 μl final volume followed by 50 minutes of incubation. The reaction was finally quenched with 500 nmol EDTA (pH 8.0). To obtain the Sp1 oligo DNA products, 5 nmol 8-oxo-dG oligomers were dissolved in 75 mM (iii) 5'-O-(4,4'-Dimethoxytrityl)-2'-deoxy-2'-fluoro-5,6-dihydrouridine Compound 2 (500 mg, 2.0 mmol) was dissolved in pyridine (10 mL), and 4,4'dimethoxytrityl chloride (813 mg, 2.4 mmol) was added to the solution. After being stirred overnight at room temperature under a nitrogen atmosphere, the reaction mixture was quenched with MeOH (0.5 mL) and stirred for an additional 10 min 3 . The solution was diluted with EtOAc (150 mL), then washed with water (2×60 mL) and brine (60 mL). Organic phases were dried over anhydrous Na2SO4 and concentrated under vacuum. The residue was purified by silica gel chromatography with EtOAc/Petroleum Ether (1:1 to 5:  we also assigned an index for each model. The models directly built from crystal structures have a "crystal" tag for the origin; whereas those containing mutations were tagged by "mutated" from the corresponding crystal-structure model. We listed all the models and their corresponding indices and origins in Supplementary Table 2. We will refer to these models by their indices hereafter.
Furthermore, in order to build reliable initial structures for our purposes, we adopted multi-references to construct the recognition loop. Particularly, if the electron density of the recognition loop in a crystal structure is poor, we will first confirm its loop conformation to be either apo or 242-in or 244-in, using the original crystal structure. We then preserve the well-resolved regions of the loop and the other part including the active pocket, and consult the high-resolution crystal structure which has the same loop conformation for reference to make up the less-resolved or missing parts. Note that although some regions of the flexible loop may suffer low resolution, the positions of the active site, including the R/K242, Y244 and E6 etc., are mostly well-defined in the crystal structures.
It should be noted that the mutated residues which deactivated the catalysis pocket introduced in crystallization experiments were recovered to wild types during the above modeling procedure. The heavy atoms including water molecules with high density in crystal structure were preserved, while those of ions and other agents (e.g., glycerol, Tris) were removed. The hydrogen atoms in the protein and DNA were automatically recovered by tLEAP module (AMBER tools) according to AMBER-ff99SB 4 . Counter ions were added to neutralize the overall system. We then added extra water molecules to the system, ensuring each atom of the protein-DNA complex (and ions) were encompassed by water shell at least 10 Å thick. TIP3P water model 5 was adopted to describe the explicit solvation.
Each all-atom model contains about 50,000 atoms. The force-field parameters for the unnatural bases involved in the models (i.e., Tg and DHU) were taken from or prepared as in reference 6 .
It is worth mentioning how we treated the protonation states of residues in the hNEIL1 protein. Since the residue 242 (i.e., R242 or K242) in the recognition loop may be subjected to tautomerization in the 242-in loop conformation, one may concern whether its protonation state is different from titratable conditions. We examined its environment and found that K/R242 is readily accessible to solvent water molecules in both crystal structures ( Fig. 4h) and atomistic simulations (Fig. 4g). Besides, we also observed a salt bridge can be formed between K/R242 with an adjacent Glu6 residue in the active pocket. Therefore, we assumed that K/R242 residue is protonated or charged in the 242-in loop conformation.
The H++ server (3.2 version; http://biophysics.cs.vt.edu/H++) 7 also verified our assumption of a charged K/R242 under a neutral pH condition. The protonation states of other residues were also set as suggested by the H++ server.

Force-field relaxation of all-atom models of hNEIL1-DNA complexes
Unless specified otherwise, in all the simulations, we treated the simulation box with periodic boundary conditions, and set a 10 Å cutoff for Ewald summation accounting for the electrostatic interactions. All the simulation models were treated by the according force fields.
We first minimized the energy of the initial structures using 5000 steps of steepest gradient descent followed by 5000 steps of conjugate gradient descent. After energy minimization, each of the simulation systems was heated up to 300 K under 1 atm, during which a positional restraint (10.0 kcal•mol•Å -2 ) over protein and DNA was applied. SHAKE algorithm 8 was adopted to constrain all motions involving hydrogens, and a uniform 2-fs step was used for all force-field MD simulations. The heating process lasted for 0.5 ns, followed by 1 ns equilibration stage, during which the positional restraint was gradually tuned off (reducing 1.0 kcal•mol•Å -2 every 100 ps). The final structures reached at the end of the equilibration was then used for initializing all the following simulation studies.

Calculation of reaction free energy profile for 244-in pathway of base excision (i) Selection of QM-treated region
According to references 6,9 , in the ribose-protonated reaction pathway, the reactive sites involving significant electronic structure changes (i.e., bond forming and/or breaking) include Pro2 (the Pro2 is neutral at the N-terminal 6  (ii) Umbrella sampling We followed the hypothetic reaction pathway as in Fig. 2d in the main text. The reaction coordinates were chosen to account for the bond-forming and bond-breaking in each step, which are shown with details in Supplementary Fig.6.
QM/MM steered MD was first performed to generate a series of structures along each reaction coordinate. The semi-empirical quantum method SCC-DFTB 10,11 with proper dispersion terms was adopted to quantify the potential of the QM-treated region as defined in the above section, provided that it generally ensures satisfying accuracy, performs reliably in polar systems, and is computationally extremely efficient compared to the firstprinciple methods 12 . For every reaction coordinate, the steered MD lasted for no less than 2000 fs (with 1-fs integration time step), which ensured at least 1500 fs for a change of 1 Å along the reaction coordinate so as to relax the environment, and the steering force ranged from 80 to 200 kcal•mol•Å -2 . The force constant used in steered MD was used as reference for the following umbrella sampling.
We computed the free energy profile along the 244-in reaction pathway via umbrella sampling 13 . The umbrella sampling was done using SCC-DFTB/AMBER QM/MM simulations where the QM-treated region was identical to the steered MD simulations.
SCC-DFTB was chosen so that the result can be fairly compared with related studies 6 . The harmonic restraint (used to keep the configuration locally fluctuated) was in line with the parameters in the steered MD. Specifically, the force constant of the harmonic potential for Step I and Step III are 200, 80, and 160 kcal•mol•Å -2 , respectively. We chose these values because the corresponding forces were needed to overcome the reaction barrier(s) during the steered MD. The window was 0.1 Å in width along Step I and Step III, and 0.05 Å for Step II which merely involves proton transfers. The time step was 1 fs, and simulations were executed under NPT ensemble (300 K, 1 atm). For each window, up to 100 ps QM/MM simulation was performed, and the last 80-ps trajectory was collected for data analysis. The weighted histogram analysis method (WHAM) 14 was applied to reconstruct the unbiased free energy profiles for each individual reaction step ( Supplementary Fig. 8 where the maxima (transition states) and minima (intermediate and stable states) along each reaction coordinate were extracted and aligned to plot the illustrative overall free-energy profile throughout the entire reaction (Fig. 2d in the main text).

Structural fluctuations of the catalytic pocket and the recognition loop
We performed several force-field MD simulations to study the conformational or structural fluctuations of the hNEIL1-DNA complexes.
(i) Distance between Pro2 and the flipped base Since the nucleophilic attacking from the nitrogen atom on Pro2 to the C1' atom of the flipped nucleotide is the key step that initializes the enzymatic reaction ( Supplementary   Fig.6), the distance between these two atoms are thus indicative of the ease of the reaction initialization. Given that the relative positions between the flipped nucleotide and the Pro2 vary in different crystal structures and depend on the loop conformation, we launched force-field simulations using M1 and M7 models (Supplementary Table 2), respectively, to investigate how the loop conformation changes the distance between the two reactive sites.
The production MD simulation ran 20 ns for each model, based on which we plotted the distribution of the distance between the N atom of Pro2 and C1' atom of the flipped ribose ( Supplementary Fig. 7).

(ii) Configurations of residue 249
We simulated the configurations of the residue 249 in different loop conformers. In total 3 simulations were conducted, using M3, M9 and M13 (Supplementary Table 2 Table 2). M3, M5 and M7 share a common residue 244, namely, a wild-type tyrosine, but differs with each other in the base type (DHU in M3, dT in M5 and Tg in M7). On the other hand, M14 contains a Y244H mutation (we introduced a neutral Histidine in 244, annotated as "HIE" in AMBER by convention) and the flipped base is DHU. We considered a neutral Y244H mutation hereby because a neutral histidine is supposed to lend itself better for hydrophobic interactions with the base, i.e., pi-stacking, compared to a highly hydrophilic charged counterpart. Compared to HID, HIE was the more commonly adopted neutral form of histidine in literature, and was recommended as default by the H++ server and the force field. The simulation lasted for 20 ns for each system, and the collected data were used to compute the statistics of the stacking interactions. Specifically, we characterized the strengths of the stacking interaction by two quantities, the overlapping area and the distance between the sidechain of the aromatic sidechain of residue 244 and the base (Supplementary Fig. 11d and Supplementary Fig. 12).
The overlapping area was calculated as follows. We first defined two planes spanned Given the defined geometric plane for the base, we computed the distance from the heavy atoms of the aromatic ring of residue 244 to this plane. We averaged all the distances to give the averaged distance (i.e., the distance between aromatic ring of residue 244 and the base) reported in Supplementary Fig. 11d and Supplementary Fig. 12a.  Table 2) with the same simulation settings as above.
As reported in Fig. 4g in the main text, we defined the distance from the base to a water molecule as the shortest one among all the pairwise distances between the water oxygen and any heavy atom on the base. We then counted the number of water molecules, water , falling short of a given cutoff as the solvated water number in Fig. 4g 4a in the main text) as the collective variables (CVs). Based on these CVs, diffusion map 15 was exploited to determine the "frontier configuration" 16 . After every 100-ps MD simulation, we selected a frontier configuration and re-started the MD simulation thereof.
Second, we guided the simulation by a target potential (which is a function of the CVs).
Similar to targeted MD, the target potential takes a harmonic form which has the equilibrium position as a target value of the CVs. We used the backbone torsions in the crystal structure as the target values for the CVs provided that little conformational deviations of the recognition loop residues were observed during relaxations of the atomistic models.
We imposed a dynamic schedule over the stiffness (or the force constant) of the harmonic potential: In each 100-ps long MD simulation, the stiffness was gradually tuned up during a warm-up stage in the first 50-ps simulation, then decayed to zero during the remaining 50-ps simulation. Two crystal structures were employed to generate the reference target values: one has a 242-in loop conformation (crystal structure 6LWG, Supplementary Table   2) and the other has an apo loop conformation (crystal structure 5ITQ, Supplementary Table 2), and we launched two independent adaptive sampling MD targeted to these two references. The adaptive sampling was terminated once the final configuration exhibited a CV value agreeing with the target.
The adaptive MD helped explore the loop conformational space by interpolating between different crystal structures. However, since the adaptive MD simulations were offequilibrium, the generated samples cannot represent the equilibrium distribution (or free energy). Indeed, the structures generated along the adaptive MD were mostly perturbed conformations. As in Markov state models (MSM) 17 and information distilling of metastability (IDM) 18 , we relaxed the yielded configurations by shooting short NPT MD simulations (each lasted for 200 ps) thereof to investigate the metastability of the conformations. Finally, we collected these relaxed samples and projected them to the 2D space spanned by the base-Y244 distance and base-R242 distance, and we fitted the density of the projected 2D space via a Gaussian mixture model (implemented by scikitlearn python library 19 where we evoked a non-isotropic non-diagonal covariate Gaussian mixture model) and reported the result in Fig. 3a of the main text.

(ii) Free energy calculation of the loop conformational transition
The relaxed adaptive sampling and metastability analysis can help identify metastable conformations, however, they cannot give the relative stability of different conformers due to their off-equilibrium nature. In order to quantify the free energy difference between different metastable states, we performed force-field umbrella sampling. Firstly, we selected two metastable conformations which we aimed to compute the relative stability, and obtained a 1-dimensional reaction coordinate through linear discriminant analysis (LDA) 20 . The LDA linearly transforms a set of candidate order parameters (we used the base-Y244 distance, base-R242 distance, Y244-F85 distance and R242-F85 distance as the order parameters) into a 1D combined reaction coordinate. We then divided the resulting 1D reaction coordinate into 90 to 100 uniformly spaced bins (each bin corresponds to a "window" for umbrella sampling), and selected the configurations (yielded by the adaptive sampling) which fall into each bin as the initial configuration for the umbrella sampling of the corresponding window ( Supplementary Fig. 9).
In umbrella sampling, we applied harmonic potentials over the reaction coordinate with a force constant of 8000 kJ/nm 2 to ensure good overlap of sampling between adjacent windows. The restraints were implemented via PLUMED plugin 21 for AMBER. For each window, 6 ns MD simulations with the corresponding harmonic restraint force were conducted to generate samples where the last 5-ns simulation samples were collected for analysis. WHAM was implemented to recover the free energy profile along the reaction coordinate using the generated samples from the 100 windows. The uncertainties of the resulting free energy profiles were assessed by a Monte Carlo bootstrapping procedure 22 .
For DHU as the damaged base (i.e., M3 in Supplementary Table 2), we computed two free energy profiles in total: One for apo to 244-in loop transition, i.e., E→Q (Supplementary Fig. 9 and Supplementary Fig. 10), the other for 244-in to 242-in loop transition, i.e., conf (Supplementary Fig. 9 and Supplementary Table 4). We also computed conf for dT and Tg as the flipped base ( Supplementary Fig. 9) following the same umbrella sampling protocol and summarized the results in Supplementary Table 4.
The initial structures for these two systems were the same as the above obtained DHU models except that DHU was mutated to the corresponding new base (dT or Tg). Note that in all the above-mentioned umbrella sampling and free energy calculations, the residue 242 in the model was exclusively arginine regardless of the identity of the damaged base. Table 4, throughout this paper we assumed that replacement of R242 by K242 would not change conf . (iii) Quantification of the free-energy difference between Q state and A state.

As reported in Supplementary
The umbrella sampling gives the conf but not chem of the chemical check. In order to compute the relative stability between Q and A state, we also need to account for the tautomerization equilibrium of the active pocket (Fig. 4d). Since different tautomerization states mainly differ with each other in energy while the entropy difference can be neglected, we approximated chem with taut (the energy difference between different tautomers). We conducted QM/MM to minimize the tautomer of the active pocket and record the energy of the final configuration. The QM-treated region included the sidechain of residue 242 and Glu6 as well as the flipped base (the ribose is excluded). We adopted the same QM method as the QM/MM umbrella sampling of the catalytic reaction pathway (Section 3 above).
chem was approximated as the energy difference between the canonical form of the active pocket and the most energetically favorable tautomer.
Specifically, we found that deprotonation of R242 sidechain was energetically prohibited, (1) Step I: Rh/C (5%), H2; (2) Step II: DMTrCl, pyridine; (3) Step III: amidite reagent, DIPEA, CH2Cl2. The nitrogen atom on Pro2 to the C1' atom of the flipped nucleotide is the key step that initializes the enzymatic reaction, and hence the distance between these two atoms is indicative of the ease of the reaction initialization. The force-field simulations were launched using M1 and M7 models (Supplementary Table 2 a, Potential of mean force (PMF) for "Step I" reaction described in Supplementary Fig. 6.

Supplementary
b, Potential of mean force (PMF) for the "Step II" reaction described in Supplementary Fig.   6. c, Potential of mean force (PMF) for "Step III" reaction described in Supplementary Fig.   6. All the landmarks, including Q-1, Q-TS-1, Q-2, Q-TS-2, Q3 and Q-TS-3, are indicated by the red arrows and shown in Fig. 2d. Note that we did not further the computation of the product release step. In Fig. 2d, we assumed the final released state of 244-in pathway The editing level of hNEIL1 transcript shows a varied distribution within different tissues.
The bioinformatics analysis is based on the existing RNA-seq data (sequence read archive: SRP039090).

Supplementary Tables
Supplementary Table 1. X-ray data collection and refinement statistics.