Atomic-resolution mapping of transcription factor-DNA interactions by femtosecond laser crosslinking and mass spectrometry

Transcription factors (TFs) regulate target genes by specific interactions with DNA sequences. Detecting and understanding these interactions at the molecular level is of fundamental importance in biological and clinical contexts. Crosslinking mass spectrometry is a powerful tool to assist the structure prediction of protein complexes but has been limited to the study of protein-protein and protein-RNA interactions. Here, we present a femtosecond laser-induced crosslinking mass spectrometry (fliX-MS) workflow, which allows the mapping of protein-DNA contacts at single nucleotide and up to single amino acid resolution. Applied to recombinant histone octamers, NF1, and TBP in complex with DNA, our method is highly specific for the mapping of DNA binding domains. Identified crosslinks are in close agreement with previous biochemical data on DNA binding and mostly fit known complex structures. Applying fliX-MS to cells identifies several bona fide crosslinks on DNA binding domains, paving the way for future large scale ex vivo experiments.

Transcription factors (TFs) are key players in the regulation of gene expression and control a multitude of cellular functions, including differentiation, maintenance of cellular identity, cell homeostasis, as well as highly cell-specific functions such as immune response 1 . Due to their pivotal role in cellular signaling, mutations of TFs are often linked to human diseases [2][3][4] .
TFs exert their gene regulatory function through the recognition of specific DNA-binding elements in spatial vicinity of target genes and by the recruitment of coregulators, which may have transcriptional activating or repressing functions. DNA binding is mediated by specific DNA-binding domains (DBDs). Evolution gave rise to various classes, including zinc finger, HMG-box, leucine zipper, helix-turn-helix, and helix-loop-helix domains 1 . Most DBDs of known and putative TFs are identified and classified by sequence homology to a previously characterized DBD 5 and large-scale studies verified the DNA-binding specificity of several hundred individual domains 6,7 . Nevertheless, for several DNA-binding proteins the DBD is unknown, due to the lack of homology with classical domains. Even for domains that have been proven to bind DNA in a stand-alone context, it is not certain that the domain will have the same functionality in the full-length protein.
The molecular mechanism by which TFs bind to DNA can be elucidated by cocrystallization of protein-DNA complexes, which provides insight into the amino acids that are in closest vicinity to the DNA and therefore most likely involved in DNA binding 8,9 . NMR spectroscopy has been used to gain similar information 10 . Furthermore, the composition and stoichiometry of large protein-DNA complexes can be disentangled using high-resolution electron microscopy (EM) 11 . While all those methods allow protein-DNA complexes to be studied in great detail, for many TFs they are very time consuming or not feasible at all. In addition, especially in the case of crystallization, they reflect a frozen state, which can be different from the dynamic binding behavior of TFs to DNA in solution.
With the advances in mass spectrometry (MS) over the past decade 12 , cross-linking MS (XL-MS) has become a viable complementary method to study the structure of protein complexes. The use of chemical crosslinkers allowed the analysis of stoichiometry and spatial arrangement of proteins organized into large complexes (reviewed in ref. 13 ). More recently, XL-MS has also entered the field of protein-RNA interactions. Here, ultraviolet (UV) irradiation can create "zero-length" cross-links in the native state of a protein-RNA complex, meaning the direct covalent attachment of an amino acid to a nucleotide. Pioneering studies applied UV irradiation and MS analysis to identify RNA-binding proteins on a system-wide scale in yeast and mammalian cells [14][15][16] . Improvements in bioinformatic tools further allowed the localization of RNA-protein cross-links at the level of single amino acids 17 , providing complementary information about RNA-binding domains.
Despite these developments in applying UV XL-MS to study protein interaction with RNA, the technology has not been applicable for protein-DNA interactions so far. This is largely due to the fact that double-stranded oligonucleotides are about an order of magnitude less efficiently cross-linked by UV than single-stranded oligonucleotides 18 . Yet, over the last three decades a small number of studies have shown that the efficiency of protein-DNA cross-linking can be increased by using UV lasers [19][20][21][22][23][24] . For a given total energy, the efficiency of protein-DNA cross-linking was shown to largely depend on the length of the laser pulses. Highest crosslinking efficiency can be reached with an ultrafast femtosecond laser, providing 30 times higher efficiency than a nanosecond laser 20 .
To map protein-DNA interaction in a highly specific manner, we here present a pipeline for femtosecond UV-laser-induced cross-linking combined with high-resolution MS (fliX-MS). Our workflow is capable of mapping protein-DNA interactions of in vitro assembled nucleosomes as well as in vitro and ex vivo TF-DNA interactions. Our method successfully confirms protein-DNA binding sites predicted by structural studies, and provides insights into the extent of flexibility within DBDs.

Results
A fliX-MS pipeline to map protein-DNA interactions. UV-laser cross-linking with ultrafast pulses can cross-link TFs and DNA with high efficiency 20 . Here, we developed a pipeline that combines this technology with high-resolution MS in order to map DNA-protein interactions at the amino acid level (Fig. 1). To this end, we used a femtosecond fiber laser at 515 nm and doubled its frequency with a beta barium borate (BBO) crystal, halving the wavelength to 258 nm (Fig. 1a). Its repetition rate was 0.5 MHz and its pulse duration about 500 fs. The laser beam diameter was adjusted to 2.5 mm (e −2 ) in order to match the inner diameter of a 1.5 ml Eppendorf tube containing the sample. Following UV irradiation, we denatured protein-DNA complexes, cut the DNA to mono- or short oligonucleotide size using a mix of three different nucleases, and digested proteins to peptides with trypsin and Lys-C (Fig. 1b). We then separated peptides from free DNA with StageTips loaded with C18 material 25 , enriched peptide-DNA cross-links using titanium dioxide (TiO2)-coated beads, and analyzed them by high-resolution MS (see "Methods"). Peptide-DNA cross-links were searched in MS data using the RNP(xl) software, which was originally developed for the identification of peptide-RNA cross-links 17 (Fig. 1c). Processing nonirradiated control samples in parallel allowed us to subtract any spectra that were not UV cross-linking specific, which massively reduced the search space. To improve detection of true DNA cross-links, we further manually validated and annotated all cross-linked peptide fragmentation spectra, considering y-, b-, and a-ion series, as well as internal fragment ions.
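The control subtraction step can be illustrated with a minimal sketch. It assumes that each precursor is described by its m/z and retention time, and that a precursor from the irradiated sample is discarded when a control precursor lies within hypothetical tolerances (10 ppm, 60 s). The function name and thresholds are illustrative assumptions and are not taken from the RNP(xl) implementation.

```python
from typing import List, Tuple

Precursor = Tuple[float, float]  # (m/z, retention time in seconds)

def subtract_control(irradiated: List[Precursor],
                     control: List[Precursor],
                     ppm_tol: float = 10.0,      # hypothetical tolerance
                     rt_tol_s: float = 60.0      # hypothetical tolerance
                     ) -> List[Precursor]:
    """Keep irradiated precursors that have no matching control precursor."""
    def matches(a: Precursor, b: Precursor) -> bool:
        ppm = abs(a[0] - b[0]) / b[0] * 1e6
        return ppm <= ppm_tol and abs(a[1] - b[1]) <= rt_tol_s

    return [p for p in irradiated if not any(matches(p, c) for c in control)]

# Hypothetical precursors, for illustration only
uv = [(650.3201, 1200.0), (712.8450, 1850.0), (498.2507, 900.0)]
ctrl = [(650.3203, 1210.0), (498.2507, 2400.0)]
uv_specific = subtract_control(uv, ctrl)
```

In this toy input, only the first irradiated precursor has a control counterpart that agrees in both m/z and retention time, so it is the only one removed.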
Optimization of cross-linking conditions. To maximize the cross-linking rate and therefore the identification of protein-DNA cross-links, we first optimized the femtosecond UV-laser parameters. UV-dependent DNA cross-linking is a two-photon process and depends on both intensity and pulse length 20 . As the pulse length is determined by the laser setup, we tested different pulse energies, as well as increasing amounts of total energy.
We used a recombinant TF, porcine nuclear factor 1/C (NF1), and let it bind to a biotinylated oligonucleotide containing its specific DNA-consensus binding site or a mutated version of it (Fig. 2a). As the binding was much stronger for the wild-type binding site than for its mutant counterpart, we concluded that the protein-DNA interaction was functional. The minor binding to the mutant consensus site can be explained by the ability of NF1 to also bind DNA in a nonspecific manner 26 . Next, we UV-irradiated the NF1-DNA complex with a pulse energy of 7 nJ and increasing amounts of total energy, followed by western blotting and detection of protein-DNA cross-links using a streptavidin-HRP conjugate (Fig. 2b). There was a direct relationship between total energy and cross-linking yield at the beginning of the curve and only a minor increase of cross-linked species from 350 mJ onward. With higher total energy, we also observed protein-protein cross-links bound to biotin-DNA, reflected in an increasing signal in the higher molecular weight range (Supplementary Fig. 1a).
To determine the optimal pulse energy, we next irradiated the TF-DNA complex with increasing pulse energies, keeping the total energy at 1 J (Fig. 2c, Supplementary Fig. 1b). Maximum cross-linking efficiency occurred at about 40 nJ pulse energy, whereas it strongly decreased at both lower and higher pulse energies. While the lower cross-linking efficiency at lower pulse energy can be explained by a minimum energy requirement for the two-photon processes to take place, the reduction at higher pulse energies is due to either saturation effects or DNA damage. We conclude that a maximal energy of 50 nJ per pulse is sufficient to cross-link protein-DNA complexes, and that a further increase in pulse energy does not enhance the process.
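The relationship between pulse energy, total energy, and the laser parameters stated above can be made concrete with a small worked calculation. This is a sketch under two assumptions that are not stated in the text: the repetition rate stays at 0.5 MHz while the pulse energy is varied, and the peak power is approximated as pulse energy divided by pulse duration (a rectangular pulse).

```python
def irradiation_summary(pulse_energy_j: float,
                        total_energy_j: float,
                        rep_rate_hz: float,
                        pulse_duration_s: float) -> dict:
    """Derive pulse count, exposure time, and average/peak power."""
    n_pulses = total_energy_j / pulse_energy_j
    return {
        "n_pulses": n_pulses,
        "exposure_s": n_pulses / rep_rate_hz,              # assumes constant repetition rate
        "avg_power_w": pulse_energy_j * rep_rate_hz,
        "peak_power_w": pulse_energy_j / pulse_duration_s,  # rectangular-pulse approximation
    }

# 40 nJ pulses, 1 J total energy, 0.5 MHz repetition rate, 500 fs pulse duration
summary = irradiation_summary(40e-9, 1.0, 0.5e6, 500e-15)
```

Under these assumptions, 1 J delivered in 40 nJ pulses corresponds to about 2.5 × 10^7 pulses, an exposure of roughly 50 s, an average power of 20 mW, and a peak power of about 80 kW.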
To investigate whether the formed protein-DNA cross-links reflected functional TF-DNA interactions, we repeated the titration of the total energy with the optimal pulse energy of 50 nJ for NF1 bound to a DNA oligo containing either its wild-type consensus binding sequence or a mutated form of it (Fig. 2d, Supplementary Fig. 1c). Western blot analysis of the biotin-DNA complex revealed that protein-DNA cross-linking was specific for the wild-type sequence. Notably, this was also the case for the higher molecular weight fraction, indicating that protein-protein cross-linking does not affect DNA-binding specificity, even at a total energy of 1.25 J.
To quantify the cross-link efficiency, we irradiated the NF1-DNA complex (pulse energy of 7 nJ and a total energy of 350 mJ) and probed the western blot with an antibody directed against the His-tag of NF1 (Fig. 2e, Supplementary Fig. 1d). We observed a shifted double band at 60-65 kDa, which disappeared when digesting the sample with either DNase I or proteinase K, suggesting that the signal is derived from NF1 bound to single- and double-stranded DNA. Reblotting the stripped membrane with the streptavidin-HRP conjugate recognizing biotinylated DNA confirmed this observation. Quantification of the mono-NF1-DNA cross-links revealed a cross-linking efficiency of 7.5%. Taking into account also the high-molecular weight population and extrapolating from the cross-linking efficiency of mono-NF1-DNA and the intensities of the 65, 130, and 185 kDa bands in the DNA-biotin blot, we estimate a cross-linking efficiency of 14% under these energy conditions (Supplementary Fig. 1d).
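The efficiency calculation follows the definition given for Fig. 2e: the intensity of the cross-linked band divided by the sum of all band intensities in the cross-linked lane. A minimal sketch is given here; the band names and intensity values are hypothetical and were chosen only so that the result reproduces the reported 7.5%.

```python
def crosslink_efficiency(band_intensities: dict, crosslinked_band: str) -> float:
    """Percentage of signal in the cross-linked band relative to all bands in the lane."""
    total = sum(band_intensities.values())
    if total <= 0:
        raise ValueError("lane has no signal")
    return 100.0 * band_intensities[crosslinked_band] / total

# Hypothetical densitometry values (arbitrary units)
lane = {"free NF1": 925.0, "NF1-DNA cross-link": 75.0}
efficiency = crosslink_efficiency(lane, "NF1-DNA cross-link")
```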
To validate the observations with another TF-DNA complex, we UV-irradiated recombinant TATA-box binding protein (TBP) bound to an oligo containing either the wild-type TATAA sequence or a single point mutant of it (TGTAA), known to decrease TBP binding by 49% 27 (Fig. 2f). As expected, we observed a stronger signal for the TBP-TATAA complex compared with TBP-TGTAA, which disappeared with either DNase I or proteinase K treatment, indicating that fliX-MS also works effectively for TBP. Of note, the difference in cross-link efficiency between the two sequences was also visible in the high-molecular weight fraction, corresponding to multiple copies of TBP bound to DNA (Supplementary Fig. 1e).
Fig. 1 Schematic workflow of the fliX-MS pipeline. a A pulsed laser beam was generated using a femtosecond fiber laser with 515 nm wavelength, repetition rate of 0.5 MHz, and pulse duration of 500 fs. The frequency was doubled (wavelength halved to 258 nm) by second harmonic generation (SHG) in a beta barium borate (BBO) crystal and the laser beam adjusted to fit the inner diameter of a regular 1.5 ml Eppendorf tube. b Protein-DNA complexes were irradiated or left untreated as control. Samples were denatured, DNA digested to mono/short oligonucleotides by a mix of MNase, DNase I, and Benzonase, and proteins digested to peptides by trypsin and Lys-C. Peptides and peptide-nucleotide cross-links were separated from free DNA on C18 StageTips 25 , and cross-links subsequently enriched with TiO2 beads. c Peptides were measured by LC-MS/MS and data analyzed with the RNP(xl) software package implemented in the Proteome Discoverer software 50 followed by manual annotation of candidate spectra.
Fig. 2 a The NF1-DNA complex was separated from free DNA by nondenaturing gel electrophoresis and visualized by SYBR Green staining. b NF1-DNA (5′-biotinylated) complex was irradiated with increasing total energy and constant pulse energy of 7 nJ. Samples were separated by denaturing gel electrophoresis, protein-DNA complexes transferred to a nitrocellulose membrane, and biotinylated DNA visualized by probing with an HRP-coupled streptavidin conjugate. Intensities of the cross-linked protein-DNA bands (x-linked species) were quantified and plotted relative to the most intense band at 700 mJ. c NF1-DNA (5′-biotinylated) complex was cross-linked applying increasing pulse energies, and a constant total energy of 1 J. Cross-linked protein-DNA complexes were detected as in b. Band intensities were plotted relative to the most intense band at a pulse energy of 40 nJ. d NF1 bound to a DNA oligo harboring its consensus site or a mutated version of it was irradiated with increasing total energy and constant pulse energy of 50 nJ: cross-linking depended on a functional protein-DNA interaction. e NF1-DNA complex was cross-linked with a pulse energy of 7 nJ and 350 mJ total energy (XL) or left untreated (Ctrl). Cross-linked samples were further optionally treated with DNase I (DN) or proteinase K (PK) and separated by SDS-PAGE followed by western blotting. After detection of His-NF1 using an anti-His antibody, the membrane was stripped and reprobed with an HRP-coupled streptavidin conjugate to detect biotin-labeled DNA. The percentage of cross-linked protein-DNA complexes (x-linked species) was calculated as the intensity of the cross-linked band (dashed rectangle) divided by the sum of intensities of all bands observed in the cross-linked sample. f TBP bound to DNA oligos containing either a wild-type (TATAA) or point-mutated (TGTAA) consensus motif were UV irradiated (pulse energy 50 nJ, total energy 1.25 J) and biotin-DNA detected by western blot. Full-scale versions of all blots are depicted in Supplementary Fig. 1.
Protein-DNA cross-linking of recombinant human nucleosomes. We next applied the fliX-MS workflow to in vitro assembled human nucleosomes, as this structure involves a large number of protein-DNA contacts. This identified 12 unique peptide-nucleotide cross-links, located on seven different peptides (Fig. 3a, Supplementary Data 1). The cross-linked peptides had MS1 mass shifts corresponding to one to four nucleotides. Considering the base-specific MS2 mass shifts, we were able to unambiguously call the nucleotide that was cross-linked in all of the DNA-modified peptides. Cross-links to nucleotides with pyrimidine bases represented the large majority, with six and four cross-links on thymidine and deoxycytidine, respectively. However, fliX-MS also revealed cross-links to nucleotides with purine bases, with one cross-link to deoxyadenosine and one to deoxyguanosine (Fig. 3b). This imbalance between the different base classes is likely due to their different susceptibility to the two-photon processes 28 . In any case, our results show that ultrashort UV-laser pulses are capable of cross-linking nucleotides of all four bases.
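How an MS1 mass shift constrains the attached nucleotides can be sketched as a small enumeration. The sketch assumes that the adduct mass equals the neutral monoisotopic mass of a linear oligonucleotide of one to four 2'-deoxynucleoside monophosphates (one water lost per phosphodiester bond); further losses such as HPO3 or water, which a real search would also consider, are not modeled. Function names and the 0.01 Da tolerance are illustrative assumptions.

```python
from itertools import combinations_with_replacement

# Monoisotopic element masses (Da)
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146, "P": 30.9737620}

# Elemental compositions of the four 2'-deoxynucleoside 5'-monophosphates
NUCLEOTIDES = {
    "A": {"C": 10, "H": 14, "N": 5, "O": 6, "P": 1},
    "C": {"C": 9, "H": 14, "N": 3, "O": 7, "P": 1},
    "G": {"C": 10, "H": 14, "N": 5, "O": 7, "P": 1},
    "T": {"C": 10, "H": 15, "N": 2, "O": 8, "P": 1},
}
WATER = 2 * MONO["H"] + MONO["O"]

def formula_mass(formula: dict) -> float:
    return sum(MONO[element] * count for element, count in formula.items())

def oligo_mass(composition) -> float:
    """Neutral mass: sum of monomers minus one water per phosphodiester bond."""
    monomers = sum(formula_mass(NUCLEOTIDES[base]) for base in composition)
    return monomers - (len(composition) - 1) * WATER

def candidate_compositions(observed_shift: float, max_len: int = 4, tol_da: float = 0.01):
    """Return base compositions (order-independent) matching the observed shift."""
    hits = []
    for n in range(1, max_len + 1):
        for comp in combinations_with_replacement("ACGT", n):
            if abs(oligo_mass(comp) - observed_shift) <= tol_da:
                hits.append("".join(comp))
    return hits

# Round trip: the mass of a thymidine dinucleotide maps back to "TT" only
tt_hits = candidate_compositions(oligo_mass("TT"))
```

The MS1 shift only fixes the base composition; which of the attached nucleotides carries the cross-link has to come from the base-specific MS2 shifts described in the text.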
Cross-link-derived mass shifts in MS2 spectra also allowed the localization of the cross-link within the DNA-modified peptides. In seven cases we could pinpoint the cross-link to a single amino acid and in five other cases, we could narrow down the cross-link localization to stretches of two to six amino acids (Fig. 3a).
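The localization logic can be illustrated with a short sketch: a cross-link at a given residue shifts every b-ion that contains that residue and every y-ion that contains it, while all other fragments stay unshifted. The peptide `AGKTLR`, the site, and the adduct mass used here are hypothetical and serve only to show the pattern; only singly charged b- and y-ions are computed.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565

def fragment_ions(peptide, site=None, shift=0.0):
    """Singly charged b- and y-ion m/z values; `site` is a 0-based residue index."""
    masses = [RESIDUE[aa] for aa in peptide]
    if site is not None:
        masses[site] += shift
    n = len(peptide)
    b = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y = [sum(masses[-i:]) + WATER + PROTON for i in range(1, n)]
    return b, y

def shifted_ions(peptide, site, shift):
    """Names of the b- and y-ions that carry the adduct for a given site."""
    b0, y0 = fragment_ions(peptide)
    b1, y1 = fragment_ions(peptide, site, shift)
    b_names = [f"b{i + 1}" for i, (u, m) in enumerate(zip(b0, b1)) if abs(m - u) > 1e-6]
    y_names = [f"y{i + 1}" for i, (u, m) in enumerate(zip(y0, y1)) if abs(m - u) > 1e-6]
    return b_names, y_names

# Hypothetical peptide with an illustrative adduct mass on the lysine (index 2)
pattern = shifted_ions("AGKTLR", 2, 322.0566)
```

For this toy case the adduct appears on b3 to b5 and on y4 and y5. When the ions that would separate two neighboring residues are missing from a spectrum, the cross-link can only be narrowed down to a stretch of residues, as described above.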
Comparing our results with the crystal structure of the human nucleosome 8 , 8 of the 12 cross-links were in close vicinity to the DNA, with side chains of the respective amino acids pointing toward the DNA double helix (Fig. 3c). Yet, for four DNA-crosslinked peptides (DCPs 9-12, Fig. 3a, c), the distance of the closest possible cross-linked amino acid to the DNA was between 16.5 and 22.1 Å and therefore too large to be explained by a direct protein-DNA contact. As nucleosomes are known to undergo structural changes due to transient unwrapping of DNA 29,30 , we hypothesized that the distant cross-links were derived from different conformational states that are not reflected in the crystal structure. In support of this notion, all cross-links that were unexpectedly far away from the DNA in the nucleosome structure were located on the α3-helix of H2A, which is particularly rearranged during partial unwrapping of DNA from the nucleosome 29,30 . We therefore conclude that fliX-MS is able to detect different conformational states of a protein-DNA complex in solution.
Fig. 3 a For cross-links that could be located on one or several amino acids, the location within the peptide is marked in red letters. The cross-linked nucleotide sequence derived from precursor mass differences (A: deoxyadenosine, C: deoxycytidine, G: deoxyguanosine, T: thymidine), charge state, mass-to-charge ratio (m/z), and mass error (Δm) are shown. The cross-linked base is marked in red letters. b Base distribution among cross-links. c Crystal structure of the human recombinant nucleosome (PDB ID: 2CV5 8 ), with cross-linked amino acids marked in red (close to DNA) and orange (distant to DNA). For cross-links with more than one potential cross-linked amino acid, the residue closest to the DNA is marked.
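The comparison of cross-linked residues with the nucleosome crystal structure amounts to a minimum atom-to-atom distance between a residue and the DNA. A minimal sketch is shown here; the coordinates are hypothetical, and the 10 Å cutoff is an illustrative assumption, since the text reports distances but does not state a threshold.

```python
import math

def min_distance(residue_atoms, dna_atoms) -> float:
    """Smallest Euclidean distance (Å) between any residue atom and any DNA atom."""
    return min(math.dist(a, b) for a in residue_atoms for b in dna_atoms)

def is_direct_contact(distance_angstrom: float, cutoff: float = 10.0) -> bool:
    """Hypothetical classification; the cutoff is an assumption, not a value from the study."""
    return distance_angstrom <= cutoff

# Hypothetical coordinates (Å), for illustration only
residue = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
dna = [(4.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
d = min_distance(residue, dna)
```

With real data, the atom coordinates would be read from the PDB entry; under any reasonable cutoff, distances of 16.5 to 22.1 Å fall outside direct contact, which is what motivates the conformational-flexibility interpretation.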
fliX-MS applied to the NF1-DNA complex. Next, we enriched peptide-DNA cross-links from the NF1-DNA complex following femtosecond laser irradiation. Subjecting the cross-linked peptides to high-resolution MS, we identified five unique peptides shifted by a mass corresponding to mono-, di-, or trinucleotides in the precursor ions (Fig. 4a). All cross-linked peptides were part of the DBD of the porcine nuclear factor 1/C (amino acids , demonstrating the structural specificity of fliX-MS (Fig. 4b).
In addition, all cross-links were located on peptides between amino acids 83 and 174, indicating a specific binding region in this part of the protein. NF1 and especially its CTF/NF1-DBD are highly conserved across species (Supplementary Fig. 2) [...] sequence-specific DNA binding, while amino acids 1-78 had only nonspecific DNA-binding affinity 26 . Notably, all our cross-linked peptides were located in the region responsible for sequence-specific DNA binding, highlighting the capability of fliX-MS to detect specific protein-DNA contacts (Fig. 4b, c). For all cross-linked peptides, we defined the nucleotides that were cross-linked to the peptides making use of characteristic differences in the precursor mass. In addition, specific product ion mass shifts in the MS/MS spectra allowed us to define the exact bases that formed the cross-links (Fig. 4a, d-g). In addition to three cytosine cross-links and one thymine cross-link, one cross-link occurred to guanine, once more underscoring the potential of fliX-MS to cross-link purine bases.
The DNA contact sites of NF1 are known from DNA modification studies 31,32 . To a large extent, DNA binding is mediated by contact to the TTGG motif in the forward strand, as well as additional nucleotides in the reverse strand, which point in the same direction of the double helix (Fig. 4c). Our cross-link data covered interactions of the TTGG motif with two unique peptides (DCPs 15 and 17). In addition, we identified three cytosine cross-links, two of which were specific for the reverse strand (DCPs 13 and 16). While cytosine interactions have not been investigated previously, our data strongly suggest binding to the cytosines opposite of the TTGG sequence. Taken together, all identified cross-links fit to the defined NF1 consensus motif TTGGC(N)6CC 32 .
In four out of the five DCPs, mass shifts in the MS2 spectra allowed us to locate the interactions to one, two, or three amino acids. For instance, the peptide RIDCLR, cross-linked to a thymidine dinucleotide (DCP15), revealed a specific marker ion with the mass of an arginine immonium ion shifted by the mass of thymidine (Fig. 4e). As the presence of a DNA cross-link on the C-terminal arginine is unlikely due to steric interference during trypsin digest, we allocated the cross-link to R117. This residue is in close vicinity to L121/R122, which in a previous mutation study conferred DNA-binding activity of NF1 33 . Along the same lines, the seven amino acid long DCP13, which did not reveal a specific cross-linked amino acid (Fig. 4d), overlaps with the C88/R89 mutation site, which also significantly reduced DNA-binding affinity in the previous study.
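The reasoning used to assign DCP15 to R117 can be written as a small filter: list all residues of the type indicated by the shifted immonium ion, then drop the C-terminal residue, because a cross-link there would be expected to hinder tryptic cleavage. The function name is hypothetical; the peptide sequence and numbering are those given in the text.

```python
def candidate_sites(peptide: str, start: int, residue: str,
                    exclude_c_terminus: bool = True) -> list:
    """Positions of `residue` in `peptide`, numbered from `start` in the protein."""
    last = len(peptide) - 1
    return [
        f"{aa}{start + i}"
        for i, aa in enumerate(peptide)
        if aa == residue and not (exclude_c_terminus and i == last)
    ]

# RIDCLR spans residues 117-122; the marker ion points to an arginine
all_arginines = candidate_sites("RIDCLR", 117, "R", exclude_c_terminus=False)
assigned = candidate_sites("RIDCLR", 117, "R")
```

Without the C-terminal exclusion both R117 and R122 remain possible; applying it leaves R117 as the only candidate.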
Analysis of fragment spectra of the other cross-linked NF1 peptides provided additional technical characterization of fliX-MS. Both C104 and C163 were trioxidized to cysteic acid, likely as a result of sample preparation under nonreducing conditions [34][35][36] (Fig. 4f-g). In the MS2 fragmentation, the trioxidized cysteine underwent neutral loss of sulfurous acid H2SO3 (Fig. 4f-g), as has been reported previously 37 . Yet, in case of 101 APGCVLSNPDQK 112 , we also observed an alternative neutral loss of 34.005 Da, which corresponds to the molecular weight of hydrogen peroxide H2O2 (Fig. 4g). Moreover, we observed multiple fragments with neutral losses of ammonia on the guanine (Fig. 4f) and cytosine base (Fig. 4g). Such neutral losses have been reported previously for the measurement of free guanine, cytosine, and adenine by MS [38][39][40] . Including neutral loss of ammonia in the search for MS2 fragment ions that are characteristic for these base adducts strongly enhanced the capability of localizing DNA modifications on individual amino acids. In case of DCP14, the loss of the mononucleotide indicates a cross-link between the amino group of cytosine and the aspartate side chain, which dissociated during higher-energy collisional dissociation (HCD) fragmentation.
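Assigning an observed mass difference to one of the neutral losses discussed above is a lookup against monoisotopic masses computed from elemental formulas. The sketch below covers the losses named in the Fig. 4 legend; the function name and the 0.005 Da tolerance are illustrative assumptions.

```python
# Monoisotopic element masses (Da)
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740,
        "O": 15.9949146, "P": 30.9737620, "S": 31.9720712}

# Neutral losses named in the Fig. 4 legend, as elemental compositions
NEUTRAL_LOSSES = {
    "H2SO3": {"H": 2, "S": 1, "O": 3},
    "H2O2": {"H": 2, "O": 2},
    "NH3": {"N": 1, "H": 3},
    "CO": {"C": 1, "O": 1},
    "H2O": {"H": 2, "O": 1},
    "HPO3": {"H": 1, "P": 1, "O": 3},
    "H3PO4": {"H": 3, "P": 1, "O": 4},
}

def formula_mass(formula: dict) -> float:
    return sum(MONO[element] * count for element, count in formula.items())

def match_neutral_loss(delta_da: float, tol_da: float = 0.005) -> list:
    """Names of neutral losses whose mass lies within tol_da of the observed difference."""
    return [name for name, formula in NEUTRAL_LOSSES.items()
            if abs(formula_mass(formula) - delta_da) <= tol_da]

peroxide_match = match_neutral_loss(34.005)   # loss reported for APGCVLSNPDQK
co_match = match_neutral_loss(27.995)         # mass of carbon monoxide
```

The observed 34.005 Da difference matches only hydrogen peroxide in this table, consistent with the assignment in the text.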
Cross-linking of the TATA-box binding TF TBP. We next applied the fliX-MS workflow to human TBP bound to the adenovirus major late promoter containing a TATA box. MS analysis of the cross-linked protein identified four cross-links on three unique peptides (Fig. 5a). As in the case of NF1, all of the TBP peptides with DNA modifications were exclusively located on the DBD of TBP (Fig. 5b).
The precursor of the peptide 255 IQNMVGSCDVK 265 was shifted by the mass of a TT-HPO3 dinucleotide. Detailed analysis of the MS2 spectrum narrowed down the cross-link to either N257 or M258 (Supplementary Data 1). In the crystal structure of TBP bound to the Adml promoter 41,42 , N257 is in close contact to the DNA and located between the two thymines and the two adenines of the complementary strand (Fig. 5c). The distance to either of the thymines is very short, at 6.1 and 6.3 Å, respectively; thus, either thymine could plausibly be cross-linked to the contacting asparagine.
In addition, we observed an adenine cross-link to one of the amino acids G217-V220 (Fig. 5d). Based on information from the crystal structure, V220 has been mapped to interact with an adenine in the TATA box 9,42 , given an extremely short distance of 3.5 Å (Fig. 5c). Hence, this cross-link also fits the published structure with high probability. Notably, the same peptide, which contains the V220-A modification, has a second cross-link to a cytosine on A211, which in the crystal structure is located on the fourth strand of the beta sheet (Fig. 5c, d). The closest cytosine is the first nucleotide downstream of the TATAAAA sequence, on the opposite strand, at a distance of 13.4 Å. The coexistence of both cross-links on the same peptide indicates that A211 confers additional DNA binding of TBP, reaching toward a nucleotide adjacent to the TATA box.
The third TBP DCP ( 178 LDLKTIALR 186 ) reflected a cross-link of a cytosine to L178 (Supplementary Fig. 3a). This leucine is located between the four adenine bases and the following guanine stretch downstream of the TATA box. The closest cytosine is the same nucleotide that we found cross-linked to A211. However, compared with the other TBP cross-links, the distance to the cytosine in the crystal structure is comparably large (17.3 Å, Supplementary Fig. 3b). One explanation for this discrepancy could be a higher flexibility of the TBP-DNA complex in solution, compared with the "frozen" picture of the crystal structure.
Fig. 4 Mapping protein-DNA interactions in the transcription factor nuclear factor 1/C. a Overview of the identified peptide-nucleotide cross-links. The possible cross-link locations are indicated by red letters in the peptide sequence. Cross-linked (XL)-nucleotide and XL-base information derived from specific MS1 and MS2 mass shifts are specified. b Location of the annotated DNA-binding domain of nuclear factor 1/C (NF1) and location of the detected cross-links (red stars). A represents the unspecific DNA-binding subdomain and B the sequence-specific DNA-binding subdomain according to Dekker et al. 26 . c Location of the cross-links (red stars) on the palindromic consensus DNA-binding sequence of NF1. Blue letters indicate nucleotides that fit the NF1 consensus sequence TTGGC(N)6CC 32 . d-g MS2 ion series and spectra of four NF1-DNA cross-links. In the MS2 spectra, nucleotides are annotated in red, amino acids in regular letters. N′ denotes the nucleobase, and N the deoxynucleotide monophosphate (with N being one of the four bases A/T/G/C). The following abbreviations describe neutral losses after MS2 fragmentation: Asterisk: neutral loss of H2SO3, −CO: neutral loss of carbon monoxide, −A: neutral loss of ammonia, −/+W: neutral loss or adduct of water, −HP: neutral loss of hydrogen peroxide, −p: neutral loss of HPO3, −P: neutral loss of H3PO4. In the MS2 ion series, cross-linked fragments are depicted with the cross-linked nucleotide (A/T/G or C) in superscript. M Ox represents oxidated methionine and C Triox trioxidated cysteine (cysteic acid). The prefix IM before the respective amino acid indicates an immonium ion. The superscripted NL represents the neutral loss of sulfurous acid or hydrogen peroxide. All other symbols represent the same neutral losses as in the MS2 spectra.
NATURE COMMUNICATIONS | (2020) 11:3019 | https://doi.org/10.1038/s41467-020-16837-x | www.nature.com/naturecommunications
An interesting observation in the MS2 spectrum of the 178 LDLKTIALR 186 peptide is that its fragment ions y6, y7, y8, and y9 are exclusively observed with a mass shift of +27.995 Da, corresponding to the addition of carbon monoxide (CO) (Supplementary Fig. 3a). Searching for the source of this adduct, we analyzed all peaks in the lower m/z range and identified a prominent peak at m/z = 89.06 that equaled deoxyribose after loss of CO. Together with a strong marker ion of [deoxycytidine −CO], this provides evidence that the CO adduct is derived from the deoxyribose part of the deoxycytidine, which is additionally cross-linked to the central lysine of the peptide and cut off during HCD fragmentation (Supplementary Fig. 3c). Therefore, we hypothesize that both L178 and K181 were crosslinked to deoxycytidine at the same time and to different parts of the nucleotide.
Ex vivo fliX-MS in mouse embryonic stem cells (ESCs). Having established that fliX-MS is highly specific for cross-linking protein-DNA interactions in in vitro assembled protein-DNA complexes, we next asked whether the method could be also applied to cells. To investigate this, we resuspended mouse ESCs (mESCs) in phosphate-buffered saline (PBS) and subjected them to femtosecond UV-laser radiation. We isolated chromatin from the cross-linked cells, following a DNA biotinylation protocol 43 , and enriched peptide-DNA cross-links as in the standard fliX-MS workflow (Fig. 6a). Comparison with a nonirradiated control allowed the identification of specific peptide-DNA cross-links.
Analyzing the data with the RNP(xl) software identified several high-confidence cross-links on TFs. Among those, we manually annotated and validated six bona fide cross-links (Fig. 6b, d, e, Supplementary Fig. 4c-e). All cross-links were exclusively present on the DBDs, which once more highlights the specificity of fliX-MS. In addition, fliX-MS was capable of covering different types of protein-DNA interactions, as cross-linked DBDs represented four different classes, including homeo-prospero, bHLH, ZNF, and SANT/Myb domains.
Fig. 5 Cross-linking of the TATA-box binding transcription factor TBP. a Overview of the four identified cross-links on three unique peptides. Blue background indicates information obtained from cross-linking experiments and gray background information obtained from the crystal structure (PDB ID: 1C9B 41 ). Red letters indicate possible cross-linked amino acids or cross-linked nucleotides, respectively. b Schematic view of the domain structure of TBP, the TATA box (gray shading), and surrounding nucleotides, as well as the cross-links. c Crystal structure of TBP bound to an extended Adml promoter (PDB ID: 1C9B 41 ). Location of the cross-link of N257 to one of the two thymidines (green) and of the cross-links of amino acid V220 to deoxyadenosine (orange) and A211 to deoxycytidine (light green). Cross-linked amino acids are depicted in red, the peptide 255 IQNMVGSCDVK 265 in light red, and the peptide 209 TTALIFSSGKMVCTGAK 225 in blue. Dashed lines represent the distance of amino acid to nucleotide and distance is shown above the line. d MS2 ion series and spectra of the cross-linked TTALIFSSGKMVCTGAK peptide. Abbreviations in the MS2 spectra and MS2 ion series: caret: neutral loss of CH4SO, Tox: trioxidated cysteine (cysteic acid). Other abbreviations as in Fig. 4.
The Prox1/2 peptide 584 HLKKAK 589 was cross-linked to a deoxycytidine monophosphate via K587 and is part of the DNA-binding homeo-prospero domain (Fig. 6b, c). Since this domain is fairly large, we wondered whether the interaction would agree with known structural data. Locating the 584 HLKKAK 589 cross-link in the crystal structure of the highly conserved prospero protein in D. melanogaster (Supplementary Fig. 4a), we observed that the Drosophila counterpart peptide ( 1552 HLRKAK 1557 ) is in an alpha helix in close vicinity to the DNA, where K1557 points toward the deoxycytidine with a distance of 7.8 Å (Fig. 6c). This demonstrates that the ex vivo generated cross-link specifically reflected a TF-DNA binding event.
The peptide KPLLEK was cross-linked to a dithymidine and could be mapped to several different TFs, namely Oct1, Oct2, Oct11, and Hes2, as well as to the mitotic spindle assembly checkpoint protein Mad2l2 (Fig. 6d). Analyzing the proteome of the same murine ES cell line to a depth of >9700 proteins ( Supplementary Fig. 4b) revealed exclusive expression of Oct1 and Oct11 in this dataset, suggesting that the cross-linked peptide is derived from one of the two proteins. In both cases the peptide forms part of the conserved Pou-specific DBD, again underlining the feasibility of fliX-MS to identify functional ex vivo protein-DNA contacts.
The high-confidence cross-linked peptides of Znf541, Smarca1, Zfp91, and Znf354c supported this further (Fig. 6e, Supplementary Fig. 4c-e). As for the other two ex vivo cross-links, our data defined both the exact cross-linked nucleotide, as well as the amino acid position with a precision of maximum two adjacent amino acids. Of technical note, the spectrum of Znf541 contained a rare C1 ion, which can be formed during HCD fragmentation of peptides with an asparagine or glutamine in second position 44 .

Discussion
Although the interaction of TFs with DNA is a hallmark of gene transcription, it has remained an understudied area of biology due to several technical limitations: (i) Current methodologies such as chromatin immunoprecipitation followed by next-generation sequencing (ChIP-Seq) or proteomics (ChIP-MS) cannot differentiate between direct DNA binding and co-recruitment via other DNA-binding proteins. (ii) Direct TF-DNA binding assays depend on the availability of recombinant proteins and do not necessarily reflect DNA binding in living cells. (iii) Cocrystallization or NMR of protein-DNA complexes is highly laborious and not even possible for many TFs. Hence, a tool to directly assign protein-DNA interactions with amino acid and nucleotide resolution would have a strong impact on biological research.
High-intensity femtosecond lasers have a plethora of applications, ranging from ultrafine material processing 45 and high-precision medical surgery 46 to the detection of biomolecular processes 47 . In the search for effective methods to cross-link proteins and DNA, we and others have previously shown that femtosecond lasers are promising for this purpose because they provide high cross-linking yields while minimizing DNA damage 20,24,48,49 . Building on recent advances in XL-MS sample preparation, MS instrumentation, and bioinformatics 17,50 , we here combined this highly effective cross-linking strategy with an optimized purification protocol for cross-linked peptides and an MS-based readout of protein-DNA cross-links. Our method can map protein-DNA interactions both in vitro and in cells, making it a powerful tool for many different research topics.
As a proof of principle, we applied our femtosecond laser-induced cross-linking followed by high-resolution MS (fliX-MS) pipeline to in vitro assembled nucleosomes, as well as to recombinant TFs. Notably, we were able to detect cross-links to all four DNA bases. For recombinant TFs, all cross-links mapped exclusively to annotated DBDs, providing confidence for future applications of fliX-MS for the de novo identification of protein-DNA interactions. Although UV cross-linking produced protein-protein cross-links in addition to DNA-protein cross-links, the observed DNA-protein cross-links strongly depended on a specific DNA-consensus site, suggesting that femtosecond UV-laser irradiation does not interfere with the protein conformation.
One technical limitation of the current fliX-MS workflow is its dependency on enzymatic protein digestion for MS analysis. In the case of the nucleosome, many of the annotated amino acid-DNA contacts lie in regions that are enriched in lysine and arginine residues, and the resulting peptides are often too short to be measurable by LC-MS/MS. For instance, histone H2A has seven annotated DNA-binding sites (R30, R33, R36, K37, R43, K75, and R78, Interpro: P04908) in regions where tryptic digestion would produce peptides less than seven amino acids in length, which are difficult to observe by MS analysis. This limitation could be overcome by the use of enzymes with different specificity such as Arg-C or chymotrypsin, or by chemically modifying all lysine residues in the protein complex, which is commonly applied for the analysis of histone posttranslational modifications by MS 51 .
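The peptide-length constraint can be illustrated with a minimal in-silico digest. The sketch below is not part of the fliX-MS workflow: it assumes a simplified cleavage rule (cut C-terminal to the given residues unless followed by proline, no missed cleavages), uses an illustrative lysine/arginine-rich toy sequence rather than the annotated H2A sequence, and takes only the seven-residue cutoff from the text. The function and variable names are hypothetical.

```python
MIN_OBSERVABLE_LENGTH = 7  # cutoff taken from the text; shorter peptides are hard to observe


def digest(sequence, cleave_after="KR"):
    """Simplified in-silico digest: cut C-terminal to any residue in
    `cleave_after` unless the next residue is proline (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in cleave_after and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal remainder
    return peptides


def split_by_length(peptides, min_length=MIN_OBSERVABLE_LENGTH):
    """Separate peptides into (observable, too_short) by length."""
    observable = [p for p in peptides if len(p) >= min_length]
    too_short = [p for p in peptides if len(p) < min_length]
    return observable, too_short


# Illustrative K/R-rich toy sequence (assumption, not the annotated H2A entry).
TOY_SEQUENCE = "GRAKAKTRSSRAGLQFPVGRVHR"

trypsin_like = digest(TOY_SEQUENCE, cleave_after="KR")
arg_c_like = digest(TOY_SEQUENCE, cleave_after="R")

print(split_by_length(trypsin_like))
print(split_by_length(arg_c_like))
```

In this toy case most trypsin-like fragments fall below the cutoff, and restricting cleavage to arginine yields fewer, longer fragments, which is the rationale for switching enzymes or blocking lysines.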
Apart from localizing protein regions, our method revealed detailed structural information on DNA-protein interactions, especially where no crystal structure was available. Although nuclear factor 1/C (NF1) was one of the first DNA-binding proteins to be studied 52 , mechanistic information on its DNA interaction has been limited to mutation 33 and truncation 26 studies, as well as DNA-binding analyses in combination with modified bases 31 . Notably, our fliX-MS data on the NF1-DNA complex were in close agreement with the previous biochemical data. All cross-links were in the subregion of the CTF/NF1 binding domain that was reported to confer sequence-specific DNA-binding activity 26 , while no cross-link was found in the remaining part of the CTF/NF1 binding domain that mediates only unspecific DNA binding. Furthermore, two cross-linked amino acids were in close vicinity to mutation sites that had been shown to reduce or eliminate DNA binding 33 . Taking advantage of the sequence information provided by the cross-linked di- and trinucleotides, we explicitly localized the cross-links on the NF1 consensus sequence in four out of five cases, confirming the interaction with both DNA strands originally proposed by early NF1-DNA contact site analyses 31 . In addition, we revealed interactions of NF1 with the cytosines on the TTGGA reverse strand, which have not been observed before. Given the detailed information on binding contacts from our experiments, molecular modeling of the NF1-DNA complex might now be feasible. In fact, the CTF/NF1 domain shares structural homology with the structurally resolved SMAD DBD 53 . With the additional information gained by fliX-MS, we envision that the structure of the NF1-DBD in complex with DNA can finally be resolved.
Comparing our data on recombinant nucleosomes and TBP bound to its target DNA with the respective crystal structures showed that the peptide-DNA cross-links were largely in agreement with the intramolecular distances in the electron density maps. However, three cross-links of the nucleosome and two of TBP revealed distances between amino acid side chains and nucleotides that were too large (>16 Å) to support a direct contact according to the crystal structure. The most likely explanation is that our method is capable of detecting different conformational states of protein-DNA complexes in solution, while crystal structures reflect only a single discrete structural conformation. In support of this, cryo-EM studies on nucleosomes 29,30 revealed a large degree of structural dynamics, based on partial unwrapping of the DNA, also known as DNA breathing. Notably, all distant cross-links in the nucleosome lie on an H2A helix, which was described to be especially susceptible to conformational rearrangements in the nucleosome 29,30 . In the case of TBP, the two distant cross-links both pointed to the same nucleotide, namely the first cytosine downstream of the TATA box on the reverse strand. TBP binding to the DNA requires significant DNA deformation, including opening of the minor groove and a reduction of the helical twist 9,54 . To generate the cross-links identified here, the DNA must be able to adopt a much more strongly deformed conformation than the crystal structure would suggest. Taken together, this demonstrates that fliX-MS can complement crystal structure data by providing evidence for structural flexibility of certain subregions.
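The comparison against the crystal structure amounts to a distance check per cross-link. The following is a minimal sketch, assuming atom coordinates (in Å) have already been extracted from a structure file; the coordinates and labels are invented for illustration, the function names are hypothetical, and only the 16 Å cutoff is taken from the text.

```python
import math

MAX_CONTACT_DISTANCE = 16.0  # Å; cutoff quoted in the text


def classify_crosslink(residue_xyz, nucleotide_xyz, cutoff=MAX_CONTACT_DISTANCE):
    """Return (label, distance) for one amino acid-nucleotide pair.

    Distances above the cutoff are labelled 'too distant', i.e. not
    explained by a direct contact in the static structure.
    """
    d = math.dist(residue_xyz, nucleotide_xyz)  # Euclidean distance in Å
    label = "compatible" if d <= cutoff else "too distant"
    return label, d


# Invented coordinates (assumption), purely to exercise the function.
example_pairs = {
    "pair_A": ((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)),
    "pair_B": ((0.0, 0.0, 0.0), (0.0, 0.0, 20.0)),
}

for name, (residue_xyz, nucleotide_xyz) in example_pairs.items():
    label, d = classify_crosslink(residue_xyz, nucleotide_xyz)
    print(f"{name}: {d:.1f} Å -> {label}")
```

Cross-links flagged as too distant are not necessarily false positives; as argued above, they may instead report on conformations that the crystal structure does not capture.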
Having established the potential of fliX-MS to accurately map DNA-binding contacts in vitro, we were encouraged to extend our cross-linking strategy to cells. Although both chromatin enrichment efficiency and MS sensitivity can still be optimized considerably, we were able to identify several bona fide examples of TF-DNA cross-links. Reassuringly, these cross-links were all located on DBDs, suggesting that fliX-MS can indeed identify specific protein-DNA interactions in cells. In addition, our method might also be applicable to DNA-pulldown experiments 55 , after laser irradiation of the eluted protein-DNA complexes. This would be especially useful for the analysis of selected TFs that cannot be expressed recombinantly.
In conclusion, we have developed a workflow to map protein-DNA contacts in both in vitro and cellular contexts. Given the scientific importance of such contacts, we believe that fliX-MS will have a major impact in many fields of biology and even clinical research. Current developments in both MS technology and data analysis may even allow the mapping of global DNA interactomes in the near future.
6xHis-tagged recombinant NF1 was cloned into a baculovirus vector, expressed in Sf9 cells, and purified by nickel column chromatography 59 .
Recombinant TBP was purchased from Active Motif (81114).

[...] carried out for 90 min at 37°C. Trypsin and Lys-C were added at a ratio of 1:40 (w/w) compared with the protein amount and incubated for 30 min at 37°C, followed by overnight incubation at 25°C. The next day, formic acid (FA) was added to 0.1% final concentration.
LC-MS/MS analysis. Online chromatography was performed with a Thermo EASY-nLC 1200 UHPLC system (Thermo Fisher Scientific, Bremen, Germany) coupled online to a Q Exactive HF-X mass spectrometer with a nanoelectrospray ion source (Thermo Fisher Scientific). Analytical columns (50 cm long, 75 μm inner diameter) were packed in-house with ReproSil-Pur C18 AQ 1.9 μm reversed phase resin (Dr. Maisch GmbH, Ammerbuch, Germany) in buffer A (0.1% FA). During online analysis the analytical columns were placed in a column heater (Sonation GmbH, Biberach, Germany) regulated to a temperature of 60°C. Peptide mixtures were loaded onto the analytical column in buffer A and separated with a linear gradient of 5-20% buffer B (80% ACN and 0.1% FA) for 50 min, and 20-30% buffer B for 10 min, at a flow rate of 250 nl min⁻¹. MS data were acquired with a Q Exactive HF-X instrument programmed with a data-dependent top 12 method in positive mode using Tune 2.9 and Xcalibur 4.1. The S-lens RF level was 40.0 and capillary temperature was 250°C. Full scans were acquired at 60,000 resolution with a maximum ion injection time of 20 ms and an AGC target value of 3E6. Selected precursor ions were isolated in a window of 2.0 m/z, fragmented by HCD with normalized collision energies of 30 (for in vitro complexes) or 35 (for samples derived from cell cross-linking), and measured at 15,000 resolution with a maximum injection time of 60 ms and an AGC target of 1E5 ions. Precursor ions with unassigned or single charge states were excluded from fragmentation selection, and repeated sequencing was minimized by a dynamic exclusion window of 20 s.

Manual curation of spectra proposed by RNP(xl) was performed as follows: (i) Precursor ions were evaluated for the correct assignment of the charge state and monoisotopic peak. (ii) The corresponding MS2 spectra were evaluated for >40% amino acid coverage combining a, b, and y ions. (iii) If the mass shift on the precursor ion reflected more than one nucleotide, nucleotides were required to be observed as marker ions in the low mass range. (iv) High-intensity fragment ions, which did not represent the unmodified peptide sequence, needed to be explainable by the DNA cross-link. RNP(xl) automatically annotates a, b, and y ions and immonium ions shifted by a nucleotide or nucleobase. In addition, all spectra were further analyzed for shifted and nonshifted internal ions using ProteinProspector (v 5.24.0). Supplementary Data 2 lists the identified a, b, and y ions and mass-shifted ions with additional information for all spectra.
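Criterion (ii) can be expressed as a simple coverage calculation. The text does not define how coverage is computed, so the sketch below makes an explicit assumption: coverage is the fraction of backbone cleavage sites supported by at least one a, b, or y ion. The ion indices in the example are invented, and the function names are hypothetical; this is not the RNP(xl) implementation.

```python
COVERAGE_THRESHOLD = 0.40  # from criterion (ii): coverage must exceed 40%


def covered_sites(peptide_length, a_ions=(), b_ions=(), y_ions=()):
    """Backbone cleavage sites (1 .. length-1) supported by any ion.

    Assumption: a_n and b_n report on the site after residue n;
    y_n reports on the site after residue (length - n).
    """
    sites = set()
    for n in list(a_ions) + list(b_ions):
        if 1 <= n < peptide_length:
            sites.add(n)
    for n in y_ions:
        if 1 <= n < peptide_length:
            sites.add(peptide_length - n)
    return sites


def coverage(peptide, a_ions=(), b_ions=(), y_ions=()):
    """Fraction of cleavage sites covered by the combined ion series."""
    length = len(peptide)
    if length < 2:
        return 0.0
    return len(covered_sites(length, a_ions, b_ions, y_ions)) / (length - 1)


def passes_coverage(peptide, a_ions=(), b_ions=(), y_ions=(),
                    threshold=COVERAGE_THRESHOLD):
    """True if coverage strictly exceeds the threshold."""
    return coverage(peptide, a_ions, b_ions, y_ions) > threshold


# Peptide sequence from the text; the observed ion indices are invented.
peptide = "IQNMVGSCDVK"
print(coverage(peptide, b_ions=[2, 3], y_ions=[1, 2, 3]))
print(passes_coverage(peptide, b_ions=[2], y_ions=[1, 2]))
```

A residue-based definition (counting residues flanked by observed fragments) would give slightly different numbers; the threshold logic stays the same.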
Analysis of crystal structures and validation of the cross-links in crystal structures were performed using PyMOL (the PyMOL Molecular Graphics System, version 1.2r3pre, Schrödinger, LLC).
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.