Chemical profiling of DNA G-quadruplex-interacting proteins in live cells

DNA–protein interactions regulate critical biological processes. Identifying proteins that bind to specific, functional genomic loci is essential to understand the underlying regulatory mechanisms on a molecular level. Here we describe a co-binding-mediated protein profiling (CMPP) strategy to investigate the interactome of DNA G-quadruplexes (G4s) in native chromatin. CMPP involves cell-permeable, functionalized G4-ligand probes that bind endogenous G4s and subsequently crosslink to co-binding G4-interacting proteins in situ. We first showed the robustness of CMPP by proximity labelling of a G4 binding protein in vitro. Employing this approach in live cells, we then identified hundreds of putative G4-interacting proteins from various functional classes. Next, we confirmed a high G4-binding affinity and selectivity for several newly discovered G4 interactors in vitro, and we validated direct G4 interactions for a functionally important candidate in cellular chromatin using an independent approach. Our studies provide a chemical strategy to map protein interactions of specific nucleic acid features in living cells.

Intricate networks of direct and coordinated interactions between proteins and nucleic acids are of vital importance in the regulation of numerous cellular processes, such as gene expression, DNA replication or DNA repair 1 . Robust methods that can interrogate these interaction networks in a native chromatin context are key to understanding the underlying molecular mechanisms 2,3 . Chromatin immunoprecipitation (ChIP) has been coupled with mass spectrometry (MS)-based proteomics analysis to characterize the composition of particular chromatin-associated protein complexes [4][5][6] . However, these approaches require high-affinity and high-selectivity antibodies and typically explore one protein of interest at a time. Alternatively, enzyme-catalysed proximity labelling approaches, such as BioID or APEX, target promiscuous labelling enzymes to specific proteins of a subcellular compartment by genetic fusion, by which they promote the covalent tagging of endogenous neighbouring proteins 3,7 . Despite several successful examples, applicability and spatial resolution can be hindered by relatively slow labelling kinetics, toxicity and the size of the engineered fusion proteins 8 .
In contrast, photoactivation of small-molecule crosslinkers allows for precise control of the reaction and shorter labelling times, which provides relatively low background binding and good spatial and temporal resolution 9 . In affinity-based protein profiling, small molecules are linked to photocrosslinkers that mediate the irreversible binding to cellular protein targets in situ, followed by characterization via quantitative proteomics 10,11 . However, such approaches have so far been used to map direct protein interactors of drugs or small-molecule fragments 12,13 rather than interaction networks. Thus, novel strategies that circumvent these limitations and provide a more holistic view of protein interactions at particular functional genomic sites are much needed.
DNA G-quadruplexes (G4s) are non-canonical, four-stranded nucleic acid structures that comprise stacked G-tetrads within certain G-rich sequences (Fig. 1a) 14,15 . DNA G4s have been shown to exist in human cells [16][17][18] , and their formation is dynamic in live cells 19 . G4 sequencing (G4-seq) identified more than 700,000 sites in human genomic DNA that have the biophysical potential to form G4s (potential G4s) 20 . G4 chromatin immunoprecipitation sequencing (G4 ChIP-seq) 21 found endogenous DNA G4s enriched in open chromatin regions and promoters of highly expressed cancer genes 22 , and these G4s were recently linked to underlying transcription factor programmes in breast cancer 23 . Notably, the formation of endogenous G4s is cell-type specific with only 1% (~10,000 sites) of the in vitro potential G4s 20 being detected in chromatin 21 . Taken together, these data suggest that G4 folding in chromatin is dynamic and that G4 homeostasis and functions may be intricately linked to interacting proteins 24 . A variety of proteins, such as helicases 25,26 , transcription factors [27][28][29] and epigenetic modulators 30 , have been shown to interact with DNA G4s in vitro. However, DNA G4 binding proteins have mostly been explored by affinity enrichment from lysed samples using synthetic G4 oligonucleotides as baits [31][32][33] . Such affinity purification experiments do not account for the native chromatin environment, which is intricately linked to G4 biology 22 .
Here, we report a co-binding-mediated protein profiling (CMPP) approach for the investigation of DNA G4-interacting proteins in living cells. In this strategy, functionalized small-molecule ligands are designed to bind G4 structures in cellular chromatin, which serve as docking sites to bring the probes into close proximity to the G4-interacting proteins and enable labelling by subsequent photocrosslinking (Fig. 1b). We first showed that this concept can be efficiently applied with minimal perturbation of G4-protein interactions by photoproximity crosslinking of a G4-binding antibody in vitro. We then employed this approach in human cells to identify hundreds of putative G4-interacting proteins that comprised diverse functional classes. Next, we characterized the G4 binding properties for a representative set of proteins in vitro and found strong and selective G4 binding interactions for several of the novel candidates. Lastly, we further investigated one of the candidates, the chromatin remodeller SMARCA4, and revealed its recruitment to endogenous promoter G4s in chromatin.
Global profiling of DNA G4-interacting proteins in cells. We next employed our approach to identify G4-interacting proteins in human cells. Embryonic kidney HEK293T cells were treated with probes 1 and 2, and control 3 (20 μM), followed by photocrosslinking at 365 nm. The nuclear extract was conjugated with TAMRA-azide via the copper-catalysed azide-alkyne cycloaddition reaction, separated by SDS-PAGE and visualized by in-gel fluorescence scanning (Fig. 3a) 13 . We observed distinct bands over a range of concentrations for both probes 1 and 2 ( Fig. 3b and Extended Data Fig. 2a,b), which confirmed specific protein labelling as well as a good cell permeability and nuclear uptake, although probe 1 displayed a slightly higher efficiency. In addition, the probes did not show cell toxicity under the treatment conditions employed (Extended Data Fig. 2c). Next, to identify the target proteins captured by G4-ligand probes, we employed a label-free, quantitative liquid chromatography (LC)-MS proteomics approach 4 . After photocrosslinking and extraction of the nuclear lysate, proteins were conjugated to biotin-azide and affinity purified on streptavidin beads, followed by on-bead digestion and quantitative LC-MS/MS analysis (Fig. 3a). Proteins that were detected in at least two out of four biological replicates and appeared significantly enriched over the non-specific probe 3 (fold change (FC) >2, false discovery rate (FDR) <0.05) were considered as candidate G4-interacting proteins. In total, we obtained 248 and 209 enriched protein targets for 1 and 2, respectively, from diverse functional classes (Fig. 3c,d). Interestingly, probe 2 shares ~96% (201 out of 209) of candidates with 1 (Fig. 3e), which suggests the linker length was not critical, in line with our observations for single protein BG4 labelling in vitro. 
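The selection rule described above (detection in at least two of four biological replicates, FC >2 and FDR <0.05 relative to control probe 3) can be expressed as a simple filter. The following is a minimal sketch in Python and not the authors' pipeline; the record type, field names and example values are hypothetical, and the fold change is assumed to be on a linear (not log2) scale.

```python
from dataclasses import dataclass


@dataclass
class ProteinResult:
    # All fields are illustrative; they do not mirror any real output format.
    name: str                  # hypothetical protein identifier
    replicates_detected: int   # biological replicates with a detection (0 to 4)
    fold_change: float         # probe over control probe 3, linear scale (assumed)
    fdr: float                 # false discovery rate


def is_candidate(result, min_replicates=2, min_fc=2.0, max_fdr=0.05):
    """Apply the three thresholds stated in the text to one protein."""
    return (
        result.replicates_detected >= min_replicates
        and result.fold_change > min_fc
        and result.fdr < max_fdr
    )


# Made-up example records, one passing and three failing a single criterion each.
results = [
    ProteinResult("protein_A", 4, 3.1, 0.01),   # passes all three thresholds
    ProteinResult("protein_B", 1, 5.0, 0.001),  # detected in too few replicates
    ProteinResult("protein_C", 3, 1.8, 0.02),   # fold change too low
    ProteinResult("protein_D", 4, 2.5, 0.20),   # FDR too high
]

candidates = [r.name for r in results if is_candidate(r)]
print(candidates)
```

Keeping the thresholds as keyword arguments makes it straightforward to see how the candidate list changes under stricter or more lenient cut-offs.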
Some of the candidate G4-interacting proteins overlapped with previously reported G4-interacting proteins 41 for both probes 1 (19/79, 24%) and 2 (11/79, 14%). This overlap provides independent corroboration for some of the findings, whereas the remaining proteins represent new candidates identified by our method.
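The overlap figures quoted in this section follow from plain set arithmetic. The sketch below, in Python, recomputes the reported percentages from the counts given in the text and shows how such an overlap would be derived from two lists of protein names; the helper names and the toy protein identifiers are hypothetical.

```python
def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to an integer."""
    return round(100.0 * part / whole)


def shared_fraction(candidates, reference):
    """Return (number shared, percentage of the reference list recovered)."""
    shared = set(candidates) & set(reference)
    return len(shared), percent(len(shared), len(set(reference)))


# Counts taken from the text.
probe2_shared_with_probe1 = percent(201, 209)  # probe 2 candidates also found with probe 1
probe1_known = percent(19, 79)                 # known G4 interactors recovered by probe 1
probe2_known = percent(11, 79)                 # known G4 interactors recovered by probe 2

# Toy illustration with made-up identifiers.
toy_candidates = ["P1", "P2", "P3", "P4"]
toy_reference = ["P2", "P4", "P9", "P10"]
toy_shared, toy_percent = shared_fraction(toy_candidates, toy_reference)

print(probe2_shared_with_probe1, probe1_known, probe2_known, toy_shared, toy_percent)
```

Note that the denominator matters: the 24% and 14% figures are expressed relative to the 79 proteins in the reference list, not relative to the number of candidates for each probe.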
Analysis of the annotated biological processes (Methods) revealed that the identified candidates are implicated in various nuclear processes (Fig. 3f). In particular, we observed a large number of proteins involved in transcription, which is consistent with the emerging role of DNA G4s in transcriptional regulation 24 . Among the enriched proteins from diverse functional classes (Fig. 3g), we identified 19 previously reported G4 interactors, such as hnRNP A1 42 and nucleolin 32 . Importantly, we identified numerous novel candidate G4 interactors, such as the master epigenetic regulator UHRF1, transcription termination factor TTF2, ATP-dependent RNA helicases (for example, DDX1 and DDX24) and pre-mRNA-splicing factor RBM22, that have been shown to have a direct association with chromatin 43 . Interestingly, we also identified several subunits of the chromatin remodelling complex SWI/SNF (SWItch/sucrose non-fermentable), such as SMARCA4 and SMARCC1, which have only recently been linked to DNA G4s 31,44 .

Characterization of candidate proteins in vitro.
Candidate G4-interacting proteins identified by co-binding-mediated proximity labelling could bind to G4s directly, bind as part of a protein complex that is bound to G4s or simply reside in close proximity to G4s. To better characterize the binding properties of a selection of candidate proteins, we employed a panel of 3′-biotinylated, well-characterized G4 oligonucleotides that can form different types of G4 structures, which include parallel (Myc, Kit1 and Kit2), antiparallel (TBA) and hybrid (BCL2) G4s (Supplementary Table 3). The corresponding mutated single-stranded sequences that cannot fold into G4s and dsDNA were used as controls (Extended Data Fig. 3). The oligonucleotides were immobilized on streptavidin beads and used to affinity-enrich target proteins from HEK293T nuclear lysates, followed by western blot analysis. We investigated a selection of candidates identified by CMPP (SMARCA4, UHRF1, RBM22, TTF2, DDX24, DDX1 and HMGB2) that represent a variety of different functional protein classes (Fig. 3c,d). Strikingly, six out of seven candidates showed G4-specific binding compared with that of the corresponding controls (Fig. 4a and Supplementary Table 4). One protein, HMGB2, displayed single-stranded DNA and dsDNA binding, but no G4 binding (Extended Data Fig. 4a-c), which indicates that HMGB2 may bind to the dsDNA adjacent to G4s or to the single-stranded opposite strand. Intriguingly, the other six G4 binding proteins all displayed selectivity for different G4 topologies. Whereas SMARCA4, TTF2 and DDX24 each showed a preference for a particular G4 sequence, RBM22, UHRF1 and DDX1 bound equally strongly to all parallel G4s (Myc, Kit1 and Kit2) and well to the hybrid-type G4 (BCL2) (Fig. 4a). Importantly, our findings for DDX1 are in line with its reported G4 binding affinity, which validates the approach 45 . Notably, RBM22 showed a particularly high enrichment of relative intensity for G4s (Myc, Kit1, Kit2 and BCL2) compared with that of the 10% lysate control (Fig. 4a and Supplementary Table 5).
In principle, these affinity-enrichment experiments cannot distinguish direct G4 binders from proteins that are co-precipitated. Therefore, we carried out enzyme-linked immunosorbent assays (ELISAs) to assess the binding affinities for a selection of purified recombinant proteins (SMARCA4, UHRF1, DDX1, DDX24 and RBM22) (Supplementary Table 6). All five candidates displayed selective and high-affinity binding to G4s. SMARCA4 bound G4 Kit1 with Kd = 40.6 ± 5.1 nM (Fig. 4b). UHRF1 showed tight binding to G4 Kit1 with Kd = 1.2 ± 0.2 nM, which is more than 7-fold lower than that of its known substrate hemi-methylated dsDNA (Kd = 8.5 ± 1.1 nM) and about 18-fold lower than that of its unmethylated duplex control (Kd = 21.2 ± 3.5 nM) (Fig. 4c). Similarly, DDX1 and DDX24 showed a low nanomolar affinity to G4 Myc (Kd = 5.1 ± 1.1 nM) and Kit1 (Kd = 58.2 ± 14.1 nM), respectively (Fig. 4d,e). RBM22 selectively bound to both DNA and RNA G4s and a preference for RNA NRAS G4 (Kd = 52.1 ± 11.3 nM) was observed (Fig. 4f and Extended Data Fig. 4d). Consistent with the affinity-enrichment experiments, considerably weaker or negligible binding was observed towards the control oligomers.
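To give a feel for what these dissociation constants imply, the sketch below evaluates a simple one-site binding model, in which the fraction of oligonucleotide bound is [P] / (Kd + [P]). This is an illustrative assumption: the text does not state which model was used to fit the ELISA curves, and the 5 nM protein concentration is a hypothetical value chosen only for the comparison. The Kd values are those reported for UHRF1.

```python
def fraction_bound(protein_nM, kd_nM):
    """One-site binding isotherm (assumed model): [P] / (Kd + [P]).

    Assumes the free protein concentration equals the total concentration,
    that is, no depletion of protein by binding.
    """
    return protein_nM / (kd_nM + protein_nM)


# Reported UHRF1 dissociation constants (nM), taken from the text.
uhrf1_kd_nM = {
    "G4 Kit1": 1.2,
    "hemi-methylated dsDNA": 8.5,
    "unmethylated dsDNA": 21.2,
}

protein_nM = 5.0  # hypothetical protein concentration for illustration

occupancy = {
    target: fraction_bound(protein_nM, kd) for target, kd in uhrf1_kd_nM.items()
}

for target, value in occupancy.items():
    print(f"{target}: {value:.2f} of sites bound at {protein_nM} nM protein")
```

Under this model the Kd is the protein concentration at which half of the sites are occupied, so a lower Kd translates directly into higher occupancy at any given protein concentration.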
The affinity enrichment coupled with western blot analysis and ELISA experiments confirmed that our novel CMPP approach identifies genuine G4-interacting proteins in cells.

SMARCA4 binds at endogenous G4 in chromatin.
Chromatin architecture is tightly linked to the presence of endogenous DNA G4s 22 and may affect the binding of protein interactors. To further validate G4 binding interactions in a chromatin context, we focused on the candidate interactor SMARCA4, which is a part of the SWI/SNF chromatin remodelling complex that plays a key role in transcriptional regulation 46 . Given that endogenous G4s have recently been mapped to open chromatin regions and promoters of highly expressed genes 22 , SMARCA4 may be linked to G4 function. We focused on human K562 chronic myelogenous leukaemia cells in which we previously mapped endogenous G4s via G4 ChIP-seq 21,30 . In this cell line, we performed SMARCA4 ChIP-seq and identified 28,265 SMARCA4 high-confidence binding sites from three biological replicates (Extended Data Fig. 5a). Strikingly, we observed that the majority of endogenous G4s (7,565 of 8,995, 84%) overlapped with SMARCA4 binding sites (Fig. 5a,b). Moreover, the SMARCA4 ChIP-seq signal was highly enriched and centred on endogenous G4 sites, which supports a direct SMARCA4-G4 binding interaction in chromatin (Fig. 5c). In contrast, no particular signal enrichment was observed at control sites that have the biophysical potential to form G4s in single-stranded human DNA (potential G4s) 20,47 , but do not actually form folded G4 structures in chromatin for this cell line (Fig. 5c). Thus, the data show that SMARCA4 binds to folded G4 secondary structures in chromatin, but not to the underlying G-rich dsDNA primary sequence.
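The overlap between endogenous G4 sites and SMARCA4 binding sites is, at its core, an interval-intersection count. The sketch below illustrates the idea in Python on toy coordinates; it is not the authors' analysis code, the tool they used for this step is not stated in this section, and the interval representation (chromosome, start, end with half-open coordinates) and function names are assumptions made for illustration.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def count_overlapping(query_sites, reference_sites):
    """Count query sites that overlap any reference site.

    Quadratic in the number of sites; adequate for a toy example only.
    """
    hits = sum(
        1 for q in query_sites if any(overlaps(q, r) for r in reference_sites)
    )
    return hits, hits / len(query_sites)


# Toy coordinates (made up).
g4_sites = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
smarca4_sites = [("chr1", 150, 400), ("chr2", 190, 300), ("chr3", 100, 200)]

hits, fraction = count_overlapping(g4_sites, smarca4_sites)
print(hits, round(fraction, 2))

# The reported figure follows from the counts in the text.
reported_percent = round(100 * 7565 / 8995)
print(reported_percent)
```

For genome-scale peak sets, a sorted sweep or an interval tree would replace the nested loop, but the overlap criterion stays the same.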
Investigating SMARCA4 binding sites at different functional genomic regions, we observed the largest proportion of SMARCA4-G4 co-localization at promoters (42% of peaks), which suggests that these interactions may play a particular role in SMARCA4 promoter activity (Fig. 5d) 48 . In addition, although most SMARCA4 binding sites contained A/T-rich motifs (Extended Data Fig. 5b), a dominant G-rich motif was found in binding sites marked by endogenous G4s, which supports a direct binding to G4 structures and indicates an important alternative mode of recruitment to chromatin.

Discussion
Here we present a chemical CMPP approach to identify the cellular interactome of DNA G4 structures in native chromatin. The method employs functionalized, structure-specific small-molecule ligands that bind to G4s and mediate proximity labelling of endogenous G4 binding proteins via photoactivatable diazirine groups. Compared with proteomic approaches carried out in vitro, the in situ capture in cells takes into account the local chromatin environment in a functioning cell and should also facilitate the detection of transient G4-protein interactions that are lost during cell lysis or washing steps 7 .
Using the approach, we identified several hundred G4-associated proteins, of which some were known G4 binders and many were not previously described. Several new G4 binding proteins were separately validated by in vitro assays and shown to be specific, high-affinity G4 binders. Given their distinct properties and various functions in biological processes, these proteins may play different key roles in regulation of the endogenous G4 landscape and G4 biology. The protein SMARCA4, which is part of a chromatin remodelling complex, was followed up further using genomic ChIP-seq methodology to demonstrate that SMARCA4 does, indeed, bind substantially to genomic sites in which G4 structures have been detected. This outcome confirms that our CMPP methodology does identify proteins that bind to G4 structures in cellular chromatin, particularly at gene promoters, and also suggests that SMARCA4-G4 interactions may be important for transcriptional control. Further experiments that involve protein knockdown or overexpression coupled with G4 ChIP-seq may ultimately help elucidate the associated mechanisms in more detail.
Although the CMPP probes were employed for relatively short treatment times, we cannot rule out the possibility that the ligands partially influence the endogenous G4 landscape and interactome. In this study and in other work 35 , PDS and G4-interacting proteins have been shown to co-bind to the same G4 structure; however,  the situation can be more complex at high PDS concentrations, in which it has been shown to inhibit the binding of certain proteins to G4s 34,49 . In addition, G4 ligands may induce the stabilization of weaker, more transient G4s or alter the folded topology of G4s in ways that may influence protein binding. For these reasons it is essential to validate candidate G4 interactors with orthogonal approaches in vitro and in untreated cells, as we show in this study. We were mindful of observations that prolonged treatment with G4 ligands can induce DNA damage and recruit associated proteins 16 . Therefore, we limited ligand treatment times and concentrations to avoid potential artefacts and did not observe a particular enrichment of DNA damage-related proteins in our experiments.
In principle, the approach we describe here should be applicable to a wide range of cell types and cell states, which in turn may help reveal specific differences in G4 interactomes and biology. During the revision of this article, we became aware of an independent study that involved a pyrrolidine derivative of PDS 50 and reported the identification of G4-related proteins in human SV589 and MM231 cells 51 . Although we noted some overlap between the studies (61 shared protein candidates), which somewhat validates the independent approaches, most of the G4-associated proteins identified by our CMPP approach were not found in the independent study. The different outcomes may have arisen due to variations in protein expression levels, chromatin states and G4 biology between the different cell lines. There were also some important technical differences between the two studies, which may have contributed to differences in the outcomes. In our study, we fractionated the nuclear proteins to focus on chromatin-associated proteins involved in G4 biology, and also to minimize the masking of physiologically relevant DNA G4 interactors by high-abundance, cytosolic RNA-binding proteins (for example, ribosomal proteins and elongation factors) 52 . In addition, we employed the diazirine crosslinker control 3, which lacks a G4 binding moiety to account for and factor out background binding (Methods), as considerable off-target binding to diazirine photocrosslinkers has been reported previously 37,53 .
Overall, our chemical method provides an unbiased strategy for the global mapping of proteins that interact with nucleic acid structural features in live cells. Although this study focused on DNA G4 interactors, we also identified several candidates that are annotated as RNA-binding proteins. PDS can bind both DNA and RNA G4s with comparable affinity 43 and, therefore, some of the identified proteins might, in principle, bind to nuclear RNA G4s. We envisage that future studies with RNA G4-specific probes 49 might employ a similar approach to explore endogenous RNA G4-protein interactions. We also envision that the general principle will enable further studies to map endogenous interactomes of other nucleic acid structural features.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41557-021-00736-9.

Methods
Detailed synthetic procedures and full characterization of photoPDS-1 (1) and photoPDS-2 (2), biophysical assays and more detailed methods as well as general information are described in the Supplementary Information.
Cell culture. Human embryonic kidney HEK293T cells (ATCC, CRL-3216) were grown in high-glucose DMEM (l-glutamine and pyruvate plus, GIBCO) supplemented with 10% (v/v) heat-inactivated fetal bovine serum (FBS). Human chronic myelogenous leukaemia K562 cells (ATCC, CCL-243) were cultured in RPMI1640 (Glutamine plus, Life Technologies) supplemented with 10% FBS (Life Technologies). Both cell lines were grown at 37 °C in a 5% CO2 atmosphere. Cells used in the experiments were passaged at least twice after being thawed. Cells were tested periodically for mycoplasma contamination.
Co-binding-mediated proximity labelling of BG4. G4 Myc (7.3 µM) and the single-stranded mutated oligonucleotides were annealed in 10 mM Tris, pH 7.4, 200 mM KCl and ds Myc in 10 mM Tris, pH 7.4, 200 mM NaCl. The G4-specific antibody BG4 17 (5 µl of 6.6 µM in PBS) was then incubated with 5 µl of annealed oligonucleotides at room temperature by gently shaking for 1 h, followed by adding 5 µl of the indicated probes in 10 mM Tris HCl, pH 7.4, 100 mM KCl and incubated at room temperature for another hour. The solution was directly irradiated under 365 nm light on ice for 10 min, and 1.7 µl of the 'click' mixture (2 μl of 50 mM CuSO 4 in H 2 O, 2 μl of 50 mM TCEP (tris(2-carboxyethyl)phosphine) in H 2 O, 1 μl of 10 mM TAMRA-azide in DMSO and 5 μl of 2 mM TBTA (tris((1-benzyl-1H-1,2,3-triazol-4-yl)methyl)amine) in 1/4 DMSO/t-BuOH) was added and the mixture was gently shaken at room temperature for 1 h. Next, 5.6 µl of LDS loading buffer (4×) was added and the solution was heated at 70 °C for 10 min. Each sample (~22 μl) was loaded and separated by SDS-PAGE (NuPAGE 4 to 12% and Bis-Tris, 1.0 mm), visualized on a Bio-Rad ChemiDoc MP system and the obtained images processed using Image Lab (version 6.1.0) software. Three biological replicates were performed.

Proximity labelling of G4 interactomes in live cells.
The protocol was adapted from that described previously 13 . For gel-based experiments, HEK293T cells were grown in 6 cm dishes to ~90% confluence at the time of treatment. Cells were carefully washed with 5 ml of Dulbecco's phosphate-buffered saline (DPBS) (GIBCO) and then incubated with fresh FBS-free DMEM media (2.5 ml) that contained the indicated probe at 37 °C for 1 h, followed by direct irradiation under 365 nm light (UVP CL-1000 Ultraviolet Crosslinker, Fisher Scientific) on ice for 10 min. Cells were scraped into cold DPBS (3 ml), centrifuged (300g, 5 min, 4 °C) and then washed twice with cold DPBS. Cell pellets were either processed directly or kept frozen at -80 °C until use. For MS-based experiments, a similar protocol was used with minor modifications: HEK293T cells were grown in 15 cm dishes to 80-90% confluence and then treated with 15 ml of fresh FBS-free media that contained the indicated probes.
Nuclear protein extraction for gel- and MS-based analysis. The cell pellets from 6 cm and 15 cm dishes were gently resuspended in 250 μl and 2.25 ml, respectively, of Hypotonic Buffer (10 mM HEPES, pH 7.4, 10 mM KCl and 1.5 mM MgCl2) with a protease inhibitor cocktail (PIC) (ThermoFisher, catalogue no. 78438) by pipetting several times and swelled on ice for 15 min. NP-40 (10%, 12.5 and 112.5 μl, respectively) was added and the pellets were vortexed at the highest setting for 10 s and centrifuged (900g, 10 min, 4 °C) to afford the nuclear pellets, which were then washed once with Hypotonic Buffer (250 μl and 1.5 ml, respectively). The isolated nuclear pellets were lysed in 50 and 250 μl, respectively, of high-salt Hypotonic Buffer (10 mM HEPES, pH 7.4, 400 mM NaCl, 10 mM KCl and 1.5 mM MgCl2) that contained PIC, 0.5% NP-40 and 2 mM phenylmethylsulfonyl fluoride, followed by adding 0.25 and 1.25 μl, respectively, of benzonase (Sigma-Aldrich, catalogue no. E1014) and incubating on ice for 30 min with vortexing at 10 min intervals. The lysates were centrifuged (16,000g, 10 min, 4 °C) to give the supernatant that contained the nuclear proteome, which was transferred to a clean protein LoBind tube, and the protein concentration was determined by a BCA (bicinchoninic acid) protein assay.
Gel-based analysis of probe-labelled nuclear G4 interactomes. Nuclear proteins (100 μg) were diluted with 50 mM HEPES, pH 7.4, to 80 μl in a clean 1.5 ml microcentrifuge tube. To dissolve the proteins, 10 μl of 4% SDS in 50 mM HEPES, pH 7.4, was added, followed by 10 μl of a freshly prepared click mixture (2 μl of 50 mM CuSO4 in H2O, 2 μl of 50 mM TCEP in H2O, 1 μl of 10 mM TAMRA-azide in DMSO and 5 μl of 2 mM TBTA in 1/4 DMSO/t-BuOH). The mixture was gently shaken at room temperature for 1 h, followed by adding prechilled methanol (400 μl) and keeping it at -20 °C overnight. The precipitated protein pellets were collected by centrifugation (16,000g, 10 min, 4 °C) and washed with prechilled methanol (400 μl). After drying the pellets at room temperature for 5 min, 50 μl of a 1× LDS sample buffer that contained 2.5% v/v 2-mercaptoethanol was added and the solution was heated at 95 °C for 10 min. The sample (20 μl) was loaded per gel lane for SDS-PAGE (NuPAGE 4 to 12% and Bis-Tris, 1.0 mm) analysis and visualized by in-gel fluorescence scanning on a Bio-Rad ChemiDoc MP system. Three biological replicates for each experiment were performed.
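As a quick check on the click reaction conditions in the gel-based protocol above, the final reagent concentrations can be derived from the stated stock concentrations and volumes. The sketch below does this in Python, assuming that the volumes are additive and that the 10 μl click mixture is added to 80 μl of diluted protein plus 10 μl of SDS solution (100 μl in total); the function and variable names are hypothetical.

```python
def final_concentration_mM(stock_mM, stock_ul, total_ul):
    """Dilution relation C1 * V1 = C2 * V2, solved for C2 (in mM)."""
    return stock_mM * stock_ul / total_ul


# Click mixture components as (stock concentration in mM, volume in microlitres),
# taken from the protocol text.
click_mix = {
    "CuSO4": (50.0, 2.0),
    "TCEP": (50.0, 2.0),
    "TAMRA-azide": (10.0, 1.0),
    "TBTA": (2.0, 5.0),
}

click_mix_ul = sum(volume for _, volume in click_mix.values())

# Diluted protein (80 ul) + SDS solution (10 ul) + click mixture.
total_ul = 80.0 + 10.0 + click_mix_ul

final_mM = {
    name: final_concentration_mM(stock, volume, total_ul)
    for name, (stock, volume) in click_mix.items()
}

for name, conc in final_mM.items():
    print(f"{name}: {conc:.2f} mM in {total_ul:.0f} ul")
```

Under these assumptions the copper and reductant end up at about 1 mM each and the dye and ligand at about 0.1 mM each.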
Label-free quantitative proteomics data analysis. The label-free experiment consisted of 24 samples distributed in 6 groups, which included the treatments with the G4-ligand probes 1 and 2 and the negative control probe 3. Missing values for 3 were imputed by replacing them with the minimum value, whereas those for 1 and 2 were imputed using the nearest neighbour method after removing peptides missing in more than half of the samples in each group. The peptide intensities of the filtered peptides were analysed using the Bioconductor library qPLEXanalyzer 54 . To find differentially expressed proteins, a statistical analysis was carried out using the Bioconductor library limma 55 . Visualization of the results was performed with volcano plots and Venn diagrams using the R libraries ggplot2 (https://cran.r-project.org/web/packages/ggplot2/index.html), ggrepel (https://cran.r-project.org/web/packages/ggrepel/index.html) and VennDiagram (https://cran.r-project.org/web/packages/VennDiagram/index.html). UniprotKB keywords of differentially expressed proteins were extracted using the Retrieve/ID mapping online functionality 56 . The list of 79 G4-associated proteins in humans was downloaded from G4IPDB 41 (accessed 20 November 2020). The code is available on the GitHub page dedicated to this study, https://github.com/sblab-bioinformatics/cmpp
G4 affinity enrichment and western analysis. HEK293T cells were grown to ~80% confluence at the time of treatment. Cell pellets were swelled at a density of 10 million cells per 300 µl in a low salt buffer (20 mM HEPES, pH 7.4, 10 mM NaCl, 3 mM MgCl2, 0.2 mM EDTA and 1 mM dithiothreitol (DTT)) that contained PIC on ice for 15 min. Then, 15 μl of 10% NP-40 was added and the pellets were vortexed for 1 min and centrifuged (900g, 10 min, 4 °C) to afford the nuclear pellets, which were then washed with low salt buffer.
The nuclear pellets were lysed at a density of 30 million cells per 250 µl in high salt buffer (20 mM HEPES, pH 7.4, 500 mM NaCl, 3 mM MgCl 2 , 0.2 mM EDTA, 0.5% NP-40 and 1 mM DTT) that contained PIC by sonicating in a Diagenode Bioruptor Plus (ten cycles, 30 s on and 30 s off at each high setting, 4 °C). The lysates were centrifuged (16,000g, 10 min, 4 °C) to afford the nuclear proteins, and the concentration was measured using the BCA protein assay.
A slurry (50 µl) of Streptavidin MagneSphere paramagnetic beads (Promega, catalogue no. Z5481) was prewashed with pull-down buffer (25 mM HEPES, 10.5 mM NaCl, 110 mM KCl, 1 mM MgCl2, 0.01 mM ZnCl2, 20% v/v glycerol, 0.1% Igepal C-630, 1 mM DTT and PIC) that contained 3% bovine serum albumin (BSA) and 0.2 g l⁻¹ salmon sperm DNA (Invitrogen, catalogue no. 15632011) three times (2 ml), and then 75 µg of nuclear proteins was added into 500 μl of pull-down buffer that contained 3% BSA and 0.2 g l⁻¹ salmon sperm DNA, and precleared by incubating with the prewashed beads at 4 °C for 2 h. Meanwhile, another 50 µl of beads was washed in the same manner as above. Then, 50 µl of 10 µM annealed biotinylated oligonucleotides (Sigma-Aldrich) was added into 500 µl of pull-down buffer and incubated with the prewashed beads by rotation at room temperature for 30 min. The oligonucleotide-immobilized beads were then washed with pull-down buffer (2 ml, 3×) and incubated with the precleared lysates (500 µl) by rotation at 4 °C overnight. The beads were washed with cold pull-down buffer (500 µl, 5×) and the biotinylated oligonucleotides on the beads were eluted in 25 µl

Fig. 5 | Properties of SMARCA4 binding sites. a, Overlap of binding sites identified by SMARCA4 ChIP-seq in K562 chromatin across three biological replicates. Binding sites identified in at least two replicates were considered as high-confidence binding sites. b, Binding motifs identified in SMARCA4 binding sites that are marked by or lack an endogenous G4. The top three motifs identified by EM for Motif Elicitation (MEME) 67 analysis are shown.