Single-molecule imaging reveals replication fork coupled formation of G-quadruplex structures hinders local replication stress signaling

Guanine-rich DNA sequences occur throughout the human genome and can transiently form G-quadruplex (G4) structures that may obstruct DNA replication, leading to genomic instability. Here, we apply multi-color single-molecule localization microscopy (SMLM) coupled with robust data-mining algorithms to quantitatively visualize replication fork (RF)-coupled formation and spatial association of endogenous G4s. Using these data, we investigate the effects of G4s on replisome dynamics and organization. We show that a small fraction of active replication forks spontaneously form G4s at newly unwound DNA immediately behind the MCM helicase and before nascent DNA synthesis. These G4s locally perturb replisome dynamics and organization by reducing DNA synthesis and limiting the binding of the single-strand DNA-binding protein RPA. We find that the resolution of RF-coupled G4s is mediated by an interplay between RPA and the FANCJ helicase. FANCJ deficiency leads to G4 accumulation, DNA damage at G4-associated replication forks, and silencing of the RPA-mediated replication stress response. Our study provides first-hand evidence of the intrinsic, RF-coupled formation of G4 structures, offering unique mechanistic insights into the interference and regulation of stable G4s at replication forks and their effect on RPA-associated fork signaling and genomic instability.


Supplementary Figure 2. SMLM visualization and quantification of DNA G4 structures and their spatial associations with replisome upon PDS treatments.
(a and d) Representative SMLM images of NT or 1hr, 20μM PDS-treated siCTRL or siFANCJ U2OS nuclei labeled for the indicated targets. Scale bars, 2 µm. (b, e, f) Nuclear densities of G4 (b), EdU (e), or PCNA (f) in NT or PDS-treated siCTRL or siFANCJ cells, calculated with AC analysis. (c) Fraction of PCNA foci associated with G4 foci in NT or PDS-treated siCTRL or siFANCJ cells, calculated with DBSCAN/NND analysis. The colocalization between PCNA and G4 in experimental (EXP) and randomized (RAN) samples is compared to show the significance of colocalization. Note that the fraction of PCNA associated with G4 is lower in NT siFANCJ cells than in siCTRL cells. We reasoned that this is due to the increased overall abundance of active replication forks when FANCJ is depleted 1 . The level of active replication was quantified via AC analysis of EdU (e) and of chromatin-bound PCNA (f), each serving as a distinct marker of active replication forks. This revealed a ~55-60% increase in the level of active replication in NT siFANCJ cells compared with siCTRL cells, supporting the idea that extra origin firing results in the relative decrease in the G4-replication fraction in siFANCJ cells.
(a and b) Representative SMLM images of NT or 4hr, 20μM PDS-treated siCTRL or siFANCJ U2OS cells labeled for the indicated targets. Scale bars, 2 µm. (c) Local densities of RPA at All-Replisomes in NT or 1hr PDS-treated siCTRL or siFANCJ U2OS cells. We did not observe any significant changes in the levels of RPA recruitment under these conditions in either cell line, indicative of a rapid cellular response in resolving stable G4s shortly after PDS treatment 2 . Based on this, we reasoned that longer PDS treatments were needed to reveal such changes.
We therefore extended the duration of PDS treatment and found that 4-hr treatment was sufficient to induce an increase in RPA signal at All-Replisomes in siCTRL cells (

TTA GGG TTA GGG TTA GGG TTT TTT TTT TTT TTT TTT
Table 3. DNA Substrates used in this study.

Supplementary Note 1: Validation of the G4 antibodies.
To image DNA G4 structures by SMLM, we tested and further optimized two previously validated G4-specific antibodies, 1H6 3 and BG4 4 , and confirmed their activities. Cells co-stained with 1H6 and BG4 displayed significant overlap between the two antibodies (Supplementary Note 1 Fig. 1a), indicating that both antibodies predominantly recognize G4 structures in cells. We further quantified their colocalization by comparing its magnitude within the nuclei with that of simulated images in which the two antibodies are uncorrelated and only random colocalization is expected, further establishing an overlap in their specificities (Supplementary Note 1 Fig. 1b). Since BG4 was found to target a broader range of G4 structures, including RNA G4s 5 , we decided to predominantly use 1H6 in our SMLM experiments to lower possible background in our imaging. The single-molecule specificity of 1H6 was also validated by performing modified single-molecule pulldown assays (SimPull) 6 . These measurements confirmed the selective binding of 1H6 to folded DNA G4 substrates, but not to random ssDNA substrates (Supplementary Note 1 Fig. 1d

Supplementary Note 2: Estimating the fraction of replication sites with G4 signals.
Our multi-color SMLM platform has enabled us to determine the association of DNA G4 structures with the replication machinery. By utilizing quantitative SMLM clustering (DBSCAN) and colocalization (Nearest Neighbor Distance (NND)) approaches 7 [...] are more stable (60 °C < T1/2 < 65 °C) and therefore can contribute to genomic instability, as shown by the Nicolas lab 11 . While the total number of G4-L1 motifs in the human genome is estimated to be close to 39,000, the more thermodynamically stable G4-L1 motifs (G4-L1T, G4-L1G and G4-L1C), which have a higher chance of stably folding into a G4 in cells, make up only ~9% of that population, or ~3,500 motifs 9 . Considering a genome size of ~6 billion base pairs and an average replicon size of 100-120 kb 12 , each RF covers about 50-60 kb, and ~100,000 forks in total are needed to replicate the entire genome in one cell. Based on this, we estimate that a total of 3.5% (3,500 G4 motifs/100,000 forks) of forks could 'encounter' a stable G4-L1 motif over the course of the entire S phase. The actual fraction would vary from early to late S phase with changes in replication pace and distribution, and with whether the stable G4-L1 motifs actually fold into G4 structures. Thus, the percentage of observed forks with G4 structures is expected to be in the range of 2-3%.
• Estimates based on G4-ChIP-seq measurements: A recent study using G4-ChIP-seq detected ~1,000 to ~10,000 G4s in different human cell lines 13 . Using these values, we estimate the probability of observing a G4-RF to be 1-10%.
• Estimates based on live cell studies: A very recent paper that elegantly detected G4s in live cells estimated a total of ~3,000 G4s in a single U2OS cell 14 . Using this value, the fraction of G4-RF would be ~3%.
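The arithmetic behind these estimates can be checked with a short script. This is a minimal sketch that only reuses the rounded numbers quoted in this note. The function and variable names are illustrative assumptions, and dividing a G4 count by a fork count implicitly assumes that each G4 sits on a different fork.

```python
# Back-of-the-envelope check of the G4-RF estimates, using only the rounded
# numbers quoted in this note. All names are illustrative.

GENOME_BP = 6e9                  # ~6 billion bp per cell
FORK_SPAN_BP = (50e3, 60e3)      # each fork replicates ~50-60 kb


def forks_needed(genome_bp, fork_span_bp):
    """Number of forks required if each fork replicates fork_span_bp."""
    return genome_bp / fork_span_bp


def g4_fork_fraction(n_g4, n_forks):
    """Fraction of forks meeting a G4, assuming at most one G4 per fork."""
    return n_g4 / n_forks


# Fork counts for the two ends of the 50-60 kb range.
fork_range = tuple(forks_needed(GENOME_BP, span) for span in FORK_SPAN_BP)
N_FORKS = 100_000                # rounded fork count used in the text

# ~9% of ~39,000 G4-L1 motifs; the text rounds this to ~3,500.
stable_motifs = 0.09 * 39_000

estimates = {
    "stable G4-L1 motifs": g4_fork_fraction(3_500, N_FORKS),
    "G4-ChIP-seq (low)": g4_fork_fraction(1_000, N_FORKS),
    "G4-ChIP-seq (high)": g4_fork_fraction(10_000, N_FORKS),
    "live-cell imaging": g4_fork_fraction(3_000, N_FORKS),
}

for label, fraction in estimates.items():
    print(f"{label}: {fraction:.1%}")
```

With the fork count fixed at 100,000, the script reproduces the 3.5%, 1-10%, and ~3% figures. Using the 50 kb end of the fork span instead gives 120,000 forks and proportionally lower fractions.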
While our observation (~2.24%) is consistent with the estimates provided above, we would like to emphasize that our calculations are only meant to provide an approximation rather than an absolute number. In this manuscript we also utilized the probabilistic TC analysis approach (Supplementary Note 3) 15,16 to measure the correlation between G4 and the replisome complex (MCM-EdU). We found that in intact cells, the three components significantly associate with each other. This provides, to the best of our knowledge, the first direct quantitative visualization of G4 localizing within the replisome, addressing the previously unanswered question of how G4 can interact with and affect the replisome.

Supplementary Note 3: Principle of Triple-Correlation Function.
Multi-color single-molecule localization microscopy (SMLM) is one of the leading super-resolution imaging approaches, enabling visualization of the spatial organization of molecular complexes at the nanoscale inside cells 17,18 . However, as more detailed information is obtained from highly sensitive SMLM imaging, extracting the wealth of information resolved in SMLM images becomes challenging. This is especially critical for analyzing images with particularly dense populations (such as the highly abundant replication proteins inside a nucleus) that are subject to heterogeneous distributions, various orientations, and random co-localization events. To address these challenges, we utilized an algorithm employing the Triple-Correlation (TC) Function for unbiased pattern recognition and quantification of any three-component-labeled molecular complexes in three-color SMLM images 15,16 .
The TC algorithm samples the triplet complexes formed by all molecules from each of the three labeled species and computes their probability density as a function of their relative displacement. In brief, the TC algorithm samples every possible three-component geometric pattern formed by any of the individual red, green, and blue signals from a nucleus (Supplementary Note 3 Fig. 1a and b). If a triplet pattern, defined by the lengths of the edges of the triplet, is consistently present within the nucleus, it is identified by the TC analysis as a TC triplet because its population stands out from other stochastic combinations of the red, green, and blue molecules (Supplementary Note 3 Fig. 1c), whereas if the distributions of the three colors are not correlated with each other (i.e. random distribution), no TC triplet will be identified 15,16 . By overlaying the TC triplets acquired from the multiple nuclei we imaged and analyzed, with the relative frequency of the TC triplets from each nucleus represented by the size of the circles, we obtain a statistical description of the spatial relationship among the three colors (Supplementary Note 3 Fig. 1d and e). We note that when the scale of the pattern is comparable to or smaller than the combined spatial resolution (~40 nm), the pattern tends to present as an equilateral triangle due to the disparity between the center-to-center distance and the average distance among three co-localized clusters that can overlap with each other (see Supplementary Figure 5 [...]
[...] complexes. Note that the local density of RED increases only within the Y-B-R-G complexes but not within the B-R-G complexes. When we submitted these simulations for TC computation to quantify the relative local densities of RED, we found a significant increasing trend for RED within the Y-R-G triplets (Supplementary Note 3 Fig. 3b) but only a mild increasing trend for RED within the B-R-G triplets (Supplementary Note 3 Fig. 3c).
On the other hand, when we simulated a series of images with increasing RED within the B-R-G complexes but not the Y-B-R-G complexes (Supplementary Note 3 Fig. 3d), we observed a significant increasing trend for RED only within the B-R-G triplets (Supplementary Note 3 Fig. 3f), but not within the Y-R-G triplets (Supplementary Note 3 Fig. 3e). We used this strategy in the main text to distinguish the local abundances of RPA and EdU at progressing replisomes with or without G4 associations.
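The triplet sampling described in this note can be illustrated with a brute-force sketch. This is not the published TC implementation 15,16 . It is a simplified stand-in, assuming 2D coordinates in nm, that bins every red-green-blue triplet by its three edge lengths and compares the counts with a control in which the three colors are placed at random. All function names, the simulated 125-nm triangle, and the parameter values are hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def triplet_edge_histogram(red, green, blue, r_max, bin_width):
    """Count R-G-B triplets by their binned edge lengths (d_rg, d_gb, d_br).

    Only triplets whose three pairwise distances are all <= r_max are kept.
    """
    hist = Counter()
    for r in red:
        for g in green:
            d_rg = math.dist(r, g)
            if d_rg > r_max:
                continue
            for b in blue:
                d_gb = math.dist(g, b)
                d_br = math.dist(b, r)
                if d_gb <= r_max and d_br <= r_max:
                    key = tuple(int(d // bin_width) for d in (d_rg, d_gb, d_br))
                    hist[key] += 1
    return hist


def random_points(n, size, rng):
    """n uniformly distributed points in a size x size field (uncorrelated control)."""
    return [(rng.uniform(0, size), rng.uniform(0, size)) for _ in range(n)]


def simulate_complexes(n, size, offsets, jitter, rng):
    """Place n complexes, each giving one R, G and B point at fixed offsets plus jitter."""
    red, green, blue = [], [], []
    for _ in range(n):
        cx, cy = rng.uniform(0, size), rng.uniform(0, size)
        for pts, (ox, oy) in zip((red, green, blue), offsets):
            pts.append((cx + ox + rng.gauss(0, jitter),
                        cy + oy + rng.gauss(0, jitter)))
    return red, green, blue


# Toy data: 60 complexes shaped as equilateral triangles with 125-nm edges.
rng = random.Random(0)
SIZE = 5000.0  # field of view in nm (hypothetical)
SIDE = 125.0   # planted edge length in nm (hypothetical)
offsets = [(0.0, 0.0), (SIDE, 0.0), (SIDE / 2, SIDE * math.sqrt(3) / 2)]

red, green, blue = simulate_complexes(60, SIZE, offsets, jitter=5.0, rng=rng)
observed = triplet_edge_histogram(red, green, blue, r_max=200.0, bin_width=50.0)

# Control: same number of points per color, but with no spatial correlation.
control = triplet_edge_histogram(
    random_points(60, SIZE, rng),
    random_points(60, SIZE, rng),
    random_points(60, SIZE, rng),
    r_max=200.0,
    bin_width=50.0,
)

peak_bin, peak_count = observed.most_common(1)[0]
print("most frequent triplet bin:", peak_bin, "count:", peak_count)
print("same bin in randomized control:", control[peak_bin])
```

In this toy setting the planted triangle should appear as one dominant edge-length bin in `observed` that is essentially absent from `control`, which mirrors the idea that a TC triplet stands out from stochastic combinations. The sketch does not attempt to handle localization uncertainty, repeated blinking of the same molecule, or normalization into a probability density.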
As we couple SMLM imaging with TC analysis, we note that despite extensive sample optimization (detailed below) the antibody labeling efficiency for each targeted molecule, as well as the various photoswitching properties of different fluorophores, may still affect the number of detected molecules of each target with respect to their actual amounts in cells, and consequently the correlation frequency among the three coupled molecules.
To address these issues, we analyzed the triple-correlation in samples in which EdU+PCNA (both serving as replication fork markers, Supplementary Note 3 Fig. 4a) or RPA (Supplementary Note 3 Fig. 4b) were triple-stained in different colors. We found the resulting triple-correlation magnitude, regardless of the color combination, to be significantly stronger than the correlation magnitude obtained from simulated images with random colocalizations (see method and Fig. S3 and Fig. S4 in Chen, Y. H. et al. 19 for randomization procedure).
These measurements indicate that the TC approach is capable of robustly determining distinct molecular configurations despite potential under-sampling due to different fluorophore photoswitching properties, labeling efficiencies, or the crowded environment of our region of interest (the nucleus). It is important to note that the distinct configurations are well resolved notwithstanding the expected changes in the absolute TC magnitudes obtained for different color combinations. Finally, we note that, to minimize errors and variability that may arise from different blinking and labeling biases, we thoroughly test, validate and optimize the labeling conditions for each molecule of interest. Once these conditions are established, we maintain them throughout our experiments by using the same antibody dilutions and fluorophore conjugations for the same targets, avoiding any comparison of the same target between different antibodies or different colors (Supplementary Table 2).