Identification of G-quadruplex clusters by high-throughput sequencing of whole-genome amplified products with a G-quadruplex ligand

G-quadruplex (G4) is a DNA secondary structure that has been found to play regulatory roles in the genome. The identification of G4-forming sequences is important to study the specific structure-function relationships of such regions. In the present study, we developed a method for identification of G4 clusters on genomic DNA by high-throughput sequencing of genomic DNA amplified via whole-genome amplification (WGA) in the presence of a G4 ligand. The G4 ligand specifically bound to G4 structures on genomic DNA; thus, DNA polymerase was arrested on the G4 structures stabilised by G4 ligand. We utilised the telomestatin derivative L1H1-7OTD as a G4 ligand and demonstrated that the efficiency of amplification of the G4 cluster regions was lower than that of the non-G4-forming regions. By high-throughput sequencing of the WGA products, 9,651 G4 clusters were identified on human genomic DNA. Among these clusters, 3,766 G4 clusters contained at least one transcriptional start site, suggesting that genes are regulated by G4 clusters rather than by one G4 structure.

In this study, we performed a whole genome amplification (WGA) 24 in the presence of a G4 ligand, followed by high-throughput sequencing of WGA products to identify G4 clusters in human genomic DNA. It has been reported that DNA polymerase was arrested on G4 structures stabilised by G4 ligand 25 . This led us to hypothesise that genomic DNA would be amplified by WGA, except for G4 clusters, in the presence of a G4 ligand. Hence, we could identify G4 clusters by analysing the WGA products using high-throughput sequencing technologies.

Results
Analysis of inhibitory activity of G4 ligand against DNA polymerase extension on G4-forming sequences. The G4 ligand 7OTD is a telomestatin derivative that binds to the top of the G-tetrad structure through π-stacking and electrostatic interactions [26][27][28] . To investigate whether 7OTD inhibits DNA polymerase extension on G4-forming regions, polymerase chain reaction (PCR) was performed on the human genomic DNA in the presence of 7OTD. For amplifying G4-forming regions, PCR primers for c-MYC, c-KIT, BCL2 and VEGFA G4 regions in the human genomic DNA were designed. For amplifying non-G4-forming regions, PCR primers for MBD3L3, CD4, CNDP2, and SOD1 were designed. In the absence of 7OTD, all of these regions were accurately amplified from genomic DNA by PCR (Fig. 1). The G4-forming and non-G4-forming regions were amplified in the presence of less than 100 nM 7OTD. In the presence of 1 µM 7OTD, the non-G4-forming regions were amplified, whereas the G4-forming ones were not. These results demonstrate that 7OTD specifically inhibits DNA polymerase extension on G4-forming regions in PCR.
Measurement of removal efficiency of BODIPY-labelled 7OTD from genomic DNA. PCR is a suitable method to confirm the efficiency of WGA of target regions; however, owing to the interference of 7OTD with PCR in the G4-forming regions, 7OTD should be removed from WGA products before PCR analysis. To evaluate the removal efficiency of 7OTD from genomic DNA, 10 µM BODIPY-labelled 7OTD was incubated with human genomic DNA. Then, LiCl solution (final concentration 4 M) was added to the mixture since lithium ions destabilise G4 structures 29 . The genomic DNA was purified by gel filtration and then the fluorescence intensity of BODIPY was measured to calculate the efficiency of removal of the residual BODIPY-labelled 7OTD from the genomic DNA. The results showed that 4.5% of the fluorescence intensity was detected in the purified samples, indicating that more than 95% of BODIPY-labelled 7OTD was successfully removed from genomic DNA by LiCl and gel filtration.
In the PCR analysis, 1 µM 7OTD inhibited PCR on G4-forming regions, but 100 nM 7OTD did not interfere with the reaction. Therefore, we assumed that 1 µM 7OTD is a suitable concentration for the WGA reaction, since the amount of 7OTD remaining after the purification would be less than 50 nM, which would not subsequently inhibit the PCR. To confirm that the residual 7OTD would not inhibit PCR on G4-forming regions, PCR was performed using genomic DNA purified from a mixture of 1 µM 7OTD and genomic DNA. The results showed that no PCR inhibition for c-MYC, c-KIT, BCL2 and VEGFA G4 regions was detected (Fig. 2).
Whole genome amplification in the presence of 7OTD. WGA based on multiple displacement amplification using Phi29 DNA polymerase was utilised. In this system, the average product length is typically greater than 10-kb. When HeLa genomic DNA was amplified by WGA in the absence of 7OTD, WGA products of around 23-kb were obtained (Fig. 3). On the other hand, when HeLa genomic DNA was amplified by the WGA in the presence of 1 µM 7OTD, WGA products were not detected in 1% agarose gel electrophoresis. The G4-seq analysis revealed that the human genome contains 716,310 G4-forming sequences stabilised by a G4 ligand 23 . These results suggest that Phi29 DNA polymerase would be arrested on numerous G4-forming regions on genomic DNA and its inhibition would reduce the yield of total WGA products.
To analyse whether 7OTD specifically inhibits the extension of DNA polymerase on G4-forming regions in WGA, the WGA products were purified by LiCl and gel filtration and analysed by PCR. After gel filtration of the products amplified by WGA in the absence of 7OTD, the approximately 23-kb WGA product was not detected in 1% agarose electrophoresis since the column is not suitable for long-DNA purification. However,  the G4-forming regions and non-G4-forming regions were amplified from the purified WGA products by PCR; thus, the products amplified by WGA in the presence of 7OTD were analysed by PCR (Fig. 4). Although the non-G4-forming regions were amplified by PCR, only slight PCR amplification of the G4-forming regions was detected. Quantitative analysis of the band intensity revealed that the average WGA efficiencies in the non-G4-forming and G4-forming regions were 50% and 17%, respectively. These results demonstrate that the efficiency of amplification of the G4-forming regions by WGA was lower than that of the non-G4-forming regions in the presence of 7OTD.
High-throughput sequencing of the WGA products for identification of G4 clusters in human genomic DNA. To identify the G4-forming regions on the human genome, high-throughput sequencing of the WGA products was performed on an Illumina HiSeq X10 platform using the paired-end mode (150 bp x2). The WGA products were purified before PCR analysis, as described above. In contrast, the WGA products were directly used as templates for high-throughput sequencing, without any purification because DNA polymerase-based reactions would be inhibited on the G4-forming regions in the presence of 7OTD during the library preparation and sequencing reaction. High-throughput sequencing yielded 337 million reads (35 × depth of coverage) and 311 million reads (32 × depth of coverage) for 7OTD and control libraries, respectively.
First, the mapped reads were counted per 200 bp window, with a sliding size of 200 for the entire genome. Consistent with the PCR analysis of the WGA products, a decrease of the mapped reads in the 7OTD library was detected in c-MYC, c-KIT, BCL2 and VEGFA G4 regions ( Fig. 5 and Fig. S5). In contrast, specific peaks at the G4-forming sequences were not detected in the regions because the decrease of the mapped reads was detected over the whole region. The occurrence of PQS predicted by G4Hunter is 5.02 per 10 kbp in the human genome 17 . In contrast, 35, 15, 26 and 60 PQS were predicted in the 10 kbp region of c-MYC, c-KIT, BCL2 and VEGFA G4, respectively. Clusters of G4-forming sequences that promote transcription and replication-dependent DNA damage induced by a G4 ligand have been identified by ChIP-Seq for the DNA damage marker γH2AX 22 . ChIP-Seq demonstrated that the γH2AX domains were enriched on chromosomes that have high PQS frequencies. Our sequencing results also demonstrated that the mean depth of coverage of the 7OTD library was lower than that of the control library on chromosomes 16, 17, 19, 20 and 22, which have high PQS frequencies (Fig. S6). These results indicated that clusters of G4-forming sequences would be identified by counting the mapped reads over large windows in our sequencing results.
We then counted the mapped reads per 1.0, 2.5, 5.0 and 10 kbp windows with sliding sizes of 1.0, 2.5, 5.0 and 10 kbp, respectively. The correlation coefficients between PQS frequencies and ratios of the reads for the 7OTD library to the reads for the control library in 1.0, 2.5, 5.0 and 10 kbp windows were −0.52, −0.61, −0.65 and −0.66, respectively. Therefore, counted mapped reads per 10 kbp windows were utilised to identify G4 clusters (Supplementary Dataset 1). On the 25 γH2AX-enriched genes, the average PQS numbers was 24 per 10 kbp, the average G4 numbers identified by G4-seq using PDS was 9.8 per 10 kbp and the average ratio of the reads for the 7OTD library to the reads for the control library was 0.292 (Supplementary Dataset 2). In contrast, the average PQS numbers was 2.4 per 10 kbp, the average G4 numbers identified by G4-seq was 1.1 per 10 kbp and the average ratio of the reads was 1.92 on the two γH2AX-negative genes (Supplementary Dataset 3). Therefore, we defined the threshold for identification of G4 clusters as a PQS number is ≥24 per 10 kbp, the number of G4 identified by G4-seq using PDS is ≥10 and the ratio is ≤0.292. By these criteria, we identified 9,651 G4 clusters in the human genome (Supplementary Dataset 4). In the 9,651 G4 clusters, the average ratio of reads, number of PQS and number of G4 identified by G4-seq using PDS was 0.133, 39.8 and 14.9, respectively. G4-seq using PDS identified 716,310 G4-forming sequences, whereas G4-seq using K + identified 525,890 G4-forming sequences in human genomic DNA. This suggests that there are G4-forming sequences for which G4 folding is induced by a G4 ligand. G4 formations may be induced by G4 binding proteins in cells. Therefore, extraction of the ligand-inducible G4 clusters would be important. To extract G4 clusters, we used G4-seq results performed in K + -stabilised condition. The average number of G4 was 6.2 per 10 kbp on the 25 γH2AX-enriched genes. We defined the threshold for identification of the ligand-inducible G4 clusters from 9,651 G4 clusters as ≤6 G4 identified by G4-seq using K + . Using this criterion, we extracted 1,622 ligand-inducible G4 clusters (Supplementary Dataset 5).
Among identified 9,651 G4 clusters, 3,766 G4 clusters (39.0%) contained at least one transcriptional start site (Supplementary Dataset 6). In the entire genome, 25301 windows (8.3%) contain at least one transcriptional start site, indicating that G4 clusters are enriched for transcriptional start sites. It has been reported that ATRX interacts with PQS clusters to regulate gene expression 30 . These results, therefore, suggest that genes are regulated by G4 clusters rather than by one G4 structure.

Discussion
ChIP-Seq for the DNA damage marker γH2AX is useful to identify G4 clusters that fold into G4 structures in vivo; however, the method was applied on only oncogenes and tumour suppressor genes because of the broad coverage of γH2AX signatures 22 . Moreover, G4 structure formation would be affected by the chromatin state in vivo. In contrast, we identified 9,651 G4 clusters in the whole human genome because our method directly detected G4 clusters that were stabilised by a G4 ligand in vitro. We also demonstrated that 3,766 G4 clusters contain at least one transcriptional start site. The phi29 DNA polymerase used in the WGA reaction is a replicative polymerase from the Bacillus subtilis phage phi29 (Φ29), which has strand displacement and processive synthesis properties 31 , meaning that it can imitate the replication process. G4-forming structures that play critical roles in replication have been identified, such as Rif1-binding sequences 32 . Therefore, the G4 clusters identified in this study would be involved in not only transcriptional regulation but also DNA replication. Several structure-specific ligands that have the specificity to bind to different G4 structures have been reported; for example, 4,2-L2H2-6OTD and 5,1-L2H2-6OTD induced an antiparallel topology and a hybrid-type topology of telomeric DNA, respectively 33 . In addition, InEt2 and InPr2 G4 ligands specifically stabilised the parallel topology of c-MYC, c-KIT1 and c-KIT2 G4 structures 34 . The G4 ligand inhibited DNA polymerase extension on c-MYC G4 DNA, whereas no significant amount of stop product of telomeric G4 DNA was observed. Our method may thus contribute to the identification of G4 clusters containing specific G4 structures using topology-specific ligands. Moreover, the binding specificity of a telomestatin derivative against multimeric G4 structures can be improved by multimerization of the G4 ligand 35 . These results suggest that gene-specific G4 cluster ligands may be developed by multimerization of suitable G4 ligands, with designing the linker lengths.
It has been reported that the Bcl-2 G4 structure and quadruplex structure of C9orf72 repeat were stabilised by DNA methylation 36,37 . We reported that the initial elongation efficiency of PCR decreased with increasing DNA methylation levels in VEGFA and RET G4-forming sequences 38 , indicating that the G4 structures are also stabilised by DNA methylation. These reports suggest that our method could be applied to detect epigenetic modification by identifying G4 clusters stabilised by DNA methylation.

Methods
Analysis of inhibitory activity of 7OTD on PCR. Human genomic DNA was purified from HeLa (RBRC-RCB0007, RIKEN) cells using DNeasy blood and tissue kit (Qiagen). As G4-forming regions, c-MYC, c-KIT, BCL2 and VEGFA G4-forming regions were used. As non-G4-forming regions, MBD3L3, CD4, CNDP2, and SOD1 regions were used. All PCR primers (Table S1)  Whole genome amplification in the presence of 7OTD. In the presence or absence of 1 µM 7OTD, WGA was performed using REPLI-g Mini Kit (QIAGEN), in accordance with the manufacturer's protocol. Briefly, 2.5-µL of HeLa genomic DNA (100 ng) was incubated with 2.5-µL of denaturation solution at room temperature for 3 min and then 5-µL of neutralisation buffer was added. Next, 40-µL of master mix containing phi29 DNA polymerase with 7OTD was added and incubated at 30 °C for 16 h. To remove 7OTD, 25 µL of 8 M LiCl was added to 25-µL of the WGA product and then incubated at 95 °C for 5 min. The 7OTD was removed by LiCl and Illustra MicroSpin G-25 Columns (GE), as described above. The purified WGA product (3.2-µL) was used as a template for PCR in a 20-μL reaction volume, as described above.