High-throughput sequencing of DNA G-quadruplex structures in the human genome

Chambers, Vicki S; Marsico, Giovanni; Boutell, Jonathan M; Di Antonio, Marco; Smith, Geoffrey P; Balasubramanian, Shankar

doi:10.1038/nbt.3295

Letter
Published: 20 July 2015

Vicki S Chambers (ORCID: orcid.org/0000-0003-3339-1813)^1,na1, Giovanni Marsico^2,na1, Jonathan M Boutell^3, Marco Di Antonio^1,2, Geoffrey P Smith^3 & Shankar Balasubramanian^1,2,4

Nature Biotechnology volume 33, pages 877–881 (2015)

Abstract

G-quadruplexes (G4s) are nucleic acid secondary structures that form within guanine-rich DNA or RNA sequences. G4 formation can affect chromatin architecture and gene regulation and has been associated with genomic instability, genetic diseases and cancer progression^1,2,3,4. Here we present a high-resolution sequencing-based method to detect G4s in the human genome. We identified 716,310 distinct G4 structures, 451,646 of which were not predicted by computational methods^5,6,7. These included previously uncharacterized noncanonical long loop and bulged structures^8,9. We observed a high G4 density in functional regions, such as 5′ untranslated regions and splicing sites, as well as in genes previously not predicted to contain these structures (such as BRCA2). G4 formation was significantly associated with oncogenes, tumor suppressors and somatic copy number alterations related to cancer development^10. The G4s identified in this study may therefore represent promising targets for cancer intervention.

**Figure 1: A schematic of the G4-seq method.**

**Figure 2: Analysis of G4-seq for known G4 sequences.**

**Figure 3: Structural analysis of OQs.**

**Figure 4: Genomic distribution of experimentally determined OQs.**

Accession codes

Primary accessions

Gene Expression Omnibus

GSE63874

References

1. Rodriguez, R. & Miller, K.M. Unravelling the genomic targets of small-molecules using high-throughput sequencing. Nat. Rev. Genet. 15, 783–796 (2014).
2. Wolfe, A.L. et al. RNA G-quadruplexes cause eIF4A-dependent oncogene translation in cancer. Nature 513, 65–70 (2014).
3. Maizels, N. Genomic stability: FANCJ-dependent G4 DNA repair. Curr. Biol. 18, R613–R614 (2008).
4. Haeusler, A.R. et al. C9orf72 nucleotide repeat structures initiate molecular cascades of disease. Nature 507, 195–200 (2014).
5. Huppert, J.L. & Balasubramanian, S. Prevalence of quadruplexes in the human genome. Nucleic Acids Res. 33, 2908–2916 (2005).
6. Eddy, J. & Maizels, N. Gene function correlates with potential for G4 DNA formation in the human genome. Nucleic Acids Res. 34, 3887–3896 (2006).
7. Kikin, O., D'Antonio, L. & Bagga, P.S. QGRS Mapper: a web-based server for predicting G-quadruplexes in nucleotide sequences. Nucleic Acids Res. 34, W676–W682 (2006).
8. Mukundan, V.T. & Phan, A.T. Bulges in G-quadruplexes: broadening the definition of G-quadruplex-forming sequences. J. Am. Chem. Soc. 135, 5017–5028 (2013).
9. Guédin, A., Gros, J., Alberti, P. & Mergny, J.L. How long is too long? Effects of loop size on G-quadruplex stability. Nucleic Acids Res. 38, 7858–7868 (2010).
10. Zack, T.I. et al. Pan-cancer patterns of somatic copy number alteration. Nat. Genet. 45, 1134–1140 (2013).
11. Bochman, M.L., Paeschke, K. & Zakian, V.A. DNA secondary structures: stability and function of G-quadruplex structures. Nat. Rev. Genet. 13, 770–780 (2012).
12. Cruz, J.A. & Westhof, E. The dynamic landscapes of RNA architecture. Cell 136, 604–609 (2009).
13. Davis, J.T. G-quartets 40 years later: from 5′-GMP to molecular biology and supramolecular chemistry. Angew. Chem. Int. Ed. Engl. 43, 668–698 (2004).
14. Biffi, G., Tannahill, D., McCafferty, J. & Balasubramanian, S. Quantitative visualization of DNA G-quadruplex structures in human cells. Nat. Chem. 5, 182–186 (2013).
15. Henderson, A. et al. Detection of G-quadruplex DNA in mammalian cells. Nucleic Acids Res. 42, 860–869 (2014).
16. Biffi, G., Tannahill, D., Miller, J., Howat, W.J. & Balasubramanian, S. Elevated levels of G-quadruplex formation in human stomach and liver cancer tissues. PLoS ONE 9, e102711 (2014).
17. Weitzmann, M.N., Woodford, K.J. & Usdin, K. The development and use of a DNA polymerase arrest assay for the evaluation of parameters affecting intrastrand tetraplex formation. J. Biol. Chem. 271, 20958–20964 (1996).
18. Bentley, D.R. et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456, 53–59 (2008).
19. Rodriguez, R. et al. A novel small molecule that alters shelterin integrity and triggers a DNA-damage response at telomeres. J. Am. Chem. Soc. 130, 15758–15759 (2008).
20. Hud, N.V., Smith, F.W., Anet, F.A.L. & Feigon, J. The selectivity for K⁺ versus Na⁺ in DNA quadruplexes is dominated by relative free energies of hydration: A thermodynamic analysis by H-1 NMR. Biochemistry 35, 15383–15390 (1996).
21. Ewing, B., Hillier, L., Wendl, M.C. & Green, P. Base-calling of automated sequencer traces using Phred. 1. Accuracy Assessment. Genome Res. 8, 175–185 (1998).
22. Rodriguez, R. et al. Small-molecule-induced DNA damage identifies alternative DNA structures in human genes. Nat. Chem. Biol. 8, 301–310 (2012).
23. Fernando, H. et al. A conserved quadruplex motif located in a transcription activation site of the human c-kit oncogene. Biochemistry 45, 7854–7860 (2006).
24. Rankin, S. et al. Putative DNA quadruplex formation within the human c-kit oncogene. J. Am. Chem. Soc. 127, 10584–10589 (2005).
25. Marchand, A. et al. Ligand-induced conformational changes with cation ejection upon binding to human telomeric DNA G-quadruplexes. J. Am. Chem. Soc. 137, 750–756 (2015).
26. De Cian, A., DeLemos, E., Mergny, J.-L., Teulade-Fichou, M.-P. & Monchaud, D. Highly efficient G-quadruplex recognition by bisquinolinium compounds. J. Am. Chem. Soc. 129, 1856–1857 (2007).
27. Palumbo, S.L., Ebbinghaus, S.W. & Hurley, L.H. Formation of a unique end-to-end stacked pair of G-quadruplexes in the hTERT core promoter with implications for inhibition of telomerase by G-quadruplex-interactive ligands. J. Am. Chem. Soc. 131, 10878–10891 (2009).
28. Bugaut, A. & Balasubramanian, S. A sequence-independent study of the influence of short loop lengths on the stability and topology of intramolecular DNA G-quadruplexes. Biochemistry 47, 689–697 (2008).
29. Siddiqui-Jain, A., Grand, C.L., Bearss, D.J. & Hurley, L.H. Direct evidence for a G-quadruplex in a promoter region and its targeting with a small molecule to repress c-MYC transcription. Proc. Natl. Acad. Sci. USA 99, 11593–11598 (2002).
30. Paeschke, K. et al. Pif1 family helicases suppress genome instability at G-quadruplex motifs. Nature 497, 458–462 (2013).
31. Cock, P.J., Fields, C.J., Goto, N., Heuer, M.L. & Rice, P.M. The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res. 38, 1767–1771 (2010).

Acknowledgements

We thank C. Lowe and D. Tannahill for critical reading of the manuscript and D. Beraldi for technical support. We thank P. McCauley (Illumina) who prepared the custom sequencing buffers. We are grateful to the Biotechnology and Biological Sciences Research Council (BBSRC) and Illumina for the studentship supporting V.S.C. (BB/I015477/1). The S.B. research group is supported by program funding from Cancer Research UK and the European Research Council and project funding from BBSRC.

Author information

Vicki S Chambers and Giovanni Marsico: These authors contributed equally to this work.

Authors and Affiliations

Department of Chemistry, University of Cambridge, Cambridge, UK
Vicki S Chambers, Marco Di Antonio & Shankar Balasubramanian
Cancer Research UK, Cambridge Research Institute, Li Ka Shing Centre, Cambridge, UK
Giovanni Marsico, Marco Di Antonio & Shankar Balasubramanian
Illumina Cambridge Ltd., Chesterford Research Park, Little Chesterford, Saffron Walden, UK
Jonathan M Boutell & Geoffrey P Smith
School of Clinical Medicine, University of Cambridge, Cambridge, UK
Shankar Balasubramanian

Contributions

V.S.C. and J.M.B. carried out the experiments. G.M. designed, implemented and performed the analysis. V.S.C., J.M.B., G.M., M.D.A., S.B. and G.P.S. designed the experiments. V.S.C., G.M., M.D.A. and S.B. interpreted the results and co-wrote the manuscript with input from all authors.

Corresponding author

Correspondence to Shankar Balasubramanian.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Overall sequencing quality in sequencing experiments with the cations Li⁺, Na⁺ and K⁺.

Each plot visually shows base calling quality (Phred quality score, Q; y-axes) for the 250 sequenced bases (x-axes), in two independent experiments, with sequencing buffers containing Li⁺ (top), Na⁺ (middle) and K⁺ (bottom), as generated by the program FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Yellow bars and black whiskers are box plots for the respective base positions; red lines are median values; blue lines are mean values.
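For readers less familiar with the Phred scale on the y-axes, the standard relation between a quality score Q and the estimated base-calling error probability P is Q = −10·log10(P) (see Ewing et al. in the reference list). The sketch below only illustrates that conversion; the function names are illustrative and are not taken from the study or from FastQC.

```python
import math

def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score Q to the estimated error probability P."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p: float) -> float:
    """Convert an error probability P (0 < P <= 1) to a Phred quality score Q."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    return -10 * math.log10(p)

# Print the error probability implied by a few common quality scores.
for q in (10, 20, 30, 40):
    print(f"Q{q}: estimated error probability = {phred_to_error_prob(q):.4f}")
```

On this scale, a drop of 10 in Q corresponds to a tenfold increase in the estimated error probability.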

Supplementary Figure 2 Sequencing quality and sequencing errors (% mismatches) for control sequences.

Bar plots showing: (a) the average Phred quality score (Q) and (b) % mismatches for the 4 control sequences when sequencing with different cations Li⁺ (left), Na⁺ (middle) and K⁺ (right). c-kit, c-myc: positive controls; c-myc-opp, c-myc-mut: negative controls (see Methods). Data are taken from a number of independent sequencing experiments: 3 for Na⁺, 2 for Li⁺ and K⁺. The numbers of different control sequences (i.e. independent sequencing clusters on the flow cell) in the combined experiments are (order: c-kit, c-myc-opp, c-myc, c-myc-mut): 2741, 1139, 1040, 10945 for Li⁺; 8235, 3076, 2787, 26974 for Na⁺; 2935, 1315, 1, 12809 for K⁺. Error bars are standard deviations. No error bar is present for c-myc in K⁺ (n=1).

Supplementary Figure 3 Sequencing errors (% mismatches) for controls in PDS conditions.

% mismatches for the control sequences in the same sequencing experiment with Na⁺ sequencing buffers during the first read (Read-1; left) followed by the addition of the small-molecule PDS in Na⁺ throughout the second read (Read-2; right). Error bars are SEMs (respectively: 0.16, 0.02, 0.18 and 0.07 for left plot; 0.12, 0.08, 0.15 and 0.09 for right plot). N = 948, 367, 367 and 3990 for c-kit, c-myc-opp, c-myc, c-myc-mut.
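As a rough illustration of the two quantities reported here, the sketch below computes a per-read mismatch percentage against a known control sequence and then the mean and standard error of the mean (SEM) across reads. It assumes the SEM is taken across per-read values, which the legend does not specify; the sequences are toy data, and the function names are hypothetical.

```python
import math
import statistics

def percent_mismatch(read: str, reference: str) -> float:
    """Percentage of positions at which a read differs from its reference."""
    if len(read) != len(reference):
        raise ValueError("read and reference must have the same length")
    mismatches = sum(1 for a, b in zip(read, reference) if a != b)
    return 100.0 * mismatches / len(reference)

def mean_and_sem(values):
    """Return (mean, SEM), where SEM = sample standard deviation / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to estimate the SEM")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)

# Toy data (hypothetical, not from the study).
reference = "ACGTACGTAC"
reads = ["ACGTACGTAC", "ACGTACGAAC", "ACGTTCGTAG", "ACGTACGTAC"]

per_read = [percent_mismatch(r, reference) for r in reads]
mean, sem = mean_and_sem(per_read)
print(f"per-read % mismatches: {per_read}")
print(f"mean = {mean:.2f}%, SEM = {sem:.2f}")
```

Because the SEM needs at least two observations, a group with a single read cannot be given an error bar under this definition.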

Supplementary Figure 4 Heat-map of c-kit control sequence representing the base called at each position along the template in read 1 Na⁺ conditions and read 2 Na⁺ + PDS G4-stabilizing conditions.

Each line represents an independent read. Each column is a position in the template. The different colors represent the different bases which are called: A= green, C= blue, T= red, G= orange. As depicted in the figure below, in Read-1 (top panel) the sequencing polymerase incorporates the expected bases with high fidelity until the end of the template, after which scrambled sequencing is observed as the polymerase no longer has a template to process. In Read-2 (bottom panel), under G4-stabilizing conditions, a similar effect on base-calling is observed from the G4 structure onwards, suggesting that the polymerase stalls at the G4 structure. Ordinarily, DNA fragments displaying such truncations are removed in the data analysis, whereas in our study they are intentionally retained.

Supplementary Figure 5 Phred Quality Score (Q) trend along the c-kit control sequence.

Phred Quality Score (Q) along the c-kit control sequence in: Read-1 Na⁺ conditions (top panel) and Read-2 Na⁺ + PDS G4 stabilizing conditions (bottom panel). A clear drop in Q score is observed in Na⁺ at the truncation site (end of the fragment). A similar effect can be observed at the G4 start site exclusively under G4-stabilizing conditions.

Supplementary Figure 6 Overlap between OQs in K⁺ and Na⁺ + PDS conditions.

a) Venn diagram showing the overlap between the two conditions. The numbers in brackets at the top indicate the total OQs detected in K⁺ (525,890) and Na⁺+PDS (716,310); the percentages on the side (27% and 46%) indicate the OQs detected exclusively by either condition; the percentages in the middle overlapping both circles indicate OQs detected by both conditions, as percentage of total K⁺ OQs (73%) or total PDS OQs (54%). The 27% OQs detected solely in K⁺ displayed an average mismatch percentage of ~18.5% in PDS (panel b); the 46% OQs detected solely in PDS displayed an average mismatch percentage of ~10% in K⁺ (panel c). The average mismatch percentage of randomly selected genomic locations is below 2%. b) Mismatches for OQs detected in K⁺ only (27% subset in the Venn diagram). c) Mismatches for OQs detected in PDS only (46% subset in the Venn diagram). In both b) and c), red lines indicate the thresholds set for the two data sets, 18% for K⁺ and 25% for PDS (Methods).
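The figures in this legend can be cross-checked with simple arithmetic. The sketch below derives the number of shared OQs from each condition's total and rounded percentage, then applies the two stated thresholds to the reported average mismatch values. The inclusive comparison (`>=`) is an assumption; the exact calling procedure is described in the Methods and is not reproduced here.

```python
# Totals and rounded percentages as stated in the legend.
K_TOTAL = 525_890      # OQs detected in K+
PDS_TOTAL = 716_310    # OQs detected in Na+ + PDS

shared_from_k = 0.73 * K_TOTAL      # 73% of K+ OQs were also seen in PDS
shared_from_pds = 0.54 * PDS_TOTAL  # 54% of PDS OQs were also seen in K+
print(f"Shared OQs implied by the K+ figures:  ~{shared_from_k:,.0f}")
print(f"Shared OQs implied by the PDS figures: ~{shared_from_pds:,.0f}")

# Mismatch thresholds (%) stated in the legend for the two data sets.
THRESHOLDS = {"K+": 18.0, "PDS": 25.0}

def passes_threshold(mismatch_pct: float, condition: str) -> bool:
    """True if a mismatch percentage reaches the threshold for a condition.

    Assumption: the comparison is inclusive; the legend does not say.
    """
    return mismatch_pct >= THRESHOLDS[condition]

# K+-only OQs average ~18.5% mismatches in PDS; PDS-only OQs average ~10% in K+.
print(passes_threshold(18.5, "PDS"))
print(passes_threshold(10.0, "K+"))
```

The two estimates of the shared set agree to within about 1%, as expected from rounded percentages. Both condition-specific subsets sit below the other condition's threshold on average while remaining well above the background of under 2%.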

Supplementary Figure 7 G4-seq analysis obtained using PhenDC3 (1 μM) as a G4-stabilizing ligand.

a) % mismatches for the control sequences in the same sequencing experiment with Na⁺ sequencing buffers for the first read (Read-1; left) followed by the addition of the small-molecule PhenDC3 in Na⁺ throughout the second read (Read-2; right). b) % OQs detected in both K⁺ and PDS G4-Seq experiments (Left), K⁺ and PhenDC3 (Middle), PDS and PhenDC3 (Right). 85% of all OQs detected in PDS were also observed with PhenDC3. c) % PQs detected in both K⁺ and PDS G4-Seq experiments (Left), K⁺ and PhenDC3 (Middle), PDS and PhenDC3 (Right). 93% of all PQs detected in PDS were also observed with PhenDC3. d) % predicted G4s (PQs) among all the detected G4s (OQs) in different G4-Seq experiments. PhenDC3 and PDS show similar capability to detect G4s genome-wide. e) Summary of the comparison between different G4-Seq experiments.

Supplementary Figure 8 Different families of G-quadruplex structures.

Left: canonical PQs predicted by Quadparser (L1-3=N1-7, with N=A|C|T|G). Middle: PQs with longer loops (L1-3=N8-12 or L2=N8-21). Right: PQs with a single bulge B1=H1-7 or multiple bulges B2=H1-5 (H=A|T|C).
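The canonical family can be expressed as a regular expression. The sketch below assumes the commonly stated Quadparser folding rule of four runs of at least three guanines separated by three loops of 1 to 7 nucleotides (Huppert & Balasubramanian, in the reference list). The legend itself gives only the loop lengths, so the G-run length is an assumption here. The sketch scans one strand only, and the function name and toy input are illustrative.

```python
import re

# Assumed canonical rule: G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+, with N = A|C|G|T.
CANONICAL_PQ = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)

def find_canonical_pqs(seq: str):
    """Return (start, end, motif) for each non-overlapping canonical match."""
    seq = seq.upper()
    return [(m.start(), m.end(), m.group()) for m in CANONICAL_PQ.finditer(seq)]

# Toy input: four human telomeric repeats with flanking bases.
example = "TTAGGGTTAGGGTTAGGGTTAGGGTT"
for start, end, motif in find_canonical_pqs(example):
    print(start, end, motif)
```

A genome-wide scan would also need to search the complementary strand (equivalently, the C-rich version of the pattern) and handle overlapping candidates, both of which are omitted from this sketch.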

Supplementary Figure 9 Detection of OQs representing different G-quadruplex structural families in PDS versus K⁺ conditions.

Fold enrichment (ratio) between the numbers of OQs in PDS over K⁺ for each structural family. Values > 1 indicate higher numbers in PDS. G-quadruplex structural families: Loop 1-3; Loop 4-5; Loop 6-7: OQs with at least one loop of the indicated length; Long loops: OQs with any loop of length 8 to 12 for L1-3 or 8 to 21 for L2; Bulges: OQs with one bulge of 1 to 7 bases (A, T, C) or multiple bulges of 1 base.

Supplementary Figure 10 Comparison of genomic regions in PDS and K⁺ sequencing conditions.

a) Genome browser view of a genomic region within the MYC oncogene. Red and orange tracks: % mismatches in reads aligning to the reverse strand (-) for PDS and K⁺, respectively. OQ intervals are shown as red and orange bars below the corresponding peaks. b) Genome browser view of a genomic region within the MYL5-MFSD7 gene. Black and blue tracks: % mismatches in reads aligning to the forward strand (+) for PDS and K⁺, respectively. OQ intervals are shown as black and blue bars below the corresponding peaks. c) Genome browser view of a genomic region within the MYL9 gene. All colors and features as in a). See Supplementary Table 5 for sequence details. For all panels, OQs not predicted by Quadparser are indicated by * and Quadparser PQs are shown as black bars.

Supplementary Figure 11 Comparison of forward versus reverse strands in PDS sequencing conditions for the MYL9 gene.

Genomic region within the MYL9 gene. Red and black tracks: % mismatches in reads aligning to the reverse strand (-) and forward strand (+), respectively. OQ intervals are shown as red and black bars below the corresponding peaks. Quadparser PQs are shown below in black. OQs not predicted by Quadparser are indicated by asterisks (*). See Supplementary Table 5 for sequence details.

Supplementary Figure 12 Circular dichroism (CD) spectra of noncanonical G4 sequences identified in the FOSB and MUC20 genes as OQs by G4-seq.

a) A sequence from the FOSB gene promoter with a single bulge in one G-run: GGC GCG GGC GGG GCG CGG G; b) an additional sequence from the FOSB gene promoter with a long loop: GGG GCG GGT GAC GTA AGC AGG GGG GCG GG; c) a sequence from the FOSB gene 5′ UTR containing both a long loop and two bulges: GAG GTA CAG CGG CAT CCT GTG GGG GCC TGG G; d) a sequence from the MUC20 gene promoter containing multiple bulges: GGA GTG TTT ACC GGG CAA GGA AGG CAG GG. The CD analysis is consistent with G4 formation for all of these sequences (λ_max ~263 nm; λ_min ~240 nm)^S1.
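As a quick check that these four sequences fall outside the canonical rule, the sketch below tests each against an assumed canonical pattern (four runs of at least three guanines separated by loops of 1 to 7 nucleotides, the commonly stated Quadparser rule) and lists the lengths of their G-runs. The dictionary labels are shorthand introduced for this example.

```python
import re

# Assumed canonical rule: G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+.
CANONICAL_PQ = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)

# Sequences copied from the legend; spaces are removed before matching.
SEQUENCES = {
    "FOSB promoter (bulge)": "GGC GCG GGC GGG GCG CGG G",
    "FOSB promoter (long loop)": "GGG GCG GGT GAC GTA AGC AGG GGG GCG GG",
    "FOSB 5' UTR (long loop, bulges)": "GAG GTA CAG CGG CAT CCT GTG GGG GCC TGG G",
    "MUC20 promoter (bulges)": "GGA GTG TTT ACC GGG CAA GGA AGG CAG GG",
}

def is_canonical(seq: str) -> bool:
    """True if the sequence contains a match to the assumed canonical rule."""
    return CANONICAL_PQ.search(seq.replace(" ", "").upper()) is not None

def g_run_lengths(seq: str):
    """Lengths of consecutive G-runs, in 5' to 3' order."""
    return [len(run) for run in re.findall(r"G+", seq.replace(" ", "").upper())]

for label, seq in SEQUENCES.items():
    print(f"{label}: canonical={is_canonical(seq)}, G-runs={g_run_lengths(seq)}")
```

None of the four sequences matches the assumed pattern, which fits their description as noncanonical.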

Cite this article

Chambers, V., Marsico, G., Boutell, J. et al. High-throughput sequencing of DNA G-quadruplex structures in the human genome. Nat Biotechnol 33, 877–881 (2015). https://doi.org/10.1038/nbt.3295

Received: 12 January 2015
Accepted: 19 June 2015
Published: 20 July 2015
Issue Date: August 2015
DOI: https://doi.org/10.1038/nbt.3295
