High-throughput sequencing of DNA G-quadruplex structures in the human genome

Abstract

G-quadruplexes (G4s) are nucleic acid secondary structures that form within guanine-rich DNA or RNA sequences. G4 formation can affect chromatin architecture and gene regulation and has been associated with genomic instability, genetic diseases and cancer progression [1,2,3,4]. Here we present a high-resolution sequencing-based method to detect G4s in the human genome. We identified 716,310 distinct G4 structures, 451,646 of which were not predicted by computational methods [5,6,7]. These included previously uncharacterized noncanonical long loop and bulged structures [8,9]. We observed a high G4 density in functional regions, such as 5′ untranslated regions and splicing sites, as well as in genes previously not predicted to contain these structures (such as BRCA2). G4 formation was significantly associated with oncogenes, tumor suppressors and somatic copy number alterations related to cancer development [10]. The G4s identified in this study may therefore represent promising targets for cancer intervention.

Figure 1: A schematic of the G4-seq method.
Figure 2: Analysis of G4-seq for known G4 sequences.
Figure 3: Structural analysis of OQs.
Figure 4: Genomic distribution of experimentally determined OQs.

Accession codes

Primary accessions

Gene Expression Omnibus

References

  1. Rodriguez, R. & Miller, K.M. Unravelling the genomic targets of small-molecules using high-throughput sequencing. Nat. Rev. Genet. 15, 783–796 (2014).
  2. Wolfe, A.L. et al. RNA G-quadruplexes cause eIF4A-dependent oncogene translation in cancer. Nature 513, 65–70 (2014).
  3. Maizels, N. Genomic stability: FANCJ-dependent G4 DNA repair. Curr. Biol. 18, R613–R614 (2008).
  4. Haeusler, A.R. et al. C9orf72 nucleotide repeat structures initiate molecular cascades of disease. Nature 507, 195–200 (2014).
  5. Huppert, J.L. & Balasubramanian, S. Prevalence of quadruplexes in the human genome. Nucleic Acids Res. 33, 2908–2916 (2005).
  6. Eddy, J. & Maizels, N. Gene function correlates with potential for G4 DNA formation in the human genome. Nucleic Acids Res. 34, 3887–3896 (2006).
  7. Kikin, O., D'Antonio, L. & Bagga, P.S. QGRS Mapper: a web-based server for predicting G-quadruplexes in nucleotide sequences. Nucleic Acids Res. 34, W676–W682 (2006).
  8. Mukundan, V.T. & Phan, A.T. Bulges in G-quadruplexes: broadening the definition of G-quadruplex-forming sequences. J. Am. Chem. Soc. 135, 5017–5028 (2013).
  9. Guédin, A., Gros, J., Alberti, P. & Mergny, J.L. How long is too long? Effects of loop size on G-quadruplex stability. Nucleic Acids Res. 38, 7858–7868 (2010).
  10. Zack, T.I. et al. Pan-cancer patterns of somatic copy number alteration. Nat. Genet. 45, 1134–1140 (2013).
  11. Bochman, M.L., Paeschke, K. & Zakian, V.A. DNA secondary structures: stability and function of G-quadruplex structures. Nat. Rev. Genet. 13, 770–780 (2012).
  12. Cruz, J.A. & Westhof, E. The dynamic landscapes of RNA architecture. Cell 136, 604–609 (2009).
  13. Davis, J.T. G-quartets 40 years later: from 5′-GMP to molecular biology and supramolecular chemistry. Angew. Chem. Int. Ed. Engl. 43, 668–698 (2004).
  14. Biffi, G., Tannahill, D., McCafferty, J. & Balasubramanian, S. Quantitative visualization of DNA G-quadruplex structures in human cells. Nat. Chem. 5, 182–186 (2013).
  15. Henderson, A. et al. Detection of G-quadruplex DNA in mammalian cells. Nucleic Acids Res. 42, 860–869 (2014).
  16. Biffi, G., Tannahill, D., Miller, J., Howat, W.J. & Balasubramanian, S. Elevated levels of G-quadruplex formation in human stomach and liver cancer tissues. PLoS ONE 9, e102711 (2014).
  17. Weitzmann, M.N., Woodford, K.J. & Usdin, K. The development and use of a DNA polymerase arrest assay for the evaluation of parameters affecting intrastrand tetraplex formation. J. Biol. Chem. 271, 20958–20964 (1996).
  18. Bentley, D.R. et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456, 53–59 (2008).
  19. Rodriguez, R. et al. A novel small molecule that alters shelterin integrity and triggers a DNA-damage response at telomeres. J. Am. Chem. Soc. 130, 15758–15759 (2008).
  20. Hud, N.V., Smith, F.W., Anet, F.A.L. & Feigon, J. The selectivity for K+ versus Na+ in DNA quadruplexes is dominated by relative free energies of hydration: A thermodynamic analysis by H-1 NMR. Biochemistry 35, 15383–15390 (1996).
  21. Ewing, B., Hillier, L., Wendl, M.C. & Green, P. Base-calling of automated sequencer traces using Phred. 1. Accuracy Assessment. Genome Res. 8, 175–185 (1998).
  22. Rodriguez, R. et al. Small-molecule-induced DNA damage identifies alternative DNA structures in human genes. Nat. Chem. Biol. 8, 301–310 (2012).
  23. Fernando, H. et al. A conserved quadruplex motif located in a transcription activation site of the human c-kit oncogene. Biochemistry 45, 7854–7860 (2006).
  24. Rankin, S. et al. Putative DNA quadruplex formation within the human c-kit oncogene. J. Am. Chem. Soc. 127, 10584–10589 (2005).
  25. Marchand, A. et al. Ligand-induced conformational changes with cation ejection upon binding to human telomeric DNA G-quadruplexes. J. Am. Chem. Soc. 137, 750–756 (2015).
  26. De Cian, A., DeLemos, E., Mergny, J.-L., Teulade-Fichou, M.-P. & Monchaud, D. Highly efficient G-quadruplex recognition by bisquinolinium compounds. J. Am. Chem. Soc. 129, 1856–1857 (2007).
  27. Palumbo, S.L., Ebbinghaus, S.W. & Hurley, L.H. Formation of a unique end-to-end stacked pair of G-quadruplexes in the hTERT core promoter with implications for inhibition of telomerase by G-quadruplex-interactive ligands. J. Am. Chem. Soc. 131, 10878–10891 (2009).
  28. Bugaut, A. & Balasubramanian, S. A sequence-independent study of the influence of short loop lengths on the stability and topology of intramolecular DNA G-quadruplexes. Biochemistry 47, 689–697 (2008).
  29. Siddiqui-Jain, A., Grand, C.L., Bearss, D.J. & Hurley, L.H. Direct evidence for a G-quadruplex in a promoter region and its targeting with a small molecule to repress c-MYC transcription. Proc. Natl. Acad. Sci. USA 99, 11593–11598 (2002).
  30. Paeschke, K. et al. Pif1 family helicases suppress genome instability at G-quadruplex motifs. Nature 497, 458–462 (2013).
  31. Cock, P.J., Fields, C.J., Goto, N., Heuer, M.L. & Rice, P.M. The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res. 38, 1767–1771 (2010).

Acknowledgements

We thank C. Lowe and D. Tannahill for critical reading of the manuscript and D. Beraldi for technical support. We thank P. McCauley (Illumina) who prepared the custom sequencing buffers. We are grateful to the Biotechnology and Biological Sciences Research Council (BBSRC) and Illumina for the studentship supporting V.S.C. (BB/I015477/1). The S.B. research group is supported by program funding from Cancer Research UK and the European Research Council and project funding from BBSRC.

Author information

Contributions

V.S.C. and J.M.B. carried out the experiments. G.M. designed, implemented and performed the analysis. V.S.C., J.M.B., G.M., M.D.A., S.B. and G.P.S. designed the experiments. V.S.C., G.M., M.D.A. and S.B. interpreted the results and co-wrote the manuscript with input from all authors.

Corresponding author

Correspondence to Shankar Balasubramanian.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Overall sequencing quality in sequencing experiments with the cations Li+, Na+ and K+.

Each plot visually shows base calling quality (Phred quality score, Q; y-axes) for the 250 sequenced bases (x-axes), in two independent experiments, with sequencing buffers containing Li+ (top), Na+ (middle) and K+ (bottom), as generated by the program FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Yellow bars and black whiskers are box plots for the respective base positions; red lines are median values; blue lines are mean values.
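For readers less familiar with the scale used in these plots, the Phred quality score is conventionally defined as Q = -10 * log10(P), where P is the estimated probability that a base call is wrong. The short Python sketch below illustrates that conversion, together with decoding of a Sanger-style FASTQ quality string (ASCII offset 33, the variant discussed in ref. 31). The function names are illustrative only; they are not taken from FastQC or from the Supplementary Code.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred score Q into an estimated base-calling error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability P (0 < P <= 1) into a Phred score."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


def decode_fastq_quality(quality: str, offset: int = 33) -> list:
    """Decode a FASTQ quality string into integer Phred scores.

    The default offset of 33 assumes Sanger-style encoding.
    """
    return [ord(char) - offset for char in quality]


if __name__ == "__main__":
    # Q30 corresponds to roughly one expected error per 1,000 base calls.
    print(phred_to_error_prob(30))
    print(decode_fastq_quality("I5!"))
```

On this scale, a drop from Q30 to Q10 means the estimated error rate rises from about 0.1% to about 10%.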

Supplementary Figure 2 Sequencing quality and sequencing errors (% mismatches) for control sequences.

Bar plots showing (a) the average Phred quality score (Q) and (b) % mismatches for the 4 control sequences when sequencing with the cations Li+ (left), Na+ (middle) and K+ (right). c-kit, c-myc: positive controls; c-myc-opp, c-myc-mut: negative controls (see Methods). Data are taken from several independent sequencing experiments: 3 for Na+, and 2 each for Li+ and K+. The numbers of control sequences (i.e. independent sequencing clusters on the flow cell) in the combined experiments are, in the order c-kit, c-myc-opp, c-myc, c-myc-mut: 2741, 1139, 1040, 10945 for Li+; 8235, 3076, 2787, 26974 for Na+; 2935, 1315, 1, 12809 for K+. Error bars are standard deviations. No error bar is shown for c-myc in K+ (n=1).

Supplementary Figure 3 Sequencing errors (% mismatches) for controls in PDS conditions.

% mismatches for the control sequences in the same sequencing experiment with Na+ sequencing buffers during the first read (Read-1; left) followed by the addition of the small-molecule PDS in Na+ throughout the second read (Read-2; right). Error bars are SEMs (respectively: 0.16, 0.02, 0.18 and 0.07 for left plot; 0.12, 0.08, 0.15 and 0.09 for right plot). N = 948, 367, 367 and 3990 for c-kit, c-myc-opp, c-myc, c-myc-mut.

Supplementary Figure 4 Heat-map of c-kit control sequence representing the base called at each position along the template in read 1 Na+ conditions and read 2 Na+ + PDS G4-stabilizing conditions.

Each line represents an independent read. Each column is a position in the template. The different colors represent the different bases which are called: A= green, C= blue, T= red, G= orange. As depicted in the figure below, in Read-1 (top panel) the sequencing polymerase incorporates the expected bases with high fidelity until the end of the template, after which scrambled sequencing is observed as the polymerase no longer has a template to process. In Read-2 (bottom panel), under G4-stabilizing conditions, a similar effect on base-calling is observed from the G4 structure onwards, suggesting that the polymerase stalls at the G4 structure. Ordinarily, DNA fragments displaying such truncations are removed in the data analysis, whereas in our study they are intentionally retained.
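The comparison of base calls against the template described in this legend can be summarized numerically as a mismatch percentage, either per read or per template position (one value per heat-map column). The following is a minimal Python sketch under the assumption that reads are already aligned base-for-base to the template; the function names and toy sequences are hypothetical and are not drawn from the authors' analysis scripts.

```python
def percent_mismatch(template: str, read: str) -> float:
    """Percentage of compared positions at which the read differs from the template."""
    compared = min(len(template), len(read))
    if compared == 0:
        raise ValueError("nothing to compare")
    mismatches = sum(1 for t, r in zip(template, read) if t != r)
    return 100.0 * mismatches / compared


def per_position_mismatch(template: str, reads: list) -> list:
    """Percentage of reads that disagree with the template at each position."""
    if not reads:
        raise ValueError("at least one read is required")
    profile = []
    for i, expected in enumerate(template):
        covering = [r for r in reads if i < len(r)]
        if not covering:
            profile.append(float("nan"))  # no read reaches this position
            continue
        wrong = sum(1 for r in covering if r[i] != expected)
        profile.append(100.0 * wrong / len(covering))
    return profile


if __name__ == "__main__":
    template = "ACGTACGTAC"
    # Toy "Read-2" calls that become scrambled from position 5 onwards.
    read2 = ["ACGTATTGCA", "ACGTACCATG"]
    print(percent_mismatch(template, read2[0]))
    print(per_position_mismatch(template, read2))
```

In such a profile, a sharp rise partway along the template is the kind of signature the heat-map shows from the G4 structure onwards under stabilizing conditions.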

Supplementary Figure 5 Phred Quality Score (Q) trend along the c-kit control sequence.

Phred Quality Score (Q) along the c-kit control sequence in: Read-1 Na+ conditions (top panel) and Read-2 Na+ + PDS G4 stabilizing conditions (bottom panel). A clear drop in Q score is observed in Na+ at the truncation site (end of the fragment). A similar effect can be observed at the G4 start site exclusively under G4-stabilizing conditions.

Supplementary Figure 6 Overlap between OQs in K+ and Na+ + PDS conditions.

a) Venn diagram showing the overlap between the two conditions. The numbers in brackets at the top indicate the total OQs detected in K+ (525,890) and Na+ + PDS (716,310); the percentages on the side (27% and 46%) indicate the OQs detected exclusively by either condition; the percentages in the middle overlapping both circles indicate OQs detected by both conditions, as a percentage of total K+ OQs (73%) or total PDS OQs (54%). The 27% of OQs detected solely in K+ displayed an average mismatch percentage of ~18.5% in PDS (panel b); the 46% of OQs detected solely in PDS displayed an average mismatch percentage of ~10% in K+ (panel c). The average mismatch percentage of randomly selected genomic locations is below 2%. b) Mismatches for OQs detected in K+ only (27% subset in the Venn diagram). c) Mismatches for OQs detected in PDS only (46% subset in the Venn diagram). In both b) and c), red lines indicate the thresholds set for the two data sets, 18% for K+ and 25% for PDS (Methods).
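To make the role of the two thresholds concrete, the sketch below assigns a region to a Venn segment from its mismatch percentage in each condition. Only the threshold values (18% for K+, 25% for PDS) come from this legend; the inclusive comparison, the function name and the example values are assumptions, and the full OQ-calling procedure in the Methods involves steps that are not reproduced here.

```python
K_THRESHOLD = 18.0    # % mismatches, K+ condition (from the legend)
PDS_THRESHOLD = 25.0  # % mismatches, Na+ + PDS condition (from the legend)


def classify_region(mismatch_k: float, mismatch_pds: float) -> str:
    """Assign a region to a Venn segment from its % mismatches in each condition.

    Treating a value equal to the threshold as a detection is an assumption.
    """
    in_k = mismatch_k >= K_THRESHOLD
    in_pds = mismatch_pds >= PDS_THRESHOLD
    if in_k and in_pds:
        return "both"
    if in_k:
        return "K+ only"
    if in_pds:
        return "PDS only"
    return "neither"


if __name__ == "__main__":
    # A region well above threshold in K+ but at ~18.5% in PDS falls in "K+ only",
    # even though 18.5% is far above the <2% background quoted in the legend.
    print(classify_region(30.0, 18.5))
    print(classify_region(10.0, 40.0))
```

This illustrates why the condition-specific subsets in panels b) and c) can still show elevated mismatch levels in the other condition: they fall below that condition's threshold rather than at background.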

Supplementary Figure 7 G4-seq analysis obtained using PhenDC3 (1 μM) as a G4-stabilizing ligand.

a) % mismatches for the control sequences in the same sequencing experiment with Na+ sequencing buffers for the first read (Read-1; left) followed by the addition of the small-molecule PhenDC3 in Na+ throughout the second read (Read-2; right). b) % OQs detected in both K+ and PDS G4-Seq experiments (Left), K+ and PhenDC3 (Middle), PDS and PhenDC3 (Right). 85% of all OQs detected in PDS were also observed with PhenDC3. c) % PQs detected in both K+ and PDS G4-Seq experiments (Left), K+ and PhenDC3 (Middle), PDS and PhenDC3 (Right). 93% of all PQs detected in PDS were also observed with PhenDC3. d) % predicted G4s (PQs) among all the detected G4s (OQs) in different G4-Seq experiments. PhenDC3 and PDS show similar capability to detect G4s genome-wide. e) Summary of the comparison between different G4-Seq experiments.

Supplementary Figure 8 Different families of G-quadruplex structures.

Left: canonical PQs predicted by Quadparser (L1-3=N1-7, with N=A|C|T|G). Middle: PQs with longer loops (L1-3=N8-12 or L2=N8-21). Right: PQs with a single bulge B1=H1-7 or multiple bulges B2=H1-5 (H=A|T|C).
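The canonical family in the left panel maps naturally onto a regular expression. The Python sketch below assumes the usual Quadparser-style definition of four runs of at least three guanines separated by three loops of 1 to 7 bases; the minimum G-run length of three is not stated in this legend and is an assumption here. The function name is illustrative, and only the strand supplied is scanned (a G4 on the opposite strand would appear as the equivalent C-rich pattern).

```python
import re

# Four G-runs (>= 3 G each) separated by three loops of 1-7 bases (N = A, C, G or T).
CANONICAL_PQ = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)


def find_canonical_pqs(sequence: str) -> list:
    """Return (start, end, match) tuples for non-overlapping canonical PQ motifs.

    Coordinates are 0-based, with an exclusive end, on the strand supplied.
    """
    seq = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in CANONICAL_PQ.finditer(seq)]


if __name__ == "__main__":
    print(find_canonical_pqs("TTGGGAGGGAGGGAGGGTT"))
    # A sequence with only three full G-runs is not reported.
    print(find_canonical_pqs("TTGGGAGGGAGGGAGGTT"))
```

The long-loop and bulged families in the middle and right panels relax these constraints and would need separate patterns, which are not attempted here.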

Supplementary Figure 9 Detection of OQs representing different G-quadruplex structural families in PDS versus K+ conditions.

Fold enrichment (ratio) between the numbers of OQs in PDS over K+ for each structural family. Values > 1 indicate higher numbers in PDS. G-quadruplex structural families: Loop 1-3; Loop 4-5; Loop 6-7: OQs with at least one loop of the indicated length; Long loops: OQs with any loop of length 8 to 12 for L1-3 or 8 to 21 for L2; Bulges: OQs with one bulge of 1 to 7 bases (A, T, C) or multiple bulges of 1 base.

Supplementary Figure 10 Comparison of genomic regions in PDS and K+ sequencing conditions.

a) Genome browser view of a genomic region within the MYC oncogene. Red and orange tracks: % mismatches in reads aligning to the reverse strand (-) for PDS and K+, respectively. OQ intervals are shown as red and orange bars below the corresponding peaks. b) Genome browser view of a genomic region within the MYL5-MFSD7 gene. Black and blue tracks: % mismatches in reads aligning to the forward strand (+) for PDS and K+, respectively. OQ intervals are shown as black and blue bars below the corresponding peaks. c) Genome browser view of a genomic region within the MYL9 gene. All colors and features as in a). See Supplementary Table 5 for sequence details. For all panels, OQs not predicted by Quadparser are indicated by * and Quadparser PQs are shown as black bars.

Supplementary Figure 11 Comparison of forward versus reverse strands in PDS sequencing conditions for the MYL9 gene.

Genomic region within the MYL9 gene. Red and black tracks: % mismatches in reads aligning to the reverse strand (-) and forward strand (+), respectively. OQ intervals are shown as red and black bars below the corresponding peaks. Quadparser PQs are shown below in black. OQs not predicted by Quadparser are indicated by asterisks (*). See Supplementary Table 5 for sequence details.

Supplementary Figure 12 Circular dichroism (CD) spectra of noncanonical G4 sequences identified in the FOSB and MUC20 genes as OQs by G4-seq.

a) A sequence from the FOSB gene promoter with a single bulge in one G-run: GGC GCG GGC GGG GCG CGG G; b) an additional sequence from the FOSB gene promoter with a long loop: GGG GCG GGT GAC GTA AGC AGG GGG GCG GG; c) a sequence from the FOSB gene 5' UTR containing both a long loop and two bulges: GAG GTA CAG CGG CAT CCT GTG GGG GCC TGG G; d) a sequence from the MUC20 gene promoter containing multiple bulges: GGA GTG TTT ACC GGG CAA GGA AGG CAG GG. The CD analysis is consistent with G4 formation for all of these sequences (λ max ~263 nm; λ min ~240 nm) (ref. S1).
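As a quick check on why these four sequences count as noncanonical, the sketch below tests each against a canonical Quadparser-style pattern and counts its uninterrupted G-runs of three or more. The pattern (four runs of at least three guanines, loops of 1 to 7 bases) is an assumed standard definition rather than something stated in this legend, and the dictionary keys are just labels for the panels.

```python
import re

# Assumed canonical pattern: four G-runs (>= 3 G) with three loops of 1-7 bases.
CANONICAL_PQ = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)

# Sequences copied from panels a) to d) of this legend.
CD_SEQUENCES = {
    "a_FOSB_bulge": "GGC GCG GGC GGG GCG CGG G",
    "b_FOSB_long_loop": "GGG GCG GGT GAC GTA AGC AGG GGG GCG GG",
    "c_FOSB_5UTR": "GAG GTA CAG CGG CAT CCT GTG GGG GCC TGG G",
    "d_MUC20_bulges": "GGA GTG TTT ACC GGG CAA GGA AGG CAG GG",
}


def summarize(sequence: str) -> tuple:
    """Return (matches canonical pattern?, number of G-runs with >= 3 G)."""
    seq = sequence.replace(" ", "").upper()
    is_canonical = CANONICAL_PQ.search(seq) is not None
    g_runs = len(re.findall(r"G{3,}", seq))
    return is_canonical, g_runs


if __name__ == "__main__":
    for label, seq in CD_SEQUENCES.items():
        print(label, summarize(seq))
```

Under these assumptions none of the four sequences matches the canonical pattern: a), c) and d) have fewer than four uninterrupted G-runs, and b) has four runs but a central loop longer than 7 bases.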

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–12 and legends (PDF 5196 kb)

Supplementary Tables Integrated PDF

Supplementary Tables 1–4 and legends; Supplementary Tables 7–8 and legends; Legends for Supplementary Tables 5 and 6 (PDF 457 kb)

Supplementary Table 5

Genomic coordinates and full sequences of selected OQ regions (XLSX 67 kb)

Supplementary Table 6

Number of OQs present in annotated genes (XLSX 3513 kb)

Supplementary Scripts 1

Sequencing script for G4-Seq MiSeq experiments for either R2=K+ or Na+ + PDS conditions (XML 29 kb)

Supplementary Scripts 2

Sequencing script for G4-Seq HiSeq experiments in which R1 = Na+ and R2 = K+ (XML 28 kb)

Supplementary Scripts 3

Sequencing script for G4-Seq MiSeq HiSeq experiments in which R1 = Na+ and R2 = Na+ + PDS (XML 33 kb)

Supplementary Code

Computational analysis scripts for G4-Seq (ZIP 14 kb)

Cite this article

Chambers, V., Marsico, G., Boutell, J. et al. High-throughput sequencing of DNA G-quadruplex structures in the human genome. Nat Biotechnol 33, 877–881 (2015). https://doi.org/10.1038/nbt.3295
