Shankar Balasubramanian has been studying G-quadruplexes for a long time—15 years by his count—with many methodological successes along the way. He and his team at the University of Cambridge now show that it is possible to locate these structures, knot-like stacks of guanine tetrads held together by hydrogen bonds, throughout the genome.

G-quadruplexes, also called G4s, are diverse structures made from one, two or even four unconnected DNA strands. Between the tetrads there can be long loops and bulges, features usually associated with the more flexible RNA molecule. (RNA also forms G4s.) This diversity means that the participating guanines can lie far apart in the linear sequence, which makes it difficult to predict computationally where G4s form. “Once you can tolerate a lot of discontinuities, the universe of possibilities becomes enormous,” says Balasubramanian.
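To make the prediction problem concrete, here is a minimal Python sketch of a sequence-based G4 search. It assumes the widely used "canonical" motif of four runs of at least three guanines separated by loops of one to seven nucleotides; the article does not say which predictor the team compared against, so the pattern, the function name and the toy sequences are illustrative assumptions only.

```python
import re

# Assumed canonical motif: four G-runs (>= 3 Gs each) separated by
# three loops of 1-7 nucleotides. This is a common convention, not
# necessarily the predictor referred to in the article.
CANONICAL_G4 = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)

def find_canonical_g4(seq):
    """Return (start, end) spans of canonical G4 motifs on the given strand."""
    return [m.span() for m in CANONICAL_G4.finditer(seq.upper())]

# Hypothetical toy sequences, written for illustration.
short_loops = "TTGGGAGGGTGGGAGGGTT"                       # all loops 1 nt long
long_loop = "TTGGGAGGGTGGG" + "ACTACTACTACT" + "GGGTT"    # one 12-nt loop
bulged = "TTGGGAGGGTGGGAGAGGTT"                           # last G-run interrupted by an A

print(find_canonical_g4(short_loops))  # the motif is found
print(find_canonical_g4(long_loop))    # missed: loop exceeds 7 nt
print(find_canonical_g4(bulged))       # missed: bulge breaks the G-run
```

Loosening the pattern to admit longer loops or interrupted G-runs would catch the second and third sequences, but it would also match far more of the genome, which is the enormous "universe of possibilities" described in the quote.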

Their G4-sequencing (G4-seq) approach relies on the fact that G4s can block polymerases, the same principle that underlies the polymerase stop assays used to query individual structures. “The key was to find near-native conditions under which one would expect these structures might form, and find conditions under which these structures would be disfavored,” explains Balasubramanian. Physiological levels of potassium stabilize G-quadruplexes, whereas sodium or lithium at the same ionic strength has the opposite effect. The small molecule pyridostatin stabilizes G4s even more.

To start, genomic DNA is sequenced under G4-disfavoring conditions on an Illumina flow cell to determine the identity of every fragment. The synthesized strand is then peeled away, and the same template is sequenced again under G4-favoring conditions. When the polymerase reaches a stable G4, it begins adding gibberish bases, causing sudden discontinuities in base quality and sequence identity relative to the first read.
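The comparison of the two sequencing passes can be sketched in a few lines of Python. This is a simplified illustration, not the team's analysis pipeline: it uses only sequence identity (not base quality), and the function name, window size, threshold and toy reads are all assumptions made for the example.

```python
def g4_stall_position(read1, read2, window=10, min_mismatch=0.5):
    """Return the first index at which read2 starts to disagree with read1
    in a sustained way, or None if no such position exists.

    A position qualifies when it is itself a mismatch and the fraction of
    mismatches in the `window` bases starting there is >= `min_mismatch`.
    Both parameters are illustrative choices, not published values.
    """
    n = min(len(read1), len(read2))
    mismatches = [a != b for a, b in zip(read1[:n], read2[:n])]
    for i in range(n - window + 1):
        if mismatches[i] and sum(mismatches[i:i + window]) / window >= min_mismatch:
            return i
    return None

# Hypothetical first-pass read (G4-disfavoring conditions).
read1 = "ACGTACGTAC" + "GGGTGGGTGGGTGGG" + "ACGTACGTACGT"

# Simulated second-pass read: identical for 10 bases, then every base wrong.
scramble = str.maketrans("ACGT", "CATG")
read2 = read1[:10] + read1[10:].translate(scramble)

# Control: a single isolated sequencing error should not count as a stall.
read3 = read1[:5] + "T" + read1[6:]

print(g4_stall_position(read1, read2))  # index where the reads diverge
print(g4_stall_position(read1, read3))  # None
```

Requiring a sustained run of mismatches, rather than a single disagreement, is what separates a polymerase stall from an ordinary sequencing error in this sketch.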

In primary human B lymphocytes, G4-seq sites agreed well with computationally predicted G4s, but the method found more than twice as many: over 716,000, despite a conservative detection cutoff. A large subset consisted of noncanonical G4 structures that are computationally difficult to detect, and these were enriched in gene bodies. The researchers also confirmed some novel G4s using biophysical methods.

It is not possible to know what fraction of all possible structures G4-seq detects or which structures form in the cellular environment. Sites found using pyridostatin reflect its unique structural specificity and do not overlap exactly with those detected using potassium. But Balasubramanian notes that their assay identifies motifs that are sufficiently stable in near-physiological conditions to perturb a biochemical mechanism, akin to polymerase stalling at replication forks, for instance.

What do these kinks in the DNA do? “For the moment, there are opinions,” says Balasubramanian. Some believe that the structures are irrelevant, but that is rapidly becoming the minority position. It is “an empirically observed fact that [G-quadruplexes are] enriched in regulatory regions,” says Balasubramanian, adding that “there are a lot of smoking guns for a link between quadruplexes and genome instability.”

There are implications for disease: the structures are over-represented in cancers, and the G4-seq work identified G-quadruplexes in the cancer-related genes BRCA1, BRCA2 and MAP3K8. Balasubramanian hypothesizes that G4s play a role in cancer by promoting genome instability or by creating genetic dysfunction because resolvase-type enzymes that normally remove G4 structures are depleted in tumors.

The researchers are working to pin down mechanisms and put G4 formation in a cellular context, using immunoprecipitation-based enrichment, for example. Their work may eventually have a practical benefit. “What I'd like to understand is whether this link to G-quadruplexes, cancer and all the genes that may be players in the process could present an opportunity for a therapeutic paradigm,” says Balasubramanian.