A DNA molecule is defined not only by its genetic component, the primary nucleotide sequence, but also by epigenetic modifications such as the methyl groups added to cytosines at CpG dinucleotides. These methyl-cytosines, often referred to as the fifth base, are crucial for normal development, whereas aberrant cytosine methylation is a hallmark of many diseases.

To quantitatively measure methylation in the human genome, two independent research teams combined traditional epigenetic tools, such as bisulfite conversion of DNA, with a recently developed targeted genome capture technique and high-throughput sequencing. One team was led by George Church and Jin Billy Li from Harvard University (Ball et al., 2009), the other by Kun Zhang from the University of California, San Diego, together with Yuan Gao from Virginia Commonwealth University (Deng et al., 2009).

Both groups used padlock probes to enrich for selected parts of the genome. These linear oligonucleotide probes are designed so that their two ends hybridize on either side of the targeted genomic region. A DNA polymerase then extends one end across the capture region, and after a final ligation step, the now circular DNA is amplified and sequenced.

Schematic of targeted genomic capture using padlock probes. Reprinted from Nature Biotechnology.
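The capture step described above can be illustrated with a minimal sketch. It assumes a simplified geometry in which the probe's 5' arm pairs with the flank on one side of the target, its 3' arm pairs with the flank on the other side, and the polymerase fills the gap from the 3' end. The function names, arm length and linker sequence are hypothetical and chosen only for illustration; neither group's actual probe design rules are reproduced here.

```python
# Illustrative sketch only: names, arm length and linker are assumptions,
# not the design rules used by either research team.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_padlock(reference, target_start, target_end, arm_len, linker):
    """Build a linear padlock probe (written 5'->3') for reference[target_start:target_end].

    The 5' arm is complementary to the flank left of the target,
    the 3' arm is complementary to the flank right of the target,
    and a common linker joins the two arms.
    """
    if target_start - arm_len < 0 or target_end + arm_len > len(reference):
        raise ValueError("not enough flanking sequence for the requested arm length")
    left_flank = reference[target_start - arm_len:target_start]
    right_flank = reference[target_end:target_end + arm_len]
    return revcomp(left_flank) + linker + revcomp(right_flank)


def gap_fill(reference, target_start, target_end):
    """Sequence the polymerase adds to the probe's 3' end before ligation."""
    return revcomp(reference[target_start:target_end])


def circularized_product(reference, target_start, target_end, arm_len, linker):
    """Circle content after extension and ligation, opened at the probe's 5' end."""
    probe = design_padlock(reference, target_start, target_end, arm_len, linker)
    return probe + gap_fill(reference, target_start, target_end)


if __name__ == "__main__":
    # Toy reference: 8-base flank, 7-base target, 8-base flank.
    ref = "TTGACCTA" + "GATTACA" + "CGGATCCA"
    print(design_padlock(ref, 8, 15, 8, "ACGTACGT"))
    print(gap_fill(ref, 8, 15))
```

In this sketch only the gap-filled portion carries information about the captured region; the arms and linker are identical in every circle made from the same probe.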

Previously, Church and his colleague Jay Shendure had used padlock probes for exon capture; although the probes proved to be very specific for the targeted regions, they showed high allelic bias and poor reproducibility. Shendure's team has now addressed these problems for exon capture (Correspondence on p. 315), but using padlock probes for bisulfite-treated DNA poses an extra challenge: during bisulfite conversion, all unmethylated cytosines are converted to uracil, which is subsequently read as thymine during amplification, whereas methyl-cytosines are protected. This reduces the complexity of the sequence and makes specific probe design more difficult. Zhang's group, in collaboration with the Church team, optimized the capture protocol, including probe design.
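The loss of sequence complexity can be made concrete with a small in-silico sketch. It assumes the standard behavior of bisulfite treatment, in which unmethylated cytosines end up read as T and methylated cytosines remain C. The function names and the toy sequence are hypothetical, and the read-counting helper is only one simple way of turning sequenced bases into a quantitative methylation level.

```python
# Illustrative sketch only: function names and example data are assumptions.


def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Simulate the read-out of one strand after bisulfite treatment and amplification.

    Unmethylated C is read as T; a C whose 0-based index is listed in
    methylated_positions is protected and stays C.
    """
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def methylation_level(bases_at_site):
    """Fraction of reads showing C (methylated) among reads showing C or T.

    Returns None when no informative read covers the site.
    """
    c_count = bases_at_site.count("C")
    t_count = bases_at_site.count("T")
    total = c_count + t_count
    return c_count / total if total else None


if __name__ == "__main__":
    seq = "ACGTCCGA"                       # CpG cytosines at indices 1 and 5
    print(bisulfite_convert(seq, {1}))     # only the first CpG is methylated
    print(bisulfite_convert(seq))          # fully unmethylated
    print(methylation_level("CCTC"))       # four reads covering one site
```

Apart from protected methyl-cytosines, the converted strand is built from only three bases, which is why probe arms designed against it are harder to make unique.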

To get a global profile of the methylome, Church and colleagues targeted their probes to selected genomic regions irrespective of whether these regions are CpG islands, that is, regions enriched in CpG dinucleotides, and they complemented this targeted enrichment with a genome-wide count of cuts after digestion with a methylation-sensitive enzyme. After analyzing several cell lines including fibroblasts and induced pluripotent stem cells (iPSCs), they found that gene expression correlated negatively with methylation at the promoter region and positively with methylation in the body of the gene. Church is convinced that this is biologically very relevant, and he adds: “There are cases where a single methyl will matter to a biological function, and we need to be open to the possibility that methyl groups could be just about anywhere.” To follow up on this hypothesis, the Church team is working on high-throughput methods for allele-specific methylation.

The Zhang and Gao groups, in contrast, focused mostly on CpG islands, partly because those are the regions with higher methylation, and partly because they are clearly defined and thus present a stable set of targets. They compared the methylation patterns in all CpG islands on two chromosomes in iPSCs and human embryonic stem cells (hESCs). To their surprise, the researchers noted that only 10% of the regions show a difference in methylation between the cell lines. For Zhang, this underscores the advantages of a targeted strategy over genome-wide sequencing. “Full methylome sequencing is not cost-effective,” he concludes, “because 90% of your data will not give you too much information.”

Like Church's and Li's teams, Zhang and his colleagues saw decreased promoter methylation and increased gene body methylation in highly expressed genes. In addition, they observed that the methylation patterns of iPSCs and hESCs differ. Zhang describes their findings: “iPSCs tend to be more methylated... and this could be causing an extra effort to do the re-differentiation.” To assess this difference in more detail, Zhang plans to look at the methylation state in 'clean' iPSCs, that is, cells free of inducing factors, and their intermediate and fully differentiated descendants.

With these techniques, the role of the fifth base is becoming a lot more prominent.