Credit: P. Morgan/Macmillan Publishers Limited

Two new studies report nanopore-based methodologies for direct, single-molecule and long-read characterization of cellular DNA methylation.

Sequencing-based methods for detecting DNA methylation, such as bisulfite sequencing, can be highly accurate but typically have several limitations: they require a substantial amount of input DNA, rely on PCR, report on only one type of DNA modification (such as 5-methylcytosine (5mC)) per experiment and produce short sequencing reads.

Motivated by these drawbacks, efforts are underway to adopt single-molecule sequencing technologies for the direct detection of modifications in unamplified DNA. The new studies by Simpson et al. and Rand et al. focus on the Oxford Nanopore Technologies MinION nanopore sequencer, in which single-stranded DNA is driven through a protein pore by an applied voltage while the ionic current through the pore is measured. Shifts in current (known as 'events') are detected and used to infer the identity of the bases passing through. Accurate reading of bases (and particularly their modifications) is a substantial technical and algorithmic challenge: each event reflects a composite signal of the 6-mer sequence within the pore, so events are heavily influenced by sequence context rather than being unique to single modified nucleotides. Additionally, unlike the PCR products analysed by other detection methods, unamplified native DNA contains a range of modifications and DNA damage that can complicate analyses.
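To make the sequence-context problem concrete, the following minimal Python sketch lists every 6-mer window that contains a given base, assuming the pore signal depends on a sliding window of six bases as described above. The function name `kmers_covering` and the example sequence are hypothetical illustrations and are not taken from either study.

```python
def kmers_covering(sequence, position, k=6):
    """Return (start, kmer) pairs for every k-mer window that contains `position`.

    Assumption for illustration: each event reflects the k bases currently
    in the pore, so one modified base can influence up to k consecutive events.
    """
    if not 0 <= position < len(sequence):
        raise ValueError("position is outside the sequence")
    # The earliest window containing the base starts k - 1 bases upstream.
    first_start = max(0, position - k + 1)
    # The latest window containing the base starts at the base itself,
    # provided a full k-mer still fits in the sequence.
    last_start = min(position, len(sequence) - k)
    return [(start, sequence[start:start + k])
            for start in range(first_start, last_start + 1)]


# Hypothetical example: the cytosine at index 5 of a short sequence.
for start, kmer in kmers_covering("GATTACGCATTG", 5):
    print(start, kmer)
```

In this toy example a single cytosine falls within six overlapping 6-mers, which is why a modification is expected to shift several neighbouring events rather than one.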

Simpson et al. PCR-amplified Escherichia coli DNA to generate an unmethylated control sample and also treated it with the M.SssI CpG methyltransferase to generate a near-fully 5mC-methylated comparator. They observed differences in event distribution from these two samples, and used these patterns to train a hidden Markov model (HMM) algorithm to detect 5mC in a CpG context in nanopore sequencing data. They then used the same approach to optimize methylation-calling thresholds of the HMM for NA12878 human lymphoblast DNA, enabling 5mC status to be called at 77% of CpG sites with 94% accuracy. Using the trained HMM, 5mC was profiled in native unamplified human DNA, and the nanopore-based detection recapitulated various features of bisulfite-based profiling, including finding a depletion of 5mC around transcription start sites in NA12878 cells and identifying hypermethylated regions in MDA-MB-231 malignant breast cancer cells relative to untransformed MCF10A breast epithelial cells. Importantly, comparing single nanopore reads revealed that multi-kilobase hypermethylated regions in MDA-MB-231 cells consisted of a mixture of almost fully methylated DNA molecules and sparsely methylated DNA molecules, indicating substantial intercellular heterogeneity and/or long-range coordination of methylation in cis, which would not have been apparent from short-read bisulfite sequencing.
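The trade-off between the fraction of sites that are called and the accuracy of those calls can be illustrated with a simplified log-likelihood-ratio sketch. This is not the HMM used by Simpson et al.; it assumes, purely for illustration, that each 6-mer has a Gaussian current distribution in a methylated and an unmethylated model, and that a site is called only when the evidence exceeds a threshold. The model values, the threshold of 2.5 and all function names are hypothetical.

```python
import math


def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution with the given mean and sd."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def log_likelihood_ratio(events, unmethylated_model, methylated_model):
    """Sum log P(event | methylated) - log P(event | unmethylated) over events.

    events: list of (kmer, current) pairs observed around one CpG site.
    *_model: dict mapping kmer -> (mean, sd); hypothetical emission tables.
    """
    total = 0.0
    for kmer, current in events:
        mean_u, sd_u = unmethylated_model[kmer]
        mean_m, sd_m = methylated_model[kmer]
        total += gaussian_logpdf(current, mean_m, sd_m)
        total -= gaussian_logpdf(current, mean_u, sd_u)
    return total


def call_site(llr, threshold=2.5):
    """Call a site only when the evidence is strong in either direction."""
    if llr >= threshold:
        return "methylated"
    if llr <= -threshold:
        return "unmethylated"
    return None  # ambiguous: left uncalled


# Hypothetical emission parameters for two 6-mers.
unmethylated = {"ACGTAC": (80.0, 2.0), "CGTACG": (75.0, 2.0)}
methylated = {"ACGTAC": (84.0, 2.0), "CGTACG": (79.0, 2.0)}

events = [("ACGTAC", 84.0), ("CGTACG", 79.0)]
print(call_site(log_likelihood_ratio(events, unmethylated, methylated)))
```

Raising the threshold leaves more sites uncalled but makes the remaining calls more reliable, which is the kind of balance reflected in reporting both a call rate and an accuracy.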

Rand et al. used a related approach to detect a range of DNA methylation types in nanopore sequencing data. They trained a variable-order HMM with a hierarchical Dirichlet process (HMM-HDP) on nanopore output from synthetic DNA oligonucleotides harbouring 5mC and 5-hydroxymethylcytosine (5hmC), which could be identified with a mean accuracy of 74% (as expected, accuracy was influenced by the surrounding sequence context). Next, they used a DNA plasmid that had been methylated in E. coli cells expressing the DNA methyltransferases Dcm (for 5mC) and Dam (for N6-methyladenine (6mA)), and achieved a mean calling accuracy of 79% for 5mC and 70% for 6mA. Calling accuracy was highest for high-quality sequencing reads (those that aligned best to the known plasmid sequence), and applying read-quality thresholds to 40× coverage data increased the methylation-calling accuracy to 96% for 5mC and 86% for 6mA. Finally, Rand et al. analysed E. coli genomic DNA and were able to detect known global changes in 5mC and 6mA levels during different growth phases of the E. coli culture.
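The benefit of combining read-quality thresholds with deep coverage can be sketched as follows. This is an illustrative simplification and not the HMM-HDP procedure of Rand et al.: it assumes each read covering a site carries an alignment identity and a per-read methylation probability, discards reads below a hypothetical identity cut-off of 0.85, and averages the remaining probabilities. The function name and all values are assumptions.

```python
def site_methylation_estimate(read_calls, min_identity=0.85):
    """Average per-read methylation probabilities over reads that pass a quality filter.

    read_calls: list of (alignment_identity, probability_methylated) pairs,
                one per read covering the site (hypothetical inputs).
    Returns None when no read passes the filter.
    """
    passing = [probability for identity, probability in read_calls
               if identity >= min_identity]
    if not passing:
        return None
    return sum(passing) / len(passing)


# Hypothetical reads covering one site: the low-identity read is excluded.
reads = [(0.92, 0.9), (0.70, 0.1), (0.88, 0.7)]
print(site_methylation_estimate(reads))
```

With higher coverage more reads survive the filter at each site, so a stricter quality cut-off can be applied without leaving sites uncovered.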


These studies are a valuable step forwards for detecting multiple types of DNA methylation in unamplified DNA, and they could be applied to various sample types, including low-abundance specimens. The analysis pipelines can also be used on existing nanopore sequencing data sets that were not initially produced for DNA methylation profiling, and could be trained further to improve accuracy and to learn signatures of additional DNA modifications. Such advances may provide insight into the combinatorial interplay of DNA modifications for genome regulation.