In the human genome, information is stored in two dimensions: first as genetic code and second as epigenetic information. While cancer cells exhibit both genetic and epigenetic abnormalities, current sequencing methods are unable to capture both simultaneously. Furthermore, commonly used epigenome sequencing methods, such as whole-genome bisulfite sequencing (WGBS) and enzymatic-methyl sequencing (EM-seq), rely on base conversion chemistries. These approaches share two caveats: they mask the most frequent mutation in cancer (cytosine-to-thymine), and they cannot differentiate between the two most abundant epigenetic bases, 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC). This leads to a loss of information in the readout. Alternative methods that differentiate 5mC from 5hmC exist, but these depend on complex multilayer workflows that require more sample than is readily attainable from most cancer biopsies. Together, this results in an incomplete picture of the genomic and epigenomic landscape, limiting our biological understanding of cancer and consequently narrowing the possibilities for diagnosis and early intervention. Obtaining genetic and epigenetic information simultaneously, from the same sample, is critical for retrieving genetic mutations, identifying regulatory sequence variation and determining the tissue of origin of tumour DNA.
The 6-letter whole-genome sequencing technology can acquire phased genetic and epigenetic information from human genomic and cell-free DNA with reduced sample size in a single experimental and computational workflow. This technique can unambiguously resolve the four genetic bases (guanine, adenine, cytosine and thymine) and the epigenetic bases 5mC and 5hmC. Furthermore, the purely enzymatic workflow bypasses previous methodological limitations such as DNA degradation and coverage biases, while providing very high conversion efficiencies. A strand-specific two-base coding approach enables unequivocal decoding of up to 16 states of DNA chemical information, corresponding to the four genetic bases and multiple epigenetic states, in an allele-specific fashion. In brief, DNA methyltransferase 5 (DNMT5) copies 5mC bases on the original DNA strand to the complementary strand, whereas 5hmC bases are glycosylated by β-glycosyltransferase to prevent such copying. Lastly, APOBEC3A, assisted by the UvrD helicase, is used to deaminate unmodified cytosines to uracil, which is subsequently read as thymine. A single phased digital readout is produced, which is intrinsically less prone to errors than two separate readouts of the genome and epigenome.
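The two-base coding idea can be illustrated with a small toy sketch. It is not the published decoding scheme: the function name, the lookup table and the rule that protected (modified) cytosines are read as C are simplifying assumptions made here for illustration, and only a cytosine in a CpG context is considered. The sketch first enumerates the 4 × 4 = 16 possible pairs of base calls (one from the original strand, one from the copy strand) and then maps three of those pairs to cytosine states, following the enzymatic logic described above.

```python
from itertools import product

BASES = "ACGT"

# Two base calls per position (original strand, copy strand), four options each:
# 4 * 4 = 16 possible two-base combinations.
TWO_BASE_STATES = [a + b for a, b in product(BASES, repeat=2)]

# Hypothetical lookup for a cytosine in a CpG context.
# Assumption: a protected (modified) cytosine is read as C,
# a deaminated (unmodified) cytosine is read as T.
CPG_CYTOSINE_TABLE = {
    ("C", "C"): "5mC",   # mark was copied to the copy strand, so both are read as C
    ("C", "T"): "5hmC",  # glycosylation blocked copying, so only the original is read as C
    ("T", "T"): "C",     # unmodified cytosine, deaminated on both strands
}


def decode_cpg_cytosine(original_read: str, copy_read: str) -> str:
    """Return the inferred cytosine state, or 'unresolved' for any other pair."""
    return CPG_CYTOSINE_TABLE.get((original_read, copy_read), "unresolved")


if __name__ == "__main__":
    print(len(TWO_BASE_STATES))           # number of two-base combinations
    print(decode_cpg_cytosine("C", "C"))  # methylated cytosine in this toy model
    print(decode_cpg_cytosine("C", "T"))  # hydroxymethylated cytosine in this toy model
```

Because each position is called from two paired reads rather than one, a genuine cytosine-to-thymine mutation and a deaminated unmodified cytosine need not collapse into the same signal, which is the property the single phased readout relies on.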