Low biological fluctuation of mitochondrial CpG and non-CpG methylation at the single-molecule level

Mammalian cytosine DNA methylation (5mC) is associated with the integrity of the genome and the transcriptional status of nuclear DNA. Due to technical limitations, it has been less clear if mitochondrial DNA (mtDNA) is methylated and whether 5mC has a regulatory role in this context. Here, we used bisulfite-independent single-molecule sequencing of native human and mouse DNA to study mitochondrial 5mC across different biological conditions. We first validated the ability of long-read nanopore sequencing to detect 5mC in CpG (5mCpG) and non-CpG (5mCpH) context in nuclear DNA at expected genomic locations (i.e. promoters, gene bodies, enhancers, and cell type-specific transcription factor binding sites). Next, using high coverage nanopore sequencing we found low levels of mtDNA CpG and CpH methylation (with several exceptions) and little variation across biological processes: differentiation, oxidative stress, and cancer. 5mCpG and 5mCpH were overall higher in tissues compared to cell lines, with small additional variation between cell lines of different origin. Despite general low levels, global and single-base differences were found in cancer tissues compared to their adjacent counterparts, in particular for 5mCpG. In conclusion, nanopore sequencing is a useful tool for the detection of modified DNA bases on mitochondria that avoid the biases introduced by bisulfite and PCR amplification. Enhanced nanopore basecalling models will provide further resolution on the small size effects detected here, as well as rule out the presence of other DNA modifications such as oxidized forms of 5mC.


Results
Nanopore sequencing reliably detects CpG DNA methylation in CpG context (5mCpG). Long read sequencing is a rapidly evolving field that is largely still in its infancy. Hence, we first sought to determine the reliability of using nanopore sequencing to detect DNA methylation from native DNA in our own hands. To do so, we sequenced genomic DNA extracted from the human liver cell line HepaRG, using an Oxford Nanopore Minion device (ONT). Global patterns of DNA methylation were consistent with the known depletion of 5mCpG at CpG islands (CGIs) (Fig. 1A). We then compared genome-wide methylation patterns of DNA from HepaRG cells sequenced with nanopore, to those obtained with EPIC Bead Arrays (Illumina) 20 . To overcome the problem of sparsity in DNA methylation data, we aggregated CpG methylation values from more than 130 k transcription binding site loci corresponding to hepatocyte-specific (FOXA2 and HNF4A) and control (PAX5 and PU.1) target regions. Both, EPIC and Nanopore data are able to capture the expected dip in methylation associated with active regulatory regions (Fig. 1B, top panels) 21 . In contrast, non-active transcription factor binding sites produce a flat methylation profile after aggregation of a similar number of genomic regions in both EPIC and Nanopore data (Fig. 1B, bottom panels). Both techniques were highly correlated when aggregated data from all transcription factor binding sites was taken together (Fig. 1C). Different methods have been described for extracting base modification information from nanopore sequencing data, and recently they have been carefully benchmarked 22 . In line with recommendations, we tested two of the best established tools to detect CpG methylation, Nanopolish 23 which uses a hidden markov model to detect DNA methylation, and the more recent Guppy + Medaka which has been trained to basecall for modified human CpG dinucleotides using a recurrent neural network 24 . According to the benchmarking described above, while Guppy correctly identified unmethylated sites, Nanopolish was better at recovering fully methylated sites 22 . To perform a site-level correlation, we used targeted nanopore sequencing data from HepaRG cells with a higher coverage in a set of hepatocyte identity genes (i.e. ALB, F2, HNF4A, SLC10A1, DHCR24, HDAC4 and SERPINA1). Using this method of comparison, DNA methylation values were highly correlated, with Guppy + Medaka having a higher tendency towards calling cytosines as unmethylated (Fig. 1D). These results are in line with former studies showing a slightly higher tendency for Medaka to call unmethylated cytosines 25 , and Guppy to underpredict high methylation compared to Nanopolish 22 .
Therefore, in agreement with recent publications, 5mCpG methylation can be reliably obtained from native DNA using nanopore sequencing and different bioinformatic algorithms. For all CpG analyses presented below we used Guppy + Medaka for extraction of 5mCpG values and Nanopolish for verification and visualization.
Detection of mtDNA 5mCpG in long reads. Having shown the suitability of nanopore sequencing for analysis of nuclear 5mCpG, we used the same strategy on mtDNA enriched by subcellular fractionation of different cell lines ( Fig. 2A). Importantly, mtDNA was linearized enzymatically before sequencing using a Minion device. This technique enabled the clear enrichment of the mitochondrial cellular fraction, measured by protein expression of mitochondrial or cytosolic markers ( Fig. 2B and Supplementary Figure S1). After sequencing, we obtained a high fraction of reads mapping to mtDNA. Of note, due to long read length, the proportion of mapped reads and their coverage was higher than 80%. Indeed, some reads consisted of full-length mtDNA sequences (Fig. 2C). Interestingly, we observed an unequal representation of the heavy strand (HS) and light strand (LS) of mtDNA (Fig. 2C). We attributed this to the efficiency of the BamHI enzyme in its ability to cut the HS more www.nature.com/scientificreports/ efficiently and leaving a slightly higher ratio of 5′ ends available for the ligation of adapters before loading onto the Nanopore sequencing device. Moreover, recent findings suggest there can be an unequal representation of mtDNA CpG methylation on the HS and LS 10 . Hence, to reduce any potential bias to the average methylation of each CpG site, we considered the methylation of the HS and the LS separately for further analysis. Furthermore, mitochondrial populations can be heterogeneous within a single cell. Therefore, we also took advantage of single molecule methylation, by visualizing the methylation of whole mtDNA reads to better understand the single molecule methylation heterogeneity in our samples. (B) Aggregated DNA methylation data was obtained for hepatocyte active (FOXA2 and HNF4A, top panels) and control (PAX5 and PU.1, bottom panels) transcription factor binding regions. Methylation profiles from EPIC bead array (epic) and Nanopore sequencing data (nano) (red and blue lines, respectively) are shown for each aggregated dataset. (C) EPIC-Nanopore correlation for 5mCpG data on all aggregated datasets shown in (B). (D) Nanopore targeted sequencing for a panel of hepatocyte identity genes was used to basecall 5mCpG using two different bioinformatic pipelines: Nanopolish and Guppy + Medaka (see "Methods"). Single CpG level correlations are shown.   Figure S1). (C) The same protocol was used on the liver cell line HepaRG. In addition, whole genome amplification was used to produce a "fully" unmethylated control (FU), and followed by DNA methylase (M.Sssl) treatment to produce a "fully" methylated control (FM). Nanopore sequencing coverage for the heavy strand (HS) and the light strand (LS) in FU and FM mtDNA-enriched HepaRG samples. (D) Nanopolish was used to infer 5mCpG likelihood and extract methylation frequency tables (corresponding to the proportion of methylation). Methplotlib (see "Methods") was used to plot a histogram of 5mCpG frequencies in FU and FM, colored by strand. (E) Strand-specific 5mCpG frequency plots (colored lines), and 5mCpG likelihood pile-plots (100 reads per sample). Gene mapping to mtDNA are shown in the bottom track as colored arrows. "Modified frequency" refers to the proportion of modified base calls, in this case 5mCpG, for each of the samples and CpG sites. www.nature.com/scientificreports/ To validate the accuracy of nanopore for detecting 5mCpG, we prepared fully unmethylated (FU) and fully methylated (FM) mtDNA controls. FU was prepared by whole genome amplification and then FM was prepared by methylation of CpG nucleotides using DNA methyltransferase (M.Sssl). As expected, 5mCpG profiles were opposite in FU and FM mtDNA controls (Fig. 2C,D). Some residual methylation was observed in the FU control and we considered these levels as a baseline for this technique (Fig. 2D,E). Indeed, we used the FU control as our background to call detectable methylation. This value was obtained by dividing the number of called sites as methylated by the total number of called sites in the FU sample. The background calculated for the mtDNA HS was 0.022 and for the LS 0.016. We were also able to identify some fully unmethylated reads in the FM control. We attributed this to incomplete efficiency of the DNA methyltransferase (M.Sssl) in methylating these reads, and therefore a stochastic effect. Furthermore, this observation highlights the utility of this approach to identify a mixture of DNA in a single sample. In addition, we observed low basal levels of 5mCpG in mtDNA from HepaRG cells. Indeed, we did not identify any differential methylation between the FU control and HepaRG cells, either globally or at the CpG site or strand-specific levels (see next Section).
These data shows that Nanopore sequencing is able to detect mtDNA methylation above an acceptable background. 5mCpG is not different from the unmethylated control at HepaRG basal conditions. We next went further to investigate 5mCpG in several conditions known to modify mitochondrial activity. mtDNA CpG methylation was not affected by in vitro hepatocyte differentiation. Hepatocyte differentiation implies metabolic rewiring and changes in mitochondrial content and activity 26 . As such, this dynamic process may involve concomitant changes in mtDNA methylation. The bipotent liver progenitor cells, HepaRG, are capable of in vitro differentiation into hepatocytes and biliary cells. By plating HepaRG cells under differentiating conditions during four weeks we obtain a mixture of the hepatocyte and biliary lineages 20,27,28 . This well-established model allows us to compare hepatic "progenitor like" cells to their "differentiated" counterpart. We used minimally photo-toxic holo-tomographic microscopy combined with mitochondrial labelling (using MitoTracker Green) to determine mitochondrial content. We observed in both cellular tomogram and MitoTracker staining profile that differentiated HepaRG cells have a higher mitochondrial content, as well as more lipid droplets when compared to their progenitors (Fig. 3A).
To determine the effect of hepatocyte differentiation on mtDNA methylation, we used the nanopore sequencing protocol described above, comparing progenitor-like HepaRG cells with their differentiated progeny. In both cases, methylation values were not different from the fully unmethylated control (Fig. 3B). There was no differential methylation when directly comparing proliferative and differentiated HepaRG cells (Fig. 3C). Similar results were obtained when analyzing both strands together or independently. Interestingly, the likelihood of methylation, as calculated with nanopolish, was higher in differentiated HepaRG cells. This can be seen at the read level (likelihood scale in Fig. 3C shows mainly blue reads in proliferative and mainly gray reads in differentiated cells), although this difference was not high enough to be called as methylation and/or may represent additional nucleotide modifications. mtDNA CpG methylation does not fluctuate under in vitro oxidative stress conditions. In addition to differentiation, mitochondrial activity is largely associated to oxidative stress, and therefore an interesting process where to study 5mCpG variation. To induce oxidative stress in vitro, we used an established method utilizing hydrogen peroxide (H2O2) to induce reactive oxygen species 29 . Several cell lines were tested (data not shown) and Homo sapiens embryonic kidney 293 T cells emerged as an ideal candidate for an oxidative stress model. Treatment for two hours was sufficient to induce oxidative stress in 293 T cells as measured by MitoSox staining, which could be rescued by treatment with N-acetylcysteine (NAC) (Fig. 4A,B).
At basal levels, 293 T cells exhibited higher global levels of 5mCpG than HepaRG cells (p value = 0.05). However, the same strand specific methylation was observed (i.e. higher 5mCpG in the HS). We then compared 5mCpG levels in control, H2O2-treated, and NAC-rescued 293 T cells. There were no global differences in mean methylation between groups, and with a threshold of 10%, we did not observe any differential methylation on the HS or the LS at the single CpG level (Fig. 4C,D).
Although 5mCpG was clearly detectable in 293 T cells, and higher than HepaRG cells at steady state, it was not significantly affected by H2O2 exposure.
Effect of cell culture on CpG methylation. Global differences in DNA methylation have been described after cells are plated in culture conditions 30 . Moreover, major metabolic alterations, notably metabolic repression, have been described after hepatocytes are placed in culture (Cassim et al., 2017). While the liver cell line HepaRG did not display 5mCpG above background (Fig. 3), human liver tissues were consistently methylated at discrete CpG sites and globally more methylated (Fig. 5A). In line with this, we observed intermediate 5mCpG values in primary human hepatocytes (PHH) after two weeks in culture (Fig. 5A). Globally, 5mCpG was not different in HepaRG as compared to the FU control (p = 0.5). In contrast, 5mCpG was higher in PHH relative to HepaRG (p < 2.2e−16), and higher in liver tissues relative to PHH (p < 2.2e−16) (Fig. 5E). This result was similar when analyzing separately both mtDNA strands.
The results above suggest that 5mCpG may be lost in culture conditions. To systematically evaluate the effect of cell isolation and culture on mtDNA methylation under controlled conditions, we obtained healthy liver tissue and isolated PHH from the same sample (Fig. 5B). DNA was extracted from liver tissue, PHH in suspension (0 h) and also placed in culture for 24 h or 7 days before isolation of DNA, enzymatic linearization and sequencing with Nanopore's MinION (Fig. 5B). Despite omitting the mtDNA enrichment step, between 1-2% of reads aligned to the mitochondrial chromosome, representing coverage of over 1000 × for each condition (Fig. 5C). Visualization of 5mCpG at single base resolution revealed slightly higher levels in liver tissue compared to PHH  www.nature.com/scientificreports/ cultured for 7 days on the L-strand and the H-strand (Fig. 5D,E). However, we did not observe any differences between PHH 0 h in culture and 7 days. Thus, 5mCpG differences between liver tissue and cultured PHH are more likely due to the combination of cells present in tissue rather than culture conditions. While levels of 5mC were low in mtDNA, we were confident in these findings after validating with an additional BS-qMSP technique (Supplementary Figure S2). Nanopore sequencing detects low CpH mtDNA methylation across diverse biological conditions. In addition to 5mCpG, new algorithms have become recently available for the assessment of 5mC in non-CpG (CpH) contexts from nanopore sequencing data. These algorithms are based on training with reference sequences using deep learning and are contained in the Rerio nanopore repository (ONT). Once the model is defined, a research command line tool (i.e. Megalodon, see "Methods") is used to extract high accuracy modified base from raw nanopore reads by anchoring the basecalling output to a reference genome. From the Rerio repository, we selected the most recent version trained to detect all-context 5mC on MinION raw sequencing data (see "Methods"). To validate the performance of this algorithm for CpG and CpH 5mC, we first used www.nature.com/scientificreports/ nuclear DNA extracted from central nervous tissues, known to be relatively abundant in 5mCpH in physiological conditions. 5mC was overall higher in mouse brain, compared to the cell line levels described above (Wilcoxon rank sum test p-value < 2.2e−16) (Fig. 6A). Mouse brain DNA displayed the expected genomic 5mCpG distribution of absence in CpG islands, enrichment in gene bodies and intermediate levels in distal enhancers (Fig. 6B). As previously described, CpH methylation followed this same global pattern with particular enrichment (up to 5% methylation) in gene bodies, although we did not observe an obvious bias towards CAC versus CAG context (Fig. 6B) [31][32][33] . Even low coverage sequencing obtained with Cas9-targeting followed by nanopore sequencing, was able to show gene body CpH methylation in a panel of human glioblastoma cell lines and diffuse glioma tissues (Fig. 6D,E). Similarly, CpH was absent or low in CpG islands (CGI) and enhancers. Next, we used the same algorithm to extract base modification information from mouse brain mtDNA. Interestingly, 5mCpG was detectable above background levels for both, L-and H-strands (Fig. 6C). In contrast, most cytosines in CpH context displayed levels below the threshold of detection in both mtDNA strands (Fig. 6C). We were also able to detect the expected patterns of nuclear CpG and CpH methylation in human liver tissues using this all-context model (Fig. 6F). However, both CpG and CpH were detected at low levels in mtDNA, confirming our previous findings with different algorithms (i.e. Nanopolish and Guppy + Medaka) (Fig. 6G).
Our results for mtDNA 5mCpG using this new model confirmed our previous results obtained in differentiation and oxidative stress cell line models. In addition, CpH levels were low and stable across all conditions. Interestingly, in the PHH time-point experiment, despite the low levels of non-CpG methylation in mtDNA, liver tissue had higher levels of this modified base compared to PHH in culture for > 24 h (Fig. 6H,I), similar to what we observed for 5mCpG, and suggesting that cell plating results in loss of 5mCpN in general. Low 5mC levels were validated by qMSP (Supplementary Figure S2).
In summary, new deep learning algorithms for calling base modifications confirm our results for mtDNA 5mCpG. In addition, they are useful in detecting expected nuclear CpH methylation and suggest general low levels and fluctuation in mtDNA 5mCpH. mtDNA methylation in cancer. Most cancer cells undergo a switch in their metabolic configuration, primarily relying on aerobic glycolysis instead of mitochondrial oxidative phosphorylation 34 . In addition, it was recently described that mtDNA from liver cancer cells had higher levels of CpG and nonCpG methylation than that of non-tumorigenic liver cells in vitro 35 . With this in mind, we performed whole genome nanopore sequencing of DNA from matched tumor/normal samples of liver (hepatocellular carcinoma [HCC]) (n = 10 pairs) and head and neck (HNC) (n = 6 pairs) cancers, and evaluated all-context 5mC mtDNA using the Rerio/Megalodon model validated in the previous section. HNC samples were sequenced with a MinION device, while a deeper sequencing was done in HCC samples using the high throughput version of nanopore technology, the Pro-methION system. Accordingly, we used the corresponding all-context neural network model for calling base modifications from each of these platforms (see "Methods").
In liver, both tumor and non-tumor tissues displayed 5mCpG methylation above background levels at several CpG sites (Fig. 7A). However, we did not find differentially methylated sites when comparing tumor and their matched adjacent tissues (paired, multifactor approach). A subset of CpG sites with lowest p values for this comparison (non-adjusted p < 0.05), were able to partially discriminate tumors from non-tumor tissues, with the latter displaying slightly higher levels of 5mCpG (Fig. 7B). Rather than differential methylation between tumors and non-tumor tissues, we found consistent 5mCpG at discrete sites (when compared to the FU background control) in both type of samples. Most 5mCpG was detected exclusively in the HS, and only 3 CpG sites were consistently found in the LS (i.e. chrM:314, chrM:5469, and chrM:14382). There were also more sites detected as methylated in non-tumor tissues, probably due to a higher 5mCpG variation in tumor samples (Fig. 7C). Contrary to 5mCpG, 5mCpH levels were below background and were not able to distinguish tumors from their normal counterparts.
In contrast to liver, HNC tissues displayed higher levels of both, CpG and CpH methylation. Global levels of 5mC were higher in tumor tissues (Fig. 7D), although the effect size was below 5% in all comparisons. HNC displayed higher mean methylation when taking all cytosines together (paired wilcox test p-value = 9.501e−08, mean methylation of non-tumors = 0.58%, mean methylation in tumors = 0.68%), or CpG sites (paired wilcox test p-value = 4.19e−05, mean methylation of non-tumors = 5.12%, mean methylation in tumor = 5.96%) and CpH sites (paired wilcox test p-value = 0.0002, mean methylation of non-tumors = 0.12%, mean methylation in tumors = 0.15%) independently. In addition, only site-level CpG differences between tumor and adjacent tissues reached statistical significance (Fig. 7E). Specifically, positions chrM:8582, chrM:8579, and chrM:5755 were differentially methylated in a paired tumor/adjacent analyses (FDR adjusted p value < 0.05). Such differences were found in both L-and H-strands. Of note, to avoid biases dependent on differences in coverage, we excluded cytosines with less than 10 × coverage in all samples. Furthermore, contaminating nuclear mitochondrial DNA inserts (NUMTs) are a well known source of error due to their sequence homology with mtDNA 36 . To exclude a bias due to NUMTs, we performed a separate analysis of 5mC only in nanopore reads mapping exclusively to those sequences (1030 NUMTs after liftover to the hg38 assembly, as previously described 36 ). We did not find significant differences in 5mC at those locations between tumors and adjacent tissues. Moreover, in a separate analysis, we excluded all reads aligning to mtDNA with a length below 8798 bp (the longest NUMT in our reference database). After removing those reads, the global differences between tumor and non-tumor HNC tissues was still significant.
Therefore, while several cytosines were found differentially methylated in cancer tissues from HCC and HNC, we were able to detect CpH above background levels only in HNC tissues. This could be the result of tissue-dependent profiles, or the different algorithms used to call base modifications from the two devices used in these two independent datasets.

Discussion
In the present study we have shown that nanopore sequencing can reliably detect 5mC in mitochondrial DNA from human and mouse cells. Exploiting the advantages of long reads and native DNA sequencing, we show that 5mCpG and 5mCpH can be detected at discrete locations at levels that depend on the sample (i.e. immortalized cell line versus tissues). In addition, despite global low levels, we found differential mtDNA methylation in HNC samples compared to their matched adjacent tissues. However, we did not observe differential 5mC in two biological contexts (i.e. in vitro differentiation of a liver progenitor cell line and in vitro induction of oxidative stress) or a different cancer type (HCC), where 5mC was generally low or below the threshold of detection for CpG and CpH, respectively. The absence of changes in response to oxidative stress may be specially surprising. While the effect of oxidative stress on mitochondrial activity has been extensively studied (Ashari et al., 2020; Yu et al., 2020), there had not yet been a comprehensive mapping of mtDNA 5mCpG in oxidative stress conditions. A previous study showed that downregulation of the Dnmt1 (isoform 3) under oxidative and nutritional stress, resulted in hypomethylation of mitochondrial genome. However, 5mC levels were low (close to 5% in most conditions) and only 6 selected CCGG sites were analyzed 37 . An interesting follow up of our experiment would be the study of a potential bias due to the selection of mitochondria that are being replicated after longer time points. Full mtDNA length reads would be ideal to rule out the enrichment on a particular mitochondria subpopulation. The literature on mtDNA methylation is populated by contradictory findings that have been attributed to the limitations of bisulfite-based techniques that are considered the gold standard for base-resolution profiling of DNA methylation. In particular, it has been shown that linear mtDNA had significantly higher bisulfite conversion efficiency and resulted in lower methylation values compared with circular mtDNA 38 . In response, a series of works outlining amendments to the bisulfite conversion protocol for mitochondria have been published in order to ensure the bisulfite conversion efficiency is properly controlled for 15 . Moreover, recent publications tried to overcome such issues and conclude that mtDNA is highly methylated, in particular in a strand-specific and nonCpG context 10,35,39 . We have not been able to reproduce such findings. Indeed, absolute levels of methylation and its variation were generally low in all our samples, regardless of their origin and treatment. Although some of these discrepancies may be related to differences in sample type, this is clearly an insufficient explanation. On one hand, despite careful protocol optimization, it is difficult to fully rule out incomplete bisulfite modification due to the secondary structure of mtDNA, biases introduced by PCR amplification of two strands with a markedly different base content, or misalignments to nuclear mitochondrial sequences (i.e. NUMTs) which are more frequent with short read sequencing. On the other hand, there are limitations still present or even inherent to the novel nanopore sequencing technology. In the first case, heteroplasmy represents a challenge when identifying base-level differences in 5mC. This is especially true at the population level, but does not explain little variation in the isogenic controlled in vitro experiments shown here. In the second case, we cannot rule out that base modification calling could have been affected by local sequence patterns, such as G homopolymers. In addition, there is still room for improvement of neural network-derived models. Indeed, there is not currently a model with optimal performance which can detect 5mC, 5hmC and 6 mA at the same time. Although the accuracy of the all-context 5mC model is high, future algorithms that incorporate new base modifications simultaneously may be available in the short term, something that becomes a hindrance for bisulfite-based techniques.
We are aware of three independent studies that used nanopore sequencing to detect 5mCpG in mtDNA [40][41][42] . As part of a study of mitochondrial DNA alterations in oral squamous cell carcinoma, one of them evaluated the correlation between mtDNA alterations and cisplatin sensitivity using two oral squamous cell carcinoma cell lines 40 . The authors also used native DNA samples digested with BamHI to linearise the circular mitochondrial genome, followed by Nanopolish to call 5mC. Similar to our findings, they reported a low level of methylation across the whole mitochondrial genome in all tested cell lines, around 6-7%. However, the exception were three methylated sites in the H103 cell line. The authors discussed that such differences could have arisen from distinct mitochondrial and nuclear genotypes 40 . In addition, during the revision of our manuscript, two preprints were published describing the use of nanopore sequencing to detect mtDNA methylation in CpG context. One of them used a workflow similar to ours (i.e. mtDNA enrichment, linearization, and Nanopolish methylation calling) to study 5mCpG in whole blood 42 . In agreement with our findings, they detected low-level methylation, with exceptions at 32 unique CpG sites and positive correlation between both strands. Interestingly, by using synthetic DNA samples, they show that highly methylated reads had a higher tendency to be considered as failed during basecalling. Although including those reads did not dramatically change their conclusions, they rightly conclude that reanalysis of highly methylated CpG sites within mtDNA may be of importance 42 . A second preprint 41 described a protocol also based on BamHI linearization to sequence human mtDNA. Indeed, they show that compared to fragmentation-based methods, BamHI digestion did not result in strand bias. Methylation calling using nanopolish resulted in negligible mtDNA methylation levels in multiple human primary and cancer cell lines. To avoid biases the authors recommend higher read depths that can help in the identification of heteroplasmic mtDNA variants. Indeed, they illustrate the use of de novo assembly before methylation calling to avoid such bias.
Our work is pioneer in testing all-context models of nanopore 5mC basecalling. We were able to illustrate the ability of this model to produce the expected genomic distribution of CpH in those tissues where this base www.nature.com/scientificreports/ ). In addition, mCpHs are mostly spatially correlated with mCpGs [45][46][47] , a tendency that we also observed and within values closed to what has been described for brain and other tissues (i.e. below 1%) 47 . Of note, using such all-context 5mC model, we were able to find discrete differences in cancer and in particular in HNC, both at the CpG and the CpH level. Future improvements of the models used to call methylation may validate these findings while ruling out additional base modifications, a requirement for considering such changes as potential clinical biomarkers and key targets for study of their biological functions. We provide the first comprehensive characterization of mtDNA methylation in a diversity of sample types and biological conditions, using nanopore long read sequencing. We were able to use this novel tool to detect the methylation patterns along 16 kb reads spanning the entire mitochondrial chromosome with deep coverage of > 1000 × on native DNA. In doing so, we have produced a map of mtDNA 5mC, that has eliminated any introduced bias from bisulfite conversion and PCR amplification.

Methods
Cell culture, maintenance and differentiation. HepaRG cells were cultured in Williams media enriched with 10% Fetal calf serum clone II, 1% Penicillin/Streptomycin, L-glutamine (2 mM), insulin (5 µg/ mL) and hydrocortisone (25 µg/mL). Proliferative HepaRGs were taken before reaching 50% confluence and differentiated HepaRGs were differentiated as previously described 20,27,28 . HEK293T, immortal cells derived from embryonic kidney were grown in tissue culture dishes (Falcon, Becton Dickinson) and cultured in DMEM 1X media containing 1% Penicillin/Streptomycin, 1% sodium pyruvate, 1% L-glutamine, 1% non-essential aminoacids, all from Life Technologies, and 10% fetal bovine serum (Eurobio Abcys). Primary human hepatocytes (PHH) were isolated from healthy liver tissue as previously described 48 . DNA was extracted from liver tissue, PHH in suspension (0 h in culture) and PHH after 24 h and 7 days in culture (experimental set up outlined in Fig. 5B) using Epicentre MasterPure DNA purification kit, according to manufacturer's instructions. In the timepoint PHH experiment, DNA extraction performed with Monarch HMW DNA extraction kit for tissue (#T3060 New England Biolabs) according to manufacturers instructions. DNA was sheared using G-Tubes (Covaris) to 20 KB, 500 ng was taken forward for Rapid barcode sequencing library prep, before sequencing on a MinION device.
Induction of oxidative stress. HEK293T cells were treated with hydrogen peroxide (H2O2) (Sigma-Aldrich, 216763) at a concentration of 500 μM for 2 h, alone or in combination with 5 mM N-acetyl-cyteine (NAC) (Sigma-Aldrich, A7250). When using NAC, cells were pre-treated for 2hrs with 5 mM NAC. The mitochondrial superoxide indicator stain MitoSOX (ThermoFisher, M36008) was used to probe the relative oxidative stress in live cells. Cells were stained with 1uM MitoSox diluted in DMEM. 250,000 cells were incubated with 330 ul for 30 min and analyzed by flow cytometry, then washed with PBS and trypsinized. Flow cytometry tubes were kept on ice and in the dark until use. Flow cytometry analysis was performed with a FACSCalibur (BD Biosciences). The mean fluorescence intensity of minimum 10,000 stained cells and unstained control cells were recorded and plotted for analysis. Alternatively, MitoSOX was analyzed by epifluorescence microscopy (Zeiss, Axio Observer).
Tissue samples. Human biological samples and associated data were obtained from the "Tissu-Tumorothèque Est" (CRB-HCL, Hospices Civils de Lyon Biobank, BB-0033-00046) and the "Centre de Ressources Biologiques" (CRB Centre Léon Bérard, BB-0033-00050, Lyon France). This study was approved by the ethical review board of Centre Léon Bérard. The material used in the study has been collected in agreement with all applicable laws, rules, and requests of French and European government authorities, including the patient's informed consents.
The brains of P13 OF1 mice were harvested and placed on ice-cold Dulbecco's Phosphate Buffered Saline (DPBS) for microdissection. The brain was placed on a Mouse Brain Matrix (Zivic Instruments) and coronal section of the forebrain cut at 500 µm. The dorsal subventricular zone (SVZ) was microdissected from 3 to 4 consecutive sections under a binocular microscope and collected in an Eppendorf tube containing ice-cold DPBS before being centrifuged and stored at -20 °C.
For human and mouse samples, DNA was extracted using the Epicentre MasterPure DNA purification kit, according to manufacturer's instructions. Subcellular fractionation and mtDNA extraction. Subcellular fractionation was performed as previously described 49  www.nature.com/scientificreports/ (Wheaton, USA) with a loose and tight pestle (100 strokes with each pestle). Cells were observed under microscope (Axiovert 40C, Zeiss) with trypan blue dye to assess cell membrane disruption followed by centrifugation at 500 g for 5 min at 4 °C. The supernatant was collected before centrifugation at 10 000 g for 30 min at 4 °C. DNA extraction (Nucleospin Tissue, Macherey-Nagel) was performed on the resulting pellet according to manufacturer instructions. mtDNA was digested using BamHI HF (New England BioLabs) in order to linearize the mtDNA genome.
Fully unmethylated and fully methylated controls. After mtDNA enrichment and linearization, we prepared a negative (FU = fully unmethylated) control sample from differentiated HepaRG mtDNA by performing whole genome amplification using a repliG kit (Qiagen) according to manufacturer's instructions. After amplification, a positive control for methylation (FM = fully methylated) was prepared. Briefly, CpG dinucleotides were methylated by incubating 1 µg of DNA with S-Adenosyl methionine (SAM) ( Bioinformatic analyses. Our analyses are divided in two parts, our initial CpG methylation pipeline based on running Nanopolish in parallel to Guppy + Medaka for each sample, and the second one based on Megalodon to call 5mC in all contexts (i.e. CpG and CpH). For our initial CpG methylation analyses, basecalling was performed with Guppy version 4.0.15 (ONT). We first determined the methylation status of each CpG site on every read by using the widely used tool, nanopolish 23 used recently by 51 . For validation, we also called DNA methylation using the novel tool, Medaka (https:// github. com/ nanop orete ch/ medaka). Medaka is a tool to create a consensus sequence from nanopore sequencing data. This task is performed using neural networks applied from a pileup of individual sequencing reads against a draft assembly. It outperforms graph-based methods operating on basecalled data, and can be competitive with state-of-the-art signal-based methods, whilst being much faster. Both tools have been recently benchmarked 22 . PycoQC was used for data inspection and quality control (https:// github. com/a-slide/ pycoQC), and methplotlib (https:// github. com/ wdeco ster/ methp lotlib) for read-level visualizations. Called CpG sites in the FU control were used to determine a baseline of methylation. The following calculation was utilised: FalsePositiveRate = [#called methylated cytosines in FU/#called cytosines in FU].
To call 5mC in all contexts, we first demultiplexed the reads at the raw (fast5) level using Deepbinner (v.0.2.0) 52 , before basecalling and extracting base modification information using the most recent model from the Rerio repository (https:// github. com/ nanop orete ch/ rerio) implemented through the Megalodon tool (v.2.2.9) (https:// github. com/ nanop orete ch/ megal odon). Megalodon extracts high accuracy modified base and sequence variant calls from raw nanopore reads by anchoring the information rich basecalling neural network output to a reference genome. Specifically, we used the corresponding MinION (res_dna_r941_min_modbases_5mC_v001. cfg v.4.2.2) and PromethION (res_dna_r941_prom_modbases_5mC_v001.cfg v.4.2.2) models, with a -modbinary-threshold of 0.8, as recommended. To validate CpH analyses, we used Megalodon (as described above) to re-basecall raw fast5 data from a published dataset 53 (https:// www. ebi. ac. uk/ ena/ brows er/ view/ PRJEB 33258). As these libraries were prepared after cas9-targeting of selected genes (3 gRNA designs, that resulted in 19 hg38 genomic regions), we pooled the reads from several samples to be able to plot as enriched heatmaps (Fig. 6D,E). www.nature.com/scientificreports/ For differential methylation analyses we used DSS (Dispersion shrinkage for sequencing data) 54 adapted for nanopore sequencing 51 . Briefly, DSS tests for differential methylation at single CpG-sites, using a Wald test on the coefficients of a beta-binomial regression of count data with an 'arcsine' link function. In order to set minimum requirements for DSS analysis, an internal comparison of biological replicates of differentiated HepaRG cells was undertaken. From this we were able to better understand the background and determine the minimum smoothing and delta values. These values were set at a delta of 0.05 with minimum P-value of 0.05. For transcription factor binding site analyses, we used the bioconductor packages MIRA 21 for methylation data aggregation, and LOLA for dataset selection 55 . Mann-Whitney's test was used for pairwise comparisons of 5mCpG distribution.

Data availability
Datasets generated during the current study have been uploaded to the GEO repository (fastq files and medakaextracted methylation tables have been submitted under accession #GSE158224 and # GSE159582; submitter: H Hernandez-Vargas).