Abstract
Somatic cells accumulate genomic alterations with age; however, our understanding of mitochondrial DNA (mtDNA) mosaicism remains limited. Here we investigated the genomes of 2,096 clones derived from three cell types across 31 donors, identifying 6,451 mtDNA variants with heteroplasmy levels of ≳0.3%. While the majority of these variants were unique to individual clones, suggesting stochastic acquisition with age, 409 variants (6%) were shared across multiple embryonic lineages, indicating their origin from heteroplasmy in fertilized eggs. The mutational spectrum exhibited replication-strand bias, implicating mtDNA replication as a major mutational process. We estimated the mtDNA mutation rate (5.0 × 10⁻⁸ per base pair per replication) and the mtDNA turnover frequency (10–20 turnovers per year), which are fundamental parameters shaping the landscape of mtDNA mosaicism over a lifetime. The expansion of mtDNA-truncating mutations toward homoplasmy was substantially suppressed. Our findings provide comprehensive insights into the origins, dynamics and functional consequences of mtDNA mosaicism in human somatic cells.
Main
Genomic alterations accumulate in somatic cells throughout an individual’s lifetime1,2,3,4,5. Recent sequencing studies have documented mutations in the nuclear genome and frequent clonal competition of normal cells carrying mutations6,7,8,9,10,11. However, the landscape of mitochondrial DNA (mtDNA) mosaicism in normal human tissues remains unexplored.
Mitochondria are organelles involved in energy metabolism, cell signaling, apoptosis and biosynthesis12,13,14,15,16, and they carry their own 16.6-kb circular genome17. mtDNA mutations can be acquired somatically during development and aging18,19,20,21,22, shaping genetic mosaicism in somatic tissues23,24,25. Revealing somatic mosaicism is generally challenging, as most acquired alterations are confined to a single cell or a tiny fraction of cells in the body26. Capturing mtDNA mosaicism is even more complex than capturing nuclear DNA (nDNA) mosaicism, because a cell contains hundreds to thousands of mtDNA copies, and a newly acquired mtDNA mutation is initially confined to a small fraction of those copies even within a single cell12. Recently, the single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) has been applied to detect mtDNA mutations in single cells27,28, but insufficient mtDNA depth per cell has precluded sensitive profiling29.
Most of our understanding of acquired mtDNA alterations has been derived from cancer studies30,31,32,33,34,35. In this study, we investigated whole-genome sequences (WGSs) of 2,096 colonies expanded from healthy (nontumor) single cells (hereafter referred to as clones)5,6,7,8. Because each clone derives from a single founder cell, this approach enabled sensitive and accurate detection of single-cell mtDNA variants in multiple cells from the same individual. Using this approach, we traced the origin of heteroplasmic mtDNA variants and estimated the absolute mtDNA mutation rate and the dynamics of age-dependent changes in heteroplasmy levels in somatic lineages.
Results
Landscape of mtDNA heteroplasmy in normal cells
We explored 2,096 WGSs of clones expanded from nonneoplastic healthy single cells collected from the colorectal epithelium (431 crypts from 20 individuals)6, fibroblasts (334 cells from 7 individuals)5 and hematopoietic stem and progenitor cells (HSPCs; 1,331 cells from 4 individuals)7,8 (Fig. 1a and Supplementary Tables 1 and 2). In addition, we analyzed 31 WGSs from tumors, including 19 matched colorectal carcinoma bulk tissues from individuals who donated normal colorectal clones and 12 clones established from adenomatous polyps from one individual with MUTYH-associated polyposis6.
Using the variant allele frequencies (VAFs) of somatically acquired nDNA mutations, we verified the clonality of the clones (Extended Data Fig. 1a). The average mtDNA read depth in normal clones was 6,931× (range, 188× to 40,421×; Extended Data Fig. 1b,c), allowing robust detection of mtDNA variants down to a heteroplasmy level of ~0.3% in a single clone. To distinguish true low-level variants from sequencing noise, we established and applied a locus-specific background noise matrix (Methods; Extended Data Fig. 1d–g and Supplementary Table 3). To trace the developmental origin of mtDNA alterations, we constructed the early embryonic phylogeny of the clones using shared somatic nDNA mutations3,5. As reported previously3,5, the VAFs of lineage-defining variants in the matched bulk blood tissues indicated that the first branching in each phylogeny was close to the first cell division in life (Extended Data Fig. 2 and Supplementary Note 1).
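As a rough illustration of why depths of several thousand-fold support detection near 0.3% heteroplasmy, the sketch below computes the binomial probability of observing at least a given number of variant-supporting reads. It assumes independent reads and no sequencing error, and the minimum of 5 supporting reads is a hypothetical threshold; actual calling in this study relied on the locus-specific noise matrix, not on this calculation.

```python
from math import comb


def detection_power(depth: int, vaf: float, min_reads: int) -> float:
    """Probability of seeing at least `min_reads` variant reads at one site.

    Models supporting reads as Binomial(depth, vaf): reads are treated as
    independent draws from the clone's mtDNA pool and sequencing error is ignored.
    """
    # Probability mass of observing 0, 1, ..., min_reads - 1 supporting reads.
    p_below = sum(
        comb(depth, k) * vaf**k * (1.0 - vaf) ** (depth - k) for k in range(min_reads)
    )
    return 1.0 - p_below


# Lowest, mean and highest mtDNA depths quoted above; min_reads=5 is hypothetical.
for depth in (188, 6_931, 40_421):
    print(f"{depth:>6}x  power at 0.3% VAF: {detection_power(depth, 0.003, 5):.4f}")
```

At the mean depth the power is essentially 1, whereas at the lowest depths a 0.3% variant would usually be missed, which is why sensitivity varies across clones.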
Overall, we identified 6,451 mosaic mtDNA base substitutions and insertions and deletions (InDels) in the normal clones, an average of 3.1 mtDNA alterations per clone (Fig. 1b and Supplementary Table 4). Most clones (92.4%; 1,937 of 2,096) exhibited one or more mtDNA alterations, and approximately 18% of the clones (383 of 2,096) carried one or more nearly homoplasmic mtDNA alterations (defined as VAF > 90%). We believe that the VAFs of mtDNA alterations in each clone (hereafter referred to as clone-VAFs) approximate the original levels in the clone’s founder cell, as clone-VAFs remained largely consistent throughout cell culture (Extended Data Fig. 3a–c and Supplementary Note 2). Additionally, direct genome sequencing of colorectal crypts obtained via laser-capture microdissection (LCM) revealed a highly similar mtDNA mutational landscape, indicating minimal culture-associated bias in mutational diversity (Extended Data Fig. 3d–g and Supplementary Note 2).
The spectrum of mtDNA base substitutions was dominated by transitions (C:G>T:A and T:A>C:G substitutions; collectively 95%; Fig. 1c). These alterations exhibited extreme replication-strand asymmetry, as previously observed in cancers31,32. Outside the heavy-strand replication origin (m.192–16,196), the mutated cytosines of C:G>T:A alterations were predominantly on the heavy strand (92.5%), despite the scarcity of cytosines on that strand (cytosine:guanine = 1:2.4; Fig. 1d). Similarly, the mutated thymines of T:A>C:G alterations were predominantly on the light strand (63.4%), despite their relative rarity on that strand (thymine:adenine = 1:1.3; Fig. 1d). Moreover, the strand asymmetry was reversed within the replication origin (m.16,197–191; Fig. 1c,e), where bidirectional mtDNA replication operates36,37. Collectively, these findings suggest that mtDNA variant acquisition is tightly coupled with strand-asymmetric mtDNA replication, as speculated previously31. However, the strand asymmetry was not completely uniform across cell types (P = 3.3 × 10⁻⁵², Pearson’s chi-squared test; Fig. 1d), implying that mtDNA replication processes may differ slightly across cell types.
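The strand bias can be quantified by normalizing the observed strand share of mutations by that strand's share of target bases. The sketch below is an illustrative calculation, not an analysis reported in the study; it plugs in the proportions quoted above and uses the fact that light-strand cytosines pair with heavy-strand guanines (and light-strand adenines with heavy-strand thymines).

```python
def expected_fraction_on_strand(base_ratio: float) -> float:
    """Share of mutations expected on a strand if every target base mutates at the same rate.

    base_ratio = (count of the target base on this strand) /
                 (count of the same base on the opposite strand).
    """
    return base_ratio / (1.0 + base_ratio)


def per_base_rate_ratio(observed_fraction: float, base_ratio: float) -> float:
    """Per-base mutation rate on this strand relative to the opposite strand."""
    observed_odds = observed_fraction / (1.0 - observed_fraction)
    # Normalize the observed odds by how many target bases each strand offers.
    return observed_odds / base_ratio


c_heavy = (0.925, 1 / 2.4)  # C:G>T:A with mutated C on the heavy strand; heavy C : light C = 1 : 2.4
t_light = (0.634, 1 / 1.3)  # T:A>C:G with mutated T on the light strand; light T : heavy T = 1 : 1.3
for label, (frac, ratio) in (("C>T heavy", c_heavy), ("T>C light", t_light)):
    print(label, round(expected_fraction_on_strand(ratio), 3), round(per_base_rate_ratio(frac, ratio), 1))
```

With these inputs, C:G>T:A changes are ~30-fold more frequent per heavy-strand cytosine than per light-strand cytosine, compared with ~2.3-fold for T:A>C:G changes per light-strand thymine.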
We occasionally observed localized acquisition of multiple mtDNA variants32. For example, 12 substitutions with similar clone-VAFs (1.1–2.5%) were detected in a single fibroblast clone (Fig. 1f). These were predominantly T:A>C:G substitutions (11 of 12), and six of them clustered within a localized region (m.7,318–8,388), with read phasing providing direct evidence of their coclonality. This suggests that a single mutational hit may create multiple mutations in mtDNA, analogous to kataegis in nDNA38.
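Localized clusters of this kind are often screened for in nuclear genomes by grouping mutations whose consecutive inter-mutation distances fall below a cutoff. The sketch below applies that idea to mtDNA positions. The `max_gap` and `min_size` values and the example positions are hypothetical, coordinates are treated as linear (ignoring the circular junction at m.16,569/m.1), and no read-phasing check is performed.

```python
def find_clusters(positions: list[int], max_gap: int, min_size: int) -> list[list[int]]:
    """Group mutation positions into runs whose consecutive gaps are <= max_gap.

    Returns only runs containing at least `min_size` mutations.
    """
    clusters: list[list[int]] = []
    current: list[int] = []
    for p in sorted(positions):
        # A gap larger than max_gap closes the current run.
        if current and p - current[-1] > max_gap:
            if len(current) >= min_size:
                clusters.append(current)
            current = []
        current.append(p)
    if len(current) >= min_size:
        clusters.append(current)
    return clusters


positions = [310, 7318, 7450, 7689, 7902, 8120, 8388, 12705]  # hypothetical clone
print(find_clusters(positions, max_gap=500, min_size=5))
```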
Two origins of mtDNA alterations
Using shared patterns in the developmental phylogenies and across tissues, we categorized the origins of mosaic mtDNA alterations into two main groups: (1) heteroplasmy in the fertilized egg (termed HetFE variants; n = 409 alterations, 153 events when collapsed) and (2) mutations acquired postzygotically in somatic lineages (termed postzygotic mutations; n = 6,042; Fig. 2a,b and Extended Data Fig. 4a). Consistent with their presence from the first cell of life, HetFE variants were shared by multiple clones and/or tissues of a particular individual. In contrast, most postzygotic mutations were confined to one or a few clones (n = 5,652, either singletons (n = 5,276) or coincidentally recurrent mutations (n = 376); referred to as postzygotic simple (PZsimple) mutations). A small subset of postzygotic mutations (n = 390 at 32 mtDNA sites) recurred across multiple clones and were not confined to a specific donor (referred to as postzygotic recurrent (PZrecurrent) mutations), suggesting higher mutation rates at these sites than at other mtDNA loci.
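The decision logic behind this two-origin classification can be sketched as a simple rule. This is a deliberate simplification: the study used full developmental phylogenies and tissue sharing, whereas here the phylogeny is reduced to a hypothetical flag (`carriers_share_root`) stating whether the carrier clones converge at the first node, and the set of hotspot sites is supplied by the caller.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    site: str                  # e.g. "m.16400C>T"
    donor: str
    carrier_clones: int        # number of this donor's clones carrying the variant
    carriers_share_root: bool  # hypothetical: carriers' MRCA is the root of the donor's phylogeny
    in_matched_blood: bool     # variant also detected in the donor's bulk blood


def classify(call: VariantCall, recurrent_sites: set[str]) -> str:
    """Assign a coarse origin label (a simplification of the phylogeny-based scheme)."""
    # Sites mutated recurrently across donors are treated as mutational hotspots.
    if call.site in recurrent_sites:
        return "PZrecurrent"
    # Sharing that traces back to the first node, or presence in polyclonal blood,
    # points to heteroplasmy already present in the fertilized egg.
    if (call.carrier_clones >= 2 and call.carriers_share_root) or call.in_matched_blood:
        return "HetFE"
    # Singletons and coincidental recurrences confined to a few clones.
    return "PZsimple"


hotspots = {"m.414T>G"}
call = VariantCall("m.16400C>T", "DB2", carrier_clones=14, carriers_share_root=True, in_matched_blood=True)
print(classify(call, hotspots))
```

Checking hotspots first is a design choice of this sketch; a real pipeline would need an explicit rule for hotspot sites that are also shared within one donor.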
mtDNA heteroplasmy in the fertilized egg
Annotating mtDNA mosaicism with early developmental phylogenies enabled us to capture HetFE variants5. For example, the m.16,400 C>T substitution was shared by 14 fibroblast clones (51.9% of 27 clones) established from DB2 (Fig. 2c). Despite its high prevalence in DB2, the variant was extremely rare in clones from other donors (0.1%; two of 2,069 clones). Similarly, the m.7,496 T>C substitution was recurrently but exclusively observed in HC19, including in three normal colorectal clones (13.0% of 23 clones) and the matched colorectal cancer tissue (Fig. 2d). In both cases, the mutant clones converged at the first node of each phylogeny (Fig. 2c,d). These patterns strongly suggest that the most recent common ancestor (MRCA) cell, possibly the fertilized egg, carried the heteroplasmic variants. Consistent with their pregastrulation timing, these variants were also found in the matched bulk blood tissues at substantial VAFs (0.584 and 0.149, respectively; Fig. 2c,d).
Overall, we categorized 153 variant events as HetFE variants (Supplementary Table 5). These comprise 391 variants shared by multiple clones within an individual (6.1% of all mtDNA variants; 135 events when collapsed) and 18 variants that were singletons among clones but were also detected in matched blood tissues. Compared with PZsimple mutations39, HetFE variants were twofold enriched in the D-loop (m.16,024–576) and 1.5-fold depleted in the rRNA regions (m.648–1,601 and m.1,671–3,229) (P = 0.0031 and 0.0363, respectively, two-sided Fisher’s exact test; Extended Data Fig. 4b).
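Enrichment comparisons like these reduce to a 2×2 table (HetFE versus PZsimple; inside versus outside a region). A dependency-free two-sided Fisher's exact test is sketched below, following the common convention of summing all tables with the same margins that are no more probable than the observed one. The counts in the example are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: variant class (e.g. HetFE, PZsimple); columns: inside, outside the region.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Hypergeometric probability of each feasible top-left cell value x.
    probs = {x: comb(row1, x) * comb(n - row1, col1 - x) / denom for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # A small relative tolerance keeps tables tied with the observed one despite rounding.
    return min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7)))


def fold_enrichment(a: int, b: int, c: int, d: int) -> float:
    """Share of row 1 inside the region divided by the share of row 2 inside it."""
    return (a / (a + b)) / (c / (c + d))


# Hypothetical counts: 30 of 153 HetFE events vs 600 of 5,652 PZsimple mutations in a region.
print(round(fisher_exact_two_sided(30, 123, 600, 5052), 4), round(fold_enrichment(30, 123, 600, 5052), 2))
```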
We next inferred the original heteroplasmy levels of HetFE variants in the fertilized egg. Notably, the average clone-VAF of a HetFE variant across all clones from a donor (referred to as clone-averaged VAF (caVAF); Extended Data Fig. 4c) closely correlated with its heteroplasmy level in the matched polyclonal blood tissue (R = 0.967, P = 2.3 × 10⁻¹⁶, Pearson’s correlation; Fig. 2e). We reasoned that the heteroplasmy level at origin is a plausible mechanistic link between these two independent measurements: although the clone-VAFs of a HetFE variant may fluctuate across clonal lineages with aging, their average (caVAF) should remain close to the original heteroplasmy level, consistent with our computational simulation (Extended Data Fig. 4d). Similarly, as the VAF in bulk blood tissue (blood-VAF) inherently represents an average over many polyclonal blood cells, it should also closely reflect the initial heteroplasmy level. The same reasoning accounts for the correlation of VAFs of heteroplasmic mtDNA variants between buccal–buccal and/or buccal–blood tissues in 19 monozygotic twins (Extended Data Fig. 4e,f). We therefore used caVAF as a proxy for the heteroplasmy level of a HetFE variant in the fertilized egg (Supplementary Note 3).
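Computing caVAF and comparing it with blood-VAF amounts to a per-variant mean followed by a Pearson correlation. The sketch treats clones in which the variant was not detected as having a clone-VAF of 0 (an assumption about bookkeeping), and all values are toy numbers rather than data from this study.

```python
from math import sqrt


def clone_averaged_vaf(clone_vafs: list[float]) -> float:
    """caVAF: mean clone-VAF across all clones of a donor (undetected counts as 0)."""
    return sum(clone_vafs) / len(clone_vafs)


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


# Toy values only: per-variant clone-VAFs across five clones, and the blood-VAF.
toy = {
    "variant_1": ([1.0, 0.0, 0.0, 0.0, 0.6], 0.30),
    "variant_2": ([0.02, 0.0, 0.05, 0.01, 0.02], 0.025),
    "variant_3": ([0.9, 0.8, 0.2, 0.6, 0.5], 0.58),
    "variant_4": ([0.1, 0.15, 0.05, 0.1, 0.1], 0.12),
}
ca_vafs = [clone_averaged_vaf(v) for v, _ in toy.values()]
blood_vafs = [b for _, b in toy.values()]
r = pearson_r(ca_vafs, blood_vafs)
print([round(v, 3) for v in ca_vafs], round(r, 3))
```

Note how variant_1 is homoplasmic in one clone and absent from three others, yet its caVAF (0.32) still tracks its blood-VAF.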
Most donors (80.6%; 25 of 31) carried one or more HetFE variants with caVAFs above 0.03% (Fig. 2f), and twelve individuals (39%) had HetFE variants with substantial caVAFs (>4%). As expected, the statistical power to capture HetFE variants was associated with the number of clones per donor. For example, the lowest caVAF of a HetFE variant identified in HC02 (22 clones) was 0.047%, whereas in KX008 (364 clones) it was sevenfold lower (0.0067%). Accounting for detection sensitivity, we estimated the average landscape of HetFE variants, which indicated ~2 HetFE variants with heteroplasmy levels above 0.5% per fertilized egg (Fig. 2f).
We believe that the actual number of HetFE variants is higher than observed, as our detection thresholds were ~0.02% for most donors. Given that a fertilized egg typically contains ~100,000 mtDNA copies40, HetFE variants detectable in this study must have been present in at least ~20 mtDNA copies in the first cell, and those restricted to fewer copies would likely be undetectable. In addition, we speculate that most HetFE variants found in this study originated in the maternal germline rather than being newly acquired in the fertilized egg, as a newly acquired mutation would initially be restricted to a single mtDNA copy (~0.001% heteroplasmy), well below our detection threshold.
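The detection-limit reasoning above is simple arithmetic. The first helper reproduces the ~20-copy figure; the second is a crude, hypothetical floor on the caVAF of a variant called as shared, assuming at least two carrier clones at the ~0.3% per-clone threshold and all other clones at zero. The floor falls as the number of clones rises, in line with the donor-size effect described above.

```python
ZYGOTE_MTDNA_COPIES = 100_000  # "~100,000 mtDNA copies" per fertilized egg, as cited above


def min_zygote_copies(detection_threshold: float, zygote_copies: int = ZYGOTE_MTDNA_COPIES) -> float:
    """Smallest number of mutant copies in the zygote that reaches a given heteroplasmy threshold."""
    return detection_threshold * zygote_copies


def crude_cavaf_floor(n_clones: int, per_clone_threshold: float = 0.003, min_carriers: int = 2) -> float:
    """Hypothetical floor: `min_carriers` clones just at threshold, all other clones at 0."""
    return min_carriers * per_clone_threshold / n_clones


print(min_zygote_copies(0.0002))  # copies needed at a ~0.02% threshold
print(1 / ZYGOTE_MTDNA_COPIES)    # heteroplasmy of a single mutant copy
for donor, n in (("HC02", 22), ("KX008", 364)):
    print(donor, f"{crude_cavaf_floor(n):.5%}")
```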
To validate our findings, we explored WGSs from bulk blood tissues of 294 families (including 407 mother–offspring pairs)41. We discovered 425 heteroplasmic variants (>0.5% VAF) in the polyclonal blood of offspring, which are most likely HetFE variants in offspring (Fig. 2g). We further found that ~20% of the variants were heteroplasmic in the polyclonal blood of the mother (likely HetFE variants of the mother; Fig. 2g,h, Extended Data Fig. 4g and Supplementary Note 4). Our findings collectively indicate that (1) mtDNA heteroplasmy in the fertilized egg is not rare, likely being continuously generated in the germline; (2) a substantial fraction of HetFE variants are transmitted to the next generation39, despite the purification process during oogenesis in the maternal germline lineage42,43 and (3) these variants are one of the sources of mtDNA mosaicism observed in aged somatic cells.
mtDNA turnover and drift in somatic lineages
The clone-VAFs of a HetFE variant were skewed from the initial heteroplasmy level toward both extremes (0% and 100%) (Fig. 3a; two examples in Fig. 2c,d). For instance, the m.16,256 C>T variant in DB10 (39 clones), which had a caVAF of 0.32, was homoplasmic in 11 clones (28.2%) and almost wild type in 25 clones (64.1%; Fig. 3b). Two possible scenarios could explain this pattern: (1) an early embryonic bottleneck caused by the progressive reduction in mtDNA copy number per cell during the cleavage stage of early embryogenesis44,45 and (2) lifetime drift through continuous mtDNA turnover in each somatic lineage46,47 (Fig. 3c).
The early embryonic mtDNA bottleneck arises because mtDNA is not replicated until a certain stage of embryogenesis48,49, so the existing copies are partitioned among an increasing number of cells (Fig. 3c). If each embryonic cell retains only one or a few mtDNA copies at some stage, heteroplasmy levels become quantized according to the composition of the founder mtDNAs in each embryonic cell.
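To make the quantization argument concrete, the sketch below follows one embryonic lineage through cleavage divisions without mtDNA replication, so each daughter receives a random half of its parent's copies. It is a toy model under stated assumptions (exact halving, uniform random partitioning, 1,024 starting copies rather than ~100,000), not the simulation used in the study. Once a cell holds a single copy, its heteroplasmy can only be 0 or 1.

```python
import random


def cleavage_lineage(total_copies: int, mutant_copies: int, divisions: int,
                     rng: random.Random) -> float:
    """Heteroplasmy of one embryonic cell after cleavage divisions without mtDNA replication.

    Toy assumptions: copies are split exactly in half at each division, and the
    daughter that is followed receives a uniformly random half of the parent's copies.
    """
    if total_copies >> divisions < 1:
        raise ValueError("too many divisions: the cell would run out of mtDNA copies")
    n, m = total_copies, mutant_copies
    for _ in range(divisions):
        half = n // 2
        # Copies are indexed 0..n-1 with the first m mutant; sample half without replacement.
        m = sum(1 for i in rng.sample(range(n), half) if i < m)
        n = half
    return m / n


rng = random.Random(1)
# 1,024 starting copies so that 10 divisions leave a single copy per cell.
finals = [cleavage_lineage(1024, 307, 10, rng) for _ in range(500)]
print(sorted(set(finals)), round(sum(finals) / len(finals), 3))
```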
In parallel, mtDNAs are continuously lost and newly replicated in somatic lineages12,50,51 (Fig. 3c), through either cell-cycle-dependent mtDNA duplication followed by random segregation of half of the copies into each daughter cell (mitotic turnover in dividing cells) or cell-cycle-independent homeostatic replacement of mtDNA (homeostatic turnover in nondividing cells). Each turnover shifts heteroplasmy levels only slightly, but these shifts accumulate continuously and can have a substantial impact over a lifetime. Of these two nonexclusive scenarios, our observations indicate that lifetime drift is dominant.
First, purification of HetFE variants was age-dependent, being much weaker in clones from young donors (for example, clones established from a 19-week-old aborted fetus; Fig. 3d,e and Extended Data Fig. 5a). This suggests that purification is not completed in the early stages of human life. Second, sister clone pairs that diverged later did not exhibit more similar heteroplasmy levels of a HetFE variant than clone pairs that diverged earlier (Fig. 3f). For example, clone pairs whose MRCA cell existed around the 30th cell generation, long after the early embryonic bottleneck, showed substantial heterogeneity in the clone-VAFs of a HetFE variant (Fig. 3f).
Finally, computational simulations suggested that the lifetime drift model alone is sufficient to explain the skewed distribution of clone-VAFs of a HetFE variant. Simulations under the mitotic turnover model (Extended Data Fig. 5b,c) indicated that, with a basal mtDNA copy number of 750 per cell, 1,440 rounds of mitotic mtDNA turnover shifted a HetFE variant with a 10% initial heteroplasmy level (caVAF) to homoplasmy (100%) in ~10% of clones (one turnover was defined as n mtDNA replications, where n is the basal mtDNA copy number of a somatic cell; Fig. 3g). Simulations assuming homeostatic turnover (Extended Data Fig. 5d,e) led to a similar conclusion, but only about half as many rounds were needed to produce a similar effect under the same conditions (Supplementary Note 5).
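A pure-Python sketch of the mitotic turnover model as defined above: in each turnover every copy is replicated once and the daughter cell inherits a random half of the 2n copies. The copy number (30), number of turnovers (300) and clone count are toy values chosen for speed, not the 750 copies and 1,440 turnovers used in the study, so the printed fractions will differ from Fig. 3g. Because each round leaves the expected mutant count unchanged, the mean clone-VAF stays near the initial 10%, the property that motivates caVAF as a proxy for the fertilized-egg heteroplasmy level.

```python
import random


def mitotic_lineage(n_copies: int, init_vaf: float, turnovers: int,
                    rng: random.Random) -> float:
    """Clone-VAF after `turnovers` rounds of the mitotic turnover model.

    Each round, every copy is replicated once (n replications = one turnover) and
    the daughter cell receives a uniformly random n of the resulting 2n copies.
    """
    mutant = round(n_copies * init_vaf)
    for _ in range(turnovers):
        if mutant in (0, n_copies):  # loss and homoplasmy are absorbing under neutral drift
            break
        # After duplication, copies 0..2*mutant-1 of the 2n pool carry the variant.
        mutant = sum(1 for i in rng.sample(range(2 * n_copies), n_copies) if i < 2 * mutant)
    return mutant / n_copies


rng = random.Random(7)
clone_vafs = [mitotic_lineage(30, 0.10, 300, rng) for _ in range(300)]
ca_vaf = sum(clone_vafs) / len(clone_vafs)
homoplasmic = sum(v == 1.0 for v in clone_vafs) / len(clone_vafs)
print(f"caVAF={ca_vaf:.3f}  homoplasmic={homoplasmic:.1%}")
```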
Based on the clone-VAF distributions of HetFE variants, we inferred maximum-likelihood mtDNA turnover rates for each cell type (14.3, 20.8 and 17.9 mitotic turnovers per year, or 6.5, 11.5 and 9.4 homeostatic turnovers per year, for the colon epithelium, fibroblasts and HSPCs, respectively; Fig. 3h). Although we believe that mitotic and homeostatic turnovers are the predominant mechanisms in the colorectal epithelium and fibroblasts, respectively, the relative contribution of the two turnover modes in each cell type remains uncertain.
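These per-year rates allow donor age to be converted into a number of turnovers. The helpers below assume a constant rate from birth, a simplification that may not match the study's exact conversion. With the colonic mitotic rate, for instance, 1,440 turnovers correspond to roughly a century.

```python
# Maximum-likelihood turnover rates per year reported above.
RATES = {
    "mitotic": {"colon": 14.3, "fibroblast": 20.8, "HSPC": 17.9},
    "homeostatic": {"colon": 6.5, "fibroblast": 11.5, "HSPC": 9.4},
}


def turnovers_since_birth(age_years: float, cell_type: str, model: str = "mitotic") -> float:
    """Convert age to turnovers, assuming a constant per-year rate from birth."""
    return age_years * RATES[model][cell_type]


def years_for_turnovers(turnovers: float, cell_type: str, model: str = "mitotic") -> float:
    """Inverse conversion: years needed to accumulate a given number of turnovers."""
    return turnovers / RATES[model][cell_type]


print(turnovers_since_birth(50, "colon"))             # hypothetical 50-year-old donor
print(round(years_for_turnovers(1_440, "colon"), 1))  # ~100.7 years of colonic mitotic turnover
```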
Postzygotic mtDNA mutations
Across the 2,096 clones, 6,042 mtDNA variants (93.7% of all variants) were categorized as postzygotic mutations newly acquired in each somatic lineage. As mentioned above, 32 mtDNA loci showed elevated mutation rates, harboring 390 PZrecurrent variants (Supplementary Table 6 and Supplementary Note 6). These mutations were predominantly located in the hypervariable regions of the D-loop, in homopolymer sequences or both33,52 (Extended Data Fig. 6a). Interestingly, mutations at one hotspot (m.414 T>G) were recurrently found in clones with ultraviolet (UV) light exposure (estimated from UV-associated somatic mutations in the clone’s nDNA53), suggesting UV-dependent acquisition54,55 (Extended Data Fig. 6b).
Excluding HetFE variants and PZrecurrent mutations, we detected 5,652 PZsimple mtDNA alterations. Unlike somatic mutations in nuclear genomes, PZsimple mutations are difficult to count in absolute terms, because mutations with clone-VAFs below our detection threshold (~0.3%) remain undetected. Indeed, the crude number of PZsimple mutations detected per clone was not significantly correlated with age (R = 0.282, P = 0.131, Pearson’s correlation; Fig. 4a). Instead, the heteroplasmy levels of PZsimple mutations in clones displayed clearer clock-like properties: PZsimple mutations with higher clone-VAFs were more frequent in older donors than in younger donors (Fig. 4b), and the sum of the clone-VAFs of all detected PZsimple mutations in a clone (referred to as SVAF) tracked age more closely. For example, in an older individual (DB8; 93 years old), 55% of clones (26 of 47) had an SVAF of ~1.0, contributed by 1–3 clone-specific PZsimple mutations (Fig. 4c). In contrast, in a younger individual (HC10; 37 years old), all clones exhibited SVAFs far below 1.0 (fraction of clones with SVAF ~1.0, 0.55 versus 0; P = 5.2 × 10⁻⁶, two-sided Fisher’s exact test; Fig. 4c). Of note, the crude number of PZsimple mutations per clone did not differ significantly between the two individuals (Extended Data Fig. 6c). The average SVAF of an individual’s clones was strongly and positively correlated with age (Extended Data Fig. 6d), and the correlation became stronger when age was converted to the number of turnovers since birth using the cell-type-specific turnover rates estimated from HetFE variants (R = 0.787, P = 2.5 × 10⁻⁷, Pearson’s correlation; Fig. 4d).
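The SVAF statistic and its per-donor average can be computed as below. The donors and clone-VAFs are toy values, constructed so that the number of detected mutations per clone is similar across donors while SVAF rises with the number of turnovers, mirroring the contrast described above.

```python
from math import sqrt


def svaf(pzsimple_clone_vafs: list[float]) -> float:
    """SVAF: sum of the clone-VAFs of all PZsimple mutations detected in one clone."""
    return sum(pzsimple_clone_vafs)


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sqrt(sum((xi - mx) ** 2 for xi in x) * sum((yi - my) ** 2 for yi in y))


# Toy donors: (turnovers since birth, clones -> detected PZsimple clone-VAFs).
donors = {
    "young": (300, [[0.01, 0.02], [0.03], [0.005, 0.01, 0.02]]),
    "middle": (700, [[0.2, 0.05], [0.4], [0.1, 0.02]]),
    "old": (1300, [[0.98], [0.6, 0.3], [0.95, 0.04]]),
}
turnovers = [t for t, _ in donors.values()]
mean_svaf = [sum(svaf(c) for c in clones) / len(clones) for _, clones in donors.values()]
print([round(s, 3) for s in mean_svaf], round(pearson_r(turnovers, mean_svaf), 3))
```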
Interestingly, a few fibroblast clones with greater lifetime UV-light exposure exhibited higher SVAFs of PZsimple mutations than those with less exposure (P = 7.7 × 10⁻⁴, two-sided Fisher’s exact test; Fig. 4e). This suggests that UV exposure accelerates mtDNA turnover in the cellular lineage. We speculate that UV exposure damages mtDNA, leading to degradation of the damaged copies and triggering additional rounds of mtDNA replication to replace them21. Of note, the mtDNA mutational signatures of clones with higher UV exposure were similar to those of other clones (Extended Data Fig. 6e), indicating that UV light does not directly generate the PZsimple mutations fixed in mtDNA.
Combining the mtDNA turnover rates estimated from HetFE variants with the landscape of detectable PZsimple mutations, we estimated the absolute number of mtDNA alterations newly introduced per mtDNA replication. Across all individuals and under both turnover models, the estimated absolute mtDNA mutation rates converged to 5.0 × 10⁻⁸ alterations per base pair (bp) per replication (Fig. 4f). Interestingly, this estimate lies within the range of error rates of polymerase γ (POLG), the DNA polymerase that replicates the mitochondrial genome56,57. The convergence supports the conclusions that (1) endogenous mtDNA replication is the dominant process of mtDNA mutation acquisition in somatic cells31,58,59 and (2) both turnover models (and their turnover rates) are reliable. Given ~750 mtDNA copies per somatic cell, this absolute mutation rate implies that an average of 0.31 de novo PZsimple mtDNA alterations is acquired per daughter cell per cell division.
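The per-division figure follows from simple arithmetic: rate × genome length × copies replicated per division, split between the two daughter cells. The sketch assumes one full mitotic turnover per division and uses the length of the revised Cambridge Reference Sequence (rCRS, 16,569 bp) for the mitochondrial genome.

```python
MTDNA_BP = 16_569  # length of the revised Cambridge Reference Sequence (rCRS)


def de_novo_per_daughter(rate_per_bp: float, copies_per_cell: int, genome_bp: int = MTDNA_BP) -> float:
    """Expected new mtDNA mutations per daughter cell per division.

    Assumes every copy is replicated once per division (one mitotic turnover) and
    that the resulting new mutations are split evenly between the two daughters.
    """
    per_division = rate_per_bp * genome_bp * copies_per_cell
    return per_division / 2


print(round(de_novo_per_daughter(5.0e-8, 750), 2))  # 0.31
```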
Selective pressure of mtDNA mutations in normal cells
To understand the selective pressure on PZsimple mutations, we calculated the dN/dS ratio60,61,62. The ratios of missense or truncating mutations to synonymous mutations were not substantially higher than those of mtDNA mutations randomly generated according to the mtDNA mutational signature, indicating general neutrality in mutation acquisition (Fig. 5a). However, truncating mutations exhibited lower clone-VAFs than synonymous mutations in all three cell types, with none exceeding a clone-VAF of 90% (P = 0.0211, 0.0017 and 0.0013 for the colon epithelium, fibroblasts and HSPCs, respectively, two-sided Fisher’s exact test; Fig. 5b). This suggests that the expansion of mtDNAs carrying inactivating mutations is constrained by their functional disadvantage as they approach homoplasmy. These observations are consistent with previous observations in cancer tissues31,32.
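A minimal illustration of the comparison described above: the observed ratio of nonsynonymous (missense or truncating) to synonymous mutations divided by the same ratio among mutations simulated under the mutational signature, alongside the share of mutations near homoplasmy. This simple normalized ratio is a stand-in, not the dN/dS implementation used in the study, and the counts are hypothetical.

```python
def signature_normalized_ratio(obs_nonsyn: int, obs_syn: int,
                               exp_nonsyn: float, exp_syn: float) -> float:
    """(observed nonsynonymous / synonymous) / (expected nonsynonymous / synonymous).

    Expected counts would come from mutations simulated under the mtDNA mutational
    signature; a value near 1 is the neutral expectation.
    """
    return (obs_nonsyn / obs_syn) / (exp_nonsyn / exp_syn)


def fraction_above(clone_vafs: list[float], cutoff: float = 0.9) -> float:
    """Share of mutations whose clone-VAF exceeds `cutoff` (e.g. near-homoplasmy)."""
    return sum(v > cutoff for v in clone_vafs) / len(clone_vafs)


# Hypothetical counts: observed in clones vs simulated under the mutational signature.
print(round(signature_normalized_ratio(300, 100, 2_900, 1_000), 3))
```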
Despite the constrained expansion of truncating mutations, 15 truncating mutations reached high clone-VAFs (>0.6) in their clones, accompanied by upregulated RNA expression of mtDNA genes (Fig. 5c,d). This phenomenon is likely attributable to a compensatory response in which transcript degradation is inhibited when the protein product is dysfunctional63,64. The similarity in clone-VAFs between genome and transcriptome sequences indicates that this inhibitory effect does not distinguish between transcripts from wild-type and truncated mtDNA (Fig. 5e).
We further compared the clone-VAFs of PZsimple mutations in genome and transcriptome sequences (Fig. 5e). Although most mtDNA mutations showed similar clone-VAFs in both, a subset of tRNA mutations exhibited elevated clone-VAFs in transcriptomes, consistent with a previous report65, whereas a subset of rRNA mutations showed reduced clone-VAFs in transcriptomes. These mutations were predominantly clustered within the stem regions of tRNAs and rRNAs (P = 0.0158 and 0.0329 for tRNA and rRNA mutations, respectively, one-sided Wilcoxon test; Fig. 5f,g). We speculate that these mutations affect the stability and regulation of these RNAs, leading to tRNA accumulation and rRNA degradation65,66.
mtDNA copy number and structural variations (SVs) in normal cells
The average mtDNA copy number was ~750 per cell (per diploid nuclear genome), but mtDNA copy number varied widely across clones, even within an individual (Fig. 6a). For example, mtDNA copy numbers among HSPC clones from KX004 ranged from ~20 to ~3,700. There was no apparent correlation between median mtDNA copy number and age (R = 0.127, P = 0.381, Pearson’s correlation). Notably, interclonal variation in mtDNA copy number was smaller among colorectal clones (Fig. 6a). Despite these variations, the expression levels of mtDNA and nDNA genes were not substantially altered among clones, suggesting that mtDNA copy number is not rate-limiting for the transcription of mtDNA genes, at least in the resting state (Extended Data Fig. 7a,b).
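A commonly used way to express mtDNA copy number per diploid nuclear genome from WGS depth is sketched below. It assumes uniform coverage and an autosomal (two-copy) nuclear depth; the exact estimator used in this study may differ, and the depths in the example are hypothetical.

```python
def mtdna_copies_per_diploid(mt_depth: float, autosomal_depth: float) -> float:
    """mtDNA copies per diploid nuclear genome from sequencing depth.

    Autosomal depth reflects two copies per cell (one per homolog), so
    copies = 2 * mtDNA depth / autosomal depth.
    """
    if autosomal_depth <= 0:
        raise ValueError("autosomal depth must be positive")
    return 2.0 * mt_depth / autosomal_depth


print(mtdna_copies_per_diploid(6_000, 16))  # hypothetical depths; prints 750.0
```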
Two colorectal clones harbored notable SVs in their mtDNA (Fig. 6b and Extended Data Fig. 7c): deletions of 10,951 bp and 3,389 bp, respectively, each at a heteroplasmy level of approximately 45%. As expected, gene expression levels in the deleted loci were lower than in the flanking regions (P < 0.05, Wald test; Extended Data Fig. 7d). Notably, large deletions of this kind have been observed in cancers at a similar frequency32. Our findings illustrate that SVs can occur in normal clones67, although these events are rare, involving only approximately 0.1% of normal cells.
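The heteroplasmy of a large deletion can be approximated from the coverage drop inside the deleted interval relative to flanking mtDNA. The sketch assumes uniform coverage and that deleted molecules contribute no reads to the interval; it is one plausible approach rather than necessarily the one used here, and the depths are hypothetical.

```python
def deletion_heteroplasmy(depth_inside: float, depth_flanking: float) -> float:
    """Approximate fraction of mtDNA copies lacking a deleted interval.

    Under uniform coverage, deleted molecules add no reads inside the interval,
    so the relative depth drop estimates the fraction of deleted copies.
    """
    return min(1.0, max(0.0, 1.0 - depth_inside / depth_flanking))


print(round(deletion_heteroplasmy(3_300, 6_000), 2))  # hypothetical depths
```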
Accelerated mtDNA turnover in tumorigenesis
In 19 matched colorectal cancer tissues, we observed, on average, more detectable mutations (5.3 versus 3.8; P = 0.0301, Wilcoxon signed rank exact test; Fig. 6c) and higher SVAF values (P = 8.5 × 10⁻⁴, Wilcoxon signed rank exact test; Fig. 6d) than in normal clones from the same donors. Our findings suggest an elevated mtDNA mutation rate, turnover rate or both during tumor initiation and clonal evolution68. Consistent with this speculation, in 12 clones established from MUTYH-associated adenomatous polyps6, homoplasmic mtDNA mutations were more frequently observed in lineages with more driver mutations (Extended Data Fig. 7e).
We further investigated detectable PZsimple mutations in 70 colorectal carcinomas (19 matched and 51 unrelated colorectal cancers69; Supplementary Table 7). Colorectal cancers exhibited a notably higher prevalence of truncating mutations with VAFs >0.6 than normal clones (0.0203 versus 0.0026, P = 1.5 × 10⁻⁴, two-sided Fisher’s exact test; Fig. 6e), suggesting increased accumulation of deleterious mutations in colorectal cancers, as observed previously32.
Finally, compared with the mtDNA copy numbers of normal clones, the 19 matched colon cancer tissues showed mtDNA copy numbers (per diploid nuclear genome) that, at face value, were biased toward either gain or loss (Fig. 6f). To estimate mtDNA copy numbers in pure colon cancer cells, without coexisting tumor microenvironmental cells such as infiltrating lymphocytes, we correlated the mtDNA copy numbers of cancer tissues with their tumor cell fractions estimated from genome sequences70 and found a strong positive linear relationship (R = 0.715, P = 5.7 × 10⁻⁴, Pearson’s correlation; Fig. 6g). Extrapolating the regression line to a tumor cell fraction of 100% suggested ~1,266 mtDNA copies per diploid nuclear genome, 70% higher than in normal colorectal clones. Indeed, WGSs of 14 colon cancer organoids (100% tumor cell fraction) confirmed an increased mtDNA copy number in colon cancer cells (1,224 mtDNA copies per diploid cancer cell; Extended Data Fig. 7f). The reason for the mtDNA copy number gain in cancer cells remains unclear.
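The extrapolation rests on a two-compartment mixing model: bulk copy number = f × (tumor-cell copy number) + (1 - f) × (non-tumor-cell copy number), where f is the tumor cell fraction. A least-squares line therefore gives the tumor-cell value at f = 1 and the non-tumor value at f = 0. The sketch assumes near-diploid tumor cells, so that per-diploid normalization holds, and uses synthetic noise-free points on a line chosen to pass through ~1,266 copies at f = 1 and ~123 copies (the T-cell value) at f = 0, purely to exercise the code.

```python
def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares slope and intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def extrapolate_copy_number(tumor_fractions: list[float], copy_numbers: list[float]) -> tuple[float, float]:
    """Return (copies in pure tumor cells, copies in non-tumor cells) under linear mixing."""
    slope, intercept = fit_line(tumor_fractions, copy_numbers)
    return intercept + slope * 1.0, intercept


fractions = [0.2, 0.4, 0.6, 0.8]
synthetic = [123 + 1143 * f for f in fractions]  # noise-free points, for illustration only
tumor, non_tumor = extrapolate_copy_number(fractions, synthetic)
print(round(tumor), round(non_tumor))
```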
Consistently, mtDNA copy numbers in cancer tissues were negatively correlated with the amount of infiltrating CD3+ T cells (Extended Data Fig. 7g). Genome sequencing of T cells sorted from peripheral blood indicated ~123 mtDNA copies per T cell (Extended Data Fig. 7f), close to the value obtained by extrapolating the regression line to a tumor cell fraction of 0% (Fig. 6g).
Discussion
By leveraging WGSs of 2,096 healthy normal clones from three different tissues, we elucidated the landscape of mtDNA mosaicism across single cells. Our system allowed us to trace the embryonic origin of mtDNA variants. Contrary to the conventional view that mtDNA in fertilized eggs is homogeneous42,43, we conclude that human fertilized eggs frequently harbor heteroplasmic mtDNA variants, often at substantial heteroplasmy levels (that is, VAF > 30%).
The detection of HetFE variants allowed us to determine one of the two essential parameters shaping the landscape of mtDNA mosaicism: the mtDNA turnover rate in somatic cells. Then, by applying this turnover rate to the landscape of PZsimple mutations, we determined the other critical parameter, the absolute mtDNA mutation rate per mtDNA replication. Despite their importance for understanding mtDNA mutational dynamics in somatic cells, these two parameters have been difficult to disentangle because their effects are intermingled: mtDNA mutations cannot be detected without mtDNA expansion, and mtDNA expansion cannot be tracked without mtDNA mutations.
Our findings suggest that stochastic lifetime drift alone can shape the mtDNA heteroplasmy landscape observed in this study (Fig. 7a). The replication-strand-asymmetric mutational spectrum and the constant mutation rate suggest that PZsimple mutations arise primarily through replication-associated mechanisms. Our lifetime drift model illustrates that, after 1,000 mitotic turnovers, ~30% of somatic cells reach a completely purified mtDNA composition in which all copies descend from a single one of the original 750 copies (Fig. 7b and Supplementary Table 8), and ~5% of somatic cells carry homoplasmic PZsimple mutations (Fig. 7c and Supplementary Table 9). The extent of lifetime drift generally decreases as the basal mtDNA copy number increases (Extended Data Figs. 8 and 9).
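The "completely purified" state can be simulated by labeling every original copy with its own founder ID and applying mitotic turnovers until a single label remains. The sketch below uses 10 copies per cell instead of 750 so it runs instantly; it illustrates the measure, not the study's simulation, and its fractions are not comparable to Fig. 7b. Consistent with the point above, larger copy numbers make purification slower.

```python
import random
from typing import Optional


def turnovers_to_single_founder(n_copies: int, max_turnovers: int,
                                rng: random.Random) -> Optional[int]:
    """Turnovers until every copy descends from one original copy, or None if not reached.

    Every turnover duplicates all copies and keeps a uniformly random n of the 2n,
    as in the mitotic turnover model.
    """
    labels = list(range(n_copies))  # one founder label per original copy
    for t in range(1, max_turnovers + 1):
        labels = rng.sample(labels * 2, n_copies)
        if len(set(labels)) == 1:
            return t
    return None


def fraction_purified(n_copies: int, turnovers: int, cells: int, seed: int = 0) -> float:
    """Share of simulated cells whose mtDNA pool is fully purified within `turnovers`."""
    rng = random.Random(seed)
    hits = sum(turnovers_to_single_founder(n_copies, turnovers, rng) is not None
               for _ in range(cells))
    return hits / cells


for t in (5, 20, 80):
    print(t, fraction_purified(10, t, cells=200))
```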
Although the acquisition of truncating mutations was not substantially constrained, their drift to homoplasmy was repressed in somatic lineages due to functional disadvantages. mtDNA-truncating mutations were more common in colorectal cancers than in normal tissues, suggesting that cancer cells may depend less on functional mitochondria. The increased mutation counts, expansion and mtDNA copy numbers in colorectal tumors suggest that mtDNA dynamics change during colorectal tumorigenesis, with potential functional consequences.
When certain mtDNA mutations exceed specific heteroplasmy levels, they can cause mitochondrial dysfunction, a hallmark of aging71. Our study outlines the general landscape of mtDNA mosaicism in apparently normal cells and the forces shaping it throughout life. Similar but more comprehensive analyses of diseased and aged cells are warranted to provide more specific evidence of how mtDNA mutations contribute to phenotypic changes and disease development.
Methods
Sample cohort
We used publicly available WGSs of single clones from four previous datasets: one of colorectal epithelium from our previous study (405 normal clones and 19 matched colorectal carcinomas from 19 individuals, and 12 MUTYH-associated adenomatous clones from 1 individual)6, one of mesenchymal fibroblasts from our previous study (334 normal clones from 7 individuals)5 and two of HSPCs (1,331 clones from 4 individuals)7,8. WGSs of colorectal epithelium were generated from single-crypt-derived organoids, and the others from single-cell-expanded clones. To ensure clonality and quality, we included only clones with VAFs >0.4 for nuclear somatic mutations and an average nuclear genome depth >10×.
To characterize mtDNA heteroplasmy in early embryogenesis in depth, we established 26 single-crypt-derived organoids of colorectal epithelium from a 19-week-old aborted fetus, as previously reported6. Genomic DNA was extracted using the DNeasy Blood and Tissue Kit (Qiagen). DNA libraries were prepared using TruSeq DNA PCR-Free Library Prep Kits (Illumina) and sequenced on the NovaSeq 6000 platform. All procedures in this study were approved by the Institutional Review Board of the Korea Advanced Institute of Science and Technology (approval: KH2021-096), and informed consent was obtained from the parents of this individual.
In addition, to assess the prevalence of mtDNA variants across normal clones, we included 52 in-house WGSs from 13 individuals generated from organoids of various tissue types and 432 WGSs from 42 individuals generated from LCM patches of colorectal tissues72. To validate heteroplasmy profiles in the fertilized egg, we further explored 938 WGSs of bulk blood from 275 families41. To assess whether the VAFs observed in bulk tissues match those in the fertilized egg, we examined 108 in-house WGSs obtained from cord blood and buccal swabs of 19 monozygotic twin families.
To understand the association between gene expression and mtDNA mutations, 312 whole-transcriptome sequences of colon clones from our previous study were also included6.
To compare mtDNA mutations between tumor and normal samples, we further analyzed 51 WGSs of colorectal carcinomas from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA)69. Tumor cell fraction, tumor ploidy and driver mutations of the colorectal cancer genomes were determined using CancerVision (Inocras)6,73. Additionally, we used 17 in-house WGSs to compare the mtDNA copy numbers of normal colon cells, T cells and colorectal cancer cells. For T cells (n = 3), we used samples obtained by sorting and clustering T cells, followed by bulk sequencing. For colorectal cancer cells (n = 14), we sequenced colorectal cancer organoids with a tumor cell fraction of 1 to measure the mtDNA copy number of cancer cells exclusively.
Calling and filtering of mtDNA mutations
Sequenced reads were aligned to the human reference genome build 37 (GRCh37) using the BWA-MEM algorithm74. Duplicated reads were removed with Picard (available at https://broadinstitute.github.io/picard/), and reads mapped to the mitochondrial genome were extracted with SAMtools75. To avoid misaligned reads arising from nuclear-mitochondrial DNA segments (NUMTs), we included only read pairs that were (1) both mapped to mtDNA, (2) not chimerically aligned and (3) correctly oriented. mtDNA mutations were called using HaplotypeCaller2 (ref. 76) and VarScan2 (ref. 77), and the union of both call sets was retained for high sensitivity.
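The three read-pair criteria above can be expressed as a predicate on standard SAM fields. The sketch below is a hypothetical helper (`passes_numt_filter` is not part of any tool used in the study). It assumes the mitochondrial contig is named `MT` (GRCh37/b37) or `chrM`, treats a supplementary flag or an `SA` tag as evidence of a chimeric alignment, and takes "correctly oriented" to mean forward–reverse (FR) pairs; reads spanning the linearization point of the circular genome are not handled specially.

```python
# Hypothetical helper illustrating the NUMT-avoidance read-pair filter.
MT_CONTIGS = {"MT", "chrM"}  # assumption: possible mitochondrial contig names

FLAG_PAIRED = 0x1
FLAG_UNMAPPED = 0x4
FLAG_MATE_UNMAPPED = 0x8
FLAG_REVERSE = 0x10
FLAG_MATE_REVERSE = 0x20
FLAG_SUPPLEMENTARY = 0x800


def passes_numt_filter(flag, rname, rnext, pos, pnext, tags):
    """Return True if a read and its mate satisfy criteria (1) to (3).

    flag, rname, rnext, pos and pnext are the SAM FLAG, RNAME, RNEXT, POS and
    PNEXT fields; tags is a collection of optional tag names, e.g. {"NM", "SA"}.
    """
    # Both mates must be present and mapped.
    if not (flag & FLAG_PAIRED) or flag & (FLAG_UNMAPPED | FLAG_MATE_UNMAPPED):
        return False
    # (1) Both mates mapped to mtDNA; RNEXT "=" means "same contig as RNAME".
    mate_contig = rname if rnext == "=" else rnext
    if rname not in MT_CONTIGS or mate_contig not in MT_CONTIGS:
        return False
    # (2) Not chimeric: no supplementary record and no SA (supplementary alignment) tag.
    if flag & FLAG_SUPPLEMENTARY or "SA" in tags:
        return False
    # (3) FR orientation: mates on opposite strands, with the forward-strand
    # mate starting at or before the reverse-strand mate.
    read_rev = bool(flag & FLAG_REVERSE)
    mate_rev = bool(flag & FLAG_MATE_REVERSE)
    if read_rev == mate_rev:
        return False
    forward_start, reverse_start = (pnext, pos) if read_rev else (pos, pnext)
    return forward_start <= reverse_start


# Flag 99 = paired, proper pair, mate on reverse strand, first in pair.
print(passes_numt_filter(99, "MT", "=", 100, 300, set()))  # True
print(passes_numt_filter(99, "MT", "1", 100, 300, set()))  # False: mate on chromosome 1
```

In practice these fields would come from a BAM reader such as pysam; only the decision logic is shown here.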
Candidate mutations were then filtered out if they met any of the following criteria: (1) low mapping quality (<25); (2) low base quality (<15); (3) a skewed average mutation position within supporting reads (<15% or >85%); (4) an unbalanced ratio between forward and reverse supporting reads (<10% or >90%); or (5) five or more mismatches in supporting reads. Mutations in low-complexity regions or at the gap in the reference genome (m.3,107N) were explicitly discarded31:
(1) Misalignment due to ACCCCCCCTCCCCC (rCRS 302–315)
(2) Misalignment due to GCACACACACACC (rCRS 513–525)
(3) Misalignment due to 3107N in rCRS (rCRS 3,105–3,109)
(4) Misalignment due to ACCCCC (rCRS 16,182–16,187)
Stricter criteria were applied to InDels: those with a high proportion of additional InDels in supporting reads (>50%) were filtered out. Furthermore, InDels at m.567, m.955 and m.5,894, noisy loci attributable primarily to C homopolymers, were excluded from the analysis. On visual inspection with the Integrative Genomics Viewer78, long-read sequences showed precise profiles at these loci, whereas short-read sequences showed various types of InDels, making it difficult to identify mutations confidently.
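The site-level filters can be sketched as a single predicate, assuming each candidate has already been summarized into per-variant statistics. The `Candidate` fields, and the choice to apply each threshold to a mean over supporting reads, are assumptions for illustration; the masked intervals and the noisy InDel loci are the ones listed above.

```python
from dataclasses import dataclass

# Low-complexity and reference-gap intervals listed above (rCRS, 1-based, inclusive).
MASKED_INTERVALS = [(302, 315), (513, 525), (3105, 3109), (16182, 16187)]
# C-homopolymer loci where InDels were excluded.
NOISY_INDEL_LOCI = {567, 955, 5894}


@dataclass
class Candidate:
    """Hypothetical per-variant summary of supporting reads."""
    pos: int                   # rCRS position, 1-based
    is_indel: bool
    mean_mapq: float           # mean mapping quality of supporting reads
    mean_baseq: float          # mean base quality at the variant
    mean_read_pos: float       # mean relative position of the variant along reads (0 to 1)
    forward_fraction: float    # fraction of supporting reads on the forward strand
    mean_mismatches: float     # mean mismatches per supporting read
    extra_indel_fraction: float = 0.0  # fraction of supporting reads with additional InDels


def is_masked(pos):
    return any(start <= pos <= end for start, end in MASKED_INTERVALS)


def passes_filters(c):
    """Return True if the candidate survives all filters (sketch)."""
    if is_masked(c.pos):
        return False
    if c.mean_mapq < 25 or c.mean_baseq < 15:
        return False
    if not 0.15 <= c.mean_read_pos <= 0.85:      # skewed position within reads
        return False
    if not 0.10 <= c.forward_fraction <= 0.90:   # strand imbalance
        return False
    if c.mean_mismatches >= 5:
        return False
    # Stricter InDel criteria: noisy loci and reads carrying additional InDels.
    if c.is_indel and (c.pos in NOISY_INDEL_LOCI or c.extra_indel_fraction > 0.5):
        return False
    return True


print(passes_filters(Candidate(1197, False, 60, 35, 0.5, 0.5, 1)))  # True
print(passes_filters(Candidate(955, True, 60, 35, 0.5, 0.5, 1)))    # False: noisy InDel locus
```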
To establish high-confidence mutation sets, we generated a background noise matrix for each locus and alternate allele using all WGSs of normal clones, because background noise rates vary across loci owing to the repetitive nature of mtDNA. Specifically, we measured the VAF of each locus and alternate allele in every normal clone, yielding a VAF distribution per locus and allele, and then overlaid the called variant set onto this background noise matrix.
To determine the background noise criteria, we first calculated the mean and s.d. of the VAFs at each locus and the corresponding one-sided 95% confidence interval; clones with VAFs beyond this interval were considered mutant at the locus. In parallel, the VAFs at each position were sorted in ascending order, and the gaps between adjacent VAFs were examined. We regarded a sharp jump in the gap as the cutoff between true signals and background noise. To this end, we calculated the relative gap between adjacent VAFs as follows:
\[{\rm{relative\ gap}}=\frac{{\rm{VAF}}^{\prime }-{\rm{VAF}}}{{\rm{VAF}}}\]
where VAF and VAF′ denote the lower and higher of two adjacent VAFs at an mtDNA locus. Only VAFs between 0.05% and 15% were used to calculate the relative gap. Among adjacent pairs with a relative gap of 0.33 or higher, we identified the pair with the largest gap and regarded it as the boundary between background noise and true signals. The threshold was set as the smaller of (1) the average of VAF and VAF′ and (2) VAF × 1.33. A called variant that did not exceed this threshold was considered a false positive and excluded from the variant set; conversely, a variant that was not called but exceeded this threshold was considered a false negative and rescued.
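The gap-based cutoff can be sketched as follows. This assumes the relative gap is (VAF′ − VAF)/VAF, which is consistent with the 0.33 cutoff and the VAF × 1.33 threshold, and interprets "largest gap" as the largest absolute difference among qualifying pairs. `noise_threshold` is a hypothetical helper, and the confidence-interval criterion is not included.

```python
import numpy as np


def noise_threshold(vafs, lo=0.0005, hi=0.15, min_rel_gap=0.33, factor=1.33):
    """Return a locus-specific VAF threshold, or None if no qualifying gap exists.

    Assumes relative gap = (VAF' - VAF) / VAF for adjacent sorted VAFs.
    """
    v = np.sort(np.asarray(vafs, dtype=float))
    v = v[(v >= lo) & (v <= hi)]          # only VAFs between 0.05% and 15%
    if v.size < 2:
        return None
    lower, upper = v[:-1], v[1:]          # adjacent pairs (VAF, VAF')
    rel_gap = (upper - lower) / lower
    qualifying = np.flatnonzero(rel_gap >= min_rel_gap)
    if qualifying.size == 0:
        return None
    # Among qualifying pairs, take the one with the largest absolute gap.
    i = qualifying[np.argmax(upper[qualifying] - lower[qualifying])]
    # Threshold: the smaller of the pair midpoint and VAF x 1.33.
    return float(min((lower[i] + upper[i]) / 2, lower[i] * factor))


vafs = [0.001, 0.0011, 0.0012, 0.0013, 0.05, 0.06]
t = noise_threshold(vafs)
print([x for x in vafs if x > t])  # [0.05, 0.06]
```

Clones above the threshold are treated as carrying the variant; a called variant below it would be discarded and an uncalled one above it rescued.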
Classification of mtDNA alterations
mtDNA alterations shared by at least two clones of the same individual were classified as HetFE variants, PZrecurrent mutations or PZsimple mutations. A shared mtDNA alteration identified in only one individual was considered a HetFE variant. When an alteration was detected in two or more clones of one individual but also in a few clones of other individuals, a binomial test was used to decide whether it was a HetFE variant or a PZsimple mutation. For these mutations, we used maximum likelihood estimation to estimate the probability of their occurrence by chance within each clone:
\[{\hat{P}}_{{\rm{ML}}}=\frac{{x}_{{\rm{exp}}}}{n}\]
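where \(n\) denotes the total number of normal clones, excluding those of the individual harboring the shared mutation, and \({x}_{{\rm{exp}}}\) denotes the number of these clones in which the mutation was detected. The resulting \({\hat{P}}_{{\rm{ML}}}\) reflects the estimated rate at which the mutation arises spontaneously. Using a binomial distribution, we then computed the probability that the shared mutation emerged by chance within the individual:
\[P\left(X\ge {x}_{{\rm{obs}}}\right)=\mathop{\sum }\limits_{k={x}_{{\rm{obs}}}}^{{n}_{{\rm{obs}}}}\binom{{n}_{{\rm{obs}}}}{k}{\hat{P}}_{{\rm{ML}}}^{\,k}{\left(1-{\hat{P}}_{{\rm{ML}}}\right)}^{{n}_{{\rm{obs}}}-k}\]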
where \({n}_{{\rm{obs}}}\) and \({x}_{{\rm{obs}}}\) represent the number of clones within the individual and the number of clones harboring the shared mutation in the individual, respectively. If the calculated probability was below 0.01, the mutation was categorized as a HetFE variant; otherwise, it was classified as a PZsimple mutation.
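The binomial classification takes only a few lines. This sketch assumes the probability is the upper tail P(X ≥ x_obs) of a binomial distribution with success probability P̂_ML, as in a one-sided binomial test; the function names are hypothetical.

```python
from math import comb


def p_ml(x_exp, n):
    """Maximum likelihood estimate of the per-clone occurrence probability."""
    return x_exp / n


def upper_tail(x_obs, n_obs, p):
    """P(X >= x_obs) for X ~ Binomial(n_obs, p)."""
    return sum(comb(n_obs, k) * p**k * (1 - p) ** (n_obs - k)
               for k in range(x_obs, n_obs + 1))


def classify_shared(x_obs, n_obs, x_exp, n, alpha=0.01):
    """Label a shared mutation as a HetFE variant or a PZsimple mutation."""
    prob = upper_tail(x_obs, n_obs, p_ml(x_exp, n))
    return "HetFE" if prob < alpha else "PZsimple"


# Seen in 5 of 20 clones of this individual but in only 2 of 2,000 other clones.
print(classify_shared(x_obs=5, n_obs=20, x_exp=2, n=2000))  # HetFE
```

When the mutation is absent from all other individuals, P̂_ML is 0 and the tail probability is 0, so the variant is labeled HetFE, matching the single-individual rule above.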
A mutation was categorized as a PZrecurrent mutation when observed in two or more individuals, each with at least two clones carrying the identical mutation. Mutations found in at least ten individuals within the cohort of 86 individuals (31 individuals in this study and 55 other individuals from in-house WGSs or WGSs of LCM patches72) were also designated as PZrecurrent mutations. Tissue-specific PZrecurrent mutations were additionally required to be significantly more prevalent among older individuals (Wilcoxon rank-sum test) and to occur in multiple individuals within the same tissue.
Mutations found exclusively in late-branched clones were designated as PZsimple mutations. Late-branched clones were defined as clones whose common ancestor had accumulated at least 100 mutations before they diverged. This distinction was particularly important for HSPCs, in which many clones branched after early embryogenesis; without careful assessment, mutations shared by such clones would have been misclassified as HetFE variants.
Fixation index of HetFE variants
The fixation index (FST), one of Wright’s F-statistics, quantifies genetic differentiation79. Using it, we computed the diversity of clone-VAFs for each HetFE variant to assess the impact of lifetime drift:
\[{F}_{{\rm{ST}}}=\frac{{\sigma }_{S}^{2}}{{\sigma }_{T}^{2}}=\frac{{\sigma }_{S}^{2}}{\bar{P}\left(1-\bar{P}\right)}\]
where \({\sigma }_{S}^{2}\) represents the variance in clone-VAFs among clones within an individual, \({\sigma }_{T}^{2}\) represents the variance in clone-VAFs across all cells of the individual and \(\bar{P}\) signifies the average clone-VAF across clones in the individual (caVAF), which approximates the average frequency in the entire cell population under the assumption of Hardy–Weinberg equilibrium. FST was computed only for HetFE variants with \(\bar{P}\) values exceeding 0.01.
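For one HetFE variant, FST can be computed from the clone-VAFs of an individual. This sketch assumes σ_T² = P̄(1 − P̄), the total variance of a binary allele at frequency P̄, and uses the population (ddof = 0) variance across clones for σ_S²; both are assumptions rather than details confirmed by the text.

```python
import numpy as np


def fst(clone_vafs, min_pbar=0.01):
    """F_ST for one HetFE variant from its clone-VAFs in one individual.

    Assumes sigma_T^2 = P_bar * (1 - P_bar) and uses the population variance
    (ddof=0) across clones for sigma_S^2. Returns None when P_bar <= min_pbar.
    """
    v = np.asarray(clone_vafs, dtype=float)
    p_bar = v.mean()                      # caVAF
    if p_bar <= min_pbar:
        return None
    sigma_s2 = v.var()                    # variance among clones
    sigma_t2 = p_bar * (1.0 - p_bar)      # total variance of a binary allele
    return float(sigma_s2 / sigma_t2)


print(fst([0.0, 1.0, 1.0, 0.0]))  # 1.0: every clone fixed or lost
print(fst([0.3] * 5))             # ~0: no differentiation among clones
```

Values near 1 indicate that drift has pushed clones toward homoplasmy or loss, whereas values near 0 indicate clones that retain the fertilized-egg VAF.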
Simulation framework for the mitochondrial turnover
We developed a computational algorithm to simulate lifetime drift in mtDNA heteroplasmy. mtDNA undergoes continuous random degradation and replication independently of cell division. When a cell divides, its mtDNA copies are also randomly segregated into the daughter cells. We define the series of processes by which the mtDNA content of a cell is completely (100%) replaced as one mitochondrial turnover. To reflect these properties of mtDNA and to mimic mitochondrial turnover intuitively, we developed two turnover models: the mitotic turnover model and the homeostatic turnover model.
Both models involve random replication of mtDNA copies. In the mitotic turnover model, mtDNA copies increase one at a time through random replication until the original number is doubled, and the copies are then halved randomly through mitosis; we define this series of processes as one mitotic turnover. In the homeostatic turnover model, one random replication occurs each time one mtDNA copy is degraded, keeping the mtDNA copy number in a cell constant; when this degradation–replication cycle has been repeated as many times as the total number of mtDNA copies, we define it as one homeostatic turnover. Thus, both models involve random replication, but only the mitotic turnover model incorporates cell division, whereas the homeostatic turnover model captures mtDNA replacement alone. The two turnover models are illustrated in Extended Data Fig. 5b–e and discussed in detail in Supplementary Note 5.
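The two turnover models can be sketched for a single neutral variant by tracking only the number of mutant copies in a cell. This is a minimal illustration, not the authors’ released code. In the homeostatic model we assume each degradation is followed by replication of one of the remaining n − 1 copies; in the mitotic model the 2n copies are split by drawing n of them without replacement (a hypergeometric draw) into the retained daughter cell.

```python
import numpy as np


def homeostatic_turnover(mutant, n, rng):
    """One homeostatic turnover: n degradation-replication events at fixed copy number n."""
    for _ in range(n):
        # Degrade one copy chosen uniformly at random.
        if rng.random() < mutant / n:
            mutant -= 1
        # Replicate one of the remaining n - 1 copies chosen uniformly at random.
        if rng.random() < mutant / (n - 1):
            mutant += 1
    return mutant


def mitotic_turnover(mutant, n, rng):
    """One mitotic turnover: replicate copies one by one up to 2n, then split evenly at random."""
    total = n
    while total < 2 * n:
        if rng.random() < mutant / total:  # the replicated copy is mutant
            mutant += 1
        total += 1
    # Random segregation: the retained daughter receives n of the 2n copies.
    return int(rng.hypergeometric(mutant, 2 * n - mutant, n))


rng = np.random.default_rng(0)
cell_h = cell_m = 375                      # n = 750 copies, VAF 0.5
for _ in range(10):
    cell_h = homeostatic_turnover(cell_h, 750, rng)
    cell_m = mitotic_turnover(cell_m, 750, rng)
print(cell_h / 750, cell_m / 750)          # clone-VAFs after ten turnovers
```

Both functions keep wild-type-only and homoplasmic cells unchanged, so clone-VAF 0 and 1 are absorbing states, as expected for neutral drift.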
Estimation of average turnover to fixation and turnover rate
We simulated how HetFE variants change through lifetime drift as turnovers repeat, starting from a single cell in which wild-type and mutant mtDNA co-exist. Using the mitotic and homeostatic turnover models, we aimed to infer the average number of turnovers until fixation at a clone-VAF of 100% (homoplasmy) and the turnover rate.
Although specific parameters, such as the mtDNA copy number (n) and the VAF in the fertilized egg (P; caVAF), differ depending on the inference being made, the underlying structure of the model is unchanged. The model assumes (1) a constant average mtDNA copy number within a cell per turnover, (2) neutrality of mutant mtDNA with respect to selection and (3) simultaneous turnovers in 10,000 cells. The mitotic turnover model adds one more assumption: random segregation of mtDNA into two daughter cells in equal numbers. At the start of the simulation, each cell had a VAF of P, mirroring the condition in the fertilized egg. The mitotic and homeostatic turnover model simulations used the same parameters. Each simulation ran for 10,000 turnovers, and the clone-VAF of each cell was recorded at every turnover.
To infer the average number of turnovers to fixation at homoplasmy, n was set to 750, and P took the values 0.1, 0.2, …, 0.8 and 0.9. For each of the 10,000 cells, we determined the number of turnovers required for the mtDNA variant to reach homoplasmy, and the average number of turnovers to fixation for each value of P was computed across all 10,000 cells.
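Building on the homeostatic model, the average number of turnovers to homoplasmy can be estimated by simulating many cells in parallel. This vectorized sketch assumes that cells in which the variant is lost (clone-VAF of 0) are excluded from the average; `turnovers_to_fixation` is a hypothetical helper, and the example call uses a much smaller copy number and cell count than the study (n = 750, 10,000 cells) so that it runs quickly.

```python
import numpy as np


def turnovers_to_fixation(p, n=750, cells=10_000, max_turnovers=10_000, seed=0):
    """Mean number of homeostatic turnovers until a neutral variant reaches homoplasmy.

    Cells in which the variant is lost are excluded from the average (assumption).
    """
    rng = np.random.default_rng(seed)
    mutant = np.full(cells, int(round(p * n)), dtype=np.int64)
    fixed_at = np.full(cells, -1)
    for t in range(1, max_turnovers + 1):
        for _ in range(n):  # one turnover = n degradation-replication events
            mutant -= (rng.random(cells) < mutant / n).astype(np.int64)
            mutant += (rng.random(cells) < mutant / (n - 1)).astype(np.int64)
        fixed_at[(mutant == n) & (fixed_at < 0)] = t
        if np.all((mutant == 0) | (mutant == n)):
            break  # every cell has reached homoplasmy or lost the variant
    hits = fixed_at[fixed_at > 0]
    return float(hits.mean()) if hits.size else float("nan")


# Small illustrative run; larger P should need fewer turnovers on average.
for p in (0.3, 0.7):
    print(p, turnovers_to_fixation(p, n=50, cells=500, seed=1))
```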
To infer the turnover rate, we selected HetFE variants with a caVAF exceeding 0.005 for simulation. n was set to the observed average mtDNA copy number of the individual, and P was set to the caVAF. Both the mitotic and homeostatic turnover models were simulated 10,000 times for each unique parameter set. In each iteration, the sequencing process was replicated by randomly selecting cells 100 times, matching the number of clones sequenced from the individual. Each simulation yielded 25 summary statistics: the counts of cells within 20 clone-VAF ranges (0.5–5%, 5–10%, …, 90–95% and 95–100%), the mean and s.d. of clone-VAF, and the proportions of cells categorized as wild type (clone-VAF < 5%), heteroplasmic (5% < clone-VAF < 90%) and homoplasmic (clone-VAF > 90%). We then compared the summary statistics from each simulation with the observed summary statistics and estimated the mitotic and homeostatic turnover rates by minimizing the mean squared error (MSE). Turnover rates were estimated separately for each tissue type.
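The 25 summary statistics and the MSE-based selection can be sketched as below. The helper names are hypothetical. We assume the 100 random draws of clones are averaged into one statistics vector, and the raw statistics are compared without rescaling, although counts and proportions lie on different scales and a real implementation might normalize them.

```python
import numpy as np

# Clone-VAF ranges 0.5-5%, 5-10%, ..., 95-100% (20 bins).
BIN_EDGES = np.array([0.005] + [0.05 * k for k in range(1, 21)])


def summary_stats(clone_vafs):
    """25 statistics: 20 bin counts, mean, s.d., and wild-type/heteroplasmic/homoplasmic proportions."""
    v = np.asarray(clone_vafs, dtype=float)
    counts, _ = np.histogram(v, bins=BIN_EDGES)
    wild = np.mean(v < 0.05)
    hetero = np.mean((v > 0.05) & (v < 0.90))
    homo = np.mean(v > 0.90)
    return np.concatenate([counts, [v.mean(), v.std(), wild, hetero, homo]])


def sampled_stats(sim_vafs, n_clones, rng, n_draws=100):
    """Mimic sequencing: draw n_clones simulated cells n_draws times and average the statistics."""
    sim_vafs = np.asarray(sim_vafs, dtype=float)
    draws = [summary_stats(rng.choice(sim_vafs, size=n_clones, replace=False))
             for _ in range(n_draws)]
    return np.mean(draws, axis=0)


def best_rate(observed, simulated_by_rate):
    """Return the candidate turnover rate whose statistics minimize the MSE."""
    mse = {rate: float(np.mean((np.asarray(sim) - observed) ** 2))
           for rate, sim in simulated_by_rate.items()}
    return min(mse, key=mse.get)


rng = np.random.default_rng(0)
observed = summary_stats([0.02, 0.47, 0.97, 0.99])
candidates = {10: sampled_stats(rng.uniform(0, 1, 10_000), 4, rng),
              20: summary_stats([0.02, 0.47, 0.97, 0.99])}
print(best_rate(observed, candidates))  # 20 (identical statistics)
```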
Estimation of mtDNA mutation rate
We used the estimated turnover rates to infer the mtDNA mutation rate, running simulations separately for the mitotic and homeostatic turnover models. The core algorithm parallels the model described above but simulates PZsimple mutations. It has three key parameters: mtDNA copy number (n), total turnover count (g) and mutation rate (r). For each individual, n was set to the individual’s observed average mtDNA copy number, and g was calculated as the product of the estimated tissue-specific turnover rate and the individual’s age, using the turnover rate estimated under the same (mitotic or homeostatic) model. The base-10 logarithm of r was sampled from a uniform distribution over (−9, −3), corresponding to r between 1 × 10−9 and 1 × 10−3 mutations per bp. For each replication event, the number of new mutations was drawn from a Poisson distribution with λ = r × 16,569, and these mutations were introduced into the mtDNA of the cell. After g turnovers, cells were randomly chosen ten times to simulate the sequencing process, matching the number of clones sequenced from the individual.
This simulation was repeated 1,000 times per individual; because each run was sampled ten times, this yielded 10,000 simulation outcomes per individual. For each outcome, 22 summary statistics were generated: the counts of cells whose maximum clone-VAF fell within each of 20 clone-VAF ranges (0.5–5%, 5–10%, …, 90–95% and 95–100%), the count of cells with two homoplasmic mutations (clone-VAF > 90%) and the count of cells with three homoplasmic mutations. The summary statistics from each simulation were then compared with the observed summary statistics, and the mutation rates under the mitotic and homeostatic turnover models were estimated by minimizing the MSE.
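The stochastic ingredients of the mutation-rate simulation are the log-uniform prior on r, Poisson-distributed new mutations per replication event and the 22 summary statistics. The sketch below shows these pieces only, not the full turnover loop; the function names are hypothetical, and "cells with two (or three) homoplasmic mutations" is read as exactly two (or three).

```python
import numpy as np

MTDNA_LENGTH = 16_569  # rCRS length in bp
# Clone-VAF ranges 0.5-5%, 5-10%, ..., 95-100% (20 bins).
BIN_EDGES = np.array([0.005] + [0.05 * k for k in range(1, 21)])


def sample_log_uniform_rate(rng, low_exp=-9.0, high_exp=-3.0):
    """Draw r with log10(r) ~ Uniform(low_exp, high_exp)."""
    return 10.0 ** rng.uniform(low_exp, high_exp)


def new_mutation_counts(rate, n_replications, rng):
    """New mutations per replication event: Poisson with lambda = r x 16,569."""
    return rng.poisson(rate * MTDNA_LENGTH, size=n_replications)


def mutation_summary(cell_vafs):
    """22 statistics from a list of per-cell arrays of mutation clone-VAFs."""
    max_vaf = np.array([np.max(v) if len(v) else 0.0 for v in cell_vafs])
    counts, _ = np.histogram(max_vaf, bins=BIN_EDGES)          # 20 counts
    n_homo = np.array([int(np.sum(np.asarray(v) > 0.90)) for v in cell_vafs])
    extra = [np.sum(n_homo == 2), np.sum(n_homo == 3)]          # 2 counts
    return np.concatenate([counts, extra]).astype(float)


rng = np.random.default_rng(0)
r = sample_log_uniform_rate(rng)
print(r, new_mutation_counts(r, 5, rng))  # one prior draw and five Poisson counts
```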
Modeling the mtDNA dynamics
We simulated the mitotic and homeostatic turnover models to explore how mtDNA lineages and the heteroplasmy levels of PZsimple mutations change through lifetime drift. We first set the mtDNA copy number (n) of a single cell and gave each of the n mtDNA molecules a unique label so that it could be tracked individually. We then simulated turnovers, recording the frequency of each labeled mtDNA at every turnover, and identified the frequency of the most abundant mtDNA in the cell. This process was performed simultaneously in 10,000 cells, and the distribution of the most abundant mtDNA frequency across cells was plotted for each turnover.
Next, we introduced PZsimple mutations into this process using the inferred mtDNA mutation rate. We tracked how the clone-VAF of each PZsimple mutation changed with each turnover and determined the highest PZsimple clone-VAF in each cell. This process was also performed simultaneously in 10,000 cells, and the distribution of the highest PZsimple clone-VAF across cells was plotted for each turnover.
Each simulation was conducted with five different mtDNA copy numbers (500, 750, 1,000, 1,500 and 2,000) in both the mitotic and homeostatic turnover models.
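Tracking labeled mtDNA molecules can be sketched with the homeostatic model by storing one label per copy and, at each degradation event, overwriting the degraded copy with a copy of another randomly chosen molecule. `simulate_lineages` is a hypothetical helper with small default sizes for illustration; it returns, for each turnover, the frequency of the most abundant labeled lineage in every cell.

```python
import numpy as np


def simulate_lineages(n=100, cells=200, turnovers=50, seed=0):
    """Homeostatic turnover with labeled copies; returns dominant-lineage frequencies.

    Output has shape (turnovers, cells).
    """
    rng = np.random.default_rng(seed)
    labels = np.tile(np.arange(n), (cells, 1))   # each copy starts with its own label
    rows = np.arange(cells)
    dominant = []
    for _ in range(turnovers):
        for _ in range(n):                        # one turnover = n events
            die = rng.integers(n, size=cells)     # copy degraded in each cell
            src = rng.integers(n - 1, size=cells)
            src += src >= die                     # template is any copy except the degraded one
            labels[rows, die] = labels[rows, src]
        dominant.append(np.array([np.bincount(row, minlength=n).max() / n
                                  for row in labels]))
    return np.array(dominant)


d = simulate_lineages()
print(d[0].mean(), d[-1].mean())  # dominant-lineage frequency after the first and last turnover
```

The same structure extends to PZsimple mutations by giving a newly replicated copy a fresh label whenever a mutation is drawn for it.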
Selective pressure on mtDNA
mtDNA’s evolutionary history in germline cells exhibits a notable bias toward missense mutations31. We aimed to compute the dN/dS ratio against each individual’s own mtDNA sequence and to assess the probability that the observed mutations arose randomly80,81.
To this end, we simulated the neutral null hypothesis by introducing random mutations into individual-specific mtDNA sequences, which account for interindividual variability in germline variants. We then enumerated the possible synonymous, missense and truncating mutations in each mtDNA sequence. In each simulation, we generated random mutations matching the number observed in the individual and annotated their functional consequences to calculate a simulated dN/dS ratio60,61,62. This simulation was iterated 10,000 times for every individual, yielding a null distribution of dN/dS values for each individual under neutrality, and the outcomes were aggregated across individuals of the same tissue type. Finally, we compared the observed dN/dS ratios for missense and truncating mutations with the simulated null distribution for each tissue type and evaluated the probability of observing these ratios by chance.
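A simplified version of the neutral null for dN/dS is sketched below. It assumes a precomputed list with one consequence label per possible single-base substitution in an individual’s protein-coding mtDNA (a hypothetical input), draws substitutions uniformly from that list without modeling the mutational spectrum, and computes dN/dS as the rate of a nonsynonymous category per possible site divided by the synonymous rate per possible site.

```python
import numpy as np


def dnds_ratio(n_cat, n_syn, sites_cat, sites_syn):
    """(observed/possible) for a category divided by (observed/possible) for synonymous."""
    if n_syn == 0 or sites_cat == 0:
        return float("nan")
    return (n_cat / sites_cat) / (n_syn / sites_syn)


def null_dnds(possible, k, category, n_iter=10_000, seed=0):
    """Null dN/dS distribution from k random substitutions per iteration."""
    rng = np.random.default_rng(seed)
    possible = np.asarray(possible)
    sites_cat = int(np.sum(possible == category))
    sites_syn = int(np.sum(possible == "synonymous"))
    null = np.empty(n_iter)
    for i in range(n_iter):
        draw = possible[rng.choice(possible.size, size=k, replace=False)]
        null[i] = dnds_ratio(int(np.sum(draw == category)),
                             int(np.sum(draw == "synonymous")), sites_cat, sites_syn)
    return null


def empirical_p(observed, null, tail="lower"):
    """One-sided empirical probability of a ratio at least as extreme as observed."""
    null = null[~np.isnan(null)]
    extreme = null <= observed if tail == "lower" else null >= observed
    return (1 + np.sum(extreme)) / (1 + null.size)


possible = ["synonymous"] * 300 + ["missense"] * 600 + ["truncating"] * 100
null = null_dnds(possible, k=100, category="truncating", n_iter=1000)
print(empirical_p(0.2, null))  # depletion of truncating mutations vs. the null
```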
Inference of mtDNA copy number
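We estimated the mtDNA copy number per clone using the following formula:
\[{\rm{mtDNA\ copy\ number}}=\frac{{{\rm{coverage}}}_{{\rm{mtDNA}}}}{{{\rm{coverage}}}_{{\rm{nDNA}}}}\times {\rm{ploidy}}\]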
where \({{\rm{coverage}}}_{{\rm{mtDNA}}}\) and \({{\rm{coverage}}}_{{\rm{nDNA}}}\) denote the mean coverage depths of mtDNA and nDNA, respectively. The ploidy was fixed at two for both normal clones and cancer tissues so that mtDNA copy numbers would not depend on tumor cell fraction or ploidy estimates. The mean mtDNA and nDNA coverage depths were computed using mosdepth82 and an in-house script.
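The copy-number formula can be applied directly to a mosdepth summary table (per-contig rows with `chrom`, `length`, `bases` and `mean` columns). This sketch assumes the mitochondrial contig is named `MT` or `chrM` and takes nDNA coverage as the length-weighted mean depth over autosomes; the in-house script used in the study may define nDNA coverage differently, and the file name in the usage line is hypothetical.

```python
import csv
import io

MT_NAMES = {"MT", "chrM"}
AUTOSOMES = {str(i) for i in range(1, 23)} | {f"chr{i}" for i in range(1, 23)}


def mtdna_copy_number(summary_text, ploidy=2):
    """mtDNA copy number = coverage_mtDNA / coverage_nDNA x ploidy."""
    mt_cov = None
    auto_bases = auto_len = 0
    for row in csv.DictReader(io.StringIO(summary_text), delimiter="\t"):
        chrom = row["chrom"]
        if chrom in MT_NAMES:
            mt_cov = float(row["mean"])
        elif chrom in AUTOSOMES:
            # Length-weighted mean: total summed depth over total length.
            auto_bases += int(row["bases"])
            auto_len += int(row["length"])
    if mt_cov is None or auto_len == 0:
        raise ValueError("summary lacks mitochondrial or autosomal rows")
    return mt_cov / (auto_bases / auto_len) * ploidy


# Usage (hypothetical prefix "clone01"):
# with open("clone01.mosdepth.summary.txt") as fh:
#     print(mtdna_copy_number(fh.read()))
```

For example, a clone with 3,000× mtDNA coverage and 30× autosomal coverage gives 3,000 / 30 × 2 = 200 copies per cell.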
Statistics and reproducibility
No statistical method was used to predetermine the sample size. No data were excluded from the analyses. The experiments were not randomized. The investigators were not blinded to allocation during experiments and outcome assessment.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
Whole-genome sequencing data used in the study are publicly available5,6,7,8 at the European Genome-Phenome Archive (EGA) under accessions EGAD00001007032, EGAD00001010183, EGAD00001004086 and EGAD00001007851. Mitochondrial-genome reads extracted from the whole-genome sequencing data of normal colorectal epithelium and fibroblast clones are deposited in the EGA under accession EGAS50000000254 and are available for general research use. The base substitutions and InDels identified in the mtDNA are available in Supplementary Table 4. The human reference genome, GRCh37, is available at https://www.ncbi.nlm.nih.gov/data-hub/genome/GCF_000001405.13. Source data are provided with this paper.
Code availability
The open-source software and tools used in this study are detailed in Methods. Custom scripts for simulations and figures, written in Python (v3.7.0) and R (v4.1.3), are available on GitHub at https://github.com/jisong-an/mtDNA_mosaicism.
References
Stratton, M. R., Campbell, P. J. & Futreal, P. A. The cancer genome. Nature 458, 719–724 (2009).
Ju, Y. S. et al. Somatic mutations reveal asymmetric cellular dynamics in the early human embryo. Nature 543, 714–718 (2017).
Coorens, T. H. H. et al. Extensive phylogenies of human development inferred from somatic mutations. Nature 597, 387–392 (2021).
Moore, L. et al. The mutational landscape of human somatic and germline cells. Nature 597, 381–386 (2021).
Park, S. et al. Clonal dynamics in early human embryogenesis inferred from somatic mutation. Nature 597, 393–397 (2021).
Nam, C. H. et al. Widespread somatic L1 retrotransposition in normal colorectal epithelium. Nature 617, 540–547 (2023).
Lee-Six, H. et al. Population dynamics of normal human blood inferred from somatic mutations. Nature 561, 473–478 (2018).
Mitchell, E. et al. Clonal dynamics of haematopoiesis across the human lifespan. Nature 606, 343–350 (2022).
Martincorena, I. et al. Tumor evolution. High burden and pervasive positive selection of somatic mutations in normal human skin. Science 348, 880–886 (2015).
Martincorena, I. et al. Somatic mutant clones colonize the human esophagus with age. Science 362, 911–917 (2018).
Abby, E. et al. Notch1 mutations drive clonal expansion in normal esophageal epithelium but impair tumor growth. Nat. Genet. 55, 232–245 (2023).
Stewart, J. B. & Chinnery, P. F. The dynamics of mitochondrial DNA heteroplasmy: implications for human health and disease. Nat. Rev. Genet. 16, 530–542 (2015).
Chandel, N. S. Evolution of mitochondria as signaling organelles. Cell Metab. 22, 204–206 (2015).
Nunnari, J. & Suomalainen, A. Mitochondria: in sickness and in health. Cell 148, 1145–1159 (2012).
Picard, M. & Shirihai, O. S. Mitochondrial signal transduction. Cell Metab. 34, 1620–1653 (2022).
Hengartner, M. O. The biochemistry of apoptosis. Nature 407, 770–776 (2000).
Anderson, S. et al. Sequence and organization of the human mitochondrial genome. Nature 290, 457–465 (1981).
Elson, J. L., Samuels, D. C., Turnbull, D. M. & Chinnery, P. F. Random intracellular drift explains the clonal expansion of mitochondrial DNA mutations with age. Am. J. Hum. Genet. 68, 802–806 (2001).
Cortopassi, G. A. & Arnheim, N. Detection of a specific mitochondrial DNA deletion in tissues of older humans. Nucleic Acids Res. 18, 6927–6933 (1990).
Corral-Debrinski, M. et al. Mitochondrial DNA deletions in human brain: regional variability and increase with advanced age. Nat. Genet. 2, 324–329 (1992).
Sreedhar, A., Aguilera-Aguirre, L. & Singh, K. K. Mitochondria in skin health, aging, and disease. Cell Death Dis. 11, 444 (2020).
Lawless, C., Greaves, L., Reeve, A. K., Turnbull, D. M. & Vincent, A. E. The rise and rise of mitochondrial DNA mutations. Open Biol. 10, 200061 (2020).
Soong, N. W., Hinton, D. R., Cortopassi, G. & Arnheim, N. Mosaicism for a specific somatic mitochondrial DNA mutation in adult human brain. Nat. Genet. 2, 318–323 (1992).
Li, M., Schröder, R., Ni, S., Madea, B. & Stoneking, M. Extensive tissue-related and allele-related mtDNA heteroplasmy suggests positive selection for somatic mutations. Proc. Natl Acad. Sci. USA 112, 2491–2496 (2015).
Forsberg, L. A., Gisselsson, D. & Dumanski, J. P. Mosaicism in health and disease—clones picking up speed. Nat. Rev. Genet. 18, 128–142 (2017).
Youk, J., Kwon, H. W., Kim, R. & Ju, Y. S. Dissecting single-cell genomes through the clonal organoid technique. Exp. Mol. Med. 53, 1503–1511 (2021).
Lareau, C. A. et al. Massively parallel single-cell mitochondrial DNA genotyping and chromatin profiling. Nat. Biotechnol. 39, 451–461 (2021).
Ludwig, L. S. et al. Lineage tracing in humans enabled by mitochondrial mutations and single-cell genomics. Cell 176, 1325–1339 (2019).
Lareau, C. A. et al. Mitochondrial single-cell ATAC–seq for high-throughput multi-omic detection of mitochondrial genotypes and chromatin accessibility. Nat. Protoc. 18, 1416–1440 (2023).
Polyak, K. et al. Somatic mutations of the mitochondrial genome in human colorectal tumours. Nat. Genet. 20, 291–293 (1998).
Ju, Y. S. et al. Origins and functional consequences of somatic mitochondrial DNA mutations in human cancer. eLife 3, e02935 (2014).
Yuan, Y. et al. Comprehensive molecular characterization of mitochondrial genomes in human cancers. Nat. Genet. 52, 342–352 (2020).
Gorelick, A. N. et al. Respiratory complex and tissue lineage drive recurrent mutations in tumour mtDNA. Nat. Metab. 3, 558–570 (2021).
Wallace, D. C. Mitochondria and cancer. Nat. Rev. Cancer 12, 685–698 (2012).
Petros, J. A. et al. mtDNA mutations increase tumorigenicity in prostate cancer. Proc. Natl Acad. Sci. USA 102, 719–724 (2005).
Holt, I. J., Lorimer, H. E. & Jacobs, H. T. Coupled leading- and lagging-strand synthesis of mammalian mitochondrial DNA. Cell 100, 515–524 (2000).
Sanchez-Contreras, M. et al. A replication-linked mutational gradient drives somatic mutation accumulation and influences germline polymorphisms and genome composition in mitochondrial DNA. Nucleic Acids Res. 49, 11103–11118 (2021).
Nik-Zainal, S. et al. The life history of 21 breast cancers. Cell 149, 994–1007 (2012).
Wei, W. et al. Germline selection shapes human mitochondrial DNA diversity. Science 364, eaau6520 (2019).
Floros, V. I. et al. Author correction: segregation of mitochondrial DNA heteroplasmy through a developmental genetic bottleneck in human embryos. Nat. Cell Biol. 25, 194 (2023).
Kim, I. B. et al. Non-coding de novo mutations in chromatin interactions are implicated in autism spectrum disorder. Mol. Psychiatry 27, 4680–4694 (2022).
Fan, W. et al. A mouse model of mitochondrial disease reveals germline selection against severe mtDNA mutations. Science 319, 958–962 (2008).
Stewart, J. B. et al. Strong purifying selection in transmission of mammalian mitochondrial DNA. PLoS Biol. 6, e10 (2008).
Cree, L. M. et al. A reduction of mitochondrial DNA molecules during embryogenesis explains the rapid segregation of genotypes. Nat. Genet. 40, 249–254 (2008).
Lee, H.-S. et al. Rapid mitochondrial DNA segregation in primate preimplantation embryos precedes somatic and germline bottleneck. Cell Rep. 1, 506–515 (2012).
Coller, H. A. et al. High frequency of homoplasmic mitochondrial DNA mutations in human tumors can be explained without selection. Nat. Genet. 28, 147–150 (2001).
Wonnapinij, P., Chinnery, P. F. & Samuels, D. C. The distribution of mitochondrial DNA heteroplasmy due to random genetic drift. Am. J. Hum. Genet. 83, 582–593 (2008).
Bachvarova, R. et al. Amounts and modulation of actin mRNAs in mouse oocytes and embryos. Development 106, 561–565 (1989).
St John, J. C., Facucho-Oliveira, J., Jiang, Y., Kelly, R. & Salah, R. Mitochondrial DNA transmission, replication and inheritance: a journey from the gamete through the embryo and into offspring and embryonic stem cells. Hum. Reprod. Update 16, 488–509 (2010).
Bogenhagen, D. & Clayton, D. A. Mouse L cell mitochondrial DNA molecules are selected randomly for replication throughout the cell cycle. Cell 11, 719–727 (1977).
Antes, A. et al. Differential regulation of full-length genome and a single-stranded 7S DNA along the cell cycle in human mitochondria. Nucleic Acids Res. 38, 6466–6476 (2010).
Stoneking, M. Hypervariable sites in the mtDNA control region are mutational hotspots. Am. J. Hum. Genet. 67, 1029–1032 (2000).
Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94–101 (2020).
Birch-Machin, M. A. & Swalwell, H. How mitochondria record the effects of UV exposure and oxidative stress using human skin as a model tissue. Mutagenesis 25, 101–107 (2010).
Birket, M. J. & Birch-Machin, M. A. Ultraviolet radiation exposure accelerates the accumulation of the aging-dependent T414G mitochondrial DNA mutation in human skin. Aging Cell 6, 557–564 (2007).
Lee, H. R. & Johnson, K. A. Fidelity of the human mitochondrial DNA polymerase. J. Biol. Chem. 281, 36236–36240 (2006).
Longley, M. J., Nguyen, D., Kunkel, T. A. & Copeland, W. C. The fidelity of human DNA polymerase γ with and without exonucleolytic proofreading and the p55 accessory subunit. J. Biol. Chem. 276, 38555–38562 (2001).
Kennedy, S. R., Salk, J. J., Schmitt, M. W. & Loeb, L. A. Ultra-sensitive sequencing reveals an age-related increase in somatic mitochondrial mutations that are inconsistent with oxidative damage. PLoS Genet. 9, e1003794 (2013).
Zheng, W., Khrapko, K., Coller, H. A., Thilly, W. G. & Copeland, W. C. Origins of human mitochondrial point mutations as DNA polymerase γ-mediated errors. Mutat. Res. 599, 11–20 (2006).
Greenman, C., Wooster, R., Futreal, P. A., Stratton, M. R. & Easton, D. F. Statistical analysis of pathogenicity of somatic mutations in cancer. Genetics 173, 2187–2198 (2006).
Yang, Z., Ro, S. & Rannala, B. Likelihood models of somatic mutation and codon substitution in cancer genes. Genetics 165, 695–705 (2003).
Martincorena, I. et al. Universal patterns of selection in cancer and somatic tissues. Cell 173, 1823 (2018).
Rossi, A. et al. Genetic compensation induced by deleterious mutations but not gene knockdowns. Nature 524, 230–233 (2015).
El-Brolosy, M. A. et al. Genetic compensation triggered by mutant mRNA degradation. Nature 568, 193–197 (2019).
Stewart, J. B. et al. Simultaneous DNA and RNA mapping of somatic mitochondrial mutations across diverse human cancers. PLoS Genet. 11, e1005333 (2015).
Mercer, T. R. et al. The human mitochondrial transcriptome. Cell 146, 645–658 (2011).
Bi, C. et al. Quantitative haplotype-resolved analysis of mitochondrial DNA heteroplasmy in human single oocytes, blastoids, and pluripotent stem cells. Nucleic Acids Res. 51, 3793–3805 (2023).
Grasso, D., Zampieri, L. X., Capelôa, T., Van de Velde, J. A. & Sonveaux, P. Mitochondria in cancer. Cell Stress 4, 114–146 (2020).
ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes. Nature 578, 82–93 (2020).
Favero, F. et al. Sequenza: allele-specific copy number and mutation profiles from tumor sequencing data. Ann. Oncol. 26, 64–70 (2015).
López-Otín, C., Blasco, M. A., Partridge, L., Serrano, M. & Kroemer, G. The hallmarks of aging. Cell 153, 1194–1217 (2013).
Lee-Six, H. et al. The landscape of somatic mutation in normal colorectal epithelial cells. Nature 574, 532–537 (2019).
Kim, R. et al. Clinical application of whole-genome sequencing for precision oncology of solid tumors. Preprint at medRxiv https://doi.org/10.1101/2024.02.08.24302488 (2024).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Van der Auwera, G. A. et al. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr. Protoc. Bioinformatics 43, 11.10.1–11.10.33 (2013).
Koboldt, D. C. et al. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 22, 568–576 (2012).
Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29, 24–26 (2011).
Holsinger, K. E. & Weir, B. S. Genetics in geographically structured populations: defining, estimating and interpreting FST. Nat. Rev. Genet. 10, 639–650 (2009).
Grandhi, S. et al. Heteroplasmic shifts in tumor mitochondrial genomes reveal tissue-specific signals of relaxed and positive selection. Hum. Mol. Genet. 26, 2912–2922 (2017).
Triska, P. et al. Landscape of germline and somatic mitochondrial DNA mutations in pediatric malignancies. Cancer Res. 79, 1318–1330 (2019).
Pedersen, B. S. & Quinlan, A. R. Mosdepth: quick coverage calculation for genomes and exomes. Bioinformatics 34, 867–868 (2018).
Acknowledgements
This work was supported by the Suh Kyungbae Foundation (SUHF-18010082 to Y.S.J.) and the National Research Foundation of Korea funded by the Korean Government (NRF-2022R1A5A102641311 and Leading Researcher Program NRF-2020R1A3B2078973 (both to Y.S.J.)).
Author information
Contributions
J.A. and Y.S.J. conceived the study. H.W.K., J.W.O., Y.L. and H.W. developed the entire protocol of clonal expansion of a single cell and conducted experiments. J.-H.K., J.K.J., E.-C.S., B.K., Y.J.C., J.Y.P. and M.J.K. collected samples and clinical histories from patients. J.A. conducted most genome and statistical analysis, with contributions from C.H.N., R.K., S.P., W.H.L., H.P., C.J.Y., Y.A., J.M.B. and Y.S.J. J.A. and Y.S.J. wrote the manuscript with contributions from all authors. Y.S.J. supervised the overall study.
Ethics declarations
Competing interests
Y.S.J. is a cofounder and chief executive officer of Inocras. The remaining authors declare no competing interests.
Peer review
Peer review information
Nature Genetics thanks Kamila Naxerova, Ed Reznik and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Coverage of mtDNA and background noise matrix.
a, A scatter plot showing each clone’s mean sequencing coverage and peak VAF of somatic nDNA mutations. b, Histogram depicting the average mtDNA coverage in 2,096 normal clones. c, Median read depth per mtDNA locus across 2,096 normal clones. d, A bar graph showing the cumulative background noise cutoff for each mtDNA locus. e–g, Examples of background noise matrices. Clone-VAFs for each mutation in normal clones are sorted in descending order. The yellow bar denotes the clone-VAF of the called mutation in a clone, and the gray bars indicate clone-VAFs of the same mutation in other clones. e, An example of a true-positive call (m.1,197 G>A). f, An example of a rescued variant (m.16,384 G>A). g, An example of a false-positive call (m.11,009 T>C). ALT, alternate allele.
Extended Data Fig. 2 Phylogenetic trees annotated with VAF in bulk blood.
a, Phylogenetic trees of 26 donors colored by the VAFs of lineage-defining EEMs in bulk blood tissues. The average VAFs of EEMs constituting the first two branches diverging from the MRCA (referred to as lineage 1 and lineage 2) are shown for each phylogeny. The phylogenetic trees display up to 30 EEMs, and each donor’s name is shown at the top right of each panel. b, Cumulative bar graphs showing the average VAFs of lineage 1 and lineage 2 EEMs in bulk blood.
Extended Data Fig. 3 Culture-associated events in mtDNA.
a, Experimental design for assessing culture-associated events5,6. A total of 47 pairs of mother–daughter clones were obtained from ten mother clones. b, Linear correlation between clone-VAFs of mother clones and median clone-VAFs of daughter clones. The gray line represents the diagonal line y = x. Pearson’s correlation coefficient and P value are provided. Two-sided Pearson’s correlation. c, Comparison of clone-VAFs between mother and daughter clones for 49 variants. The mother clone and variants are indicated in each panel. Boxplots illustrate median values with IQRs and whiskers (1.5× IQRs). d–g, Comparison between 431 clones from single-crypt-derived organoids (20 individuals) and 432 patches from colon crypts obtained via LCM (42 individuals)72, including the linear correlation between age and the average SVAF of clones from each individual (d), mutational spectrum (e), the proportion of mutations classified by functional consequence (f) and clone-VAF distribution (g). d, The gray line and shaded area represent the regression line and its 95% confidence interval. Vertical lines indicate the range of SVAF per clone in an individual.
Extended Data Fig. 4 Features of HetFE variants and caVAF.
a, A diagram showing the classification of shared mtDNA alterations, with respective variant counts in parentheses. b, Regional preference of HetFE variants relative to PZsimple mutations. The y axis indicates the log2-transformed ratio of the prevalence of HetFE variants to that of PZsimple mutations. c, A schematic diagram illustrating caVAF calculation using clone-VAFs within an individual. The gray box contains clone-VAFs of the HetFE variant. d, The relationship between initial VAF and caVAF from computational simulations. Simulated caVAFs were calculated using 100 simulated cells, with 1,000 iterations per combination of mitotic turnover count and initial VAF. Circles and vertical lines denote average caVAF values and 95% confidence intervals. e,f, Scatter plots comparing VAFs from bulk tissues between monozygotic twins (e) and within one twin (f). The gray lines represent the diagonal line y = x. Pearson’s correlation coefficient and P value are presented. Two-sided Pearson’s correlation. g, A scatter plot depicting the VAF of mtDNA variants in 407 mother–offspring pairs. The gray lines represent the diagonal line y = x.
Extended Data Fig. 5 Clone-VAF distribution of HetFE variants and the overview of the turnover model.
a, Heatmaps representing clone-VAFs of HetFE variants in each clone for three donors. The columns correspond to individual clones, ordered according to their phylogenetic relationships. b,c, Schematic diagrams illustrating the concept of mitotic turnover. Examples of changes in mtDNA frequency during one turnover (b) and changes in mtDNA copy number (c) are depicted. d,e, Schematic diagrams illustrating the concept of homeostatic turnover. Examples of changes in mtDNA frequency during one turnover (d) and changes in mtDNA copy number (e) are depicted.
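The two turnover modes in panels b–e can be made concrete with a small stochastic sketch for a single variant. This is not the authors' simulation code, and the models below are our own assumptions: a mitotic turnover is taken to replicate every mtDNA copy once and then hand a random half of the 2N copies to the daughter cell that is followed, whereas a homeostatic turnover is taken to replace N copies one at a time by degrading a random copy and replicating a random survivor (a Moran-type process), so copy number never changes. The function names `mitotic_turnover` and `homeostatic_turnover` are hypothetical.

```python
import random


def mitotic_turnover(n_mut, n_total, rng):
    """Hypothetical model of one mitotic turnover for a single variant.

    All n_total mtDNA copies replicate once (2 * n_total copies, 2 * n_mut of
    them mutant); the followed daughter cell then receives n_total copies drawn
    at random without replacement. Returns the daughter's mutant copy count.
    """
    pool = [1] * (2 * n_mut) + [0] * (2 * (n_total - n_mut))
    return sum(rng.sample(pool, n_total))


def homeostatic_turnover(n_mut, n_total, rng):
    """Hypothetical model of one homeostatic turnover (no cell division).

    n_total replacement events: a random copy is degraded, then a random
    surviving copy is replicated, so copy number stays at n_total.
    Returns the mutant copy count after the turnover.
    """
    k = n_mut
    for _ in range(n_total):
        if rng.random() < k / n_total:        # the degraded copy was mutant
            k -= 1
        if rng.random() < k / (n_total - 1):  # the replicated survivor is mutant
            k += 1
    return k


if __name__ == "__main__":
    rng = random.Random(42)
    n = 1000
    for name, step in [("mitotic", mitotic_turnover), ("homeostatic", homeostatic_turnover)]:
        vafs = [step(500, n, rng) / n for _ in range(200)]
        # Both modes leave the expected VAF unchanged but add random drift.
        print(name, "mean VAF", round(sum(vafs) / len(vafs), 3),
              "min", min(vafs), "max", max(vafs))
```

In both sketches the expected mutant fraction is preserved from one turnover to the next, so heteroplasmy changes only by random drift; the two modes differ in how that drift is generated, not in its direction.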
Extended Data Fig. 6 Mutational profiles of post-zygotic mutations.
a, Categories of PZrecurrent mutations (n = 32) based on homopolymer type and region. b, Boxplots illustrating the median SBS7 mutation counts in 301 clones without m.414 T>G and 33 clones harboring m.414 T>G. Boxplots illustrate median values with IQRs and whiskers (1.5× IQRs). Two-sided Wilcoxon test. c, Boxplots illustrating PZsimple mtDNA mutation counts in clones of DB8 (n = 47; 93 years old) and HC10 (n = 20; 37 years old). Boxplots illustrate median values with IQRs and whiskers (1.5× IQRs). Two-sided Wilcoxon test; NS, not significant. d, Linear correlation between SVAF and age across 31 individuals. The gray line and the shaded area represent the regression line and its 95% confidence interval. Vertical lines crossing each dot indicate the range of SVAF per clone in an individual. Clones with high UV exposure were excluded to remove the effect of UV radiation. Pearson’s correlation coefficient and P value are provided. Two-sided Pearson’s correlation. e, PZsimple mutation spectra of clones with high and low UV exposure. UV exposure levels are determined by SBS7 mutation counts in the nuclear genome, with high exposure defined as over 10,000 mutations and low exposure as under 1,500 mutations. H, H strand; L, L strand.
Extended Data Fig. 7 Characteristics of mtDNA copy number.
a, A volcano plot showing differentially expressed genes between clones with high and low mtDNA copy number (mtCN) in HC17. The x axis represents log2-transformed fold changes, and the y axis represents adjusted P values (−log10-transformed). Gray dots indicate genes without significant differences, and green dots indicate genes significantly downregulated in high-mtCN clones. Two-sided Wald test with Benjamini–Hochberg correction. b, A dot plot of mtCNs in 14 clones of HC17, with high and low mtCNs separated by a gray dashed line at 1,200. c, A screenshot of Integrative Genomics Viewer showing the mtDNA SV in HC21-16. d, log2-transformed fold changes comparing normalized read counts of HC21-16, which carries the SV, with those of other HC21 clones per mtDNA gene. Two-sided Wald test, *P < 0.05, **P < 0.01, ***P < 0.001. Exact P values are 0.0468, 3.5 × 10⁻⁴, 1.4 × 10⁻⁵, 8.5 × 10⁻⁶, 2.3 × 10⁻³ and 1.9 × 10⁻¹¹ for MT-CO2, MT-ATP8, MT-ATP6, MT-CO3, MT-ND3 and MT-ND4L, respectively. e, Phylogenetic tree of MUTYH-associated adenomatous clones with maximum clone-VAFs. f, Boxplots comparing mtCNs in 3 bulk T cells, 431 normal colonic clones and 14 colorectal cancer cells. Boxplots illustrate median values with IQRs and whiskers (1.5× IQRs). Two-sided Wilcoxon test, **P < 0.01, ***P < 0.001. Exact P values are 0.0029 for bulk T cells and 5.7 × 10⁻¹² for colorectal cancer cells. g, The relationship between mtCN per diploid nuclear genome in cancer tissue and the number of CD3+ T cells. TCF, tumor cell fraction.
Extended Data Fig. 8 Model for mtDNA population change.
a–d, Contour plots representing how an mtDNA population changes with continuous mitotic turnover from simulation studies assuming four different baseline mtDNA copy numbers, including 500 (a), 1,000 (b), 1,500 (c) and 2,000 (d). The x axis shows the mitotic turnover count and the y axis shows the frequency of the most prevalent mtDNA, regardless of mutations.
Extended Data Fig. 9 Model for mtDNA heteroplasmy change.
a–d, Contour plots representing how the clone-VAF of PZsimple mtDNA mutations changes with continuous mitotic turnover in simulation studies assuming four different baseline mtDNA copy numbers, 500 (a), 1,000 (b), 1,500 (c) and 2,000 (d), and an absolute mutation rate of 5.0 × 10⁻⁸ per bp per replication. The x axis shows the mitotic turnover count and the y axis shows the highest heteroplasmy level among postzygotic mutations.
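A minimal sketch of this kind of simulation is shown below, built on our own assumptions rather than the authors' model. Copy number is held at N; each mitotic turnover replicates every copy once and keeps a random N of the 2N copies; and each newly synthesized copy acquires one new, unique mutation with probability L·µ, where L = 16,569 bp (the length of the human mtDNA reference) and µ = 5.0 × 10⁻⁸ per bp per replication, as in the legend. Because L·µ ≈ 8.3 × 10⁻⁴ is much smaller than 1, a Bernoulli draw stands in for a Poisson count of new mutations. The function name `simulate_top_heteroplasmy` is hypothetical.

```python
import random
from collections import Counter

GENOME_BP = 16_569   # human mtDNA reference length (bp)
MU_PER_BP = 5.0e-8   # mutation rate per bp per replication (value from the legend)


def simulate_top_heteroplasmy(n_copies, n_turnovers, mu=MU_PER_BP, seed=0):
    """Hypothetical sketch: highest heteroplasmy of any de novo mutation after
    each mitotic turnover, starting from a mutation-free population."""
    rng = random.Random(seed)
    p_new = GENOME_BP * mu        # chance a new copy carries a new mutation (<< 1)
    copies = [()] * n_copies      # each copy is a tuple of mutation IDs
    next_id = 0
    trajectory = []
    for _ in range(n_turnovers):
        replicated = []
        for c in copies:
            replicated.append(c)              # template copy keeps its genotype
            if rng.random() < p_new:          # Bernoulli approximation of Poisson
                replicated.append(c + (next_id,))  # error lands in the new copy only
                next_id += 1
            else:
                replicated.append(c)
        # Random segregation: the followed daughter keeps n_copies of 2 * n_copies.
        copies = rng.sample(replicated, n_copies)
        counts = Counter(m for c in copies for m in c)
        trajectory.append(max(counts.values(), default=0) / n_copies)
    return trajectory


if __name__ == "__main__":
    for n in (500, 1000, 1500, 2000):
        traj = simulate_top_heteroplasmy(n, 200, seed=1)
        print(f"N={n}: top heteroplasmy after 200 turnovers = {traj[-1]:.3f}")
```

Repeating such runs over many seeds and tabulating the distribution of the top heteroplasmy at each turnover count would give contour plots of the kind described for panels a–d; smaller copy numbers are expected to drift faster.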
Supplementary information
Supplementary Information
Supplementary Notes 1–6 and Supplementary Figs. 1–4.
Supplementary Tables
Supplementary Tables 1–9.
Supplementary Data
Supporting data for Supplementary Figs. 2 and 3.
Source data
Source Data Extended Data Fig. 8
Source data for Extended Data Fig. 8.
Source Data Extended Data Fig. 9
Source data for Extended Data Fig. 9.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
An, J., Nam, C.H., Kim, R. et al. Mitochondrial DNA mosaicism in normal human somatic cells. Nat Genet 56, 1665–1677 (2024). https://doi.org/10.1038/s41588-024-01838-z
DOI: https://doi.org/10.1038/s41588-024-01838-z