Main

Genomic alterations accumulate in somatic cells throughout an individual’s lifetime1,2,3,4,5. Recent sequencing studies have documented mutations in the nuclear genome and frequent clonal competition of normal cells carrying mutations6,7,8,9,10,11. However, the landscape of mitochondrial DNA (mtDNA) mosaicism in normal human tissues remains unexplored.

Mitochondria are organelles involved in energy metabolism, cell signaling, apoptosis and biosynthesis12,13,14,15,16, carrying their own 16.6 kb-long, circular DNA17. mtDNA mutations can be acquired somatically during development and aging18,19,20,21,22, shaping the genetic mosaicism in somatic tissues23,24,25. Generally, revealing somatic mosaicism is challenging, as most acquired alterations are confined to a single or a tiny fraction of cells in an individual body26. Capturing mtDNA mosaicism is more complex than in nuclear DNA (nDNA), as a cell contains hundreds to thousands of mtDNA copies, and newly acquired mtDNA mutations would only be confined to a small fraction of mtDNA copies even in a single cell12. Recently, the single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) has been applied to reveal mtDNA mutations in single cells27,28, but insufficient mtDNA depth per cell has disallowed sensitive profiling29.

Most of our understanding of acquired mtDNA alterations has been derived from cancer studies30,31,32,33,34,35. In this study, we aimed to investigate the whole-genome sequences (WGSs) from 2,096 colonies expanded from healthy (nontumor) single cells (hereafter referred to as clones)5,6,7,8. This approach enabled the sensitive and accurate detection of single-cell mtDNA variants in multiple cells of an individual. Using this approach, we traced the origin of heteroplasmic mtDNA variants, the absolute rate of mtDNA mutations and the dynamics of age-dependent changes in heteroplasmy levels in somatic lineages.

Results

Landscape of mtDNA heteroplasmy in normal cells

We explored 2,096 WGSs of clones expanded from nonneoplastic healthy single cells collected from the colorectal epithelium (431 crypts from 20 individuals)6, fibroblasts (334 cells from 7 individuals)5 and hematopoietic stem and progenitor cells (HSPCs; 1,331 cells from 4 individuals)7,8 (Fig. 1a and Supplementary Tables 1 and 2). In addition, we analyzed 31 WGSs from tumors, including 19 matched colorectal carcinoma bulk tissues from individuals who donated normal colorectal clones and 12 clones established from adenomatous polyps from one individual with MUTYH-associated polyposis6.

Fig. 1: Landscape of mtDNA heteroplasmy in normal cells.
figure 1

a, Experimental design. b, mtDNA mosaicism identified in 2,096 normal clones from 31 donors. Donor, developmental phylogeny, total number of mtDNA variants, maximum clone-VAF, tissue type and donor age for each clone are shown. Shared variants among clones from an individual are shown in yellow. c, Landscape of mosaic mtDNA variants. Variants were classified as heavy (external circle) or light (internal circle) strands according to the mutated pyrimidine. Mutation types and consequences are represented by colors and shapes. The heavy strand replication origin region (Orib-OH) is highlighted in yellow. d, Strand-asymmetric mutational spectrum across three tissue types. e, Strand bias of C:G>T:A and T:A>C:G base substitutions according to the heavy strand replication origin (Orib-OH; m.16,197-191). The log2-transformed ratio between the numbers of heavy and light strand mutations is shown. f, Distribution of clone-VAFs and phasing of mtDNA alterations in a fibroblast clone, 10_ARL10-4_001D4. H, heavy strand; L, light strand.

Using the variant allele frequencies (VAFs) of the somatically acquired mutations in nDNA, we verified the clonality of the clones (Extended Data Fig. 1a). The average mtDNA read-depth was 6,931× from normal clones (188× to 40,421×; Extended Data Fig. 1b,c), allowing for robust assessment of mtDNAs in a single clone to a heteroplasmy level of ~0.3%. For more systematic analysis, we established and applied a locus-specific background noise matrix (Methods; Extended Data Fig. 1d–g and Supplementary Table 3). To trace the developmental origin of mtDNA alterations, we constructed the early embryonic phylogeny of the clones using shared somatic nDNA mutations3,5. Of note, the first branching in each phylogeny was close to the first cell division in life, as reported previously3,5, given the VAFs of lineage-defining variants in the matched bulk blood tissues (Extended Data Fig. 2 and Supplementary Note 1).

Overall, we identified 6,451 mosaic mtDNA base substitutions and insertions and deletions (InDels) from the normal clones, revealing an average of 3.1 mtDNA alterations per clone (Fig. 1b and Supplementary Table 4). Most clones (92.4%; 1,937 of 2,096) exhibited one or more mtDNA alterations, and approximately 18% of the clones (383 of 2,096) carried one or more nearly homoplasmic mtDNA alterations (defined as VAF > 90%). We believe that VAFs of mtDNA alterations in each clone (referred to as clone-VAFs hereafter) are approximate to original levels in the clone’s founder cell, as clone-VAFs were overall consistent throughout cell culture (Extended Data Fig. 3a–c and Supplementary Note 2). Additionally, direct genome sequencing of colorectal crypts obtained via laser-capture microdissection (LCM) revealed a highly similar mtDNA mutational landscape, indicating minimal culture-associated bias in mutational diversity (Extended Data Fig. 3d–g and Supplementary Note 2).

The spectrum of mtDNA base substitutions was predominantly composed of transitions (C:G>T:A and T:A>C:G base substitutions; collectively 95%; Fig. 1c). These alterations exhibited an extreme level of replication-strand asymmetry, as previously observed in cancers31,32. Generally (outside the heavy strand replication origin; m.192-16,196), mutated cytosine bases of C:G>T:A alterations were predominantly on the heavy strand (92.5%), despite the scarcity of cytosines on the strand (ncytosine:nguanine = 1:2.4; Fig. 1d). Similarly, mutated thymine bases of T:A>C:G alterations were predominantly on the light strand (63.4%), despite their relative rarity on the strand (nthymine:nadenine = 1:1.3; Fig. 1d). Additionally, the strand asymmetry was reversed within the replication origin (m.16,197-191; Fig. 1c,e), where the bidirectional mtDNA replication process is operative36,37. These collectively suggest that mtDNA variant acquisition is tightly coupled with the strand-asymmetric mtDNA replication process, as speculated previously31. However, the strand asymmetry was not completely uniform across cell types (P = 3.3 × 10−52, Pearson’s chi-squared test; Fig. 1d), implying that the mtDNA replication processes may be slightly different across cell types.

We occasionally observed localized acquisition of multiple mtDNA variants32. For example, 12 substitutions, with similar clone-VAFs (1.1–2.5%), were detected in a fibroblast clone (Fig. 1f). These were predominantly T:A>C:G substitutions (11 of 12), and six of them were enriched in a localized region (m.7,318-8,388) with direct evidence of coclonality in phasing, suggesting that a single mutational hit may create multiple mutations in mtDNA, like kataegis in nDNA38.

Two origins of mtDNA alterations

Using shared patterns in the developmental phylogenies and tissues, we categorized the origin of mosaic mtDNA alterations into the following two main groups: (1) heteroplasmy in the fertilized egg (termed HetFE variants; n = 409 alterations, 153 events when collapsed) and (2) postzygotic mutations acquired in somatic lineages (termed postzygotic mutations; n = 6,042; Fig. 2a,b and Extended Data Fig. 4a). Briefly, consistent with their presence from the first cell of life, HetFE variants were shared by multiple clones and/or tissues in a particular individual. In contrast, postzygotic mutations were predominantly confined to one or a few clones (n = 5,652; either as singletons (n = 5,276) or coincidentally recurrent mutations (n = 376); referred to as postzygotic simple (PZsimple) mutations). A small subset of postzygotic mutations (n = 390 from 32 mtDNA sites) were recurrent across multiple clones and not confined to a specific donor (referred to as postzygotic recurrent (PZrecurrent) mutations), suggesting a higher mutation rate at these sites compared to other mtDNA loci.

Fig. 2: Fertilized egg-originated variants in mtDNA.
figure 2

a, A simple diagram showing how mtDNA alterations were categorized. b, Schematic diagram illustrating different shared patterns of mtDNA alterations according to their origin. c,d, Examples of HetFE variants—m.16,400 C>T (c) and m.7,496 T>C (d). Early clonal phylogenies, reconstructed using somatic nDNA mutations, are shown. Branch lengths are proportional to the number of somatic nDNA mutations. Clone-VAF in each clone, caVAF and blood-VAF are represented by bar plots at the bottom. Two pie charts in c and d indicate the proportions of mutant clones among clones of the individual (left) and among clones of other individuals (right). e, Linear correlation between the caVAF and blood-VAF. The blue line represents the regression line, and the shaded area indicates its 95% confidence interval. Pearson’s correlation coefficient and P value are provided. Two-sided Pearson’s correlation. f, Landscape of HetFE variants detected. The colors in ‘number of clones’ represent the number of clones for each individual with corresponding values indicated alongside. The numbers within the yellow circles indicate the count of detected HetFE variants with the caVAF within the specified ranges. The gray area represents the range of caVAF that cannot be detected when considering the number of clones for each individual. The average landscape of HetFE variants is shown on the right. g, The VAF distribution of heteroplasmic mtDNA variants in offspring obtained from bulk blood of 407 mother–offspring pairs. h, A bar plot categorizing HetFE variants found in the mother’s bulk blood (HetFE variants in 407 maternal blood) and the offspring’s bulk blood (HetFE variants in 407 offspring’s blood).

mtDNA heteroplasmy in the fertilized egg

Annotating mtDNA mosaicism with early developmental phylogenies enabled us to capture HetFE variants5. For example, m.16,400 C>T substitution was shared by 14 fibroblast clones (51.9% of 27 clones) established from DB2 (Fig. 2c). Despite its high prevalence in DB2, the variant was extremely rare in clones from other donors (0.1%; two of 2,069 clones). Similarly, m.7,496 T>C substitution was recurrently but exclusively observed in HC19, including three normal colorectal clones (13.0% of 23 clones) and their matched colorectal cancer tissue (Fig. 2d). In both cases, the mutant clones converged at the first node of each phylogeny (Fig. 2c,d). These patterns strongly suggest that the most recent common ancestor (MRCA) cell, possibly the fertilized egg, carried the heteroplasmic variants. Consistent with their pregastrulation timing, these variants were also found in matched bulk blood tissues with substantial VAFs (0.584 and 0.149, respectively; Fig. 2c,d).

Overall, we categorized 153 variant events as HetFE variants (Supplementary Table 5). They include 391 shared variants by multiple clones in an individual (6.1% of the total mtDNA variants; 135 events when collapsed) and 18 singleton variants in clones but shared by matched blood tissues. These variants were twofold enriched in the D-loop (m.16,024-576) and 1.5-fold depleted in the rRNA regions (m.648-1,601 and m.1,671-3,229) compared to PZsimple mutations39 (P = 0.0031 and 0.0363, respectively, two-sided Fisher’s exact test; Extended Data Fig. 4b).

Then, we inferred the original heteroplasmy levels in the fertilized egg of HetFE variants. Notably, we observed that the average clone-VAF value of a HetFE variant across all clones from a donor (referred to as clone-averaged VAF (caVAF); Extended Data Fig. 4c) closely correlated with the heteroplasmy level in the matched polyclonal blood tissue (R = 0.967, P = 2.3 × 10−16, Pearson’s correlation; Fig. 2e). We speculated that a plausible mechanistic link between these two independent values was the heteroplasmy level in its origin; although clone-VAFs of a HetFE variant may fluctuate across clonal lineages with aging, the average (caVAF) would remain overall stable from the original heteroplasmy level, consistent with our computational simulation (Extended Data Fig. 4d). Similarly, as the VAF from the bulk blood tissues (blood-VAF) inherently represents an averaged VAF among many polyclonal blood cells, it should also closely reflect the initial heteroplasmy level. We extended our speculation to the correlation of VAFs of heteroplasmic mtDNA variants between buccal–buccal and/or buccal–blood tissues in 19 monozygotic twins (Extended Data Fig. 4e,f). Therefore, we used caVAF as a proxy for the heteroplasmy level in the fertilized egg of a HetFE variant (Supplementary Note 3).

Most donors (80.6%; 25 of 31) carried one or more HetFE variants with caVAFs over 0.03% (Fig. 2f). Twelve individuals (39%) had HetFE variants with substantial caVAFs (>4%). As expected, the statistical power for capturing HetFE variants was associated with the number of clones in a donor. For example, a HetFE variant was identified with caVAF as low as 0.047% from HC02 (22 clones). In contrast, the minimum caVAF of a HetFE variant was sevenfold lower (0.0067%) in KX008 (364 clones). Considering the detection sensitivity, we profiled the average landscape of HetFE variants, which showed ~2 HetFE variants over 0.5% heteroplasmy level per fertilized egg (Fig. 2f).

Notably, we believe that the actual number of HetFE variants is higher than we observed, as our detection thresholds were ~0.02% for most donors. Given that a fertilized egg typically contains ~100,000 mtDNA copies40, HetFE variants detectable in this study should be shared by at least 20 mtDNA copies in the first cell, and those restricted to a smaller number of mtDNA copies would likely be undetectable. In addition, we speculate that the origin of most HetFE variants found in this study was the maternal germline rather than new acquisitions in the fertilized egg, as newly acquired mutations would be restricted to a single mtDNA copy.

To validate our findings, we explored WGSs from bulk blood tissues of 294 families (including 407 mother–offspring pairs)41. We discovered 425 heteroplasmic variants (>0.5% VAF) in the polyclonal blood of offspring, which are most likely HetFE variants in offspring (Fig. 2g). We further found that ~20% of the variants were heteroplasmic in the polyclonal blood of the mother (likely HetFE variants of the mother; Fig. 2g,h, Extended Data Fig. 4g and Supplementary Note 4). Our findings collectively indicate that (1) mtDNA heteroplasmy in the fertilized egg is not rare, likely being continuously generated in the germline; (2) a substantial fraction of HetFE variants are transmitted to the next generation39, despite the purification process during oogenesis in the maternal germline lineage42,43 and (3) these variants are one of the sources of mtDNA mosaicism observed in aged somatic cells.

mtDNA turnover and drift in somatic lineages

The distribution of clone-VAFs of a HetFE variant among the clones exhibited pressures that were shifting them to both extremes (0% or 100%) from the initial heteroplasmy level (Fig. 3a; two examples in Fig. 2c,d). For instance, the m.16,256 C>T mutation in DB10 (39 clones), which had a caVAF of 0.32, was observed as homoplasmic in 11 clones (28.2%) and almost wild type in 25 clones (64.1%; Fig. 3b). Two underlying possible scenarios include the following: (1) early embryonic bottleneck during progressive mtDNA copy number reduction in the cleavage of early embryogenesis44,45 and (2) lifetime drift through the continuous mtDNA turnovers in each somatic lineage for a lifetime46,47 (Fig. 3c).

Fig. 3: mtDNA turnover leading to mtDNA drift over a lifetime.
figure 3

a, Clone-VAFs of HetFE variants across clones. Clones with a complete absence of corresponding HetFE variants (that is, clone-VAF = 0) are not shown. The colors in age and ‘number of clones’ represent age and the number of clones for each individual, respectively, with corresponding values indicated alongside. Each column represents each HetFE variant, and a vertical line connects clones carrying the same HetFE variant. Gray bars (bottom) show caVAF values for each HetFE variant. b, Clone-VAF distributions of a HetFE variant, m.16,256 C>T, in DB10. A gray dashed line indicates the caVAF. c, Schematic diagram illustrating clone-VAF dynamics by early embryonic bottleneck and lifetime drift. Expected mtDNA copy numbers per cell (bottom) are shown. Two alternative drift models (mitotic and homeostatic turnovers) are depicted in different colors. d, Clone-VAF distributions of HetFE variants with similar caVAF values in four individuals of different ages. e, Linear correlation between age and fixation index for HetFE variants of caVAF > 0.01. Circles represent variants colored by caVAF values. A gray line and the shaded area represent the regression line and its 95% confidence interval. Pearson’s correlation coefficient and P value are presented. Two-sided Pearson’s correlation. f, Clone-VAF difference of a HetFE variant between all possible clone pairs based on their embryonic branching time. Differences were computed when at least one clone-VAF exceeded 0.5 from each pair. Cell generations were calculated using a fixed mutation rate previously reported5. g, Average mitotic turnover counts to reach homoplasmy according to caVAF from the simulation studies. h, Estimated turnover rates for each HetFE variant in mitotic and homeostatic turnover models. Error bars represent the range of 50 simulated results with the lowest MSEs of 1,000,000 simulations per HetFE variant, with circles representing average values. HetFE variants with caVAF > 0.005 were included. Dark gray lines and shaded areas represent average turnover rates and their 95% confidence intervals in each tissue type. EEM, early embryonic mutation; CG, cell generation.

The foundation of the early embryonic mtDNA bottleneck is caused by the lack of mtDNA replication until a certain stage of embryogenesis48,49 (Fig. 3c). If each embryonic cell has one or only a few mtDNA copies at a certain stage, the heteroplasmy level can be quantized according to the composition of founder mtDNAs in each embryonic cell.

In parallel, mtDNAs are lost and newly replicated in somatic lineages12,50,51 (for example, cell-cycle-dependent mtDNA duplication and random segregation by half in two daughter cells in dividing cells (mitotic turnover) or cell-cycle-independent homeostatic mtDNA replacement in nondividing cells (homeostatic turnover); Fig. 3c). The processes can slightly drift heteroplasmy levels continuously over time, generating a substantial impact in a lifetime. Of these two nonexclusive scenarios, our observations indicate that the lifetime drift is dominant.

First, purification of HetFE variants was age-dependent or much weaker in clones from young donors (for example, clones established from an aborted 19-week-old fetus; Fig. 3d,e and Extended Data Fig. 5a). This suggests that purification was not fixed in the early stages of human life. Second, sister clone pairs that branched out at a later time point did not exhibit more similar heteroplasmy levels of a HetFE variant than clone pairs that diverged earlier (Fig. 3f). For example, clone pairs that had an MRCA cell at the ~30th cell generation, which was much later than the early embryonic bottleneck, showed tremendous heterogeneity in clone-VAFs of a HetFE variant (Fig. 3f).

Finally, the computational simulation suggested that the lifetime drift model alone was sufficient to explain the skewed distribution of clone-VAFs in a HetFE variant. Simulation studies using the mitotic turnover model (Extended Data Fig. 5b,c) indicated that 1,440 rounds of mtDNA mitotic turnovers shifted a HetFE variant with 10% initial heteroplasmy level (caVAF) to homoplasmy (100%) in ~10% of the clones when clones had 750 basal mtDNA copy numbers (a turnover was defined as replication of an mtDNA for n times, where n is the basal mtDNA copy number in a somatic cell; Fig. 3g). Likewise, simulations assuming homeostatic turnover (Extended Data Fig. 5d,e) suggested a similar conclusion, but ~50% of rounds were necessary for a similar effect under the same conditions (Supplementary Note 5).

Based on the clone-VAF distributions of HetFE variants, the maximum likelihood mtDNA turnover rates across cell types were inferred (14.3, 20.8 and 17.9 mitotic turnovers per year, or 6.5, 11.5 and 9.4 homeostatic turnovers per year for the colon epithelium, fibroblasts and HSPCs, respectively; Fig. 3h). Although we believe that mitotic and homeostatic mtDNA turnovers are predominant mechanisms for colorectal epithelium and fibroblasts, respectively, their relative balance between two turnover models in each cell type is uncertain.

Postzygotic mtDNA mutations

Of the 2,096 clones, 6,042 mtDNA variants (93.7% of all the variants) were categorized as postzygotic mutations, newly acquired from each somatic lineage. As mentioned above, 32 mtDNA loci showed an elevated mutation rate with 390 PZrecurrent variants (Supplementary Table 6 and Supplementary Note 6). These mutations were predominantly located in the hypervariable regions of the D-loop, homopolymer sequences or both33,52 (Extended Data Fig. 6a). Interestingly, mutations in a hotspot (m.414 T>G) were recurrently found in clones with ultraviolet (UV) light exposure (estimated using UV-associated somatic mutations in the nDNA of a clone53), suggesting UV-dependent acquisition54,55 (Extended Data Fig. 6b).

Except for HetFE and PZrecurrent mutations, we detected 5,652 PZsimple mtDNA alterations. Unlike somatic mutations in nuclear genomes, it is challenging to absolutely count PZsimple mutations, as mutations with clone-VAFs below our detection threshold (~0.3%) would remain undetected. Indeed, the crude number of PZsimple mutations detected in clones was not substantially correlated with age (R = 0.282, P = 0.131, Pearson’s correlation; Fig. 4a). Instead, the overall heteroplasmy levels of PZsimple mutations in clones displayed stronger clock-like properties—PZsimple mutations with higher clone-VAF were more frequent in aged donors than young donors (Fig. 4b), and the sum of the clone-VAFs of all detected PZsimple mutations in a clone (referred to as SVAF) showed more measurable characteristics. For example, in an older individual (DB8; 93 years old), 55% of clones (26 of 47) had an SVAF of ~1.0 by 1–3 clone-specific PZsimple mutations (Fig. 4c). In contrast, in a young individual (HC10; 37 years old), all clones exhibited an SVAF far below 1.0 (0.55 versus 0, P = 5.2 × 10−6, two-sided Fisher’s exact test; Fig. 4c). Of note, there was no significant difference in the crude number of PZsimple mutations between the clones of the two individuals (Extended Data Fig. 6c). The average SVAF in the clones of an individual exhibited a strong positive correlation with age (Extended Data Fig. 6d). The correlation became stronger when the age of individuals was converted to turnover numbers from birth using the cell-type-specific turnover rates estimated from HetFE variants (R = 0.787, P = 2.5 × 10−7, Pearson’s correlation; Fig. 4d).

Fig. 4: Postzygotic mtDNA variants toward homoplasmy in aged cells.
figure 4

a, Linear correlation between the average PZsimple mutation count and age across 31 individuals. The gray line and the shaded area represent the regression line and its 95% confidence interval. One individual aged 0 was not included in the regression. Vertical lines indicate the range of PZsimple mutation counts per clone in an individual. b, Proportions of clones with maximum clone-VAFs across 31 individuals (individual ages indicated in parentheses). Individuals are sorted by tissues, then by age, in ascending order. c, Bar plots of SVAF for each clone in two individuals, DB8 (93 years old; top) and HC10 (37 years old; bottom), with developmental phylogenies. Bar plots include up to the top three clone-VAF PZsimple mutations. Pie charts categorize clones based on SVAF. d, Linear correlation between average SVAF and total mitotic turnovers across 31 individuals. The total number of mitotic turnovers was calculated using the rates estimated by HetFE variants for each tissue type. The gray line and shaded area represent the regression line and its 95% confidence interval. One individual aged 0 was not included in the regression. Vertical lines indicate the range of SVAF per clone in an individual. (a,b,d, Clones with high UV exposure were excluded to remove UV radiation’s impact. Pearson’s correlation coefficient and P value are provided. Two-sided Pearson’s correlation.) e, Comparison of clone proportions with maximum clone-VAFs of PZsimple mutations in fibroblast clones with low and high UV-derived nDNA mutation burdens in three donors. f, Estimated mtDNA mutation rate in 31 individuals under mitotic and homeostatic turnover models (individual ages indicated in parentheses). Error bars represent the range of 50 simulated results with the lowest MSEs of 10,000 simulations per individual, with circles representing average values.

Interestingly, we observed that a few fibroblast clones with a higher amount of lifetime UV-light exposure exhibited a higher SVAF of PZsimple mutations than those with a lower amount of lifetime UV-light exposure (P = 7.7 × 10−4, two-sided Fisher’s exact test; Fig. 4e). This indicates that UV exposure accelerated mtDNA turnovers in the cellular lineage. We speculate that UV exposure damages mtDNA, followed by mtDNA degradation and triggering additional mtDNA replications for their replacement21. Of note, the mtDNA mutational signatures in clones with a higher UV exposure were similar to the other clones (Extended Data Fig. 6e), indicating that UV light does not directly lead to PZsimple mutations fixed in mtDNA.

With the mtDNA turnover rates estimated using HetFE variants and the landscape of detectable PZsimple mutations, we estimated the absolute number of mtDNA alterations that are newly appearing in every mtDNA replication. In all individuals and both turnover models, the absolute mtDNA mutation rates converged to 5.0 × 10−8 alterations per base pair (bp) replication (Fig. 4f). Interestingly, our estimate was within the range of error rates of polymerase γ (POLG), the mitochondrial genome’s DNA polymerase56,57. The converged rate reassures that (1) endogenous mtDNA replication is the dominant process for mtDNA mutation acquisition in somatic cells31,58,59 and (2) both turnover models (and their turnover rates) are reliable. Given the ~750 mtDNA copies in a single somatic cell, our absolute mutation rate implies an average of 0.31 de novo PZsimple mtDNA alteration is acquired per daughter cell per cell division.

Selective pressure of mtDNA mutations in normal cells

To understand the selective pressure on PZsimple mutations, we calculated the dN/dS ratio60,61,62. The ratio of missense or truncating mutations to synonymous mutations was not substantially higher than mtDNA mutations randomly generated according to the mtDNA mutational signature, indicating general neutrality in mutation acquisition (Fig. 5a). However, truncating mutations exhibited lower clone-VAFs than synonymous mutations in all three cell types, with no mutations exceeding 90% clone-VAFs, suggesting constrained expansion of mtDNAs carrying inactivating mutations due to functional disadvantage when reaching homoplasmy (P = 0.0211, 0.0017 and 0.0013 for the colon epithelium, fibroblasts and HSPCs, respectively, two-sided Fisher’s exact test; Fig. 5b). These observations were consistent with previous observations in cancer tissues31,32.

Fig. 5: Selection and transcription of mtDNA variants.
figure 5

a, dN/dS ratios for missense and truncating mutations in each tissue type with simulated null distributions. Error bars represent 25th and 75th percentiles from simulations (10,000 simulations for each donor). b, Clone-VAF distribution of synonymous, missense and truncating mutations. Low clone-VAF variants (<0.1) were not included. Two-sided Fisher’s exact test, *P < 0.05, **P < 0.01; NS, not significant. Exact P values are 0.0211, 0.0017 and 0.0013 for the colon epithelium, fibroblasts and HSPCs, respectively. c,d, log2-transformed fold changes of expression levels. Normalized read counts from a clone were compared to the average normalized read counts among wild-type (other) clones in HC13 (c; 22 clones) and HC17 (d; 14 clones). Red and blue diamonds represent clones with truncating and missense mutations, respectively. Black circles indicate wild-type clones. Boxplots illustrate log2-transformed fold change variation in wild-type clones with median values, IQRs and whiskers (1.5× IQR). Yellow box highlighting the gene with truncating mutations. e, Scatter plots delineating clone-VAFs in genome versus transcriptome sequences according to functional consequences of mutations. Mutations in rRNA and tRNA are further subcategorized based on the secondary structure of the RNA they impact (color-coded). The gray lines represent the diagonal line y = x. f,g, Violin plots illustrating how clone-VAF between the genome and transcriptome varies based on tRNA (f) and rRNA (g) secondary structures. The y axis represents the log2-transformed ratio of clone-VAF in RNA to clone-VAF in DNA. Mutations with clone-VAF lower than 0.01 were excluded. One-sided Wilcoxon test. IQRs, interquartile ranges.

Despite the expansion constraint, 15 truncating mutations displayed high clone-VAFs among the clones (clone-VAF > 0.6), accompanied by upregulated RNA expression levels of mtDNA genes (Fig. 5c,d). This phenomenon is likely attributable to a compensatory response where transcript degradation is inhibited when the protein product is dysfunctional63,64. The similarity in clone-VAFs between genome and transcriptome sequences indicates that this inhibitory effect does not distinguish between wild-type and truncated mtDNA (Fig. 5e).

We further compared the clone-VAFs of PZsimple mutations in genome and transcriptome sequences (Fig. 5e). Although most mtDNA mutations showed similar clone-VAFs in both, a subset of tRNA mutations exhibited elevated clone-VAFs in transcriptomes, which is consistent with a previous report65. In contrast, a subset of rRNA mutations showed reduced clone-VAFs in transcriptomes. These mutations were predominantly clustered within stem regions of tRNA and rRNA (P = 0.0158 and 0.0329 for tRNA and rRNA mutations, respectively, one-sided Wilcoxon test; Fig. 5f,g). We speculated that these mutations influence the stability and regulation of these RNAs, leading to tRNA accumulation and rRNA degradation65,66.

mtDNA copy number and structural variations (SVs) in normal cells

The average mtDNA copy number was ~750 per cell (per diploid nuclear genome), but large variations in mtDNA copy number were observed across clones, even in an individual (Fig. 6a). For example, mtDNA copy numbers among the clones of HSPCs from KX004 ranged from ~20 to 3,700. There was no apparent correlation between median mtDNA copy number and age (R = 0.127, P = 0.381, Pearson’s correlation). Notably, interclonal mtDNA copy number variations were less substantial in colorectal clones (Fig. 6a). Despite these variations, gene expression levels of mtDNA and nDNA genes were not substantially altered among the clones, suggesting that the mtDNA copy number is not a bottleneck for the transcription of mtDNA genes, at least at the resting stage (Extended Data Fig. 7a,b).

Fig. 6: mtDNA copy number in somatic cells and mtDNA in cancer.
figure 6

a, mtDNA copy number distributions among 31 individuals, sorted by tissue type, then by age in ascending order. Black dots and red bars represent clones (n = 2,096) and mean values, respectively. b, Read-depth of the mitochondrial genome showing large deletions in two colorectal clones (HC06-14 and HC21-16). Yellow lines represent the deleted regions. c,d, Mutation number (c) and SVAF (d) in normal colorectal clones (blue) and matched colon cancer tissues (red) are correlated with the number of mitotic turnovers across 19 donors. The age of donors is shown in parentheses. Vertical lines indicate the range across clones in each donor. Red and blue lines represent regression lines. e, The proportion of truncating mtDNA mutations within normal colorectal clones and colorectal cancer tissues. Two-sided Fisher’s exact test. f, Distributions of mtDNA copy numbers in normal colorectal clones and matched cancer tissues among 19 individuals. Boxplots illustrate median values with IQRs and whiskers (1.5× IQR). g, The linear correlation between tumor cell fraction and mtDNA copy numbers per diploid nuclear genome in cancer tissues. The gray line and the shaded area represent the regression line and its 95% confidence interval, respectively. Copy number values at 0% and 100% tumor cell fractions are shown by extrapolation. Pearson’s correlation coefficient and P value are provided. Two-sided Pearson’s correlation. TME, tumor microenvironment.

Two colorectal clones had notable SVs within their mtDNA (Fig. 6b and Extended Data Fig. 7c), with deletions of 10,951 bp and 3,389 bp, respectively, at approximately 45% heteroplasmy levels. As expected, gene expression levels in the deleted loci were lower than in the flanking regions (P < 0.05, Wald test; Extended Data Fig. 7d). Notably, these large deletions have been observed in cancers at a similar frequency32. Our findings illustrate that SVs can occur in normal clones67; however, these rare events involve only approximately 0.1% of normal cells.

Accelerated mtDNA turnover in tumorigenesis

In 19 matched colorectal cancer tissues, we observed, on average, more detectable mutations (5.3 versus 3.8; P = 0.0301, Wilcoxon signed rank exact test; Fig. 6c) and higher SVAF values (P = 8.5 × 10−4, Wilcoxon signed rank exact test; Fig. 6d) than normal clones from the same donor. Our findings suggest an elevated mtDNA mutation rate, turnover rate or both during tumor initiation and clonal evolution68. Consistent with this speculation, in 12 clones established from MUTYH-associated adenomatous polyps6, homoplasmic mtDNA mutations were more frequently observed in lineages with more driver mutations (Extended Data Fig. 7e).

We further investigated detectable PZsimple mutations in 70 colorectal carcinomas (19 matched and 51 unrelated colorectal cancers69; Supplementary Table 7). Qualitatively, colorectal cancers exhibited a notably higher prevalence of truncating mutations with >0.6 VAFs than normal clones (0.0203 versus 0.0026, P = 1.5 × 10−4, two-sided Fisher’s exact test; Fig. 6e). This finding suggests increased accumulation of deleterious mutations in colorectal cancers, as observed previously32.

Finally, compared to the mtDNA copy numbers in normal clones, 19 matched colon cancer tissues demonstrated biased copy number changes (per diploid nuclear genome) toward either gain or loss of mtDNA copies at face value (Fig. 6f). To gain insights into the mtDNA copy numbers in pure colon cancer cells without co-existing tumor microenvironmental cells, such as infiltrating lymphocytes, we correlated mtDNA copy numbers of cancer tissues with their tumor cell fractions estimated from genome sequences70 and found a strong positive linear relationship (R = 0.715, P = 5.7 × 10−4, Pearson’s correlation; Fig. 6g). Extrapolation of the regression line suggested ~1,266 mtDNA copies per diploid nuclear genome at 100% tumor cell fraction, which is 70% higher than in normal colorectal clones. Indeed, we confirmed an mtDNA copy number increase in colon cancer cells by WGSs of 14 colon cancer organoids (100% tumor cell fraction; 1,224 mtDNA copies per diploid cancer cell; Extended Data Fig. 7f). The underlying reason for the mtDNA copy number gain in cancer cells is uncertain.

Similarly, mtDNA copy numbers in cancer tissues were negatively correlated with the amount of infiltrating CD3+ T cells (Extended Data Fig. 7g). Genome sequencing of T cells sorted from the peripheral blood suggested that there were ~123 mtDNA copies per T cell (Extended Data Fig. 7f), which was close to the value extrapolated from the regression line (Fig. 6g).

Discussion

By leveraging WGSs derived from 2,096 healthy normal clones encompassing three different tissues, we elucidated the landscape of mtDNA mosaicism across single cells. Our system allowed the tracing of the embryonic origin of the mtDNA variants. Unlike the conventional wisdom of homogeneous mtDNA in fertilized eggs42,43, we conclude that human fertilized eggs frequently harbor heteroplasmic mtDNA variants, often showing substantial heteroplasmic levels (that is, VAF > 30%).

The detection of HetFE variants allowed the determination of one of two essential parameters contributing to the landscape of mtDNA mosaicism—the mtDNA turnover rates in somatic cells. Then, by applying the turnover rate to the landscape of PZsimple mutations, the other critical parameter, the absolute mtDNA mutation rate per mtDNA replication, was elucidated. Despite their importance in understanding mtDNA mutational dynamics in somatic cells, it has been challenging to decompose these two parameters individually, as both are intermingled. For example, mtDNA mutations without mtDNA expansion cannot be detected, and mtDNA expansion cannot be tracked without mtDNA mutations.

Our findings suggest that stochastic lifetime drift alone can shape the mtDNA heteroplasmy landscape observed in this study (Fig. 7a). The replication-strand-asymmetric mutational spectrum and constant mutation rate suggest that PZsimple mutations arise primarily through replication-associated mechanisms. Our lifetime drift model illustrates that 1,000 mitotic turnovers induce one of 750 mtDNA copies to have a completely purified mtDNA composition in ~30% of somatic cells (Fig. 7b and Supplementary Table 8) and result in homoplasmic PZsimple mutations in ~5% of somatic cells (Fig. 7c and Supplementary Table 9). The extent of lifetime drift generally decreases as the basal mtDNA copy number increases (Extended Data Figs. 8 and 9).

Fig. 7: Model for mtDNA dynamics in somatic lineages over a lifetime.
figure 7

a, Schematic diagram illustrating the origin and dynamics of mtDNA alterations across a lifetime. Heteroplasmic variants in the fertilized egg and somatically acquired postzygotic mutations undergo drift in somatic lineages. Mutation and turnover rates (mitotic and homeostatic turnover rates) are shown at the top right. Stars indicate PZsimple mutations. Bars on the left, middle and right depict clone-VAF distribution within the cell at different time points. b, A contour plot representing how an mtDNA population changes with continuous mitotic turnover from simulation studies assuming a baseline mtDNA copy number of 750. The x axis shows the mitotic turnover count and the y axis shows the frequency of the most prevalent mtDNA, regardless of mutations. c, A contour plot representing how clone-VAF of PZsimple mtDNA mutations changes with continuous mitotic turnover from simulation studies assuming a baseline copy number of 750 and absolute mutation rate of 5.0 × 10−8 per bp replication. The x axis shows the mitotic turnover count and the y axis shows the top heteroplasmy level of postzygotic mutation.

The acquisition of truncating mutations was not substantially constrained, but their drift to homoplasmy was repressed in somatic lineages due to functional disadvantages. The mtDNA-truncating mutations were more common in colorectal cancers than in normal tissues, indicating that cancer cells may depend less on functional mitochondria. Increased mutation counts, expansion and mtDNA copy numbers in colorectal tumors suggest that mtDNA dynamics may change with a potential impact during colorectal tumorigenesis.

When certain mtDNA mutations exceed specific heteroplasmy levels, they can cause mitochondrial dysfunction, a hallmark of aging71. Our study meticulously outlines the general landscape of mtDNA mosaicism in apparently normal cells and the forces shaping it throughout life. Similar but more comprehensive analyses with diseased and aged cells are warranted to provide more specific evidence of how mtDNA mutations contribute to phenotypic changes and disease development.

Methods

Sample cohort

Publicly available WGSs of single clones from four previous datasets were used—one for colorectal epithelium from our previous study (405 normal clones and 19 matched colorectal carcinomas from 19 individuals, 12 MUTYH-associated adenomatous clones from 1 individual)6, one for mesenchymal fibroblasts from our previous study (334 normal clones from 7 individuals)5 and two for HSPCs (1,331 clones from 4 individuals)7,8. WGSs of colorectal epithelium were established from single-crypt-derived organoids, and the others were generated from single-cell expanded clones. We included only clones with >0.4 VAF and >10 average depths in the nuclear genome to ensure clonality and quality.

To deeply understand the mtDNA heteroplasmy in early embryogenesis, we established 26 single-crypt-derived organoids of colorectal epithelium from one 19-week-old aborted fetus, as previously reported6. Genomic DNA materials were extracted using the DNeasy Blood and Tissue Kit (Qiagen). DNA libraries were generated using TruSeq DNA PCR-Free Library Prep Kits (Illumina) and sequenced on the NovaSeq 6000 platform. All the procedures in this study were approved by the Institutional Review Board of Korea Advanced Institute of Science and Technology (approval: KH2021-096), and informed consent was obtained from the parents of this individual.

In addition, to assess the prevalence of mtDNA variants across normal clones, we included 52 in-house WGSs of 13 individuals generated from organoids of various tissue types and 432 WGSs of 42 individuals generated from LCM patches of colorectal tissues72. To validate heteroplasmy profiles in the fertilized egg, we further explored 938 WGSs of bulk blood from 275 families41. To confirm that the observed VAF in bulk tissues matches the VAF in the fertilized egg, we examined 108 in-house WGSs obtained from cord blood and buccal swabs of 19 monozygotic twin families.

To understand the association between gene expression and mtDNA mutations, 312 whole-transcriptome sequences of colon clones from our previous study were also included6.

To compare mtDNA mutations between tumor and normal samples, we further explored 51 WGSs of colorectal carcinomas from the Pan-Cancer Analysis of Whole-Genome Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA)69. Findings from colorectal cancer genome sequences, including tumor cell fraction, tumor ploidy and driver mutations, were analyzed by CancerVision (Inocras)6,73. Additionally, we used 17 in-house WGSs to compare the mtDNA copy numbers of normal colon cells, T cells and colorectal cancer cells. For T cells (n = 3), we used samples obtained by sorting and clustering T cells followed by bulk sequencing. For colorectal cancer cells (n = 14), we sequenced colorectal cancer organoids with a tumor cell fraction of 1 to ascertain the mtDNA copy number exclusively from cancer cells.

Calling and filtering of mtDNA mutations

Sequenced reads were aligned to the human reference genome build 37 (GRCh37) using the BWA-MEM algorithm74. Duplicated reads were removed by Picard (available at https://broadinstitute.github.io/picard/), and reads mapped to the mitochondrial genome were extracted by SAMtools75. To be aware of misaligned reads due to nuclear-mitochondrial DNA segments (NUMTs), we only included paired reads that were (1) both mapped to mtDNA, (2) not chimeric aligned and (3) correctly oriented. mtDNA mutations were called using HaplotypeCaller2 (ref. 76) and VarScan2 (ref. 77), and any mutation detected by either one was added to the mutation sets for high sensitivity.

Mutations were then filtered out using the following criteria: (1) low mapping quality (<25); (2) low base quality (<15); (3) skewed average mutation position (<15% or >85% of supporting reads); (4) unbalanced ratio between forward and reverse supporting reads (<10% or >90%) and (5) five or more mismatches in supporting reads. Mutations in the regions with low complexity or a gap in the reference genome (m.3,107N) were explicitly discarded31:

  1. (1)

    Misalignment due to ACCCCCCCTCCCCC (rCRS 302-315)

  2. (2)

    Misalignment due to GCACACACACACC (rCRS 513-525)

  3. (3)

    Misalignment due to 3107N in rCRS (rCRS 3,105-3,109)

  4. (4)

    Misalignment due to ACCCCC (rCRS 16,182-16,187)

More strict criteria were applied to InDel mutations, so mutations with a high proportion of additional InDels in supporting reads (>50%) were filtered out. Furthermore, InDels within noisy regions, primarily attributed to C homopolymers, were excluded from the analysis—m.567, 955 and 5,894. When visually inspected using Integrative Genomics Viewer78, although long-read sequences had precise profiles, various types of InDels were detected at these loci in short-read sequences, making it challenging to identify mutations clearly.

A background noise matrix was generated for each locus of alternate alleles using all WGSs of normal clones to establish high-confidence mutation sets. Due to mtDNA’s repetitive nature, background noise rates vary across loci. We systematically measured VAFs within every normal clone and constructed VAF distributions for each locus and alternate allele, considering the background noise matrix. Then, we overlaid the called variant set onto the background noise matrix.

To determine the background noise criteria, we first calculated the average and s.d. of the VAFs in each locus, computing the one-sided 95% confidence interval. Clones with VAF beyond the interval were considered mutants in the locus. In parallel, VAFs in a specific position were sorted in ascending order, and gaps between adjacent VAFs were examined. We considered the quantum jump of the gap as the cutoff value between true signals and background noises. To this end, we calculated the relative gap between adjacent VAFs as follows:

$${\rm{relative}}\; {\mathrm{gap}}=\frac{{\mathrm{VAF}}^{\prime} -{\mathrm{VAF}}}{{\mathrm{VAF}}},$$

where VAF and VAF′ denote the lower and higher adjacent VAFs from an mtDNA locus. When calculating the relative gap, we only use VAFs between 0.05% and 15%. After calculating the relative gap, we identified the adjacent VAFs, which showed the largest gap among those with a relative gap of 0.33 or higher. We considered this gap the boundary between the background noises and true signals. The actual threshold value was set as a smaller value between (1) the average of VAF and VAF′ or (2) VAF × 1.33. If a variant did not exceed this threshold even after being called, it was considered a false-positive and excluded from the variant set. Conversely, if a variant was not called but exceeded this threshold, it was considered a false-negative and rescued.

Classification of mtDNA alterations

mtDNA alterations shared in at least two clones of the same individual were classified into HetFE variants, PZrecurrent mutations and PZsimple mutations. If a shared mtDNA alteration was identified only in one individual, it was considered a HetFE variant. When an alteration was detected in two or more clones within a single individual but found in just a few clones within other individuals, the binomial test was used to provide a statistical framework for classifying the mutation, whether a HetFE variant or a PZsimple mutation. Concerning these mutations of interest, we used a maximum likelihood estimation method to estimate the probability of their occurrence by chance within each clone:

$${L}\left({\rm{p}}|{x}_{\exp }\right)=\left({n}\atop{{x}_{\exp }}\right){p}^{\,{x}_{\exp }}{\left(1-p\right)}^{n-{x}_{\exp }}$$
$${\hat{P}}_{{\mathrm{ML}}}={{\mathrm{argmax}}}_{P}{L}\left({{P}}|{x}_{\exp }\right),$$

where n denotes the total number of normal clones, excluding the individual harboring the shared mutation, and xexp denotes the subset of these clones where the mutation was detected. The resultant \({\hat{P}}_{{\rm{ML}}}\) reflects the estimated rate at which the mutation spontaneously arises. By using a binomial distribution, we subsequently computed the probability that the shared mutation emerged by random chance within the individual:

$$P\left(x\ge {x}_{{\rm{obs}}}\right)=\mathop{\sum }\limits_{x={x}_{{\rm{obs}}}}^{{n}_{{obs}}}\left({{n}_{{\rm{obs}}}}\atop{x}\right)\,{{\hat{p}}_{{\rm{ML}}}}^{x}{\left(1-{\hat{p}}_{{\rm{ML}}}\right)}^{{n}_{{\rm{obs}}}-x}$$

where nobs and xobs represent the number of clones within the individual and the number of clones harboring the shared mutation in the individual, respectively. If the calculated probability was below 0.01, the mutation was categorized as a HetFE variant; otherwise, it was classified as a PZsimple mutation.

A mutation was categorized as a PZrecurrent mutation when observed in two or more individuals, each exhibiting a minimum of two clones carrying the identical mutation. Moreover, mutations found in at least ten individuals within the cohort of 86 individuals (31 individuals in the study and 55 other individuals from in-house WGSs or WGSs of LCM patches72) were designated as PZrecurrent mutations. Identifying PZrecurrent mutations within a specific tissue was contingent upon their notable prevalence among older individuals (Wilcoxon rank-sum test) and their simultaneous occurrence in multiple individuals within the same tissue.

Mutations exclusive to late-branched clones were designated as PZsimple mutations. Late-branched clones were defined by their common ancestor, accumulating at least 100 mutations before diverging. Particularly within the context of HSPCs, many clones exhibited branching events during postearly embryogenesis development. This necessitated a meticulous mutation assessment to avoid misclassifications as HetFE variants.

Fixation index of HetFE variants

Fixation index (FST), or Wright’s F statistics, is a statistical metric for quantifying genetic differentiation79. Using this, we computed the diversity of clone-VAF for each HetFE variant to assess the impact of the lifetime drift effect:

$${F}_{{\mathrm{ST}}}=\frac{{\sigma }_{S}^{2}}{{\sigma }_{T}^{2}}=\frac{{\sigma }_{S}^{2}}{\bar{P}\left(1-\bar{P}\right)},$$

where \({\sigma }_{S}^{2}\) represents the variance in clone-VAFs among clones within an individual, \({\sigma }_{T}^{2}\) represents the variance in clone-VAFs across the entire cells of the individual and \(\bar{P}\) signifies the average clone-VAF across clones in the individual (caVAF), which approximates the average frequency in the entire cell population under the assumption of Hardy–Weinberg equilibrium. Notably, FST computations were exclusively conducted for HetFE variants with \(\bar{P}\) values exceeding 0.01.

Simulation framework for the mitochondrial turnover

We developed a computational algorithm to simulate lifetime drift in mtDNA heteroplasmy. mtDNA undergoes continuous random degradation and replication processes independently of cell division, even without cell division. If cell division occurs during this process, random segregation of mtDNA accompanies it. This series of processes, whereby mtDNA undergoes 100% refreshment within a cell, is defined as mitochondrial turnover. To reflect this nature of mtDNA and intuitively mimic mitochondrial turnover, we developed the following two turnover models: the mitotic turnover model and the homeostatic turnover model.

Both models involve random replication of mtDNA copies. In the mitotic turnover model, mtDNA copies increase one by one through random replication until the original amount is doubled and then halved randomly through mitosis. We define this series of processes as one mitotic turnover. In the homeostatic turnover model, random replication is iterated whenever one mtDNA copy undergoes degradation, ensuring constant mtDNA copy numbers in a cell. When this iteration occurs for the number of times equivalent to the total amount of mtDNA, we define it as one homeostatic turnover. Thus, while both models involve random replication, the mitotic turnover model integrates cell division. In contrast, the homeostatic turnover model focuses solely on mtDNA replacement. The scheme of two turnover models is illustrated in Extended Data Fig. 5b–e and discussed in detail in Supplementary Note 5.

Estimation of average turnover to fixation and turnover rate

We simulated how HetFE variants change as turnover repeats by lifetime drift. We set up wild-type and mutant mtDNA to co-exist within a single cell. Then, using the mitotic and homeostatic turnover models, we aimed to infer the average turnover count until fixation into a clone-VAF of 100% (homoplasmy) and the turnover rate.

While specific parameters such as mtDNA copy number (n) and VAF in the fertilized egg (P; caVAF) differ depending on the inference being made, the foundational structure of the model remains unchanged. The central assumptions of this model encompass (1) the constancy of the average mtDNA copy number within a cell per turnover, (2) the neutrality of mutant mtDNA about selection and (3) simultaneous turnovers occurring in 10,000 cells. In the case of the mitotic turnover model, one more assumption, the random segregation of mtDNA into two daughter cells with an equal amount, is added. At the commencement of the simulation, each cell initiated with a VAF of P, mirroring the condition observed in the fertilized egg. We conducted the mitotic and homeostatic turnover model simulations with the same simulation parameters. Each simulation persisted until 10,000 turnovers were executed, with clone-VAF per cell documented at each turnover.

To infer the average turnover count for fixation to homoplasmy, n was set to 750, and P spanned the range of (0.1, 0.2, …, 0.8, 0.9). Subsequently, in each of the 10,000 individual cells, the number of turnovers required for mtDNA variants to attain homoplasmy was determined. The average number of turnovers required for fixation at each parameter P was computed across all 10,000 cells.

Regarding the inference of turnover rate, we selected HetFE variants with the caVAF exceeding 0.005 for simulation. The value of n was established as the observed average mtDNA copy number for the individual, while P was determined as the caVAF. Both mitotic and homeostatic turnover models underwent 10,000 simulations for each unique parameter set. In each iteration, cells were randomly selected 100 times to replicate the sequencing process, considering the specific number of clones within the individual. This yielded a total of 25 summary statistics for each simulation—the count of cells within specified clone-VAF ranges (0.5–5%, 5–10%, …, 90–95%, 95–100%), mean clone-VAF, s.d. of clone-VAF, the proportion of cells categorized as wild type (clone-VAF < 5%), heteroplasmic (5% < clone-VAF < 90%) and homoplasmic (clone-VAF > 90%). We then compared the summary statistics derived from each simulation to the observed summary statistics. The mitotic and homeostatic turnover rates were estimated by minimizing the mean squared error (MSE). The estimation of the turnover rate was performed for each specific tissue type.

Estimation of mtDNA mutation rate

We used the estimated turnover rate to infer the mtDNA mutation rate. The mtDNA mutation rate was inferred through simulations conducted separately for the mitotic and homeostatic turnover models. The core algorithm parallels the model elucidated earlier but is oriented toward simulating PZsimple mutations. This model comprises the following three key parameters: mtDNA copy number (n), total turnover count (g) and mutation rate (r). For each individual, this simulation was performed with n set to the individual’s observed average mtDNA copy number and g calculated as the product of the estimated tissue-specific turnover rate and the individual’s age. The tissue-specific turnover rate for the parameter g is determined based on whether this simulation corresponds to the mitotic or homeostatic turnover models. The logarithm (base 10) of r was sampled from a uniform distribution spanning the range of (−9, −3), corresponding to a minimum r of 1 × 10−9 mutations per bp and a maximum r of 1 × 10−3 mutations per bp. The number of mutations occurring within a cell was drawn from a Poisson distribution with a parameter lambda set to r × 16,569 for each replication event. These mutations were then introduced into the mtDNA of each cell. Following g turnovers, cells were randomly chosen ten times to simulate the sequencing process, considering the specific number of clones within the individual.

This entire simulation was iterated 1,000 times, resulting in 10,000 simulation outcomes per individual. For each simulation, a set of 22 summary statistics was generated—the count of cells with the maximum clone-VAF falling within specified clone-VAF ranges (0.5–5%, 5–10%, …, 90–95%, 95–100%), the count of cells with two homoplasmic mutations (clone-VAF > 90%) and the count of cells with three homoplasmic mutations (clone-VAF > 90%). Subsequently, the summary statistics from each simulation were compared to the observed summary statistics. The mutation rates of each mitotic and homeostatic turnover were estimated by minimizing the MSE.

Modeling the mtDNA dynamics

We conducted simulations in the mitotic and homeostatic turnover models to explore how mtDNA and heteroplasmy levels of PZsimple mutations change due to lifetime drift. Initially, we determined the mtDNA copy number (n) in a single cell and labeled each n mtDNA molecule differently to allow for individual tracking. Subsequently, we simulated turnovers, recording the changes in the frequency of each mtDNA with each turnover. We then identified the most abundant mtDNA frequency in a cell. This process was simultaneously performed in a total of 10,000 cells, and the distributions were plotted for each turnover based on the most abundant mtDNA frequency in each cell.

Then, we induced PZsimple mutations in the abovementioned process using the mtDNA mutation rate we inferred. Subsequently, we tracked how the clone-VAF of PZsimple mutations changed with each turnover and determined the highest clone-VAF of PZsimple mutations in a cell. This process was also conducted simultaneously in 10,000 cells, and distributions were plotted for each turnover based on the highest PZsimple mutation clone-VAF in each cell.

Each simulation was conducted using five different mtDNA copy numbers (500, 750, 1,000, 1,500 and 2,000) in two turnover models—the mitotic and homeostatic turnover models.

Selective pressure on mtDNA

mtDNA’s evolutionary history in germline cells exhibits a notable bias toward missense mutations31. We aimed to compute the dN/dS ratio for each individual’s unique mtDNA sequence and probabilistically assess the likelihood of these mutations occurring randomly80,81.

To this end, we simulated the null neutrality hypothesis by introducing random mutations into individual-specific mtDNA sequences. These sequences accounted for the interindividual variability in germline mutations. We then quantified the possible occurrence of synonymous, missense and truncating mutations within each mtDNA. Using computational simulation, we generated random mutations with the exact mutation count in the individual’s observed data and then annotated the functional consequences to calculate the simulated dN/dS ratio60,61,62. This simulation was iterated 10,000 times across all individuals, yielding a null distribution of dN/dS values for each individual under the neutrality assumption. These outcomes were further aggregated for individuals within the same tissue types. Ultimately, we compared the dN/dS ratios observed for missense and truncating mutations to the null distribution of simulated dN/dS values for each tissue type. We subsequently evaluated the probability of the observed dN/dS ratios occurring by chance.

Inference of mtDNA copy number

We estimated the mtDNA copy numbers per clone using the below formula:

$${\rm{mtDNA}}\; {\mathrm{copy}}\; {\mathrm{number}}=\frac{{{\mathrm{coverage}}}_{{\mathrm{mtDNA}}}}{{{\mathrm{coverage}}}_{{\mathrm{nDNA}}}}\times 2,$$

where coveragemtDNA and coveragenDNA denote the mean coverage depth of mtDNA and the mean coverage depth of nDNA, respectively. The ploidy was fixed at two regardless of normal clone or cancer tissue to obtain more reliable mtDNA copy numbers regardless of tumor cell fraction and ploidy values. The mean mtDNA and nDNA coverage depths were computed using mosdepth82 and an in-house script.

Statistics and reproducibility

No statistical method was used to predetermine the sample size. No data were excluded from the analyses. The experiments were not randomized. The investigators were not blinded to allocation during experiments and outcome assessment.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.