Massively parallel single-cell mitochondrial DNA genotyping and chromatin profiling


Natural mitochondrial DNA (mtDNA) mutations enable the inference of clonal relationships among cells. mtDNA can be profiled along with measures of cell state, but has not yet been combined with the massively parallel approaches needed to tackle the complexity of human tissue. Here, we introduce a high-throughput, droplet-based mitochondrial single-cell assay for transposase-accessible chromatin with sequencing (mtscATAC-seq), a method that combines high-confidence mtDNA mutation calling in thousands of single cells with their concomitant high-quality accessible chromatin profile. This enables the inference of mtDNA heteroplasmy, clonal relationships, cell state and accessible chromatin variation in individual cells. We reveal single-cell variation in heteroplasmy of a pathologic mtDNA variant, which we associate with intra-individual chromatin variability and clonal evolution. We clonally trace thousands of cells from cancers, linking epigenomic variability to subclonal evolution, and infer cellular dynamics of differentiating hematopoietic cells in vitro and in vivo. Taken together, our approach enables the study of cellular population dynamics and clonal properties in vivo.
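
To make the central quantity concrete: heteroplasmy at a given mtDNA position is the fraction of a cell's mtDNA copies (in practice, sequencing reads) that carry the variant allele. The following is a minimal sketch under stated assumptions, not the authors' pipeline; the function name, the `min_depth` cutoff and the example counts are hypothetical.

```python
from typing import Dict, Optional, Tuple


def heteroplasmy(alt_reads: int, ref_reads: int, min_depth: int = 10) -> Optional[float]:
    """Fraction of reads carrying the alternate allele at one position in one cell.

    Returns None when total depth is below min_depth (an assumed cutoff),
    because a ratio built from very few reads is not informative.
    """
    depth = alt_reads + ref_reads
    if depth < min_depth:
        return None
    return alt_reads / depth


# Hypothetical (alt, ref) read counts for one variant in three cells.
counts: Dict[str, Tuple[int, int]] = {
    "cell_A": (45, 55),
    "cell_B": (0, 80),
    "cell_C": (2, 3),
}

per_cell = {cell: heteroplasmy(alt, ref) for cell, (alt, ref) in counts.items()}
for cell, value in per_cell.items():
    print(cell, "low coverage" if value is None else f"{value:.0%}")
```

Returning `None` for low-coverage cells keeps "no evidence" distinct from "0% heteroplasmy", which matters when cells are later grouped by shared variants.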


Fig. 1: Optimization of a high-throughput single-cell mtDNA genotyping platform with concomitant accessible chromatin measurements.
Fig. 2: Pathogenic mtDNA variability and clonal evolution in cells derived from a patient with MERRF.
Fig. 3: Identification of high-confidence variants and subclonal structure in TF1 cells.
Fig. 4: Clonal and functional heterogeneity in human malignancies resolved by somatic mtDNA mutations.
Fig. 5: Clonal lineage tracing across accessible chromatin landscapes and time in an in vitro model of hematopoiesis.
Fig. 6: Cellular population dynamics in native hematopoiesis in vivo resolved by mtDNA mutations.

Data availability

Data associated with this work are available from GEO under accession GSE142745.

Code availability

Software and documentation for mitochondrial variant calling via mgatk are available at Custom code to reproduce all analyses and figures is available at


References

  1. Stewart, J. B. & Chinnery, P. F. The dynamics of mitochondrial DNA heteroplasmy: implications for human health and disease. Nat. Rev. Genet. 16, 530–542 (2015).

  2. Shoffner, J. M. & Wallace, D. C. Mitochondrial genetics: principles and practice. Am. J. Hum. Genet. 51, 1179–1186 (1992).

  3. Elliott, H. R., Samuels, D. C., Eden, J. A., Relton, C. L. & Chinnery, P. F. Pathogenic mitochondrial DNA mutations are common in the general population. Am. J. Hum. Genet. 83, 254–260 (2008).

  4. Morris, J. et al. Pervasive within-mitochondrion single-nucleotide variant heteroplasmy as revealed by single-mitochondrion sequencing. Cell Rep. 21, 2706–2713 (2017).

  5. Kang, E. et al. Age-related accumulation of somatic mitochondrial DNA mutations in adult-derived human iPSCs. Cell Stem Cell 18, 625–636 (2016).

  6. Ludwig, L. S. et al. Lineage tracing in humans enabled by mitochondrial mutations and single-cell genomics. Cell 176, 1325–1339.e22 (2019).

  7. Xu, J. et al. Single-cell lineage tracing by endogenous mutations enriched in transposase accessible mitochondrial DNA. eLife 8, e45105 (2019).

  8. Lodato, M. A. et al. Somatic mutation in single human neurons tracks developmental and transcriptional history. Science 350, 94–98 (2015).

  9. Lareau, C. A. et al. Droplet-based combinatorial indexing for massive-scale single-cell chromatin accessibility. Nat. Biotechnol. 37, 916–924 (2019).

  10. Satpathy, A. T. et al. Massively parallel single-cell chromatin landscapes of human immune cell development and intratumoral T cell exhaustion. Nat. Biotechnol. 37, 925–936 (2019).

  11. Buenrostro, J. D., Giresi, P. G., Zaba, L. C., Chang, H. Y. & Greenleaf, W. J. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat. Methods 10, 1213–1218 (2013).

  12. Corces, M. R. et al. An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues. Nat. Methods 14, 959–962 (2017).

  13. Ross, M. G. et al. Characterizing and measuring bias in sequence data. Genome Biol. 14, R51 (2013).

  14. Roadmap Epigenomics Consortium et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).

  15. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).

  16. Green, B., Bouchier, C., Fairhead, C., Craig, N. L. & Cormack, B. P. Insertion site preference of Mu, Tn5, and Tn7 transposons. Mob. DNA 3, 3 (2012).

  17. Dames, S. et al. The development of next-generation sequencing assays for the mitochondrial genome and 108 nuclear genes associated with mitochondrial disorders. J. Mol. Diagn. 15, 526–534 (2013).

  18. Wallace, D. C. & Chalkia, D. Mitochondrial DNA genetics and the heteroplasmy conundrum in evolution and disease. Cold Spring Harb. Perspect. Biol. 5, a021220 (2013).

  19. Buenrostro, J. D. et al. Single-cell chromatin accessibility reveals principles of regulatory variation. Nature 523, 486–490 (2015).

  20. Lee, J. H. et al. Fluorescent in situ sequencing (FISSEQ) of RNA for gene expression profiling in intact cells and tissues. Nat. Protoc. 10, 442–458 (2015).

  21. Wu, S.-P. et al. Increased COUP-TFII expression in adult hearts induces mitochondrial dysfunction resulting in heart failure. Nat. Commun. 6, 8245 (2015).

  22. Zunino, R., Schauss, A., Rippstein, P., Andrade-Navarro, M. & McBride, H. M. The SUMO protease SENP5 is required to maintain mitochondrial morphology and function. J. Cell Sci. 120, 1178–1188 (2007).

  23. Powell, C. A. et al. TRMT5 mutations cause a defect in post-transcriptional modification of mitochondrial tRNA associated with multiple respiratory-chain deficiencies. Am. J. Hum. Genet. 97, 319–328 (2015).

  24. Kugeratski, F. G. et al. Hypoxic cancer-associated fibroblasts increase NCBP2-AS2/HIAR to promote endothelial sprouting through enhanced VEGF signaling. Sci. Signal. 12, eaan8247 (2019).

  25. Brusco, J. & Haas, K. Interactions between mitochondria and the transcription factor myocyte enhancer factor 2 (MEF2) regulate neuronal structural and functional plasticity and metaplasticity. J. Physiol. 593, 3471–3481 (2015).

  26. Lott, M. T. et al. mtDNA variation and analysis using mitomap and mitomaster. Curr. Protoc. Bioinformatics 44, 1.23.1–26 (2013).

  27. Bohrson, C. L. et al. Linked-read analysis identifies mutations in single-cell DNA-sequencing data. Nat. Genet. 51, 749–754 (2019).

  28. Zafar, H., Wang, Y., Nakhleh, L., Navin, N. & Chen, K. Monovar: single-nucleotide variant detection in single cells. Nat. Methods 13, 505–507 (2016).

  29. Roos-Weil, D. et al. Mutational and cytogenetic analyses of 188 CLL patients with trisomy 12: a retrospective study from the French Innovative Leukemia Organization (FILO) working group. Genes Chromosomes Cancer 57, 533–540 (2018).

  30. Izumi, D. et al. TIAM1 promotes chemoresistance and tumor invasiveness in colorectal cancer. Cell Death Dis. 10, 267 (2019).

  31. Hofbauer, S. W. et al. Tiam1/Rac1 signals contribute to the proliferation and chemoresistance, but not motility, of chronic lymphocytic leukemia cells. Blood 123, 2181–2188 (2014).

  32. Damm, F. et al. Acquired initiating mutations in early hematopoietic cells of CLL patients. Cancer Discov. 4, 1088–1101 (2014).

  33. Kikushige, Y. et al. Self-renewing hematopoietic stem cell is the primary target in pathogenesis of human chronic lymphocytic leukemia. Cancer Cell 20, 246–259 (2011).

  34. Alizadeh, A. A. & Majeti, R. Surprise! HSC are aberrant in chronic lymphocytic leukemia. Cancer Cell 20, 135–136 (2011).

  35. Lee-Six, H. et al. Population dynamics of normal human blood inferred from somatic mutations. Nature 561, 473–478 (2018).

  36. Osorio, F. G. et al. Somatic mutations reveal lineage relationships and age-related mutagenesis in human hematopoiesis. Cell Rep. 25, 2308–2316.e4 (2018).

  37. Granja, J. M. et al. Single-cell multiomic analysis identifies regulatory programs in mixed-phenotype acute leukemia. Nat. Biotechnol. 37, 1458–1465 (2019).

  38. Schep, A. N., Wu, B., Buenrostro, J. D. & Greenleaf, W. J. chromVAR: inferring transcription-factor-associated accessibility from single-cell epigenomic data. Nat. Methods 14, 975–978 (2017).

  39. Buenrostro, J. D. et al. Integrated single-cell analysis maps the continuous regulatory landscape of human hematopoietic differentiation. Cell 173, 1535–1548.e16 (2018).

  40. Cusanovich, D. A. et al. A single-cell atlas of in vivo mammalian chromatin accessibility. Cell 174, 1309–1324.e18 (2018).

  41. Pliner, H. A. et al. Cicero predicts cis-regulatory DNA interactions from single-cell chromatin accessibility data. Mol. Cell 71, 858–871.e8 (2018).

  42. Choi, J. et al. Haemopedia RNA-seq: a database of gene expression during haematopoiesis in mice and humans. Nucleic Acids Res. 47, D780–D785 (2019).

  43. Jovanovic, M. et al. Immunogenetics. Dynamic profiling of the protein life cycle in response to pathogens. Science 347, 1259038 (2015).

  44. Ju, Y. S. et al. Origins and functional consequences of somatic mitochondrial DNA mutations in human cancer. eLife 3, e02935 (2014).

  45. Lareau, C. A., Ludwig, L. S. & Sankaran, V. G. Longitudinal assessment of clonal mosaicism in human hematopoiesis via mitochondrial mutation tracking. Blood Adv. 3, 4161–4165 (2019).

  46. Rodriguez-Fraticelli, A. E. et al. Clonal analysis of lineage fate in native haematopoiesis. Nature 553, 212–216 (2018).

  47. Sun, J. et al. Clonal dynamics of native haematopoiesis. Nature 514, 322–327 (2014).

  48. Pei, W. et al. Polylox barcoding reveals haematopoietic stem cell fates realized in vivo. Nature 548, 456–460 (2017).

  49. Biasco, L. et al. In vivo tracking of human hematopoiesis reveals patterns of clonal dynamics during early and steady-state reconstitution phases. Cell Stem Cell 19, 107–119 (2016).

  50. Scala, S. et al. Dynamics of genetically engineered hematopoietic stem and progenitor cells after autologous transplantation in humans. Nat. Med. 24, 1683–1690 (2018).

  51. Nam, A. S. et al. Somatic mutations and cell identity linked by genotyping of transcriptomes. Nature 571, 355–360 (2019).

  52. Walker, M. A. et al. Purifying selection against pathogenic mitochondrial DNA in human T cells. N. Engl. J. Med. (2020).

  53. Corral-Debrinski, M. et al. Marked changes in mitochondrial DNA deletion levels in Alzheimer brains. Genomics 23, 471–476 (1994).

  54. Bender, A. et al. High levels of mitochondrial DNA deletions in substantia nigra neurons in aging and Parkinson disease. Nat. Genet. 38, 515–517 (2006).

  55. Lee, S. R. & Han, J. Mitochondrial mutations in cardiac disorders. Adv. Exp. Med. Biol. 982, 81–111 (2017).

  56. Triska, P. et al. Landscape of germline and somatic mitochondrial DNA mutations in pediatric malignancies. Cancer Res. 79, 7 (2019).

  57. Sun, N., Youle, R. J. & Finkel, T. The mitochondrial basis of aging. Mol. Cell 61, 654–666 (2016).

  58. Hu, J. et al. Isolation and functional characterization of human erythroblasts at distinct stages: implications for understanding of normal and disordered erythropoiesis in vivo. Blood 121, 3246–3253 (2013).

  59. Giani, F. C. et al. Targeted application of human genetic variation can improve red blood cell production from stem cells. Cell Stem Cell 18, 73–78 (2016).

  60. Huang, W., Li, L., Myers, J. R. & Marth, G. T. ART: a next-generation sequencing read simulator. Bioinformatics 28, 593–594 (2012).

  61. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).

  62. Lareau, C. A., Ma, S., Duarte, F. M. & Buenrostro, J. D. Inference and effects of barcode multiplets in droplet-based single-cell assays. Nat. Commun. 11, 866 (2020).

  63. Lander, E. S. & Waterman, M. S. Genomic mapping by fingerprinting random clones: a mathematical analysis. Genomics 2, 231–239 (1988).

  64. Chen, F., Tillberg, P. W. & Boyden, E. S. Optical imaging. Expansion microscopy. Science 347, 543–548 (2015).

  65. van Dekken, H., Pinkel, D., Mullikin, J. & Gray, J. W. Enzymatic production of single-stranded DNA as a target for fluorescence in situ hybridization. Chromosoma 97, 1–5 (1988).

  66. Larsson, C. et al. In situ genotyping individual DNA molecules by target-primed rolling-circle amplification of padlock probes. Nat. Methods 1, 227–232 (2004).

  67. Schwartz, S., Oren, R. & Ast, G. Detection and removal of biases in the analysis of next-generation sequencing reads. PLoS ONE 6, e16685 (2011).

  68. Li, H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27, 2987–2993 (2011).

  69. Garrison, E. & Marth, G. Haplotype-based variant detection from short-read sequencing. Preprint at arXiv (2012).

  70. Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902.e21 (2019).

  71. Becht, E. et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat. Biotechnol. 37, 38–44 (2019).


Acknowledgements

We are grateful to E. Bao, J. Ulirsch, E. Fiskin and other members of the Sankaran and Regev laboratories for helpful discussion. We acknowledge support from the Broad Institute and the Whitehead Institute Flow Cytometry Core facilities. This research was supported by National Institutes of Health grants no. F31 CA232670 (C.A.L.), no. R01 CA208756 (N.H.), no. P01 CA206978 (C.J.W. and G.G.), no. U10 CA180861 (C.J.W.), no. R01 DK103794 (V.G.S.) and no. R33 HL120791 (V.G.S.); a gift from Arthur, Sandra, and Sarah Irving (N.H.); a gift from the Lodish Family to Boston Children’s Hospital (V.G.S.); the New York Stem Cell Foundation (NYSCF, V.G.S.); and the Howard Hughes Medical Institute and Klarman Cell Observatory (A.R.). S.H.G. is supported by funding from the Kay Kendall Leukaemia Fund. K.P. is supported by a research fellowship of the German Research Foundation (DFG) and a Stand Up To Cancer Peggy Prescott Early Career Scientist Award in Colorectal Cancer Research. G.G. is supported by the Paul C. Zamecnik chair. C.J.W. is a scholar of the Leukemia and Lymphoma Society. F.C. and J.D.B. were supported by the Allen Distinguished Investigator Program. V.G.S. is an NYSCF-Robertson Investigator. We are grateful to the patients who made this work possible.

Author information

Contributions

C.A.L. and L.S.L. conceived and designed the project with guidance from A.R. and V.G.S. C.A.L. developed the software and led data analysis. L.S.L. and C.M. developed the mtscATAC-seq experimental protocol. L.S.L. led, designed and performed experiments with assistance from C.M., W.L. and E.C. S.H.G. processed CLL patient samples with L.S.L. T.Z. performed the in situ genotyping experiments. Z.C. and J.M.V. analyzed data. K.P. processed the colorectal cancer specimen. D.R. and G.G. aided with exome sequencing. F.C., J.D.B., M.J.A., G.M.B., N.H., C.J.W., A.R. and V.G.S. each supervised various aspects of this work. A.R. and V.G.S. provided overall project oversight and acquired funding. C.A.L., L.S.L., A.R. and V.G.S. wrote the manuscript with input from all authors.

Corresponding authors

Correspondence to Caleb A. Lareau, Leif S. Ludwig, Aviv Regev or Vijay G. Sankaran.

Ethics declarations

Competing interests

The Broad Institute has filed for a patent related to lineage tracing using mtDNA mutations where C.A.L., L.S.L., C.M., J.D.B., A.R. and V.G.S. are named inventors. J.D.B. holds patents related to ATAC-seq. N.H. and C.J.W. are co-founders, equity holders and SAB members of Neon Therapeutics, Inc., and receive research funding from Pharmacyclics. G.G. receives research funding from IBM and Pharmacyclics. A.R. is a founder and equity holder of Celsius Therapeutics, an equity holder in Immunitas Therapeutics and an SAB member of Syros Pharmaceuticals, Neogene Therapeutics, Asimov and ThermoFisher Scientific.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Additional validation of biotechnological and computational basis for single-cell mtDNA genotyping.

(a) Comparison of chromatin library complexity (estimated number of unique fragments) across screened lysis conditions as shown in Fig. 1. (b) The same variable lysis conditions showing the TSS rate per cell. (c) BioAnalyzer traces of mtscATAC-seq library fragment size distribution for regular conditions and mtDNA-enriched conditions. (d) Heteroplasmy heatmap of single cells (columns) for 43 private homoplasmic mutations (rows) in the TF1 or GM11906 cell lines with (left) and without (right) FA treatment. Color bar, heteroplasmy (% allele frequency). (e) Comparison of mtDNA fragment complexity and chromatin complexity between the original regular 10x scATAC protocol and modified lysis conditions with and without formaldehyde (FA) treatment. (f) Heteroplasmy of sum of single-cell ATAC-seq libraries with variable FA treatment. (g) Schematic, method, and results of improving mtDNA genome coverage via hard-masking the reference genome (Methods). (h) Comparison of % reads mapping to mtDNA and (i) chromatin complexity with (red) and without (blue) the hard masking. (j) Comparison of average coverage of mtscATAC-seq (y axis) and GC content (x axis) at each 50 bp bin (dot) in the mtDNA genome. (k) Accessible chromatin landscapes aggregated from single cells near the ETV2 locus for both cell lines as assayed via regular scATAC-seq and mtscATAC-seq. For boxplots in (a,b,e,h,i), each condition represents the top 1,000 cells (based on chromatin complexity) for one experiment. Boxplots: center line, median; box limits, first and third quartiles; whiskers, 1.5x interquartile range.
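
Panel (j) relates coverage to GC content in 50 bp bins of the mtDNA genome. A minimal sketch of the binning step is given below, assuming the sequence is available as a plain string; the function name and the toy sequence are hypothetical, and the final bin may be shorter than `bin_size`.

```python
from typing import List


def gc_fraction_per_bin(sequence: str, bin_size: int = 50) -> List[float]:
    """GC fraction of consecutive, non-overlapping bins of a sequence.

    Bases other than G or C (including N) count as non-GC in this sketch.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    seq = sequence.upper()
    fractions = []
    for start in range(0, len(seq), bin_size):
        window = seq[start:start + bin_size]
        gc = sum(1 for base in window if base in "GC")
        fractions.append(gc / len(window))
    return fractions


# Toy sequence, not real mtDNA: a GC-only bin, an AT-only bin and a short mixed bin.
toy = "GGCC" + "AATT" + "GA"
print(gc_fraction_per_bin(toy, bin_size=4))
```

Pairing each bin's GC fraction with its mean coverage would give the per-bin dots described for panel (j).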

Extended Data Fig. 2 Further inferences in analysis of the GM11906 (MERRF) lymphoblastoid cell line.

(a) Alternative field of view for GM11906 in situ genotyping imaging experiment. Representative image selected from one of seven fields of view for one experiment. Pseudo-bulk accessibility track plots are shown for the (b) ETV2 and (c) CD19 loci. Pseudo-bulk groups represent 0-10% (low), 10-60% (mid), and 60-100% (high) m.8344A>G heteroplasmy. (d) Spearman correlation of heteroplasmy against the ChIP-seq deviation scores computed via chromVAR. Each bar is a single transcription factor with selected factors highlighted. (e) Depiction of MEF2C deviation scores from chromVAR for m.8344A>G heteroplasmy bins, corresponding to 0-10% (Low), 10-60% (Mid), and 60-100% (High). Boxplots: center line, median; box limits, first and third quartiles; whiskers, 1.5x interquartile range. Bins contain single cells collected over one experiment where bins correspond to high (>60%; n = 273), intermediate (10-60%; n = 228), and low (<10%; n = 313) heteroplasmy (see Fig. 2c).
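
The heteroplasmy bins used in panels (b), (c) and (e) can be written out explicitly. The sketch below assumes heteroplasmy is given in percent and follows the caption's cutoffs (low below 10%, high above 60%, otherwise intermediate); the function name and example values are hypothetical.

```python
from collections import Counter


def heteroplasmy_bin(percent: float) -> str:
    """Assign a cell to the low / mid / high bin from its heteroplasmy in percent."""
    if not 0.0 <= percent <= 100.0:
        raise ValueError("heteroplasmy must be between 0 and 100 percent")
    if percent < 10.0:
        return "low"
    if percent > 60.0:
        return "high"
    return "mid"  # 10% and 60% themselves fall in the intermediate bin


# Hypothetical per-cell heteroplasmy values (percent).
example_cells = [0.0, 4.2, 10.0, 35.0, 60.0, 72.5, 98.0]
bin_counts = Counter(heteroplasmy_bin(p) for p in example_cells)
print(dict(bin_counts))
```

Cells in each bin could then be aggregated into the pseudo-bulk tracks or compared for chromVAR deviation scores, as the caption describes.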

Extended Data Fig. 3 Supporting information for somatic mtDNA mutation calling via mgatk.

(a) Venn diagrams depicting comparisons of heteroplasmic mutations identified by mgatk, samtools/bcftools, and (b) FreeBayes. (c) Comparison of heteroplasmy estimated from reads aligned to either strand. The top row shows three variants called specifically by mgatk; 3549C>A was identified only by FreeBayes. 7399C>G and 546A>C were called specifically by bcftools. (d) Identification of 67 and (e) 36 heteroplasmic variants from previously published Smart-seq2 hematopoietic colony data. Blue variants represent known RNA-editing events. (f) Comparison of population heteroplasmy values for variants replicated by mgatk from a previous supervised approach. Boxplots: center line, median; box limits, first and third quartiles; whiskers, 1.5x interquartile range. Statistical test: two-sided Mann-Whitney U test. (g) Concordance between discerning cells sharing a clonal origin based on colony-specific mtDNA mutations and their unsupervised identification using indicated algorithms (mgatk, bcftools, FreeBayes) and the previously described supervised approach (ref. 6). Receiver operating characteristic (ROC) using the per cell pair mtDNA similarity metric to identify pairs of cells sharing a clonal origin based on sets of mtDNA variants. The number of variants in each set is also depicted. (h) Area under the ROC (AUROC) is denoted for each donor group and indicated variant caller as depicted in (g). Each bar represents the statistic from one evaluation per donor per tool. (i) Estimated sensitivity (y axis, left), positive predictive value (y axis, right), and (j) estimated % dropout (y axis) for mtscATAC-seq at different simulated levels of heteroplasmy (x axis; Methods). Vertical line: 5% heteroplasmy for a subclonal mutation. The in-graph numbers indicate the values from the curve at a single-cell heteroplasmy of 5% with colors corresponding to different per-cell coverage values in the simulation.
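
Panels (i) and (j) report sensitivity and dropout as functions of heteroplasmy and per-cell coverage. The authors' simulation is described in their Methods and is not reproduced here. As rough intuition only, the sketch below assumes that reads at a position are independent draws, so that the number of variant reads follows a binomial distribution. The function names, the `min_alt_reads` parameter and the coverage values are assumptions.

```python
from math import comb


def dropout_probability(heteroplasmy: float, coverage: int) -> float:
    """P(no variant read is observed) under the binomial assumption."""
    return (1.0 - heteroplasmy) ** coverage


def detection_probability(heteroplasmy: float, coverage: int, min_alt_reads: int = 1) -> float:
    """P(at least min_alt_reads variant reads are observed) under the same assumption."""
    # Sum the probabilities of seeing fewer than min_alt_reads variant reads.
    p_fewer = sum(
        comb(coverage, k) * heteroplasmy ** k * (1.0 - heteroplasmy) ** (coverage - k)
        for k in range(min(min_alt_reads, coverage + 1))
    )
    return 1.0 - p_fewer


# Hypothetical per-cell coverage values at 5% heteroplasmy.
for coverage in (10, 20, 50):
    print(
        coverage,
        round(dropout_probability(0.05, coverage), 3),
        round(detection_probability(0.05, coverage, min_alt_reads=2), 3),
    )
```

Under this simple model, dropout shrinks geometrically with coverage, which is why deeper mtDNA coverage per cell matters most for low-heteroplasmy subclonal variants.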

Extended Data Fig. 4 Supporting information for clonal and functional heterogeneity in malignant populations revealed by mtDNA mutations.

(a) Flow cytometry gating strategy of CLL patient derived PBMCs showing expansion of CD19+ cells. (b) Identification of high-confidence variants for Patient 1 (top) and Patient 2 (bottom). The number of variants n is indicated. (c) Inference of subclonal structure from somatic mtDNA mutations for patient 2. Cells (columns) are clustered based on mitochondrial genotypes (rows). Colors at the top of the heatmap represent clusters or putative subclones. Color bar, heteroplasmy (% allele frequency). (d) Dot plots showing the mitochondrial genome coverage (log10; y-axis) for the top 500 cells per technology for four indicated scRNA-seq technologies. (e) The mean per-position mitochondrial genome coverage for the same 500 cells as in (d). (f) Volcano plot showing differential gene expression analysis from major and minor clonotypes defined by BCR sequence. Immunoglobulin (IG) genes are shown in purple; all other genes with an FDR < 0.05 are shown in blue. (g) Results for per-peak chi-squared association with sub-clonal group. Each dot is a peak rank-sorted by the chi-squared statistic. (h) Heteroplasmy from the sum of single cells in the CD19+ and CD19- mtscATAC-seq experiments for indicated mutations and patients. (i) Histograms showing the distribution of heteroplasmy across the profiled population of cells for six selected variants, four from Patient 1 (left) and two from Patient 2 (right). The number of variants in the top heteroplasmy bin (>90%) is shown in red. (j) Allele frequency from the sum of single cells from the 5’ CD19+ and CD19- scRNA-seq libraries for two indicated variants - chr4:109,084,804A>C (‘LEF1’) and chr19:36,394,730G>A (‘HSCT’). (k) Corroboration of T cells based on gene expression signatures and carrying indicated somatic nuclear and mtDNA mutations (Patient 2). (l) Gene activity scores supporting cell type annotations in Fig. 4n. Arrows: cluster enriched for respective gene score. (m) All mtDNA mutations (rows) by cells (columns) observed in the CRC tumor. Columns are colored by defined chromatin cell state defined as in Fig. 4n. (n,o) Chromatin-derived UMAP with cells marked by select mtDNA mutations enriched in (n) epithelial and (o) immune cells. Color bar: heteroplasmy (% allele frequency).
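
Panel (g) ranks peaks by a chi-squared statistic for association with subclonal group. As a hedged sketch of that kind of test (the exact table construction used by the authors is not given in this caption), the code below computes Pearson's chi-squared statistic for a subclone-by-accessibility contingency table for each peak and sorts peaks by it. All names and counts are hypothetical.

```python
from typing import Dict, List, Sequence


def chi_squared_statistic(table: Sequence[Sequence[float]]) -> float:
    """Pearson chi-squared statistic for an r x c contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    if total == 0:
        raise ValueError("contingency table is empty")
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                statistic += (observed - expected) ** 2 / expected
    return statistic


# Rows: two hypothetical subclones. Columns: cells with the peak accessible / not accessible.
peak_tables: Dict[str, List[List[int]]] = {
    "peak_1": [[10, 20], [20, 10]],
    "peak_2": [[15, 15], [15, 15]],
}

ranked_peaks = sorted(
    peak_tables, key=lambda peak: chi_squared_statistic(peak_tables[peak]), reverse=True
)
print(ranked_peaks)
```

Only the statistic is computed here, which is enough for rank-sorting; converting it to a p-value would additionally need the chi-squared distribution with (r - 1)(c - 1) degrees of freedom.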

Extended Data Fig. 5 Supporting information for clonal lineage tracing across accessible chromatin landscapes and time in an in vitro model of hematopoiesis.

(a) Depiction of single-cell UMAP embedding showing the original distribution of cells for each library/time point, (b) relative cell density, (c) Louvain cluster, and (d) mitochondrial DNA coverage per single cell. (e) Overlap of variants called for each of the two datasets. (f) Comparison of log2 fold change in heteroplasmy from day 14 to day 8 for 19 overlapping variants. The p-value shown is for the beta 1 coefficient of the depicted linear regression model. (g) Proportion of cells (%) at day 8 of the 500 cell (x axis) and 800 cell (y axis) input culture carrying shared mtDNA variants as derived from panel (e) suggests limited clonal overlap. (h) Known pathogenic mtDNA mutations detected from a healthy donor. Each dot is a cell separated by the sampled library. All cells with a heteroplasmy of at least 2% are shown. (i) Depiction of unsupervised clustering of groups of cells based on shared somatic mtDNA mutations (y-axis) with corresponding individual mtDNA mutations (x-axis) associated with each cluster for the 500 cell input and (j) 800 cell input culture. Color bar, heteroplasmy (% allele frequency). (k) Fraction of cells (y-axis) carrying number of somatic mtDNA variants (x-axis) above indicated thresholds (≥1%, ≥5%, ≥10% heteroplasmy; red, black, and blue lines, respectively) for indicated cultures.
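
Panel (k) summarises how many somatic mtDNA variants each cell carries above a heteroplasmy threshold. A minimal sketch of that tabulation follows, assuming a per-cell mapping from variant name to heteroplasmy in percent; the data structure, function name, variant names and values are hypothetical.

```python
from collections import Counter
from typing import Dict


def variant_count_distribution(
    cells: Dict[str, Dict[str, float]], threshold: float
) -> Dict[int, float]:
    """Fraction of cells carrying exactly n variants at or above threshold (percent)."""
    if not cells:
        raise ValueError("no cells supplied")
    per_cell_counts = Counter(
        sum(1 for value in variants.values() if value >= threshold)
        for variants in cells.values()
    )
    n_cells = len(cells)
    return {n: per_cell_counts[n] / n_cells for n in sorted(per_cell_counts)}


# Hypothetical heteroplasmy values (percent) for two variants in four cells.
example = {
    "cell_1": {"variant_1": 12.0, "variant_2": 0.5},
    "cell_2": {"variant_1": 6.0, "variant_2": 7.0},
    "cell_3": {"variant_1": 0.0, "variant_2": 0.0},
    "cell_4": {"variant_1": 1.5, "variant_2": 0.0},
}

for threshold in (1.0, 5.0, 10.0):
    print(threshold, variant_count_distribution(example, threshold))
```

Raising the threshold shifts cells toward carrying zero qualifying variants, which is the trade-off between confidence in a call and the fraction of cells that remain traceable.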

Extended Data Fig. 6 Supporting information for cellular population dynamics in native hematopoiesis in vivo resolved by mtDNA-based tracing.

(a) Assignment probabilities (%, colorbar) of scRNA-seq data derived transfer labels (rows) across mtscATAC-seq derived Louvain data clusters (columns) as identified in Fig. 6d. (b) Distribution of percent mitochondrial reads derived from mtscATAC-seq data (y axis) across PBMC populations (x axis). (c) Percent mitochondrial counts (y axis) in FACS sorted populations (x axis) from bulk RNA-seq data. (d) Identification of high confidence variants from CD34+ HSPC and PBMC cell populations. Number of variants passing both thresholds (dotted lines) is indicated. A Venn diagram depicts the overlap of shared mutations. (e) Percent duplicates of sequenced mtDNA fragments, mean mtDNA coverage and percent mitochondrial reads for CD34+ HSPC and PBMC cell populations as derived from mtscATAC-seq data. Boxplots: center line, median; box limits, first and third quartiles; whiskers, 1.5x interquartile range. (f) Distribution of maximum level of heteroplasmy of mgatk-derived variants from (d) in individual cells. (g) Unsupervised clustering of groups of cells based on shared somatic mtDNA mutations (y-axis) with corresponding individual mtDNA mutations (x-axis) associated with each cluster/clone. (h) Fold-change (observed over expected) of identified rare mutations (y axis) in each class of mononucleotide and trinucleotide change from the CD34+ HSPC data. (i) Comparison of pseudobulk allele frequencies from mgatk-identified variants (blue) and rare variants (green). Boxplots for (b,c,e): center line, median; box limits, first and third quartiles; whiskers, 1.5x interquartile range. Bounds are contained within the data range shown. Sample sizes exceed 100 single cells from one experiment.
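
Panel (i) compares pseudobulk allele frequencies. One common way to form a pseudobulk frequency, assumed here rather than taken from the authors' code, is to sum variant and total read counts over cells before dividing. This weights each cell by its coverage and therefore differs from the mean of per-cell heteroplasmies. Names and counts are hypothetical.

```python
from typing import List, Tuple

CellCounts = List[Tuple[int, int]]  # (variant_reads, total_reads) per cell


def pseudobulk_allele_frequency(cell_counts: CellCounts) -> float:
    """Allele frequency after pooling reads from all cells."""
    variant_reads = sum(alt for alt, _ in cell_counts)
    total_reads = sum(total for _, total in cell_counts)
    if total_reads == 0:
        raise ValueError("no reads cover this position")
    return variant_reads / total_reads


def mean_single_cell_heteroplasmy(cell_counts: CellCounts) -> float:
    """Unweighted mean of per-cell heteroplasmy, skipping cells without coverage."""
    fractions = [alt / total for alt, total in cell_counts if total > 0]
    if not fractions:
        raise ValueError("no cell has coverage at this position")
    return sum(fractions) / len(fractions)


# Two hypothetical cells: a low-coverage carrier and a high-coverage non-carrier.
example_counts: CellCounts = [(50, 100), (0, 300)]
print(pseudobulk_allele_frequency(example_counts))
print(mean_single_cell_heteroplasmy(example_counts))
```

A variant confined to a few cells can look rare in pseudobulk while being at high heteroplasmy in the cells that carry it, which is why the single-cell view is informative for clonal tracing.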

Supplementary information

Reporting Summary

Supplementary Table 1

Summary of conditions and statistics for mtscATAC-seq optimization related to Fig. 1b.

Supplementary Table 2

Heteroplasmy statistics across fields of view for m.8344 in GM11906 cells.

Supplementary Table 3

Quality control statistics, thresholds and hyperparameter values for populations and mutation calling for all applications.

Supplementary Table 4

Data sources for public datasets analyzed with this work.

Cite this article

Lareau, C.A., Ludwig, L.S., Muus, C. et al. Massively parallel single-cell mitochondrial DNA genotyping and chromatin profiling. Nat Biotechnol (2020).
