
Sensitive detection of mitochondrial DNA variants for analysis of mitochondrial DNA-enriched extracts from frozen tumor tissue

Abstract

Large variation exists in mitochondrial DNA (mtDNA) not only between but also within individuals, and tumor-specific mtDNA variation also exists in human cancer. In this work, we compared four methods for extracting mtDNA as pure as possible from frozen tumor tissue, and evaluated three state-of-the-art methods for sensitive detection of mtDNA variants. The main aim was to develop a procedure to detect low-frequency single-nucleotide mtDNA-specific variants in frozen tumor tissue. We show that, of the methods evaluated, extraction of DNA from cytosol fractions followed by exonuclease treatment gives the highest mtDNA yield and purity from frozen tumor tissue (270-fold mtDNA enrichment). Next, we demonstrate sensitive detection of low-frequency single-nucleotide mtDNA variants (≤1% allele frequency) in mixtures of the breast cancer cell lines MDA-MB-231 and MCF-7 by single-molecule real-time (SMRT) sequencing, UltraSEEK chemistry-based mass spectrometry, and digital PCR. We also show de novo detection and allelic phasing of variants by SMRT sequencing. We conclude that a sensitive procedure to detect low-frequency single-nucleotide mtDNA variants in frozen tumor tissue consists of extracting DNA from cytosol fractions, treating it with exonuclease to obtain high mtDNA purity, and applying SMRT sequencing for (de novo) detection and allelic phasing of variants.

Introduction

Over the past decades, extensive genomic analyses of tumor specimens by large sequencing consortia (e.g. https://www.icgc.org/icgc and http://cancergenome.nih.gov/), using massively parallel sequencing, have revealed the major somatic drivers of human cancer, as reported in numerous studies. However, the small circular genome of the mitochondria has been largely ignored in such analyses. The human mitochondrial DNA (mtDNA) consists of ~16,569 base pairs encoding 37 genes: two rRNAs and twenty-two tRNAs that function in the mitochondrial translation apparatus, and thirteen proteins essential for oxidative phosphorylation. The number of mtDNA molecules per cell varies between cell types from a few up to several thousand, and depends on both the number of mitochondria per cell and the number of mtDNA molecules per mitochondrion1,2,3. Like chromosomal DNA in the nucleus (nDNA), mtDNA may contain rare or polymorphic variants; nearly 10,000 variable positions within mtDNA are currently reported in public databases4. When variation is acquired, genetically different mtDNA molecules can reside within a single cell, which is referred to as heteroplasmy (that is, >0% and <100% allele frequency per cell). Importantly, heteroplasmic patterns can differ across tissues within an individual5,6,7,8. Although inherited and somatically acquired mtDNA variants are associated with multiple human diseases9, the exact significance of somatic mtDNA variants in cancer remains controversial10,11.

Recently, taking advantage of publicly available data from the large sequencing consortia, a handful of papers catalogued somatic mitochondrial variants in multiple tumor types12,13,14. However, genomic analysis of mtDNA is complicated by the presence of sequences of mitochondrial origin in the nDNA, termed nuclear insertions of mitochondrial origin (NUMTs). NUMTs most likely originate from the joining of mtDNA/RNA fragments to nDNA ends during double-strand break repair15,16, and are found in nearly all eukaryotes that contain mtDNA. This process may occur at any time during life17 as well as during tumor evolution18. Some NUMTs were inserted millions of years ago and are fixed in virtually every human genome (and are thus present in the human reference genome), but more recent NUMT insertions have also been described19. Because of their sequence similarity to mtDNA, NUMTs can interfere with accurate variant detection and thus with the investigation of mitochondrial heteroplasmy16,19,20,21,22,23. Estimates based on the human reference genome indicate that, on average, 9.5 NUMT copies are present in the human nDNA for each 175-base-pair mtDNA segment24, and the true number is likely higher19. Moreover, because insertion of mitochondrial sequences into the nuclear genome is an ongoing process, this number is even larger in tumor cells, which additionally carry all somatic NUMT insertion events18. Finally, the processes shaping nDNA in tumor cells25,26 differ substantially from those shaping mtDNA13, resulting in somatic variants in NUMTs that complicate accurate detection of mtDNA heteroplasmy in tumor cells even further.

Consequently, the large variation in mtDNA between and within individuals, together with the presence of NUMTs, demands highly specific and sensitive detection of mtDNA variants, especially of low-frequency tumor-specific variants. In the study described here, we aimed to develop a sensitive procedure to detect low-frequency single-nucleotide mtDNA variants in frozen tumor tissue. Multiple methods for extracting pure mtDNA have been developed27,28,29,30,31,32,33,34. These include commercial kits or (laborious) ultracentrifugation to obtain pure mitochondria, and techniques that enrich for mtDNA either through the isolation procedure or by enzymatic degradation of nDNA. However, most previous studies focused on cultured cells or blood cells rather than on the physically and biochemically more complex structures of tissue specimens. Thus, the application of these techniques to frozen tumor tissue specimens, an important source for assessing tumor cell characteristics, has not been shown to date. We therefore compared four easily implementable procedures to extract mtDNA as pure as possible from frozen tumor tissue. We also evaluated three state-of-the-art techniques for the detection of low-frequency mtDNA-specific variants: Pacific Biosciences' SMRT sequencing35, UltraSEEK chemistry36 and digital PCR.

Results

Procedure to obtain mtDNA-enriched DNA extracts from frozen tumor tissue

To obtain mtDNA as pure as possible from frozen tumor tissue, we first focused on finding the optimal isolation procedure, i.e. the one that extracts mtDNA with minimal carry-over of nDNA. For this, we extracted DNA from fresh frozen primary tumor specimens using four easily implementable methods, and compared the yields by quantifying the percentage of mtDNA (Fig. 1A) and the total amount of dsDNA (Fig. 1B). A silica-based total cellular DNA extraction method (I), used as the reference for yield, gave a median of 863 ng dsDNA (interquartile range [IQR] 94 ng), of which 0.1% (IQR 0.0%) was mtDNA. A method based on alkaline extraction (II), which is commonly used to extract plasmid DNA and is thus designed to extract circular DNA28,30,32,33, yielded a median of 144 ng (IQR 140 ng) dsDNA with 0.5% (IQR 0.6%) mtDNA. Extracting DNA from isolated mitochondria (III)34 yielded a median of 825 ng (IQR 529 ng) dsDNA with 0.2% (IQR 0.1%) mtDNA. A selective lysis method (IV), in which the plasma membrane is first disrupted to release the cellular components29,37, the cell nuclei are then sedimented, and DNA is extracted from the remaining cytosol fraction, yielded a median of 403 ng (IQR 321 ng) dsDNA with 1.0% (IQR 0.8%) mtDNA. A similar trend was obtained with these methods using frozen cultured cells as input (Supplementary Figure 1A/B). These results show that the best isolation procedure to extract mtDNA from frozen tumor tissue is method IV (DNA from cytosol fractions), which gives the highest mtDNA percentage with sufficient dsDNA yield. To increase the mtDNA fraction further, we applied an exonuclease that specifically degrades linear DNA, thereby removing linear nDNA while sparing circular mtDNA. This greatly increased the percentage of mtDNA in DNA extracts from cytosol fractions, from a median of 1% (IQR 0.8%) to a median of 27% (IQR 40%) (Fig. 1C). A similar increase was observed when DNA from frozen cultured cells was used as input material (Supplementary Figure 1C).
Exonuclease treatment of total cellular DNA extracts also increased the percentage of mtDNA, but not to the same extent as for DNA extracts from cytosol fractions, and total dsDNA yield was lower (Supplementary Figure 2). In conclusion, the preferred procedure to obtain mtDNA as pure as possible from fresh frozen tumor tissue is to extract DNA from cytosol fractions, followed by exonuclease treatment.
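The fold enrichment quoted for these methods is simply the ratio of median mtDNA percentages relative to method I. The Python sketch below recomputes it from the medians reported above; the method labels are illustrative, and a ratio of medians only approximates the enrichment in any individual specimen.

```python
# Median mtDNA percentages for frozen tumor tissue, as reported above (Fig. 1).
MEDIAN_MTDNA_PCT = {
    "I: total cellular (silica)": 0.1,
    "II: alkaline extraction": 0.5,
    "III: isolated mitochondria": 0.2,
    "IV: cytosol fraction": 1.0,
    "IV + exonuclease": 27.0,
}


def fold_enrichment(pct: float, reference_pct: float) -> float:
    """Fold increase in mtDNA percentage relative to a reference method."""
    return pct / reference_pct


reference = MEDIAN_MTDNA_PCT["I: total cellular (silica)"]
folds = {m: fold_enrichment(p, reference) for m, p in MEDIAN_MTDNA_PCT.items()}
for method, fold in folds.items():
    print(f"{method:<28} {fold:5.0f}-fold")
```

This reproduces the 5-fold (II), 2-fold (III) and 270-fold (IV with exonuclease) values cited in the Discussion.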

Approach for sequencing of mtDNA

Next, we explored sequencing methods for the detection of mtDNA variants. First, whole genome sequencing-by-synthesis (SBS) was applied to total cellular DNA extracts (method I) and to DNA extracts from cytosol fractions (method IV), both without and with additional mtDNA enrichment by exonuclease treatment. As expected, the cell line DNA extract from the cytosol fraction treated with exonuclease yielded the highest percentage of reads aligned to mtDNA (86%), whereas the other extracts yielded much lower percentages (<25%) (Supplementary Table 1). For the exonuclease-treated cytosol fraction DNA extract derived from fresh frozen tumor tissue, the percentage of reads aligned to mtDNA was in line with the qPCR-based mtDNA percentage (12% and 10%, respectively). Thus, despite the relatively high mtDNA fraction of 10%, the majority of reads were derived from nDNA. The observed spread in mtDNA percentage among exonuclease-treated method IV extracts from frozen tumor tissue (Fig. 1C) will therefore lead to a variable proportion of mtDNA reads with whole genome SBS. To circumvent this variability, we explored a targeted approach for sequencing mtDNA.

For this, nine primer sets covering the complete mtDNA were evaluated for their specificity to mtDNA. An in silico BLAST search showed that the primers did not match known NUMT sequences in the reference genome, and specificity was confirmed by the absence of PCR products in two mtDNA-depleted cell lines (Supplementary Figure 3). This allowed mtDNA-specific sequencing of the nine amplicons by single-molecule real-time (SMRT) sequencing, which generates long reads that cover each amplicon in a single read. To estimate sequencing output and to evaluate the variants detected by the whole genome SBS and targeted SMRT sequencing approaches, we compared the sequencing output of both approaches for exonuclease-treated cytosol fraction DNA extracts of MDA-MB-231. Whole genome SBS generated a total of 800,504 reads of 100 nucleotides (87% of which were duplicates), which after alignment gave an evenly distributed coverage of median 201x (IQR 2, range 13–404). Targeted SMRT sequencing generated 2,727 reads of 1,738–2,836 base pairs, with more variable coverage among the amplicons: median 282x (IQR 132, range 87–761) (Supplementary Figure 4). The more variable coverage in targeted SMRT sequencing was mainly due to increased coverage in regions where amplicons overlapped (Supplementary Figure 4). Both sequencing approaches detected, at homoplasmic levels (>99% allele frequency), all 29 positions with a documented alternative allele relative to rCRS in MDA-MB-231. Additional heteroplasmic variants were also detected, with no major differences between the two approaches (Supplementary File). Given the lower read depth per generated read with whole genome SBS, owing to the loss of reads that map to the nuclear genome, and the risk of NUMTs hampering downstream analysis, we continued the sequencing experiments with the targeted SMRT sequencing approach.

Sensitive detection of low-frequency mtDNA variants

To detect low-frequency single-nucleotide variants in mtDNA, we evaluated three approaches: SMRT sequencing, UltraSEEK chemistry and digital PCR. As a source of mtDNA, we used the breast cancer cell lines MDA-MB-231 and MCF-7. In total, 29 and 13 variants relative to rCRS have been documented in the mtDNA of MDA-MB-231 (see also above) and MCF-7, respectively, with 28 positions carrying a different allele between the two cell lines. To determine detection limits empirically, we mixed the cell lines, considering MDA-MB-231 as the mutant variant, to generate samples with 0%, 0.001%, 0.01%, 0.1%, 1% and 10% variant allele frequency. The mixture samples were subjected to the three detection methods, and we evaluated their ability to detect the mutant variant. By SMRT sequencing, we obtained a median coverage of 4,060x per sample (IQR 4,842x, range 648–34,263x) (see Supplementary Table 2 for coverage per sample per amplicon). In the 0% variant allele sample (pure MCF-7), we confirmed all 13 positions with an alternative allele relative to rCRS38 at >95% allele frequency. At 5 of the 28 positions known to differ between the two cell lines, heteroplasmic variants were observed in all mixture samples (Supplementary Table 3), so we omitted these positions from the limit-of-detection analysis. Thus, we explored 23 positions by SMRT sequencing and confirmed all variant alleles, with a detection limit of 0.1% for 21 positions and of 1% for 2 positions (Table 1 and Supplementary Figure 6A). The UltraSEEK method amplifies the region(s) of interest by PCR and then detects the variant(s) of interest by single-base extension with chain terminators labeled with a moiety for solid-phase capture, which allows enrichment of the product; the product is then identified by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry36.
By UltraSEEK, we explored 7 positions and detected all variant alleles at those positions, with a detection limit of 0.1% for 5 positions and of 1% for 2 positions (Table 1 and Supplementary Figure 6B). In digital PCR, a sample is partitioned into many individual parallel probe-based PCR reactions, each ideally containing either one target molecule or none, which allows a "yes" or "no" readout of the mutant and wildtype alleles in each reaction. By digital PCR, 2 positions were evaluated for the variant allele: one variant allele was detected at ≥0.01% allele frequency and the other at ≥0.1% (Table 1 and Supplementary Figure 6C).
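How sequencing depth constrains the detection limit can be illustrated with a simple binomial model. The sketch below assumes that alternative-allele reads are independent draws from Binomial(depth, allele frequency), ignores PCR and sequencing error as well as amplicon-specific bias, and requires at least five supporting reads to mirror the read-count threshold described under Bioinformatics. It is an illustration only, not the analysis performed in this study.

```python
from math import comb


def prob_at_least(k_min: int, depth: int, allele_freq: float) -> float:
    """Probability that at least k_min of `depth` reads carry a variant present
    at `allele_freq`, modelling the read count as Binomial(depth, allele_freq).

    Simplifying assumptions: reads are independent draws, and PCR/sequencing
    errors and amplicon-specific bias are ignored.
    """
    p_below = sum(
        comb(depth, k) * allele_freq**k * (1.0 - allele_freq) ** (depth - k)
        for k in range(k_min)
    )
    return max(0.0, 1.0 - p_below)  # clamp tiny negative rounding residue


# Coverage values from the SMRT results above (minimum, median, maximum).
for depth in (648, 4_060, 34_263):
    for af in (0.001, 0.01):
        p = prob_at_least(5, depth, af)
        print(f"depth {depth:>6}x, AF {af:.1%}: P(>=5 alt reads) = {p:.3f}")
```

Under this model, a 0.1% variant is reliably supported by five reads only at the higher end of the observed coverage range, which is one reason why the per-position detection limits in Table 1 can differ between positions.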

Detection of de novo mtDNA variants by SMRT sequencing

Since SMRT sequencing covered the entire mtDNA, we explored all alternative alleles called in the dataset of the six mixtures containing 0%, 0.001%, 0.01%, 0.1%, 1% and 10% mutant variant frequency. A total of 132 variants were called at 126 positions (some positions contained more than one alternative allele, Supplementary Table 3). Besides the documented homoplasmic variants of the two cell lines (35 variants: the 28 differing alleles described above and 7 concordant alleles shared by both cell lines), 97 de novo variants were detected. Of these, 55 were judged to be false positive calls on inspection in the Integrative Genomics Viewer39, because they were associated with homopolymer regions or lay close to homoplasmic alternative variants (Supplementary Figure 5). The remaining 42 de novo variants had allele frequencies ranging from 0.01% to 24.8% (Table 2). To evaluate whether these de novo variants were true positives or potential false positives, we assessed whether they could be validated within the dataset, either by independent observation of a variant in multiple mixtures or by independent observation in overlapping regions of the sequenced amplicons. Of the 42 de novo variants, 20 were present in multiple mixtures, whereas 22 were present in one mixture only (Table 2). In addition, 5 had been detected in the mutant-only sample (100% MDA-MB-231), which was sequenced at lower depth by both SMRT and SBS sequencing (see Supplementary File). Ten de novo variants were detected in overlapping regions of the sequenced amplicons and thus represent two independent observations within one sample (Table 2). Together, this resulted in 26 de novo variants that could be validated in our dataset and were therefore considered true positive calls. The remaining 16 de novo variants were detected in only a single amplicon in a single sample (Table 2) and may in theory be false positive calls (i.e. PCR or sequencing errors). These potential false positive variants had allele frequencies between 0.03% and 0.34%.
Based on this, when validation of variants in multiple samples or multiple amplicons is not possible, a conservative allele frequency threshold for de novo variant detection with the SMRT sequencing approach would be ≥1.0%.
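The validation logic above can be summarized as a small decision rule. The Python sketch below is a hypothetical encoding of it (the `DeNovoCall` fields, sample labels and category names are ours, not part of the authors' pipeline): a call counts as validated when it is observed in at least two samples or in at least two overlapping amplicons, and otherwise it is retained only if it reaches the conservative 1.0% allele frequency threshold.

```python
from dataclasses import dataclass, field


@dataclass
class DeNovoCall:
    """Hypothetical summary of one de novo variant across the dataset."""
    samples: set = field(default_factory=set)    # samples/mixtures with a call
    amplicons: set = field(default_factory=set)  # amplicons supporting the call
    max_af: float = 0.0                          # highest allele frequency (fraction)


def classify(call: DeNovoCall, af_threshold: float = 0.01) -> str:
    """Apply the validation logic described above (a sketch, not the authors' code)."""
    if len(call.samples) >= 2 or len(call.amplicons) >= 2:
        return "validated"              # independent observations exist
    if call.max_af >= af_threshold:
        return "passes threshold"       # unvalidated but >= 1.0% allele frequency
    return "potential false positive"   # single sample, single amplicon, low AF


print(classify(DeNovoCall({"mix_0.1"}, {"amp3"}, 0.0034)))          # potential false positive
print(classify(DeNovoCall({"mix_0", "mix_0.01"}, {"amp3"}, 0.0005)))  # validated
print(classify(DeNovoCall({"mix_10"}, {"amp5", "amp6"}, 0.002)))    # validated
```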

Allelic phasing of mtDNA variants detected by SMRT sequencing

The long read length of SMRT sequencing enables phasing of variants, i.e. determining whether they are present on the same read or on separate reads, and thus whether they originated from the same mtDNA molecule or from different molecules (Fig. 2). This allowed us to evaluate whether variants phased together with the known homoplasmic variants of the wildtype (MCF-7) or the mutant (MDA-MB-231) genotype. Of the 42 de novo variants, 32 phased together with the wildtype genotype and not with the mutant genotype (Table 2). Variants with an allele frequency ≥0.5% in the wildtype-only mixture (0% mutant) were typically detected in all mixtures, whereas variants with <0.5% allele frequency in the wildtype-only mixture were typically detected only in the mixtures with low mutant fractions (Table 2). This reflects the detection limit of the method: as the mutant fraction increases, wildtype-linked variants are diluted below the detectable frequency. The remaining 10 de novo variants phased together with the mutant genotype and not with the wildtype genotype. These 10 included the five variants that had also been detected in the mutant-only sample (100% MDA-MB-231) sequenced at lower depth by both SMRT and SBS sequencing (see Supplementary File). Here too, and for the same reason, variants with a higher allele frequency in the mutant-only sample were typically detected in the mixtures with high mutant fractions (Table 2). Thus, SMRT sequencing allowed us to trace the origin of all 42 de novo variants to either the wildtype or the mutant genotype (Table 2).
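Read-based phasing as described above amounts to checking which genotype-specific alleles co-occur with the de novo allele on the same read. The sketch below assumes each CCS read has already been reduced to a mapping from rCRS position to observed base (a hypothetical representation), and the marker positions and bases are made up. Because each read spans a single amplicon, a variant can only be phased against marker positions within that amplicon. In this study, phasing was done by manual inspection, so this only illustrates the logic.

```python
from collections import Counter
from typing import Dict, List, Tuple

Read = Dict[int, str]  # rCRS position -> base observed on one CCS read


def phase_variant(
    reads: List[Read],
    variant: Tuple[int, str],
    markers: Dict[int, Tuple[str, str]],
) -> Counter:
    """Among reads carrying `variant` (position, alt base), count reads that carry
    only wildtype, only mutant, or both kinds of marker alleles.

    `markers` maps positions that differ between the two genotypes to
    (wildtype base, mutant base); this input layout is hypothetical.
    """
    pos, alt = variant
    votes: Counter = Counter()
    for read in reads:
        if read.get(pos) != alt:
            continue  # this read does not carry the de novo allele
        wt = sum(read.get(m) == w for m, (w, _) in markers.items())
        mut = sum(read.get(m) == u for m, (_, u) in markers.items())
        if wt and not mut:
            votes["wildtype"] += 1
        elif mut and not wt:
            votes["mutant"] += 1
        elif wt and mut:
            votes["conflicting"] += 1
        else:
            votes["uninformative"] += 1  # read carries neither genotype's markers
    return votes


markers = {1000: ("A", "G"), 1500: ("C", "T")}  # hypothetical marker positions
reads = [
    {1000: "A", 1200: "T", 1500: "C"},
    {1000: "A", 1200: "T"},
    {1000: "G", 1200: "C", 1500: "T"},
]
print(phase_variant(reads, (1200, "T"), markers))  # Counter({'wildtype': 2})
```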

Discussion

In this research, we aimed to develop a sensitive procedure to detect low-frequency single-nucleotide mtDNA variants in frozen tumor tissue. Tissue specimens are an important source for detecting tumor-specific variants when assessing tumor cell characteristics. Especially when the focus is on low-frequency variants, frozen tissue is more suitable than formalin-fixed paraffin-embedded tissue, since the latter is prone to deamination artefacts40. We started by establishing an extraction procedure to obtain mtDNA as pure as possible from frozen tumor tissue. The optimal method was extraction of DNA from cytosol fractions (method IV) followed by exonuclease treatment, which resulted in a 270-fold mtDNA enrichment compared with total cellular DNA extraction (27% versus 0.1% mtDNA, Fig. 1). The alkaline extraction method normally applied to extract plasmid DNA has also been described by others for the preparation of mtDNA-enriched samples28,30,32,33. In line with the work by Quispe-Tintaya et al.33, we found good mtDNA enrichment for frozen cultured cells compared with total cellular DNA extraction (158-fold, Supplementary Figure 1). However, applied to frozen tumor tissue, this method resulted in only 5-fold mtDNA enrichment (Fig. 1), indicating that it is less suited for frozen tissue specimens. Extraction of DNA from isolated mitochondria has also been described by others34; for frozen cultured cells, we found similar mtDNA enrichment levels (3-fold compared with total cellular DNA extraction, Supplementary Figure 1). Again, however, we observed lower mtDNA enrichment for frozen tumor tissue (2-fold, Fig. 1). Note that, although the alkaline-based and mitochondria-based extraction methods were equivalent to those in the above-mentioned studies, different methods were applied there to extract total cellular DNA, and mtDNA yield can differ even among silica-based extraction methods41,42.
Importantly, DNA from cytosol fractions without or with exonuclease treatment also gave better results than total cellular DNA extraction for cultured cells (33-fold and 760-fold enrichment, respectively; Supplementary Figure 1). Thus, extraction methods that substantially enrich for mtDNA from frozen cultured cells (and possibly also from blood cells) do not generally guarantee proper mtDNA enrichment from frozen tissue.

A high fraction of mtDNA in the DNA extract is vital to minimize the presence of NUMTs, which may lead to misinterpretation of mtDNA variants. Because both the number of mtDNA molecules per cell and the frequency of NUMTs vary, the potential for misinterpretation caused by NUMTs is difficult to estimate and is unique for each position in each individual. Since the generation of NUMTs is an ongoing process17,18,19, estimating NUMT frequency is even more difficult for tumor cells, as they contain all private and somatic NUMT events that occurred before and during tumorigenesis. This is why we have chosen, and recommend, to analyze an mtDNA extract that is as pure as possible by SMRT sequencing. For example, for a NUMT present at 20x abundance (as is the case for numerous mtDNA regions24) in a cell type with 500 mtDNA molecules, the NUMT could be misinterpreted as an mtDNA variant with 8% heteroplasmy (2 × 20/500, the factor 2 accounting for the diploid nuclear genome) in a total cellular DNA extract. Indeed, misinterpretation of non-identical mtDNA and NUMT positions is not rare, and multiple examples have been highlighted in the literature16,20,21,22,23. Obtaining a high mtDNA fraction increases the number of mtDNA molecules relative to nDNA molecules and thereby decreases the apparent allele frequency of NUMT-derived variants, diminishing the likelihood of misinterpretation: in the example above, a 270-fold increase in mtDNA would suppress the NUMT variant to 0.03% heteroplasmy (2 × 20/(270 × 500)).
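The NUMT example above can be reproduced with the same simplified expression used in the text, in which the factor 2 represents the two copies of the diploid nuclear genome and the enrichment factor scales the number of mtDNA molecules. The function name and parameters below are illustrative.

```python
def apparent_numt_heteroplasmy(
    numt_copies: float,
    mtdna_per_cell: float,
    enrichment: float = 1.0,
    ploidy: int = 2,
) -> float:
    """Apparent heteroplasmy (fraction) that a NUMT could mimic, using the same
    simplified expression as the text: ploidy * copies / (enrichment * mtDNA)."""
    return ploidy * numt_copies / (enrichment * mtdna_per_cell)


print(f"{apparent_numt_heteroplasmy(20, 500):.2%}")                  # 8.00%
print(f"{apparent_numt_heteroplasmy(20, 500, enrichment=270):.2%}")  # 0.03%
```

Including the NUMT copies in the denominator, i.e. 2c/(2c + E × m), gives a slightly lower value for the unenriched case (40/540, about 7.4%), so the simplified expression is a mild overestimate; after 270-fold enrichment the two forms are practically identical.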

In conclusion, our sensitive procedure to detect low-frequency single-nucleotide mtDNA variants in frozen tumor tissue consists of extracting DNA from cytosol fractions, treating it with exonuclease to obtain a high mtDNA yield and purity, and applying SMRT sequencing for (de novo) detection and allelic phasing of variants. Orthogonal validation of variants can be done by UltraSEEK (for numerous variants) or digital PCR (for a few variants). We conclude that the presented approach enables mtDNA-specific detection of de novo variants at ≥1% allele frequency.

Materials and Methods

Specimens

Cell lines MDA-MB-231 and MCF-7 were cultured in RPMI (Invitrogen) supplemented with 10% FBS (Lonza), 100 U/mL penicillin (Invitrogen), 100 µg/mL streptomycin (Invitrogen) and 0.05 mg/mL gentamicin (Invitrogen). An mtDNA-depleted MDA-MB-231 breast cancer cell line (MDA-MB-231-ρ0) was established by culturing MDA-MB-231 cells in the presence of 50 ng/µL ethidium bromide for 100 days in medium supplemented with uridine (0.05 mg/mL) (Sigma-Aldrich) and pyruvate (1 mM) (Invitrogen). Frozen pellets of the 143B and 143B-ρ0 osteosarcoma cell lines were kindly provided by Dr. W.N.M. Dinjens (Department of Pathology, Erasmus MC). Fresh frozen primary breast tumor tissue specimens (resection material) were selected from the tumor biobank at the Erasmus MC (n = 10, stored in liquid nitrogen). The use of these patient materials was approved by the medical ethics committee of the Erasmus MC (MEC 02.953) and was in accordance with the code of conduct of the Federation of Medical Scientific Societies in the Netherlands. According to this Code of Conduct, informed consent is not required in the Netherlands for retrospective analysis of bio-specimens retrieved during standard-of-care procedures.

DNA extraction and mtDNA enrichment

Input for frozen tumor tissue was standardized at 20 cryosections of 30 µm thickness, corresponding to an average of 19.2 mg (range 5.9–33.4 mg) tumor tissue per extraction. Input for cultured cells was standardized at 1 million frozen cells per extraction. Total cellular DNA was extracted using the NucleoSpin Tissue kit (Macherey-Nagel) according to the supplier's protocol (method I). Alkaline-based extraction was performed using the QIAprep Spin Miniprep kit (Qiagen) according to the supplier's protocol (method II). Mitochondria were isolated using the Qproteome mitochondria isolation kit (Qiagen) according to the supplier's protocol, after which DNA was extracted using the NucleoSpin Tissue kit (above) (method III). To remove cell nuclei, samples were lysed for 10 minutes in 1 mL of a detergent solution that dissolves the cellular membrane (0.5x TBE containing 0.5% (v/v) Triton X-100)37, and the nuclei were sedimented at 1,020 × g for 10 minutes. DNA was extracted from the remaining supernatant (the cytosol fraction) using the QIAamp Circulating Nucleic Acid Kit (Qiagen) according to the supplier's protocol (method IV). To remove linear DNA, extracts (max. 100 ng DNA) were treated with 40 units of the ATP-dependent exonuclease PlasmidSafe (Epicentre) for 3 hours at 37 °C. The exonuclease was heat-inactivated (30 minutes at 70 °C), and the circular DNA was purified by ethanol precipitation (70% ethanol).

DNA quantification and mtDNA purity assessment

All DNA extracts were quantified using the Qubit dsDNA HS assay kit (Life Technologies) according to the supplier's protocol. Purity of mtDNA was assessed in duplicate runs of a multiplex qPCR assay targeting a nuclear and a mitochondrially encoded gene, from which the ratio of mtDNA to nDNA molecules was calculated by the relative quantitation method (2^ΔCq) as described before44. The percentage of mtDNA in the DNA extract was then calculated (eq. 1) from the mtDNA:nDNA ratio and the sizes of the mitochondrial reference genome (16,569 base pairs, NC_012920) and the complete reference genome (haploid, 3,088,269,805 base pairs, GRCh38). If no amplification signal was obtained for the nuclear encoded gene, the mtDNA:nDNA ratio was set to 20,000,000, corresponding to an mtDNA percentage of 99%.

$$\text{mtDNA percentage}=\frac{\text{ratio}\times\text{mitochondrial genome size}}{(\text{ratio}\times\text{mitochondrial genome size})+\text{nuclear genome size}}\times 100 \tag{1}$$
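Equation 1 and the 2^ΔCq ratio can be combined into a small calculation. The sketch below assumes ΔCq = Cq(nuclear target) − Cq(mitochondrial target), roughly 100% amplification efficiency for both targets, and a single-copy nuclear target, so that the ratio counts mtDNA copies per haploid nuclear genome, consistent with the haploid genome size in eq. 1. In practice the Cq values would be the mean of the duplicate runs; the function names are illustrative.

```python
from typing import Optional

MT_GENOME_BP = 16_569                # rCRS (NC_012920)
NUCLEAR_GENOME_BP = 3_088_269_805    # haploid GRCh38, as used in eq. 1
RATIO_IF_NO_NUCLEAR_SIGNAL = 20_000_000


def mt_to_nuclear_ratio(cq_nuclear: Optional[float], cq_mito: float) -> float:
    """mtDNA:nDNA molecule ratio as 2**dCq, with dCq = Cq(nuclear) - Cq(mito).
    Assumes ~100% amplification efficiency for both targets."""
    if cq_nuclear is None:  # nuclear target did not amplify
        return RATIO_IF_NO_NUCLEAR_SIGNAL
    return 2.0 ** (cq_nuclear - cq_mito)


def mtdna_percentage(ratio: float) -> float:
    """Eq. 1: share of the extracted DNA (in base pairs) that is mitochondrial."""
    mt_bp = ratio * MT_GENOME_BP
    return mt_bp / (mt_bp + NUCLEAR_GENOME_BP) * 100.0


print(f"{mtdna_percentage(RATIO_IF_NO_NUCLEAR_SIGNAL):.0f}%")  # 99%
print(f"{mtdna_percentage(mt_to_nuclear_ratio(28.0, 20.5)):.3f}%")
```

With these made-up Cq values (ΔCq = 7.5), the ratio is about 181 and the result is roughly 0.1% mtDNA, the level measured for total cellular DNA extracts.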

Whole genome sequencing-by-synthesis (SBS)

Input DNA was mechanically sheared using a focused ultrasonicator (Covaris) to yield fragments of ~300 base pairs, which required the following shearing times for the different DNA extracts: 90 seconds for total cellular DNA, 120 seconds for exonuclease-treated total cellular DNA, 90 seconds for cytosol fraction DNA, and 50 seconds for exonuclease-treated cytosol fraction DNA. Sequencing libraries were prepared with the Thruplex DNA-seq sample preparation kit (Rubicon Genomics) from 0.1–7.7 ng sheared input DNA. Sequencing was performed on an Illumina HiSeq2500 sequencer using HiSeq Rapid v2 chemistry, yielding 100-nucleotide single-end reads.

UltraSEEK

UltraSEEK assays were designed using the AgenaCx online assay design software, which automatically selects the PCR and extension primers (Supplementary Table 4) and adds control assays for PCR and capture to each reaction. All oligonucleotides were obtained from Integrated DNA Technologies, and control oligonucleotides from Agena Bioscience GmbH. Reactions were performed as described before36, using reagents from Agena Bioscience. Briefly, PCR (45 cycles) was followed by shrimp alkaline phosphatase treatment and single-base primer extension using biotinylated ddNTPs specific for the mutant alleles. After capture of the extended primers on streptavidin-coated magnetic beads, a cation-exchange resin was added for cleanup, and 10–15 nL of the reaction was transferred to a SpectroCHIP® Array (a silicon chip with pre-spotted matrix crystals) using an RS1000 Nanodispenser (Agena Bioscience). Data were acquired by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry on a MassARRAY Analyzer 4 (Agena Bioscience). After data processing, a spectrum was produced with relative intensity on the y-axis and mass/charge on the x-axis. Typer Analyzer software was used for data analysis and report generation.

Digital PCR

Custom assays for two alternative variants were run on the QuantStudio 3D digital PCR system (Thermo Fisher) according to the supplier's protocol, with the DNA input adapted to the high mtDNA copy number. Reactions contained 20 pg of DNA in 1x dPCR mastermix v2, 0.9 µM of each primer (Invitrogen) and 0.2 µM of each probe (Sigma) (Supplementary Table 4). After initial denaturation for 10 minutes at 96 °C, 40 cycles of two-step PCR were performed (30 seconds denaturation at 98 °C and 120 seconds annealing/extension at 56 °C), followed by a final 2-minute extension at 56 °C. To calculate a variant frequency for the alternative variant, the threshold was set at a minimum of two signal dots.
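Digital PCR quantification is generally based on a Poisson correction of the fraction of positive partitions. The sketch below shows that standard calculation; it is an assumption here, because the exact algorithm of the QuantStudio 3D software is not described in this study. Positive-partition counts are taken per fluorescence channel (double-positive partitions count for both alleles), `min_dots=2` mirrors the two-dot threshold above, and the partition counts in the example are hypothetical.

```python
import math
from typing import Optional


def copies_per_partition(positive: int, total: int) -> float:
    """Poisson-corrected mean number of target copies per partition,
    lambda = -ln(1 - positive/total)."""
    if not 0 <= positive < total:
        raise ValueError("require 0 <= positive < total partitions")
    return -math.log1p(-positive / total)


def variant_frequency(
    mut_positive: int, wt_positive: int, total: int, min_dots: int = 2
) -> Optional[float]:
    """Fraction of target copies carrying the alternative allele.
    Positive counts per channel include double-positive partitions."""
    if mut_positive < min_dots:
        return None  # fewer than two mutant signal dots: no frequency reported
    lam_mut = copies_per_partition(mut_positive, total)
    lam_wt = copies_per_partition(wt_positive, total)
    return lam_mut / (lam_mut + lam_wt)


freq = variant_frequency(12, 9_000, 20_000)  # hypothetical chip counts
print(f"alternative allele frequency: {freq:.3%}")
```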

Single Molecule Real-Time (SMRT) sequencing

Amplicons covering the complete mtDNA45,46 (Supplementary Table 4) were generated in singleplex PCR reactions with initial denaturation for 3 minutes at 98 °C, 15 cycles of three-step PCR (10 seconds denaturation at 98 °C, 30 seconds annealing at 67 °C and 90 seconds extension at 72 °C), and a final extension for 5 minutes at 72 °C. Each 50 µL reaction contained 2.5 ng of template DNA and 1 unit of Hot-Start Q5 High Fidelity DNA polymerase (NEB) in 1x Q5 reaction buffer, 200 µM dNTPs and 0.5 µM of each 5′-M13-tailed primer (Invitrogen) (Supplementary Table 4). Specificity of the generated products was confirmed by microchip electrophoresis (DNA-12000 reagent kit, Shimadzu). Amplicons were pooled equimolarly per sample and purified using AMPure PB paramagnetic beads (Pacific Biosciences) at a 0.6 beads:sample ratio according to the SMRTbell Template Prep Kit protocol, and eluted in 10 mM Tris-HCl pH 8.5. The 5′-M13 universal sequence tail of the primers allowed each sample to be barcoded by 5 amplification cycles of the three-step PCR described above, but with an annealing temperature of 58 °C. Specificity of the generated products was confirmed by microchip electrophoresis (BioAnalyzer, DNA12000 or High Sensitivity DNA kit, Agilent). A final mix of barcoded fragments of all samples was obtained by equimolar pooling and subsequently purified using AMPure PB paramagnetic beads at a 0.6 beads:sample ratio. The concentration of the final mix was determined using the Qubit dsDNA HS assay kit, and the SMRTbell library was generated according to the Amplicon Template Preparation and Sequencing guide (Pacific Biosciences). Sequencing was performed on the Pacific Biosciences RSII with P6-C4 sequencing chemistry and 360 minutes movie time, or on the Sequel platform with version 2 sequencing chemistry and 600 minutes movie time. A total of twenty-two RSII and two Sequel SMRT cells were used to reach an estimated read depth of 3,000x per sample.
In addition, two RSII SMRT cells were used to reach an estimated 5,000x for one sample (the cell line mixture with 0.1% mutant allele frequency).
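Equimolar pooling of amplicons of different lengths requires converting mass concentrations (e.g. from Qubit) into molar concentrations. The sketch below uses the common approximation of 660 g/mol per base pair of dsDNA; the amplicon names, concentrations and target amount are hypothetical, with lengths chosen within the 1,738–2,836 base pair range of the amplicons used here.

```python
from typing import Dict, Tuple

BP_MASS_G_PER_MOL = 660.0  # approximate mass of one dsDNA base pair (assumption)


def fmol_per_ul(conc_ng_per_ul: float, length_bp: int) -> float:
    """Convert a dsDNA concentration in ng/uL to fmol/uL for a given length."""
    # ng/uL -> fmol/uL: multiply by 1e6 / (g/mol of the fragment)
    return conc_ng_per_ul * 1e6 / (length_bp * BP_MASS_G_PER_MOL)


def pooling_volumes(
    amplicons: Dict[str, Tuple[float, int]], target_fmol: float
) -> Dict[str, float]:
    """Volume (uL) of each amplicon that contributes `target_fmol` to the pool.
    `amplicons` maps a name to (concentration in ng/uL, length in bp)."""
    return {
        name: target_fmol / fmol_per_ul(conc, length)
        for name, (conc, length) in amplicons.items()
    }


pool = pooling_volumes({"amp1": (12.5, 2_000), "amp2": (8.0, 1_800)}, target_fmol=50.0)
for name, vol in pool.items():
    print(f"{name}: {vol:.2f} uL")
```

At equal mass concentrations, longer amplicons require proportionally larger volumes, which is the main reason why mass-based pooling would under-represent them.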

Bioinformatics

Whole genome sequencing-by-synthesis (SBS) reads were trimmed and aligned against the human reference genome GRCh38 using the hisat2 aligner47, after which the percentage of reads of mitochondrial origin was calculated (eq. 2). In addition, for evaluation of detected variants (Supplementary File), SBS reads were aligned against an extended version of rCRS (BWA-MEM48 version 0.7.15, default parameters) and duplicate reads were marked (Picard MarkDuplicates, default parameters, http://broadinstitute.github.io/picard/). We aligned the data against extended versions of rCRS (Supplementary Table 5) to compensate for mapping bias due to the circularity of the mitochondrial genome.

\text{percentage of reads of mitochondrial origin} = \frac{\text{aligned reads on chrM}}{\text{aligned reads on GRCh38}} \times 100 \qquad (2)
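
Eq. 2 can be computed directly from per-reference read counts. The sketch below is a minimal example assuming the counts come from `samtools idxstats` run on the indexed GRCh38 alignment (tab-separated columns: reference name, sequence length, mapped read segments, unmapped read segments) and that the mitochondrial contig is named `chrM` (Ensembl-style GRCh38 builds call it `MT`). Because idxstats counts alignment records, secondary and supplementary alignments would need to be removed first (for example with `samtools view -F 0x900`) so that each read is counted once; whether the original analysis did this is not stated. The function name is hypothetical.

```python
def percent_mito_from_idxstats(idxstats_text: str, mito_name: str = "chrM") -> float:
    """Eq. 2: reads aligned to chrM as a percentage of all reads aligned to GRCh38."""
    total_mapped = 0
    mito_mapped = 0
    for line in idxstats_text.splitlines():
        if not line.strip():
            continue
        name, _length, mapped, _unmapped = line.split("\t")
        # The final '*' row holds unplaced unmapped reads; its mapped count is 0.
        total_mapped += int(mapped)
        if name == mito_name:
            mito_mapped += int(mapped)
    if total_mapped == 0:
        raise ValueError("no aligned reads found")
    return 100.0 * mito_mapped / total_mapped


# Illustrative counts only, not data from this study.
example = "chr1\t248956422\t900\t0\nchrM\t16569\t100\t0\n*\t0\t0\t50\n"
print(percent_mito_from_idxstats(example))
```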

Single Molecule Real-Time (SMRT) sequencing RS bax.h5 files were converted to Sequel-format BAM files, from which circular consensus sequence (CCS) reads were generated for each sample-specific barcode using the CCS2 algorithm49. A minimum predicted accuracy of 99% and at least five passes of the SMRTbell template were then required to retain only highly accurate single-molecule reads. Primer tails were trimmed from the selected CCS reads with Cutadapt50, and the reads were aligned against an extended rCRS using BWA-MEM48 (version 0.7.15; parameters -k17 -W40 -r10 -A1 -B1 -O1 -E1 -L0). As for the SBS data, alignment against extended versions of rCRS (Supplementary Table 5) compensates for mapping bias due to the circularity of the mitochondrial genome.
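
The CCS read selection step (at least 99% predicted accuracy and at least five passes) amounts to a simple filter. The sketch below assumes that each read's number of passes and predicted accuracy are available, as they are in PacBio CCS BAM files through the `np` and `rq` tags (with pysam these could be read via `AlignedSegment.get_tag`). To stay self-contained it works on plain records; the `CcsRead` type, the function name and the read names are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class CcsRead(NamedTuple):
    name: str
    passes: int      # number of passes (PacBio BAM tag 'np')
    accuracy: float  # predicted read accuracy in [0, 1] (PacBio BAM tag 'rq')


def select_ccs(reads: Iterable[CcsRead], min_accuracy: float = 0.99,
               min_passes: int = 5) -> List[CcsRead]:
    """Keep CCS reads with >= 99% predicted accuracy and >= 5 passes."""
    return [r for r in reads
            if r.accuracy >= min_accuracy and r.passes >= min_passes]


reads = [
    CcsRead("m1/1/ccs", 6, 0.995),
    CcsRead("m1/2/ccs", 4, 0.999),   # too few passes
    CcsRead("m1/3/ccs", 10, 0.98),   # below 99% predicted accuracy
    CcsRead("m1/4/ccs", 5, 0.99),    # exactly at both thresholds: kept
]
print([r.name for r in select_ccs(reads)])
```

Both thresholds are inclusive here, which matches "a minimum of 99%" and "at least five passes" as stated above.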

For the comparison between SBS and SMRT sequencing (Supplementary File), pileup files were generated with the pileup function of Bioconductor Rsamtools 1.26.2 (PileupParam min_base_quality = 30, min_mapq = 0, min_nucleotide_depth = 0, min_minor_allele_depth = 0, distinguish_strands = TRUE, distinguish_nucleotides = TRUE, ignore_query_Ns = TRUE, include_deletions = FALSE, include_insertions = FALSE and, for SBS data, the flag isDuplicate = FALSE), and positions were converted back to rCRS coordinates. For evaluation of the detection limit and for de novo variant detection in SMRT data, pileup files were generated as described above but with a more stringent threshold on the minimal number of alternative allele reads (min_nucleotide_depth = 5) to minimize detection of potential PCR errors (see Supplementary File). All detected variants were manually inspected in the Integrative Genomics Viewer (IGV, Broad Institute)39. Variants were phased by manually inspecting every read carrying a detected alternative allele and recording which of the other detected alternative alleles were present on the same read.
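
The pileup post-processing and phasing described above can be sketched as follows: fold extended-reference coordinates back onto rCRS numbering, keep alternative alleles supported by at least five reads, and, for phasing, count which other alternative alleles share a read with a given variant. This is a hypothetical Python sketch, not the authors' pipeline (which used Rsamtools in R, IGV and manual inspection). It assumes the extended reference is rCRS followed by a copy of its first bases, whereas the actual layout is given in Supplementary Table 5, and it assumes pileup counts have already been summed over strands. All function names are illustrative.

```python
from collections import Counter, defaultdict

RCRS_LENGTH = 16569  # length of rCRS in base pairs


def to_rcrs(pos: int) -> int:
    """Map a 1-based position on an extended rCRS back to rCRS numbering.

    Assumes (hypothetically) that the extension appends the start of rCRS,
    so extended position 16570 corresponds to rCRS position 1.
    """
    return (pos - 1) % RCRS_LENGTH + 1


def call_variants(pileup, ref_bases, min_alt_reads=5):
    """Call alternative alleles from pileup rows.

    pileup: iterable of (extended_pos, nucleotide, count) rows.
    ref_bases: dict mapping rCRS position -> reference base.
    Returns {rcrs_pos: {alt: (alt_reads, allele_fraction)}} for alternative
    alleles seen on at least `min_alt_reads` reads.
    """
    counts = defaultdict(Counter)
    for pos, nucleotide, n in pileup:
        # Reads aligned to the duplicated segment are folded onto rCRS numbering.
        counts[to_rcrs(pos)][nucleotide] += n
    calls = {}
    for pos, nucleotide_counts in counts.items():
        depth = sum(nucleotide_counts.values())
        alts = {
            nuc: (n, n / depth)
            for nuc, n in nucleotide_counts.items()
            if nuc != ref_bases[pos] and n >= min_alt_reads
        }
        if alts:
            calls[pos] = alts
    return calls


def co_occurrence(reads, anchor, others):
    """Tally which other variants lie on the same reads as an anchor variant.

    reads: iterable of dicts {rcrs_pos: base} for the positions each read covers.
    anchor, others: (rcrs_pos, alt_base) tuples.
    Returns {variant: (reads_carrying_both, anchor_reads_covering_variant)}.
    """
    a_pos, a_alt = anchor
    carriers = [r for r in reads if r.get(a_pos) == a_alt]
    result = {}
    for pos, alt in others:
        covering = [r for r in carriers if pos in r]
        result[(pos, alt)] = (sum(1 for r in covering if r[pos] == alt), len(covering))
    return result
```

The phasing helper keeps reads that carry the reference base at the second position separate from reads that do not cover it at all, which matters when individual amplicons span only part of the mitochondrial genome.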

MDA-MB-231 and MCF-7 mitochondrial sequences were obtained from NCBI GenBank (AB626609.1 and AB626610.1, respectively, deposited after resequencing by Imanishi et al.38) and aligned against rCRS with BLAST (NCBI nucleotide web BLAST, https://blast.ncbi.nlm.nih.gov) to identify the homoplasmic mtDNA positions at which these two cell lines differ from the reference sequence.
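
Reading the homoplasmic positions off a pairwise alignment can be sketched as below: walk the aligned reference and query rows in parallel, advance the rCRS coordinate only on non-gap reference characters, and record every column where the two differ. This is a minimal, hypothetical helper that assumes the alignment rows have already been extracted (for example from BLAST's pairwise output) with `-` marking gaps. Note that rCRS retains an `N` placeholder at position 3107, so a mismatch reported there reflects the reference convention rather than a variant.

```python
def differences(ref_aln: str, qry_aln: str, ref_start: int = 1):
    """List (rcrs_pos, ref_base, alt_base) differences from a pairwise alignment.

    '-' marks a gap. Deletions in the query appear as (pos, ref_base, '-');
    insertions appear as (pos, '-', alt_base) at the preceding reference position.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    diffs = []
    pos = ref_start - 1  # last reference position consumed so far
    for r, q in zip(ref_aln.upper(), qry_aln.upper()):
        if r != "-":
            pos += 1
        if r != q:
            diffs.append((pos, r, q))
    return diffs


# Toy alignment, not real mtDNA: a substitution at 3 and an insertion after 4.
print(differences("ACGT-A", "ACCTTA"))
```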

Data availability

Sequencing datasets can be accessed as BAM files (.bam) from the European Nucleotide Archive under accession number PRJEB23243.

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Robin, E. D. & Wong, R. Mitochondrial DNA molecules and virtual number of mitochondria per cell in mammalian cells. J Cell Physiol 136, 507–513 (1988).

2. Wiesner, R. J., Ruegg, J. C. & Morano, I. Counting target molecules by exponential polymerase chain reaction: copy number of mitochondrial DNA in rat tissues. Biochem Biophys Res Commun 183, 553–559 (1992).

3. Legros, F., Malka, F., Frachon, P., Lombes, A. & Rojo, M. Organization and dynamics of human mitochondrial DNA. J Cell Sci 117, 2653–2662 (2004).

4. Attimonelli, M. et al. HmtDB, a human mitochondrial genomic resource based on variability studies supporting population genetics and biomedical research. BMC Bioinformatics 6(4), S4 (2005).

5. Samuels, D. C. et al. Recurrent tissue-specific mtDNA mutations are common in humans. PLoS Genet 9, e1003929 (2013).

6. He, Y. et al. Heteroplasmic mitochondrial DNA mutations in normal and tumour cells. Nature 464, 610–614 (2010).

7. Li, M. K., Schroder, R., Ni, S. Y., Madea, B. & Stoneking, M. Extensive tissue-related and allele-related mtDNA heteroplasmy suggests positive selection for somatic mutations. Proc Natl Acad Sci USA 112, 2491–2496 (2015).

8. Calloway, C. D., Reynolds, R. L., Herrin, G. L. Jr. & Anderson, W. W. The frequency of heteroplasmy in the HVII region of mtDNA differs across tissue types and increases with age. Am J Hum Genet 66, 1384–1397 (2000).

9. Schon, E. A., DiMauro, S. & Hirano, M. Human mitochondrial DNA: roles of inherited and somatic mutations. Nat Rev Genet 13, 878–890 (2012).

10. Chatterjee, A., Mambo, E. & Sidransky, D. Mitochondrial DNA mutations in human cancer. Oncogene 25, 4663–4674 (2006).

11. Wallace, D. C. Mitochondria and cancer. Nat Rev Cancer 12, 685–698 (2012).

12. Larman, T. C. et al. Spectrum of somatic mitochondrial mutations in five cancers. Proc Natl Acad Sci USA 109, 14087–14091 (2012).

13. Ju, Y. S. et al. Origins and functional consequences of somatic mitochondrial DNA mutations in human cancer. Elife 3 (2014).

14. Stewart, J. B. et al. Simultaneous DNA and RNA Mapping of Somatic Mitochondrial Mutations across Diverse Human Cancers. PLoS Genet 11, e1005333 (2015).

15. Blanchard, J. L. & Schmidt, G. W. Mitochondrial DNA migration events in yeast and humans: integration by a common end-joining mechanism and alternative perspectives on nucleotide substitution patterns. Mol Biol Evol 13, 537–548 (1996).

16. Hazkani-Covo, E., Zeller, R. M. & Martin, W. Molecular poltergeists: mitochondrial DNA copies (numts) in sequenced nuclear genomes. PLoS Genet 6, e1000834 (2010).

17. Caro, P. et al. Mitochondrial DNA sequences are present inside nuclear DNA in rat tissues and increase with age. Mitochondrion 10, 479–486 (2010).

18. Ju, Y. S. et al. Frequent somatic transfer of mitochondrial DNA into the nuclear genome of human cancer cells. Genome Res 25, 814–824 (2015).

19. Dayama, G., Emery, S. B., Kidd, J. M. & Mills, R. E. The genomic landscape of polymorphic human nuclear mitochondrial insertions. Nucleic Acids Res 42, 12640–12649 (2014).

20. Parfait, B., Rustin, P., Munnich, A. & Rotig, A. Co-amplification of nuclear pseudogenes and assessment of heteroplasmy of mitochondrial DNA mutations. Biochem Biophys Res Commun 247, 57–59 (1998).

21. Parr, R. L. et al. The pseudo-mitochondrial genome influences mistakes in heteroplasmy interpretation. BMC Genomics 7, 185 (2006).

22. Ramos, A. et al. Nuclear insertions of mitochondrial origin: Database updating and usefulness in cancer studies. Mitochondrion 11, 946–953 (2011).

23. Albayrak, L. et al. The ability of human nuclear DNA to cause false positive low-abundance heteroplasmy calls varies across the mitochondrial genome. BMC Genomics 17, 1017 (2016).

24. Cui, H. et al. Comprehensive next-generation sequence analyses of the entire mitochondrial genome reveal new insights into the molecular diagnosis of mitochondrial DNA disorders. Genet Med 15, 388–394 (2013).

25. Alexandrov, L. B. & Stratton, M. R. Mutational signatures: the patterns of somatic mutations hidden in cancer genomes. Curr Opin Genet Dev 24, 52–60 (2014).

26. Helleday, T., Eshtad, S. & Nik-Zainal, S. Mechanisms underlying mutational signatures in human cancers. Nat Rev Genet 15, 585–598 (2014).

27. Palva, T. K. & Palva, E. T. Rapid isolation of animal mitochondrial DNA by alkaline extraction. FEBS Lett 192, 267–270 (1985).

28. Defontaine, A., Lecocq, F. M. & Hallet, J. N. A rapid miniprep method for the preparation of yeast mitochondrial DNA. Nucleic Acids Res 19, 185 (1991).

29. Lindberg, G. L., Koehler, C. M., Mayfield, J. E., Myers, A. M. & Beitz, D. C. Recovery of mitochondrial DNA from blood leukocytes using detergent lysis. Biochem Genet 30, 27–33 (1992).

30. Peloquin, J. J., Bird, D. M. & Platzer, E. G. Rapid miniprep isolation of mitochondrial DNA from metacestodes, and free-living and parasitic nematodes. J Parasitol 79, 964–967 (1993).

31. Yamada, Y. et al. Comparison of different methods for extraction of mitochondrial DNA from human pathogenic yeasts. Jpn J Infect Dis 55, 122–125 (2002).

32. Graffy, E. A. & Foran, D. R. A simplified method for mitochondrial DNA extraction from head hair shafts. J Forensic Sci 50, 1119–1122 (2005).

33. Quispe-Tintaya, W., White, R. R., Popov, V. N., Vijg, J. & Maslov, A. Y. Fast mitochondrial DNA isolation from mammalian cells for next-generation sequencing. Biotechniques 55, 133–136 (2013).

34. Gould, M. P. et al. PCR-Free Enrichment of Mitochondrial DNA from Human Blood and Cell Lines for High Quality Next-Generation DNA Sequencing. PLoS One 10, e0139253 (2015).

35. Eid, J. et al. Real-time DNA sequencing from single polymerase molecules. Science 323, 133–138 (2009).

36. Mosko, M. J. et al. Ultrasensitive Detection of Multiplexed Somatic Mutations Using MALDI-TOF Mass Spectrometry. J Mol Diagn 18, 23–31 (2016).

37. van Strijp, D. et al. Complete sequence-based pathway analysis by differential on-chip DNA and RNA extraction from a single cell. Sci Rep 7, 11030 (2017).

38. Imanishi, H. et al. Mitochondrial DNA mutations regulate metastasis of human breast cancer cells. PLoS One 6, e23401 (2011).

39. Robinson, J. T. et al. Integrative genomics viewer. Nat Biotechnol 29, 24–26 (2011).

40. Weerts, M. J. A. et al. Somatic tumor mutations detected by targeted next generation sequencing in minute amounts of serum-derived cell-free DNA. Sci Rep 7, 2136 (2017).

41. Guo, W., Jiang, L., Bhasin, S., Khan, S. M. & Swerdlow, R. H. DNA extraction procedures meaningfully influence qPCR-based mtDNA copy number determination. Mitochondrion 9, 261–265 (2009).

42. Andreu, A. L., Martinez, R., Marti, R. & Garcia-Arumi, E. Quantification of mitochondrial DNA copy number: pre-analytical factors. Mitochondrion 9, 242–246 (2009).

43. Zhang, W., Cui, H. & Wong, L. J. Comprehensive one-step molecular analyses of mitochondrial genome by massively parallel sequencing. Clin Chem 58, 1322–1331 (2012).

44. Weerts, M. J. et al. Mitochondrial DNA content in breast cancer: Impact on in vitro and in vivo phenotype and patient prognosis. Oncotarget 7, 29166–29176 (2016).

45. Ramos, A., Santos, C., Alvarez, L., Nogues, R. & Aluja, M. P. Human mitochondrial DNA complete amplification and sequencing: a new validated primer set that prevents nuclear DNA sequences of mitochondrial origin co-amplification. Electrophoresis 30, 1587–1593 (2009).

46. Ramos, A. et al. Validated primer set that prevents nuclear DNA sequences of mitochondrial origin co-amplification: a revision based on the New Human Genome Reference Sequence (GRCh37). Electrophoresis 32, 782–783 (2011).

47. Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12, 357–360 (2015).

48. Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589–595 (2010).

49. Anvar, S. Y. et al. TSSV: a tool for characterization of complex allelic variants in pure and mixed genomes. Bioinformatics 30, 1651–1659 (2014).

50.

Acknowledgements

The authors are especially grateful to S. Müller (Agena Bioscience), H.J.P. Buermans (Leiden Genome Technology Center), A.T. den Dekker (Center for Biomics) and A. van de Stolpe (Philips Research Laboratories) for their contribution and/or technical support. This work was supported by a grant from Philips Research (Eindhoven, The Netherlands).

Author information

Affiliations

1. Department of Medical Oncology and Cancer Genomics Netherlands, Erasmus MC Cancer Institute, Rotterdam, The Netherlands

• M. J. A. Weerts
• S. Sleijfer
• J. W. M. Martens

2. Philips Research Laboratories, High Tech Campus 11, 5656 AE, Eindhoven, The Netherlands

• E. C. Timmermans
• D. van Strijp
• P. J. van der Zaag

3. Leiden Genome Technology Center (LGTC), Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands

• R. H. A. M. Vossen
• S. Y. Anvar

4. Center for Biomics, Erasmus MC, Rotterdam, The Netherlands

• M. C. G. N. Van den Hout–van Vroonhoven
• W. F. J. van IJcken

5. Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands

• S. Y. Anvar

6. Department of Clinical Pharmacy and Toxicology, Leiden University Medical Center, Leiden, The Netherlands

• S. Y. Anvar

Contributions

M.W., E.T., S.S. and J.M. conceived and designed the study. M.W., E.T., R.V., D.S., W.I., P.Z. and S.A. designed experiments. M.W. processed specimens and carried out experiments. R.V., S.A. and W.I. led the sequencing. M.W., M.H. and S.A. performed data analyses. M.W., S.S. and J.M. prepared the manuscript, which was revised by all authors.

Competing Interests

The authors declare that they have no competing interests.

Corresponding author

Correspondence to M. J. A. Weerts.