Evolution analysis of heterogeneous non-small cell lung carcinoma by ultra-deep sequencing of the mitochondrial genome

Accurate assessment of tumour heterogeneity is an important issue that influences prognosis and therapeutic decision in molecular pathology. Due to the shortage of protective histones and a limited DNA repair capacity, the mitochondrial (mt)-genome undergoes high variability during tumour development. Therefore, screening of mt-genome represents a useful molecular tool for assessing precise cell lineages and tracking tumour history. Here, we describe a highly specific and robust multiplex PCR-based ultra-deep sequencing technology for analysis of the whole mt-genome (wmt-seq) on low quality-DNA from formalin-fixed paraffin-embedded tissues. As a proof of concept, we applied the wmt-seq technology to characterize the clonal relationship of non-small cell lung cancer (NSCLC) specimens with multiple lesions (N = 43) that show either different histological subtypes (group I) or pulmonary adenosquamous carcinoma as striking examples of a mixed-histology tumour (group II). The application of wmt-seq demonstrated that most samples bear common mt-mutations in each lesion of an individual patient, indicating a single cell progeny and clonal relationship. Hereby we show the monoclonal origin of histologically heterogeneous NSCLC and demonstrate the evolutionary relation of NSCLC cases carrying heteroplasmic mt-variants.

that most of the somatic mtDNA mutations are homoplasmic 9, 10 , make the mt-genome an ideal target for tumour cell tracking 11, 12 . In the present study, we established an ultra-deep sequencing approach, identifying mt-variants of the entire mt-genome on formalin-fixed paraffin-embedded (FFPE) tumour lesions. This novel technology of 'whole mitochondrial DNA ultra-deep sequencing' (wmt-seq) was applied to non-small cell lung cancer (NSCLC), which represents 80% of all lung cancer. The two main histological subtypes of NSCLC are adenocarcinoma (AD), which accounts for 50% NSCLC, and squamous cell carcinoma (SQ), which accounts for 40% 13 . However, NSCLC exhibits a variety of morphological and molecular features 14 . Moreover, tumour-heterogeneity is a common and well-recognised phenomenon in NSCLC, much more than in other solid tumours 15 . The accurate clonal assessment of NSCLC to either distinguish clinically challenging synchronous and metachronous tumours or differentiate between multiple primary tumour lesions from metastases is an essential basis for prognostic estimation and therapeutic decision 16 .
In order to study the clonal relationship, we used two NSCLC cohorts characterized by i) multiple lesions with different histology within the same patients or ii) tumours with heterogeneous, adenosquamous differentiation. Hereby we clearly show that an ultra-deep sequencing technology of the entire mt-genome (wmt-seq) is a suitable molecular tool for tumour history tracking on pathologically processed FFPE material.

Results and Discussion
In order to track tumourigenesis on FFPE archived material, we developed a novel approach for comprehensive mtDNA mutation analysis using a multiplex PCR-based ultra-deep sequencing approach on multifocal NSCLC lesions of different histological growth patterns.
For molecular tumour tracking, a PCR-based NGS of the entire mt-genome was established. To generate amplicons of a low size (around 60-200 bp), 108 primer sets, spanning the entire mtDNA ( Fig. 1A) were designed according to the mt-sequence of accession no. NC_012920 or taken from previously published primer sets (Supplemental Table S1). Primers were designed to generate overlapping amplicons of 160 bp average length to guarantee robust multiplex PCR (Fig. 1B). This design allowed an efficient amplification of small quantities of highly fragmented DNA (Supplemental Figure S1) as extracted from FFPE material 17 . Though formalin causes high nucleic acid fragmentation and nucleotide modification such as cytosine amination that leads to a C > T/ G > A exchange during PCR reactions, it is used world-wide in diagnostic routine-processing of tissues. Therefore, our approach is of particular interest for analysis of clinical, formalin-fixed samples. Circulating DNA, isolated from plasma and serum samples, is also of low quantity and quality 18,19 . Since in the recent past circulating mtDNA has been highlighted as a prognostic tool in cancer diagnostics 20, 21 , our technical approach benefits from the high efficiency of whole mt-DNA sequencing (wmt-seq) on short DNA fragments.
Previous NGS-based approaches to mtDNA analysis used long-range PCR amplicons, and worked only on native material 22,23 . Others focused on the analysis of the mt-control region (D-LOOP), which encompasses a hotspot region but only covers 7% of the total mt-genome 11,12 . The presented robust multiplex PCR, however, was used for PCR-based target enrichment, providing an approach to wmt-seq that can be applied to different sources of DNA.
Application of ultra-deep wmt-seq on NSCLC lesions. Forty-three nodules from 19 patients with NSCLC were studied for tumour heterogeneity by wmt-seq. The FASTQ files generated by wmt-seq of 43 samples were applied to bioinformatics analysis. The sequences were mapped against the entire human reference genome hg19 to exclude the possibility that nuclear pseudogenes, which have a high homology to parts of the mt-genome, were recognised. As a threshold for variant calling, a minimum read depth was set to 30 and the minimum variant frequency was set to 5%. Polymorphisms were recognised using the MITOMAP (www.mitomap.org), dbSNP-v138 (www.ncbi.nlm.nih.gov/SNP), and HAPMAP_phase_3 (hapmap.ncbi.nlm.nih.gov) databases.
The data analysis output generated an average of 484 × 10 3 reads per sample. A total of 473 × 10 3 reads were mapped to the mtDNA reference sequence, showing that 96% of the mt-genome was covered by a mean read depth of 3.000 reads per 100 bp (Supplemental Table S2). Furthermore, the high sequencing performance was demonstrated by nearly 98% run specificity, proven by only 2.5% off-targets reads (Supplemental Table S3).
The good coverage of the mt-genome and the low rate of site-off reads demonstrated the efficiency of the designed primer sets and a good run performance for the analysis of the entire mt-genome. mt-variants identified in NSCLC by wmt-seq. After data filtering, a total of 640 variants were identified (Supplemental Figure S2). Most of these mutations were T > C/A > G, G > A/C > T base transitions ( Fig. 2A) as shown by Kennedy et al. 24 . In agreement with previous reports 9, 10, 25 , the highest frequency of variants was observed in the D-LOOP regulatory site, which is responsible for the replication and expression of mtDNA (178/640, 28%). Furthermore, a high frequency of variants was found in the genes encoding for the respiratory chain complexes (Supplemental Figure S2). These mutations could lead to an abnormal metabolism as well as to altered apoptosis 26,27 . The defined mutations are listed in Supplemental Table S4.
The majority of detected variants were homoplasmic (520/640; 81%) or highly heteroplasmic, as reported in previous studies 9,25 . Most of the mt-variants might have occurred before tumourigenesis, because germline mtDNA polymorphisms generally reach a relatively high percentage of heteroplasmy or even homoplasmy during cellular phenotype development 28 . Some of these mtDNA polymorphisms (Fig. 2B) were previously shown to be disease-related and should be considered as possibly pathogenic (c.f. MITOMAP database) 29,30 . Furthermore, in agreement with the studies of Brandon et al. our data revealed that most of the identified tumour associated mtDNA variants (Fig. 2B) are frequent in the general population 28 . Notably, two of the detected mtDNA population polymorphisms, namely mtDNA mutations at nucleotides 10398 and 16189 (Fig. 2B), have been shown to be associated with an increased risk of both breast 31 and endometrial cancer 32 . The heteroplasmic T > C exchange at  (Table S2) were designed (A) generating 108 amplicons spanning the whole mitochondrial genome (B). Four primer pools, each including 27 primer sets, were applied to multiplex PCR according to the GeneRead multiplex PCR design of Qiagen, to ensure that overlapping amplicons were generated in separate reaction mixes. position 16189 is located in a hypermutable polyC stretch (16184^16193). This mt-microsatellite region as well as the one at position 302^315 are therefore appropriate regions to detect mt-DNA instability, which was shown previously to be associated with tumour malignancy 33,34 .
Functional mtDNA polymorphisms may help tumours to adapt to new environments and actively grow in metastatic oxygen-rich conditions 28,35 . Therefore, these mtDNA polymorphisms may become fixed and shift from initially heteroplasmic to homoplasmic mutations. Somatic tumour-specific mtDNA mutations inhibit oxidative phosphorylation, increase ROS production, and promote tumour cell proliferation. These somatic mutations may be lost during subsequent tumour oxygenation by replicative segregation, with the cell turning back towards the more oxidative mtDNA genotype favoured in the metastatic environment 28, 35 . NSCLC tumour-tracking and clonality analysis by wmt-seq. Tracking the tumour cell lineage was not possible on samples that carried only homoplasmic mtDNA polymorphisms (i.e. #02, #04, #10 and #11). However, sample sets that harboured heteroplasmic somatic mutations, occurring in all of the lesion's individual nodules (trunk mutations), in a subset (branch mutations), or only in one individual nodule (private mutations) could be considered for evolution analysis.
Notably, branch mutations in tumour nodules TN2 and TN3 of sample #05 at positions A215G, G2268A and G11711A, suggesting that TN2 and TN3 are subclones, derive from TN1. The increasing frequencies of the accumulating additional mutations show that they became dominant over time and provide insights about the evolutionary mechanisms that drive neoplastic progression (Supplemental Table S4, Supplemental Figure S3).
Thus, heteroplasmic somatic mutations indicate a clonal expansion of mtDNA mutants, which become either intra-and intercellularly dominant or submissive (Supplemental Table S4). This is in concordance with our findings that the adenocarcinoma and squamous carcinoma components of the adenosquamous NSCLC sample set (ADSQ-NSCLC, group II, Table 1) also share identical genomic hot-spot mutations e.g. in the TP53 and EGFR genes 36 . The mtDNA constantly predisposed to mutations that may either expand or be lost during tumour progression 9, 10, 38, 39 . As such, mt-mutations are discussed to acquire a selective replicative advantage during cellular development and become dominant, evolving a clonal cell population with homoplasmic mutants 9, 10, 37 . As previously shown, clonal homoplasmic expansion develops by crypt fission, forming large tumour patches with identical mt-mutations 38,39 . In agreement, our data demonstrate that tumour nodules of the cases #01, #05, #07, #L08 and #12-19 arise within the clonal patch harbouring identical mt-mutations (Supplemental Table S4, Fig. 5).
In conclusion, the established multiplex PCR-based ultra-deep sequencing method may be considered as a novel molecular tool for the comprehensive analysis of the entire mt-genome. Though a limited number of private and branch variants, which we identified in different nodules, did not allow us to describe the complete evolutionary dynamics of tumor clonal networks 2, 3 , this technology provides for the first time a highly specific and sensitive approach to study the clonal relationship and tumour history on FFPE material of the pathology routine processing. This is of particular importance in terms of an appropriate tumour-specific treatment strategy when de-novo primary tumours and recurrent cancers have to be differentiated 12 .
Importantly, due to the tolerance of low DNA quantity and quality, the multiplex PCR-based wmt-seq technology might also be applicable to cell-free DNA approaches as a novel option to detect mitochondrial DNA alterations in various body fluids and to monitor cancer progression and mitochondrial disorders.

NSCLC biopsies and DNA extraction. A total of 43 FFPE NSCLC tumour lesions archived from 19
patients were included in this study (Table 1) Table S4).

DNA extraction.
Lesional areas of NSCLC were marked by senior pathologists (SCS, CT, JM) and nodules with more than 80% were scraped off or -if necessary-laser microdissected as we described previously 36 . Thereafter, DNA was automatically extracted using the Maxwell DNA FFPE isolation kit on a Maxwell platform (Promega GmbH, Mannheim, GER) according to the manufacturer's instructions. PCR accessible DNA was determined by qPCR as described previously 40,41 . In brief, PCR amplifiable DNA was quantified by real-time PCR using the HFE gene as amplifying reference (173 bp). Standard curves in a range of 0.195 to 50 ng were prepared from unmutated high quality DNA (Takara Saint-Germain-en-Laye, F). Real-time PCR was then carried out in triplicates with 1 µl DNA each, in a 20 µl reaction mix containing 0.4 µM of the HFE forward and reverse primer (HFE-173F: TTC TCA GCT CCT GGC TCT CAT C and HFE-173R: TCG AAC CTA AAG ACG TAT TGC CC) and the GoTaq ® qPCR Master Mix (Promega).

Primer design, multiplex PCR-based library construction and next generation sequencing.
To generate amplicons of a low size (around 60-200 bp), 108 primer sets spanning the whole mtDNA (Supplemental Table S1) were designed according to the mt-sequence of accession no. NC_012920 or taken from previously published primer sets (Fig. 1A, Supplemental Table S1). For enrichment of the mt-genome by a multiplex PCR, primer sets were pooled in four primer mixes of 2 µM and in each reaction, 10 ng of PCR accessible DNA -representing DNA of around 1500 cells was used (Fig. 1B).
mtDNA was then amplified in four separate multiplex PCR reactions per sample using the GeneRead DNAseq Panel PCR Kit (QIAGEN Inc., Hilden, GER) in accordance with the manufacturer´s protocol. Libraries were pooled and purified using Agencourt ® AMPure ® XP magnetic beads and a Biomek ® FXp workstation (Beckman Data filtering and analysis. The FASTQ files generated by the Illumina platform were analysed by means of the Biomedical Genomics Workbench 2.5.1 (QIAGEN Inc., Hilden, GER; www.qiagenbioinformatics.com). To determine run performance and off-target reads, the FASTQ sequences were mapped against the whole human reference genome hg19. For variant calling and annotation, the mt-genome (Genebank; accession no. NC_012920) served as a reference. Using the workflow tool of the Biomedical Genomics Workbench 2.5.1  L01  TN1  SQ  G2  L09  TN1  AD  G3   TN2  G3  TN2  SQ  G3   L02  TN1  SQ  G2  L10  TN1  AD  G3   TN2  G2  TN2  SQ  G3   L03   TN1   AD   G2  L11  TN1  AD  G3   TN2  G2  TN2  SQ  G3   TN3  G2  L12  TN1  SQ  G3   TN4  G2  TN2  AD  G3   L04  TN1  SQ  G2  L13  TN1  SQ  G3   TN2  G2  TN2  AD  G3   L05   TN1   AD   G2 + G3  L14  TN1  SQ  G3   TN2  G2  TN2  AD  G3   TN3  G2 + G3  L15  TN1  SQ  G3   L06   TN1   AD   G2  TN2  AD  G3   TN2  G2  L16  TN1  AD  G3   TN3  G2  TN2  SQ  G3   TN4  G2  L17  TN1  AD  G3 NSCLC of ADSQ subtype (group II)  TN2  SQ  G3   L07  TN1  AD  G3  L18  TN1  AD  G3   TN2  SQ  G3  TN2  SQ  G3   L08  TN1  AD  G3  L19  TN1  AD  G3   TN2  SQ  G3  TN2 SQ G3 Spurious calls were subsequently filtered by manual analysis using the integrative genomic viewer of the Biomedical Genomics Workbench software. Variants, which occur in different sample sets but with a similar frequency as well as variants which were located in repetitive or highly homologous regions of the mt-genome, in high background noise regions, or at the end of the amplicons were considered as putative false variants. Potential false positive variants were either deleted when they were clearly recognizable as artefacts or were further re-assessed by Sanger sequencing. In addition, whenever DNA was still available, the mt-DNA regions carrying a variant in one lesion sample but not in another of the same patient sample set, were subsequently re-analysed by conventional Sanger sequencing.
Data Availibility. All data generated or analysed during this study are included in this published article and its Supplementary Information files. Filtered variant profiles of wmt-seq can be found in supplemental Table 4, showing variants of the sample sets in the different excel sheets. Raw data are available as FASTQ sequences at the SRA data base (SUB2583044 and SUB2962918).