Genetic and epigenetic intratumor heterogeneity impacts prognosis of lung adenocarcinoma

Intratumor heterogeneity (ITH) of genomic alterations may impact prognosis of lung adenocarcinoma (LUAD). Here, we investigate ITH of somatic copy number alterations (SCNAs), DNA methylation, and point mutations in lung cancer driver genes in 292 tumor samples from 84 patients with LUAD. LUAD samples show substantial SCNA and methylation ITH, and clonal architecture analyses present congruent evolutionary trajectories for SCNAs and DNA methylation aberrations. Methylation ITH mapping to gene promoter areas or tumor suppressor genes is low. Moreover, ITH composed of genetic and epigenetic mechanisms altering the same cancer driver genes is shown in several tumors. To quantify ITH for valid statistical association analyses, we develope an average pairwise ITH index (APITH), which does not depend on the number of samples per tumor. Both APITH indexes for SCNAs and methylation aberrations show significant associations with poor prognosis. This study further establishes the important clinical implications of genetic and epigenetic ITH in LUAD. Many tumors are known to be heterogeneous. Here, the authors examined multiple samples from 84 patients with lung adenocarcinoma and demonstrate that the intratumor heterogeneity of methylation and copy number associates with poor prognosis.

L ung cancer is the leading cause of cancer mortality, causing more than one million deaths worldwide annually 1 . Lung adenocarcinoma (LUAD) is the most common histologic subtype and accounts for about 40% of lung cancer incidence. While hundreds of LUAD tumors have been profiled extensively based on a single biopsy per patient 2,3 , fewer subjects have been investigated for diversity within the tumor through multisampling. A few studies have analyzed the extent of intratumor heterogeneity (ITH) of somatic nucleotide variants (SNVs) and/ or somatic copy number aberrations (SCNAs) [4][5][6] , and others of DNA methylation 7,8 in LUAD. Some of these studies found a positive association between SNV ITH and risk of relapse 4 or SCNA ITH and cancer free survival (combining risk of mortality and of recurrences) 6 . However, ITH in these studies was estimated without adjusting for the number of samples examined for each tumor and the methylation analysis did not take into account sample purity.
Using multiple samples per tumor it is possible to reconstruct the cancer evolutionary history. In prostate and brain tumors, congruent evolutionary trajectories of genetic and epigenetic mechanisms have been reported 9,10 , but to what extent epigenetic changes occur alongside phylogenetic changes in LUAD remains largely unknown.
Here, we perform a comprehensive analysis of ITH of somatic mutations in cancer driver genes, copy number aberrations and DNA methylation in 292 tumor samples from 84 patients with LUAD. We also investigated genetic/epigenetic ITH affecting cancer driver pathways. Moreover, we order genetic and epigenetic events along the LUAD evolutionary trajectories, and test the hypothesis that co-occurrence of genetic and epigenetic mechanisms characterizes the evolution of LUADs. Finally, we assess the clonality of targetable cancer driver genes and evaluate the association of ITH with clinical outcomes (survival and, separately, risk of metastasis) correcting for sample purity and using an unbiased statistical model which takes into account the number of samples examined from each tumor.

Results
Patient characteristics. To determine clonal evolutionary patterns in the genome and epigenome, we performed multi-region sampling from 84 patients with LUAD, of whom 76 (90%) reported past or current smoking. All the samples were treatment-naïve and surgically excised. The demographic characteristics are summarized in Table 1 and Supplementary Data 1. The clinical outcome analyses were based on a median follow-up time of 40.0 months. In total, 292 tumor tissue samples and 157 nontumor samples (including 74 normal tissue samples, 81 blood samples and 2 buccal cell sample) were collected from 84 subjects. ITH was estimated for tumors that included between 2 and 11 tumor samples for each assay type. For reference, we used blood or buccal cells for deep target sequencing and, to factor out high tissue specificity, normal tissue samples for methylation. For SNP arrays we used only tumor samples. Samples used for each assay type are shown in Fig. 1 and Supplementary Fig. 1.
Next, we performed unsupervised clustering based on the global SCNA profiles (see Methods section). Intratumoral heterogeneity was lower than intertumoral heterogeneity, with 226/268 (84.3%) samples from the same tumors clustered together and another 7/268 (2.6%) samples from the same tumors in close proximity to each other ( Supplementary Fig. 2).
In order to quantify levels of intratumoral heterogeneity, we developed an unbiased metric, the average pairwise ITH index (APITH, see Methods section) and applied it to the SCNA profiles of all patients (Fig. 3a). A major advantage of APITH is that its value is not biased by the number of multi-region samples per tumor while a previously used method 6 is strongly affected (Fig. 3b). APITH ranged from 0 to 0.68 with a mean = 0.184, median = 0.157, and standard deviation = 0.153 (Fig. 3c), sug-gesting~18.4% of the genome had different copy number status on average for any pair of tumor samples from the same patient. Of note, the largest APITH values (>0.5) were from patients with only two tumor samples, likely because of large variance in the APITH estimate.  Tumor stage  IA  19  IB  18  IIA  13  IIB  8  IIIA  21  IIIB  3  IV  2  Chemotherapy  Yes  0  No  84  Distant metastasis  44  Deceased  50 Previous TCGA studies 2,13 have reported recurrent SCNA regions for LUAD and identified in these regions 32 candidate driver genes, including 25 amplified and 7 deleted genes. In our samples, SCNA ITH in the recurrently altered regions was similar to SCNA ITH across the whole genome ( Supplementary Fig. 3). SCNAs of candidate driver genes were observed in 11.3-72.5% tumors, comparable to the TCGA study (Supplementary Data 3). SMARCA4 deletion and MCL1 amplification were the most frequent SCNA events (72.5% and 68.75%, respectively). KRAS and EGFR amplification were also observed in 38.75% and 41.25% of tumors, respectively, of which 41.9% and 33.3% were public.
We do not report APITH of SNVs since we conducted target sequencing of 37 cancer driver genes only, with resulting low number of mutations identified.
Intratumoral heterogeneity of DNA methylation. We first performed unsupervised hierarchical clustering based on methylation profiles using the 5000 most variable CpG sites across the genome (Fig. 4a) and, separately, limited our analysis to CpGs in promoter regions ( Supplementary Fig. 4). Both analyses confirmed that normal tissue samples from almost all subjects (59/ 61) clustered together. Similarly, 183/205 (89.3%) samples from the same tumors clustered together, showing higher intertumoral heterogeneity than intratumoral heterogeneity.
Previous studies have identified high levels of methylation in promoter regions of some genes, also referred to as the CpG island methylator phenotype (CIMP) in multiple cancer types, including lung cancer 2,14,15 . 16/68 (23.5%) patients had significantly altered CpG island methylator phenotype (CIMP-H) and 47/68 (69.1%) had a normal-like pattern (CIMP-L). In five patients, both CIMP-H and CIMP-L patterns were observed in the same tumor.
We next examined the distribution of DNA methylation ITH based on either the probes across the whole genome or those mapping to specific genomic regions (Fig. 4b). The CpG probes mapping to CpG island regions had a significantly lower APITH compared to those mapping to other regions (p = 1.09 × 10 −10 ), as previously observed in aggressive prostate cancer 10 . Moreover, the CpG probes mapping to gene promoter regions (TSS1500, TSS200, 5′ UTR and first exon) had lower APITH compared to those mapping to gene bodies, 3′ UTR regions and intergenic regions (t-test p = 1.626 × 10 −8 ).
Restricting the analysis to CpG probes mapping to 250 oncogenes and 300 tumor suppressor genes predicted by TUSON Explorer 16 revealed that methylation ITH mapping to tumor suppressor genes was significantly lower than that of oncogenes (t-test p = 1.68 × 10 −17 ) and that of other genes (t-test p = 1.50 × 10 −16 ) (Fig. 4c). Inactivation of tumor-suppressor genes by hypermethylation at promoter regions has been observed in multiple cancer types including lung cancer [17][18][19] . Lower DNA methylation ITH in these regions suggests greater selective pressure which is consistent with their high putative impacts in oncogenic transformation.
ITH of genomic alteration types in cancer driver pathways. We analyzed genetic and epigenetic aberrations of 13 cancer driver genes in the RTK/RAS/RAF pathway that are frequently mutated in LUAD 2 (Fig. 5). For SCNAs, we only included amplification of oncogenes and deletion of tumor suppressor genes. For DNA methylation, we determined abnormality based on probes located in CpG islands at promoter regions of the target genes.
Across the 13 genes, 77/84 (91.7%) tumors harbored genetic or epigenetic alterations in this pathway; ITH was observed in 69 (89.6%) tumors. SNVs, SCNAs, and methylation of the driver genes altered 7.28%, 17.5%, and 4.19% of the tumors, respectively. Different types of genetic or epigenetic alterations affected different samples in the same tumor. For example, in tumor IGC-11-1130, four samples were tested and all had alterations in the KRAS pathway. Among them, two samples had amplification in ROS1 and the other two samples had aberrant DNA methylation in the promoter regions of ALK (Fig. 5).
Of note, when we overlaid the SNVs in target genes on the SCNA-derived and methylation-derived trees, we observed that some genes were altered by different mechanisms in the same trees. For example, tumor IGC-10-1179 had STK11 mutated in the trunk and deleted in a branch (Fig. 6c).
Associations between ITH and clinical outcomes. We tested the association of SCNA and DNA methylation APITH with clinical data and observed no significant correlations with age, tumor stage, or grade (Supplementary Data 4). Smokers had higher APITH of SCNAs (t-test p = 0.035, nominally significant) but similar APITH of methylation compared with non-smokers. Other smoking behaviors (e.g., smoking intensity and duration) were not associated with APITH of SCNA or DNA methylation. Next, we examined the associations between APITH and survival or risk of distant metastasis. For each analysis, we performed a Cox proportional-hazards model weighted or unweighted by the variance of the estimated APITH. For significant associations, we found that weighted analysis had smaller p-values than unweighted analysis, as expected. Thus, we report below results based on weighted analyses (summary of weighted and unweighted results is in Supplementary Datas 5 and 6).
Similar to the TRACERx study 6 , we found that increased SCNA APITH was associated with poor overall survival with p = 0.05 using all patients and p = 0.0044 (HR = 1.77, 95% CI = 1.2-2.6) when restricting the analysis to patients with ≥ 3 tumor samples per tumor (Supplementary Data 5 and Fig. 7a). SCNA APITH was not significantly associated with risk of developing distant metastases (Supplementary Data 5). Of note, for both overall survival and distant metastasis, APITH based on SCNAs in the 37 cancer driver genes provided lower prognostic value than the APITH of SCNAs in the whole genome.
We then tested the association of DNA methylation-based APITH with overall survival and the risk of distant metastases. We found that APITH based on the 5000 most variable CpG probes was associated with overall survival (HR = 1.27, 95% CI = 1.05-1.55, p = 0.016, Supplementary Data 6 and Fig. 7b) but not significantly associated with risk of metastasis (p = 0.14). APITH based on CpG probes mapping to island regions had the strongest association with overall survival (HR = 1.31, 95% CI = 1.10-1.57, p = 0.0028, Fig. 7c) and were also found to be associated with risk of distant metastasis (HR = 1.35, 95% CI = 1.07-1.72, p = 0.012, Fig. 7d). The results for APITH defined based on other genomic regions are in Supplementary Data 6. The CIMP phenotype did not show substantial ITH, i.e., there were only a few tumors with CIMP-H and CIMP-L across samples from the same tumor. Therefore, we could not analyze the association of APITH of CIMP with clinical outcomes.

Discussion
In this study, we investigated genetic and epigenetic intra-tumor heterogeneity based on multi-region sampling per tumor across 84 patients with LUAD. On average, 35% of SNVs in targeted genes were private and~18.4% of the genome had SCNA ITH for any pair of samples from the same tumor. Methylation in CpG islands or gene promoter regions, particularly of tumor suppressor genes, had low ITH. Different types of somatic alterations across samples from the same tumors affected cancer driver genes in the RTK/RAS/RAF or cell cycle pathways. SCNAs and DNA methylation changes showed congruent evolutionary trajectories. Notably, we developed a statistical approach to correctly estimate ITH for any pair of tumor samples from the same patient and showed that ITH of SCNAs and DNA methylation was associated with poor prognosis.
The findings of substantial ITH across different genomic types and of similar tumor evolutionary trajectories for genetic and epigenetic changes are important to understand the biology and natural history of LUAD. Moreover, they are crucial to inform clinical management and therapeutic strategies. Using multi-region sampling, we identified private events in cancer driver genes, which may not be detected by a single biopsy or by limiting the analyses to point mutations. For example, combining genetic and epigenetic changes, 95.2% of tumors had activating events in the RTK pathway, of which~36% were private.
ITH of SCNA and ITH of DNA methylation (overall and in CpG islands) were similarly associated with shorter survival in our study, and ITH of methylation in CpG islands was also associated with higher risk of developing metastasis. Adding both measures in the same model did not significantly improve the prediction value (data not shown), likely because the two measures were highly correlated to each other.
In previous studies, ITH of SCNAs was quantified as the fraction of SCNAs not shared by all samples in the tumor 6 . Clearly, this ITH index positively depends on the number of tumor samples per tumor (Fig. 3b) and thus hinders valid crosspatient comparisons or testing associations with clinical outcomes. Moreover, unobserved factors that are associated with the number of tumor samples per patient, e.g., tumor size or different study sites, may confound the association analysis with clinical outcomes. We propose APITH as an index for ITH, similar in spirit to that previously proposed for quantifying ITH from the observation of SNVs 20 . This index, defined by pairwise distances of genomic profiles, is not biased by the number of samples per tumor and thus allows association testing for any genomic profiling platforms. Crucially, the variance of APITH estimates depend on the number of samples per tumor and thus a naïve statistical association test between APITH and any outcome may have low statistical power. We explicitly addressed this issue by quantifying the variance of APITH and proposing a procedure for its numerical calculation. This variance was used to weight subjects in the regression analyses to achieve the best statistical power, as demonstrated in theoretical analyses (Supplementary Methods). As a confirmation, our empirical results showed that the weighted analyses produced more significant results than the unweighted analyses for the association between APITH and overall survival.
A comparison between the previous TRACERx study 6 and this study using APITH to estimate the ITH of SCNAs is reported in the Supplementary Notes.
In conclusion, our results delineate the genetic and epigenetic ITH in LUAD and provide a rigorous statistical approach to estimate ITH for comparisons across individuals and for associations with clinical outcomes. DNA methylation and genomic changes followed similar evolutionary trajectories and strongly impacted cancer driver genes and pathways in a complex manner. These findings can inform the clinical management of LUADs suggesting that taking multiple biopsies and analyzing multiple genomic types may be needed to capture the landscape of targetable events. Future larger studies are warranted to identify the combination of genomic types and genomic regions to best predict clinical outcomes.  Fig. 1 and Supplementary Fig. 1.
Bioinformatic analyses of DNA methylation data. Bisulfite treatment and Illumina Infinium HumanMethylation450 BeadChip assays were performed to derive the DNA methylation levels. Raw methylated and unmethylated intensities were background-corrected, and dye-bias-equalized, to correct for technical variation in signal between arrays. For background correction, we applied a normal-exponential convolution, using the intensity of the Infinium I probes in the channel opposite their design to measure non-specific signal. For each CpG probe, the DNA methylation level was summarized as the fraction of signal intensity obtained from the methylated beads over the total signal intensity. After excluding CpG probes annotated with genetic variants, in repetitive genomic regions or on the X-chromosome, 338,730 CpG probes remained for analysis. Each CpG probe was annotated as in CpG Island (denoted as CGI), nonCGI (including shores and shelves) or open-sea. Each CpG probe was also annotated as in promoter (TSS200, TSS1500, and first exon), body, 3′UTR in a specific gene or annotated as intergenic. Methylation ITH of specific genomic regions was computed using the average pairwise distance of the top 10% variably expressed probes mapping to that region scaled by the number of probes. To identify potential driver DNA methylation events, we analyzed CpG island regions of cancer driver genes and compared the beta values of tumor samples and corresponding normal samples. We used 0.3 as the cutoff value to call differences in beta values 7 .
Bioinformatic analyses of deep target sequencing data. Deep target sequencing was performed to identify SNVs for 37 established lung cancer driver genes with an average sequencing depth of 500×. The genes were targeted with an Ion Ampliseq panel, and enriched libraries were sequenced using P1 chips on the Ion Proton sequencer. All laboratory analyses were performed at the Cancer Genomics Research Laboratory (CGR) of the Division of Cancer Epidemiology and Genetics, NCI. Sequence data were processed using standard Ion Torrent Suite Software (Thermo Fisher Scientific) version 5.0.7. The data processing pipeline includes signal processing, base calling, quality score assignment, adapter trimming, read alignment to hg19, coverage analysis and somatic variant calling. We used TVC version 5.0.9 to detect somatic mutations. A somatic mutation was detected if the variant allele count >3, coverage >2 in both tumor and normal samples and variant allele fraction ≥ 0.1. The dN/dS ratio was estimated using R package dNdScv 24 .
Bioinformatic analyses of SCNA data. We initially performed SCNA analysis using ASCAT 23  significantly higher or lower at significance level of 0.05 after adjusting for multiple comparisons, the segments were identified as amplified or deleted, respectively. Otherwise they were identified as LOH. In addition, amplifications with at least four copy numbers in oncogenes and deletions with zero copy number in tumor suppressor genes were identified as potential driver events.
Statistical analyses. Regional genetic and epigenetic evolutionary trees for each patient were built using fastme.bal in an R package ape that uses the minimum evolution algorithm based on a distance matrix of SCNA or methylation profiles 25 T6   T1   T4   T5   T2   T3 T1  T5 T6 T4 T3   T2   T7   N2   ATM SNV  KRAS SNV   TP53 SNV   T5   T2   T4  T3   T1   T6   T7  T8   N1   IGC-11-1044   T5   T2   T4  T3   T1   T6   T7  T8 Pairwise distance based on methylation of CpG probes across the whole genome Pairwise distance based on SCNA profile denotes SCNA distance and d 2 ij denotes methylation distance. When both samples have SCNA and methylation profiles, the new distance was defined as . When only SCNA profiles are available, we define Here, the denominators are used to rescale distances so that they are comparable between SCNA and DNA methylation profiles. The analysis of the CIMP was performed using the hierarchical clustering based on the 5000 most variable CpG probes mapping to gene promoter regions and CpG island regions.
Methylation analysis adjusted by sample purity. For a given CpG probe, the observed DNA methylation was a linear combination of the data from the normal and the tumor tissue samples weighted by tumor purity. ITH may be overestimated if purity varies across tumor samples from the same patient. Thus, we estimated purity, π, for each tumor sample and derived the purity adjusted methylation values using R package InfiniumPurify 26 . All downstream methylation analyses were based on purity-adjusted methylation.
Quantification of ITH. The frequently used index 6 that measures ITH for a patient using the fraction of aberrations present in all samples positively depends on the number of multi-region tumor samples. This estimate may vary across tumors making the association analysis between ITH and clinical outcomes problematic. To address this problem, we defined an ITH metric, average pairwise ITH or APITH, for each patient. For a patient with k tumor samples, and with d ij defined as the genomic or epigenetic distance between a pair of samples (i, j), the APITH is defined as the average across all pairs of samples: The expectation of APITH does not depend on the number of multi-region tumor samples. For SCNAs, d ij is calculated as the proportion of the 705,667 probes with different copy number status for (i, j). To investigate ITH of lung cancer driver genes 2,11,12 , we also calculated d ij as the fraction of the genome in these driver gene regions with differing copy number state. For DNA methylation, we define d ij as the Euclidean distance calculated for a given set of CpG probes, e.g., all CpG probes after quality control (QC), CpG probes mapping to CpG island and gene promoter regions. These analyses are informative to investigate whether ITH defined based on specific genomic regions would be useful for predicting prognosis. The variance of the APITH estimate depends on the number of multi-region tumor samples. Intuitively, APITH is more accurate for patients with more multiregion tumor samples and should be weighted up in downstream statistical analyses to optimize statistical power. As described in Supplementary Methods, we heuristically derived sample weights that were used for the weighted Cox proportional-hazards model using svycoxph in the R package survey 27 to investigate the association between APITH and clinical outcomes (overall survival and the risk of distant metastasis). The survival analysis was adjusted for stage, age, and smoking status. KM-plot were stratified by the median APITH of subjects with at least two tumor samples.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
The target sequencing data have been deposited in SRA through dbGaP under the accession number phs001169.v2.p1. The SNP array and methylation array data have been deposited in dbGaP under the same accession number. All other data are available tin the Article, Supplementary Information or available from the author upon reasonable request.

Code availability
The corresponding R code has been distributed at https://github.com/xtmgah/ EAGLE_LUAD.