Evolution of DNA methylome from precancer to invasive lung adenocarcinoma

The evolution of DNA methylation at genome level and methylation intra-tumor heterogeneity (ITH) during early lung carcinogenesis has not been systematically studied. We performed multiregional reduced representation bisulfite sequencing (RRBS) of 62 resected lung nodules from 39 patients including atypical adenomatous hyperplasia (AAH, n=14), adenocarcinoma in situ (AIS, n=15), minimally invasive adenocarcinoma (MIA, n=22), and invasive adenocarcinoma (ADC, n=11). We observed significantly higher level of methylation ITH in later-stage lesions and gradual increase in both hyper-and hypomethylation compared to matched normal lung tissues over the course of neoplastic progression. The phyloepigenetic patterns inferred from methylation aberrations resembled those based on somatic mutations suggesting parallel methylation and genetic evolution during the early lung carcinogenesis. De-convolution of transcriptomic profiles from a previously published cohort and RBBS data from the current cohort demonstrated higher ratios of T regulatory cells (Tregs) versus CD8+ T cells in later-stage diseases implying progressive immunosuppression with neoplastic progression. Furthermore, increased global hypomethylation was associated with higher mutation burden, higher copy number aberration burden, higher allelic imbalance burden as well as higher Treg/CD8 ratio highlighting the potential impact of methylation states on chromosomal instability, mutagenesis and the tumor immune microenvironment.


ABSTRACT
The evolution of DNA methylation at genome level and methylation intra-tumor heterogeneity (ITH) during early lung carcinogenesis has not been systematically studied.
We performed multiregional reduced representation bisulfite sequencing (RRBS) of 62 resected lung nodules from 39 patients including atypical adenomatous hyperplasia (AAH, n=14), adenocarcinoma in situ (AIS, n=15), minimally invasive adenocarcinoma (MIA, n=22), and invasive adenocarcinoma (ADC, n=11). We observed significantly higher level of methylation ITH in later-stage lesions and gradual increase in both hyperand hypomethylation compared to matched normal lung tissues over the course of neoplastic progression. The phyloepigenetic patterns inferred from methylation aberrations resembled those based on somatic mutations suggesting parallel methylation and genetic evolution during the early lung carcinogenesis. De-convolution of transcriptomic profiles from a previously published cohort and RBBS data from the current cohort demonstrated higher ratios of T regulatory cells (Tregs) versus CD8+ T cells in later-stage diseases implying progressive immunosuppression with neoplastic progression. Furthermore, increased global hypomethylation was associated with higher mutation burden, higher copy number aberration burden, higher allelic imbalance burden as well as higher Treg/CD8 ratio highlighting the potential impact of methylation states on chromosomal instability, mutagenesis and the tumor immune microenvironment.

INTRODUCTION
Lung cancer remains the leading cause of cancer-related death worldwide, yet it is curable if treated early. Many cancers, including lung cancers, are preceded by precancers. Treating precancers to prevent invasive lung cancer is theoretically an attractive approach to reduce lung cancer−associated morbidity and mortality. However, developing strategies for lung cancer prevention has been challenging owing to our limited understanding of neoplastic progression from precancers to invasive lung cancers 1 . Lung adenocarcinoma (ADC) is the most common histologic subtype, accounting for more than 50% of all lung cancers 2 . It has been proposed that invasive lung ADC evolves from atypical adenomatous hyperplasia (AAH), the only recognized precancer for lung ADC, which could progress to ADC in situ (AIS), a pre-invasive lung cancer, then to minimally invasive ADC (MIA), and eventually frankly invasive ADC 3 .
AAH, AIS, MIA and some ADC often present as pulmonary nodules with a unique radiologic feature termed ground-glass opacity (GGO, hazy nodular opacity with the preservation of underlying bronchial and vascular margins) on CT scans 4

. The biology
and clinical course of these lesions are not well defined and their management is controversial. Surgical resection is not the standard of care for treating these lung nodules and the diagnostic yield of biopsy is often low, particularly for GGO-dominant nodules 5 . Therefore, these lung nodules are often referred to as indeterminate pulmonary nodules (IPN) 6 . However, as surgical resection is not often offered, obtaining adequate tissue for comprehensive profiling of these IPNs is difficult, hindering our understanding of the biology underlying these lesions.
We have initiated an international collaboration for the collection and characterization of these lung precancers, pre-invasive and early invasive ADC presenting as IPNs. We recently reported the genomic landscape, including the genomic intra-tumor heterogeneity (ITH), of 116 IPNs from 53 patients and revealed evidence of progressive evolution from AAH to AIS, MIA, and ADC at the single-nucleotide level and macroevolution at the transitions from AAH to AIS, and from AIS to MIA associated with somatic copy number aberration (SCNA) and allelic imbalance (AI), respectively 7 . In addition, we found that genomic ITH is more pronounced in AAH than in ADC, which suggests a clonal sweeping model during the neoplastic progression of AAH 7 .
In addition to mutations, somatic epigenetic alterations such as DNA methylation can also impact neoplastic transformation and fitness [8][9][10][11] . Recent genome-wide methylation profiling studies have revealed that certain alterations, such as the silencing of tumor suppressor genes (TSG) and the activation of genes in stem-like cellular programs [12][13][14] may contribute to carcinogenesis. Our previous study has demonstrated that complex methylation ITH was associated with larger tumor size and increased risk of postsurgical recurrence in patients with invasive lung ADC 15 . Methylation aberrations have been reported in AAH lesions and tumor-adjacent lung tissues suggesting that somatic methylation changes may be early molecular events 16 . However, these pioneer studies only analyzed small numbers of genes implicated in lung carcinogenesis. The dynamic changes in the methylome at the genome level and the evolutionary trajectories of methylation ITH during the initiation and progression of lung precancers have not been studied systematically. In the current study, using a unique cohort of resected IPNs of different histologic stages, we delineated the evolution of the methylation landscape and methylation ITH architecture during progression from precancers to invasive lung ADC.

DNA methylation aberrations increase with the progression of precancers
We performed reduced representation bisulfite sequencing (RRBS) of 62  were clearly different from that of normal lung (Fig. 1A). Similarly, unsupervised hierarchical clustering identified two separate clusters: one comprising normal lung and AAH, and the other including AIS, MIA, and ADC (Supplementary Fig. 1) indicating a major change in the global DNA methylation landscape (i.e., macroevolution) at the transition from AAH to AIS. In addition, different regions from the same IPNs tended to cluster together, suggesting that inter-lesion heterogeneity was more prevalent than intra-lesion heterogeneity. Similarly, the methylation profiles of different regions from the same IPNs were correlated (median coefficient r, 0.884 [range, 0.736-0.969], p < 2.2 x 10 -16 , Supplementary Fig. 2). There was a progressive increase in both hypomethylation and hypermethylation (as compared with matched normal lung tissues from the same patients) from AAH to AIS, MIA, and invasive ADC; hypermethylation emerged as early as AAH, whereas hypomethylation was evident in only AIS, MIA, and ADC. Meanwhile, the correlation between the methylation profiles of IPNs and those of their paired normal tissues progressively decreased (r = 0.885 for AAH-normal, 0.824 for AIS-normal, 0.796 for MIA-normal, and 0.740 for ADC-normal, p < 2.2 × 10 −16 for all 4 comparisons; Fig. 1B). Further quantification of the IPNs' differentially methylated regions (DMRs) compared with their paired normal lung tissues revealed that later-stage IPNs had more CpG sites with hypermethylation (p = 0.3635, Kruskal-Wallis test) and significantly more CpG sites with hypomethylation (p = 0.000287, Kruskal-Wallis test) ( Fig. 1C).

Enrichment of transcription factor binding sites at DMRs
DNA methylation can impact the DNA binding of transcription factors (TFs) to their target sequences, termed "motifs" 17,18 . We searched the motifs that were covered by the DMRs in the IPNs and identified 50 motifs in AAH, 57 in AIS, 64 in MIA, and 66 in ADC that were significantly enriched over all DMRs in each histologic stage (Supplementary Table 3). Some of these motifs were significantly aligned with known motifs recognized by different TFs, such as MYC, ITF2, KLF4, and WT1 (Supplementary Fig. 3A-C, Supplementary Table 4). These TFs are involved in critical biological processes, including cell cycle progression, cell proliferation, apoptosis, tumor metastasis, angiogenesis and immune response [19][20][21][22] , indicating that the epigenetic regulation of these processes, by modifying TF binding capacity, may have potential roles during the early carcinogenesis of lung ADC.

Later-stage disease has higher epigenetic ITH
Molecular ITH can have a profound impact on cancer biology 23,24 , and our previous work demonstrated that methylation ITH is associated with clinical outcome in patients with invasive ADC 15 . To assess the evolution of methylation ITH during the progression of lung precancers, we first calculated the "epiallele shift" 25 , the combinatory difference in epiallele status in all captured loci (>60 reads), for each IPN specimen compared to paired normal lung tissues from the same patients. MIA and invasive ADC had more pronounced epiallele shifts than AAH or AIS did, indicating a higher level of methylation ITH in IPNs of later histologic stages ( Fig. 2A). In addition, the abundance of eloci (loci with distinct epiallele shifts) progressively increased in later-stage IPNs (p = 0.004567, Kruskal−Wallis test) (Fig. 2B). When focusing on the 15 patients with multiple IPNs of different stages (IPNs of different stages with the identical genetic background and exposure history), the abundance of eloci was also higher in later-stage IPNs than in early-stage IPNs in 13 of the 15 patients (Supplementary Fig. 4), further suggesting that later-stage IPNs have a higher level of methylation ITH.
To determine whether methylation ITH has the potential to impact transcriptional dynamics, we examined the relative distance of eloci to the nearest transcription start sites (TSSs). Interestingly, the vast majority of the eloci were much closer to TSSs in invasive ADC than in AAH, AIS, or MIA (Supplementary Fig. 5 Table 5), epigenetic modifications strongly associated with transcriptional repression [29][30][31] , supporting the concept that epigenetic marks co-operate with DNA methylation alterations along the evolution of lung precancers 32,33 .
Furthermore, we calculated the frequency of 16 combinatory DNA methylation patterns at four consecutive CpG sites covered by the same RRBS reads (termed "epipolymorphisms") in each specimen. Again, later-stage IPNs had higher epiallele diversity than early-stage IPNs did (Supplementary Fig. 6). Moreover, ADCs had higher levels of epipolymorphism at eloci than MIA, AIS, or AAH did (Fig. 2C), indicating that the epigenetic states had undergone a greater extent of drifting in late-stage IPNs than in early-stage IPNs.

Genomic and methylation evolution occur in parallel during early lung carcinogenesis
To dissect the evolutionary relationship between the methylome and genome in lung ADC, we constructed phyloepigenetic trees for patients with more than four spatially separated specimens available. The overall structure of the methylation-based phyloepigenetic trees was similar to that of trees based on mutations of the same multiregional specimens 7 (Fig. 3A, Supplementary Fig. 7). Furthermore, the genomic distance based on all somatic mutations was positively correlated with the methylation distance between any pair of samples from the same IPNs (ρ = 0.7283, p < 2.2 × 10 −16 , Spearman's rank correlation test) (Fig. 3B). Taken together, these results suggest that genomic and methylation evolution occurred in parallel during early carcinogenesis in this cohort of lung ADC. Interestingly, in patient C10, promoter hypermethylation of TSC2, a candidate TSG known to inhibit cell growth in lung 34,35 , was identified in AIS specimens, whereas copy number loss was identified in AAH lesions from the same patient. Taken together, these data suggested convergent evolution, whereby the same genes or pathways are activated or inactivated by different mechanisms in different cancer cell clones during lung cancer development and progression.

IPNs
As an essential chemical modification, the methylation states can directly impact the chromosomal structure and DNA mutagenesis. It has been well documented that global hypomethylation is associated with chromosomal instability (CIN) and increased mutational rates in cancers [36][37][38] . To further depict the interaction between genome and epigenome during early carcinogenesis of lung ADC, we sought to delineate the association between global hypomethylation and genomic changes of these IPNs. As RRBS is designed to be overrepresented by CpG islands but low in repetitive DNA sequences 39 , and therefore for suboptimal quantification of global hypomethylation, we used long interspersed transposable elements-1 (LINE-1), a widely used surrogate marker for global DNA methylation 40,41 to assess these IPNs. There was a significant decrease of LINE-1 methylation in AIS, MIA and ADC compared 42 to normal lung tissues or AAH (Fig. 4A) indicating increased global hypomethylation in IPNs of later histologic stages. Importantly, methylation level of LINE-1 was inversely associated with SCNA burden (Fig. 4B), AI burden ( Fig. 4C) and total mutation burden ( Fig. 4D) indicating global hypomethylation is associated with higher level of CIN. Interestingly, LINE-1 methylation status was also inversely associated with the proportion of clonal mutations in each specimen (Fig. 4E). Our analysis demonstrated a progressive increase of CD4+ T regulatory cells (Tregs) (p<2.2e-16) and progressive decrease of CD8+ T cells (although the difference was not significant, p=0.1374) from normal lung tissues to AIS/MIA and invasive ADC leading to significantly higher Treg/CD8 ratio in invasive ADC (p<2.2e-16) (Supplementary Fig.   8A-C). We next applied MethylCIBERSORT 45 to the RRBS data to delineate the T cell infiltration of IPNs in our cohort. Similarly, with progression from normal lung to AAH, AIS, MIA and ADC, there was a progressive increase in Tregs (Fig 5A, p=0.009758), while CD8+ T cell infiltration was lower in later-stage IPNs (although the difference did not reach statistical significance (Fig 5B, p=0.1472), leading to significantly higher Treg/CD8 ratio in later-stage IPNs (Fig. 5C, p=0.00194). As increased Treg/CD8 ratio is known to associate with suppressed anti-tumor immune surveillance 46 , these results indicated dynamic changes of anti-tumor immune response during the neoplastic progression from precancers to invasive lung ADC with potentially more suppressive immune microenvironment in later stage IPNs, consistent with our previous findings 47,48 .

Global hypomethylation was associated with suppressed T cell infiltration
Interestingly, Treg/CD8 ratio was inversely correlated with LINE-1 methylation level (Fig.   5D) implying the potential association between global hypomethylation and immunosuppression.

DISCUSSION
The methylation landscape of invasive lung cancers has been studied extensively 49,50 .
However, somatic methylation aberrations in precancers and preinvasive lung cancers are less defined, largely because of a lack of appropriate specimens, as surgery is not the standard of care for treating lung precancers. Using small panels of genes implicated in lung carcinogenesis, previous studies have demonstrated gradual change in DNA methylation in AAH, AIS, and ADC 51 . Leveraging a unique collection of resected IPNs of different stages, we for the first time revealed evolution of comprehensive DNA methylome as lung precancers progress to preinvasive lung ADC and eventually invasive lung ADC. Our results demonstrated progressive changes in both hypermethylation and hypomethylation along with neoplastic evolution from AAH to AIS, MIA, and ADC. Interestingly, compared to normal lung tissues, somatic hypermethylation alterations appeared to emerge as early as AAH, whereas hypomethylation only became obvious after AIS (Fig. 1B), implying that somatic hypermethylation may have preceded hypomethylation during early carcinogenesis of lung ADC.
Different cells within the same tumor can exhibit different molecular and phenotypic features, a phenomenon termed ITH. ITH may foster tumor evolution by providing diverse cell populations, and the dynamics of ITH architecture may evolve with neoplastic progression 15,52 . Methylation ITH has been observed in various malignancies, including colorectal cancer 53 , lung cancer 15 , and chronic lymphocytic leukemia 54 . High levels of methylation ITH have been reported to associate with inferior clinical outcomes 15,54,55 . In addition, methylation ITH has also been reported in precancers; for example, Merlo et al. reported that methylation ITH in Barrett esophagus, a precursor to esophageal ADC, was associated with the risk of malignant transformation 55 . In the current study, we provided the first evidence of methylation ITH in lung precancers and its dynamic changes during neoplastic progression. Later-stage IPNs demonstrated more complex methylation ITH than early-stage IPNs did, with more divergent epiallele shift and higher fraction of epipolymorphisms (Fig. 2). Importantly, eloci were significantly more abundant around TSSs in ADC than in AAH, AIS, or MIA. One plausible explanation is that although somatic methylation ITH may be stochastic during early lung carcinogenesis, some of the methylation aberrations (particularly those that are close to TSSs and potentially impact gene expression) may convey survival and/or growth advantages, resulting in selection of cells with higher densities of eloci around TSSs.
Cancer evolution is associated with the accumulation of genetic and epigenetic aberrations. Our previous work has revealed evidence that epigenetic evolution and genetic evolution take place in parallel in invasive lung ADC 15 . Our current study demonstrated similar phylogenetic patterns and correlated genetic and methylation distances in IPNs of different stages (Fig. 3), suggesting genetic alterations and DNA methylation also evolve in parallel during the early carcinogenesis of lung ADC.
Meanwhile, promoter hypermethylation and copy number loss of TSG TSC2 were identified in two independent IPNs respectively from the same patient. These results were reminiscent of previous findings showing that distinct mutations of the same cancer genes were present in different regions of the same tumors 23,56,57 or in different primary tumors from the same patients 58 , indicating convergent evolution. Although most genetic and methylation changes were occurring in parallel, and genetic events (e.g., copy number loss of TSGs) or methylation changes (e.g. promoter hypermethylation of TSGs) may independently offer proliferation or survival advantages to cells, these processes may be constrained around certain genes or pathways (e.g., inactivation of TSC2 in the case of patient C10) that are essential to carcinogenesis in certain patients.
In addition to independently and/or cooperatively promoting cell expansion by affecting different genes or pathways, DNA methylation states can directly impact on vulnerability of DNA for genomic aberrations. It is well-known that global hypomethylation is associated with CIN 36 and increased rate of somatic mutations 38 . In the current cohort, global hypomethylation assessed by LINE-1, was associated with significantly increased mutation burden, SCNA burden as well as AI burden (Fig. 4B-D). Importantly, LINE-1 methylation level in AAH was similar to normal lung, but significantly decreased in AIS, MIA and ADC (Fig. 4A). Genomic analysis of the same cohort of IPNs in our previous study have revealed progressive increase in mutation burden from AAH to ADC, while SCNA only became obvious after stage of AIS and AI events were rare in AAH and AIS 7 .
These data suggest that methylation aberrations have not only evolved in parallel with genomic aberrations, but may have also facilitated accumulating genomic aberrations that may have led to more drastic phenotypic changes in IPNs of later stages.
Interestingly, higher-level of global hypomethylation was associated with higher proportion of clonal mutations (Fig. 4E). As the progression of lung precancers into invasive lung ADC predominantly follows a clonal sweeping model with selective outgrowth of fit subclones 7 , one plausible explanation for this association is that cells within the precancers with high level of global hypomethylation may be prone to accumulate genomic aberrations, which subsequently provide growth advantages to these cell clones to develop into major clones in invasive lung ADC.
We have previously reported that the immune microenvironment was suppressed in invasive lung cancers compared to preinvasive cancers or precancers 47,48,59 . We de- Compared with invasive cancers, precancers and preinvasive cancers may have less complex molecular landscapes, as well as better preserved immune contextures, and thus may be easier to eradicate. There has been increasing enthusiasm toward moving interventions successfully applied to metastatic cancers to early stage cancers and even precancers, a concept called interception 66,67 . For example, we have launched the IMPRINT-Lung clinical trial (NCT03634241), in which patients with high-risk IPNs (many of which may be AAH or AIS) are treated with immune checkpoint inhibitors. In the current study, we demonstrated that methylation aberrations are less complex in precancers and preinvasive lung cancers than in invasive cancers. Therefore, novel therapeutic agents that can modulate methylation by targeting aberrant methylations and potentially reprogram the immune microenvironment 68 may also have potential in treating precancers and preinvasive cancers to prevent invasive lung cancers.
To the best of our knowledge, this is the first study on the evolution of genome-wide DNA methylation during the early carcinogenesis of lung ADC. Due to the scarcity of adequate materials for comprehensive profiling of IPNs, our study has several inevitable limitations. First, the sample size was relatively small, so our data has to be interpreted with caution. Second, because we did not have enough materials for transcriptomic profiling, the biological impact of the methylation changes remains to be determined; however, as a substantial number of DMRs were associated with TSSs, promoter binding sites, and TF binding sites, many of these changes could potentially regulate the transcriptome of the IPNs. Third, all patients in our cohort were from China or Japan, and whether our findings can be applied to patients in other ethnic groups remains to be investigated. Fourth, the follow-up times for all patients in this study were relatively short, so we were not able to investigate the impact of these methylation changes on recurrence or survival. Finally, the resected specimens in this study could only provide a single molecular snapshot of the evolutionary process of IPNs. Whether all AAHs will evolve into AIS, MIA, and ADCs, and whether all ADCs evolve from AAH, is still unknown.
Although we used a multi-regional sequencing approach to de-convolute the methylation evolution, deciphering the temporal evolution of the methylome during neoplastic progression will require specimens obtained over the course of disease progression.
Clinical trials collecting longitudinal biopsy specimens, such as IMPRINT-Lung (NCT03634241), may provide such opportunities in the future.

Sample acquisition
A total of 53 resected pulmonary nodules and paired normal lung tissues from 39 patients treated at Nagasaki University Hospital or Zhejiang Cancer Hospital between 2014 and 2017 were used in the study. None of the patients received chemotherapy or radiotherapy before surgery. 29 lung nodules from 15 patients had multiregional specimens for spatial heterogeneity assessment (Supplementary Table 1). Wholeexome sequencing (WES) data was available for all specimens (EGAS00001004960) 7 .
Written informed consent was obtained from all patients. The study was approved by the Institutional Review Boards of MD Anderson Cancer Center, Nagasaki University Graduate School of Biomedical Sciences, and Zhejiang Cancer Hospital.

DNA methylation profiling by RRBS and data processing
DNA was extracted using the QIAamp DNA FFPE Tissue Kit (QIAGEN), and 200ng−1μg of DNA was subjected to RRBS for genome-wide DNA methylation profiling as previously described 69,70 . Raw RRBS data were processed using the Bismark pipeline 71 . Briefly, TrimGalore v. 0.4.3 was used to trim the Illumina adapter sequences (a minimum of 5 bp in a read was required to overlap with the adapter sequence); FastQC v. 0.11.7 was used for quality control; and then bowtie2 v. 2.2.3 was used to align the trimmed reads to the GRCh37 assembly of the human genome. The DNA methylation levels for individual CpGs were calculated using methyKit 72 . Methylated reads (containing Cs) and unmethylated reads (containing Ts) at each cytosine site were counted, and the percentage of methylated reads among total reads covering the corresponding cytosine was calculated to quantify the DNA methylome for each sample at the single-base resolution. The CpG sites, DMRs, and loci of known genes, as well as genomic features, including CpG islands, were annotated using the R package "ChIPSeeker" 73

Comparison of methylation profiles between different specimens
To assess the DNA methylome profiles of distinct IPNs of different pathological stages and examine the heterogeneity between IPN samples, we first aggregated the DNA methylation levels of 5-kb tiling regions across the genome in each sample by retaining only the CpG sites with ≥10 sequencing reads. We then applied principal component analysis to identify global DNA methylation patterns between samples. To evaluate consistent clusters of these DNA methylation profiles, we performed unsupervised hierarchical agglomerative analysis of CpG sites covered by ≥50 reads across all samples; these reads were based on single CpG methylation calls without any binning.
To evaluate the correlation between overall DNA methylation in all samples from each patient, we used a pairwise approach to compare distance and similarity matrices on the basis for all CpGs with a coverage of ≥10 reads.

Differential DNA methylation analysis
Differentially methylated regions (DMRs) encompassing the differentially methylated CpGs between paired disease and normal tissue samples were identified using a triangular kernel to smooth the number of methylated reads and total number of reads by applying the "noise filter" function in the "DMR caller" package 75 . The differentially methylated CpG positions of paired samples were identified by mapping CpG sites to DMRs; only CpG sites covered by ≥10 reads in paired samples were included.

Motif identification and prediction of TF binding sites
The de novo methylated DNA motifs in IPNs of each stage were identified by mEpigram, which discovers motifs by using position-specific weight matrices from the k-mers that are most enriched in the positive sequences compared with the negative sequences as "seeds" and extending the motifs in both directions 76 . The Tomtom tool from the MEME suite was used to select significantly enriched methylated DNA motifs to the database of known transcript factors (HOCOMOCO_v11) 77 .

Estimation of DNA methylation ITH
To estimate DNA methylation ITH, we applied "methclone" to identify epigenetic loci whose distributions of epigenetic allele ("epiallele") 25 clonality differed between paired tumor and normal samples by quantifying the degree to which the compositions of epialleles at given loci in the tumor samples were distinct from those in the normal tissue samples. An epiallele was defined by setting 60 reads in four consecutive CpG sites as the threshold to consider the epigenetic allele composition of the locus. We then calculated the differences in epiallele entropy between each IPN sample and its matched normal tissue sample. Loci with combinatorial entropy changes below -60 between each IPN sample and its paired normal tissue sample were defined as epigenetic shift loci (termed "eloci"). To reduce the bias due to the different coverage for each sample, we then calculated relative epiallele shifts (i.e., the normalized number of eloci) by dividing the number of eloci by the total number of assessed loci in each sample and then multiplying that ratio by the average number of total loci across all samples 25 . To evaluate the dynamics of methylation change in paired tumor and normal epigenomes, we also assessed epigenetic polymorphism ("epipolymorphism") to measure the epiallelic diversity in each IPN sample as described previously 78 by calculating the frequency of each specific epiallele from multiple stochastic changes in the frequencies of many epialleles. We calculated 16 epiallele statuses (0000, 0100, 0010, 0001, 1000, 1100, 0110, 0011, 0101, 1010, 1001, 0111, 1011, 1101, 1110, and 1111, where 1 represents a methylated CpG site, and 0 represents an unmethylated CpG site).

Locus Overlap Analysis
We applied Locus Overlap Analysis (LOLA) 26 to the genomic regions identified as eloci (ΔS<−60) in all samples of each stage to evaluate the overlap between genomic regions with methylation ITH and chromatin marks. The genomic regions of all loci with ≥ 60 reads were used as background genomic regions, and the selected genomic regions were mapped to a compendium of publicly available histone mark profiles, including CTCF, H2AZ, H3K4me1, H3K4me2, H3K4me3, H3K9ac, H3K9me3, H3K27me3, H3K27ac, H3K36me3, H3K79me2, H4K20me1 in the A549 lung adenocarcinoma cell line (http://hgdownload.soe.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeBroadHistone) and H3K27me3, H3K4me3, H3K9me2, H3K9ac, and CTCF in an immortalized human bronchial epithelial cell line (BEAS-2B; GSE56053) 28 . P-values in the enrichment analyses were calculated using a one-sided Fisher exact test. Adjustment for multiple testing was performed using the Benjamini-Yekutieli method.

Construction of phylogenetic trees
To construct epiphylogenetic trees from the methylation data, we used promoter CpG sites (≥50 reads per CpG site selected for all samples from each patient) with the most variable methylation values (mean absolute deviation >10%) shared by all samples, including a normal tissue sample used as the tree root, to build a Euclidean distance matrix. We built the phylogeny trees by applying a neighbor-joining algorithm from the "ape" package to independently infer phylogenetic relationships between IPN specimens for each patient from the mutation and methylation profiles. To assess the phylogenetic similarity between the genetic and epigenetic profiles, we employed an independent but parallel distance matrix construction from the mutation and methylation profiles for each patient and calculated the Spearman correlation coefficient for all pairwise samples grouped by lesion.

Estimation of global hypomethylation
To determine global methylation levels, we chose CpG sites (covered by ≥10 aligned reads) mapped to evolutionarily young subfamilies of LINE-1 repeat elements (L1HS and L1PA). LINE-1 family annotation were obtained from the Repeat-Masker of the UCSC genome browser 79 . We averaged the methylation values of the chosen CpG sites, to represent the global methylation level of each sample.

Deconvolution of T cell profiles
To derive tumor-infiltrating T cell subtypes from transcriptomic data previously

Statistical analysis
Violin plots were created using the "geom_violin" function in the R statistical software package "ggplot2" to represent data point density along the y axis, and the "stat_summary" function was used to calculate the median as the center point.
Differences in DMR numbers, eloci numbers, and immune cell infiltration between lesions of different stages were assessed using the Kruskal−Wallis chi-square test. We used a two-sided Pearson correlation coefficient to compare methylation profiles between two samples and between groups of samples of different stages. We used a two-sided Spearman correlation coefficient to determine the extent to which distance matrices were correlated with DNA methylation profiles and somatic mutation profiles.

Data availability
The raw sequencing data are available from the European Bioinformatics Institute European Genome-phenome Archive (EGA) (accession number: EGASXXX0000XX) through controlled access. To protect patient privacy, interested researchers need to apply via a data access committee (DAC), which will grant all reasonable requests by bona fide researchers. , Treg/CD8 ratio (C) in normal lung, AIS/MIA and invasive ADC inferred by deconvolution of transcriptomic data from a previously published cohort using ImmuCellAI. Each green dot represents each specimen and the blue dots represent the means. Error bars indicate 95% confidence intervals. The differences among all stages were assessed using the Kruskal-Wallis H test.