Contrasting DCIS and invasive breast cancer by subtype suggests basal-like DCIS as distinct lesions

Ductal carcinoma in situ (DCIS) is a non-invasive type of breast cancer with a highly variable potential to become invasive and affect mortality. Currently, many patients with DCIS are overtreated due to the lack of specific biomarkers that distinguish low-risk lesions from those with a higher risk of progression. In this study, we analyzed 57 pure DCIS and 313 invasive breast cancers (IBC) from different patients. Three levels of genomic data were obtained: gene expression, DNA methylation and DNA copy number. We performed subtype-stratified analyses and identified key differences between DCIS and IBC that suggest subtype-specific progression. Prominent differences were found in tumors of the basal-like subtype: basal-like DCIS were less proliferative and showed a higher degree of differentiation than basal-like IBC. Also, core basal tumors (characterized by high correlation to the basal-like centroid) were identified among IBC but not among DCIS. At the copy number level, basal-like DCIS exhibited fewer copy number aberrations than basal-like IBC. An intriguing finding from the analysis of the methylome was hypermethylation of multiple protocadherin genes in basal-like IBC compared with basal-like DCIS and normal tissue, possibly caused by long range epigenetic silencing. This points to silencing of cell adhesion-related genes specifically in IBC of the basal-like subtype. Our work confirms that subtype stratification is essential when studying progression from DCIS to IBC; we provide evidence that basal-like DCIS show less aggressive characteristics, and we question the assumption that basal-like DCIS is a direct precursor of basal-like invasive breast cancer.


INTRODUCTION
Ductal carcinoma in situ (DCIS) is a non-invasive, non-obligate precursor to invasive breast cancer (IBC) with low risk of progression 1 . As breast cancer screening has become widespread, more DCIS lesions are being detected [2][3][4] . Autopsy studies and studies on DCIS from untreated patients show that many lesions, if left alone, will never progress to invasive disease [5][6][7][8][9] . However, there is currently no robust method to distinguish DCIS with invasive potential from those that may be left untreated. Furthermore, DCIS is a heterogeneous disease that, at the time of diagnosis, may range from indolent lesions to tumors on the verge of becoming invasive. Clinical, histopathological and molecular characteristics may also vary considerably 10,11 . As a consequence of this uncertainty, treatment for DCIS is often extensive, resulting in substantial overtreatment [12][13][14][15] .
Knowledge of the underlying mechanisms of progression from DCIS to IBC is still limited. To select the optimal treatment strategy for a patient diagnosed with DCIS, it would be beneficial to determine the tumor's invasive potential. Several studies have observed few genomic and epigenomic differences between DCIS and IBC [16][17][18][19] . However, most breast cancer progression studies have not taken into account the significance of molecular subtype in DCIS. For IBC, molecular subtypes have distinct characteristics and also provide valuable prognostic and predictive information 20 . In a previous study, we found evidence of subtype-specific progression from DCIS to IBC, suggesting that each molecular subtype undergoes a distinct evolutionary disease course 21 . In DCIS, grade and growth pattern provide some information on risk of recurrence; yet, there is still a need for more precise risk prediction [22][23][24] . For this purpose, the Oncotype DX Breast DCIS Score has been developed to predict individual risk of recurrence after breast-conserving surgery (BCS) 25 . This assay, however, does not take into account the vast heterogeneity of DCIS, and the low-risk group still had a relatively high risk of recurrence of 10% after 10 years 26 . Nevertheless, this score illustrates the potential of molecular assays for risk prediction in DCIS.
In this study, we explore the differences between DCIS and IBC in a subtype-specific manner using data from three genomic levels: Gene expression, DNA copy number and DNA methylation. We observed that DCIS and IBC of the luminal A subtype were overall highly similar, while for the basal-like subtype, DCIS might represent a different molecular entity than its invasive counterpart. We hypothesize that tumors of different molecular subtypes may have different modes of progression, and by comparing DCIS and IBC for each subtype separately, we gain insight into the mechanisms of breast cancer invasion and progression.

RESULTS
Tumor characteristics and PAM50 subtyping
The study cohort includes data from 57 pure DCIS and 313 IBC cases. All samples were obtained from individual patients, i.e., none of the samples represents paired (synchronous) lesions from the same patient. DCIS lesions were from patients with no concurrent invasive disease ("pure" DCIS). All sample information, including clinical and molecular parameters, is presented in Table 1 and Supplementary Data 1. Based on expression of the PAM50 genes, we determined the intrinsic subtypes using the widely used centroid-based classifier 27 (see "Methods"), which provided correlation coefficients to each of the four centroids: basal-like, HER2-enriched, luminal A and luminal B. The distribution of subtypes differed significantly between DCIS and IBC (P = 0.0016, Fisher's exact test, Fig. 1a). Most notably, DCIS showed a higher frequency of the HER2-enriched subtype and a lower frequency of the luminal B subtype than IBC. This was reflected by a significantly different distribution of ER status, as determined from ESR1 gene expression, between DCIS and IBC (P = 0.0012, Fisher's exact test, Fig. 1b). In general, DCIS tumors showed lower correlation coefficients to the subtype centroids than IBC; this was particularly evident for the basal-like subtype (Table 2). To investigate whether differences in tumor cell content could explain the lower subtype correlation coefficients in DCIS, we used ASCAT (Allele-Specific Copy number Analysis of Tumors) 28 to calculate tumor purity from copy number data (see "Methods"). We found no significant difference in tumor cell content between DCIS and IBC (basal-like: P = 0.86, HER2-enriched: P = 0.13, luminal A: P = 0.88, luminal B: P = 0.19, Mann-Whitney U tests, Supplementary Fig. 1a).
Diverging subtype characteristics between DCIS and IBC
The overall lower correlation to the PAM50 centroids in DCIS compared with IBC prompted us to examine the expression of the PAM50 genes in each subtype and tumor type, to identify the contribution of each gene to the subtyping output (Supplementary Fig. 1b). Only one gene, matrix metalloproteinase 11 (MMP11, also named stromelysin 3), clearly delineated DCIS and IBC. MMP11 is expressed in stromal cells and favors cancer cell survival and tumor progression through cleavage of collagen VI 29 . MMP11 expression was markedly lower in DCIS of all subtypes than in IBC, in accordance with its non-invasive state. All other PAM50 genes showed expression patterns characteristic of the subtypes, independent of tumor type. Luminal genes (e.g., ESR1, PGR, NAT1, BCL2, SLC39A6) were more highly expressed in luminal tumors, in both DCIS and IBC, than in tumors of the basal-like and HER2-enriched subtypes. Basal-like IBC showed markedly higher expression of proliferation-associated genes than all other subtypes (including basal-like DCIS). Both DCIS and IBC of the HER2-enriched subtype showed elevated expression of genes typically highly expressed in this subtype (ERBB2, GRB7 and TMEM45B). Of note, keratins associated with basal epithelium (KRT5, KRT14 and KRT17) were markedly more highly expressed in DCIS of non-basal-like subtypes than in their invasive counterparts, whereas in the basal-like subtype these keratins were highly expressed in both DCIS and IBC. This observation may be explained by a contribution to gene expression from the myoepithelial cell layer retained in DCIS.
Interestingly, we identified a distinct group of basal-like IBCs with high correlation to the basal-like centroid and correspondingly low correlation to the luminal A centroid (Fig. 2a), which was not found among basal-like DCIS (Fig. 2b). These invasive tumors may correspond to so-called core basal tumors, characterized by deletions on chromosome 5q and high expression of specific genes associated "in trans" with such deletions 30,31 . In accordance with this, we found 5q deletions at high frequency in basal-like IBC, but in only a minority of basal-like DCIS (Fig. 2c). Clustering of the expression values of the core basal-defining genes revealed two distinct clusters: one consisting mostly of IBC tumors with high correlation to the basal-like subtype (i.e., the core basal tumors), and a second including most of the DCIS tumors and the IBC tumors with low correlation to the basal-like subtype (Fig. 2d). Based on visual inspection of the distribution of the correlation coefficient to the basal-like centroid, we classified core basal tumors as those with a correlation >0.6 (Fig. 2a, b). When investigating the PAM50 genes separately for core and non-core basal invasive tumors compared with basal-like DCIS, we found that non-core basal invasive tumors showed lower expression of proliferation genes and higher expression of luminal genes than core basal invasive tumors (Supplementary Fig. 1c). Also, EGFR and basal keratins (which are known to be highly expressed in core basal tumors) showed lower expression in non-core basal tumors than in core basal invasive tumors.
Extensive genomic differences between basal-like DCIS and basal-like IBC
We found few gene expression differences between DCIS and IBC when performing principal component analysis (PCA) of genome-wide gene expression data across all subtypes (Supplementary Fig. 2a). This is in accordance with previous studies 16,17 .
However, after subtype stratification, PCA clearly separated IBC from DCIS in the basal-like and HER2-enriched subtypes, but not in the luminal subtypes (Supplementary Fig. 2b). Differences in copy number aberrations between DCIS and IBC also varied between subtypes. Overall, DCIS exhibited fewer copy number changes than IBC, as demonstrated by a lower genomic instability index (GII).
To further explore subtype-specific differences between DCIS and IBC, we included information on the strength of the correlation to all other subtype centroids (Fig. 3, Supplementary Data 1). We found that basal-like IBC correlated highly to the basal-like centroid and, second, to the HER2-enriched centroid, while basal-like DCIS showed overall lower correlation to the basal-like centroid and more often had a luminal subtype as their second-best subtype (Fig. 3). In contrast, luminal A tumors, both DCIS and IBC, showed relatively high correlation to the luminal A centroid and a similar distribution of the second-best subtype (mostly basal-like and luminal B). Next, we calculated gene expression-based proliferation, differentiation, immune, stromal and epithelial-to-mesenchymal transition (EMT) scores, as well as HER2 copy number status (Fig. 3, Supplementary Fig. 3 and Supplementary Data 1). Both DCIS and IBC showed subtype-specific characteristics, such as higher proliferation and lower differentiation in the basal-like and HER2-enriched subtypes compared with luminal A. In general, DCIS received lower stromal and EMT scores than IBC. The differences between DCIS and IBC were most pronounced in basal-like tumors: basal-like DCIS displayed a significantly lower median proliferation score than basal-like IBC (Supplementary Fig. 3b), while the median differentiation score was significantly higher in basal-like DCIS than in basal-like IBC (Supplementary Fig. 3c), although still lower than in DCIS of any other subtype.
Interestingly, there was no statistically significant difference in median immune, stromal or EMT score between basal-like DCIS and IBC (Supplementary Fig. 3d, e, f). The distinct difference seen between core and non-core basal invasive tumors prompted us to investigate these scores for core and non-core basal invasive tumors separately (Supplementary Fig. 5). For GII and proliferation, the scores for non-core basal invasive tumors were intermediate between those of basal-like DCIS and core basal invasive tumors, while the differentiation scores were at the level of basal-like DCIS. There was no difference between core and non-core basal invasive tumors with regard to immune, stromal and EMT scores. Overall, these findings show that the subtype profiles of DCIS are comparable to those found in IBC, except for the basal-like subtype, where DCIS appears to be associated with less aggressive gene expression characteristics.
Long range epigenetic silencing of cPCDH genes occurs in basal-like IBC
We identified numerous genes with significantly different methylation profiles between DCIS and IBC (Supplementary Data 2). For the basal-like subtype, 1053 genes showed statistically significantly different methylation profiles between DCIS and IBC, whereas for the HER2-enriched and luminal A subtypes only 144 and 172 genes, respectively, did so (Fig. 4a). Owing to the small sample size, no genes with statistically significantly different methylation profiles were identified for the luminal B subtype. None of the differentially methylated genes were shared across the three remaining subtypes. Among the genes with significantly different methylation profiles between basal-like DCIS and IBC were multiple clustered protocadherins (cPCDH). These genes are involved in cell-cell adhesion and are organized in three clusters on chromosome 5q31; notably, the genes are highly overlapping 32,33 . Long range epigenetic silencing (LRES) has previously been shown to occur in cancer in an 800 kb genomic window spanning the cPCDH gene clusters [34][35][36] . To corroborate the methylation profile analyses and explore whether LRES is characteristic of basal-like IBC, we clustered all basal-like tumors based on the β-values of the 698 CpGs in this genomic window (Fig. 4b). For comparison, we also included normal breast tissue samples. This analysis revealed that basal-like invasive tumors with high correlation to the basal-like centroid were, in general, characterized by hypermethylation across the cPCDH genes, while normal samples displayed low levels of methylation. Basal-like DCIS showed significantly lower mean cPCDH methylation than basal-like IBC (P = 0.001, Mann-Whitney U test, Fig. 4c).
Importantly, there was no association between mean cPCDH methylation and tumor percentage, indicating that the lower cPCDH methylation levels in basal-like DCIS are not simply an artifact of normal tissue admixture in these samples. The basal-like invasive tumors showed the highest cPCDH methylation levels of all tumors. Notably, the distinct difference between DCIS and IBC seen in the basal-like subtype was not found for any of the other subtypes (Fig. 4c). Of note, the highly overlapping organization of the cPCDH genes complicates interpretation of these results, since one CpG may be located in multiple genes simultaneously, e.g., in the transcription [...]
When compiling methylation, copy number and gene expression data of the cPCDHs for the basal-like tumors, we observed that invasive tumors with hypermethylation of the cPCDH genes often exhibited deletions of the same genes, and that these changes corresponded well with correlation to the basal-like centroid (Supplementary Fig. 6). Importantly, the cluster of tumors with concurrent hypermethylation and deletion of the cPCDH genes consisted mainly of aneuploid tumors, while the sub-cluster containing most DCIS consisted only of diploid tumors. We could not detect any effect of hypermethylation or 5q deletions on cPCDH gene expression. This could possibly be explained by expression of retained alleles in polyploid tumors or by post-transcriptional regulation. In summary, the notable differences in cPCDH methylation between basal-like DCIS and IBC support our previous results suggesting that basal-like DCIS may be a different entity than basal-like IBC.

DISCUSSION
In this study, we have explored differences between DCIS and IBC in a subtype-specific manner using gene expression, copy number and DNA methylation data derived from fresh frozen tumor material. The study was prompted by findings from our previous study, in which we hypothesized that progression of DCIS to invasive cancer differs between molecular subtypes 21 . The indolent nature of many in situ tumors, and the fact that many of these tumors never progress to invasive or metastatic disease, are at odds with the results from several studies showing remarkably few genomic differences between DCIS and IBC [16][17][18] . This lack of genomic dissimilarity may be explained by inherent differences between the molecular subtypes: in most breast cancer cohorts, the majority of tumors are of luminal subtypes; hence, characteristics that differentiate between DCIS and IBC in unstratified analyses are confounded by subtype. The different distribution of molecular subtypes observed between IBC and DCIS may in part be explained by underrepresentation of small DCIS lesions and, consequently, overrepresentation of high-grade DCIS lesions in the cohort. However, the frequency of tumors of the least aggressive subtype (luminal A) is similar in DCIS and IBC, indicating that the observed difference in subtype distribution between the two tumor types represents a true distinction.
Interestingly, the most pronounced differences between DCIS and IBC were found for the basal-like subtype. Basal-like DCIS showed lower correlation to the basal-like centroid (i.e., low "basalness") than basal-like IBC, and there were no core basal DCIS in our data. This is in accordance with a previous integrative clustering analysis that showed genomic isolation of basal-like IBC, but not of basal-like DCIS 37 . In the present study, we showed that basal-like DCIS exhibited higher correlation to the luminal A subtype, a higher degree of differentiation, lower proliferation and lower genomic instability than basal-like IBC. DNA methylation alterations, too, differed markedly more between DCIS and IBC in the basal-like subtype than in all other subtypes. Most notable was the marked hypermethylation of CpGs mapping to the cPCDH genes in basal-like IBC compared with DCIS, and a positive association between cPCDH hypermethylation and degree of "basalness". Hypermethylation of DNA in the genomic region spanning the cPCDH genes through long range epigenetic silencing (LRES) 38 has been shown to increase with progression of cervical cancer 36 and has also been observed in breast cancer 34 , colorectal cancer 35 and Wilms tumor 39 . Interestingly, the chromosomal region of the cPCDH genes (5q31) is frequently deleted in basal-like IBCs, and its deletion is a defining feature of core basal IBC tumors 40,41 . cPCDHs are cell-cell adhesion molecules that have also been shown to inhibit cell growth and suppress oncogenic pathways, features consistent with a role as tumor suppressors 42 . Loss of intraepithelial cell-cell adhesion is a key feature of tumor cell invasion 43,44 , and it is tempting to speculate that loss of cPCDH tumor suppressor function through LRES may contribute to driving the invasion process specifically in basal-like cancer.
During tumor evolution, the transition from DCIS to an invasive stage may represent an evolutionary bottleneck, which may also affect tumor subtype 1,45 . To study subtype evolution and plasticity during tumor progression and invasion, consecutive biopsies from the same patients would be needed. Nonetheless, our study includes a sufficient number of samples to compare subtype characteristics between DCIS and IBC as groups, for each subtype separately. We show that the difference between DCIS and IBC is greater for the basal-like subtype than for all other subtypes. Although the intrinsic subtypes were defined in IBC, we believe that basal-like DCIS are truly basal-like: first, the PAM50 subtyping showed that they correlate most strongly with the basal-like centroid, albeit to a lower degree than IBC. Second, several genomic features of basal-like tumors are also present in basal-like DCIS, including a low degree of differentiation, high expression of basal keratins, low expression of luminal genes and expression of genes indicative of immune cell infiltration. Despite these similarities, basal-like DCIS may not be precursors to basal-like IBC. Basal-like breast cancer is an aggressive disease that develops rapidly. Core basal tumors in particular have an aggressive phenotype with poorer prognosis than non-core basal tumors 30,46 . Although all core basal invasive tumors must at some point have progressed from an intraductal stage, the transition from DCIS to IBC may occur so rapidly that the probability of "capturing" such tumors as DCIS is very small, as also proposed by Kurbel 47 . This hypothesis is supported by the fact that basal-like invasive breast tumors have fewer concurrent DCIS lesions than other subtypes 48,49 . Our results indicate that DCIS in general possesses characteristics that resemble those of invasive tumors of the same subtype.
It would therefore seem uncontroversial to hypothesize that a DCIS with basal-like characteristics will progress to a basal-like cancer with its well-known characteristics. However, our results indicate that many basal-like DCIS resemble the less aggressive non-core basal invasive tumors, and we therefore speculate that patients diagnosed with basal-like DCIS do not carry high-risk tumors. They may potentially be slow-growing tumors that never progress to an invasive tumor within the lifetime of the patient 50 . This may have a profound impact on how we perceive DCIS and, not least, on how they should be treated.
A limitation of this study is the lack of follow-up information on recurrence or survival. Hence, our results need to be validated in a DCIS cohort with more extensive clinical follow-up information. Also, the subtype-stratified approach that we have employed reduces the number of samples in each group, which may preclude statistically significant results. The limited availability of small and low-grade DCIS for molecular analysis may artificially skew the cohort towards large or high-grade DCIS that may not be representative of the DCIS present in the population. Nevertheless, our study has reaffirmed the necessity of taking a subtype-specific approach when studying progression of DCIS, and we have demonstrated substantial differences between basal-like DCIS and IBC that call into question the role of basal-like DCIS as precursor lesions to invasive breast carcinoma.

METHODS
Tissue samples
This study includes gene expression, DNA copy number and DNA methylation data from 57 DCIS and 313 IBC cases. All samples were obtained from individual patients, i.e., none of the samples represents paired (synchronous) lesions from the same patient. DCIS lesions are from patients with no concurrent invasive disease ("pure" DCIS). Samples were fresh frozen tissue collected from three different patient cohorts, of which two ("Uppsala" and "Oslo2") have been previously published [51][52][53][54][55][56] . The third cohort ("Milan") has not been previously published and includes fresh frozen tissue from a total of 34 breast tumors. Histopathological evaluation of H&E-stained tissue sections was performed by a trained pathologist. Normal breast tissue samples were obtained as core biopsies from women without breast cancer 57 .

DNA and RNA isolation
Total RNA and DNA were isolated using the QIAcube system with the AllPrep DNA/RNA Universal Kit (cat. no. 80224, Qiagen, Hilden, Germany) with 30 mg of tissue as input. The tissue was manually minced with a scalpel on ice, followed by homogenization using TissueLyser LT and QIAshredder (Qiagen). RNA and DNA extraction was performed according to the protocol provided by the supplier. Nucleic acid concentrations were measured on a NanoDrop ND-1000 spectrophotometer (Thermo Scientific, Wilmington, DE, USA), and RNA integrity was analyzed using an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, USA).

Gene expression analysis
To obtain whole-genome expression data 58 , Agilent SurePrint G3 Human Gene Expression 8 × 60K microarrays (G4851A) (Agilent Technologies, Santa Clara, USA) were used with the Low Input Quick Amp Labeling protocol. RNA input was 40 ng, and Cy3 was used as fluorophore. Quality control (QC) was performed in Agilent's Feature Extraction software. From the Milan cohort, five invasive breast carcinomas and 28 DCIS were successfully analyzed and passed all QC criteria, while one DCIS failed QC. As a control, one sample of commercially available normal breast RNA (Ambion Human Breast Total RNA, Thermo Fisher Scientific, Wilmington, DE, USA) was included throughout the whole experimental pipeline. The same microarray platform had been used for the two other patient cohorts. Data from all three cohorts were normalized together using quantile normalization. For genes represented by more than one probe, the mean expression was calculated to obtain one expression value per gene.
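The joint normalization and probe-collapsing steps above can be sketched as follows. This is a minimal Python/NumPy illustration, assuming a probes × samples matrix of log-intensities; the software actually used for quantile normalization is not stated, and unlike some implementations this sketch breaks ties by order of appearance rather than averaging them. The function names are hypothetical.

```python
import numpy as np

def quantile_normalize(x):
    """Quantile-normalize the columns (samples) of a probes x samples matrix.

    Every column receives the same distribution: the mean of the sorted
    columns, assigned back to each probe according to its within-column rank.
    Ties are broken by order of appearance (a simplification)."""
    order = np.argsort(x, axis=0, kind="mergesort")                 # per-sample sort order
    reference = np.take_along_axis(x, order, axis=0).mean(axis=1)   # mean value at each rank
    out = np.empty(x.shape, dtype=float)
    for j in range(x.shape[1]):
        out[order[:, j], j] = reference                             # put reference values back by rank
    return out

def collapse_probes(values, probe_genes):
    """Average all probes mapping to the same gene, giving one row per gene."""
    genes = sorted(set(probe_genes))
    rows = {g: [i for i, pg in enumerate(probe_genes) if pg == g] for g in genes}
    collapsed = np.vstack([values[rows[g]].mean(axis=0) for g in genes])
    return genes, collapsed
```

Following the order described above, normalization would be applied to the combined matrix of all three cohorts before probes are collapsed to genes.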
Genome-wide methylation
DNA methylation data 59 were obtained using the Illumina Infinium HumanMethylation450K microarray (Illumina, Inc., CA, USA) following the manufacturer's instructions. Data were preprocessed using subset quantile normalization 60 . The resulting β-value represents the fraction of methylated DNA molecules at a specific CpG. Quality control of β-values was performed as described by Wilhelm-Benartzi et al. 61 : β-values with detection p-values higher than 0.05 (0.225% of the β-values) were replaced by NA. CpG sites where more than 25% of the β-values failed quality control were removed from the analysis, resulting in 436,162 reliable CpGs in the final dataset. NA values were imputed using the R function impute.knn with default parameters.
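A minimal sketch of the β-value quality control described above, assuming CpGs × samples NumPy arrays of β-values and detection p-values (the function name `qc_beta` is hypothetical). The subsequent KNN imputation was performed with the R function impute.knn and is deliberately not reproduced here.

```python
import numpy as np

def qc_beta(beta, detection_p, p_cutoff=0.05, max_fail_frac=0.25):
    """Mask unreliable beta values and drop CpGs that fail too often.

    beta, detection_p: CpGs x samples arrays of equal shape.
    Returns the filtered beta matrix (NaN marks masked values) and a
    boolean mask of the retained CpGs."""
    beta = np.where(detection_p > p_cutoff, np.nan, beta)  # detection p > 0.05 -> NA
    fail_frac = np.isnan(beta).mean(axis=1)                # fraction of failed values per CpG
    keep = fail_frac <= max_fail_frac                      # remove CpGs with > 25% failed values
    return beta[keep], keep
```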
For the initial part of the analysis, we obtained methylation profiles by performing PCA separately for each gene. All CpGs within the gene or within 50 kb upstream or downstream of the gene were included. Each sample's score on the first principal component represents the gene's methylation profile. This method yields one value per gene per sample while preserving as much information as possible from the CpGs representing each gene.
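The per-gene summarization can be sketched as below. This assumes CpG coordinates on the gene's chromosome and that CpGs are only mean-centered (not scaled) before PCA; both are assumptions, as is orienting the first component so that its loadings sum to a positive value (the sign of a principal component is otherwise arbitrary). The function name is hypothetical.

```python
import numpy as np

def gene_methylation_profile(beta, cpg_pos, gene_start, gene_end, flank=50_000):
    """PC1 score per sample summarizing the CpGs in and around one gene.

    beta: CpGs x samples matrix of beta values; cpg_pos: CpG coordinates on
    the gene's chromosome. CpGs inside the gene or within `flank` bp of it
    are used. Returns one value per sample."""
    sel = (cpg_pos >= gene_start - flank) & (cpg_pos <= gene_end + flank)
    if not sel.any():
        raise ValueError("no CpGs in the gene window")
    x = beta[sel].T                      # samples x selected CpGs
    x = x - x.mean(axis=0)               # center each CpG across samples
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    loadings = vt[0]                     # first principal axis
    if loadings.sum() < 0:               # orient so a higher score ~ more methylation (assumption)
        loadings = -loadings
    return x @ loadings                  # PC1 scores, one per sample
```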

Copy number aberrations analysis
Copy number data 62 were obtained using Affymetrix SNP 6.0 arrays (Affymetrix, Santa Clara, CA, USA) at Aros Applied Biotechnology (Aarhus, Denmark) following the manufacturer's instructions. CEL files were processed using the PennCNV-Affy library 63 with the HapMap samples as reference set 64 and corrected for GC content 65 . The data were segmented using the PCF algorithm with arguments kmin = 5 and gamma = 100 in the R copynumber package 66 . Each gene was assigned the copy number of the segment overlapping it the most. Ploidy and tumor percentage were calculated using ASCAT 28 . In short, ASCAT can accurately dissect the allele-specific copy number of solid tumors while simultaneously estimating tumor ploidy and non-aberrant cell admixture. The genome instability index (GII) was derived by calculating the fraction of the genome affected by copy number changes.
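The gene-level assignment and GII described above can be sketched in plain Python. The `Segment` class, the half-open coordinates and the `is_aberrant` callback are hypothetical; the text does not state how a segment was called as changed when computing GII, so any threshold passed in (for example |log2 ratio| > 0.2) is an illustrative assumption.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    chrom: str
    start: int    # 0-based, inclusive
    end: int      # exclusive
    value: float  # segment copy number estimate, e.g. mean log2 ratio

def overlap(a_start, a_end, b_start, b_end):
    """Length of the overlap between two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))

def gene_copy_number(segments, chrom, gene_start, gene_end):
    """Value of the segment that overlaps the gene the most (None if none does)."""
    best_value, best_overlap = None, 0
    for seg in segments:
        if seg.chrom != chrom:
            continue
        ov = overlap(seg.start, seg.end, gene_start, gene_end)
        if ov > best_overlap:
            best_value, best_overlap = seg.value, ov
    return best_value

def genome_instability_index(segments, is_aberrant):
    """Fraction of the segmented genome whose segments are called aberrant."""
    total = sum(s.end - s.start for s in segments)
    altered = sum(s.end - s.start for s in segments if is_aberrant(s.value))
    return altered / total if total else 0.0
```

For example, `genome_instability_index(segments, lambda v: abs(v) > 0.2)` would return the length-weighted fraction of segments exceeding that hypothetical cut-off.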
PAM50 centroid-based subtype method for breast cancer
PAM50 subtyping, as described in Parker et al. 27 , uses gene-centered expression data from 50 genes. Using Spearman correlation, we correlated the gene expression data of each tumor sample with the published centroids and assigned the subtype with the highest correlation coefficient. Note that this PAM50 classifier requires the cohort to have a proportion of ER-positive tumors similar to that of the original training cohort 67 . In the training cohort, about 60% of tumors are ER-positive, and the centering value for each gene can be described as follows:
$$\mathrm{Mean}_{\text{all patients}} = 0.6\,\mathrm{Mean}_{\text{ER+ patients}} + 0.4\,\mathrm{Mean}_{\text{ER}-\text{ patients}}$$
Since the proportion of ER-positive patients is higher than 60% in the cohorts included in this study, we adjusted our cohort to the training cohort by calculating the mean for the ER-positive and ER-negative tumors separately before calculating the overall mean according to the formula above. ER status was determined from the ESR1 gene expression value, which showed a distinct bimodal distribution, enabling a reliable cut-off to be set. Concordance between ER status derived by IHC and by ESR1 expression was high, with 98% of the tumors (320/327) concurring. Progesterone receptor (PR) status was derived from PGR expression in the same way as for ER (Supplementary Data 1).
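A sketch of the ER-balanced gene centering and centroid assignment, in Python/NumPy. The centroid vectors in any call are placeholders (the published Parker et al. centroids are not reproduced here), the function names are hypothetical, and Spearman correlation is computed as the Pearson correlation of average ranks.

```python
import numpy as np

def average_ranks(v):
    """Ranks 1..n, with tied values given their average rank."""
    order = np.argsort(v, kind="mergesort")
    sorted_v = v[order]
    ranks = np.empty(len(v), dtype=float)
    i = 0
    while i < len(v):
        j = i
        while j + 1 < len(v) and sorted_v[j + 1] == sorted_v[i]:
            j += 1                                    # extend the run of tied values
        ranks[order[i:j + 1]] = (i + j) / 2 + 1       # average rank of the tie block
        i = j + 1
    return ranks

def spearman(a, b):
    ra = average_ranks(np.asarray(a, dtype=float))
    rb = average_ranks(np.asarray(b, dtype=float))
    return float(np.corrcoef(ra, rb)[0, 1])

def er_balanced_center(expr, er_pos, w_pos=0.6):
    """Center each gene (row) on 0.6 * mean(ER+) + 0.4 * mean(ER-)."""
    mean_pos = expr[:, er_pos].mean(axis=1)
    mean_neg = expr[:, ~er_pos].mean(axis=1)
    center = w_pos * mean_pos + (1 - w_pos) * mean_neg
    return expr - center[:, None]

def assign_subtypes(centered, centroids):
    """centroids: dict subtype -> vector over the same genes (rows) as `centered`.

    Returns the best-correlated subtype per sample and all correlations."""
    names = list(centroids)
    calls, cors = [], []
    for j in range(centered.shape[1]):
        r = [spearman(centered[:, j], centroids[n]) for n in names]
        calls.append(names[int(np.argmax(r))])
        cors.append(dict(zip(names, r)))
    return calls, cors
```

Keeping all correlations, not only the best call, is what allows second-best subtypes and thresholds such as the >0.6 core basal cut-off to be examined afterwards.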

Gene expression-based tumor scores
Proliferation scores were calculated using an 11-gene proliferation signature 68 and EMT scores were calculated using an EMT signature based on four adhesion genes (weighted negatively) and seven EMT-genes (weighted positively) (Supplementary Data 1): For each gene and sample, a standard (Z) score was calculated, then the proliferation/EMT-scores were obtained for every tumor by calculating the mean of all Z-scores across all genes in the signature. Differentiation scores were derived using the differentiation predictor described in Prat et al. 69 and immune and stromal infiltration scores were calculated using ESTIMATE 70 .
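The Z-score signature scoring above can be sketched as follows. Gene names in any call are placeholders (the actual signature genes are listed in Supplementary Data 1), and the use of the n − 1 denominator for the standard deviation, as in R's `sd`, is an assumption about the original computation.

```python
import numpy as np

def signature_score(expr, genes, gene_index, negative=frozenset()):
    """Mean per-gene Z-score over a signature, for every sample.

    expr: genes x samples matrix; gene_index maps gene name -> row.
    Genes in `negative` (e.g. the adhesion genes of the EMT signature)
    have their Z-scores sign-flipped before averaging."""
    x = expr[[gene_index[g] for g in genes]]
    z = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, ddof=1, keepdims=True)
    signs = np.array([-1.0 if g in negative else 1.0 for g in genes])
    return (z * signs[:, None]).mean(axis=0)
```

A proliferation score would be computed with `negative` left empty, whereas the EMT score would pass the four adhesion genes as `negative`.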

Differential methylation
Genes differentially methylated between DCIS and IBC were identified using Mann-Whitney U tests separately for each subtype. False discovery rate (FDR) correction was used to account for multiple testing. Cut-offs for identifying differentially methylated genes were set on both FDR and effect size (defined as the absolute difference in medians between DCIS and IBC) to increase the likelihood of finding biologically relevant differences between the two groups. We included genes with FDR < 0.05 and an effect size within the top 20% (corresponding to a cut-off > 0.127). Mean cPCDH methylation was calculated for each tissue sample (tumor and normal tissue) as the mean of the standard (Z) scores for all relevant CpGs.
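A sketch of the per-subtype differential methylation filter and of the mean cPCDH Z-score, under several assumptions: the FDR correction is taken to be Benjamini-Hochberg (the text says only "false discovery rate"), the Mann-Whitney p-value uses the normal approximation with tie and continuity corrections (R's `wilcox.test` switches to an exact test for small samples without ties), and the top-20% effect-size cut-off is computed within the gene set passed in. All function names are hypothetical.

```python
import math
import numpy as np

def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U p-value (normal approximation, tie + continuity corrected)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n1, n2 = len(x), len(y)
    v = np.concatenate([x, y])
    n = n1 + n2
    # average rank = (# smaller values) + (# equal values incl. itself + 1) / 2
    ranks = (v[:, None] > v[None, :]).sum(axis=1) + ((v[:, None] == v[None, :]).sum(axis=1) + 1) / 2
    u = ranks[:n1].sum() - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    _, counts = np.unique(v, return_counts=True)
    tie = float((counts ** 3 - counts).sum())
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie / (n * (n - 1))))
    if sigma == 0:
        return 1.0
    d = u - mu
    z = (abs(d) - (0.5 if d != 0 else 0.0)) / sigma
    return math.erfc(abs(z) / math.sqrt(2))

def bh_fdr(p):
    """Benjamini-Hochberg adjusted p-values."""
    p = np.asarray(p, float)
    m = len(p)
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    q = np.minimum.accumulate(scaled[::-1])[::-1]   # enforce monotonicity from the top
    out = np.empty(m)
    out[order] = np.minimum(q, 1.0)
    return out

def differential_genes(dcis, ibc, fdr_cut=0.05, top_frac=0.2):
    """dcis, ibc: genes x samples methylation profiles for ONE subtype."""
    p = np.array([mann_whitney_p(a, b) for a, b in zip(dcis, ibc)])
    q = bh_fdr(p)
    effect = np.abs(np.median(dcis, axis=1) - np.median(ibc, axis=1))
    effect_cut = np.quantile(effect, 1 - top_frac)   # top 20% of effect sizes
    return (q < fdr_cut) & (effect > effect_cut), q, effect

def mean_z_score(beta_window):
    """Per-sample mean of CpG-wise Z-scores (CpGs x samples), e.g. over cPCDH CpGs."""
    mu = beta_window.mean(axis=1, keepdims=True)
    sd = beta_window.std(axis=1, ddof=1, keepdims=True)
    return ((beta_window - mu) / sd).mean(axis=0)
```

`differential_genes` would be called once per subtype; which samples (tumors only, or tumors plus normal tissue) define the CpG-wise mean and standard deviation in `mean_z_score` is a further assumption.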

Statistical and bioinformatic analyses
All statistical analyses were conducted in R 71 unless otherwise specified. Heatmaps were created using the R package ComplexHeatmap 72 , and other plots were created using the package ggplot2 73 . Fisher's exact tests were used to compare the distribution of subtype and ER status between the two tumor types. Two-sided Mann-Whitney U tests were used to compare tumor content, GII, proliferation scores, differentiation scores, immune scores, stromal scores, EMT scores and mean cPCDH methylation between DCIS and IBC separately for each subtype. Correlation between cPCDH methylation and tumor percentage was calculated using Spearman correlation.

Reporting summary
Further information on experimental design is available in the Nature Research Reporting Summary linked to this paper.

DATA AVAILABILITY
The data generated and analyzed during this study are described in the following metadata record: https://doi.org/10.6084/m9.figshare.12293102 74 . Gene expression, copy number and DNA methylation data from the Oslo2 and Uppsala tumor cohorts and DNA methylation data from normal tissue samples, analyzed during this study, have previously been published and are publicly available at Gene Expression Omnibus: https://identifiers.org/geo:GSE80999 53 , https://identifiers.org/geo:GSE59248 55 , https://identifiers.org/geo:GSE60185 54 , and at the European Genome-phenome Archive (EGA): https://identifiers.org/ega.dataset:EGAD00010000942 56 . The data of the Milan cohort, generated during this study, are available at the European Genome-phenome Archive (EGA): https://identifiers.org/ega.dataset:EGAD00010001863 62 (DNA copy number data), https://identifiers.org/ega.dataset:EGAD00010001864 58 (gene expression data) and https://identifiers.org/ega.dataset:EGAD00010001865 59 (DNA methylation data). Due to the European General Data Protection Regulation, the processed datasets in .Rdata file format are not publicly available but can be made available on reasonable request from the corresponding author, Dr. Therese Sørlie, email: tsorlie@rr-research.no. To access the data, researchers must complete an institutional agreement, and it must be verified that the research to be conducted is covered by the current study's ethical approval and the patients' consents.