Comprehensive characterization of claudin-low breast tumors reflects the impact of the cell-of-origin on cancer evolution

Claudin-low breast cancers are aggressive tumors defined by low expression of key components of cellular junctions and associated with mesenchymal and stemness features. Although they are generally considered the most primitive breast malignancies, their histogenesis remains elusive. Here we show that this molecular subtype of breast cancer exhibits significant diversity, comprising three main subgroups that emerge from distinct evolutionary processes. Genetic, gene methylation and gene expression analyses reveal that two of the subgroups relate, respectively, to luminal and basal-like breast cancers through the activation of an EMT process over the course of tumor progression. The third subgroup is closely related to normal human mammary stem cells. This unique subgroup of breast cancers shows a paucity of genomic aberrations and a low frequency of TP53 mutations, supporting the emerging notion that the intrinsic properties of the cell-of-origin are a major determinant of the genetic history of tumorigenesis.

From Fig6G and summary Fig7, it seems clear that CL2 and CL3 are very related to Luminal and Basal-like subtypes. An arrow with EMT is then built. Is the EMT observed by the authors a difference in stromal/immune composition of these tumors? Increased Cancer associated Fibroblast in CL2 and CL3 could 'produce' these EMT related enrichment in pathways. How can the author address this? Finally see: https://www.biorxiv.org/content/10.1101/756411v1 and maybe comment not necessary in the text but at least for reviewers and editors how the results of this analysis is comparing to that reference.
Claudin-low is not a very well-defined subtype in breast cancer; as the authors observe early in their analyses, tumor purity is low. Further subdividing a poorly characterized subtype into three subgroups is risky, especially when only a fraction of this poorly characterized subtype is subdivided. In addition, the stromal/immune component of claudin-low tumors seems to be important, as tumor purity is most often <50%. Therefore, drawing conclusions from the bulk molecular data of these tumors is also very risky, especially when this stromal/immune component is ignored in the analysis. It would also be important to justify the temporal model described in the discussion and in the last figure of the paper. What in the present analyses allows the authors to build this time succession of events? This would be very useful for the reader.
Reviewer #2: Remarks to the Author: In this manuscript, the authors examine claudin-low tumors in the METABRIC dataset, an aggressive subtype defined by low expression of cell junction components and associated with mesenchymal and stem features. Because gene expression of claudin-low tumors is similar to that of normal stroma, the researchers used allele-specific copy number analysis (ASCAT) to estimate tumor cell fraction (purity) and applied a stringent threshold. They characterized 42 claudin-low tumors by gene expression and copy number alterations and show a diversity of clinical subtypes, integrative cluster membership and fraction of genome altered (FGA). The authors employed a clever method, analyzing B allele frequencies and log-R ratios of SNPs located in genomic regions of deletion, to demonstrate that claudin-low tumors with low FGA scores are likely not explained by normal contamination but are rather tumors lacking gross chromosomal instability. Using FGA, they divided the claudin-low subtype into three subgroups (CL1, CL2 and CL3). CL1, defined by a low FGA (< 10%), was mostly composed of ER-negative tumors that were all stratified into integrative cluster 4; CL2 was mostly composed of ER-positive tumors mainly stratified into two luminal-related clusters; and CL3 displayed a high FGA (> 30%) and was classified into the genomically unstable cluster 10. The authors then demonstrate that each CL subgroup displays unique gene expression, pathway enrichment and methylation patterns. These data, together with correlations to normal mammary cell lineages, suggest different cells of origin for the various claudin-low subgroups and explain the variety of clinical subtypes in claudin-low tumors. The researchers also identify a potential dependency on MAPK signaling in claudin-low tumors, with cell lines showing greater sensitivity to MEK inhibitors, uncovering a potential future therapeutic option.
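The deletion-region argument rests on how tumor purity shapes SNP array signals. Under the ASCAT model, a segment with allele-specific tumor copy numbers n_A and n_B, tumor purity rho and tumor ploidy psi has an expected log-R ratio of gamma * log2((2(1 - rho) + rho(n_A + n_B)) / (2(1 - rho) + rho * psi)) and an expected B-allele frequency of (1 - rho + rho * n_B) / (2(1 - rho) + rho(n_A + n_B)) at germline-heterozygous SNPs. The minimal Python sketch below evaluates both for a hemizygous deletion (n_A = 1, n_B = 0). The function names and the default gamma = 1.0 are illustrative assumptions (gamma is platform-dependent); this is not the code used in the study.

```python
import math


def _mixture_copies(rho, n_total):
    """Average copy number at a locus in a tumor/normal mixture (normal cells are diploid)."""
    copies = 2 * (1 - rho) + rho * n_total
    if copies == 0:
        raise ValueError("no DNA left at this locus (pure homozygous deletion)")
    return copies


def expected_logr(rho, psi, n_a, n_b, gamma=1.0):
    """Expected log-R ratio of a segment under the ASCAT model.

    rho: tumor purity (0..1); psi: tumor ploidy; n_a, n_b: allele-specific
    tumor copy numbers; gamma: platform-dependent compaction factor
    (1.0 is an illustrative default, not a value taken from the study).
    """
    reference = 2 * (1 - rho) + rho * psi
    return gamma * math.log2(_mixture_copies(rho, n_a + n_b) / reference)


def expected_baf(rho, n_a, n_b):
    """Expected B-allele frequency at a germline-heterozygous SNP in the segment."""
    return (1 - rho + rho * n_b) / _mixture_copies(rho, n_a + n_b)


# Hemizygous deletion (one allele lost) in a diploid tumor at decreasing purity.
for rho in (1.0, 0.7, 0.4):
    print(f"purity={rho:.1f}  BAF={expected_baf(rho, 1, 0):.3f}  "
          f"logR={expected_logr(rho, 2.0, 1, 0):.3f}")
```

As purity drops, the BAF of the deleted segment moves from 0 back toward 0.5 and the log-R drop shrinks, which is why clear allelic imbalance in the few deleted regions of a low-FGA tumor argues against heavy normal contamination.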
Specific comments: 1) Unstratified claudin-low tumors are associated with worse overall survival, likely due to the enrichment of TNBC tumors. Are there differences in survival when they are sub-stratified into CL1, CL2 and CL3? To achieve sufficient statistical power, METABRIC and TCGA may need to be combined in the survival analysis.
2) Given that claudin-low tumors exhibit marked immune and stromal cell infiltration, were there any differences in immune cell composition among CL1, CL2 and CL3? The authors should consider using gene expression-based estimates (ESTIMATE) and pathological evaluation of H&E sections, which are available for at least the TCGA dataset (https://cancer.digitalslidearchive.org/). In addition, the authors should consider estimating immune cell composition with CIBERSORT or xCell.
3) After the purity threshold, were the 1270 remaining tumors re-subtyped for PAM50 with genefu? Subtype calls for the same samples have the potential to shift greatly when subsets are analyzed.
4) Was there any enrichment in unique histological subtypes among the claudin-low subtypes?

"In performing a comparison to non-Claudin-low tumors, the authors exclude the Claudin-low tumors with 'outlier' tumor purity very early in their analyses. They just choose to analyze the remaining 45 claudin-low tumors with sufficiently high tumor purity. Given that tumor subtypes are increasingly characterized by their microenvironment, this first step in the analysis is discussable."
We agree with Reviewer #1 that claudin-low tumors are generally considered to be highly infiltrated by non-tumor cells of the microenvironment. In this regard, it is important to stress that the main objective of our study (as highlighted in the introduction of the manuscript) was not to characterize claudin-low tumors in general, but rather to decipher the genomic architecture of these malignancies as a way to gain insight into their developmental origin. Hence, a purity-based selection of tumor samples was an absolute necessity (as also underlined by Reviewer #2) to unequivocally demonstrate the existence of three distinct claudin-low subgroups, and more specifically the existence of claudin-low tumors exhibiting a low level of chromosomal instability. The latter point is of critical importance because samples with a paucity of genomic aberrations are generally considered to be tumors with massive contamination. Of note, as now mentioned in the revised version of the manuscript, while most of our analyses were indeed performed on tumors selected for their high tumor purity, the 3 subgroups of claudin-low tumors are also found when studying the whole cohort of claudin-low tumors (Supplementary Fig. 4).
We agree with the reviewer that the number of samples analyzed following the purity-based selection was relatively low, though, as discussed in #1.1., this selection was an absolute necessity to decipher the genomic architecture of claudin-low tumors, the main objective of our study.
To verify that the purity-selected tumor cohort was representative of the whole cohort (without purity selection), we checked that the distribution of TNBCs across the molecular subtypes was not biased by this selection, as illustrated in Supplementary Figure 1c.
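A representativeness check of this kind can be quantified along the following lines. This is a hypothetical Python sketch, not the procedure behind Supplementary Figure 1c: each sample is a (subtype, is_tnbc, purity) tuple, `min_purity` stands in for the ASCAT purity cut-off (whose value is not restated here), the total variation distance is an illustrative choice of summary statistic, and the toy samples are made up.

```python
from collections import Counter


def tnbc_distribution(samples, min_purity=None):
    """Fraction of TNBC samples assigned to each molecular subtype.

    samples: iterable of (subtype, is_tnbc, purity) tuples.
    min_purity: hypothetical purity cut-off; when given, samples below it
    are dropped before counting.
    """
    kept = [s for s in samples if min_purity is None or s[2] >= min_purity]
    counts = Counter(subtype for subtype, is_tnbc, _ in kept if is_tnbc)
    total = sum(counts.values())
    return {subtype: n / total for subtype, n in counts.items()} if total else {}


def total_variation(p, q):
    """Total variation distance between two discrete distributions given as dicts."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Made-up toy samples, for illustration only.
toy = [("Basal", True, 0.8), ("Basal", True, 0.5), ("Claudin-low", True, 0.3),
       ("Claudin-low", True, 0.7), ("LumA", False, 0.9), ("Her2", True, 0.6)]
before = tnbc_distribution(toy)
after = tnbc_distribution(toy, min_purity=0.4)
print(before, after, round(total_variation(before, after), 3))
```

A distance close to 0 would indicate that the purity selection leaves the distribution of TNBCs across subtypes essentially unchanged.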

"Second, it appears that, in general, Claudin-low tumors have very low tumor percent. Have the authors checked whether tumor percent differs within the subgroups of Claudin-low discovered here? By this I mean: do Claudin-low tumors in IC3 have the same tumor percent as non-claudin-low tumors in IC3, and likewise for IC4 and IC10?"
If we understand correctly, Reviewer #1 raises two different questions.
First, is tumor purity similar across the three claudin-low subgroups? This is shown in Figure 5 of the manuscript (purity row of the heatmap annotation), in which we conducted an unsupervised clustering analysis based on the ASCAT median purity score of each subtype. CL1 presents a higher purity score than CL2 and CL3 in the METABRIC cohort, and a slightly lower purity score than CL2 and CL3 in the TCGA cohort. Nevertheless, this difference in purity score does not affect the clustering of the three subgroups.
Reviewer #1's second question relates to the comparison of tumor purity between claudin-low and non-claudin-low tumors within the integrative clusters, previously defined by Curtis and

"While the authors use all non-claudin low tumors to exclude the Claudin low with the lowest tumor percent, the rest of the 30% Claudin-low analyzed (the one with decent purity) are not called claudin-low because of a lower tumor percent according to the main subtype they 'derive' from? In other words: from the literature appears that the definition of claudin low subtype is intermingled with the microenvironment, so excluding it by setting filter on tumor purity the authors risk to produce skewed results." We are not sure we understand Reviewer #1's concerns and expectations, given the reviewer's two consecutive statements claiming that our claudin-low selection is both (1) not stringent enough, suspecting that claudin-low tumors are "called claudin-low because of a lower tumor percent"; and (2) too stringent, as excluding the microenvironment by setting a filter on tumor purity "risks to produce skewed results".
Considering the first issue, it is noticeable that CL2 and CL3 display a statistically different purity from their non-claudin-low counterparts (CL2 vs luminal, p=0.03; CL3 vs basal, p=0.04), due to their intrinsic we identified luminal and basal tumor samples with a low degree of purity, indicating that the attribution of a tumor sample to a molecular subtype is not biased by low tumor purity.
Considering the second issue, as explained above, the main objective of our study is to decipher the genomic architecture of these malignancies rather than to characterize claudin-low tumors and their microenvironment in general. Nevertheless, we fully agree with both Reviewers #1 and #2 that we cannot overlook the microenvironment, as it might influence the development of the tumors. Fig. 9a-d). Nevertheless, unsupervised clustering of microenvironment signatures (immune and non-immune cell subsets) did not discriminate between the claudin-low subgroups (Supplementary Fig. 9e-h).
Overall, we agree that the purity selection strategy chosen for this study restricts the analysis of the full spectrum of claudin-low characteristics, but it was nonetheless indispensable to reveal their unique genomic architecture. In particular, this analysis demonstrated the existence of claudin-low tumors with a very low FGA. Moreover, we show that, while most of our analyses were indeed performed on tumors selected for their high tumor purity, the 3 subgroups of claudin-low tumors are also found when studying the whole cohort of claudin-low tumors (Supplementary Fig. 4).
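For readers less familiar with FGA, here is a minimal Python sketch of how it is commonly computed, assuming segments are given as non-overlapping (start, end, total_copy_number) tuples and that a segment counts as altered when its copy number differs from a baseline of 2. Both are assumptions; FGA definitions vary (for example, log-R thresholds are also used). The 10% and 30% cut-offs are those quoted by Reviewer #2 for CL1 and CL3; the sketch only bins FGA and does not reproduce the subgroup assignment.

```python
def fraction_genome_altered(segments, baseline=2):
    """Fraction of the profiled genome (in bp) covered by altered segments.

    segments: non-overlapping (start, end, total_copy_number) tuples, end exclusive;
    coordinates only enter through segment lengths, so segments from different
    chromosomes can be pooled.
    baseline: copy number regarded as unaltered (2 is an assumption).
    """
    total = sum(end - start for start, end, _ in segments)
    altered = sum(end - start for start, end, cn in segments if cn != baseline)
    return altered / total if total else 0.0


def fga_band(fga, low=0.10, high=0.30):
    """Bin an FGA value using the <10% and >30% cut-offs quoted by Reviewer #2."""
    if fga < low:
        return "low"
    if fga > high:
        return "high"
    return "intermediate"


quiet = [(0, 950_000, 2), (950_000, 1_000_000, 1)]      # toy profile, 5% altered
unstable = [(0, 500_000, 2), (500_000, 1_000_000, 4)]   # toy profile, 50% altered
print(fga_band(fraction_genome_altered(quiet)), fga_band(fraction_genome_altered(unstable)))
```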

"Concerning the GSEA and the ssGSEA, would the authors have obtained the same results by comparing normal breast tissue, versus luminal versus basal tumors?"
Reviewer #1 may be suggesting here that the CL1 subgroup could be normal breast tissue, and that CL2 and CL3 could be luminal and basal tumors highly contaminated by non-tumor cells, calling into question the existence of claudin-low tumors as a true molecular subtype. Regarding the CL1 subgroup, we have addressed this critical issue in Figure 1b

"It is a bit difficult to understand the methylation analysis. It seems that in the heatmaps in Fig4 the authors plot the Beta-values for DNA methylation. Beta-values are usually [-1, 1]. The values in this plot seem different. In addition, it is not described how the enrichment analysis is performed afterward."
We are surprised by this comment since the detailed methylation and enrichment analysis strategy

"The authors could make better use of their differential methylation analysis and assess whether differentially methylated CpGs are in the binding site of specific transcription factor which may explain the biology between Claudin low in general or CLs subtypes."
The question of the role of methylation in the binding of specific transcription factors, and of how it affects the biology of claudin-low subtypes, is interesting, though we believe it is beyond the scope of our current manuscript and would require a dedicated investigation. Indeed, we chose to perform a functional methylation analysis by selecting the genes whose methylation level was negatively correlated with gene expression. The methylation data used were therefore pre-processed at the gene level rather than at the probe level, which did not allow a comprehensive interrogation of the methylated CpGs located in the binding sites of specific transcription factors targeting these genes.
Overall, we strongly believe that the analysis reported in the manuscript unequivocally demonstrates the distinct methylation profiles of claudin-low subgroups, correlated with gene expression.
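The gene-level selection described above can be sketched as follows. This is a hypothetical Python illustration, not the study's pipeline: it uses Pearson correlation between gene-level methylation (beta) values and expression across the same samples, and the cut-off `max_r = -0.3` is an assumption; the correlation measure and threshold actually used are not specified here. The gene names and values are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def functionally_methylated(methylation, expression, max_r=-0.3):
    """Genes whose gene-level methylation is negatively correlated with expression.

    methylation, expression: dicts gene -> list of values over the same samples.
    max_r: hypothetical correlation cut-off.
    """
    selected = {}
    for gene, beta in methylation.items():
        if gene not in expression:
            continue
        r = pearson(beta, expression[gene])
        if r <= max_r:  # NaN compares False, so constant profiles are skipped
            selected[gene] = r
    return selected


meth = {"GENE_A": [0.1, 0.4, 0.7, 0.9], "GENE_B": [0.2, 0.3, 0.5, 0.6]}
expr = {"GENE_A": [8.0, 6.5, 4.0, 2.0], "GENE_B": [3.0, 3.5, 4.5, 5.0]}
print(functionally_methylated(meth, expr))
```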

"From Fig6G and summary Fig7, it seems clear that CL2 and CL3 are very related to Luminal and Basal-like subtypes. An arrow with EMT is then built. Is the EMT observed by the authors a difference in stromal/immune composition of these tumors?"
Since this comment is also addressed in the concluding remarks of Reviewer #1, we will provide a detailed explanation below (see #1.17.).

"Increased Cancer associated Fibroblast in CL2 and CL3 could 'produce' these EMT related enrichment in pathways. How can the author address this?"
As indicated in #1.5, we have assessed the microenvironment composition through gene expression analysis using a signature-based deconvolution method. Results are shown in a new supplementary figure, referred to as Supplementary Figure 9 in the revised manuscript. The analysis did not reveal any enrichment of the cancer-associated fibroblast signature in CL2 or CL3 compared with luminal and basal tumors, respectively (Supplementary Figure 9a-d).
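As an illustration of the general idea behind marker-based microenvironment scoring (not necessarily the deconvolution method used for Supplementary Figure 9), here is a Python sketch that scores each sample as the mean log2(x + 1) expression of a signature's marker genes and averages the scores per group. The marker list (FAP, COL1A1 and PDGFRB, commonly cited fibroblast markers) and the toy expression values are illustrative assumptions.

```python
import math


def signature_score(sample_expr, markers):
    """Mean log2(x + 1) expression of the signature markers found in one sample.

    Returns NaN when none of the markers is measured in the sample.
    """
    values = [math.log2(sample_expr[g] + 1) for g in markers if g in sample_expr]
    return sum(values) / len(values) if values else float("nan")


def mean_group_score(cohort, markers):
    """Average signature score over a group of samples (e.g. CL2 or luminal tumors)."""
    scores = [signature_score(sample, markers) for sample in cohort]
    return sum(scores) / len(scores) if scores else float("nan")


caf_markers = ["FAP", "COL1A1", "PDGFRB"]          # illustrative marker list
group_a = [{"FAP": 3, "COL1A1": 7, "PDGFRB": 1}]   # made-up expression values
group_b = [{"FAP": 1, "COL1A1": 3}]
print(mean_group_score(group_a, caf_markers), mean_group_score(group_b, caf_markers))
```

In practice, group scores like these would then be compared with a statistical test; that step is omitted here.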

"Finally see: https://www.biorxiv.org/content/10.1101/756411v1 and maybe comment not necessary in the text but at least for reviewers and editors how the results of this analysis is comparing to that reference."
As mentioned in the cover letter accompanying our initial submission, the manuscript entitled "Re-definition of claudin-low as a breast cancer phenotype" by Fougner C et al., available on bioRxiv (doi: https://doi.org/10.1101/756411), came to our attention as their data were highly complementary to our own. We strongly believe that our data strengthen their conclusions but also provide additional
This irrefutable demonstration allowed us to extend our investigation of claudin-low diversity through a comprehensive multi-omics analysis including:
- genomic alteration (CNA and mutations) analysis (Pommier et al.: Figure 1b-c, Figure 2a-d, Figure 6d, Supplementary Figure 1a,c, Supplementary Figure 3 & Supplementary Figure 6d-
Considering the first part of Reviewer #1's concluding remark, we would like to emphasize again that the very purpose of our study was to characterize the genomic architecture of claudin-low tumors as a way to gain insight into their developmental origin. It was therefore an absolute necessity to focus our analysis on purity-selected tumors. We strongly believe that the present characterization of the 3 subgroups of claudin-low tumors, together with the first demonstration of the existence of claudin-low tumors with very low FGA, further supports the rationale of our approach. As for the second part of this comment, the illustration presented in Figure 7 of the initial version of the manuscript is not a demonstration but a model, as indicated in the discussion of the manuscript.
Figure 9 in the revised manuscript. Of note, according to our results described in Supplementary Figure 1, we used the ASCAT (allele-specific copy number analysis of tumors) copy number-based tumor purity estimation method to assess tumor cell fraction rather than the ESTIMATE gene expression-based method, which is less appropriate when analyzing tumors with mesenchymal features.
Moreover, as requested by Reviewer #2, we have analyzed the TCGA pathological evaluations of breast tumor tissue slides according to molecular subtype. As illustrated below, this analysis did not discriminate between the three claudin-low subgroups, and we decided not to include this figure in the manuscript.

"After purity threshold, were the 1270 remaining tumors re-subtyped for PAM50 with genefu? Subtype calls for the same samples have the potential to shift greatly when subsets are analyzed." The reliability of the expression-based classifiers was taken into account in our study. Indeed, to subtype breast tumor samples as robustly as possible, we used 5 different classifiers (PAM50, AIMS, SCMGENE, SSP2006 and SCMOD2) rather than PAM50 alone, as mentioned in the Methods section of the manuscript. Re-subtyping was not performed after purity
Table: Accuracy between breast cancer molecular subtype assignment before and after tumor purity threshold.
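One straightforward reading of the accuracy reported in this table is the fraction of samples whose subtype call is unchanged between the two assignments. The Python sketch below computes that fraction, overall and per subtype, assuming each assignment is a dict mapping sample identifiers to subtype labels produced by any of the classifiers mentioned. It does not run the classifiers (e.g. genefu) themselves, and the toy calls are made up.

```python
from collections import defaultdict


def subtype_concordance(before, after):
    """Agreement between two subtype assignments for the samples present in both.

    before, after: dicts sample_id -> subtype label.
    Returns (overall accuracy, per-subtype accuracy keyed by the 'before' call).
    """
    shared = [s for s in before if s in after]
    if not shared:
        return float("nan"), {}
    hits = defaultdict(int)
    totals = defaultdict(int)
    for s in shared:
        totals[before[s]] += 1
        hits[before[s]] += int(before[s] == after[s])
    overall = sum(hits.values()) / len(shared)
    return overall, {label: hits[label] / totals[label] for label in totals}


before = {"s1": "LumA", "s2": "LumA", "s3": "Basal", "s4": "Claudin-low"}
after = {"s1": "LumA", "s2": "LumB", "s3": "Basal", "s4": "Claudin-low"}
print(subtype_concordance(before, after))
```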

"Was there any enrichment in unique histological subtypes among the claudin-low subtypes?"
As also suggested by Reviewer #1 in #1.9., we have analyzed clinicopathological features (including tumor histological subtype, tumor stage, lymph node involvement, patient age, and overall and disease-specific survival) in claudin-low and non-claudin-low tumors from the combined METABRIC and TCGA cohorts after tumor purity selection. These data are now shown in a new supplementary figure, referred to as Supplementary Figure 11 in the revised manuscript.
To conclude, we greatly appreciate the thoughtful comments of the reviewers and believe that we have addressed each of them. We hope that the proposed modifications have substantially improved our manuscript, making it a valuable publication for the journal.
We look forward to hearing from you and thank you for the opportunity to resubmit our work for publication in Nature Communications.