Characterization of transcriptome diversity and in vitro behavior of primary human high-risk breast cells

Biology and transcriptomes of non-cancerous human mammary epithelial cells at risk for breast cancer development were explored following primary isolation utilizing conditional reprogramming cell technology from mastectomy tissue ipsilateral to invasive breast cancer. Cultures demonstrated consistent categorizable behaviors. Relative viability and mammosphere formation differed between samples but were stable across three different mammary-specific media. E2F cell cycle target genes expression levels were positively correlated with viability and advancing age was inversely associated. Estrogen growth response was associated with Tissue necrosis factor signaling and Interferon alpha response gene enrichment. Neoadjuvant chemotherapy exposure significantly altered transcriptomes, shifting them towards expression of genes linked to mammary stem cell formation. Breast cancer prognostic signature sets include genes that in normal development are limited to specific stages of pregnancy or the menstrual cycle. Sample transcriptomes were queried for stage specific gene expression patterns. All cancer samples and a portion of high-risk samples showed overlapping stages reflective of abnormal gene expression patterns, while other high-risk samples exhibited more stage specific patterns. In conclusion, at-risk cells preserve behavioral and transcriptome diversity that could reflect different risk profiles. It is possible that prognostic platforms analogous to those used for breast cancer could be developed for high-risk mammary cells.


Results
Relative viability and mammosphere formation were preserved properties across different media conditions inherent to individual samples. Ipsilateral high-risk CRC-viable cultures were derived from women with a range of breast cancer types. Advancing age significantly reduced likelihood of initial CRC isolation with no viable samples isolated from the six women ≥ age 70 years (Table 1, Fig. 1a). No effect from exposure to prior neoadjuvant chemotherapy was found. Eight matched invasive tumor samples available were added to the study for comparison. Secondary passage in MEGM was used to expand cultures in serumand phenol-red-free media in preparation for hormonal response testing. Age did not impact likelihood of growth in MEGM but showed an inverse relationship with measured viability scores (Fig. 1b,c). Because a range of viability was evident in MEGM, two additional mammary-specific media (EpiC and MammoC) along with CRC conditioned media (CM) were tested to determine the extent of media-specific behaviors. Although media differences within samples were identified, viability differences between samples were largely preserved across media (Fig. 1d). Viability was not significantly higher in CRC CM even though this modeled the initial isolation media, albeit without the feeder cell layer. Extent of mammosphere formation was similarly assessed in different media. Samples showed generally consistent behavior across media even with some within sample differences being present (Fig. 1e). A few samples were available for testing across passage. This showed variability between samples was preserved across passage (Suppl. Figure 1a, b). Because the matrix-free scaffold-based nano-culture plates permit both 2D and 3D growth, cell growth patterns were analyzed in each media. The majority of samples showed mixed mammosphere-monolayer cell growth across all media (Suppl. Figure 1c). Only two samples exhibited mammosphere-only growth across all tested media. Phenol-red and serum-free MEGM supported mammosphere growth in all samples. Based on these results, variability in behavior in culture was assessed as a preserved biological property inherent to each sample.
Relative expression levels of cell cycle related target genes of E2F transcription factors paralleled relative viability. Because cell viability was a preserved factor in samples, we asked if expression of genes positively regulating cell proliferation would correlate with relative viability. The HALLMARK_E2F_TAR-GETS gene set includes cell cycle related target genes of E2F transcription factors. While the highest viability samples showed relative higher expression of many E2F target genes, some samples demonstrated a more selective pattern (Fig. 2). Examples of genes that were recurrently expressed at higher levels in different viable samples included Aurora Kinase A (AURKA), Kinesin Family Member 4A (KIF4A) Stromal Antigen 1 (STAG1), CCCTC-Binding Factor (CTCF), Replication Protein A1 (RPA1), and Tissue Necrosis Factor (TNF). One sample showed higher expression of Serpin Family B Member 2 (SERPINB2). This analysis showed that differences in sample viability on secondary passage were linked to transcriptome differences present in the cells upon their initial isolation in CRC.  (Fig. 3a). Of the nine samples available for concurrent testing in 4-OHT, viability was unchanged in six. Three samples showed higher viability than in the control condition but each had a different pattern. There were no consistent differences between E2 responsive and non-responsive samples in patterns of mammosphere growth (Fig. 3b). One hundred and ninety-four genes were expressed at significantly higher levels and one hundred and forty-eight genes at significantly lower levels in the E2 responsive samples (Fig. 3c). Genes with at least two-fold differences were analyzed by GSEA for significant gene set enrichment. The HALLMARK_TNFA_SIGNALING_VIA_NFKB and HALL-MARK_INTERFERON_ALPHA_RESPONSE gene sets were enriched in the E2 responsive samples (Fig. 3d) while HALLMARK_OXIDATIVE_PHOSPHORYLATION, HALLMARK_P53_PATHWAY and HALLMARK_ CHOLESTEROL_HOMEOSTASIS gene sets were enriched in the E2 non-responsive samples (Fig. 3e). Differences in expression levels of specific genes within these enriched gene sets between samples responsive and non-responsive to E2 are illustrated using log scales (Fig. 3f,g). In summary, transcriptome differences between samples associated with absence or presence of increased viability in response to E2.
Genes linked to mammary stem cell formation were expressed at higher levels in samples from women previously exposed to neoadjuvant chemotherapy. To explore the hypothesis that exposure to neoadjuvant chemotherapy might alter gene expression in the ipsilateral non-cancer samples, we tested the number and character of differentially expressed genes (DEGs) in all ipsilateral samples available from individuals with Erb-B2 Receptor Tyrosine Kinase 2/HER2 positive (HER2+) breast cancer, half of whom had received neoadjuvant chemotherapy. The analysis was limited to this subgroup because this was the only breast cancer subtype that was consistently associated with neoadjuvant chemotherapy administration in our sample set. Limiting the analysis to samples from individuals all of whom had HER2+ invasive breast cancer enabled us to match for any theoretical differences in the ipsilateral high-risk cells that might be associated with development of HER2+ breast cancer. GSEA analyses was performed to identify C2 gene sets enriched in samples from individuals exposed to neoadjuvant chemotherapy and gene sets enriched in samples from individuals not-exposed to chemotherapy. Significantly, the 328 up-regulated genes from samples exposed to neoadjuvant chemotherapy showed enrichment of the LIM_MAMMARY_STEM_CELL_UP gene set while the 189 downregulated genes, identified due to their higher expression in the absence of neoadjuvant chemotherapy exposure, showed enrichment of the LIM_MAMMARY_STEM_CELL_DOWN gene set (Fig. 4a-c). Relative expression levels of individual mammary stem cell related genes differentially expressed in samples from individuals with and without exposure to neoadjuvant chemotherapy from these two gene sets are shown using a heat map (Fig. 4d). Identification of a pattern of differentially expressed genes in the neoadjuvant-exposed samples that includes genes that are consistently up-and down-regulated in mammary stem cells suggests that mammary stem cell populations could be enriched in the ipsilateral breast tissue of individuals exposed to neoadjuvant chemotherapy. www.nature.com/scientificreports/  www.nature.com/scientificreports/

Both ipsilateral non-cancer and cancer samples demonstrated aberrant patterns of pregnancy-related gene transcription. Pregnancy-associated mammary gland development is characterized
by stages of stereotypic gene expression changes regulating normal mammary cell proliferation and differentiation. Abnormal expression patterns of these genes could underlie unregulated cell proliferation associated with breast cancer. To explore this question, sample transcriptomes were queried for adherence to normal pregnancyassociated gene expression patterns. A heat map illustrating changes in gene expression during four different stages of pregnancy (A-D) was developed utilizing mouse transcriptomic data with genes that are members of established breast cancer prognostic profiles indicated by asterisks (Fig. 5a). Relative expression patterns of pregnancy-associated genes in cancer samples were similarly arrayed in a heat map and sorted by expression pattern to determine if there were any links to patterns during pregnancy. Links were found, but patterns invariably overlapped different pregnancy stages, consistent with the notion that cancer cells demonstrate abnormal patterns of genes that normally are not expressed together (Fig. 5b). To address the question of whether or not the limited number of pregnancy-associated genes that are members of an established breast cancer prognostic platform are associated together in established pathways, they were subjected to GSEA analysis (Fig. 5c). This demonstrated that they show significant enrichment in cancer pathways, including breast cancer specific ones, illustrating the potential significance of up-regulated expression in high-risk samples. Finally, relative expression patterns of pregnancy-associated genes in the high-risk samples were arrayed in a heat map and sorted by expression patterns (Fig. 5d). Expression patterns arrayed differently than the cancer samples but still showed links to different pregnancy stages. One-third of the samples showed an overlapping pattern of pregnancy-associated gene expression (A/B/D), similar to that seen in some of the cancer samples. IPSI10 and T10 samples from the same individual showed the same pattern but other cancer samples showed divergent patterns from www.nature.com/scientificreports/  www.nature.com/scientificreports/ their ipsilateral counterparts. A portion of samples more closely resembled single pregnancy stage profiles. In summary, a portion of the high-risk cells studied exhibited expression profiles similar to a pregnancy profile.

Discussion
The overall goal of the study was to study the biology and transcriptional profiles of human mammary epithelial cells at higher than normal risk for breast cancer development. Because women who develop one breast cancer are at risk for secondary breast cancer development, this was approached through isolation of primary mammary epithelial cells from non-cancerous breast tissue obtained from mastectomies performed for treatment of invasive breast cancer. The conditionally reprogrammed cells (CRC) technique was utilized for initial isolation for its efficiency in isolating primary epithelial cells. Transcriptomic studies were performed using cells from this initial isolation condition. Biological studies were performed in mammary-specific media to maintain appropriate differentiation status and provide a suitable environment for testing hormonal response. All studies were performed using low passage number cells to minimize changes that can come with extended passage.
There is reason to consider that the sample diversity identified in this study could be reflective of different risk probabilities for future cancer development. Inspection of the data provides different behavioral and transcriptional profiles for individual samples. This can range significantly. For examples, some samples showed the combination of high viability and E2 responsiveness with a transcriptome including higher E2F target and proliferative mid-pregnancy gene expression that included known breast cancer prognostic genes. Other samples were equivalently E2 responsive but showed an overlapping pattern of different pregnancy stage gene expression. Others showed low viability with or without E2 responsiveness. Without long-term clinical follow-up one cannot define if these different patterns are predictive of different outcomes but they are measurable differences between samples. Only subsets of women develop either primary or secondary breast cancer. The approach utilized here provides additional understanding of at-risk mammary epithelial cell biology that could be further exploited for improvement of secondary breast cancer risk profiling 3,4 .
The study demonstrated associations between behavior and gene expression. Retention of an estrogen growth response correlated with expression of genes enriched in TNF alpha and Interferon alpha signaling pathways. Both of these pathways are established contributors to estrogen signaling 40,41 . Relatively high TNF alpha expression was found in one of the viable samples that lacked high expression of E2F cell cycle target genes. It is not uncommon for cancer cells to rely on a specific growth factors such as TNF alpha for viability 36 . Cancer growth can also be reliant on high expression of specific E2 cell cycle target genes including AURKA, KIF4A, STAG1, CTCF, and RPA1, all of which were identified highly expressed in at least one of the at-risk samples studied here [34][35][36][37][38][39] . One sample with low overall E2F target genes expression demonstrated high levels of SERPINB expression, which is associated with maintenance of cell viability 37 . These findings can be interpreted as evidence that high risk cells are poised for future cancer development due to transcriptional changes that favor cell growth and survival. However, whether or not the high-risk cultures could be contaminated with cancer cells has to be considered. This possibility cannot be absolutely excluded but the experimental design included all possible controls for this variable. Ipsilateral samples were taken from breast geographically distinct from the invasive tumor and assessed as non-cancerous on gross examination with mirrored tissue samples from all submitted samples examined histologically for any microscopic evidence of cancer cells. Contaminated samples were excluded from the study.
One can speculate on whether or not the in vitro differences in E2 and 4-OHT response would reflect in vivo response. Anti-hormonals are the mainstay for prevention of secondary cancers for individuals with ER+ cancer, but there is variability in response and resistance can develop 42,43 . Whether or not prospective in vitro testing of hormonal response would ever be clinically useful is an open question. The overall goal for this study, to test a possible platform for evaluating hormonal response of primary mammary epithelial cells and evaluate transcriptome associations was completed. Replication of these results with associated clinical follow-up of anti-hormonal response would be a possible next step.
Neoadjuvant chemotherapy exposure was associated with enrichment of normal mammary gland stem cell gene sets in the high-risk cells 11 . Specific genes relevant to mammary stem cells that were found up-regulated include leukemia inhibitory factor receptor (LIFR), reported as promoting breast cancer stem cell renewal 44 and Thy-1 Cell Surface Antigen (Thy1), a gene associated with serial transplantation of mammary epithelial cells 45 . . Neoadjuvant chemotherapy exposure associated with gene expression changes linked to mammary stem cell formation in ipsilateral high-risk non-cancer samples. (a) Bar graph presenting the top ten C2 gene sets with the lowest significant FDR q-values identified from the MSigDB Collection utilizing genes expressed at significantly higher levels in samples from individuals exposed to neoadjuvant chemotherapy. (b) Bar graph presenting the top ten C2 gene sets with the lowest significant FDR q-values identified from the MsigDB Collection utilizing genes expressed at significantly lower levels in samples from individuals exposed to neoadjuvant chemotherapy. (c) Bar graph indicating numbers of up-regulated and down-regulated DEGs identified in non-cancer ipsilateral samples from women with ER+/HER2+or HER2+ cancer that were exposed or not exposed to neoadjuvant chemotherapy.   www.nature.com/scientificreports/ Specific stem-cell related genes that were down-regulated included CD24, a gene which when down-regulated is associated with a breast cancer stem cell phenotype 46 . While we posit that expansion of the mammary stem cell population may be a normal homeostatic restoration of breast tissue post-chemotherapy given chemotherapy can induce lobular atrophy 47 , the character of the surviving stem cells merits additional study. It is known that breast cancer stem cells can survive chemotherapy and the expression pattern identified does have links to both normal and neoplastic mammary stem cells. Whether or not the enriched population found here is reflective of a protective healing response or might provide precursors for future cancer stem cell development is unknown 48 . Hormone receptor expression is a marker for secondary breast cancer risk 49 . The analysis presented here included three hormone receptor positive samples with neoadjuvant chemotherapy exposure (IPSI21, IPSI19, IPSI15) and three hormone receptor positive samples without receipt of neoadjuvant chemotherapy (IPSI18, IPSI5, IPSI14). Transcriptome data was used to approach a long-standing hypothesis in the field, that is, aberrant expression patterns of genes expressed in normal development might serve as risk factors for cancer generation 50 . Nine of the 52 pregnancy-associated genes in the presented profile are members of at least one established breast cancer prognostic screen 14 . There are pregnancy-related genes known to have clear links to human breast cancer such as BIRC5 14 and STAT5A 51 , which were found increased in a subset of samples here. A question we considered is whether or not the patterns found related to different menstrual cycle phases 9 . While the study did include women under the age of 50, differences were found between patterns seen here and the luteal phase patterns. Samples with higher expression of five of the 10 genes expressed during the luteal phase were found but expression was limited to these five genes (BRCA1, CCNB1, MKI67, BIRC5, EZH2, PCNA) and the full luteal pattern was not seen. Future studies would be recommended to include menstrual cycle history at the time of collection to directly address this question. Identification of both cancer and high-risk samples with combinations of gene expression that are not found during normal pregnancy development is consistent with the possibility of a link between deregulated expression of pregnancy genes and cancer risk.
From a technical aspect, CRC-technology was a cost-effective approach to develop a collection of high-risk human breast cells that could be effectively exploited for comparison of growth characteristics under different mammary-specific media and hormonal conditions. Use of matrix-free scaffold-based nano-culture plates enabled us to process replicate samples in parallel at reasonably high efficiency for simultaneously scoring of viability and mammosphere growth in different media and under different conditions 14 . Consistent with previous reports that CRC technology supports preservation of stem/progenitor cells 21 , we found uniform ability to form mammospheres upon secondary culture.
In conclusion, at-risk cells preserved behavioral and transcriptome diversity that could reflect different risk profiles, providing a baseline for further development of possible prognostic platforms for breast cancer risk. It is clear that platforms would be challenging to build. For example, they would require clinical validation that might take decades to develop, given the long time frame for breast cancer recurrence. However, utilization of a CRC-based approach would make this feasible as it enables bio-banking of primary breast at-risk cells that can be renewed as needed. This allows them to be available for recurrent experimentation as new technologies and new questions arise over a long-term follow-up study.
Limitations of the study include inadequate sample numbers from men and individuals over age 70. Previously, we used CRC technology to successfully isolate primary squamous cell carcinoma metastatic to salivary gland from individuals aged 70-80 20 , but metastatic epithelial cancer cell and non-cancer breast epithelial cell biology are significantly different and alternative approaches for non-cancerous mammary epithelial cells from aging individuals may be needed 19 . Only one site was utilized due to funding limitations and to facilitate successful handling procedures but future studies could utilize different sites to help expand diversity. Samples from women with ER+/HER2+ breast cancer were relatively over-represented (22%) and samples from triple-negative under-represented (7%) compared to U.S. population frequency (13.4% and 13%, respectively) 52 . The distribution towards successful isolation from younger age individuals contributed to this under-representation because the majority of triple negative samples submitted were from older women. Because the study was designed to characterize in vitro cell behavior at the lowest passage number possible to limit media-induced behavioral and gene expression changes 19 , few samples were studied across different passages. Passage five was the highest passage number evaluated. Because the study focused on primary cell behavior, establishment of cell lines was not attempted. A limitation was that it was not possible to obtain primary cell cultures of the same sample at the same  www.nature.com/scientificreports/ passage at different points in time so each sample could be studied in temporally distinct experiments, rather than with multiple replicates at a single timepoint as was performed here. However, while it was not feasible to test all samples in more than one experiment, a subset of samples that passaged well were repetitively studied in two or three different experiments across passage. Results showed general consistency across the individual experiments (Supplementary Figure 1). Funding was insufficient to include a limiting dilution analysis for mammosphere formation. Protein expression validation was also not able to be included, however, in previous studies we have validated concordance between quantitative and specific RNA and qualitative protein measurements 19,20 . The pregnancy gene expression profile was developed from RNAseq of whole mammary gland tissue, which would also contain populations of stromal and other cells that could influence gene expression levels shown 53 . To help address this issue, the pregnancy-related gene profile was limited to genes known to be expressed in mammary adenocarcinoma cell lines. Although both ipsilateral and cancer sample tissues were evaluated to confirm histological identity, the possibility remains that either an ipsilateral sample would contain some infiltrating tumor cells or, conversely that a tumor sample might contain non-cancerous cells that could affect results.

Methods
Research involving human participants. This study was approved by the Institutional Review Board (IRB) of the Office of Research Oversight/Regulatory Affairs, Georgetown University. It was determined to impose minimal risk on participants. Informed consent was obtained. All research was performed in accordance with relevant guidelines/regulations. This included deidentification of all samples and medical records with assignment of unique GUMC identifier.
Human sample acquisition and RNA sequencing. Experiments were designed to ethically collect deidentified human breast tissue from living individuals with a diagnosis of breast cancer or at high-risk for breast cancer, with biospecimens obtained from surgically excised tissue not needed for pathologic diagnosis by a board-certified pathologist at room temperature, stabilized by immersion in CRC-compatible media, with mammary epithelial cells isolated from minced tissue incubated in media containing a collagenase/hyaluronidase/ dispase solution followed by culture at 37 °C utilizing CRC conditions [18][19][20][21] , and viably stored in liquid nitrogen (− 196 °C) for < 1 to 4 years. Ipsilateral non-cancerous breast tissue (volume 2 cm 3 ) were procured from grossly unremarkable fibroadipose mammary parenchyma, at least 6-8 cm away from any tumor. Breast cancer tissue (volume 2 cm 3 ) was included for comparative evaluation when consent provided. Tissue was processed into two "mirrored" (volume 0.5 3 -1 cm 3 ) samples: one for CRC technology and one for formalin-fixation. Deidentified pathology reports provided clinical and pathology diagnoses including age, breast cancer subtype, pathologic stage, BRCA gene mutation status, lymph node or other metastasis, and history of neoadjuvant chemotherapy. Numbers of ipsilateral tissue samples for acquisition were determined prior to the experiment (n = 50). Fifty-six samples were obtained. The only study inclusion factor was that the individual was undergoing mastectomy for breast cancer treatment. Final classification of samples as invasive tumor or non-cancer was determined after review of H&E sections of the formalin-fixed "mirrored" tissue by a board-certified pathologist. One ipsilateral non-cancer specimen was re-classified as invasive tumor. All samples very processed equivalently and no ipsilateral or invasive tumor submitted sample was excluded from the study. Unique Georgetown University Medical Center (GUMC) primary cell culture identifiers were assigned following initial isolation 20 . Epithelial cell growth was separated from the underlying fibroblast feeder layer by differential trypsin treatment, divided, and viably frozen as Passage 0 (P0  20 . Quality check (FastQC), quality trimming (Trim galore) and alignment (STAR) were performed 54 according to library preparation method with batch effect normalization 55 . Normalized expression levels were estimated by means of transcripts per million (TPM) using RSEM 56 . Sequences were mapped to a merged human + mouse genome file (HG38 + mm10) to assess for mouse fibroblast feeder contamination of epithelial cell pellets. Samples with > 10% mouse sequence contamination were excluded from further analysis. Sequences were then mapped to a mycoplasma genome file for detection of mycoplasma contamination 29 . Samples with detectable mycoplasma sequences were excluded from further analysis. Sample numbers retained following screening for human and mycoplasma sequences and confirmed diagnosis of invasive tumor in the individual from whom samples were obtained were ipsilateral (IPSI) n = 25 (97% ± 0.44 (mean ± SEM) human sequence) and invasive tumor (T) n = 8 (97% ± 0.86 human sequence) (  57 were analyzed in samples from individuals that received neoadjuvant chemotherapy prior to sample acquisition (n = 4) as compared to samples from individuals that did not (n = 4). Significantly up-and down-regulated DEGs were analyzed separately (C2 gene sets, GSEA) 31,33 . Enrichment in gene sets related to mammary epithelial stem cells were identified for both (accessed July 2021) (FDR q-value < 0.05). TPM values of the genes found enriched in these two gene sets were normalized and relative expression levels visualized (Heat Map, GraphPad Prism 9.3.1). All cultures with sufficient numbers of cells for this evaluation were tested for significant differences in viability and mammosphere formation in the presence of E2 (n = 18). As a comparator, viability in 4-OHT was examined in the nine samples that had additional sufficient cells for this evaluation. The transcriptomes of samples with significantly higher viability in E2 were compared to those with equivalent viability to identify genes expressed at significantly different levels (Multiple unpaired t tests with Welch correction, Two-stage step-up, Graphpad Prism, q ≤ 0.004 considered statistically significant). Significantly differently up-and down-regulated genes were analyzed separately for enrichment in HALLMARK gene sets (GSEA, accessed February 2022, FDR q-value < 0.05). TPM values of genes found in enriched gene sets were visualized on bar graphs (GraphPad Prism 9.3.1).

Pregnancy-linked breast cancer risk genes.
A heat map illustrating changes in relative gene expression in the mammary gland during pregnancy was generated using downloaded data generated from virgin and pregnant 2-month-old mice (GSE70440) 8 (GraphPad Prism 9.3.1). Four patterns (A-D) of gene expression changes during pregnancy were identified based on their relative expression levels in non-pregnant (virgin), day 13 and day 18 pregnancy. Pregnancy-related genes that are included in at least one validated breast cancer prognosis platform were identified 26 and subjected to GSEA analysis for C2 gene set enrichment. Heat maps of pregnancy-related genes were generated separately for invasive tumor (T) and ipsilateral (IPSI) samples (Graph-Pad Prism 9.3.1). Gene expression patterns of the individual samples were visually sorted by similarity. Three different patterns were recognized in the invasive tumor samples and six different patterns in the ipsilateral samples. These patterns were then compared to the four pregnancy-related patterns. Patterns were assigned to a single pregnancy-stage-related pattern or overlapping pregnancy-stage-related patterns dependent upon the gene expression data. www.nature.com/scientificreports/ Analyses for chromosomal deletions and amplifications. Transcriptomes from four mycoplasmafree validated invasive tumor/ipsilateral pairs were available to assess for known breast cancer-related chromosomal deletions/amplifications (Suppl. Table 1) 57 . DEGs between T4/IPSI4, T13/IPSI13, T3/IPSI3 and T10/ IPSI10 were identified using DESeq2 (Padj < 0.05 considered statistically significant). Chromosomal regions of differentially expressed genes were identified using C1:positional gene sets (Molecular Signatures Database v7.4 C1, nominal p value < 1%) (accessed August 2019) 31,33 . cBioPortal (accessed February 2021, nine different human breast cancer databases queried) was used to identify specific percentages of chromosomal region/gene amplifications/deletions in the paired samples 60-62 . Statistical analyses. For RNAseq, quality check (FastQC), quality trimming (Trim galore) and alignment (STAR) were performed 54 according to library preparation method and batch effect normalization conducted 55 . Normalized expression levels were estimated by means of transcripts per million (TPM) using RSEM 56 and differentially expressed genes were identified using DESeq2 (padj ≤ 0.05 considered statistically significant) 57 . Mean ± standard error of the mean (SEM) for patient ages, viability measurements and mammosphere counts were calculated (GraphPad Prism 9.3.1). Ordinary one-way ANOVA with Brown-Forsythe test and Sidak's multiple comparisons test was used to compare probability of CRC isolation by age and neoadjuvant chemotherapy exposure (p ≤ 0.05 considered statistically significant, GraphPad Prism 9.3.1). Fisher's exact, two-tailed was used to compare probability of MEGM passage in cancer versus non-cancer cells (p ≤ 0.05 considered statistically significant, GraphPad Prism 9.3.1). 2way ANOVA was used to analyze for statistically significant interactions between samples and media for viability and mammosphere numbers for groups with three and four media (p ≤ 0.05 considered statistically significant, DF values reported in figure legend 1, GraphPad Prism 9.3.1). Multiple unpaired t-tests using FDR approach (Two-stage step-up method of Benjamin, Krieger and Yekutieli, GraphPad Prism 9.3.1) was used to examine for statistically significant differences for two media comparison of viability and mammosphere numbers, differences in hormonal response for viability, mammosphere numbers and TPM values for hormonal response (p < 0.05 statistically significant, t, df values reported in figure legend 3, GraphPad Prism 9.3.1). Simple linear regressions were conducted to analyze for significant relationships between MEGM viability and patient age (p ≤ 0.05 statistically significant, GraphPad Prism 9.3.1). Regression equations, p and R squared values presented on regression scatter plot graph (Fig. 1c). Scatter plots, stacked bar graphs, bar graphs and heat maps were prepared in GraphPad Prism 9.3.1.

Ethical approval. This study was approved by the Institutional Review Board (IRB) of the Office of Research
Oversight/Regulatory Affairs, Georgetown University.
Informed consent. Informed consent was obtained prior to sample acquisition.