LPA receptor activity is basal specific and coincident with early pregnancy and involution during mammary gland postnatal development

During pregnancy, luminal and basal epithelial cells of the adult mammary gland proliferate and differentiate resulting in remodeling of the adult gland. While pathways that control this process have been characterized in the gland as a whole, the contribution of specific cell subtypes, in particular the basal compartment, remains largely unknown. Basal cells provide structural and contractile support, however they also orchestrate the communication between the stroma and the luminal compartment at all developmental stages. Using RNA-seq, we show that basal cells are extraordinarily transcriptionally dynamic throughout pregnancy when compared to luminal cells. We identified gene expression changes that define specific basal functions acquired during development that led to the identification of novel markers. Enrichment analysis of gene sets from 24 mouse models for breast cancer pinpoint to a potential new function for insulin-like growth factor 1 (Igf1r) in the basal epithelium during lactogenesis. We establish that β-catenin signaling is activated in basal cells during early pregnancy, and demonstrate that this activity is mediated by lysophosphatidic acid receptor 3 (Lpar3). These findings identify novel pathways active during functional maturation of the adult mammary gland.

. FACS enrichment of luminal and basal epithelial populations. (a) FACS contour plots of one representative sample per time point depicting the intensity of the CD24 and CD29 markers in the lineage negative population (purple points). Lin-CD24+CD29lo (luminal, LUM) and Lin-CD24+CD29hi (basal) populations are colored green and red, respectively. Each point in the plot represents a different cell. (b and c) Re-sort of 1,000 enriched luminal and basal epithelial cells, respectively, from each time point. The percentage of cells recovered in each population after the re-sort is noted next to its respective population on the graph. Each row represents a different time point as labeled in the grey boxes to the right of the graphs.   Figure S2 a. b.
d.  Figure S3. Spatio-temporal analysis of differentially expressed genes, biological validation and species conservation of cell subtype-specific genes. (a) Venn diagram representing the number of spatially (blue) and temporally (yellow) differentially expressed genes. The total number of expressed genes is on the top left. The overlapping area represents genes that are both spatially and temporally differentially expressed. (b) Venn diagram representing the number of genes that are luminal (green) or basal (red) in at least one time point. The total number of cell population specific genes in at least one time point is on the top left. Overlapping area shows the number of genes that are both luminal and basal specific at different time points. Smaller circles represent the number of luminal specific (purple) and basal specific (yellow) genes across all 8 developmental time points. (a and b) The arbitrary threshold for determining the number of significant differentially expressed genes was a relatively stringent fold change of 2 and FDR of 0.01.

Metabolism of lipids and lipoproteins
Transmembrane transport of small molecules GPCR ligand binding  c.
f. Green nodes are genes upregulated specifically during lactation (c) and early involution (f) in the luminal population, while light blue nodes represent genes whose expression is detected in the luminal population during lactation (c) and early involution (f).

Initial Validation of Sorted Populations
Because we were specifically interested in comparing the luminal and basal subtypes across the developmental time points, we attempted to maintain the same gates for each time point, omitting the third unidentified population during lactation and involution. However, due to differences in the intensity of the fluorescence labeled antibodies over time, we had to adjust the gates slightly for each sort (Supplementary Figure S1). The enriched luminal and basal cell populations were collected in wash media at 4°C. Finally, to validate the sorted populations, we re-sorted 1,000 cells from each of the samples and verified that they were enriched in their respective Lin -Cd24 + CD29 lo and Lin -Cd24 + CD29 hi populations, and there was less than 2% cell cross contamination (Supplementary Figure S1). following the kit protocol and measuring the quantity with a NanoDrop 3300 Fluorospectrometer (Thermo Fisher Scientific, Waltham, MA). Because of the low RNA yield, the quality of RNA was analyzed with the Agilent RNA 6000 Pico Assay on the Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA). A total of 3ng of RNA was used to assess the RNA quality.

RNA-seq Data Quality Assessment and Normalization
Post-sequencing, the data were automatically processed through the web-based system termed WASP (Wiki-based Automated Sequence Processor) that was developed, and is maintained, at Einstein 1 . Aside from its use as a sample submission and laboratory information management system, WASP conducts automated primary data analysis of massively-parallel sequencingbased assays. The WASP pipeline (version 3.0.60, rev 6425) was specifically designed to process and align sequencing reads. The raw RNA-seq files (FASTQ files) automatically uploaded onto the system and the sequencing quality metrics were assessed using FastQC 2 .
The WASP pipeline then trimmed the adapter sequences and aligned the reads against the reference mouse genome using the Genomic Short-read Nucleotide Alignment Program (GSNAP; version 2012-07-20) 3 . For the present study, the reference genome used was the National Center for Biotechnology Information (NCBI) Build 37.1 mouse genome obtained from the Reference Sequence (RefSeq) database 4,5 . The assignment of reads to genes was performed by htseq-count (v0.5.3p3), a component of the HTSeq package for analyzing highthroughput sequencing data 6 . The output of this pipeline was a samples (columns) by genes (rows) table with the total number of counts aligned to each gene and resulted in a sum of 37,991 mapped transcripts. Normalization was conducted in R v2.15.1 statistical software 7 using the edgeR and package available through Bioconductor 8 . The output data from the WASP pipeline were normalized following the documentation provided with the edgeR package.
Briefly, the total number of counts per million (CPM) were calculated for each sample by dividing the total number of reads by 1X10 6 . A scaling factor was then computed for each sample based on the total CPM, and these factors were used to calculate the library sizes. To avoid skewing of the data we removed genes with an extremely high number of reads (> 2X10 6 reads) and low variability across the samples. Low expressed genes across the samples (genes that have less than 1CPM across 3 samples) were also removed. The normalization and filtering methods used left a total of 17,606 unique genes for subsequent analyses.

Principal Component Analysis
The prcomp function that is part of the stats package in R 9 was used to calculate the amount of variability present between the samples. We then used a linear analysis of variance (ANOVA) modeling approach similar to that previously described 10 Table 1 below). The technical covariates are the RNA was isolation batch (Batch), the lane on which the samples were run (Lane), the RNA amplification batch was conducted (RNAamp) and the FACS sorting batch (Sorting time).  Figure S2a, black arrows)

Basal
Pregnancy Lactation principal component analysis on the dataset using the prcomp function that is part of the stats package in R 9 As with the correlation analysis, the samples clustered primarily based on the cell population. The same sample emerged as an outlier and was removed from the analysis (Supplementary Figure S2b, black arrow). The normalization was performed again and subsequent analyses were conducted using the remaining 46 samples. Hierarchical clustering and plotting of the correlation matrix using the heatmap.3 function in the GMD package 11 .  Table 2.

Real-time PCR
Supplementary Methods Table 2. Sequences of the human primers used for qRT-PCR.

Biological Validation of Cell Subtypes with Publically Available Datasets
The luminal and basal populations were biologically validated using a gene set derived from

Statistical Methods for the Identification of Spatially and Temporally Expressed Genes
ANOVA was used to identify genes that were significantly differentially expressed in at least one condition (see Supplementary Methods Table 1), defined as "expressed" genes, as previously described 21 . Based on our PCA analysis we were confident that the technical covariates did not significantly contribute to the overall variability between the samples, thus we did not include them in the linear model for ANOVA. After correcting for multiple testing using the Benjamini & Hochberg false discovery rate (FDR) method 22 , we set a statistical threshold of less than 0.01 (1%) to define the spatially and temporally "expressed" genes. This set of "expressed" genes was then used for subsequent analyses using edgeR package, available through Bioconductor in R, to determine cell type and time point specific genes 8 . Correction for multiple testing was performed using the Benjamini & Hochberg false discovery rate (FDR) method 22 and set a threshold of an absolute fold change > 2 and an FDR < 1%.

Reactome, Gene Ontology and KEGG Enrichment Analysis
The GOseq package 23 was used to test for enrichment of our gene sets in functional groups based on their annotation defined by the Reactome database 24 , Gene Ontology (GO) Consortium 25 , or KEGG database 26 . We tested specifically for enrichment in the Biological Processes category and set a p-value cutoff of 0.05. Functional interaction network analysis was conducted using the Reactome FI pug-in in cytoscape 27 .

Gene Set Enrichment Analysis
Using transcriptional profiling of basal and luminal cells during postnatal mammary gland development, we generated stage specific gene lists. Individual gene lists were converted for use as gene sets for Gene Set Enrichment Analysis 28 . Using GSEA and a published database of mouse mammary tumor models 29 , we queried individual mouse mammary tumor models for enrichment of basal or luminal stage specific gene sets.

Immunostaining of Tissue Sections
The mouse mammary inguinal gland was fixed in 10% formalin and paraffin embedded, 5 micrometer slices were sectioned and added onto slides. Slides with tissue sections were baked at 60°C for 1 hour, deparaffinized in xylene (twice for 10 minutes each time) and rehydrated in a series of 2 minute washes in ethanol (100%, 95%, 80%, 70%) followed by two washes in distilled water. Antigen retrieval was performed using the Vector® Antigen Unmasking Solution For immunofluorescence, both control and LPA treated HME50 cells were grown in tissue culture petri dish containing glass cover slips. After exposure to LPA for 72h, the cover slips were removed and fixed using 4% Paraformaldehyde (PFA) for 15min at room temperature (RT). This was then replaced by 0.1% Triton (dissolved in PBS) for 10min, RT. After washing 3x with PBS the coverslips were blocked with 3% goat serum for 30 min, RT. For simultaneous identification of basal and luminal cells, the cells were incubated for 1h with primary antibody mixture containing mouse monoclonal to Cytokeratin18 (luminal specific) and Keratin14 Polyclonal chicken antibody (basal specific). This was followed by 3 washes with PBS and exposure of the slides to secondary antibodies for 1hr containing goat anti chicken, Alexa Fluor 488 (green) and goat anti mouse, Alexa Fluor 555 (red). The coverslips were washed and dehydrated with ethanol at room temperature -70% for 3min; 90% for 3min; 100% for 3min. Quantification of bands intensities was carries out using ImageJ 30 . After subtraction of background intensities the β-catenin specific expression was normalized against its respective control (Histone H3 for the nuclear fraction andα-tubulin for the membrane fraction); plots and statistical analysis were carried out using GraphPad Prism version 6 for Windows (GraphPad Software, La Jolla California USA, www.graphpad.com).