Introduction

Breast tissue changes over a woman’s lifetime, altered by endogenous and exogenous factors. Previous studies have evaluated gene expression and histological alterations in association with putative breast cancer risk factors, such as age, obesity, and mammographic density, and their findings provide important insights into breast cancer etiology1,2,3,4,5,6,7,8. In addition, recent research has characterized changes in histologic and molecular features of benign tissue adjacent to breast tumors and linked these changes to breast cancer outcomes9,10,11,12, demonstrating the potential for benign breast tissue to inform breast cancer outcomes or risk stratification of pre-cancerous breast tissue. For further development of biomarkers based on benign breast gene expression, careful consideration must be given to biomarker characteristics.

Low intra-individual and high inter-individual variability are important properties of reliable biomarkers, determining test reliability and reproducibility. Although many studies have considered inter-individual variation and technical variation introduced by experimental procedures13,14, intra-individual transcriptional differences in normal breast tissue, as well as the factors that contribute to intra-individual variability, are not well characterized. Few studies of benign breast tissue have had sufficient resampling of tissues to allow assessment of whether a single biospecimen reliably represents the state of breast tissue. At the histologic level, we previously evaluated intra-individual variation in normal breast composition and found substantial variability in stromal and epithelial content upon repeated sampling6. The impact of these histological differences on gene expression is not well characterized but is crucial for the reliability of transcriptional biomarkers.

This study employed a hierarchical sampling structure in the Normal Breast Study (NBS) to investigate intra-individual variation in benign breast gene expression. We analyzed the microarray gene expression data of 288 samples from 170 women using multiple samples per patient collected from different regions and separate sections of the breast. In addition, we identified participant or breast tissue histological characteristics that were associated with intra-individual variability.

Materials and Methods

Study population

The NBS is a study of breast cancer microenvironment and normal breast tissue conducted at UNC Hospitals in Chapel Hill, North Carolina. The details of study design and participant recruitment were published previously15. Briefly, participants were women ≥18 years who underwent breast surgery between October 2009 and April 2013, and consented to donate breast tissue. The NBS study population included 399 women with breast cancer and 75 women without malignant disease. All participants donated at least one histologically normal breast tissue specimen (as assessed by pathology assistants at UNC Hospitals). A telephone interview was conducted to collect demographic and breast cancer risk factor exposure data. Medical records were abstracted to obtain patients’ medical history, mammographic screening, breast cancer–related treatment, tumor pathology, and breast surgery data. All participants provided written informed consent, and all study protocols were approved by the UNC School of Medicine’s Institutional Review Board. All methods of data collection and analysis were performed in accordance with relevant guidelines and regulations with appropriate quality control.

This analysis included 288 normal samples from 170 NBS participants with gene expression data. Our primary intra-individual analysis was based on the 57 women with multiple normal samples. To study gene expression variability at different levels, we employed the NBS hierarchical sampling structure: one or more blocks were sampled from each participant; one or more sections were sampled from each block; and mRNA expression was measured from each section, with some mRNA samples run twice on the microarray platform (technical replicates). We also collected tumor gene expression data from a small number of breast cancer patients (n = 8) to qualitatively compare variability observed in normal tissue to that observed in tumors. The data structure, including the number of normal samples per woman, is shown in Supplementary Table 1 and summarized in Fig. 1.

Figure 1

Sampling structure of the UNC Normal Breast Study.

Tissue sectioning and image analysis

For each tissue block, sections of alternating thickness (100 µm and 20 µm) were cut over dry ice for histological and gene expression analyses. The 100 µm sections were used for RNA isolation as described below, while the 20 µm sections were stained with hematoxylin and eosin (H&E) and used for high-resolution scanning and histological composition annotation. The details of histological annotation using the Aperio Scan-Scope XT Slide Scanner and Genie Classifier have been published previously3,6,15. Standard, validated algorithms were used to partition epithelium, non-fatty stroma, and adipose tissue (in mm²) and to identify the number of nuclei per unit area. The number of cells per unit area of epithelial, stromal, and adipose tissue was calculated to represent cellular density (in cells/mm²). Variability in these composition parameters has been described previously6, and the current analysis used mean values across blocks and sections.

mRNA isolation and microarrays

Sections of 100 µm were homogenized as described previously15. mRNA was isolated following standard manufacturer protocols using RNeasy kits. RNA quality and quantity were analyzed on an Agilent 2100 Bioanalyzer and an ND-1000 NanoDrop spectrophotometer, respectively. Two-color 4 × 44 K Agilent whole-genome arrays (Version 1 or Version 2) were run on each mRNA sample, with the reference channel representing a Stratagene Universal Human Reference RNA sample spiked with breast cancer cell line RNA (MCF-7 and ME16C) to increase expression of breast-specific genes. Expression data were preprocessed as follows: lowess normalization; setting values of probes that had a signal less than 10 dpi in either channel as missing; excluding probes that had more than 20% missing data across all samples; imputing missing values using k-nearest neighbors imputation (with k = 10); collapsing replicate probes by averaging; and median-centering genes. A total of 12,429 genes passed the filters and were included in the analysis. Microarray data used in these analyses are publicly available through the Gene Expression Omnibus (under preparation).
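As an illustration only, the later preprocessing steps can be sketched in Python using the standard library. This is a minimal sketch and not the pipeline used in the study: the function name `preprocess`, the input layout, and the toy gene labels and values are all assumptions. Lowess normalization, low-signal masking, and k-nearest neighbors imputation are omitted, so any remaining missing values stay as `None`.

```python
from statistics import mean, median


def preprocess(rows, max_missing=0.20):
    """rows: list of (gene_label, [value or None per sample]) probe records.

    Returns {gene: [median-centered values]} after the missingness filter
    and replicate-probe averaging. Hypothetical helper for illustration.
    """
    # 1. Drop probes with more than max_missing missing values across samples.
    kept = [(g, v) for g, v in rows
            if sum(x is None for x in v) / len(v) <= max_missing]

    # 2. Collapse replicate probes of the same gene by averaging per sample.
    by_gene = {}
    for gene, values in kept:
        by_gene.setdefault(gene, []).append(values)
    collapsed = {}
    for gene, probes in by_gene.items():
        merged = []
        for column in zip(*probes):
            present = [x for x in column if x is not None]
            merged.append(mean(present) if present else None)
        collapsed[gene] = merged

    # 3. Median-center each gene across samples (missing values stay None).
    centered = {}
    for gene, values in collapsed.items():
        med = median(x for x in values if x is not None)
        centered[gene] = [None if x is None else x - med for x in values]
    return centered


# Toy input: two replicate probes for GENE_A, one probe with 40% missing
# values (dropped), and one probe with exactly 20% missing values (kept).
toy_rows = [
    ("GENE_A", [1.0, 2.0, 3.0, 4.0, 5.0]),
    ("GENE_A", [3.0, 4.0, 5.0, 6.0, 7.0]),
    ("GENE_B", [None, None, 1.0, 2.0, 3.0]),
    ("GENE_C", [0.5, None, 1.5, 2.5, 3.5]),
]
toy_result = preprocess(toy_rows)
```

In a full pipeline the imputation step would sit between the missingness filter and the probe collapse, so that averaging and centering operate on complete profiles.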

Statistical analysis

Using the hierarchical sampling strategy of multiple blocks per patient, multiple sections per block, and multiple technical replicates per section, we studied intra-individual gene expression variation at each level. We used the variation-by-distance (VD) metric, wherein the variation was estimated by the Euclidean distance between samples (details of VD calculation are provided in supplementary materials). Higher values of VD indicate higher variability. We compared VD at block, section, and technical replicate levels using t-tests. We used a nested ANOVA to obtain the proportion of intra-individual variability attributed to block and section levels16. Since intra-individual variation may differ by signature or pathway, intra-individual variation of several previously published signatures (Supplementary Table 2) was also evaluated using the same methods. These included composition-related signatures (epithelial signature17, stromal signature17, and immune signature18), tumorigenesis-related signatures (p53 signature19, proliferation signature20, and hypoxia signature21), and risk factor-related signatures (age signature2, obesity signature4, and parity signature22). To exclude potential confounding due to differences in platform version, variation was estimated among samples measured on a single version of the microarray platform.
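A distance-based variability measure of this kind can be sketched as follows. The exact VD formula is given only in the supplementary materials, so this minimal sketch assumes that VD for a group of samples is the mean pairwise Euclidean distance between their expression profiles. The function name `variation_by_distance` and the toy profiles are hypothetical.

```python
from itertools import combinations
from math import dist
from statistics import mean


def variation_by_distance(profiles):
    """Mean pairwise Euclidean distance among expression profiles.

    profiles: list of equal-length lists, one list of gene expression
    values per sample (e.g. all blocks from one woman).
    Assumed definition for illustration; see lead-in.
    """
    if len(profiles) < 2:
        raise ValueError("need at least two samples to measure variation")
    return mean(dist(a, b) for a, b in combinations(profiles, 2))


# Toy profiles over two genes: near-identical technical replicates versus
# two blocks that differ more strongly.
vd_technical = variation_by_distance([[0.0, 0.0], [0.0, 1.0]])
vd_block = variation_by_distance([[0.0, 0.0], [3.0, 4.0]])
```

Under this assumed definition, VD values computed at the block, section, and technical replicate levels are directly comparable as long as they use the same gene set.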

To identify patient factors associated with high intra-individual variability in breast gene expression, we classified patients as ‘heterogeneous’ if inter-block VD was greater than the inter-block median VD (n = 44), or if inter-section VD was above the inter-block median VD (n = 13 women who did not have multiple blocks). We estimated the associations between high intra-individual variability and individual characteristics (age, menopausal status, obesity, race, parity, oral contraceptive use, hormone replacement therapy) or tissue characteristics (tissue source, histological area, and nuclear density in the adipose, epithelial, and stromal compartments) using Fisher’s exact tests. Statistical significance was defined as p < 0.05. All statistical analyses were performed using R, version 3.0.1.
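The median split and the exact test can be sketched in Python as follows. The study itself used R, so this is an illustrative re-implementation under assumptions: the helper names, the woman identifiers, and all VD values and counts are hypothetical. The two-sided p-value sums the probabilities of all tables that are no more likely than the observed one, which is one common convention.

```python
from math import comb
from statistics import median


def classify_heterogeneous(vd_by_woman):
    """Label each woman True ('heterogeneous') if her VD exceeds the median."""
    cutoff = median(vd_by_woman.values())
    return {woman: vd > cutoff for woman, vd in vd_by_woman.items()}


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Hypothetical VD values for four women.
labels = classify_heterogeneous({"w1": 1.0, "w2": 2.0, "w3": 3.0, "w4": 4.0})

# Hypothetical counts: rows = factor present / absent,
# columns = heterogeneous / homogeneous.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
```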

Results

Intra-individual variability of global gene expression profile

To assess global gene expression, we calculated intra-individual variability (variability across replicate blocks and sections), inter-individual variability (across women), and technical variability (using replicate microarrays on the same isolated mRNA samples) for our hierarchical samples of normal tissues, as well as inter-individual variability in tumor expression in a small set of samples. As shown in Fig. 2A, intra-individual variability (measured by VD) was lower than inter-individual variability and significantly higher than the variability of technical replication (t-tests p < 0.01). Although block-level variation in global gene expression did not differ significantly from section-level variation, in the variation contribution analysis (Fig. 2B), blocks appeared to explain a higher percentage of intra-individual variation than sections. In addition, the percentage of intra-individual variation explained by block and section differed from gene to gene, accounting for up to 40% in some genes. We therefore assessed how intra-individual variability impacted several multi-gene signatures.
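The idea behind the block and section contributions can be illustrated with a sum-of-squares decomposition for a single gene. This is a simplified sketch and not the nested ANOVA fitted in the study. It assumes one expression value per section, sections nested within blocks, and blocks nested within women, and it reports shares of intra-individual variation only. The function name, data layout, and values are hypothetical.

```python
from statistics import mean


def block_section_shares(data):
    """Split within-woman variation of one gene into block and section shares.

    data: {woman: {block: [one expression value per section]}}.
    Returns (block_share, section_share), which sum to 1.
    """
    ss_block = 0.0
    ss_section = 0.0
    for blocks in data.values():
        woman_mean = mean(v for sections in blocks.values() for v in sections)
        for sections in blocks.values():
            block_mean = mean(sections)
            # Between-block (within-woman) sum of squares.
            ss_block += len(sections) * (block_mean - woman_mean) ** 2
            # Between-section (within-block) sum of squares.
            ss_section += sum((v - block_mean) ** 2 for v in sections)
    total = ss_block + ss_section
    if total == 0:
        raise ValueError("no intra-individual variation to partition")
    return ss_block / total, ss_section / total


# Toy gene: blocks differ much more than sections within a block.
toy = {"w1": {"b1": [1.0, 3.0], "b2": [5.0, 7.0]}}
block_share, section_share = block_section_shares(toy)
```

Repeating this per gene gives one (block, section) pair per gene, which is the kind of quantity a scatter plot of block versus section contribution would display.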

Figure 2

Global gene expression variability of histologically normal breast tissue. (A) Global gene expression variation at different levels. Y-axis is variation-by-distance (VD). Greater distance indicates higher variation. Inter-individual variability, n = 8 for tumor and n = 184 for normal; intra-individual variability, n = 49 for inter-block, n = 20 for inter-section, and n = 7 for technical duplicates. (B) Contribution of block and section to overall expression variation of genes. X- and Y-axes are percentage of contribution of block and section to the global gene expression variation, respectively. Each dot represents one gene. The diagonal solid line indicates the ideal situation where contribution of block and section is equal.

Intra-individual variability of the selected transcriptional signatures

We assessed the intra-individual variability of several selected multi-gene signatures, including previously published risk factor-related signatures (age, obesity, and parity), tissue composition-related signatures (stroma, immune, and epithelium), and tumorigenesis-related signatures (proliferation, p53, and hypoxia). As shown in Fig. 3, intra-individual variability differed by signature. Risk factor-associated signatures showed a pattern similar to that observed in global gene expression, with the lowest variability in technical replicates (block vs. technical replicate p < 0.01, section vs. technical replicate p < 0.01) and similar variability at block and section levels. For tissue composition-associated signatures, intra-individual variability was suggestively associated with spatial distance (p < 0.01 for trend test of variability from technical replicate to inter-section, and to inter-block). In contrast, tumorigenesis-associated signatures had low intra-individual variation, with inter-section and inter-block variations not statistically distinct from technical replicates.

Figure 3

Gene expression variation of selected signatures.

Factors associated with intra-individual variability

To identify patient factors associated with intra-individual variability in gene expression, we categorized individuals as heterogeneous or homogeneous based on global gene expression, and evaluated this categorical variable in association with demographic and histologic characteristics (Table 1). Higher stromal density (OR = 3.42, 95% CI = 1.15–10.15) and low stromal area (OR = 0.29, 95% CI = 0.10–0.86) were significantly associated with intra-individual variability. These associations remained after adjusting for age (stromal density, adjusted OR = 3.25, 95% CI = 1.08–9.74; stromal area, adjusted OR = 0.31, 95% CI = 0.10–0.94) or BMI (stromal density, adjusted OR = 3.09, 95% CI = 1.01–9.48; stromal area, adjusted OR = 0.32, 95% CI = 0.10–0.96). Moreover, younger age (<50 years, OR = 2.44, 95% CI = 0.79–7.51), lower BMI (<30 kg/m², OR = 1.97, 95% CI = 0.65–5.95), and lower adipose density (OR = 1.90, 95% CI = 0.66–5.46) were suggestively, though not significantly, associated with increased intra-individual variation in gene expression profiles. These factors were also evaluated in relation to the intra-individual variability of the selected transcriptional signatures. Similar to the patterns for global expression, age and breast composition were associated with intra-individual variability.
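For readers less familiar with these quantities, an unadjusted odds ratio and a Wald-type 95% confidence interval can be computed from a 2×2 table as sketched below. The text does not state how the reported intervals were obtained, so the Wald method here is an assumption, and the function name and counts are hypothetical. Adjusted estimates would require a regression model and are not covered.

```python
from math import exp, log, sqrt


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: factor present, heterogeneous    b: factor present, homogeneous
    c: factor absent, heterogeneous     d: factor absent, homogeneous
    """
    if min(a, b, c, d) == 0:
        raise ValueError("Wald interval is undefined when a cell is zero")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR).
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se)
    upper = exp(log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from Table 1.
or_est, ci_low, ci_high = odds_ratio_wald_ci(20, 10, 10, 20)
```

A Wald interval is symmetric around the estimate on the log scale, so the product of its limits equals the squared odds ratio. An interval that contains 1 corresponds to a non-significant association at the chosen level.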

Table 1 Characteristics and intra-individual variability in histologically normal breast tissue.

Discussion

We evaluated intra-individual variation in benign breast gene expression using a hierarchical sampling scheme. We observed relatively high intra-individual reliability in global gene expression, although block-level variation appeared slightly higher than section-level variation. However, the degree of intra-individual variability depended on the biological pathway or feature, with composition-related signatures (e.g., stromal signature) showing higher intra-individual variability than signatures of other specific biological functions. We also observed that histological tissue composition and key demographic variables (e.g., age, obesity) were associated with intra-individual variability in benign breast gene expression.

In the past decade, many studies have evaluated inter-individual gene expression in normal or cancer-adjacent tissue as a predictor of survival or in association with tumor characteristics or exposure history4,9,10,17,22. The reliability and reproducibility of these findings have yet to be well determined, and may depend, in part, on whether a single biospecimen procured at one point in time represents the underlying biology of interest. Increasing sample sizes is a straightforward way to tackle this issue; however, this approach is costly and must be driven by knowledge of how many samples are required to represent the relevant biology. Our findings have important implications for biospecimen sampling strategies, suggesting that for some signatures, a single sample may provide representative gene expression data. In particular, intra-individual variation was very low for the proliferation, p53, and hypoxia signatures. Yet we also observed that global gene expression and expression of composition-related signatures showed higher levels of variability between sections and blocks.

Our finding of high intra-individual variability of composition-related signatures is consistent with previous histological studies in which histologic measures (e.g., stromal percent area and terminal duct lobular unit size) had low to moderate agreement across different regions of the same normal breast tissue block6,23,24,25. It has also been previously reported that there is substantial variability of pathological and molecular characteristics within tumor tissue, according to cell mixture and tissue composition26,27,28,29. As an intrinsic feature, intra-tumor heterogeneity results from the dynamic evolution of tumor cells and their interaction with the microenvironment, and plays an important role in breast cancer progression and therapy resistance30,31,32. For normal breast tissue, the extent to which intra-individual variability is an intrinsic feature of the breast, versus a reflection of biospecimen sampling, has been poorly understood. To explore this question, we assessed the relationships between intra-individual heterogeneity in gene expression and several available woman and tissue characteristics. While most of these associations were not statistically significant, we did detect significant differences according to histological features, including stromal nuclear density and area. Our results indicate that intra-individual variability may be strongly impacted by tissue composition. Besides histological features, we found that younger and non-obese women tended to have breast tissue with more heterogeneous gene expression. Age and obesity are well-known risk factors for breast cancer and significantly affect breast tissue composition15,17,33, so we cannot exclude the possibility that normal tissue heterogeneity is an intrinsic feature of the breast, reflecting previous exposure history. Further study is needed to relate intra-individual heterogeneity to breast cancer risk or prognostic factors, to clarify whether it is intrinsic, and to understand its biological significance in tumor development and progression. On the other hand, our results demonstrate that tissue composition may confound mRNA measurements from bulk normal tissue, particularly when normal tissue analysis is intended to ascertain more subtle effects of target phenotypes (e.g., breastfeeding) or when epithelial cells are of interest.

Our findings should be interpreted in light of some limitations. Although multiple samples at different levels per patient provided a unique opportunity to study intra-individual variability, not all women donated samples at all levels, as sample availability was dependent on availability of normal tissue at the time of surgery. Moreover, our study compiled histologically normal breast tissues from women undergoing a variety of procedures. Previous research suggests that cancer-adjacent tissue possesses some differences from benign tissues of disease-free women9,34,35,36. To test the potential impact of the samples from breast cancer patients, we conducted a series of sensitivity analyses: (1) checking the similarity of gene expression by tissue source using principal component analysis (Supplementary Figure 1A); (2) re-evaluating the variabilities across different levels after excluding tissue samples with distance to tumor less than 1 cm (n = 15, Supplementary Figure 1B); (3) comparing the intra-individual variabilities between the full set (n = 57) and the subset (n = 42) (Supplementary Figure 1B). We did not observe remarkable changes in these sensitivity analyses. Therefore, we do not think the potential biological alterations in cancer-adjacent tissue will change the conclusions of our study significantly. Lastly, our sample size limited the precision of our estimates of the association between intra-individual variation and breast cancer risk factors.

In summary, our study characterized intra-individual variation in gene expression of normal breast tissues, both globally and by selected transcriptional signatures. Our results underscore the importance of considering intra-individual variability in tissue-based biomarker development. Validation of our study findings in future studies is needed to further characterize heterogeneity in candidate histological and molecular biomarkers of breast cancer risk within normal breast tissue.