Genomic and expression profiling reveal molecular heterogeneity of disseminated tumor cells in bone marrow of early breast cancer

Detection of disseminated tumor cells (DTCs) in bone marrow is an established negative prognostic factor. We isolated small pools (~20) of EPCAM-positive DTCs from early breast cancer patients for genomic profiling. Genome-wide copy number profiles of DTC pools (n = 45) appeared less aberrant than those of the corresponding primary tumors (PT, n = 16). PIK3CA mutations were detected in 26% of DTC pools (n = 53); none of them was shared with matched PTs. Expression profiling of DTC pools (n = 30) confirmed the upregulation of EPCAM expression and certain oncogenes (e.g., MYC and CCNE1), as well as the absence of hematopoietic features. Two expression subtypes were observed: (1) luminal with dual epithelial–mesenchymal properties (high ESR1 and VIM/CAV1 expression), and (2) basal-like with proliferative/stem cell-like phenotype (low ESR1 and high MKI67/ALDH1A1 expression). We observed high discordance between ESR1 (40%) and ERBB2 (43%) expression in DTC pools vs. the clinical ER and HER2 status of the corresponding primary tumors, suggesting plasticity of biomarker status during dissemination to the bone marrow. Comparison of expression profiles of DTC pools with available data from circulating tumor cells (CTCs) of metastatic breast cancer patients revealed gene expression signatures in DTCs that were distinct from those of CTCs. For example, ALDH1A1, CAV1, and VIM were upregulated in DTC pools relative to CTCs. Taken together, analysis of pooled DTCs revealed molecular heterogeneity, possible genetic divergence from the corresponding primary tumors, and two distinct subpopulations. Validation in larger cohorts is needed to confirm the presence of these molecular subtypes and to evaluate their biological and clinical significance.

transferred to a 12 x 75 polystyrene tube and subjected to a second round of magnetic separation for 5 min.
The supernatant was aspirated and the cells were resuspended in 150 μL of 1x phosphate buffered saline (PBS). A 20-μL solution containing equal volumes of anti-CD45 (2D1) PerCP-Cy5.5 and a nucleic acid dye, thioflavin (BD Biosciences), was added. The sample was incubated in the dark for 15 min, and then 1x PBS was added prior to flow cytometry. All reagents and antibodies were obtained from BD Biosciences.
The optimized FACS gating strategy shown in Supplementary Figure 7 was used for isolation of DTCs. In addition to forward and side scatters, the fluorescence signal intensities of the following stains were assessed for each event: (1) EPCAM (EBA-1) mAb conjugated to phycoerythrin (EPCAM-PE), (2) a nucleic acid dye, thioflavin, and (3) a leukocyte-specific CD45 (2D1) mAb conjugated to peridinin-chlorophyll-protein-Cy5.5 (CD45-PerCP-Cy5.5). Forward and side scatters were used for preliminary identification of cells (P1 gate) and to exclude debris. P2 was used to gate for nucleated (nucleic acid dye+) cells, and the next two gates were used to select for EPCAM+/CD45- cells (P3 gate) and EPCAM+/nucleated cells (P4 gate). DTCs had to fall within gates P1-P4 and were defined as EPCAM+/CD45-/nucleated cells. P5 was used to gate for marrow leukocytes (non-tumor controls), which were defined as CD45+/EPCAM-/nucleated cells.
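The gate definitions above amount to a simple decision rule per event. The following is a minimal sketch of that rule only; the `Event` fields and the `classify_event` function are hypothetical names, and the boolean flags stand in for the intensity thresholds that the actual gates (Supplementary Figure 7) would apply.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One flow cytometry event, reduced to hypothetical pass/fail flags."""
    in_scatter_gate: bool  # P1: forward/side scatter consistent with a cell
    nucleated: bool        # P2: nucleic acid dye positive
    epcam_pos: bool        # EPCAM-PE positive
    cd45_pos: bool         # CD45-PerCP-Cy5.5 positive


def classify_event(event: Event) -> str:
    """Return 'DTC', 'leukocyte', 'other', or 'excluded' for one event."""
    # Debris and non-nucleated events fail P1 or P2 and are never sorted.
    if not (event.in_scatter_gate and event.nucleated):
        return "excluded"
    # DTC: EPCAM+/CD45-/nucleated (inside gates P1-P4).
    if event.epcam_pos and not event.cd45_pos:
        return "DTC"
    # Marrow leukocyte control: CD45+/EPCAM-/nucleated (gate P5).
    if event.cd45_pos and not event.epcam_pos:
        return "leukocyte"
    # Double-positive or double-negative nucleated events match neither gate.
    return "other"
```

Under this sketch, double-positive events are assigned to neither population, which mirrors the requirement that DTCs be CD45-negative and that leukocyte controls be EPCAM-negative.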
DTC isolation. DTCs were isolated within 24 hours after bone marrow aspiration using the same procedure. Cells were sorted into reaction tubes containing the appropriate solution for DNA or RNA profiling and then stored at -80 °C until further processing (Supplementary Figure 2A). In a subset of samples, replicate pools of DTCs from the same enriched sample were collected for parallel DNA and RNA profiling.
Quality control of samples. Quality control of DNA and RNA data was performed as previously described.
Copy number analysis. The sample median absolute deviation (MAD) was calculated to estimate noise in individual aCGH data. Clones with a segmented value equal to the median segment value of autosomal clones were considered to have a normal copy number. Clones with a segmented value at least one sample MAD higher than normal were called "gain", while those with a segmented value at least one sample MAD lower than normal were called "loss". A clone was considered to be amplified if its log2 ratio value was 4 sample MAD above the segmented value and was greater than the "normal" segment value by at least 0.75. To estimate the fraction of the genome altered (FGA), each clone was assigned a genomic distance equal to the sum of one half the distance between its center and that of each of its neighboring clones. The genomic distances of clones that were gained or lost were added to calculate the FGA. The frequency of gains or losses between primary tumors and DTCs was compared in a clone-wise manner. For each clone, a linear model was fit with the segmented value as the response variable and sample type as the predictor variable along with patient ID as a covariate. Q-values were then computed to correct for multiple testing. To compare the extent of genomic aberrations between primary tumors and DTCs, a linear model was fit with the log-transformed fraction of genome altered (or gained or lost) as the response variable and sample type as the predictor variable along with patient ID as a covariate.
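The calling thresholds and the FGA calculation can be illustrated with a short sketch. This is not the authors' code: the function names are hypothetical, the sample MAD and the "normal" segment value are taken as given inputs, the amplification rule follows a literal reading of the text, edge clones are assumed to contribute only the half-distance to their single neighbor, and the FGA is assumed to be normalized by the total distance covered on one chromosome.

```python
def call_copy_number(segmented: float, normal: float, mad: float) -> str:
    """Call gain/loss/normal from a clone's segmented value."""
    if segmented >= normal + mad:
        return "gain"
    if segmented <= normal - mad:
        return "loss"
    return "normal"


def is_amplified(log2_ratio: float, segmented: float,
                 normal: float, mad: float) -> bool:
    """Literal reading: log2 ratio is 4 MAD above the segmented value
    and exceeds the normal segment value by at least 0.75."""
    return log2_ratio >= segmented + 4 * mad and log2_ratio - normal >= 0.75


def clone_distances(centers: list[float]) -> list[float]:
    """Half the distance to each neighboring clone center, summed per clone.

    Assumes `centers` is sorted and lies on a single chromosome.
    """
    n = len(centers)
    distances = []
    for i in range(n):
        left = (centers[i] - centers[i - 1]) / 2 if i > 0 else 0.0
        right = (centers[i + 1] - centers[i]) / 2 if i < n - 1 else 0.0
        distances.append(left + right)
    return distances


def fraction_genome_altered(centers: list[float], calls: list[str]) -> float:
    """Summed distance of gained or lost clones over the total distance."""
    distances = clone_distances(centers)
    altered = sum(d for d, c in zip(distances, calls) if c in ("gain", "loss"))
    return altered / sum(distances)
```

For four evenly spaced clones whose two interior clones are gained, the interior clones each carry twice the distance of the edge clones, so the FGA under these assumptions is two thirds rather than one half.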
Mutation analysis of PIK3CA. PCR primers were designed to amplify the regions containing the complete exon 9 and exon 20 of the PIK3CA gene (Supplementary Figure 4). To increase specificity and prevent non-target amplification of the PIK3CA pseudogene on chromosome 22, a thymine nucleotide in the reverse primer for exon 9 was replaced with a locked nucleic acid (LNA) residue 3. The PCR primer sequences were also designed to contain flanking universal primers M13F(-21) and M13R for sequencing.
The PCR conditions were optimized using breast cancer cell lines as controls: MCF7 and BT20, which carry the hotspot mutations E545K in exon 9 and H1047R in exon 20, respectively, as well as BT474, which carries the wild-type sequences for both exons.
The optimized PCR conditions were as follows: (1) an initial denaturation step at 95 °C for 5 minutes, (2) 34 cycles of denaturation at 95 °C for 30 seconds, primer annealing at 60 °C for 60 seconds, and elongation at 72 °C for 1 minute, and (3) a final elongation step at 72 °C for 5 minutes.
The PCR products were run on a 2.5% agarose gel to confirm amplification. The amplicons were then sequenced using the Sanger method via a commercial sequencing facility. The sequencing traces were processed and analyzed using the ApE software. Both forward and reverse sequencing reactions were performed to confirm results.

ESR1 and ERBB2 status in DTCs.
To determine whether the assignment of ESR1 and ERBB2 status in DTCs based on detection (Ct<36) or no detection (Ct≥36) was biologically consistent, we calculated the log10 RQs for ESR1 and ERBB2 in 15 of the 30 DTC samples with paired marrow leukocytes, and compared them with those of breast cancer cell lines with known ER and HER2 status. Using expression in cell lines as references, log10 RQ>1 was chosen as a cut-off for positivity (see Figure 5A and B in the main text). Cell lines studied included: BT474 (ER-positive, HER2-positive); MCF7 (ER-positive, HER2-negative); MCF7clone18 (ER-positive, HER2-positive [stably transformed to over-express ERBB2]); and SKBR3 (ER-negative, HER2-positive). Log10 RQ values for cell lines were calculated using leukocytes from healthy donors (n=5) as a calibrator group. Comparison of the ESR1 and ERBB2 calls using these two approaches (based on Ct or log10 RQ) revealed good concordance (see main text).
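The two calling approaches can be compared side by side in a few lines. This sketch assumes that RQ is the standard relative quantity from the comparative Ct method, RQ = 2^(-ΔΔCt), with the paired leukocytes (or the healthy-donor leukocytes for cell lines) as calibrator; the text does not spell out this formula, and the function names are hypothetical.

```python
import math

CT_DETECTION_LIMIT = 36.0   # Ct < 36 counts as detected
LOG10_RQ_CUTOFF = 1.0       # log10 RQ > 1 counts as positive


def call_by_ct(ct: float) -> bool:
    """Detection-based call: positive when the gene is detected."""
    return ct < CT_DETECTION_LIMIT


def log10_rq(delta_ct_sample: float, delta_ct_calibrator: float) -> float:
    """log10 of RQ = 2**(-ddCt), assuming the comparative Ct method.

    Each delta Ct is the gene's Ct minus the mean Ct of the reference genes.
    """
    ddct = delta_ct_sample - delta_ct_calibrator
    return -ddct * math.log10(2)


def call_by_rq(delta_ct_sample: float, delta_ct_calibrator: float) -> bool:
    """RQ-based call: positive when log10 RQ exceeds the cut-off."""
    return log10_rq(delta_ct_sample, delta_ct_calibrator) > LOG10_RQ_CUTOFF
```

Under this assumption, a log10 RQ above 1 corresponds to more than a 10-fold higher normalized expression than the calibrator, that is, a ΔΔCt below about -3.32 cycles.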
Oncotype Dx score. We designed our aQPCR platform to include all the genes in the 21-gene recurrence score assay (Oncotype Dx, Genomic Health, Inc.) 4. For data normalization, delta Cts were calculated by subtracting the mean Ct of the reference genes (see Supplementary Table 2) from the Ct of each of the remaining 16 genes in the signature. The delta Cts were converted to negative delta Cts so that higher values correspond to higher levels of expression. The oncotypedx function in the R package genefu (Computation of Gene Expression-Based Signatures in Breast Cancer) was then used to calculate recurrence scores (RS), which estimate the risk of distant recurrence 5.
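The normalization step that precedes the recurrence score calculation can be sketched as follows. The function name and the Ct values are illustrative only, and the sketch stops at the negative delta Cts; the score itself was computed with the genefu package in R and is not reproduced here.

```python
from statistics import mean


def negative_delta_ct(gene_cts: dict[str, float],
                      reference_cts: list[float]) -> dict[str, float]:
    """Normalize signature-gene Cts against the mean reference-gene Ct.

    delta Ct = Ct(gene) - mean Ct(reference genes); the sign is flipped so
    that higher values correspond to higher expression.
    """
    reference_mean = mean(reference_cts)
    return {gene: -(ct - reference_mean) for gene, ct in gene_cts.items()}


# Illustrative Ct values only, not data from the study.
example = negative_delta_ct({"ESR1": 28.0, "MKI67": 32.0}, [24.0, 26.0])
```

In this illustration the mean reference Ct is 25, so the gene with the lower Ct (ESR1 at 28) receives the higher normalized value (-3 versus -7).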