Background

Breast cancer (BC) is a heterogeneous disease, whose histopathological and molecular features significantly influence clinical evolution [1,2,3,4,5]. Besides the most common prognostic factors, such as the expression of hormone receptors (HR) and human epidermal growth factor receptor 2 (Her-2) status [6], multigene assays have been shown to support management and decision-making processes in selected early-stage patients [7, 8].

Over the past few years, clinical and molecular monitoring of BC has been facilitated by developments in liquid biopsy technologies, since both the number of circulating tumour cells (CTCs) and their features have been shown to give additional prognostic information in both early and advanced BC [9, 10], paving the way to novel experimental approaches which provide a real-time and dynamic picture of these malignancies and their evolution during treatment [11]. Several techniques for BC CTC isolation have been proposed, relying on their phenotypical, physical or metabolic properties [12], although CellSearch® is still the only FDA-approved methodology for CTC enumeration [9, 10]. Longitudinal monitoring of CTC count has also been shown to reveal the onset of acquired resistance to treatment as well as disease progression, in parallel with diagnostic standards [13]. Phenotypical and molecular characterisation of BC CTCs, based on the application of “omics” technologies, has also been attempted [11, 14,15,16] with the aim of identifying prognostic/predictive signatures and novel therapeutic targets [15, 17].

A number of studies investigating BC metastatic behaviour in search of specific “organotropism” signatures have been performed on both BC cell lines and primary tumour samples [18, 19], whereas the potential application of CTCs to this field of research has been only partially explored [20,21,22]. Indeed, with respect to the “bone homing” process, molecular panels including Kang’s [7], Savci-Heijink’s [23] and Cosphiadi’s [24] have been defined in primary BC, although none of them has entered routine clinical practice.

Based on the assumption that CTCs reflect not only the disease extent but also its molecular heterogeneity and evolution [2, 10], we attempted to define and evaluate a specific “osteotropism” gene signature in BC CTCs which could enable the identification of a subset of breast malignancies capable of leading to “bone-only” metastases.

We thus developed a targeted RNAseq assay to screen a panel of genes critically involved in the metastatic cascade. The panel adequacy was first confirmed on cell models, including sub-clones of the MDA-MB-231 BC cell line characterised by variable organotropism (towards “bone” or “lung”) [25]. Then, the targeted RNAseq of CTCs from stage-IV BC patients explored the correlation between distant metastasis sites and CTC gene expression profile (GEP).

Methods

Patients

Forty stage-IV BC patients, attending the Medical Oncology Unit of the University Hospital “Policlinico of Bari”, were enrolled. Eligible patients were adult (≥18 years) subjects with metastatic BC, either systemic-treatment-naive or experiencing radiological disease progression during systemic anticancer treatment. In the latter case, patients were enrolled at least 21 days after the last cycle of therapy. A personal history of other synchronous or metachronous malignancies was an exclusion criterion. Before enrollment, patients underwent a full-body computed tomography (CT) and a bone scan to define sites of distant metastases. When deemed necessary by the clinician, an 18F-fluorodeoxyglucose (FDG)-PET-CT was also performed, in agreement with current guidelines [26]. Clinical and pathological data from all patients were collected and recorded in anonymised form.

CTC identification and isolation

Viable CTCs were purified from 15 ml of peripheral blood, as previously described [27]. Briefly, following pre-enrichment through immunomagnetic sorting (AutoMACS Pro, Miltenyi Biotech, Bergisch Gladbach, Germany), cell samples were incubated with a mixture of monoclonal antibodies (Abs), against epithelial or mesenchymal markers, conjugated with different fluorochromes.

CTCs were loaded into a DEPArray V2 dielectrophoretic system (Menarini, Silicon Biosystem, Castel Maggiore, Italy), entrapped in single cages under the effect of a dielectric field and identified by fluorescence microscopy. The selected CTCs were then moved to the parking area, recovered as pools of ten cells and stored at −80 °C to preserve nucleic acid integrity until subsequent molecular analyses.

Cell lines

In order to set up the RNAseq method, human cell lines were used, including the healthy mammary gland epithelium-derived MCF-10A (ATCC CRL-10317) and the triple-negative MDA-MB-231 (ATCC HTB-26) BC cell line (P0), which shows no selective organotropism in vivo [28]. P7 and lung metastatic (LM) sub-clones of P0 cells, exhibiting bone and visceral tropism, respectively [25], and established at the “University of Sheffield” by serial passages in murine models of BC, were also employed.

BC cells were cultured at 37 °C in 5% CO2, in Dulbecco’s Modified Eagle’s Medium (DMEM) containing 4.5 g/L glucose, ultra-glutamine I (Lonza, Verviers, Belgium) and 10% fetal bovine serum (FBS). MCF-10A cells were cultured at 37 °C in 5% CO2 in DMEM Nutrient Mixture F-12 medium containing 10% FBS, 1% penicillin/streptomycin, 4 mM glutamine, 10 ng/ml epidermal growth factor (EGF) (Peprotech, London, UK), 100 IU/ml insulin and 0.5 µg/ml hydrocortisone (Sigma-Aldrich, Milan, Italy). Cell cultures were screened for Mycoplasma contamination (MP0040; Sigma) prior to molecular characterisation.

Definition of a putative “osteotropism gene panel”

In order to identify genes to be included in the putative BC osteotropism signature, a review of research articles published until May 2019 was performed by using PubMed, Scopus, ISI-Web of Science and Google Scholar databases. The applied keywords were “breast cancer”, “osteotropism”, “skeleton invasiveness”, “bone metastases” and “metastases”. The identified genes were subsequently screened for their function, correlation with metabolic pathways and alterations in invasive malignancies using UNIPROT (http://www.uniprot.org/), GENECARD (http://www.genecards.org/), OMIM (http://omim.org/) and iPATH2 (pathway.embl.de/iPath2.cgi#) web tools. Finally, the selected genes (Supplementary Table 1) were submitted via web interface for primer pool design and synthesis using the proprietary Ion Ampliseq Designer algorithm (https://www.ampliseq.com/browse.action).

RNAseq

Preliminary experiments with cell lines were performed to standardise the RNAseq method to be used for CTCs. In order to obtain RNA amounts comparable to those deriving from CTC pools, suspensions of 1 × 10^6 cells from each line were serially diluted down to ten cells, which subsequently underwent RNAseq analysis.

Sequencing libraries were prepared by the Ion AmpliSeq™ Library Kit 2.0 as indicated in the Ion AmpliSeq™ RNA Library preparation user guide (Ion AmpliSeq™ Library Preparation, Quick Reference, Publication Number MAN0006735 Revision F.0). For all cell lines, ten-cell pools were lysed with 1 μl of the Reaction Buffer (19 μl of Lysis Buffer and 1 μl of RNase Inhibitor) (Takara Bio, Mountain View, USA) and incubated at 72 °C in a thermal cycler for 3 min. The lysed product was incubated at 50 °C for 10 min with 1 μl of SuperScript IV Reverse Transcriptase (SuperScript™ IV One-Step RT-PCR System Kit, Thermo Fisher Scientific, Waltham, USA), 5 μl of Ion AmpliSeq™ HiFi Mix (Thermo Fisher Scientific) and 5 µl of the primer pool. The cDNA target amplification was performed at 98 °C for 15 s and at 60 °C for 4 min, with the number of cycles increased from the 17 indicated in the “Amplify the Targets” section of the above-mentioned Ion AmpliSeq™ RNA Library preparation user guide to 25, while the following steps were performed according to the manufacturer’s instructions.

Both the quality and quantity of libraries, purified by Agencourt AMPure XP (Beckman Coulter, Indianapolis, USA), were evaluated by the Ion Library TaqMan Quantitation Kit (Life Technologies, Carlsbad, California, USA) on the StepOne Plus system (Applied Biosystems, Foster City, California, USA). Finally, libraries were templated through the Ion OneTouch™ 2 System and Ion OneTouch™ ES, and sequenced on the NGS Ion Torrent PGM™ system by using Ion Torrent™ 318 chips. CTC RNAseq analyses were performed following the same protocol described for BC cell lines.

Analysis of RNAseq data

RNAseq data from both cell lines and CTCs were analysed by using the AmpliSeq RNA plugin available for Ion Torrent sequencing platforms and Partek Flow (Build version 9.0.20.0720; Partek Inc., St. Louis, MO).

Data analysis began with FASTQ files, with a single FASTQ file corresponding to each sequenced RNA sample and containing the information of sequenced reads as well as the quality score for each nucleotide. Once the quality of FASTQ files was checked, bases and reads with low quality were filtered out, and adaptors and barcodes were extracted from the data. Reads were aligned to the reference human genome (version hg19) and TMAP was used as an aligner.

The list of differentially expressed genes (DEGs) was obtained through the gene-specific analysis (GSA) method. Read counts were normalised to the total number of counts (counts per million) plus 0.0001, and all genes with fewer than ten normalised read counts were excluded from subsequent analyses. Each gene was associated with a relative (log2) fold change (FC), with a threshold of FC > 2 in either direction, whose statistical significance was expressed in terms of P value. Only genes whose P value was ≤ 0.05 were considered differentially expressed.
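The normalisation and filtering steps described above can be illustrated with a minimal Python sketch. It is not the Partek Flow implementation: the function names, the rule of keeping a gene when it reaches the cut-off in at least one sample, and the signed fold-change convention (ratios below 1 reported as negative reciprocals) are assumptions made for illustration only.

```python
OFFSET = 0.0001       # offset added after CPM scaling, as stated in the text
MIN_NORMALISED = 10   # genes below this normalised count are excluded


def counts_per_million(raw_counts):
    """Scale one sample's raw read counts to counts per million, then add the offset."""
    total = sum(raw_counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {gene: count / total * 1_000_000 + OFFSET
            for gene, count in raw_counts.items()}


def filter_low_expression(samples_cpm, min_value=MIN_NORMALISED):
    """Keep genes reaching min_value in at least one sample.

    The 'at least one sample' rule is an assumption; the text does not say
    how the cut-off is applied across samples.
    """
    genes = set().union(*(s.keys() for s in samples_cpm))
    return sorted(g for g in genes
                  if any(s.get(g, OFFSET) >= min_value for s in samples_cpm))


def signed_fold_change(mean_a, mean_b):
    """Signed linear fold change of group A over group B (assumed convention):
    ratios below 1 are reported as -1/ratio, so halving gives -2."""
    ratio = mean_a / mean_b
    return ratio if ratio >= 1 else -1 / ratio


# Hypothetical ten-cell pool with one million reads in total.
pool = counts_per_million({"GENE_A": 999_995, "GENE_B": 5})
kept = filter_low_expression([pool])   # GENE_B falls below the cut-off
```

Under this convention a threshold of 2 in either direction corresponds to `signed_fold_change(...) >= 2 or signed_fold_change(...) <= -2`; the small offset also keeps the denominator away from zero for genes with no reads.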

Statistically significant DEGs were subsequently grouped by hierarchical clustering using the correlation distance and displayed in a heatmap. All correlation analyses were performed by calculating the Pearson coefficient, and the adjustment for multiple testing was conducted with the Benjamini and Hochberg method (false discovery rate, FDR < 0.25).
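For readers less familiar with the Benjamini and Hochberg procedure, the following Python sketch shows how adjusted P values can be derived from a list of raw P values and compared with the 0.25 cut-off. The function name and the example P values are hypothetical; the software used in the study may differ in implementation details.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P values for four genes.
raw = [0.01, 0.04, 0.03, 0.20]
fdr = benjamini_hochberg(raw)
passing = [i for i, q in enumerate(fdr) if q < 0.25]  # indices passing FDR < 0.25
```

Each raw P value is multiplied by the number of tests and divided by its rank, and the running minimum guarantees that a gene with a smaller raw P value never receives a larger adjusted value than a gene ranked after it.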

In both cell lines and CTCs, the analyses of DEGs were performed at the inter-group level, after classification of the samples according to the organotropism, for the former, and sites of distant metastasis, for the latter. The principal component analysis (PCA) method was adopted to visualise similarities and differences between the samples in the dataset while identifying potential outliers.

Once a list of ranked DEGs was obtained, Gene Ontology (GO) enrichment analysis was applied to annotate genes in classes or categories like “biological process”, “molecular function” and “cellular component” [29, 30], while the Kyoto Encyclopaedia of Genes and Genomes (KEGG) was used for pathway enrichment analysis [31]. In all cases, a P value < 0.05 was defined as a cut-off. Cytoscape (version 3.7.2) software [32] and its tool StringApp [33] were used for visualising networks and performing enrichment analysis by applying default parameters (confidence score cut-off 0.4).
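Over-representation analyses of this kind are commonly based on a hypergeometric (one-sided Fisher) test. The sketch below illustrates that calculation in Python under stated assumptions: the choice of background (here an arbitrary gene universe), the function name and the example counts are hypothetical, and the enrichment tools used in the study may apply a different background or correction.

```python
from math import comb


def hypergeom_enrichment_p(background_size, term_size, deg_count, overlap):
    """P(X >= overlap) when deg_count genes are drawn without replacement from
    background_size genes, term_size of which are annotated to the term."""
    upper = min(term_size, deg_count)
    total = comb(background_size, deg_count)
    tail = sum(
        comb(term_size, k) * comb(background_size - term_size, deg_count - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 31 DEGs drawn from a 134-gene background, with 20
# background genes annotated to a term and 10 of them among the DEGs.
p_term = hypergeom_enrichment_p(134, 20, 31, 10)
enriched = p_term < 0.05
```

With a targeted panel, the choice between the panel and the whole genome as background strongly affects the resulting P values, which is worth keeping in mind when interpreting enrichment results.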

Survival analysis

The potential prognostic meaning of the top-10 most deregulated genes (FC ≥ 4 or FC ≤ −4) that emerged from the comparison among CTC groups was explored first in our patient series and then in the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) dataset, which includes 2509 primary breast tumours and matched clinical data [34,35,36]. In both cases, the association between putative biomarkers and overall survival (OS) was first explored by splitting the patients into “high” and “low” gene expression groups, and then visualised by Kaplan–Meier curves, plotted through the “survminer” and “survival” packages of R software (v. 3.6.1), according to the relative log-rank P. In a similar fashion, we explored in our cohort the prognostic role of the above-mentioned genes with respect to the “time-to-BM diagnosis” and the “time-to-first skeletal-related event (SRE)” outcomes.
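The Kaplan–Meier estimate underlying such curves can be sketched in a few lines of Python. This is only an illustration of the product-limit calculation, not a substitute for the R packages named above; the function names and the follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time with at least one event.

    times  : follow-up in months
    events : 1 if death was observed, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def median_survival(curve):
    """First time at which estimated survival is 0.5 or lower, else None (not reached)."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical follow-up (months) for five patients; 0 marks censoring.
curve = kaplan_meier([5, 8, 12, 12, 20], [1, 0, 1, 1, 0])
median = median_survival(curve)
```

Censored patients leave the risk set without lowering the curve, which is why a median can remain "not reached" when fewer than half of the patients at risk experience the event.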

Results

Patients and CTC enumeration

Patients’ clinicopathological information is summarised in Table 1. As shown, 10 out of 40 subjects were systemic-treatment-naive, while the others were experiencing disease progression at the time of recruitment. The median number of viable CTCs from the whole cohort was 50 (range 10–110). Neither clinicopathological features nor the number of previous systemic treatments exhibited a significant correlation with the count of isolated CTCs (Spearman’s correlation coefficient between −0.30 and 0.14; data not shown). The majority of patients also harboured CTC clusters (Table 1), defined as circulating multicellular aggregates made up of ≥2 cells with distinct nuclei, and including at least one CTC [37, 38]. As expected, a significant correlation between the total number of viable CTCs and cluster count emerged (Spearman’s coefficient 0.34, P = 0.038).
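Spearman's coefficient is the Pearson correlation computed on ranks, with tied values sharing their mean rank. A minimal Python sketch follows; the function names and the example counts are hypothetical and do not reproduce the study data.

```python
def average_ranks(values):
    """1-based ranks, with ties receiving the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical CTC counts and cluster counts for five patients.
rho = spearman([10, 20, 30, 40, 50], [1, 3, 2, 5, 4])
```

Because it relies on ranks only, the coefficient is insensitive to the skewed distribution typical of cell counts.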

Table 1 Clinicopathological features of enrolled patients.

With regard to metastatic disease, at the time of CTC collection 7 patients (17.5%) exhibited bone-only metastases (defined as “BM”), 22 subjects (55.0%) had both skeletal and extra-skeletal metastases (BM + ES) and 11 patients (27.5%) presented with metastatic disease in sites other than bone (ES) (Table 1).

Assay set-up on BC cell lines

The literature review process described in the “Methods” led to the identification of 134 genes involved in several biological processes and functions, such as the epithelial-to-mesenchymal transition (EMT), angiogenesis, cell adhesion and motility, cell–cell signalling, intracellular signal transduction, remodelling of the extracellular matrix, modulation of immune response and immune escape (Supplementary Table 1).

CTC GEP analysis was first set up on MCF-10A and BC cell lines. The transcriptome heatmap of unsupervised hierarchical clustering of cell lines, based on normalised read counts, showed that “healthy” cell samples successfully separated from BC ones, while P7 clearly diverged from LM cells (Supplementary Fig. 1), validating the gene panel adequacy. Such divergences were further confirmed by the differential gene expression analysis, performed to identify deregulated genes among P0 and both of its sub-clones (Supplementary Table 2). Moreover, GO enrichment analysis applied to DEGs identified a number of significantly deregulated processes in osteotropic P7 cells compared to LM including, among upregulated ones, connective tissue development, cartilage development and ossification (Supplementary Table 3).

Targeted RNAseq of CTCs reveals metastasis site-related GEP

CTCs isolated from stage-IV BC patients were analysed in their GEP by using the same experimental approach described for cell lines. Due to the low quality of bases and reads emerging from the FASTQ file check, patients #29 and #40 were excluded from subsequent analyses.

The PCA, performed to evaluate the contribution of the transcript levels to CTC clustering, demonstrated a separation of “BM” CTCs from the remaining groups, namely “BM + ES” and “ES” (Fig. 1a), with the exception of one sample (#33) derived from a patient who, at the time of enrollment, had suspicious sub-centimetre lung nodules deserving close follow-up.

Fig. 1: Targeted RNAseq of CTCs reveals metastasis site-related GEP.

a The PCA, performed to visualise similarities and differences among CTC samples, showed a clear-cut separation of “BM” CTCs (in blue, N = 7) from the others (in yellow: “ES” CTCs, N = 10; in red: “BM + ES” cells, N = 21). In the three-dimensional PCA plot, each sample is represented as a sphere; the closer the spheres in the plot, the higher the similarity among CTC GEP. b Volcano plot of differential gene expression analysis performed by comparing “BM” vs “ES” CTCs. c Heatmap showing the expression profile of the 31 deregulated genes identified by comparing “BM” vs “ES” CTCs (FC ≤ −2 or FC ≥ 2, P ≤ 0.05, FDR < 0.25). BM, bone metastases only; BM + ES, bone and extra-skeletal metastases; CTCs, circulating tumour cells; ES, extra-skeletal metastases; FC, fold change; FDR, false discovery rate; GEP, gene expression profile; PCA, principal component analysis.

By applying a fold change (FC) threshold of ≤ −2 or ≥ 2 and a false discovery rate (FDR) threshold of 0.25, 31 DEGs were identified in “BM” CTCs compared with those from “ES” subjects (Table 2a), as shown by the volcano plot and the heatmap (Fig. 1b, c). Moreover, 24 DEGs emerged from the comparison between CTC samples belonging to “BM” and “BM + ES” patients, among which 6 were found upregulated and 18 downregulated in the former (Table 2b and Supplementary Fig. 2A), while no significantly deregulated genes were identified in CTCs from “BM + ES” versus “ES” patients (Supplementary Fig. 2B).

Table 2 Lists of DEGs obtained comparing the GEP of BC CTCs.

According to our preliminary categorisation of gene functional classes (Supplementary Table 1), the majority of these DEGs belonged to EMT, Wnt/β-catenin signalling, extracellular matrix remodelling and cell motility categories.

Interestingly, among the 31 deregulated genes observed in “BM” samples (compared to “ES” ones), CAPG (FC: 30.79; P value 1.73E-02), IL1B (FC: 5.37; P value: 2.46E-02), MAF (FC: 4.39; P value: 2.38E-02) and GIPC1 (FC: 2.43; P value 3.06E-02) were found overexpressed in CTCs derived from patients with skeletal relapse, in agreement with previous data describing the upregulation of these markers in osteotropic primary breast tumours [25, 39, 40].

To gain deeper insight into the biological functions of the identified DEGs and their reciprocal interactions, a gene regulatory network was constructed by using the Search Tool for the Retrieval of Interacting Genes database (STRING). Given the list of the 31 DEGs as input, this database assembled the protein–protein interaction (PPI) network, whose topological information was visualised in Cytoscape. Figure 2a shows the PPI network of all DEGs (PPI enrichment: 3.97E-13), consisting of all proteins, hierarchically located, and the interactions among them.

Fig. 2: Functional enrichment analysis of deregulated genes resulting from the comparison of “BM” vs “ES” CTCs.

a The PPI network obtained by the StringApp Cytoscape application (confidence score cut-off = 0.4; maximum additional interactors = 0; query default set-up). b Bar plot of the most significantly enriched pathways of DEGs that emerged from KEGG analysis. BM, bone metastases only; CTCs, circulating tumour cells; DEGs, differentially expressed genes; ES, extra-skeletal metastases; FDR, false discovery rate; KEGG, Kyoto Encyclopaedia of Genes and Genomes; PPI, protein–protein interaction.

Afterwards, the genes in the PPI network underwent GO enrichment analysis, which showed that several upregulated DEGs found in “BM” CTCs (e.g. MAPK1, SOX9, IL1B, FGFR4, COL3A1, FGF5 and MEF2C) were enriched with greater statistical significance in biological processes related to cell signal transduction and proliferation, including “cell surface receptor signalling pathway”, “regulation of intracellular signal transduction” and “MAPK cascade” (Table 3). On the other hand, downregulated DEGs (SMAD2, FGFR3, HMGA2 and MCM2) were enriched in different biological processes correlated with bone rearrangement, such as “chondrocyte differentiation” and “chondrocyte proliferation”, “mesoderm formation”, “skeletal system development”, “skeletal system morphogenesis”, “mesenchyme development” and “cell differentiation” (Table 3). Notably, in accordance with GO analysis, KEGG pathway analysis suggested that significantly upregulated genes were mainly enriched in key proliferative and bone-related processes, namely “MAPK signalling”, “regulation of actin cytoskeleton”, “PI3K-Akt signalling”, “Ras signalling”, “osteoclast differentiation” and “breast cancer” (Fig. 2b).

Table 3 GO enrichment analysis of DEGs emerged from the comparison between “BM” vs “ES” CTCs.

Survival analysis

Based on RNAseq data emerging from the “BM vs ES” CTC comparison, we arbitrarily defined a top-ten group of deregulated genes by considering their FC values (FC ≥ 4 or FC ≤ −4). Thus, the potential prognostic meaning of these genes (i.e. CAPG, HRAS, IL1B, FGFR4, MAF, SERPINB2, CTSK, ANLN, MCM2 and HMGA2) was explored in both metastatic and early BC.

First, the enrolled patients were included in two different groups namely “altered” (N = 24) and “not altered” (N = 14), according to the presence (or not) of at least one top-ten gene deregulation in CTCs. Kaplan–Meier curves in Supplementary Fig. 3 show the lack of a significant correlation between such patient classification and median OS, calculated from either the time of BC diagnosis or distant metastasis onset, until death from any cause or last follow-up visit. A non-significant difference between the two groups was also found in terms of median time-to-BM onset (altered = 56 months, not altered = 108 months, P = 0.28) (Supplementary Fig. 3) and median time-to-first SRE (altered = not reached, not altered = 5 months, P = 0.16) (Supplementary Fig. 4). Once we tested patient dichotomisation according to the presence of at least two top-ten gene deregulations (≥2 group = 14 patients; 0–1 group = 24 patients), we did not find significant correlations between CTC GEP and the above-mentioned outcomes (data not shown).
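The comparison between the “altered” and “not altered” groups relies on a log-rank test. The Python sketch below shows the standard two-group calculation with one degree of freedom; it is an illustration rather than the code used for the study, and the function name and follow-up values are hypothetical.

```python
from math import erfc, sqrt


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value) with 1 degree of freedom."""
    event_times = sorted(
        {t for t, e in zip(times_a, events_a) if e}
        | {t for t, e in zip(times_b, events_b) if e}
    )
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A just before t
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B just before t
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank statistic undefined: zero variance")
    chi_square = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    return chi_square, erfc(sqrt(chi_square / 2))


# Hypothetical follow-up (months); all events observed in both groups.
chi2, p = logrank_test([1, 2, 3, 4, 5], [1] * 5, [10, 11, 12, 13, 14], [1] * 5)
```

At each event time the observed number of events in one group is compared with the number expected if both groups shared the same hazard; small groups yield a small variance term and therefore limited power, consistent with the caution expressed about sample size.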

Then, the METABRIC dataset [34,35,36] was employed to investigate the potential prognostic meaning of the top-ten gene expression alterations. We focused on BC patients whose primary tumours had been screened for the expression of the ten genes of interest (N = 481). Since only nine stage-IV BC patients were included in this cohort, we decided to focus on the early-stage population (stage I–III, N = 472) for subsequent analyses. By applying the above-mentioned criterion to assign patients to either the “altered” or the “not altered” group, a statistically significant longer median survival emerged in the “altered” group of early-stage patients, compared to the “not altered” one (199 vs 112 months, P = 0.014) (Fig. 3).

Fig. 3: Survival probability analysis based on METABRIC BC dataset.

The expression of the top-ten most deregulated genes (CAPG, HRAS, IL1B, FGFR4, MAF, SERPINB2, CTSK, HMGA2, ANLN, MCM2) obtained from the comparison of “BM” vs “ES” CTCs was used to stratify stage I to III METABRIC BC patients (N = 472) into “altered” (red) and “not altered” (blue) groups. Data in cBioPortal were analysed to generate OS tables and create Kaplan–Meier curves, plotted through the “survminer” and “survival” packages (v. 3.6.1) of R software. A statistically significant longer median survival was observed in the “altered” group, compared to the “not altered” one (199 vs 112 months, P = 0.014). BM, bone metastases only; CTCs, circulating tumour cells; BC, breast cancer; ES, extra-skeletal metastases.

No information about the sites of distant metastases was available in the METABRIC dataset, so we could not speculate about the BM-predictive capability of our gene panel. Results from this analysis suggest that the identified top-ten gene deregulations might successfully select a subset of early breast malignancies with better prognosis, whose organotropism might deserve further, prospective evaluation.

Discussion

BC is one of the most osteotropic malignancies, with ~14% of women with early-stage disease experiencing subsequent skeletal dissemination [41] which occurs in up to 70% of patients with advanced tumours.

Several research groups have looked for putative prognostic factors, including clinicopathological features and molecular signatures, able to stratify BC patients according to the risk of future skeletal involvement [19], with the aim of personalising treatment and follow-up strategies in consideration of the favourable results of adjuvant bisphosphonate studies [42,43,44]. Despite the extensive research on this topic, none of the identified signatures has entered routine clinical practice to date. Moreover, most studies were based on the examination of primary or secondary tumour-derived samples, with consequent limitations related to the intrinsic spatio-temporal heterogeneity of cancer [2, 45].

Recently, the identification of BC “organotropism” signatures, as well as novel prognostic and predictive biomarkers, has been pursued through the phenotypical and molecular characterisation of CTCs, which dynamically reproduce cancer features and their variations over time [2, 10].

By applying a previously described protocol to isolate viable CTCs from metastatic BC patients [27], in this study we performed a targeted RNAseq of these cells, aiming at the identification of a GEP specifically correlated with BM onset.

In a study by Aceto et al. [22], CTCs derived from patients with metastatic HR+ BC underwent RNAseq, to unravel mechanisms involved in acquired endocrine resistance, showing activation of the androgen receptor pathway in CTCs from “bone-predominant” BC. In the present study, we focused on patients with “bone-only” metastases, regardless of the tumour HR status, assuming that molecular osteotropism signatures might be at least partially shared across BC sub-groups [18].

Moreover, to develop a cost-effective method, potentially applicable to large-scale analyses, we developed a targeted RNAseq protocol, based on a literature-derived gene panel, whose adequacy was preliminarily verified on BC cell lines with different organotropism, observing a clear clustering of the samples according to their biological behaviour. Interestingly, we found MMP1, FST and GIPC1 significantly upregulated in the bone-homing cell population (P7), as compared to parental P0 cells, in agreement with previous findings and proteomics data [7, 39, 46].

We then focused on CTCs, grouped according to the sites of distant metastases detected at the time of patient enrollment, and analysed as 10-cell pools, to overcome the drawbacks of single-cell gene expression analyses, such as insufficient RNA quantity and quality [47,48,49], as well as issues related to intra-individual heterogeneity [50].

As expected [51], we did not find a perfect overlap between BC cell lines and CTC GEP but, notably, the PCA plot showed a sharp separation of “BM” CTCs from the others. Moreover, 31 DEGs emerged from the comparison of “BM” vs “ES” CTCs, such as CAPG, GIPC1, IL1B and MAF, whose overexpression in osteotropic BC emerged from previous studies on large patient series [25, 39, 40]. Major functional classes including deregulated genes were the “EMT” one, which encompassed 8 out of 31 genes (i.e. MAF, MAFA, GIPC1, PRDX1, MEF2C, SOX9, SMAD2, HMGA2) and the Wnt/β-catenin signalling (including 5 out of 31 genes), whose direct and indirect participation in the BM cascade has been widely reported [18, 52,53,54,55,56,57,58].

With respect to KEGG enrichment analysis, different pathways were found upregulated in “BM” CTCs, including those involved in osteoclast differentiation, which is a key step of the BM “vicious circle” [18, 52]. Notably, one of the overexpressed genes found in “BM” versus “ES” CTCs was CTSK, whose role in osteoclast differentiation is well documented [59].

In the last part of the present work, we attempted to explore the potential prognostic meaning of the identified panel, focusing on the top-ten most deregulated genes that emerged from the “BM vs ES CTCs” comparison. Patient classification according to the presence (“altered”), or not (“not altered”), of at least one gene deregulation in matched CTC samples did not exhibit a significant correlation with median survival, although this analysis might have been limited by the small sample size and the heterogeneity of our patient cohort, especially in terms of metastasis-free interval and number of previous treatment lines [60].

Hence, we moved to the METABRIC dataset [34,35,36], in which integrated transcriptomic/genomic data relative to 2509 primary breast tumours are publicly available, together with matched long-term clinical follow-up data. Interestingly, in stage I–III BC subjects whose primary tumours had been screened for the expression of the top-ten genes, we observed a significantly longer median survival in the “altered” group, as compared to the “not altered” one. This suggests that the combination of the above-mentioned genes in a single panel might provide prognostic information in early BC, although we could not speculate about its capability to predict the organotropism of such malignancies, for which prospective, long-term investigation in early BC patients is required. Moreover, functional analyses exploring the role of the top-ten genes in the metastatic cascade are planned in our laboratory, and will certainly be useful to clarify the apparently discrepant results obtained in metastatic versus early-stage patients. In addition, we are aware that the biological source of METABRIC data is different from the one we employed and, hence, future comparative analyses between primary tumour and CTC samples are mandatory.

With respect to transcriptomic data, we recognise that RNAseq performed on small amounts of nucleic acids, as in the case of CTCs, raises several issues due to both technical limitations and biological variables (e.g., biases of transcript coverage, low capture efficiency and sequencing coverage, stochastic transcription, high drop-outs and bursting events) [61, 62]. For these reasons, when analysing RNAseq data from CTCs it is crucial to use appropriate computational methods to overcome the difficulties in normalisation and DEG identification. However, to date, none of the currently used pipelines for RNAseq data analysis is able to simultaneously satisfy sensitivity and specificity constraints [63]. Some recent studies have attempted to compare different DEG software tools, concluding that those methods exhibiting high sensitivity are often less accurate, due to the presence of false positives; on the other hand, the highly specific tools generally identify fewer DEGs and are limited by higher false-negative rates [61, 63,64,65,66].

In our work, in order to reduce the chances of false positives, we set an FDR threshold (used in hypothesis testing to control type I error) of <0.25. Furthermore, for DEG analysis we used Partek® Flow® GSA, capable of considering the following response distributions: Normal, Lognormal, Lognormal with shrinkage, Negative Binomial, Poisson, and ANOVA (https://documentation.partek.com/display/FLOWDOC/Gene-specific+Analysis). In addition, an a posteriori analysis performed using the DESeq2 tool, also contained in the Partek® Flow® package, has independently validated our previous results, confirming the top-10 most deregulated genes among DEGs (FC ≥ 4 or FC ≤ −4), with the exception of CTSK.

However, it has to be underlined that DESeq2 is specifically designed and commonly used for bulk RNAseq data but, once applied to CTCs, it tends to have greater specificity at the expense of sensitivity [65]. Moreover, it is well recognised that RNAseq data derived from small nucleic acid amounts might miss a fraction of transcripts, resulting in zero read counts, due to stochastic gene expression and low capture efficiency, affecting the subsequent differential gene expression analysis [61, 62]. Previous studies on this topic even claim that the missing rate can reach nearly 30%, resulting in a loss of valuable information [62].

For instance, we were surprised to find no significant differences in the expression of CXCR4, MMP1, CTGF and IL11 genes between “BM” and “ES” CTCs, in consideration of the previous literature supporting their role in BC osteotropism [6, 45]. Hence, our data analysis could have incorporated a certain rate of errors due to false signals, but gold-standard computational tools have yet to be developed.

In conclusion, despite the above-mentioned limitations of RNAseq, this method is still one of the most widely used technologies for gene expression profiling [67] and, for this reason, we deemed it appropriate for investigating BC osteotropism, in line with previous reports [15, 16, 21, 47].

However, rather than performing a more expensive whole-transcriptome RNAseq, we adopted a targeted analysis that encompasses a selected number of genes. This strategy might render CTC GEP investigation more feasible, economically sustainable, and even applicable to a clinical setting, although a prospective validation, over a wider patient cohort involving early BC patients to be longitudinally monitored, is mandatory before moving from the bench to bedside.