Introduction

In cancer management, multiple imaging modalities such as computed tomography (CT), magnetic resonance imaging (MRI), positron emission tomography (PET), and single photon emission computed tomography (SPECT) are often prescribed for tumor detection, staging, and characterization. As a result, the collective imaging data are information rich and can be extracted for in-depth analysis. Recent advances in radiomics have demonstrated the power of transforming imaging data into multi-dimensional mineable radiologic features1,2 that are relatable to gene expression pattern3,4,5 and have significant predictive/prognostic power.3,6,7,8 However, determining the optimal use of multi-modality radiomic features to correlate with disease phenotypes, molecular characteristics, and disease prognosis remains an open problem. While radiomic features from anatomical images, such as CT, have shown significant potential in predicting survival outcome, and in associating with clinical and genomic features of various cancers,2,3,9 there are few studies investigating radiomics derived from molecular imaging modalities such as PET/CT.10,11,12,13 There are even fewer studies of radiomics for the same disease across imaging modalities such as PET and MRI.14 The added value of these multiple-order and multiple-dimension image features remains largely unknown. In our study, we carefully investigated the association of higher-order image features from PET and MRI with breast cancer phenotypes and prognosis. The association between the unsupervised clusters of radiomic features and outcome data was evaluated using χ2 test of independence. The pairwise relationships between PET and MRI radiomic features and breast cancer outcome were determined by Spearman’s rank correlation coefficients (ρ) and proportion of variance explained by the predictor from multiple regression (\(r_{{\mathrm{mreg}}}^2\)) for ordered and unordered clinical outcome, respectively. 
We also examined how well radiomic features predicted tumor grade and recurrence-free survival (RFS) of up to 5 years following imaging.

Results

Study cohort

This retrospective study included 113 patients diagnosed with breast cancer. The median patient age at diagnosis of primary tumor was 49 (range 21–96). Patient and tumor characteristics are summarized in Table 1.

Table 1 A summary of patient demographic characteristics is shown

Unsupervised tumor and feature clustering

For consensus clustering based on PET and MRI radiomic features, the number of clusters that consistently generated the largest change in the area under consensus cumulative distribution function (CDF) was 3. Table 2 gives a summary of χ2-test of independence statistics and cluster consensus for all breast cancer outcomes.

Table 2 A summary of χ2 test statistics (p-value and Cramer’s V), median cluster consensus (CC), and the optimal clustering algorithm is listed to describe the degree of association between the patient clusters with a given clinical feature

Association of radiomic features with breast cancer outcome

The unsupervised clustering based on both PET and MR radiomic features in Fig. 1a shows that the tumor clusters were statistically significantly associated with tumor grade (p = 2.02 × 10⁻⁶, χ2 test). Figure 1b indicates that 57.8% of tumor cluster I consisted of poorly differentiated tumors (high tumor grade), while tumor clusters II and III were each associated with more differentiated tumors (lower tumor grade). We observed a strong PET image feature pattern among tumor clusters for deciphering tumor grade. Tumor overall stage was statistically significantly associated with the tumor clusters (p = 0.037, χ2 test) in Fig. 2a. Figure 2b shows that 50.0% of tumor cluster II were stage 2 tumors, while 42.5% of tumor cluster I consisted of stage 0 tumors and 38.5% of tumor cluster III were stage 3 tumors. Figure 3a shows that the breast cancer subtypes were statistically significantly associated with the radiomic feature pattern of PET and MR images (p = 0.0085, χ2 test). Figure 3b, c indicate that 76.6% of tumor cluster I were HR+/HER2+ (Luminal B) and triple-negative tumors, while 65.0% of tumor cluster III consisted of the HR+/HER2− (Luminal A) tumors and 25.0% of the HER2+ tumors were found in tumor cluster II. In addition, the tumor clusters were statistically significantly associated with whether the disease would recur, not recur, or was never disease free (p = 0.0053, χ2 test). In Fig. 4c, 80% of the patients who were never disease free were found in tumor cluster III.

Fig. 1

PET and MR radiomics vs. tumor grade heatmap. a A heatmap of the PET and MR radiomic features is shown with the corresponding tumor grade and the tumor clusters resulting from the optimized consensus clustering. Each column represents a tumor and each row represents a radiomic feature. The PET and MR radiomic features are shown as z-scores. b The proportion of different grade tumors is shown for each tumor cluster. The frequency is shown with respect to the total number of tumors in each tumor cluster category. c The proportion of different tumor clusters is shown for each tumor grade category. The frequency is shown with respect to the total number of tumors in each tumor grade category

Fig. 2

PET and MR radiomics vs. tumor overall stage heatmap. a A heatmap of the PET and MR radiomic features is shown with the corresponding tumor overall stage and the tumor clusters resulting from the optimized consensus clustering. b The proportion of different tumor overall stages is shown for each tumor cluster category. The frequency is shown with respect to the total number of tumors in each tumor cluster category. c The proportion of different tumor clusters is shown for each tumor overall stage category. The frequency is shown with respect to the total number of tumors in each tumor overall stage category

Fig. 3

PET and MR radiomics vs. breast cancer subtype heatmap. a A heatmap of the PET and MR radiomic features is shown with the corresponding breast cancer subtype and the tumor clusters resulting from the optimized consensus clustering. b The proportion of breast cancer subtypes is shown for each tumor cluster. The frequency is shown with respect to the total number of tumors in each tumor cluster category. c The proportion of different tumor clusters is shown for each breast cancer subtype. The frequency is shown with respect to the total number of tumors in each breast cancer subtype category

Fig. 4

PET and MR radiomics vs. disease recurrence status heatmap. a A heatmap of the PET and MR radiomic features is shown with the corresponding disease recurrence status and the tumor clusters resulting from the optimized consensus clustering. b The proportion of different disease recurrence categories is shown for each tumor cluster. The frequency is shown with respect to the total number of tumors in each tumor cluster category. c The proportion of different tumor clusters is shown for each disease recurrence category. The frequency is shown with respect to the total number of tumors in each disease recurrence category

Primary tumor stage (T-stage) and lymph-node stage (N-stage) did not reach statistical significance for their association with the radiomic features (p = 0.19, 0.14, respectively, χ2 test). In addition, there was no evidence of association between the tumor clusters and tumor histology (p = 0.084, χ2 test). The association between the tumor clusters and the anatomical site of disease recurrence was not conclusive based on the data considered in this study (p = 0.28, χ2 test).

Pairwise relationship of radiomic features with breast cancer outcome

Figure 5a indicates that the first-order statistics of PET image entropyHIST and the PET-derived GLCM features dissimilarity, entropyGLCM, difference average, and difference entropy were positively correlated with tumor grade. The first-order statistics of PET image uniformity and PET-derived GLCM maximum probability, energyGLCM, homogeneity, and inverse variance were negatively correlated with tumor grade (|ρ| ≈ 0.48). No PET or MR radiomic feature showed a correlation of |ρ| > 0.4 with T, N, or overall stage.

Fig. 5

Pairwise relationship of radiomics with breast cancer outcome. a A heatmap of Spearman’s rank correlation coefficients (ρ) between the PET and MR radiomic features and the ordered clinical outcome is shown. Only the radiomic features with |ρ| > 0.2 are displayed. b A heatmap of proportion of variance from multiple regression (\(r_{{\mathrm{mreg}}}^2\)) between the PET and MR radiomic features and the unordered clinical outcome is illustrated. Only the radiomic features with \(r_{{\mathrm{mreg}}}^2\) > 0.04 are shown

Figure 5b shows that the PET image texture features of difference average, difference entropy, dissimilarity, and sum average, together with PET SUVmean and SUVmax (\(r_{{\mathrm{mreg}}}^2 \approx\) 0.10), contributed to the variance seen in the feature values among the breast cancer subtypes. For recurrence-free survival, Fig. 5b indicates that the first-order statistics of MR image mean and minimum and MR-derived GLCM average intensity, sum average, difference average, and dissimilarity (\(r_{{\mathrm{mreg}}}^2 \approx\) 0.10) contributed to the feature variance between the patient groups who were and were not disease free within 2–5 years. We also found that MR-derived GLCM IDMN, MR-derived GLCM IDN, and PET-derived GLCM cluster prominence (\(r_{{\mathrm{mreg}}}^2 =\) 0.09–0.12) contributed to the feature variance between the recurrence-free patient groups within 1 year. Spearman’s rank correlation coefficients and the proportion of variance from multiple regression are reported for all PET and MR image features and clinical outcomes in supplemental Tables 1 and 2.

Radiomics exploratory study with small sample size

Based on 8 patients, supplemental Fig. 1 suggests that MR-derived uniformityHIST (ρ = 0.67) and tumor surface-to-volume ratio (ρ = 0.71) were positively correlated with Oncotype DX score, while MR-derived entropyHIST (ρ = −0.67) and GLCM autocorrelation (ρ = −0.64) were negatively correlated with Oncotype DX score. In addition, supplemental Figs. 2 and 3 show that the PET radiomics of the primary tumor were consistent and associated with those of the recurrent tumors for 6 out of 8 patients.

Radiomic-based classification of recurrence-free survival (RFS) and tumor grade

Figure 6 shows a heatmap of the nested cross-validation performance of several classification algorithms at predicting RFS. The nested cross-validation shows that logistic regression with ElasticNet regularization and with L1 regularization had the highest predictive performance, each with a mean AUC of 0.74 (95% CI = [0.62, 0.88] and [0.61, 0.89], respectively), for predicting recurrence-free survival at 1 year. For ease of algorithm interpretability, we selected ElasticNet logistic regression in this study for classifying RFS. The ElasticNet logistic regression had lower predictive performance for recurrence-free survival at 2 years, with a mean AUC of 0.68 (95% CI = [0.58, 0.81]). The ElasticNet logistic regression using all PET and MR radiomics generated a mean AUC of 0.67 (95% CI = [0.58, 0.78]), 0.64 (95% CI = [0.55, 0.75]), and 0.57 (95% CI = [0.47, 0.68]) at distinguishing patients who were recurrence free at 3, 4, and 5 years, respectively. In predicting tumor grade, logistic regression with L2 regularization and the Lbfgs, Newtoncg, or Sag solver was found to have the highest predictive performance, with a mean AUC of 0.76 (95% CI = [0.72, 0.83]).

Fig. 6

Heatmap of the predictive performance of radiomics to breast cancer outcome. A heatmap depicts the classification performance in AUC and 95% confidence interval for several classification algorithms at predicting recurrence-free duration of 1–5 years and tumor grade. SVM denotes support vector machine. The classification name for logistic regression is defined as [Reg][Solver]LogReg, where [Reg] specifies the regularization scheme and [Solver] is the solver algorithm. For example, L1LiblinearLogReg denotes logistic regression with L1-regularization using Liblinear solver

Table 3 lists the PET and MR radiomic features that are dominant in predicting RFS and tumor grade using the optimal logistic regression algorithm. The key radiomic features for predicting RFS at 1 year are the MR-derived GLCM IDN, MR-derived GLCM IDMN, and the PET-derived GLCM cluster prominence. The radiomic features that were consistently dominant in predicting RFS are the MR-derived GLCM sum average, MR-derived GLCM average intensity, MR minimum intensity, MR-derived GLCM IDN, and PET-derived GLCM cluster prominence. The key radiomic features for predicting tumor grade consisted mostly of PET-derived GLCM features such as inverse variance and homogeneity, along with the PET-derived first-order statistic SUVmean.

Table 3 The feature importance of the repeated nested cross-validation with optimal logistic regression algorithm with PET and MR radiomic features set is summarized

Discussion

Higher-dimensional radiomic features were successfully extracted from both 18F-FDG PET and MR images among patients diagnosed with breast cancer. In this study, radiomics were clustered in an unsupervised fashion; in other words, the clustering algorithm had no prior knowledge of the tumor phenotypes and disease outcome. The unsupervised learning allowed exploration of any potential relationship of the PET and MRI radiomics to breast cancer phenotypic behaviors and disease prognosis. We found a statistically significant association of the PET and MR radiomics clusters with breast cancer tumor grade, which was previously reported to have prognostic value for disease survival rate.15 Among the radiomic features positively associated with breast cancer tumor grade were the first-order statistics of PET image entropyHIST and SUVvar and the PET-derived GLCM features including dissimilarity, entropyGLCM, difference average, difference entropy, and cluster prominence and tendency. Among the radiomic features negatively associated with breast cancer tumor grade were the first-order statistics of PET image uniformity and PET-derived GLCM maximum probability, energyGLCM, homogeneity, and inverse variance (|ρ| ≥ 0.45). This finding suggests that 18F-FDG PET images with large asymmetry (high cluster prominence and tendency) and large 18F-FDG uptake texture variation (high dissimilarity and entropyGLCM and low energyGLCM) could be predictive of poorly differentiated breast cancer. In addition, the PET and MR radiomics were found to be associated with breast cancer subtypes. In a study of 84 cases, Li et al., 201616 found that the enhancement texture from the first post-contrast MR images was highly correlated with the molecular subtypes of breast cancer (normal-like, luminal A and B, HER2-enriched, and basal-like). 
This study suggests that PET and MR images with large texture variation (large difference entropy and dissimilarity) along with PET SUVmax and MR peak enhancement could be predictive of breast cancer subtypes. The finding not only confirmed the result in Li et al., 2016,16 but also added predictive potential of PET and MR radiomics over MR radiomics alone. Furthermore, breast cancer consists of several tumor subtypes and MRI phenotypes including unicentric mass, multilobulated mass, area enhancement with and without nodularity and septal spreading,17 which could explain the correspondence between large image texture variation and breast cancer subtypes.

Our study also investigated the predictive performance of PET and MR radiomics for breast cancer recurrence-free status and tumor grade. Instead of using 900+ radiomic features, such as the gray level size zone matrix features and wavelet-based features reported in previous studies,3,14,18 we extracted a limited number of radiomic features from both PET and MR images, which provided a more succinct feature set (84) considering the limited sample size (N = 85) in this study. Even though we extracted the same types of radiomic features from both PET and MR images, the multi-modality radiomic features were able to provide additional information since PET and MR images capture different intrinsic information about tumor biology. Figure 5b shows that MR-derived GLCM IDMN and IDN, and PET-derived GLCM cluster prominence were highly correlated with 1-year RFS. Similarly, MR-derived GLCM IDN and IDMN emerged as key features for predicting patient 1-year RFS (highest AUC from the ElasticNet logistic regression). In addition, MR mean and minimum intensity, MR-derived GLCM average intensity, MR-derived GLCM sum average (\(r_{{\mathrm{mreg}}}^2 =\) 0.09–0.10), and PET-derived GLCM cluster prominence (\(r_{{\mathrm{mreg}}}^2 =\) 0.04–0.05), which were among the features moderately correlated with RFS at 2–5 years, would likely play an important role in RFS prediction. In a previous study,19 tumor size and enhancement texture from DCE-MR images were effective at distinguishing the risk of breast cancer relapse, a finding also supported by this study. In addition, this study shows that PET-derived GLCM features such as inverse variance and homogeneity were the key predictors of tumor grade, as confirmed by both the univariate analysis (|ρ| = 0.48) and the nested cross-validation. These PET-derived GLCM features were ranked above the first-order PET image statistics such as PET SUVmean in the nested cross-validation of tumor grade classification. 
Therefore, a combination of PET and MR radiomics (both 1st-order statistics and GLCM features) could be more useful as a prognosticator of breast cancer. Furthermore, feature selection for predictive performance may be more effective in our study because it relied on the cross-validation process rather than depending heavily on the correlation coefficients from the pairwise univariate analysis.

There are limitations to this study. Several factors may contribute to differing results between the PET and MRI radiomics, including the fact that PET and MR images capture intrinsically different biological and physiological mechanisms. The purpose of the study was to determine, not to compare, the predictive power of the PET and MRI radiomics. Furthermore, the PET and MR images were resampled to the same isotropic voxel size for consistent image analysis. However, the image voxel upsampling likely introduced image interpolation effects, which may affect the accuracy of radiomic features in measuring image information. In addition, the cross-validation was conducted with different machine learning algorithms for the initial predictive performance. The dataset used for this paper was limited in size for a study of this scope. For future studies, we plan to obtain an independent image dataset to validate our current findings and thereby further evaluate the value of image radiomics in predicting disease prognosis. We hope to expand the dataset used in Supplemental Fig. 1 to investigate the role of PET and MR radiomics in predicting breast cancer specific genomics. The difference in PET radiomics between the primary and recurrent tumors (patients #25 and #116 in Supplemental Figs. 2 and 3) will be further investigated with a larger dataset as a key predictor for the course of treatment for recurrent disease.

In summary, we investigated the benefit of PET and MRI radiomics in deciphering breast cancer phenotypes and disease prognosis. As an initial explorative investigation, this study demonstrated the potential value of PET and MR image-derived radiomics in characterizing tumor phenotypes using unsupervised clustering analysis. In particular, we determined that breast cancer tumor grade and breast cancer subtypes can be well characterized by the PET-derived GLCM features and 1st-order statistics. We found that 1st-order image statistics and image texture features of the first post-injection DCE-MR image and PET images have high potential for predicting recurrence-free survival of breast cancer and tumor grade. Findings from data exploration and initial predictive performance evaluation provide optimism for eventual construction of an effective predictive model based on both PET and MRI radiomics for improved personalized disease management and treatment planning.

Methods

Image datasets

This study was a retrospective study of medical records and medical images and qualified as exempt by the UCSF Institutional Review Board. We identified all patients who were diagnosed with invasive breast cancer between January 1st, 2005 and December 31st, 2009 and underwent both breast dynamic contrast-enhanced (DCE) MR imaging and whole-body 18F-fluorodeoxyglucose (18F-FDG) PET, acquired as PET/CT examinations, at different times at UCSF. All imaging studies were acquired prior to treatment, including surgery, radiation, and/or chemotherapy. In addition to images of primary tumors, PET images of patients diagnosed with recurrent metastases (N = 8) were obtained to explore the difference in radiomics between the primary and recurrent tumors. These PET images were acquired more than 5 years after the diagnosis of primary disease. MR imaging was performed as previously described20 using either a 1.5-Tesla (T) imaging system (Signa, GE Medical Systems, Milwaukee, WI) or a 3-T imaging system (Magnetom Verio, Siemens Medical Systems, Erlangen, Germany) with the patient in prone position. The DCE-MRI series consisted of a three-dimensional (3D), fat-suppressed, T1-weighted gradient echo sequence in accordance with the ACRIN 6657 imaging protocol.21 MR imaging was acquired at three time points: pre-contrast injection, early post-contrast injection, and late post-contrast injection. 18F-FDG PET/CT imaging was performed with an integrated PET/CT system (Biograph 16, Siemens Medical Systems or Discovery VCT, GE Medical Systems). The PET/CT and MR images were reconstructed using the scanner-specific workstation.

Image segmentation, standardization, and pre-processing

Tumor regions on MR images were identified using an established enhancement criterion of 70% applied to the first post-contrast image.22 This empirical threshold was based on visual agreement with radiological assessments in clinical practice.23 Normal-appearing stromal tissue surrounding the tumor was subsequently defined as fibroglandular tissue and was segmented from adipose tissue using a fuzzy C-means clustering method.24 Tumors in the PET images were segmented semi-automatically using a region-growing algorithm (MeVisLab©, MeVis Medical Solutions AG). The segmented tumor regions were confirmed by trained radiologists (S.B., M.D.). The in-plane image resolution ranged from 0.5 mm to 1.2 mm for MR images and from 4.1 mm to 5.5 mm for PET images. The axial image resolution ranged from 0.5 mm to 2.8 mm for MR images and from 2.0 mm to 5.6 mm for PET images. For appropriate image feature comparison, all images were resampled to a common isotropic voxel dimension within each modality: 0.5 × 0.5 × 0.5 mm3 for MR images and 2.0 × 2.0 × 2.0 mm3 for PET images. PET images were converted into units of standardized uptake value (SUV), normalized by patient body weight and the decay-corrected injected activity.25
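The body-weight SUV normalization described above can be sketched as follows. This is a minimal illustration and not the code used in this study: the function name `suv_body_weight`, its parameters, and the assumed tissue density of 1 g/mL are illustrative assumptions, while 109.77 min is the physical half-life of 18F.

```python
F18_HALF_LIFE_MIN = 109.77  # physical half-life of fluorine-18, in minutes


def suv_body_weight(conc_bq_per_ml, injected_bq, weight_kg, minutes_elapsed):
    """Convert a PET activity concentration to a body-weight SUV.

    Assumes (not stated in the text) a tissue density of 1 g/mL, so that
    Bq/mL divided by Bq/g is dimensionless.
    """
    # Decay-correct the injected activity from injection time to scan time.
    decayed_bq = injected_bq * 0.5 ** (minutes_elapsed / F18_HALF_LIFE_MIN)
    weight_g = weight_kg * 1000.0
    return conc_bq_per_ml * weight_g / decayed_bq
```

For example, a voxel measuring 5000 Bq/mL in a 70 kg patient injected with 350 MBq gives an SUV of 1.0 with no elapsed decay, and 2.0 if one half-life has elapsed before the scan.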

Radiomic features

We defined 42 radiomic image features to characterize tumors in the following categories: intensity (9), shape (8), and texture features (25). Table 4 summarizes the radiomic features extracted in this study. Mathematical definitions of all radiomic features were described in a previous study.3 For this explorative study, we extracted only GLCM texture features since they have been shown to be effective as a potential imaging biomarker.26,27 The intensity features described the first-order statistics of the image signal intensity and histogram-based statistics, which characterize the distribution of the tumor intensity. The intensity histogram of the tumor region was generated with a fixed bin width of voxel intensity for all images. The shape features captured the three-dimensional (3D) geometric attributes of the tumor. The texture features captured the spatial relationships between neighboring voxels within the tumor region to quantify intra-tumor heterogeneity. The texture features were derived from the gray level co-occurrence matrix (GLCM), which describes how combinations of discretized gray levels of neighboring voxels are distributed along a given image direction. In this study, image features were extracted from MR images acquired at the first post-injection time point. The first-order statistics and GLCMs were generated from images discretized with a fixed voxel-intensity bin width of 0.1 for PET images and 5.0 for MR images. In 3D, each voxel has 26 connected neighbors, which yields 13 unique directions within the neighborhood for a voxel distance of 1. Thus, 13 GLCMs were generated for each 3D image dataset, and the mean of the texture features computed from the 13 GLCMs was reported for each tumor region. All image features were computed using in-house software based on Python (version 2.7.14) and the Insight Segmentation and Registration Toolkit (ITK, version 4.10.1). The values of the radiomic features were validated against those computed with the PyRadiomics open-source software.28
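The count of 13 unique directions follows from pairing each of the 26 neighbor offsets with its negation. The sketch below enumerates those directions and accumulates a symmetric GLCM for one of them. It is an illustration only, not the in-house software: the function names and the dictionary-based volume representation are assumptions made for brevity.

```python
from collections import Counter
from itertools import product


def unique_directions_3d():
    """Return one offset from each +/- pair of the 26-connected neighborhood."""
    directions = []
    for d in product((-1, 0, 1), repeat=3):
        if d == (0, 0, 0):
            continue
        # Keep the offset whose first nonzero component is positive; its
        # negation lies along the same axis and is skipped.
        first_nonzero = next(c for c in d if c != 0)
        if first_nonzero > 0:
            directions.append(d)
    return directions


def glcm_counts(volume, offset):
    """Symmetric gray-level co-occurrence counts for one offset.

    volume: dict mapping (z, y, x) -> discretized gray level, containing
    only voxels inside the tumor region (hypothetical representation).
    """
    dz, dy, dx = offset
    counts = Counter()
    for (z, y, x), g in volume.items():
        neighbor = (z + dz, y + dy, x + dx)
        if neighbor in volume:
            h = volume[neighbor]
            counts[(g, h)] += 1
            counts[(h, g)] += 1
    return counts
```

A texture feature would then be computed from each of the 13 normalized matrices and averaged, as described above.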

Table 4 A summary describing the radiomic features extracted from the PET and MR images is shown

Clinical dataset

The following clinical data were collected from patient charts contained in the electronic health system: tumor histologic type, tumor grade, estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) status. Hormone receptor (HR) status was defined as positive (+) when the ER or PR or both receptors were positive on immunohistochemistry, and the breast cancer subtypes were then grouped into the following categories: HR+/HER2−, HR+/HER2+, HR−/HER2+, and HR−/HER2−. The primary tumor staging (T-stage), regional lymph node staging (N-stage), and overall staging, as defined by the American Joint Committee on Cancer,29 as well as the presence, site, and date of disease recurrence were extracted from the institution’s cancer registry. The cancer recurrence status was categorized as no recurrence, recurrence, or never disease free. The recurrence site was categorized as no recurrence, any local recurrence, or any distant recurrence, such as recurrence in bone or systemically. To investigate the effectiveness of PET and MR radiomic features in predicting the duration until disease recurrence, the recurrence-free survival (RFS) was repeatedly dichotomized using cutoff times of 1, 2, 3, 4, and 5 years. The patients who were recurrence-free beyond the cutoff time were labeled 1, whereas those who were not recurrence-free were labeled 0. Furthermore, we evaluated the value of PET and MR radiomic features in predicting tumor grade. The tumor grade was dichotomized such that those with tumor grade 1 (T1) and 2 (T2) were labeled 0 and those with tumor grade 3 (T3) and 4 (T4) were labeled 1. In addition, we obtained the Oncotype DX score for 8 patients in this study cohort to explore the pairwise relationship between tumor genomic data and radiomics. All data analysis was performed on clinical data extracted from our clinical imaging database, and there was no clinical trial associated with this study cohort.
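The labeling rules above can be illustrated with a short sketch. The function names are hypothetical, and returning `None` for a patient whose follow-up ended before the cutoff without a recorded recurrence is an assumption of this sketch; the text does not state how such cases were handled.

```python
def dichotomize_rfs(recurred, years_observed, cutoff_years):
    """Label a patient for one RFS cutoff.

    recurred: True if a recurrence was recorded.
    years_observed: time to recurrence, or to last follow-up if none.
    Returns 1 (recurrence-free beyond the cutoff), 0 (recurred by the
    cutoff), or None when follow-up ended before the cutoff without a
    recurrence (assumed handling, not taken from the text).
    """
    if years_observed > cutoff_years:
        return 1
    if recurred:
        return 0
    return None


def dichotomize_grade(grade):
    """Label tumor grades 1-2 as 0 and grades 3-4 as 1."""
    return 1 if grade >= 3 else 0
```

Under these rules, a patient whose disease recurred 2.5 years after imaging is labeled 1 for the 1- and 2-year cutoffs and 0 for the 3-, 4-, and 5-year cutoffs.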

Data analysis

For data exploration, we performed unsupervised clustering of tumors using consensus clustering30 based on PET and MR radiomic features. Consensus clustering is a method that provides consensus across multiple runs of a clustering algorithm on subsampled data as a way to evaluate cluster stability and the best number of clusters for a given dataset. For each cluster, the cluster’s consensus was computed as the average, over all pairs of items belonging to that cluster, of the proportion of clustering runs in which the two items were clustered together.30 To determine the optimal clustering algorithm, we performed consensus clustering with the following algorithms: hierarchical clustering with agglomerative Ward linkage (HC),31 K-means (KM) on a data matrix, K-means on a distance matrix (KMdist),32 and partitioning around medoids (PAM).33 We used 1 − Pearson correlation (Pearson), 1 − Spearman correlation (Spearman), and Euclidean distance (Euc) as the dissimilarity measures. We performed the consensus clustering with resampling (10,000 iterations). The number of clusters was chosen as the cluster number that gave the largest change in the area under the consensus cumulative distribution function (CDF). The median of the cluster’s consensus (median cluster consensus) was computed among all clusters for the optimal clustering setting (algorithm and number of clusters). We performed the χ2 test of independence between the tumor cluster labels and each clinical feature to infer association. Cramer’s V34 was computed to measure the strength of association for the χ2 test of independence. For each clinical feature, the optimal clustering algorithm was selected as the one that yielded the highest Cramer’s V between the tumor clusters and the clinical feature. We used a significance level of 0.05 for detecting a statistically significant association in the χ2 tests of independence. 
To facilitate the selection of radiomic features important for predicting a clinical outcome, Spearman’s rank correlation coefficients (ρ) were computed to evaluate the strength and direction of association between an ordered clinical outcome (tumor grade, stages, and Oncotype DX score) and a radiomic feature. For an unordered clinical outcome, such as breast cancer subtype, we fitted multiple regression models and used the proportion of variance explained by the predictor (\(r_{{\mathrm{mreg}}}^2\)) to indicate the strength of association. Consensus clustering was performed using ConsensusClusterPlus35 implemented in R. The χ2 test was performed using chi2_contingency implemented in the Python SciPy statistics package. The multiple regression and Spearman’s rank-order correlation coefficient were computed in R (version 3.3.2).
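As an illustration of how Cramer’s V relates to the χ2 statistic, the sketch below computes both from a cluster-by-outcome contingency table using only the standard library. It is not the analysis code of this study: the function names are assumptions, no continuity correction is applied, and every row and column is assumed to contain at least one observation. A p-value would come from a routine such as SciPy's chi2_contingency, which applies Yates' correction by default when the table has one degree of freedom.

```python
from math import sqrt


def chi2_statistic(table):
    """Pearson chi-squared statistic for a contingency table (list of rows)."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_sums[i] * col_sums[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def cramers_v(table):
    """Cramer's V: sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0]))
    return sqrt(chi2_statistic(table) / (n * (k - 1)))
```

V ranges from 0 (cluster labels carry no information about the outcome) to 1 (each cluster maps to a single outcome category).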

Classification of recurrence-free survival and tumor grade

Several machine learning algorithms, including support vector machine, random forest, and logistic regression with L1, L2, and ElasticNet regularization, were investigated to classify the dichotomized disease recurrence outcome based on a range of different cutoff times. For logistic regression, algorithm solvers including Liblinear36 (L1 and L2), Saga37 (L1), Lbfgs38 (L2), Newtoncg39 (L2), and Sag40 (L2) were explored. All radiomic features were normalized to a standard z-score prior to any model training. The predictive performance of the classifier methods was quantified using the area under the receiver operating characteristic curve (AUC). The model parameters were optimized using stratified nested cross-validation (CV),41 with 3-fold inner and outer cross-validation repeated 10 times. The nested cross-validation approach repeatedly splits the data into training, validation, and testing sets in order to avoid the potential for over-fitting when estimating optimal tuning parameters and to provide unbiased estimation of the prediction performance. Stratification with respect to label class was applied during the nested cross-validation such that the folds were made by preserving the proportion of samples for each label class. The mean and 95% confidence interval of the nested cross-validation AUCs (thresholding the logistic regression predicted probabilities) were reported over the 1000 repetitions using a bootstrap approach.42 All PET and MR radiomic features were included in the nested cross-validation. In predicting RFS, we reported the ElasticNet logistic regression algorithm for ease of interpretability. To examine the predictive power of the PET and MR radiomic features, the features with a fitted coefficient >0 were tallied among 1000 repetitions of the 3-fold outer cross-validation loop. 
The proportion of the times that a radiomic feature was selected out of 3000 CVs was ranked and the top 10 features were presented as the key features for predicting recurrence-free survival. In predicting tumor grade, we reported logistic regression with L2 regularization and Lbfgs solver. The key predictors were determined by those with the |model fitted coefficient| >0.01 and ranked according to the method described above. Cross-validation was implemented using Python (version 3.5.5), and machine learning algorithms used in this study were implemented in the Python scikit-learn package.43
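The control flow of stratified nested cross-validation can be sketched without any library as follows. This is an illustration under stated assumptions, not the scikit-learn pipeline used in this study: `fit` and `predict` are hypothetical callables standing in for any classifier, `grid` is a list of candidate hyperparameter values, and each class is assumed to have enough samples that every inner and outer fold contains both labels.

```python
import random
from statistics import mean


def auc(labels, scores):
    """Probability that a random positive outranks a random negative."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k, rng):
    """Split indices into k folds while preserving class proportions."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, l in enumerate(labels) if l == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def nested_cv(X, y, grid, fit, predict, k=3, seed=0):
    """Outer loop estimates performance; inner loop tunes the hyperparameter."""
    rng = random.Random(seed)
    outer_scores = []
    for test_idx in stratified_folds(y, k, rng):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        y_train = [y[i] for i in train_idx]

        def inner_score(param):
            scores = []
            for val_pos in stratified_folds(y_train, k, rng):
                val = [train_idx[p] for p in val_pos]
                val_set = set(val)
                fit_idx = [i for i in train_idx if i not in val_set]
                model = fit([X[i] for i in fit_idx], [y[i] for i in fit_idx], param)
                preds = predict(model, [X[i] for i in val])
                scores.append(auc([y[i] for i in val], preds))
            return mean(scores)

        # Tune on the training portion only, then score once on held-out data.
        best = max(grid, key=inner_score)
        model = fit([X[i] for i in train_idx], y_train, best)
        preds = predict(model, [X[i] for i in test_idx])
        outer_scores.append(auc([y[i] for i in test_idx], preds))
    return outer_scores
```

Because the held-out outer fold never participates in hyperparameter selection, the outer AUCs are not inflated by tuning; repeating the procedure with different seeds yields the distribution of AUCs from which a mean and confidence interval can be estimated.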

Code availability

All software custom-built for extracting radiomics from MR and PET images, data analysis, and cross validation is available on request from the corresponding author (Y.S.).

Data availability

The imaging data that support the findings of this study are available on request. Please contact the following authors for specific image and clinical data used in this study: Y. Seo for the whole-body PET/CT images and N.M. Hylton for the breast MR images. The imaging data are not publicly available because they contain information that could compromise research participant privacy. Please contact L. Esserman for the Oncotype DX scores of the limited number of patients. The radiomics data extracted from the PET and MR images along with the corresponding clinical outcome in this study are available in this file (https://ucsf.box.com/s/dqopi5rgxc9u79zbjo53t6wai8dmf5uu). Each unique tumor is identified by the column name ‘ptid_side’.