A Radio-genomics Approach for Identifying High Risk Estrogen Receptor-positive Breast Cancers on DCE-MRI: Preliminary Results in Predicting OncotypeDX Risk Scores

To identify computer extracted imaging features for estrogen receptor (ER)-positive breast cancers on dynamic contrast en-hanced (DCE)-MRI that are correlated with the low and high OncotypeDX risk categories. We collected 96 ER-positivebreast lesions with low (<18, N = 55) and high (>30, N = 41) OncotypeDX recurrence scores. Each lesion was quantitatively charac-terize via 6 shape features, 3 pharmacokinetics, 4 enhancement kinetics, 4 intensity kinetics, 148 textural kinetics, 5 dynamic histogram of oriented gradient (DHoG), and 6 dynamic local binary pattern (DLBP) features. The extracted features were evaluated by a linear discriminant analysis (LDA) classifier in terms of their ability to distinguish low and high OncotypeDX risk categories. Classification performance was evaluated by area under the receiver operator characteristic curve (Az). The DHoG and DLBP achieved Az values of 0.84 and 0.80, respectively. The 6 top features identified via feature selection were subsequently combined with the LDA classifier to yield an Az of 0.87. The correlation analysis showed that DHoG (ρ = 0.85, P < 0.001) and DLBP (ρ = 0.83, P < 0.01) were significantly associated with the low and high risk classifications from the OncotypeDX assay. Our results indicated that computer extracted texture features of DCE-MRI were highly correlated with the high and low OncotypeDX risk categories for ER-positive cancers.

Scientific RepoRts | 6:21394 | DOI: 10.1038/srep21394 features on MRI that might be associated with underlying biology molecular subtype and risk of recurrence of the tumor [9][10][11] . A radiogenomic approach presented by Yamamoto et al. 12 found a significant correlation between breast MRI (1.5 Tesla) features and a number of important breast cancer related gene sets. Correlation studies by Vassiou et al. 13 and Chang et al. 14 showed that DCE-MRI (1.5 Tesla) based imaging features, such as tumor margin, enhancement pattern, and kinetic characteristics, were associated with pathological prognostic factors for the prediction of clinical outcome during treatment of breast cancer. A recent study conducted by Sutton et al. 15 showed that two MRI (1.5 or 3.0 Tesla) derived statistical image features were significantly correlated with the  Table 1. The best two identified features in each feature class associated with their performance measures in distinguishing low and high risk estrogen receptor (ER)-positive breast cancers. Note.
-Numbers in parentheses are 95% confidence intervals. Az = area under the receiver operating characteristic curve; PPV = positive predictive value; NPV = negative predictive value; DHoG = dynamic histogram of oriented gradient; DLBP = dynamic local binary pattern; PK = pharmacokinetics; EK = enhancement kinetics; TK = textural kinetics; IK = intensity kinetics; RSD = relative standard deviation; err = error rate of classification. * ρ denotes correlation coefficient. median OncotypeDX recurrence scores with a range of 0-45. Ashraf et al. 9 presented a method for identifying correlation between computer-extracted morphologic and kinetic features from DCE-MRI sequences obtained at a 1.5T magnet with validated prognostic gene expression profiles of breast cancers. In addition, Agner et al. 8 presented an approach called textural kinetics (TK), which involved measuring dynamic changes in breast lesion texture during contrast uptake. These TK features were able to separate different molecular subtypes of breast cancers (triple negative, ER-positive, HER2-positive, fibroadenoma) on DCE-MRI obtained at either 1.5T or 3T 16 .
In this work, we investigate the ability of TK features on DCE-MRI to distinguish ER-positive breast cancers between low and high OncotypeDX risk categories (i.e., OncotypeDX recurrence score <18 and OncotypeDX recurrence score > 30). Our approach is different from the work presented by Ashraf et al. 9 , in which dynamic features were computed based on the estimation of parameters on time-intensity curve, e.g., peak enhancement, wash-in and wash-out slop. Our approach is also different from the work published by Sutton et al. 15 , in which the image features (i.e., morphological, static first-order, and Haralick texture features) were extracted from breast lesions on pre-and three post-contrast MR images. In this study, we focus on TK features that allow for characterization of dynamic texture changes, specifically texture involving dynamic histograms in tumors on 1.5 Tesla DCE-MRI. Also our TK features are different from the approach by Agner et al. 8,16 , in that it involves quantification of kinetic texture in a new way -dynamic histogram of oriented gradients (DHoG) and dynamic local binary patterns (DLBP).
The histogram of oriented gradients (HoG) 17 , local binary patterns (LBP) 18 , and their spatio-temporal representations 19,20 are image texture features that have been previously employed for detecting breast masses on mammographic images 21,22 . Unlike the approach in Agner et al. 8,16 which attempted to fit a single parametric curve to explain the changes in lesion texture during the contrast wash-in and wash-out, the DHoG and DLBP approaches allow for construction of a unique lesion signature that captures the frequency of occurrence of different spatio-temporal textural patterns.
In this study, we evaluate the ability of the DHoG and DLBP features extracted from DCE-MRI at 1.5 Tesla, via a linear discriminant analysis (LDA) classifier to distinguish low and high risk ER-positive breast cancers, risk having been established via the OncotypeDX assay. Table 1 shows the best two identified features in each feature class (shape, PK, EK, IK, TK, DHoG, DLBP) associated with Az, PPV, NPV, ρ (correlation coefficient), RSD, and err (error rate of classification). The numbers of bins used in the DHoG and DLBP features were {2, 4, 6, 8, 10} and {8, 16, 33, 64, 128, 256}, respectively. Higher ρ value indicates a stronger relationship between risk stratification via the features and OncotypeDX. The post-hoc power analysis revealed no significant difference between features identified from imaging data acquired at Sites I and II which were found to discriminate high versus low OncotypeDX risk categories. The heat map (Fig. 1) shows the values of all the features listed in Table 1 for all the patient studies. The values of best two identified features in each feature class and the six top performing features obtained from the LDA based feature selection method are listed in Supplementary Table S1 online.

Results
Shape Features for Discriminating ER-positive Breast Cancers. The feature compactness used to measure the speculation of tumor margin yielded the best discriminability among the computerized shape features. The lower values of compactness for the high OncotypeDX breast cancers (− 20.76 ± 7.32) compared with the low OncotypeDX cancers (− 19.05 ± 9.14) appears to suggest that higher OncotypeDX score cancers tend to be associated with more speculation compared to cancers with low OncotypeDX risk scores. Table 1 appears to suggest that shape features are less useful for differentiating between low and high OncotypeDX risk score ER-positive lesions compared to pharmacokinetic and textural kinetic features (i.e., PK, EK, IK, TK, DHoG, and DLBP).
PK Features for Discriminating ER-positive Breast Cancers. Among the PK parameters, K trans was found to be the most effective in distinguishing low and high OncotypeDX risk score ER-positive breast lesions (Fig. 2). While compared to K ep , K trans appeared to have a stronger correlation with the OncotypeDX risk scores. K ep appeared to be more predictive in identifying lesions on DCE-MRI that had a high OncotypeDX risk score.

EK Features for Discriminating ER-positive Breast Cancers.
The EK features were extracted to quantitatively characterize the contrast enhancement patterns within ROIs in the lesion. Although the uptake rate achieved the best classification performance among all the EK features, these features generally had weaker correlation with the OncotypeDX risk scores compared to the TK features.

IK and TK Features for Discriminating ER-positive Breast Cancers. The TK features outperformed
the IK features by up to 10% in terms of Az and NPV (Table 1). Further, the TK features had a higher degree of positive correlation with the OncotypeDX risk categories compared to the IK features which had a lower degree of negative correlation. These trends are consistent with the performance of the classifier (Table 1). Stability and Predictive Performance. The stability and error of LDA classification were measured by the relative standard deviation and the inverse power law model. It can be observed that DHoG and DLBP outperformed the other feature classes (i.e., shape, PK, EK, TK, IK) and achieved the smallest RSD values and error rates. The IK feature (1 st fitting coefficient) has the highest RSD values indicating the lowest stability in classification. The shape features produced the largest error rates among all the feature classes.

Discussion
We presented a computerized image analysis framework for identification of breast MR imaging markers to distinguish between low and high risk ER-positive breast cancers via a correlation of computer extracted DCE-MRI attributes and the OncotypeDX assay. Although tumor margin, tumor size, rim enhancement on DCE-MRI have been previously correlated with pathological factors and have been reported to be associated with disease outcome [11][12][13][14]23 , to the best of our knowledge, this is the first attempt to investigate the association between textural kinetic features on DCE-MRI with OncotypeDX recurrence scores for ER-positive breast cancers. This is important because the OncotypeDX is an assay with proven clinical utility that has been shown to be both prognostic and predictive in ER-positive breast cancers 24 . Hence by demonstrating computer extracted imaging features on DCE-MRI can predict the OncotypeDX risk category of the lesion, we might be able to non-invasively identify which patients would benefit from adjuvant therapy. This could pave the way for non-invasive risk assessment of the lesion even prior to biopsy. Furthermore, this is the first systematic comparison of various kinetic (PK, EK, IK,  TK, DHoG and DLBP) and shape features, to discriminate high and low OncotypeDX categories of ER-positive breast lesions. There has been recent interest in identifying radiogenomic correlates of breast lesions on MRI. In 16 , Agner et al. showed that textural kinetic features extracted from routine clinical DCE-MRI appeared to be associated with the biologic heterogeneity and molecular subtype of breast cancers. Giger et al. 25,26 computed enhancement kinetic features, such as time to peak, uptake rate, maximal uptake, from a characteristic time curse curve to distinguish benign and malignant breast masses. Ashraf et al. 9 also utilized curve-based kinetic features to construct breast DCE-MRI phenotypes and showed their correlation with the OncotypeDX assay.
In this study, we attempted to identify whether there was an association between textural kinetic features extracted from ER-positive breast lesions on 1.5 Tesla DCE-MRI and their corresponding OncotypeDX risk categories. While Ashraf et al. 9 focused on the association between lesion shape, contrast kinetic features and spatial heterogeneity features and the continuous OncotypeDX recurrence scores, our approach was focused on evaluating the ability of quantitative image features and spatio-temporal patterns within the lesion to distinguish between the low (< 18 risk score) and high (> 30 risk score) OncotypeDX risk categories. Additionally in conjunction with a LDA classifier, the textural kinetic features yielded an Az = 0.84 in distinguishing low and high OncotypeDX risk category lesions, compared to Ashraf et al. 9 where the Az was 0.77.
Our approach was also different from that of Agner et al. 8 in that we employed two new textural kinetic features, DHoG and DLBP, which unlike EK, IK and TK features, seek to capture contextual textural changes during contrast uptake by considering changes in spatial intensity patterns within divided grid cells in the lesion ROI. Unlike the approach in Agner et al. 16 which attempted to fit a single parametric curve to characterize the temporal changes in lesion texture, the DHoG and DLBP approaches capture the frequency of occurrence of different spatio-temporal textural patterns within the lesion.
A systematic and quantitative analysis of different computer extracted features demonstrated that curve-based kinetic features (i.e., EK, TK, IK) were less discriminating compared to the other three feature classes (i.e., DHoG, DLBP, PK) in distinguishing high and low OncotypeDX risk score ER-positive cancers. Consistently, the feature combination identified through the feature selection process contained the important features from 4 feature classes (DHoG, DLBP, PK, and TK). While the PK features showed moderate correlation, lesion shape features were even less correlated with the OncotypeDX risk categories for the lesions evaluated. The DHoG and DLBP appeared to be the most discriminative features in differentiating low and high OncotypeDX risk score ER-positive breast lesions on DCE-MRI. Figures 3 and 4 which show the normalized mean DHoG and DLBP curves plotted as a function of contrast uptake, appear to illustrate a high degree of heterogeneity in high OncotypeDX risk score cancers compared to low OncotypeDX risk score cancers. The corresponding color-coded DHoG and DLBP feature maps at peak enhancement (Figs 3 and 4) also suggest that high OncotypeDX risk score breast cancers may appear to be more heterogeneous at peak contrast compared to low OncotypeDX risk score cancers. The Spearmen's rank correlation test showed that DHoG and DLBP are significantly correlated (DHoG: ρ = 0.85, P < 0.01; DLBP: ρ = 0.83, P < 0.01) with the high and low OncotypeDX risk score categories. These results are consistent with the findings of Ashraf et al. 9 , who showed that DCE-MRI based heterogeneity kinetic features were correlated with OncotypeDX recurrence scores (ρ = 0.71, P < 0.001). However unlike Ashraf et al. where image data from only a single institute was considered, our approach included image data from two different clinical sites.
Our study did have its limitations, and as such, it is important to acknowledge that this is a preliminary study with need for additional independent validation of our initial findings. Additionally, we only included those patients having low (< 18) and high (> 30) OncotypeDX recurrence scores and excluded intermediate risk scores (> 18 and < 30) as the contrast was greatest between these categories and further work is needed to evaluate the intermediate category. Further, the extracted features were computed based on the automated segmentation method due to lack of precise lesion boundary for the data from Site II. Owing to the limited size of the dataset considered in this study, we did not conduct multiple statistical tests of comparisons on the features. We also did not explicitly quantify the inter-observer variability in segmentation of the dominant masses between multiple readers. One problem was the fact that we were identifying imaging markers correlated with a surrogate of outcome (OncotypeDX) instead of actual outcome itself-unfortunately this information was not available for the patients considered in this study.

Concluding Remarks.
We identified a set of computer extracted image texture features on DCE-MRI that appear to be able to segregate high and low OncotypeDX risk scores in ER-positive breast cancers. The texture features so identified may allow for non-invasively predicting which ER-positive patients might benefit from adjuvant hormonal and chemotherapy.

Materials and Methods
This study was approved by the institutional review board and compliant with Health Insurance Portability and Accountability Act. Written informed consent was obtained from all subjects. The experimental protocols were approved by the Case Western Reserve University Faculty of Biomedical Engineering Ethics Committee. The methods were carried out in accordance with the approved guidelines and regulations.

Patients. The breast DCE-MRI data were retrospectively collected from two institutions (Site I: Boston
Medical Center; Site II: UH MacDonald Women's Hospital) between 2006 and 2012. All the cases were anonymised. In Site I, women patients who presented with a suspicious breast lesion on screening mammogram, then had diagnostic MRI, were recruited to a large study of MRI in the staging, diagnosis, and screening of breast cancer. In Site II, women whose pathology revealed node-negative, ER-positive invasive breast cancer who participated in a large breast cancer case-control study were utilized for this study 27 . All the DCE-MRI images at 1.5T were obtained within 3-7 days after diagnostic biopsy. The lesion diagnosis for both cohorts was confirmed by ultrasound guided core needle biopsies or MRI guided biopsies, followed by the histopathologic examination of 3-10 specimens obtained by core biopsy sampling. A total of 89 patient studies were collected from the Site I, and only 17 patients with both pathology reports and available low or high OncotypeDX scores were included in this study. From the Site II, we acquired 101 ER-positive stage I-III female breast cancer patients. Of those, 79 patients had both associated pathology reports and available low and high OncotypeDX recurrence scores. Patients with intermediate OncotypeDX recurrence scores (18)(19)(20)(21)(22)(23)(24)(25)(26)(27)(28)(29)(30) were deemed to not be informative of cancer risk and hence excluded from the analysis. Three patient studies and one patient study from the Site I had two and three separate lesions respectively. All other patients only had a single lesion. For each patient, the OncotypeDX test was performed for the dominant mass (index lesion), hence only the index lesion in the case of the patient with multiple masses was considered. The patient selection criteria for our study are summarized in Fig. 6. Both DCE-MRI data sets were obtained prior to, during, and after administration of 0.1 mmol/kg body weighted of gadolinium-DTPA at a flow of 4cc/second, for a total imaging duration of 5-10 minutes. Each patient study was accompanied by: i) private health information free clinical metadata containing clinical history, age range, and radiology report; ii) pathological reports containing ER-positive scale values denoting low (17-34%), moderate (34-50%), or high (50-100%); and iii) recurrence score denoting lower relapse rate and improved overall survival with adjuvant tamoxifen (< 18), or the converse (> 30) outcome. Table  2 summarizes the patient characteristics.

Lesion Segmentation and Feature Extraction.
For each patient study, a representative section of the DCE-MRI volume, containing the largest diameter of the dominant mass, was chosen by a radiologist (B.N.B or D.P, both with more than 10 years of experience in the interpretation of breast MRIs) who was blinded to pathologic diagnosis. The lesion boundary was automatically delineated via an automated lesion segmentation method specifically developed and evaluated on breast DCE-MRI 28 . The computer derived features, including shape features, pharmacokinetics (PK), enhancement kinetics (EK), intensity kinetics (IK), TK, DHoG, and DLBP, were calculated based on the pixels enclosed by the delineated regions of interest (ROI) containing breast masses. A flowchart demonstrating the use of computerized features for lesion class discrimination is shown in Supplementary Figure S1 online. Table 3 describes the extracted features. All feature calculations were performed by using software developed in-house and was implemented using the MATLAB © programming platform (version R2013a, MathWorks, Natick, MA).
Shape features. Six shape features 8 were included: (a) area overlap ratio, (b) variance of distance ratio, (c) compactness, (d) smoothness, (e) normalized average radial distance ratio, and (f) standard deviation of normalized distance ratio. These attributes were used to measure the roundness, smoothness, spiculation, and regularity of the lesion margin.
Pharmacokinetics. Toft's PK model 29,30 is most commonly used in DCE-MRI to provide a physiologic interpretation of the breast MRI images via three parameters 31 , i.e., K trans (the transfer constant between the plasma and tissue compartments), v e (the extracellular extravascular volume fraction), and K ep (the ratio of K trans /v e ). The PK parameters were estimated on the MRI dynamic signal enhancement curves plotted as a function of time after a bolus injection of Gd-DTPA.
Enhancement Kinetics. Breast lesion enhancement can be qualitatively characterized by assessing the enhancement curve obtained by plotting the signal intensity values over time after contrast injection. The mean signal intensity at each time point was calculated on the entire lesion ROI. A total of four intensity kinetic features (maximal uptake, time to peak, uptake rate, and washout rate) were computed to measure the amount and rate of contrast uptake 25,26 .
Intensity Kinetics and Textural Kinetics. A third-order polynomial was fitted to the enhancement curve to characterize its shape via a set of four model coefficients 16 . For each lesion, we computed five types of textural features, including Kirsch, Sobel, Haralick, and first-order textural features. Table 3 summarizes all the textural features considered in this study. The mean textural feature of lesion ROI was plotted as a function of time during the period of contrast administration. These polynomial coefficients represent the corresponding intensity and textural kinetic behavior of the lesion and represent the corresponding IK and TK features.

Linear Discriminant Analysis based Classification via Cross-validation.
To determine computer extracted imaging features on DCE-MRI that best discriminated the low from high OncotypeDX risk categories, the LDA based classification was performed on the individual feature of each feature class (i.e., shape, PK, EK, IK, TK, DHoG, DLBP) and entire feature set containing all the feature classes (176 features in total). A LDA classifier 34 was trained using the extracted features to classify images with low or high OncotypeDX via an iterative 2-fold cross-validation scheme. To reduce overfitting, feature selection was performed on the entire feature set  via a sequential floating forward based LDA selection method 35 . Further description regarding the theoretical formulation of feature selection problem and LDA classification can be obtained from Supplementary-B online. The important features were identified during the feature selection process were combined with equal weighting and used in conjunction with the LDA classifier. We assume that the condition probability density function with respect to the low and high OncotypeDX classes is normally distributed with equal class covariance.
Analysis. Statistical Analysis. The Student t test was used to verify that there was no tumor size-related bias or age-related bias between low and high OncotypeDX risk categories (Table 2). To confirm that our classifiers and features were robust to the choice of MRI scanners and clinical sites, we used a paired t test to test the null hypothesis that there were no difference in feature values between data acquired from the two sites. A post-hoc power analysis of the 95% confidence interval was performed. The Spearman's rank correlation tests measured by correlation coefficient (ρ) were performed to determine the relationship between the computer extracted features and the low/high OncotypeDX risk categories. All analyses were performed by using the IBM SPSS software (version 21.0; IBM, Chicago, IL). A value of P < 0.05 was considered to indicate a statistically significant difference.  Stability of Classification Performance. In the LDA classification, area under the receiver operating characteristic curve (Az), positive predictive value (PPV), negative predictive value (NPV) were used as performance measures for evaluating the discriminability of each of the individual computer extracted features. In order to assess the stability of LDA classifier, the classification was performed via a 2-fold cross validation strategy. We computed the performance measures 100 times and reported the mean values with 95% confidence interval in the results. We employed a stability measure that Parmar et al. used to evaluate the performance of classification methods in their recent radiomic work 34 . The classifier stability was empirically quantified using the relative standard deviation (RSD %), which can be defined as:  Stability and Predictive Performance. In order to identify most accurate and highly reliable image features, we used mean values of Az and RSD as feature ranking measures. According to Parmar et al.'s selection criterion 32 , the features ranked in the top half of both measures are considered as highly accurate and reliable ones. For each feature class (i.e., shape, PK, EK, IK, TK, DHoG, DLBP), the best identified features have Az greater than the mean Az of all classifiers and RSD less than the mean RSD of all classifiers. Further, we utilized an inverse power law model 35 of statistical learning to estimate the error rate associated with the classification performance on the currently available data samples. The estimation procedure comprised the following steps: (i) The dataset was divided into a training pool and a testing set via a random sampling; (ii) Ensured that the number of training samples in each set was statistically significant for calculating the power law parameters; (iii) The power law model was applied to describe the relationship between error rate and training set size: where err(n) is the error rate for training set size n, a is the learning rate, α is the decay rate, and ε is the Bayes error. The model parameters [a, α, ε] can be estimated via a constrained non-linear minimization 35 .