A priori prediction of breast cancer response to neoadjuvant chemotherapy using quantitative ultrasound, texture derivative and molecular subtype

The purpose of this study was to investigate the performances of the tumor response prediction prior to neoadjuvant chemotherapy based on quantitative ultrasound, tumour core-margin, texture derivative analyses, and molecular parameters in a large cohort of patients (n = 208) with locally advanced and earlier-stage breast cancer and combined them to best determine tumour responses with machine learning approach. Two multi-features response prediction algorithms using a k-nearest neighbour and support vector machine were developed with leave-one-out and hold-out cross-validation methods to evaluate the performance of the response prediction models. In a leave-one-out approach, the quantitative ultrasound-texture analysis based model attained good classification performance with 80% of accuracy and AUC of 0.83. Including molecular subtype in the model improved the performance to 83% of accuracy and 0.87 of AUC. Due to limited number of samples in the training process, a model developed with a hold-out approach exhibited a slightly higher bias error in classification performance. The most relevant features selected in predicting the response groups are core-to-margin, texture-derivative, and molecular subtype. These results imply that that baseline tumour-margin, texture derivative analysis methods combined with molecular subtype can potentially be used for the prediction of ultimate treatment response in patients prior to neoadjuvant chemotherapy.


Ultrasound data analysis
The quantitative ultrasound parameters, including MBF (mid-band fit), SS (spectral slope), SI (spectral intercept), ACE (attenuation co-efficient estimate), ASD (acoustic scatterer diameter) and AAC (average acoustic-scatterer concentration) were determined from ROI-selected tumour core and 5-mm margin areas using quantitative ultrasound methods 16 .In this technique, each ROI was divided into window blocks of size 10 times the ultrasound wavelength with 94% overlap in both the axial and lateral directions to construct QUS parametric images.Tumour attenuation was determined using a spectral difference method 28 .The reference phantom method was used to remove any ultrasound system dependencies in quantitative parameters estimation.The attenuation coefficient and speed of sound of the reference phantom were 0.786 dB/MHz/cm and 1540 m/s, respectively.MBF, SS and SI were calculated using linear regression analysis of the normalized backscatter power spectrum over the − 6 dB bandwidth of the transducer 29 .The ASD and AAC parameters were derived from the backscatter coefficient by comparing measured data with a theoretically derived backscatter coefficient using a spherical Gaussian scatterer model (SGM) 30 .Finally, color-coded parametric maps for each estimated quantitative ultrasound parameter were constructed by generating a spatial map of the parameter values computed over all window blocks.The mean values of quantitative ultrasound parameters were determined by averaging QUS parametric map values.From the tumour core and 5-mm margin regions in QUS parametric images, two core-to-margin related parameters were calculated including core-to-margin ratio (CMR) and core-to-margin contrast ratio (CMCR) 24 .CMR compares the level of desired signal to the background noise, and CMCR is like CMR but also considers bias in an image.

Texture analysis
A statistical texture analysis technique was applied on QUS parametric images based on the concept of a greylevel co-occurrence matrix (GLCM).The GLCM represents the statistical angular relationship between neighbouring pixels, as well as the distance between them 23 .Four texture features, including contrast (CON), correlation (COR), homogeneity (HOM), and energy (ENE), were determined based on the statistical information provided by GLCM analysis.QUS parametric maps of the MBF, SS, SI, ASD and AAC from core and margin regions underwent a GLCM-based texture analysis process to extract these four texture features.In texture analysis, the contrast feature represents location-dependent gray level variations in an image.The  Texture-derivative analysis was subsequently applied to the QUS parametric images.In contrast to the previous texture analysis approach that produces averaged texture measures, texture-derivate analysis works by creating intermediary texture-encoded maps using a sliding window analysis with a 15-pixel by 15-pixel window.Each pixel in these maps represents a quantification of local textures across the window 18 .A second-pass GLCM based texture analysis was subsequently performed on these maps, resulting in texture-derivate features.In the end, a total of 201 features, including mean QUS, core-to-margin, texture, and texture-derivative features were extracted from each patient's ultrasound RF data.
The diagram of QUS, core-to-margin, GLCM based texture, and texture-derivative parameter estimation from the ultrasound data are summarized in Supplementary Fig. 1.

Classification and statistical analysis
ANOVA followed by a Bonferroni multiple comparisons Tukey test was used to compare the extracted means of QUS parameters, texture features, texture-derivative features, and tumour molecular subtype between responders and non-responders.To develop a tumour response prediction model, a nested cross-validation was performed on the parameters determined from ultrasound data 31,32 .The nested cross-validation is performed in two layers to achieve training and validation separation.In this study, leave-one-out and holdout cross-validation were investigated.In leave-one-out nested cross-validation (Fig. 1a), in the outer layer, one sample was separated for validation, and the rest of the data was used to develop a model.In the internal layer, the remaining 207 samples were used for feature selection and classifier parameter tuning.A developed model was then validated with one left sample, which was split at the beginning.This process was repeated 208 times by leaving different one sample for validation and by using a different 207 samples to develop a new model from scratch.The overall performance was then calculated as a mean of classification performances of the 208 separately developed models on different one sample left for validation, which was not involved in developing the models.In hold-out cross-validation www.nature.com/scientificreports/(Fig. 1b), 20% and 10% of the data was separated for validation, and 80% and 90% of the data, respectively, were used for model development.This process was repeated 10 times by randomly selecting 20%, and 10% of the data for validation and 80% and 90% for training process, respectively.Finally, overall performance was then calculated as a mean of classification performances of the 10 separately developed models.Classification metrics that include sensitivity, specificity, accuracy, positive predictive value (PPV), negative predictive value (PPV) and AUC were estimated to assess model performance.
In order to detect tumour response, a multi-feature response classification was performed using k-nearest neighbour (KNN) and support vector machine (SVM) with radial basis function (RBF) kernel 33 methods.In KNN model development, the number of nearest-neighbor, k, was set to an odd number (k = 1, 3, and 5) to avoid tied classes in this binary class case.Finally, the optimal number of nearest neighbour was tuned to achieve the best classification performance.In SVM-RBF model development, two classifier parameters including, the penalty for misclassification (C) and the width of a radial basis function (γ) were tuned.Here, optimal values for these two parameters were selected by grid search with the range of C = 2 8 , 2 9 , 2 10 ,…,2 15 and γ = 2 -18 , 2 -17 , 2 -16 ,…2 -5 .
The best feature subset was selected from training data set based on the maximal relevance minimal redundancy (mRMR) criterion 34 .This resulted in a reduced train dataset that consists of the 50 best mRMR features.From these best mRMR features, the optimal features were selected using the sequential-forward selection (SFS) method.To avoid the curse of dimensionality, the maximum number of features to select was set to 10 based on rule of thumb 35 .We implemented the synthetic minority over-sampling technique (SMOTE) to account for class imbalance 36 .Tumour response prediction model development was performed on this balanced reduced training dataset.The flow diagram of the training process, including feature selection, data balancing, and feature selection algorithm, is shown in Supplementary Fig. 2.

Patient, tumour and treatment characteristics
The clinical and pathological characteristics of the patients involved in this study are summarized in Table 1.The average age of patients was 51 ± 11 years.The average tumour size along the longest axis before treatment was 5.0 ± 2.7 cm.Among patients, 89% had invasive ductal carcinoma and 4% had lobular carcinoma.Five percent (5%) of patients had grade I tumours, 38% had grade II tumours, and 49% had grade III tumours incorporating Residual tumour size (cm) 1.8 ± 2.2 5.9 ± 4.5 2.7 ± 3.4 patients with locally advanced and earlier-stage breast cancer receiving neoadjuvant chemotherapy treatment.

Quantitative ultrasound, texture and texture-derivative parameters
Representative ultrasound B-mode, QUS parametric, and QUS-texture images corresponding to responding and non-responding patients, acquired prior to chemotherapy treatment, are displayed in Fig. 2. MBF and AAC parametric images demonstrated less tissue stiffness in both core and margin regions for responders compared to non-responders.ASD parametric images exhibited higher scatter diameters for responders than non-responders.A total of 201 features were determined from RF data acquired from breast cancer patients.Statistical analysis using unpaired t tests was performed to compare those QUS, core-to-margin, texture, and texture-derivative features between responders and non-responders acquired before treatment.Among the 201 features, 9 exhibited significant differences (p < 0.05) between responder and non-responder groups.The box plots for those features with significant differences, along with an overlaid scatterplot of the distribution of responding and non-responding patient values, are presented in Supplementary Fig. 3.The backscatter intensity parameters from the tumour core and margin regions, including Core MBF, Margin MBF, and Core AAC, were significantly higher in non-responders, as displayed in Fig. 2. The core-to-margin related parameters, including CMR-MBF, and CMR-AAC, were significantly lower and CMCR-AAC were significantly higher in responders, demonstrating a significantly difference in tissue stiffness between tumour core and margin region and less variation within the tumour core region in responder group, as visualized in Fig. 2. Texture-derivative parameters derived from scatter size parametric images from tumour core region, including Core ASD-ENE-HOM, Core ASD-ENE-CON, and Core ASD-ENE-ENE showed significant difference between two response groups, demonstrating the existence of orderly organized scatterers with broader varying sizes within tumour core region in responder group, as shown in Fig. 2.

Classification performances
Two different classifiers were investigated with leave-one-out and hold-out cross-validation approaches, and their performances were compared.Figure 3 shows bar plots illustrating the classification performance of the models developed using KNN and SVM-RBF classifiers with the following feature sets: (I) features including mean QUS, texture and core-to-margin parameters from core and margin region, (II) features including texture-derivative parameters from core and margin, (III) features including tumour molecular subtype, (IV) features including mean QUS, texture, texture-derivative, core-to-margin parameters from core and margin region, (V) features including mean QUS, texture, core-to-margin parameters from core and margin region, and tumour molecular subtype, (VI) features including texture-derivative parameters from core and margin region, and tumour molecular subtype, and (VII) features including all features with leave-one-out cross-validation.The algorithms developed using KNN provided the best classification performance from the combination of feature sets.Table 2 presents the optimal features selected using KNN methodology.In the classification algorithm development using the KNN classifier, most features were selected from tumour core region.Combining tumour molecular subtype with the mean QUS and texture features increased the classification performance to an accuracy of 72%.The best performance obtained using KNN was from the feature set including texture-derivative parameters and molecular subtype (VI) with a sensitivity of 74%, specificity of 80%, accuracy of 79%, PPV of 52%, NPV of 91% and AUC of 0.76.In compareison to the KNN algorithm, the SVM-RBF classifier performed well in differentiating responder and non-responder groups from all type of feature sets.The best performances were obtained from three feature sets using an SVM-RBF classifier in differentiating responder and non-responder groups (feature sets IV, VI and VII).Table 3 displays optimal features selected from these feature sets during the training process in classification model development using SVM-RBF classifier.The parameters determined from ultrasound RF data, including mean QUS, core-to-margin, texture and texture-derivative parameters could alone differentiate the response groups with a sensitivity of 80%, specificity of 80%, accuracy of 80%, PPV of 54%, NPV of 93%and AUC of 0.83 using SVM-RBF (feature set IV). Combining molecular subtype information with these parameters slightly increased the classification performance to a sensitivity of 87%, specificity of 81%, accuracy of 83%, and AUC of 0.87 (feature set VII).Similar to the KNN model, the best performance using SVM-RBF was obtained from a feature set including texture-derivative parameters and molecular subtype (VI) with a sensitivity of 79%, specificity of 86%, accuracy of 85%, PPV of 63%, NPV of 93% and AUC of 0.87.The summary of the classification performance results using KNN and SVM-RBF classifier based on various type of feature set with leave-one-out cross-validation approach is presented in Supplementary Tables 2a and 2b, respectively.Figure 4 displays the average performance on the training and test data sets using mean QUS, core-to-margin, texture, texture-derivative and molecular subtype with 10 hold-out validations, employing KNN and SVM-RBF algorithms for holding out 20% of data, and 10% of data.In a hold-out cross-validation approach, the model developed using KNN and SVM-RBF classifier exhibited similar performance in differentiating response groups.
For the 20% hold-out cross-validation, the classification accuracy and AUC for the KNN algorithm were 70 ± 7% and 0.71 ± 0.07, respectively, from the training set, and 68 ± 4% and 0.69 ± 0.02% from the test set.The bias and variance errors for this model were 30% and 2%, respectively.Similarly, the classification accuracy and AUC for the SVM-RBF algorithm were 68 ± 2% and 0.73 ± 0.04 from training set, 71 ± 4% and 0.71 ± 0.04% from independent test set, respectively.The bias and variance errors for this model were 32% and 3%, respectively.
For the 10% hold-out cross-validation, the classification accuracy and AUC for the KNN algorithm were 77 ± 2.5% and 0.76 ± 0.05from the training set, 81 ± 2.2% and 0.80 ± 0.03% from the test set, respectively.The bias and variance errors for this model were 21% and 3%, respectively.Similarly, the classification accuracy and AUC for the SVM-RBF algorithm were 70 ± 0.6% and 0.72 ± 0.03 from the training set, 71 ± 2.5% and 0.71 ± 0.07% from

Discussion
This study demonstrated, for the first time, the potential of combining quantitative ultrasound, texture, texture derivative, and molecular subtype analysis techniques to predict cancer treatment responder and non-responder among breast cancer patients before starting neo-adjuvant chemotherapy.All patients underwent a core needle  biopsy to confirm a cancer diagnosis and to determine tumour histological subtype and molecular subtype before treatment.A total of 201 features were determined from ultrasound data acquired from patients, including 5 QUS parameters from tumour core and tumour margin, 20 texture parameters from core and margin, 80 texture-derivative parameters from core and margin, 10 core-to-margin parameters, and tumour attenuation before treatment.
To understand the relationship of QUS, texture, and texture-derivative parameters determined from ultrasound-RF data with treatment response, we investigated the correlation between these features and patient tumour response.Among all QUS parameters, the backscatter intensity-related parameters from tumour core and margin regions, as well as ore-to-margin parameters, revealed significant differences between response groups.Additionally, specific texture-derivative features from the tumour core scatterer size-energy feature parametric images exhibited significant differences.Generally, backscatter intensity-related parameters, including the MBF and AAC, are strongly related to scatter number density and elastic properties 15,30,37 .In this study, the range of scatter size determined from ultrasound data acquired in both tumour and margin region was approximately 80-182 µm, representing lobule diameters observed in histopathological images.These results suggest that these two different type of responding group have different lobular number density in both tumour core and surrounding regions and different uniformity in size distribution in core region.These finding are reflected in the MBF and AAC parametric images constructed from responder data, demonstrating less tissue stiffness in both tumour core and margin regions compared to non-responder group (Fig. 2).Significant differences in backscatter parameters estimated from the margin region, as well as parametric image texture and texture-derivative parameters between two response groups, reveal the existence of microscopic infiltration from the tumour core to the surrounding region, particularly in non-responding patients.This is visualized in MBF and AAC images by significantly different values between core and margin region in responder group and no different in the non-responder group.Among molecular subtypes, majority (64%) of Luminal-A type tumours were in nonresponding group, and none of the ERBB2+ type tumours were in non-responding group.Previous studies have demonstrated that molecular subtype is a powerful independent predictor of chemotherapy response rate and overall survival.There were reported significant difference in response rates and overall survival of breast cancer patients with different molecular subtypes, including HER2+, triple negative, and ER and/or PR+ with HER2status 27 .Another study reported a lower pCR rate in Luminal-A and higher in Her2+ disease for neoadjuvant chemotherapy 38 .This was reflected in our study population.However, the calculated survival rate of these four molecular subtype tumour did not reveal the significant difference between these types.This is due to lack of sufficient sample in each molecule subtypes from our breast cancer patient population.Nevertheless, compared to responder group, number of Luminal A type tumour was higher and Her2+ type tumour was lower (12%) in our non-responder population.
In this study, multi-feature classification analyses were conducted on different feature sets, including mean QUS & texture from core and margin, texture-derivative parameters from core and margin, molecular subtype, and combination of these feature sets.Models developed from mean QUS & texture, and texture-derivative parameter exhibited similar performances with accuracies of approximately 69% and 70% using KNN and SVM-RBF, respectively.These performances were improved by including molecular subtype.The best performances were achieved from the feature set including texture-derivative parameters and molecular subtype, with accuracies of 79% and 85% using KNN and SVM-RBF, respectively.Comparing the performance of algorithms using two different classifiers, the better performances were obtained with the model developed using an SVM-RBF classifier based on all feature sets.Mostly, core-to-margin parameters that include core and margin region are selected (Table 3).This confirms the result presented in our previous finding where it was reported that the existence of microscopic extension in tumour surrounding regions affect the tumour response to NAC.The best classification performance was obtained by the combination of texture-derivative parameters from tumour core and margin, and molecular subtype, particularly texture-derivative parameters from backscatter intensity-related parametric maps, and Luminal-A type, with a sensitivity of 79%, specificity of 86%, accuracy of 85%, PPV of 63%, NPV of 93%, and AUC of 0.87.This agrees with a previous finding that response rate is significantly lower for Luminal-A type tumours compared to other molecular subtypes, and molecular subtype is one of the key predictor for treatment response.Combining molecular subtype with intra-tumoural heterogeneity reflected by texture-derivative parameters could easily predict the response type before treatment.The PPV and NPV values reveal that the our model has 63% chance in predicting non-responding patient and 93% chance in predicting responding patient accurately.Even though the change of predicting non-responder is moderate, there is a higher change of predicting responder.This finding reveals that, based on our tumour response model performance, treatment for the responding patient will not be changed from standard neoadjuvant treatment procedure.The lower value of PPV is due limited number of non-responding patient, which means the prevalence value is 22% in our patient population.Several previous studies reported that pathological response is a prognostic indicator for long-term, disease-free and overall survival 39 .This was confirmed in this current study.As expected, clinical outcomes were significantly different between the responders and non-responders, as defined by the modified response grading system criteria and demonstrated in survival plots (Fig. 5).The obtained models could differentiate two response-type patients' outcomes with good agreement with those based on histopathology and clinical outcomes determined after surgery.This result implies that the proposed classification models, based on combined quantitative ultrasound-texture biomarkers and molecular subtype as early survival-linked surrogates of response to cancer-targeting therapies, could facilitate switching from an inefficient treatment regimen to a more effective one on an individual patient-basis before starting treatment.
For the generalization of the proposed tumour response prediction models using KNN and SVM-RBF classifiers, we investigated their performance with a hold-out cross-validation approach.The classification performance with 20% testing and 80% training hold-out was, as expected, lower than result obtained with leave-one-out cross-validation approach, with bias and variance errors approximately at 30% and 2% respectively.However, www.nature.com/scientificreports/ the performance using KNN classifier with 10% testing and 90% training hold-out was approximately similar to that with leave-one-out approach, with bias and variance errors approximately at 21% and 3% respectively.The range of bias and variance errors from 10 repeated 20% testing hold-out validation was 31 to 35% and 1 to 11%, respectively.For 10 repeated 10% testing hold-out validations, the range was15 to 25% and 0.3 to 9%, respectively.This high variance and bias with 20% testing holdout reveal that the number of sample in training process is not sufficient enough to perform an adequately powered hold-out validation.This suggest that for our current patient population size, leave-one-out is the better cross-validation choice to investigate the classification performance of the response group based on combined quantitative ultrasound, texture-derivative and molecule subtype analyses.
In a previous study investigating tumour response prediction with a population of 56 patient based on mean QUS, texture, and core-to-margin parameters from core and margin regions, a classification performance with accuracy of 88% was reported 24 .In another study involving 100 patient, tumour response prediction was performed based solely on mean QUS, texture, and texture-derivative parameters from tumour core region, reporting a performance accuracy of 82% 18 .However this study showed slightly lower performance with those feature sets.This discrepancy is attributed to the increase variety of tissue microstructures in training sample population resulting from the inclusion of more patients in the model development.By combing tumour molecular subtype with mean QUS, texture and texture-derivative parameters, a significant improvement in the performance of tumour response prediction could be achieved.
In conclusion, this work demonstrates that combining texture analysis of quantitative ultrasound features with molecular subtype can accurately detect tumour response before neoadjuvant chemotherapy using a machine learning approach.While the current population appears reasonably good for tumour response prediction with leave-one-out cross-validation approach, a larger cohort of patients in the future should improve the generalizability and robustness of the prediction, even using hold-out validation approach.Nevertheless, this study shows that molecular and QUS-texture markers can serve as a prior treatment survival-linked surrogate of response to cancer-targeting therapies-leading the way towards personalized medicine and facilitating the selection of an appropriate treatment regimen on an individual patient basis.

Figure 1 .
Figure 1.Validation method.(a) Nested leave-one-out cross-validation method.Nested cross-validation is performed in two layers to achieve training and validation separation.In the outer layer one sample was separated for validation and the rest of the data was used to develop a model.In the inner layer, the training data was used to develop classification model.Finally obtained model was tested with test data set.(b) Holdout method.Here data set is divided into test and training data randomly.The training was used to develop classification model and tested with test data set.This process was repeated 10 times and average performance was calculated.

Figure 3 .
Figure 3. Summary of classification performance with leave-one-out cross-validation.Tumour response classification results obtained using (a) KNN and (b) SVM-RBF algorithms based on seven feature set types using leave-one-out cross-validation approach.The classification algorithms developed using both KNN and SVM-RBF classifiers exhibited the best performance with the feature set including QUS-texture-derivative parameter and tumour molecular subtype (Feature Set VI).Feature Set I: QUS + Texture + Core-to-Margin; Feature Set II: Texture Derivative; Feature Set III: Molecular Subtype; Feature Set IV: QUS + Texture + Coreto-Margin + Texture-Derivative; Feature Set V: QUS + Texture + Core-to-Margin + Molecular Subtype; Feature Set VI: Texture Derivative + Molecular Subtype; Feature Set VII: QUS + Texture + Core-to-Margin + Texture-Derivative + Molecular Subtype.

Figure 4 .
Figure 4. Summary of classification performance with hold-out validation.Average tumour response classification results obtained from 10 hold-out data set using KNN and SVM-RBF algorithms.The top panel (a) shows data with 20% testing (80% training) and the bottom plot (b) with 10% testing (90% training) hold-out approaches.The classification performance with 90% training was approximately similar to those with leaveone-out approach with bias and variance errors of 21% and 3%, respectively. https://doi.org/10.1038/s41598-023-49478-3

Figure 5 .
Figure 5. Recurrence free survival curves for neo-adjuvant chemotherapy treatment responders and nonresponders.Patients were differentiated based on clinical/pathology after treatment, and also based on baseline texture-derivative parameters combined with molecular subtype using the KNN and SVM-RBF algorithm.
measures textural uniformity while the homogeneity measures the incidence of pixels pairs of different intensities.As the frequency of pixel pairs with close intensities increases, homogeneity increases.The correlation feature measures the linear dependency among neighbouring pixels.In this study, a total of 40 textural features (four features for each of the five QUS parametric maps) were computed.

Table 1 .
Clinical and pathologic characteristics of breast cancer patients receiving NAC.

Table 2 .
Optimal features selected for tumour response classification using KNN algorithm developed based on ultrasound data (Feature Set IV), ultrasound data + molecular subtype (Feature Set VII), and the best performed feature set (Feature Set VI) with leave-one-out cross validation approach.Feature Set IV: QUS + Texture + Core-to-Margin + Texture-Derivative; Feature Set VI: Texture-Derivative + Molecular Subtype; Feature Set VII: QUS + Texture + Core-to-Margin + Texture-Derivative + Molecular Subtype.

Table 3 .
Optimal feature selected for tumour response classification using SVM-RBF algorithm developed based on ultrasound data (Feature Set IV), ultrasound data + molecular subtype (Feature Set VII), and the best performed feature set (Feature Set VI) with leave-one-out cross validation approach.Feature Set IV: QUS + Texture + Core-to-Margin + Texture-Derivative; Feature Set VI: Texture-Derivative + Molecular Subtype; Feature Set VII: QUS + Texture + Core-to-Margin + Texture-Derivative + Molecular Subtype.