Predicting BRAFV600E mutations in papillary thyroid carcinoma using six machine learning algorithms based on ultrasound elastography

The most common BRAF mutation is thymine (T) to adenine (A) missense mutation in nucleotide 1796 (T1796A, V600E). The BRAFV600E gene encodes a protein-dependent kinase (PDK), which is a key component of the mitogen-activated protein kinase pathway and essential for controlling cell proliferation, differentiation, and death. The BRAFV600E mutation causes PDK to be activated improperly and continuously, resulting in abnormal proliferation and differentiation in PTC. Based on elastography ultrasound (US) radiomic features, this study seeks to create and validate six distinct machine learning algorithms to predict BRAFV6OOE mutation in PTC patients prior to surgery. This study employed routine US strain elastography image data from 138 PTC patients. The patients were separated into two groups: those who did not have the BRAFV600E mutation (n = 75) and those who did have the mutation (n = 63). The patients were randomly assigned to one of two data sets: training (70%), or validation (30%). From strain elastography US images, a total of 479 radiomic features were retrieved. Pearson's Correlation Coefficient (PCC) and Recursive Feature Elimination (RFE) with stratified tenfold cross-validation were used to decrease the features. Based on selected radiomic features, six machine learning algorithms including support vector machine with the linear kernel (SVM_L), support vector machine with radial basis function kernel (SVM_RBF), logistic regression (LR), Naïve Bayes (NB), K-nearest neighbors (KNN), and linear discriminant analysis (LDA) were compared to predict the possibility of BRAFV600E. The accuracy (ACC), the area under the curve (AUC), sensitivity (SEN), specificity (SPEC), positive predictive value (PPV), negative predictive value (NPV), decision curve analysis (DCA), and calibration curves of the machine learning algorithms were used to evaluate their performance. ① The machine learning algorithms' diagnostic performance depended on 27 radiomic features. ② AUCs for NB, KNN, LDA, LR, SVM_L, and SVM_RBF were 0.80 (95% confidence interval [CI]: 0.65–0.91), 0.87 (95% CI 0.73–0.95), 0.91(95% CI 0.79–0.98), 0.92 (95% CI 0.80–0.98), 0.93 (95% CI 0.80–0.98), and 0.98 (95% CI 0.88–1.00), respectively. ③ There was a significant difference in echogenicity,vertical and horizontal diameter ratios, and elasticity between PTC patients with BRAFV600E and PTC patients without BRAFV600E. Machine learning algorithms based on US elastography radiomic features are capable of predicting the likelihood of BRAFV600E in PTC patients, which can assist physicians in identifying the risk of BRAFV600E in PTC patients. Among the six machine learning algorithms, the support vector machine with radial basis function (SVM_RBF) achieved the best ACC (0.93), AUC (0.98), SEN (0.95), SPEC (0.90), PPV (0.91), and NPV (0.95).

The BRAF V600E mutation is a significant contributor to the papillary thyroid carcinoma (PTC) phenotype, which aids in the diagnosis and differential diagnoses of PTC before surgery 1,2 .BRAF V600E diagnosis requires genetic testing of cell eluate by ultrasound-guided fine-needle aspiration (FNA), which is invasive.Ultrasound-guided FNA cytological examination of thyroid nodules can diagnose PTC before surgery, but 15% to 30% of the cytological results belong to the Bethesda system definition with uncertain detection results (including Bethesda Type III: atypical lesions or follicular lesions of unknown significance (AUS/FLUS), Type IV: follicular tumors/ suspected follicular tumors, and Type V: suspected malignant tumors (SUSP)).Therefore, "TBSRTC Classification Malignant Risk and Management Recommendation" recommends FNA cytology combined with BRAF V600E mutation detection, but are all invasive.As a result, it is critical in clinical practice to adopt non-invasive approaches to forecast the status of BRAF V600E mutations, so as to reduce the FNA and molecular detection rate.More importantly, According to earlier research, BRAF V600E mutation in thyroid tumors is thought to be a sign of severe illness and PTC-related mortality 3 .It is interesting to note that the presence of the BRAF V600E mutation has become a more reliable molecular marker for PTC recurrence 3 .Therefore, finding BRAF V600E mutations in thyroid tumors has implications for prognosis and serves as a marker for tumor recurrence.The BRAF V600E mutation is also strongly linked to the existence of extrathyroidal extension (ETE) and cervical lymph node metastasis (CLNM) in PTC patients, suggesting an invasion 4,5 .Jin et al. 6 observed a substantial relationship between BRAF V600E mutation with CLNM and ETE in a Mayo Clinic research.Xing et al. 7 revealed a close link between BRAF V600E mutation and ETE, CLNM, and advanced illness stages in a large comprehensive international multicenter investigation.Even if either gene mutation is capable of identifying aggressive cancer types, the genetic analysis still requires specimen tissue for examination, which is often collected by invasive surgical procedures.Furthermore, identifying tumors using FNA biopsy samples from type 4a nodules is difficult because tumor cells vary in quantity, quality, and purity 8 .A sensitive and precise BRAF V600E mutation detection method will thus aid in the early detection of PTC 9 .Because BRAF V600E gene mutations produce 99.8% of malignant nodules, it is a significant tumor marker for PTC.
With the increasing popularity of tumor thermal ablation technology (microwave, radio frequency, and laser) in China, an increasing number of patients with PTC, particularly those with minimal papillary thyroid cancer (MPTC), are willing to accept thermal ablation as a minimally invasive procedure to maximize the preservation of thyroid function.As a result, proper preoperative detection of BRAF V600E status in PTC patients is critical for patients to choose therapy methods.Surgery is currently chosen over ablation for highly invasive PTC in China.
The main imaging method for evaluating thyroid nodules is ultrasound (US) 10 .Grayscale US has recently been found to be able to predict the mutation of BRAF V600E in PTC.However, the findings are still up for debate 11,12 , which may be related to the drawbacks of the conventional US image, such as its dependence on the radiologist's experience and interobserver variation 13 .The ability of a tissue to resist deformation whenever a force is exerted on it or to restore its initial form when that force is withdrawn is called elasticity, which is what elastography often examines in tissues.Depending on the Doppler US technology, strain images can be presented in grayscale or in colors that reflect the stiffness and elasticity of the tissues [14][15][16][17] .Elastography is also based on the grayscale US which is also subjective and operator dependent.There have been several significant developments to increase

Patients.
A retrospective analysis was performed on PTC patients who had undergone preoperative thyroid US elastography, BRAF V600E
To acquire longitudinal and transverse images of the thyroid nodules, continuous longitudinal and transverse scanning was done while the patients were supine.Blood flow in and around the nodule, strain elastic grading of the nodule, calcification, and CLNM were all visible on the coexisting diagram, which also included the nodule diameter, location, echo, and boundary.www.nature.com/scientificreports/ The cross-sectional image's position and size of the sampling frame were adjusted, and the strain elastic imaging mode was activated.With an ROI that was larger than the nodules (generally more than two times), the nodules were placed in the middle of the elastic imaging zone.Pressure was applied steadily (range 1-2 mm, 1-2 times/s) while the probe was perpendicular to the nodule.When the linear strain hint graph (green spring) suggested stability, the freeze key was pressed to get an elastic image; the ROI's color changed (green indicated soft; red indicated hard), and the nodule's hardness was determined based on elasticity.The elastic image was graded according to the following criteria: one point equals a nodular area that alternates between red, green, and blue; two points equal nodules that are partially red and partially green (mostly green, area > 90%); three points equal a nodule area that is primarily green, with surrounding tissues visible in red; four points equal a nodule area that is primarily red, with the red area > 90%; and five points equal a nodule area that is completely covered in red.
Region of interests (ROIs) segmentation.One week prior to surgery, thyroid US exams were conducted.US image segmentation was done manually.Using the ITK-SNAP program (http:// www.itksn ap.org), the ROIs were manually drawn on each image (Fig. 2).The grayscale images were used to create a sketch outline of the tumor regions in the elastography US images.

Radiomic feature selection.
The retrieved features were normalized using a standard scalar to reduce bias and overfitting in the study.The dataset was divided into training and validation cohorts.To make each characteristic substantially independent, the row spatial dimension of the feature matrix was reduced using the Pearson correlation coefficient (PCC).Every pair of features with a PCC of more than 0.80 was deemed redundant.
After PCC, recursive feature elimination (RFE) for feature selection was applied to the whole dataset using the Scikit-learn python module 24 to choose representative features for the training cohort.During the RFE procedure, the following parameters were taken into consideration (cross-validation was set to stratifiedkfold with the number of splits being 10, the random state was set to 101, minimum features to select was set to 3, and accuracy was employed for the scoring.

Development of machine learning-based models.
The Support Vector Machine with the linear kernel (SVM_L), Support Vector Machine with radial basis function kernel (SVM_RBF), LogisticRegression (LR), Naïve Bayes (NB), K-nearest Neighbors (KNN), and Linear Discriminant Analysis (LDA) classifiers were used to build the prediction models using the RFE's key features.All six algorithms were implemented using the Scikitlearn machine learning library 24 The same feature sets were chosen and fed into the model during the validation process.Standard clinical statistics like the area under the curve (AUC), sensitivity, specificity, negative predictive value (NPV), positive predictive value (PPV), and accuracy (ACC) were used to evaluate the model's performance on the training and validation datasets.www.nature.com/scientificreports/A two-sided P < 0.05 indicated statistically significant differences.PyRadiomics (version 2.2.0, https:// github.com/ Radio mics/ pyrad iomics Accessed 10 August 2021) and scikit-learn version 1.2 24 were used to extract radiomic features and build the prediction models.Each prediction model's AUC, sensitivity, specificity, ACC, NPV, and PPV were calculated.
Medcalc Statistical Software was used to calculate the six models' AUCs and evaluate the predictions.The DeLong method was used to compare the AUCs of the six machine learning classifiers.To create calibration curves, the sci-kit-learn version 1.2 24 was used.R software (version 3.6.1,https:// www.r-proje ct.org) was used to perform the decision curve analysis.
Institutional review board statement.The study was conducted in accordance with the Declaration of Helsinki and approved by the Jiangsu University-Affiliated People's Hospital and traditional Chinese medicine hospital of Nanjing Lishui District Ethics Committee.
Informed consent.Patient consent was waived by the Jiangsu University-Affiliated People's Hospital and traditional Chinese medicine hospital of Nanjing Lishui District ethics committee due to the retrospective nature of the study.The relationship between the BRAF V600E mutation and ultrasonic imaging characteristics and the predictive performance of the machine learning algorithms is shown in Tables 2, 3.

Diagnostic performance of the machine learning models in the training and validation cohorts.
After PCC and RFE with stratified tenfold cross-validation, 27 radiomic features were chosen in the training cohort (Figs. 3 and 4).
The following features were chosen to develop the predictive models for BRAF V600E based on six machine learning algorithms (Table 4).
Furthermore, the DCA was used to evaluate the clinical utility of these models (Fig. 7).The calibration curve was used for evaluating the probability accuracy of the machine learning models in predicting an individual outcome event in the future (Fig. 8).

Discussion
The BRAF V600E mutation has become a distinctive and important molecular marker in the management of PTC due to its substantial relationship with aggressive clinical pathological outcome, serious molecular derangements, and selection of treatment methods in PTC.
Real-time tissue strain elastography helps assess the anatomical structure and biological traits of PTC because it represents the relative hardness of the lesion and its surrounding tissues, whereas the hardness of PTC tissues is closely related to its internal pathological structure.Bojunga et al. 25 showed that elastography US could effectively distinguish between benign and malignant thyroid nodules.Recent research has demonstrated that varied cell arrangements and cell compositions result in varying tissue elastography indices 26,27 ; the harder elastography US imaging is for PTC, the higher its malignancy.This gives the US diagnosis of BRAF V600E mutation-positive and negative PTC a new viewpoint.Using US images, it was determined that patients with and without BRAF V600E mutations had significantly different elastic moduli or levels of hardness.Elastography might more accurately describe the pathological features of tissues and directly and quantitatively indicate the absolute hardness of BRAF V600E mutation-positive and -negative PTC.This study used machine learning methods to extract imaging omics features behind PTC strain elastic US images, and then used these features to build models to predict BRAF gene status, which is more objective than the semi-quantitative method of dividing PTC strain elastic US images into 1-5 points based on naked eye observation.
In a previous study 28 , we developed three radiomic models to predict BRAF V600E mutation in PTC patients: a radiomic model based on grayscale US radiomic features, a radiomic model based on elastography radiomic features, and a third model based on combined gray US radiomic and elastography radiomic features using logistic regression classifier and the least absolute shrinkage selection operator (LASSO) feature selection algorithm.The gray scale US radiomic model did not perform well in the validation cohort (ACC: 0.67, AUC: 0.73), but the elastography model (ACC: 0.88, AUC: 0.93) and combined model (ACC: 0.91, AUC: 0.94), did perform well.Considering that elastic US images superimpose information on the hardness of the mass on the basis of Vol:.( 1234567890 Machine learning algorithms benefit from learning from input data and automatically recognizing patterns and trends in that data automatically.There have been numerous studies [29][30][31] on the use of machine learning to differentiate between benign and malignant thyroid nodules.However, there have been only a few studies using machine learning models to predict BRAF V600E mutation in PTC patients.Additionally, there hasn't been a comparison of the effectiveness of various machine learning algorithms for the prediction of BRAF V600E mutation in PTC patients based on US elastography radiomic features.In the current study, we constructed six machine-learning models to distinguish PTC patients with BRAF V600E gene mutations from PTC patients without BRAF V600E gene mutations using preoperative US elastography radiomic data.There were three noteworthy discoveries.First, based on preoperative US elastography radiomic features, the six machine learning models could differentiate PTC patients with BRAF V600E from non-BRAF V600E PTC patients.Second, when the six machine learning models were compared, SVM_RBF had the best prediction performance.Third the machine learning algorithms' diagnostic performance was based on 27 radiomic features.www.nature.com/scientificreports/ In order to determine whether US-based radiomics could perhaps assess the occurrence of BRAF V600E mutations among patients with PTC, Yoon et al. 10 established a radiomic score using a dataset of 527 patients who had undergone surgical treatment for PTC and who had all undergone BRAF V600E mutation analysis on surgical specimen.They reported that radiomics features extracted from US have limited value as a non-invasive biomarker for predicting the presence of BRAF V600E mutation status of PTC, with a c-statistic value (equivalent to AUC) of 0.63 in the validation cohort and 0.72 in the training cohort.In comparison to this study, our study reported AUCs range of (0.96-1.00) and (0.87-0.98) in the training and validation cohorts, respectively, and ACCs range of (0.89-0.98) and (0.81-0.93) in the training and validation cohorts, respectively for the six machine learning algorithms employed in this study.
The higher AUCs and ACCs in the current study could be attributed to the US elastography radiomic features used, which provided more information than the grayscale US.Also in this study, machine learning algorithms were used in conjunction with radiomics, and several machine learning classifiers were used to build the models.Table 2. Patient characteristics of the PTC with BRAF V600E and PTC without BRAF V600E groups.Clinical characteristics, such as age and gender, did not differ significantly between the two groups (P > 0.05).There were no significant differences in mean nodule size between the two groups (BRAF V600E mutant group: 24.12 ± 8.6 mm; non-BRAF V600E mutant group: 23.98 ± 11.01 mm, P = 0.928) or CLNM (P = 0.102).There was a significant difference in echogenicity (P = 0.000), vertical and horizontal diameter ratios (P = 0.001), and elasticity (P = 0.015) between PTC patients with BRAF V600E and PTC patients without BRAF V600E .To the best of our knowledge, this is the first study to develop models based on different machine-learning algorithms to predict BRAF V600E mutations in PTC patients.
Comparing the current study results to our previous study 28 , the ACCs of SVM_L (0.88), LDA (0.88), and LR (0.88) were the same as the ACC of the logistic regression classifier (0.88) utilized in our previous study.However, the KNN (0.81) and NB (0.81) ACCs values were lower than the LR (0.88) method utilized in our previous study.
The AUCs of the KNN (0.87), LDA (0.91), LR (0.92), and NB (0.80) classifiers were all lower than the logistic regression classifier (0.93) utilized in our previous study.The SVM_L AUC (0.93) score, on the other hand,  was the same as the logistic regression algorithm (0.93) utilized in our previous work.When compared to the logistic regression classifier (ACC: 0.88 and AUC: 0.93) employed in our previous investigation, the SVM_RBF had higher ACC (0.93) and AUC (0.98) values.The disparity in performance could be attributed to the various feature preprocessing techniques and feature selection strategies used in the current investigation.On US examination, Kabaker et al. 32 discovered that vertical and horizontal diameter ratios greater than one, as well as low echo, were all associated with the BRAF V600E mutation.Similarly, Hahn et al. 33 discovered that  vertical and horizontal diameter ratios greater than one were linked to BRAF V600E gene mutations.Consistent with these studies, we discovered in this current study that there was a significant difference in echogenicity, vertical and horizontal diameter ratios, and elasticity between PTC patients with BRAF V600E and PTC patients without BRAF V600E .
The calibration curve of the prediction model is an essential metric for evaluating the probability accuracy of a disease risk model in predicting an individual outcome event in the future.A high degree of calibration shows that the prediction model is accurate, whereas a low degree of calibration indicates that the model may exaggerate or underestimate the risk of illness.The blue line reflects the performance of the machine learning algorithms, while the diagonal dotted line represents an ideal prediction (Fig. 8).A closer match to the diagonal dotted line suggests a better prediction.When the calibration curves were near the diagonal line, SVM_RBF, SVM_L, and NB algorithms demonstrated good agreement between the real status of BRAF V600E gene mutation and the predicted probability.
Furthermore, the DCA was used to evaluate the clinical utility of these models (Fig. 7).Assuming that all patients do not have BRAF V600E gene mutation, the solid black line (negative line) indicates that when no patient accepts intervention or treatment, the net benefit is zero.On the contrary, the solid grey line (positive line) indicates the net benefits when all patients have BRAF V600E and receive treatments or interventions.According to the incidence of BRAF V600E among patients with PTC, the reasonable range of thresholds was set from 0.3 to 0.99.In the entire range, all machine learning-based algorithms showed higher net benefits than the two extreme lines (negative line and positive line).In almost the entire threshold probability range, the SVM_RBF algorithm had the highest net benefit in both the training and validation cohorts (Fig. 7).There are some limitations of the study, in the construction of the models to predict BRAF V600E in PTC patients, the gene in healthy individuals was not analyzed, and the focus was on evaluating the BRAF V600E mutation in PTC patients, which could have resulted in a selection bias.Also, this was a small sample retrospective study conducted at two institutions; thus, a selection bias may exist.In the future, we aim to conduct a multicenter study with a larger sample size.

Conclusion
Finally, our study found that machine learning-based US elastography radiomic models performed well in predicting the potential of BRAF V600E in PTC patients, which can assist physicians in identifying the risk of BRAF V600E in PTC patients.SVM_RBF achieved the greatest prediction performance of the six machine learning models tested.
Statistical analysis.Python (version 3.7, https:// www.python.org/ Accessed 8 July 2021) and IBM SPSS Statistics (Monk Ar, New York, New York State, USA.) for Windows version 26.0 were used for statistical analyses.Pearson's chi-square and Fisher's exact tests were used to compare the differences in categorical characteristics.The independent sample t-test was used for continuous factors with normal distribution, whereas the Mann-Whitney U test was used for continuous factors without normal distribution.

Figure 2 .
Figure 2. (A) Ultrasound conventional B-mode image of papillary thyroid carcinoma.(B) corresponding ultrasound elastography image, with the circle, labeled A indicating a lesion region and the circle labeled B indicating a reference area.(C) Corresponding image after region of interest (ROIs) segmentation step.

Figure 3 .
Figure 3. Boxplot of Selected features after RFE.Features were reduced to twenty-seven features during training of the machine learning models.

Figure 4 .
Figure 4. Recursive feature elimination (RFE) with tenfold cross-validation; number of features selected vs. cross-validation score.

Figure 5 .
Figure 5.The mixed ROC curves of the six machine learning models in the training cohort.ROC: receiver operating characteristic; KNN: K-nearest neighbor; LDA: linear discriminant analysis; LR: logistic regression; NB: Naïve Bayes;SVM_L: support vector classifier with the linear kernel; SVM_RBF: support vector classifier with the radial basis function.

Figure 6 .
Figure 6.The mixed ROC curves of the six machine learning models in the validation cohort.ROC: receiver operating characteristic.KNN: K-nearest neighbor; LDA: linear discriminant analysis; LR: logistic regression; NB: Naïve Bayes; SVM_L: support vector classifier with the linear kernel; SVM_RBF: support vector classifier with the radial basis function.

Figure 7 .
Figure 7. Decision curve for predictive models based on machine learning models in the training cohort (A) and the validation cohort (B).KNN K-nearest neighbor; LDA Linear discriminant analysis;LR Logistic regression; NB Naïve Bayes;SVM_L Support vector classifier with the linear kernel; SVM_RBF Support vector classifier with the radial basis function.

Figure 8 .
Figure 8. Calibration curves of the machine learning (A) K-nearest neighbor (B) Linear discriminant analysis (C) LR Logistic regression (D) Naïve Bayes (E) Support vector classifier with the linear kernel; (F) SVM_RBF Support vector classifier with the radial basis function.
mutation diagnosis, and surgery at Jiangsu University Affiliated People's Hospital and traditional Chinese medicine hospital of Nanjing Lishui District between January 2014 and 2021.The enrolling process is displayed in Fig. 1. 138 PTCs of 138 patients (mean age, 41.63 ± 11.36 [range, 25-65] years) were analyzed in this study.The patients were divided into BRAF V600E mutation-free group (n = 75) and BRAF V600E mutation group (n = 63).Using a stratified sample technique at a 7:3 ratio, all patients were randomly assigned to either the training group (n = 96) or the validation group (n = 42).The following criteria were required for inclusion: postoperative pathology indicated PTC; preoperative thyroid US elastography evaluation; related US images and diagnostic outcomes; maximum nodule diameter > 5 mm, and < 5 cm; and unilateral and single focal lesion.The exclusion criteria included a maximum nodule diameter of > 5 cm and indistinct US imaging of nodules caused by artifacts.The clinical details of the enrolled patients were documented, including age, sex, nodule diameter, nodule location, nodular echo, nodule boundary, nodule internal and peripheral blood flow, nodule elastic grading, calcification, CLNM, and BRAF V600E mutation results.The Jiangsu University Affiliated People's Hospital and the traditional Chinese medicine hospital of Nanjing Lishui District Ethics Committee approved this study.Because it was retrospective in nature, it did not require written informed consent.
US images, image feature extraction not only extracts the hardness characteristics of the mass, but also extracts rich features of the internal structure of the mass.So we decided to focus on elastography radiomics features in this current study to explore the performance of multiple machine learning classifiers or algorithms as well as other feature selection methods.

Table 1 .
Comparison of clinical and ultrasonic characteristics of the PTC patients in the training and validation dataset.

Table 3 .
Predictive performance comparison of machine learning algorithms in the training and validation cohorts.AUC, area under the curve; ACC, accuracy; SEN, sensitivity; SPEC, specificity; NPV, negative predictive value; PPV, positive predictive value; USR, ultrasound radiomic; PTC, papillary thyroid carcinoma; SVM-L, support vector machine with linear kernel;SVM-RBF, support vector machine with radial basis function; KNN, K-nearest neighbour; NB, naïve bayes; LDA, linear discriminant analysis;LR, logistic regression.

Table 4 .
Radiomics features selected after PCC and RFE analysis.