Establishment and validation of the scoring system for preoperative prediction of central lymph node metastasis in papillary thyroid carcinoma

Early preoperative diagnosis of central lymph node metastasis (CNM) is crucial to improve survival rates among patients with papillary thyroid carcinoma (PTC). Here, we analyzed clinical data from 2862 PTC patients, developed a scoring system using multivariable logistic regression, and tested it in a validation group. The predictive diagnostic effectiveness of the scoring system was evaluated based on consistency, discrimination ability, and accuracy. The scoring system considered seven variables: gender, age, tumor size, microcalcification, resistance index >0.7, multiple nodular lesions, and extrathyroid extension. The area under the receiver operating characteristic curve (AUC) was 0.742, indicating good discrimination. Using 5 points as the diagnostic threshold, the validation group had an AUC of 0.758, indicating good discrimination and consistency of the scoring system. The sensitivity of this predictive model for preoperative diagnosis of CNM was about 4 times that of direct ultrasound diagnosis. These data indicate that the CNM prediction model would improve preoperative diagnostic sensitivity for CNM in patients with papillary thyroid carcinoma.

Establishment and evaluation of prediction model and its scoring system. A total of 11 variables were included. The single-factor analysis showed that CNM was associated with gender, age, tumor size, microcalcifications, internal blood supply, RI >0.7, multiple nodular lesions, and extrathyroid extensions (P < 0.05, Table 1). The CNM predictive model and scoring system for patients with PTC were established based on the logistic regression model. They comprised gender, age, tumor size, microcalcifications, RI >0.7, multiple nodular lesions, and extrathyroid extensions; the β value of multiple nodular lesions was the smallest (Table 2). The β values of the other variables were divided by this minimum regression coefficient to obtain the score for each variable. The prediction scores ranged from 0 to 9. The proportion of CNM in the subjects increased with increasing scores (Table 3). The Hosmer-Lemeshow goodness-of-fit test showed that the logistic regression model had good calibration (χ² = 9.065, P = 0.337). The area under the ROC curve (AUC) was 0.742 [95% confidence interval (CI), 0.720-0.764], indicating good discrimination of the logistic regression model (Fig. 1). The Youden index reached its maximum (0.359) when the score was equal to or greater than 5 points, and the subjects were divided into a low-risk CNM population group (1067 cases, 55.6%) and a high-risk CNM population group (856 cases, 44.4%) (Table 3). The high-risk group (≥5 points) included 64.3% (550/856) of CNM cases, and its CNM rate (64.6%, 550/851) was significantly higher than that of the low-risk population group (<5 points) (28.7%, 307/1067). At this threshold, the sensitivity, specificity, PPV, NPV, PLR, and NLR of the scoring system for CNM were 64.3%, 71.7%, 64.7%, 71.4%, 2.272, and 0.395, respectively. The detection rates of CNM in the high-risk and low-risk population groups were 64.6% (51.2-100%) and 28.7% (16.2-34.3%), respectively.
The prediction scoring system was then applied to the validation group, in which 43.9% of patients fell in the high-risk population group (≥5 points) and 56.1% in the low-risk population group (<5 points). The proportions of CNM cases in the two risk stratification groups were 62.3% and 23.8%, respectively, similar to the modeling population. The AUC was 0.758 (95% CI, 0.727-0.789), similar to the discrimination in the modeling population (AUC, 0.742). The Hosmer-Lemeshow goodness-of-fit test showed that the logistic regression model had good consistency (χ² = 2.449, P = 0.931). At the same threshold, the sensitivity, specificity, PPV, NPV, PLR, and NLR of the scoring system for CNM were 67.2%, 72.1%, 62.3%, 76.2%, 2.409, and 0.387, respectively. The detection rates of CNM in the high-risk and low-risk population groups were 62.3% (45.8-100%) and 23.8% (7.3-37.9%), respectively (Fig. 1).
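The threshold-based measures reported above follow from standard definitions. The following is a minimal Python sketch, assuming the rounded sensitivity and specificity quoted in the text as inputs; the function names are illustrative and are not part of the study's SPSS workflow. It shows the 5-point rule for risk grouping and recomputes the Youden index and positive likelihood ratio, which agree with the reported values to within rounding.

```python
def youden_index(sensitivity, specificity):
    """Youden index J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def positive_likelihood_ratio(sensitivity, specificity):
    """PLR = sensitivity / (1 - specificity)."""
    return sensitivity / (1.0 - specificity)


def risk_group(score, threshold=5):
    """Scores at or above the threshold are assigned to the high-risk group."""
    return "high" if score >= threshold else "low"


# Rounded sensitivity and specificity as quoted in the text.
reported = {
    "modeling": (0.643, 0.717),
    "validation": (0.672, 0.721),
}

for group, (sens, spec) in reported.items():
    j = round(youden_index(sens, spec), 3)
    plr = round(positive_likelihood_ratio(sens, spec), 3)
    print(f"{group}: Youden index = {j}, PLR = {plr}")
```

Because the inputs are already rounded to one decimal place, small differences from the published figures (for example, a Youden index of 0.360 rather than 0.359) are expected.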

Discussion
This study has established a prediction model and its scoring system for preoperative diagnosis of CNM in PTC. The model comprises gender, age, tumor size, microcalcifications, RI >0.7, multiple nodular lesions, and extrathyroid extensions. The detection rates of CNM in the high-risk population of the modeling and validation groups were 64.6% and 62.3%, respectively. The predictive model has high prediction consistency, discrimination ability, and accuracy. The same results were obtained using the scoring system, indicating that the model has important value for CNM risk stratification in patients with PTC and can improve the sensitivity of preoperative diagnosis of CNM.
Table 2. Multivariate logistic regression model and scoring system for CNM prediction. CNM, central lymph node metastases; RI, resistance index.
Our results indicate that the sonographic features of the primary foci of PTC are related to CNM. Since other imaging features of the primary foci also correlate with CNM, they might be used in early CNM diagnosis. Identifying high-risk patients for prophylactic central lymph node dissection, and screening the low-risk population, may improve the diagnostic sensitivity for central lymph node metastasis.
Ultrasonographic microcalcification occurs in about 34-66% of malignant nodules [21][22][23], and is mainly associated with the formation of psammoma bodies in papillary carcinoma. Microcalcifications are located in the papillary cores and fibrous stroma, and can also be located among solid tumor cell nests; they are extremely rare in other thyroid lesions. Similar psammoma body structures might appear in cervical lymph node metastases. Concentric
Several studies have indicated that increased nodular blood flow signals are associated with malignancy 24,25. However, using a multivariate logistic regression analysis, Moon et al. 26 did not identify increased nodular blood flow as an independent predictor of thyroid cancer. In this study, only the internal blood supply of tumors was used as an indicator, and a total of 1716 cases (~60%) were found to have a visible internal blood supply on power Doppler. In the modeling group, the internal blood supply of nodules was associated with CNM, and the difference was statistically significant (P < 0.001). However, it was not an independent predictor of CNM in the multivariable analysis, so it was not included in this scoring system. Malignant cells can release vascular endothelial growth factor, which stimulates angiogenesis within tumors. Adamczewski et al. 27 have suggested that disordered internal blood supply of thyroid nodules is associated with thyroid cancer. In our study, a total of 304 patients in the modeling group showed an abnormal vessel course or vessels penetrating from the capsule; this was unrelated to CNM in the single-factor analysis (P = 0.737) (Fig. 4). These data suggest that the internal tumor blood supply may be important only in PTC itself, and may not be associated with lymphatic metastasis.
Pulsed Doppler of normal thyroid arteries shows a unidirectional pulsatile spectrum, with a rapid systolic rise that gradually falls to low-amplitude flow in diastole. The RI is typically 0.55-0.66, and increased RI values are commonly seen in PTC and in a small number of nodular goiter cases. The main reason is compression of small blood vessels by hard tumor tissue or by swollen follicles and hypertrophic fibrous tissue. In this study, RI >0.7 was set as the threshold, and 622 cases had RI >0.7, accounting for 21.7% of the total cases. Among them, 424 cases were from the modeling group. The multivariable logistic regression analysis showed that RI was an independent risk factor for CNM (OR, 1.388; 95% CI, 1.092-1.763). However, other studies that evaluated nodular hardness on imaging 21,28 did not confirm its relationship with CNM, suggesting that other mechanisms might contribute to the elevated RI values.
Tumor size has been used as a criterion for assessing PTC treatment plans and the extent of surgery; larger tumor diameters are associated with cervical lymph node metastases and can increase the T stage 29. Several retrospective analyses with large sample sizes showed that tumors >1 cm had higher CNM rates [30][31][32][33][34]. An accurate measurement of the tumor size before surgery may help guide the diagnosis. In this study, the tumor size was divided into the following four groups: ≤1 cm, 1-2 cm, 2-3 cm, and >3 cm. Single-factor and multivariate analyses showed that the tumor size correlated with CNM (P < 0.001); each group was an independent influencing factor and was included in the predictive scoring system.
Extrathyroid extension is one of the unfavorable prognostic factors for PTC. Several studies have shown that microscopic extrathyroid extensions have a less adverse effect on the prognosis of PTC patients [35][36][37]. Moreover, microscopic invasion found in pathological slices cannot support timely modification of an individual surgical procedure. Therefore, we used extrathyroid extension observed intraoperatively with the naked eye as one of the predictor variables in the scoring system.
Because PTC often shows intraglandular metastases at an early stage, multifocality is one of its significant features. Previous studies have suggested that multifocality could be used as an independent predictor of CNM 38,39. However, in specimens from thyroid resections for benign diseases, 7.3-33.9% of the cases had small foci [40][41][42][43][44][45][46], suggesting that using multifocality may lead to selection bias due to relaxed inclusion criteria. Improvements in US equipment and diagnostic techniques have increased the detection rates of thyroid nodules. Liu et al. 47 have suggested that CNM was related to multiple regions occupied by tumors in the thyroid but unrelated to multifocality.
Presently, in major hospitals of China, all thyroid nodules of >0.1 cm can be successfully found 8. In this study, the presence of multinodular lesions was determined strictly on the basis of preoperative US, and all nodules (benign and malignant) were included in the analysis, to reduce the influence of human factors as much as possible. The multivariable analysis showed that multiple nodular lesions could be used as a predictor of CNM and incorporated into the scoring system.
Conventional US has a high specificity for CNM diagnosis, but because of its low sensitivity and high false-negative rate, it cannot be relied on for diagnosis. In this study, the AUC in the modeling group was 0.742 (95% CI, 0.720-0.764), with a diagnostic sensitivity and specificity of 64.3% and 71.7%, respectively. In the validation population, the AUC was 0.758 (95% CI, 0.727-0.789), with a diagnostic sensitivity and specificity of 67.2% and 72.1%, respectively. In this study, 265 of 2862 cases were diagnosed with CNM or suspicious metastasis by direct preoperative US diagnosis; the specificity was 96.5%, and the sensitivity was only 16.8%. The sensitivity of this predictive model for preoperative diagnosis of CNM was 3.8 times (64.3% vs. 16.8%, modeling group) and 4 times (67.2% vs. 16.8%, validation group) that of direct ultrasound diagnosis.
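The sensitivity comparison above is a simple ratio of the reported percentages. As a minimal sketch (the variable names are illustrative, and the inputs are the rounded values quoted in the text), it can be checked as follows:

```python
# Sensitivities in percent, as quoted in the text.
direct_us_sensitivity = 16.8
model_sensitivity = {"modeling": 64.3, "validation": 67.2}

# Ratio of the scoring system's sensitivity to that of direct US diagnosis.
sensitivity_ratio = {
    group: value / direct_us_sensitivity
    for group, value in model_sensitivity.items()
}

for group, ratio in sensitivity_ratio.items():
    print(f"{group}: {ratio:.1f} times the sensitivity of direct US")
```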
The CNM detection rates of the scoring system established from preoperative factors were 64.6% and 62.3% in the high-risk population of the modeling and validation groups, respectively. Although the AUC of the scoring system was 0.742 and 0.758 in the two groups, it could not provide a higher diagnostic specificity. However, in cancer, sensitivity in the diagnosis of metastasis is more important than specificity, and a timely diagnosis of patients with CNM is important for improving PTC prognosis.
The predictive indicators included in this study were clinical and US results, and they were easily accessible in the preoperative stage. The US diagnostic indicators were quantified, greatly reducing the interference due to the lack of experience of the US-analyzing physicians and other clinicians. The variables required for the scoring system are the patient's basic clinical characteristics and routine examination features (US); no invasive operations are needed. The increased sensitivity of preoperative diagnosis can help patients better understand their risk of CNM. Further research in this direction should improve the predictive value of the scoring system.
This was a single-center retrospective study; it had a large sample size, and the diagnosis and treatment processes were standardized and unified. However, some information still could not be accurately collected because of non-standardized US reports, and human factors could not be completely eliminated. Prospective and multicenter studies are needed to confirm the accuracy of this diagnostic method and improve the scoring system. We also hope that future research, including studies by other groups, will add preoperative genetic testing and thyroglobulin tests to the scoring system to further improve its sensitivity and specificity. Despite the aforementioned shortcomings, this scoring system, as a predictive model for the effective risk stratification of PTC patients using preoperative data, is simple and reliable in a clinical setting, and can be used to improve diagnostic sensitivity and to help clinicians in the diagnosis and treatment of PTC.

Methods
Clinical data from 2862 PTC patients in the Thyroid Center, the First Affiliated Hospital of Kunming Medical University, were retrospectively analyzed. After reviewing the tumor US reports and images twice, the relationship between clinical and US characteristics and CNM was analyzed to establish a predictive model and scoring system, which were then validated using the validation group. The predictive diagnostic effectiveness of the scoring system was evaluated using consistency, discrimination ability, and accuracy. This study was approved by the ethics committee of the First Affiliated Hospital of Kunming Medical University (2016 Ethical Review L No. 40), and no informed consent was required.
Research subjects. Patients with PTC admitted to the Thyroid Center of the First Affiliated Hospital of Kunming Medical University from January 2007 to June 2016 were included in the study. Their PTC was confirmed by postoperative pathology. The inclusion criteria were as follows: initial thyroidectomy, postoperative paraffin pathological diagnosis of PTC, and a surgical range including total central lymph node dissection (CND). The US reports were issued by more than two US physicians, and the saved images showed clear signs of the major lesions. The exclusion criteria were missing medical record, US, or thyroid function test information; history of head/neck surgery or irradiation; history of other malignancy; and presence of other types of thyroid cancer. Moreover, three ultrasonic characteristics were not included in the study: (1) tumor echo, because interference from other diseases, such as Hashimoto's disease, could not be excluded when US images of the tumor and the surrounding muscle tissue had not been saved together; (2) the composition of the tumor and the peripheral blood supply, because they were not standardized in some US reports; and (3) the length/width ratio, because it could not be confirmed in three dimensions when reviewing the US images. Surgeries were performed by specialists who perform more than 100 surgeries annually 48. The CND range followed the cervical lymph node division revised by the American Society of Head and Neck Surgery (2002) 49 and the American Multidisciplinary Consensus (2009) 10. In our department, the scope of operation for all PTC patients, whether suspected or diagnosed preoperatively and confirmed by intraoperative frozen section, includes prophylactic or therapeutic central lymph node (level VI) dissection. The upper, lower, and lateral boundaries of CND were the lower edge of the hyoid bone, the sternal fossa, and the medial border of the carotid artery sheath, respectively. The posterior boundary was the anterior fascia; the dissection included all lymph nodes and adipose tissue in the paratracheal, pretracheal, and prelaryngeal regions.
Data collection methods. Age, gender, and other basic patient information were collected from medical records. US characteristics were collected after reviewing the US reports twice along with the US images. For multifocal cancer, the characteristics of the dominant lesion were recorded. In multifocal carcinomas, the size of the largest cancer nodule was used, and for irregularly shaped carcinomas the longest diameter was used. Two or more nodules found by US were regarded as multiple nodular lesions, including all benign and malignant nodules except colloid nodules. Microcalcification was defined as calcification of less than 0.1 cm without acoustic shadow or comet tail sign. Nodules having coarse calcification and microcalcification simultaneously were also classified as microcalcification 21,24. According to the distribution of microcalcifications within the tumor, they were classified as scattered or aggregated. Irregular or lobulated growth at the tumor edge was defined as irregular shape 6. Internal blood supply was classified as rich or not rich based on the number of blood vessels in the tumor on Doppler ultrasonography. Internal blood vessels with a tortuous and disordered course and penetrating branches were defined as having an abnormal track. The blood flow resistance index (RI) of the tumor was recorded as the highest measured value of all internal blood vessels 50. The diagnosis of Hashimoto's disease was confirmed by US and by serum thyroid peroxidase antibodies. All US and thyroid function data were taken from the last test results before surgery (within 1 month before surgery). Extrathyroid extensions included tumors that had broken through the capsule or invaded the strap muscles, subcutaneous tissues, nerves, or other organs.
Statistical analysis. Statistical analysis was performed using the SPSS 22.0 statistical software package (IBM, NY, USA).
Two-thirds of the cases were randomly assigned to the modeling group and one-third to the validation group. Quantitative data with a normal distribution were expressed as mean ± standard deviation. Skewed data were described with quartiles, and qualitative data were expressed as rates and composition ratios. The χ² test was used for hypothesis testing in the modeling group, and variables with P < 0.1 and without significant correlation with each other were selected for multivariable analysis 51. A logistic regression model was then fitted using backward stepwise regression. Variables with P < 0.05 were retained and variables with P > 0.1 were excluded according to the partial likelihood ratio test. Finally, the regression equation and the regression coefficients of the independent variables were obtained. The regression coefficient of each selected variable was divided by the minimum regression coefficient and converted to an integer to give the score for that variable, thereby establishing the predictive scoring system for CNM in PTC. The consistency of the risk assessment was evaluated using the Hosmer-Lemeshow goodness-of-fit test, in which P > 0.05 was considered to indicate good model consistency 50. The receiver operating characteristic (ROC) curve was used to evaluate the discrimination ability of the prediction model and its scoring system. An appropriate cutoff value was selected to calculate the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), positive likelihood ratio (PLR), and negative likelihood ratio (NLR) of the model. A logistic regression analysis using the Enter method was performed in the validation population group to evaluate the diagnostic efficiency of the predictive model described above and verify its clinical diagnostic value.
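The conversion of regression coefficients into integer points can be sketched in a few lines of Python. This is only an illustration of the divide-by-minimum step described above: the coefficient values below are hypothetical placeholders (not the values in Table 2), each variable is treated as a single yes/no indicator (the study scored tumor size in four groups), and rounding to the nearest integer is an assumption about how the integer score was obtained. Totals from this sketch therefore do not correspond to the 0-9 range of the published system.

```python
# Hypothetical regression coefficients, for illustration only.
# These are NOT the beta values reported in Table 2.
hypothetical_betas = {
    "gender": 0.60,
    "age": 0.42,
    "tumor_size": 0.90,
    "microcalcification": 0.50,
    "ri_above_0_7": 0.33,
    "multiple_nodular_lesions": 0.30,  # smallest coefficient in this example
    "extrathyroid_extension": 0.62,
}


def derive_points(betas):
    """Divide each coefficient by the smallest one and round to an integer."""
    min_beta = min(betas.values())
    return {name: round(beta / min_beta) for name, beta in betas.items()}


def total_score(present_factors, points):
    """Sum the points of the risk factors present in one patient."""
    return sum(points[name] for name in present_factors)


points = derive_points(hypothetical_betas)
print(points)

# Example patient with three hypothetical risk factors present.
example_patient = ["tumor_size", "microcalcification", "multiple_nodular_lesions"]
print(total_score(example_patient, points))
```

By construction, the variable with the smallest coefficient receives 1 point, and every other variable is scored relative to it.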