Thyroid nodule sizes influence the diagnostic performance of TIRADS and ultrasound patterns of 2015 ATA guidelines: a multicenter retrospective study

This study aimed to evaluate the impact of thyroid nodule size on the diagnostic performance of the thyroid imaging reporting and data system (TIRADS) and the ultrasound patterns of the 2015 American Thyroid Association (ATA) guidelines. A total of 734 patients with 962 thyroid nodules were included in this retrospective study. All nodules were divided into three groups according to the maximal diameter (d < 10 mm, d = 10–20 mm and d > 20 mm). The ultrasound images were categorized based on TIRADS and ATA ultrasound patterns, respectively. A total of 931 (96.8%) and 906 (94.2%) patterns met the criteria for TIRADS and ATA ultrasound patterns, respectively. The AUC (0.849) and sensitivity (85.3%) of TIRADS were highest in the d = 10–20 mm group, whereas ATA had its highest AUC (0.839) and specificity (89.8%) in the d > 20 mm group. ATA ultrasound patterns had higher specificity (P = 0.04), while TIRADS had higher sensitivity (P = 0.02). In nodules d > 20 mm, the specificity of ATA patterns was higher than that of TIRADS (P = 0.003). Our results indicate that nodule size may influence the diagnostic performance of TIRADS and ATA ultrasound patterns. The ATA patterns may yield higher specificity than TIRADS, especially in nodules larger than 20 mm.

fitted probability and risk of malignancy also increased. A recent meta-analysis of TIRADS showed that the sensitivity and specificity were 0.79 and 0.71, respectively, indicating that the TIRADS categories are a promising tool for distinguishing benign from malignant thyroid nodules in preoperative decision-making 6 .
Recently, the 2015 American Thyroid Association (ATA) guidelines constructed a new ultrasound risk stratification model, ranging from very low suspicion to high suspicion for malignancy, according to sonographic features 7 . Yoon et al. compared the diagnostic efficiency of the new ATA ultrasound patterns and the Korean TIRADS in differentiating malignant from benign lesions, indicating that the ATA classification may yield higher specificity, while TIRADS may offer relatively higher sensitivity 8 . However, the influence of nodule size on the performance of these models has not been well investigated. The purpose of our study was to evaluate the diagnostic performance of the original TIRADS developed by Horvath and of the ATA ultrasound patterns in thyroid nodules, and to further clarify the impact of thyroid nodule size on the two models.

Results
Patient findings. A total of 962 thyroid nodules in 734 patients (578 women and 156 men) were included in our study. The average age was 46.8 ± 13.1 years and the mean diameter of the nodules was 17.7 ± 12.8 mm. All 375 malignant lesions and 328 benign nodules were confirmed by histopathology. The remaining 259 nodules were regarded as benign lesions on the basis of repeated benign cytology or follow-up ultrasound after the first benign cytology (Fig. 1). The epidemiological and clinical data of the three size groups are shown in Table 1. Malignancy rate, male gender, nodularity and FT3 level differed significantly among the three groups, whereas location, lymphadenopathy, age, FT4 level and TSH level showed no statistically significant difference between the groups (P > 0.05). The malignancy rates of nodules d > 20 mm, d = 10-20 mm and d < 10 mm were 22.2%, 45.7% and 48.5%, respectively.

Correlations between the TIRADS classification and final diagnosis.
A total of 931 patterns (96.8%) could be categorized based on the TIRADS classification. The malignancy rates of TIRADS 2, 3, 4A, 4B and 5 were 0%, 14.1% (62 of 439 nodules), 50.0% (118 of 236 nodules), 80.4% (156 of 194 nodules) and 100.0% (27 of 27 nodules), respectively, with significant differences between categories (P < 0.001). The correlations between the TIRADS classification and final diagnosis according to nodule size are shown in Table 2. The ROC curves demonstrated that the best cutoff of TIRADS was category 4 in all three groups. The sensitivity, specificity and AUC in the d < 10 mm group were 82.5%, 57.7% and 0.753, respectively. In the d = 10-20 mm group, the sensitivity, specificity and AUC increased to 85.3%, 72.6% and 0.849, respectively; this sensitivity and AUC were the highest among the three groups. In the d > 20 mm group, TIRADS had the lowest sensitivity (76.9%), the highest specificity (80.6%) and a relatively high AUC (0.836) (Table 3).
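The category-specific malignancy rates above follow directly from the reported counts. A minimal Python sketch of that arithmetic is given below. The variable and function names are illustrative, and the TIRADS 2 total of 35 is not stated in the text; it is inferred here as 931 minus the other four category totals.

```python
# (malignant, total) nodules per TIRADS category, taken from the text.
# Assumption: the TIRADS 2 total is inferred as 931 - (439 + 236 + 194 + 27) = 35.
tirads_counts = {
    "2": (0, 35),
    "3": (62, 439),
    "4A": (118, 236),
    "4B": (156, 194),
    "5": (27, 27),
}


def malignancy_rates(counts):
    """Return the malignancy rate (%) per category, rounded to one decimal."""
    return {
        category: round(100.0 * malignant / total, 1)
        for category, (malignant, total) in counts.items()
    }


rates = malignancy_rates(tirads_counts)
for category, rate in rates.items():
    print(f"TIRADS {category}: {rate}%")
```

The rates rise monotonically from TIRADS 2 to TIRADS 5, which is the behaviour expected of a risk stratification scale.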
The remaining 31 nodules could not be categorized; 12 of them (38.7%) were confirmed as papillary thyroid carcinomas (PTCs) by surgery. Of these 31 nodules, 11, 12 and 8 were in the d < 10 mm, 10-20 mm and > 20 mm groups, respectively, indicating that nodule size had no apparent influence on whether a nodule could be categorized. Moreover, among these nodules beyond the range of the TIRADS classification, those with hypoechogenicity and taller-than-wide shape had a 100% malignancy rate, and those with hypoechogenicity accompanied by irregular shape and ill-defined margin had a 75.0% malignancy rate.
Correlations between ultrasound patterns of 2015 ATA guidelines and final diagnosis. A total of 906 patterns (94.2%) could be categorized based on the ATA ultrasound patterns. According to histopathology or follow-up results, the malignancy rates of the nodules with very low, low, intermediate and high suspicion for malignancy were 5.3%, 10.0%, 21.8% and 71.8%, respectively, with significant differences between patterns (P < 0.001). The correlations between the ATA ultrasound patterns and final diagnosis according to size are shown in Table 4. The ROC curves demonstrated that the best cutoff of the ATA ultrasound patterns was high suspicion for malignancy in all three groups. The sensitivity, specificity and AUC in the d < 10 mm group were 80.5%, 63.7% and 0.721, respectively. In the d = 10-20 mm group, the specificity and AUC increased to 79.9% and 0.813 at the cost of a decreased sensitivity (75.8%). In the d > 20 mm group, the ATA ultrasound patterns had the highest specificity (89.8%) and AUC (0.839) and the lowest sensitivity (70.8%).
Of the remaining 56 nodules beyond the range of the ATA patterns, 16 (28.6%) were confirmed as PTCs by surgery. There were 17, 20 and 19 cases in the d < 10 mm, 10-20 mm and > 20 mm groups, respectively, indicating that nodule size had little relation to whether a nodule could be classified by the ATA ultrasound patterns. Furthermore, we compared the ultrasound features of the benign and malignant lesions among these nodules and found that hyper-/isoechogenicity accompanied by irregular shape carried a considerable tendency toward malignancy (42.9%).

Discussion
In this study, we evaluated the impact of thyroid nodule size on the diagnostic performance of the newly published ultrasound patterns of the 2015 ATA guidelines and the original TIRADS classification. We found that TIRADS performed best in nodules of 10-20 mm, while the ATA ultrasound patterns performed best in lesions larger than 20 mm. The ATA ultrasound patterns may yield higher specificity, especially in nodules larger than 20 mm.
The TIRADS established by Horvath has been widely applied in clinical settings for the evaluation of thyroid nodules. Based on 10 US patterns, TIRADS relates each pattern to a rate of malignancy 3 . The malignancy rates of TIRADS 3 and 4A in the present study were 14.1% and 50.0%, considerably higher than the recommended ranges (< 5% and 5-10%, respectively), but similar to Cheng's results 9 . Meanwhile, the diagnostic sensitivity and NPV in our study were 83.2% and 86.7%, much lower than Cheng's results, but comparable to those of Horvath's study 3,9 . This may be due to differences in radiologists' experience, study population, inter-observer variability, US criteria and devices. A recent meta-analysis of TIRADS found that the sensitivity and specificity were 0.79 and 0.71, which was comparable to our results 6 . However, 31 nodules (3.2%) in our study did not meet the criteria of the original TIRADS classification, including some patterns of partially cystic nodules, which account for 15-53.8% of all sonographically detected nodules 10 , and patterns of hypoechogenicity accompanied by taller-than-wide shape. The malignancy rate of these nodules reached 38.7%, within the recommended range of TIRADS 4B. Lesions with hypoechogenicity, irregular shape and ill-defined margin, or with hypoechogenicity and taller-than-wide shape, have a strong tendency to be malignant. Thus, closer follow-up or fine-needle aspiration should be applied to these nodules.
The 2015 ATA guidelines for patients with thyroid nodules established a 5-tier risk classification of ultrasound patterns by combining several individual sonographic characteristics 7 . The estimated malignancy risks for the benign, very low, low, intermediate and high suspicion patterns are < 1%, < 3%, 5-10%, 10-20% and > 70-90%, respectively. In our study, the malignancy risks were 71.8% for the high suspicion pattern, 21.8% for the intermediate suspicion pattern and 10.0% for the low suspicion pattern, which was broadly comparable with the ranges in the 2015 ATA guidelines. However, the remaining 56 nodules (5.8%) could not be categorized based on the ATA ultrasound patterns; most of them showed hyper-/isoechogenicity with at least one suspicious feature such as irregular shape, ill-defined margin, microcalcification or taller-than-wide shape. Although many studies have found that hyperechogenicity is a predictor of benign lesions 11,12 , Seo et al. reported that solid iso-/hyperechoic nodules with any calcification bore a malignancy risk of 24.7% 13 . Isoechoic, nonencapsulated nodules with multiple peripheral microcalcifications, which are beyond the range of the ATA patterns, could be classified as TIRADS 4B, with a malignancy risk of around 10-80%. Among the 56 nodules in our study, 16 (28.6%) were proved pathologically to be PTCs, indicating that a high malignancy risk can still exist in iso-/hyperechoic nodules when they are accompanied by high-risk ultrasound features such as irregular shape.
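Why some nodules fall outside the ATA patterns can be illustrated by encoding the pattern definitions listed in the Methods as rules. The following Python sketch is a deliberately simplified reading of those definitions, not the guideline's or the authors' implementation: the `Nodule` fields, the string labels and the choice to collapse all suspicious features into a single count are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Nodule:
    # Hypothetical, simplified feature set for illustration only.
    composition: str                    # "cystic", "spongiform", "partially_cystic" or "solid"
    echogenicity: Optional[str] = None  # solid portion: "hyper", "iso" or "hypo"
    eccentric_solid_area: bool = False
    suspicious_features: int = 0        # e.g. microcalcification, irregular margin,
                                        # taller-than-wide shape, extrathyroidal extension


def ata_pattern(n: Nodule) -> Optional[str]:
    """Return a simplified ATA pattern, or None if no pattern applies."""
    if n.composition == "cystic":
        return "benign"
    has_suspicious = n.suspicious_features > 0
    if n.echogenicity == "hypo" and n.composition in ("solid", "partially_cystic"):
        if has_suspicious:
            return "high"
        if n.composition == "solid":
            return "intermediate"
    if has_suspicious:
        # Suspicious features without a hypoechoic solid component
        # (e.g. iso-/hyperechoic nodules) match none of the patterns.
        return None
    if n.composition == "solid" and n.echogenicity in ("iso", "hyper"):
        return "low"
    if n.composition == "partially_cystic" and n.eccentric_solid_area:
        return "low"
    if n.composition in ("spongiform", "partially_cystic"):
        return "very_low"
    return None


unclassified = ata_pattern(Nodule("solid", echogenicity="iso", suspicious_features=1))
print(unclassified)  # None: isoechoic solid nodule with a suspicious feature
```

Under these rules an isoechoic or hyperechoic nodule with any suspicious feature returns `None`, which mirrors the kind of nodule that made up most of the 56 uncategorized cases.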
Recently, Yoon et al. 8 applied both the 2015 ATA ultrasound patterns and the Korean TIRADS established by Kwak to 1293 thyroid nodules (d ≥ 10 mm). They found that sensitivity was higher with TIRADS (P = 0.024), whereas specificity, PPV and accuracy were higher with the ATA guidelines (P < 0.001 for all). Similar to Yoon's study, our study found that the original TIRADS model had higher sensitivity (P = 0.02), while specificity was higher with the ATA ultrasound patterns (P = 0.04). In addition, Yoon et al. reported that 44 (3.4%) patterns did not meet the criteria for any ATA pattern, including hyper- to isoechoic solid or partially cystic nodules with microlobulated or irregular margins, microcalcifications or mixed calcifications, or nonparallel shape, and that the malignancy risk of these nodules was 18.2%.
The novel finding of our study was that nodule size markedly influenced the diagnostic performance of TIRADS and the 2015 ATA US patterns in differentiating benign from malignant lesions. The relative value of US in large versus small lesions remains controversial. Andrej et al. 14 performed multivariate logistic regression analysis to evaluate the accuracy of US criteria for thyroid cancer in lesions d ≤ 15 mm and d > 15 mm, finding that the accuracy of US differentiation among larger nodules was lower than that among smaller ones. However, in Moon's study 11 , the specificity and PPV of ultrasound in nodules larger than 10 mm were much higher than those in smaller nodules, with a slightly decreased sensitivity. In our study, TIRADS had the lowest AUC and specificity in nodules d < 10 mm among the three groups, in line with the conclusion of Cheng et al. 9 that the TIRADS model was less reliable in smaller lesions. Interestingly, the ATA ultrasound patterns were also less reliable in nodules d < 10 mm. The difference was that TIRADS performed best in nodules d = 10-20 mm, while the AUC and specificity of the ATA patterns were highest in lesions d > 20 mm. The diagnostic value of the two models was similar in the smaller size subgroups (d < 10 mm and 10-20 mm). In nodules larger than 20 mm, however, the sensitivity of TIRADS was higher than that of the ATA ultrasound patterns, though not significantly, while the specificity of the ATA patterns was significantly superior to that of TIRADS.
The limitations of our study should also be addressed. Firstly, all classifications were performed based on static US images, which might cause misinterpretation of the ultrasound classification. Secondly, the description of features was reported by different radiologists, which may cause inter-observer variability. Thirdly, this was a retrospective study; selection bias, such as the inclusion of patients who underwent thyroid surgery, and gender bias (female:male = 3.71) may have caused the high percentage of carcinomas (39.0%), resulting in the overestimation of PPV and underestimation of NPV for both TIRADS and the ATA ultrasound patterns 15 . However, this is a general limitation of most studies performed at endocrinology centers 16,17 . Fourthly, 259 of the 962 nodules (26.9%) were regarded as benign lesions based on cytology and follow-up US, which may cause false negative results.
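The effect of a high carcinoma percentage on PPV and NPV can be made concrete with Bayes' rule. The Python sketch below holds sensitivity and specificity fixed at the meta-analysis values quoted earlier (0.79 and 0.71) and compares the 39.0% prevalence of this cohort with a lower prevalence of 10%; the 10% figure and the function name are hypothetical and serve only to show the direction of the effect.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Prevalence of carcinoma in this cohort (39.0%).
ppv_cohort, npv_cohort = predictive_values(0.79, 0.71, 0.39)
# Hypothetical lower prevalence, chosen only for comparison.
ppv_low, npv_low = predictive_values(0.79, 0.71, 0.10)

print(f"prevalence 39%: PPV = {ppv_cohort:.3f}, NPV = {npv_cohort:.3f}")
print(f"prevalence 10%: PPV = {ppv_low:.3f}, NPV = {npv_low:.3f}")
```

With the same sensitivity and specificity, the higher prevalence yields a higher PPV and a lower NPV, which is the direction of bias described in the third limitation.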
In conclusion, both TIRADS and the 2015 ATA ultrasound patterns provide effective malignancy risk stratification for thyroid nodules. Nodule size may influence the diagnostic performance of the two models. TIRADS showed its best value in nodules of 10-20 mm, while the ATA patterns had their highest value in lesions larger than 20 mm. Both models are less reliable in lesions smaller than 10 mm. The ATA patterns may yield higher specificity than TIRADS, especially in nodules larger than 20 mm. Nodules beyond the range of the TIRADS categories and ATA patterns had little relation to nodule size and may still have a relatively high risk of malignancy (38.7% and 28.6%, respectively). However, due to the limitations of this study, our findings still need to be further validated in clinical practice.

Methods
Subjects. This retrospective study was based on patient data collected from eight tertiary hospitals around Jiangsu province in China from January 6, 2014 to December 20, 2014. Consecutive patients underwent US-guided FNAB or thyroidectomy for thyroid nodules. After review of US patterns and clinical data, patients who met the following criteria were included in this study: (a) patients who underwent thyroid surgery regardless of cytologic results, (b) patients who underwent fine-needle aspiration cytology at least two times within a 1-year interval for benign thyroid lesions, (c) patients who had benign results on cytology and showed no change or decreased size at follow-up US for at least a year (7). An increase in size was defined as more than a 50% change in volume or a 20% increase in at least two nodule dimensions with a minimal increase of 2 mm in solid nodules or in the solid portion of mixed cystic-solid nodules (8). A total of 734 patients with 962 nodules (mean age, 46.75 ± 14.09 years; range, 15-84 years) were included. There were 156 men (mean age, 50.41 ± 13.60 years; age range, 17-73 years) and 578 women (mean age, 45.76 ± 12.18 years; age range, 15-84 years). All nodules were divided into three groups according to the maximal diameter (d < 10 mm, d = 10-20 mm and d > 20 mm). Informed consent was obtained from all patients, and the study was performed in accordance with the ethical guidelines of the Helsinki Declaration and approved by the ethics review committee of the First Affiliated Hospital with Nanjing Medical University (2012-SR-058).
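The size grouping and the growth criterion described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' procedure: the function names are hypothetical, volume is approximated with the ellipsoid formula (the text does not say how volume was computed), and a "change in volume" is read as an increase.

```python
import math


def size_group(max_diameter_mm):
    """Assign a nodule to one of the three size groups by maximal diameter."""
    if max_diameter_mm < 10:
        return "d < 10 mm"
    if max_diameter_mm <= 20:
        return "d = 10-20 mm"
    return "d > 20 mm"


def ellipsoid_volume(dims_mm):
    """Approximate nodule volume (mm^3); the ellipsoid formula is an assumption."""
    a, b, c = dims_mm
    return math.pi / 6.0 * a * b * c


def has_grown(before_mm, after_mm):
    """Growth: > 50% volume increase, or a 20% increase (at least 2 mm)
    in at least two of the three dimensions."""
    volume_criterion = ellipsoid_volume(after_mm) > 1.5 * ellipsoid_volume(before_mm)
    enlarged_dims = sum(
        1
        for before, after in zip(before_mm, after_mm)
        if after >= 1.2 * before and after - before >= 2.0
    )
    return volume_criterion or enlarged_dims >= 2


print(size_group(17.7))                          # mean nodule diameter in this study
print(has_grown((10, 10, 10), (13, 13, 10)))     # volume criterion met
print(has_grown((20, 20, 20), (25, 25, 15)))     # dimension criterion met
print(has_grown((10, 10, 10), (10.5, 10.5, 10))) # neither criterion met
```

A nodule flagged by `has_grown` would not satisfy inclusion criterion (c), which requires no change or a decrease in size at follow-up.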

US examination technique.
All US images were obtained by using a 4-13 MHz linear array transducer.
The scanning protocol in all cases included both transverse and longitudinal real-time imaging of the thyroid nodules. Participants were asked to assess the thyroid nodules according to the criteria from published literature [18][19][20] . The features used in the analysis included size, composition, echogenicity of the solid portion, orientation, shape, margin and calcifications. All static US patterns and descriptions of features were available to, and analyzed by, a radiologist with 10 years of experience in thyroid imaging. Clinical information and pathology results were not available to the radiologist.
Ultrasound patterns of 2015 ATA guidelines. All nodules were scored based on the ultrasound patterns of the 2015 ATA guidelines as follows 7 . Benign: purely cystic nodules. Very low suspicion: spongiform or partially cystic nodules without any of the sonographic features described in the low, intermediate or high suspicion patterns. Low suspicion: isoechoic or hyperechoic solid nodule, or partially cystic nodule with eccentric solid areas, without microcalcification, irregular margin, extrathyroidal extension or taller-than-wide shape. Intermediate suspicion: hypoechoic solid nodule with smooth margins, without microcalcifications, extrathyroidal extension or taller-than-wide shape. High suspicion: solid hypoechoic nodule or solid hypoechoic component of a partially cystic nodule with one or more of the following features: irregular margins (infiltrative, microlobulated), microcalcifications, taller-than-wide shape, rim calcifications with a small extrusive soft tissue component, or evidence of extrathyroidal extension.
Statistical analysis. Statistical analysis was performed using SPSS 20.0 software (SPSS Inc., Chicago, USA). All quantitative values were expressed as means ± SD. Differences in the values of continuous variables between the three groups were evaluated by one-way ANOVA or a non-parametric test. Differences in the distribution of categorical variables between groups were evaluated by the 2-tailed Chi-square (χ²) test or Fisher exact test. With the final diagnosis (according to pathology or follow-up results) as the reference, the sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) were calculated for each method. Receiver operating characteristic (ROC) curve analysis with MedCalc 11.4.2.0 software (MedCalc Software, Ostend, Belgium) was used to compare the two models and to determine the optimal cut-off value between benign and malignant nodules. Areas under the curve (AUCs) and P values were calculated. P < 0.05 was considered significant in all tests.
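How sensitivity, specificity, PPV and NPV depend on the chosen cut-off of an ordinal scale can be sketched in Python using the pooled TIRADS counts reported in the Results. Several points are assumptions: the text does not state which criterion was used to select the optimal cut-off, so the Youden index (sensitivity + specificity - 1) is used here as one common choice; the TIRADS 2 total of 35 is inferred as 931 minus the other category totals; and the names are illustrative. Because the counts are pooled over all sizes and partly inferred, the output is illustrative and is not expected to reproduce the reported figures exactly.

```python
# Ordered TIRADS categories with (malignant, total) counts from the Results.
# Assumption: the TIRADS 2 total (35) is inferred, not reported.
CATEGORIES = ["2", "3", "4A", "4B", "5"]
COUNTS = {"2": (0, 35), "3": (62, 439), "4A": (118, 236), "4B": (156, 194), "5": (27, 27)}


def metrics_at_cutoff(cutoff):
    """Treat categories >= cutoff as test-positive and return the 2x2 metrics."""
    threshold = CATEGORIES.index(cutoff)
    tp = fp = fn = tn = 0
    for position, category in enumerate(CATEGORIES):
        malignant, total = COUNTS[category]
        benign = total - malignant
        if position >= threshold:
            tp += malignant
            fp += benign
        else:
            fn += malignant
            tn += benign
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
        "youden": sensitivity + specificity - 1.0,
    }


# Assumed selection rule: the cut-off with the largest Youden index.
best_cutoff = max(CATEGORIES, key=lambda c: metrics_at_cutoff(c)["youden"])
best_metrics = metrics_at_cutoff(best_cutoff)

print("best cutoff: TIRADS", best_cutoff)
for name, value in best_metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions the selected cut-off falls at the start of category 4, in agreement with the cut-off reported in the Results.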