Introduction

Thyroid nodules (TNs) are detected in about 20% to 76% of the adult population with the wide use of imaging modalities, and the incidence increases with age1, 2. Thyroid cancer is becoming increasingly prevalent in Eastern countries, where its incidence has risen by 200% to 300% within the past 30 years3. Owing to its excellent spatial and temporal resolution, ultrasound (US) has become the first-line imaging tool for the examination of TNs, especially asymptomatic and nonpalpable TNs4, 5. The main clinical challenge in the management of these patients is to rule out malignancy. With the development of US techniques, including elastography6, 7 and contrast-enhanced US8, 9, diagnostic accuracy for TNs is increasing; however, conventional US remains the basic imaging modality because it is widely available and requires no special function. For nodules with suspicious features on US, US-guided fine-needle aspiration cytology (FNAC), which is regarded as the most cost-effective modality for diagnosing thyroid malignancy, is always recommended to rule out malignancy. In recent years, many versions1, 2, 10,11,12,13,14,15,16,17 of Thyroid Imaging Reporting and Data Systems (TI-RADSs) have applied US features to categorize TNs or to recommend FNAC. By establishing a standardized language and coding system for radiologists and clinicians, TI-RADS not only stratifies the malignancy risk of TNs but also facilitates their clinical management and follow-up10,11,12,13.

Horvath et al.10 and Park et al.11 first established TI-RADSs in 2009 with the intention of categorizing the malignancy risk of TNs, following the concept of the Breast Imaging Reporting and Data System (BI-RADS)18. The latter has been widely used as a standard method to describe mammographic and US features of breast lesions and to correlate them with breast malignancy. In 2011, Kwak et al.12 developed a risk stratification method for thyroid malignancy according to the number of suspicious US features, including solid composition, hypoechogenicity, marked hypoechogenicity, microlobulated or irregular margins, microcalcifications, and taller-than-wide shape. In the same year, Russ et al.13 established their TI-RADS classification and proposed an equation for predicting the probability of malignancy in TNs with and without elastography19. Nonetheless, these studies10,11,12,13 share an inherent limitation: they used FNAC as the gold standard. FNAC diagnoses include a proportion of undetermined lesions (Bethesda categories III, IV and V) whose final results (benign or malignant) are questionable, since surgery is not performed on all of them20,21,22. Because of sampling errors, cytological examination cannot replace pathological diagnosis. Given this uncertainty, a validation study against a surgical reference standard is necessary to confirm the utility of these four TI-RADS classifications in clinical practice. Therefore, we performed this retrospective study in a surgical series of 1011 TNs, with the aim of comparing the efficiency of the four TI-RADS classifications in malignancy risk stratification of TNs, which would provide evidence for selecting an appropriate system under particular circumstances.

Materials and Methods

This retrospective study was approved by our institutional review board and the requirement for informed consent from the patients was waived. The study was performed in accordance with relevant regulations.

Patients

From September 2015 to December 2016, a consecutive series of 1140 patients with TNs underwent thyroid US examination and surgery in this referral hospital. The exclusion criteria were as follows: (a) patients with incomplete US information (103 nodules); (b) nodules with undetermined pathological results (26 nodules). In patients with multiple nodules, we selected the nodule most suspicious for malignancy at US. When no nodule was suspicious for malignancy, the largest one was evaluated. Finally, the study group consisted of 1011 pathologically proven nodules in 1011 patients (768 women and 243 men; mean age, 51.0 years ± 13.7; age range, 13–84 years). The diameter of the nodules ranged from 4.0 to 92.0 mm (mean, 18.4 mm ± 13.3).

Conventional US

Conventional US was performed with Siemens S2000 (Siemens Medical Solutions, Mountain View, CA, USA; 5–14 MHz linear transducer), IU22 (Philips Medical Systems, Bothell, WA, USA; 5–12 MHz linear transducer) or Logiq E9 (GE Medical Systems, Milwaukee, WI, USA; 6–15 MHz linear transducer) instruments by three board-certified radiologists, each with more than 3 years of experience in thyroid US. All US examinations complied with the same protocol for thyroid scanning. The patient lay in the supine position with the neck on a high pad. Conventional US images of the thyroid nodule were acquired by carefully scanning the thyroid and adjacent tissues both transversely and longitudinally. The US machine settings, such as gain, focus, depth, time gain compensation, dynamic range, wall filter and color gain, were adjusted until good-quality US images were obtained. Conventional transverse, longitudinal and color Doppler US images were stored for each target nodule and recorded on the internal hard disk for off-line analysis. The nodule’s size was defined as the maximal diameter at US. Images of patients with lymphadenopathy were also stored.

Image Interpretation

The US images were reviewed, and the TI-RADS categories assigned, independently by one of two radiologists (with 6 and 13 years of experience in thyroid US, respectively) who were not involved in image acquisition. The two reviewers were blinded to patients’ medical information, including previous imaging results and histopathological results. They were first asked to read the four TI-RADSs carefully until they understood them, and then assessed the US characteristics defined by the authors. The two radiologists then agreed on a baseline consensus lexicon for TI-RADS and US characteristics including location, composition, echogenicity, echostructure, margin, calcifications, shape, vascularization, halo sign, capsule and cervical lymph node (Fig. 1). Location was categorized as right, left or isthmus. Composition was classified as solid (completely solid), predominantly solid (cystic portion ≤50%), predominantly cystic (cystic portion >50%)11, 12 or spongiform (aggregation of multiple microcystic components in more than 50% of the nodule), according to the ratio of the cystic portion to the solid portion in the nodule10, 13. Echogenicity was classified as hyper-, iso- or hypoechogenicity (compared with the normal thyroid gland) or marked hypoechogenicity (lower echogenicity than the adjacent strap muscle)11,12,13. Echostructure was categorized as homogeneous or heterogeneous according to whether the nodule echo was even. Heterogeneous echotexture was defined as mixed echogenicity due to the aggregation of multiple microcystic components intervening in the solid component11. Margin was classified as well circumscribed, microlobulated (presence of many small lobules on the surface of the nodule) or irregular, or infiltrative (poorly defined margin with the adjacent glandular structure)11.
Calcifications were categorized as microcalcifications (≤1 mm in diameter, visualized with or without acoustic shadows), macrocalcifications (>1 mm in diameter, or rim calcification)12, mixed calcification (presence of microcalcifications and macrocalcifications at the same time)23, hyperechoic spot (tiny bright reflectors with a clear-cut comet-tail artifact at conventional US)10, 12, 13, and no calcification. Kwak et al.12 regarded a nodule with both types of calcification as having microcalcifications, whereas Park et al.11 defined microcalcifications as calcifications equal to or less than 0.5 mm in diameter. Shape was categorized as taller than wide (greater in its anteroposterior dimension than in its transverse dimension) or wider than tall10,11,12,13. Vascularization was classified as avascular, hypovascularized (poor blood flow signal), hypervascularized (highly vascularized on color Doppler) or penetrating vessels (vessels not visualized in the interior of the lesion, only afferent vessels that penetrate it)10. Halo sign, defined as a hypoechoic rim around a nodule, was classified as absent, partial or complete fine halo11. Capsule was defined as circinate hyperechogenicity around a nodule10. Cervical lymph nodes were classified as normal or as lymphadenopathy, the latter including lymph nodes with a minimal diameter > 6.0 mm or nodes with an absent hyperechoic hilum10, 11.

Figure 1

(a) Nodular goiter. Predominantly cystic nodule. TI-RADS H: 3; TI-RADS P: 1; TI-RADS K: 2; TI-RADS R: 3. (b) Follicular adenoma. Solid and isoechoic nodule. TI-RADS H: 4a; TI-RADS P: 2; TI-RADS K: 4a; TI-RADS R: 3. (c) Papillary thyroid carcinoma. Solid and iso-hypoechoic nodule with microcalcification and hypoechoic halo. TI-RADS H: 4c; TI-RADS P: 4; TI-RADS K: 4b; TI-RADS R: 4b. (d) Papillary thyroid carcinoma. Solid and hypoechoic nodule with taller-than-wide shape, microlobulated margin, and microcalcification. TI-RADS H: 4c; TI-RADS P: 4; TI-RADS K: 5; TI-RADS R: 5. (e) Papillary thyroid carcinoma. Solid and markedly hypoechoic nodule with microlobulated margin. TI-RADS H: 4b; TI-RADS P: 4; TI-RADS K: 4c; TI-RADS R: 4b. (f) Papillary thyroid carcinoma. Solid and hypoechoic nodule with dispersed microcalcifications. TI-RADS H: 4c; TI-RADS P: 4; TI-RADS K: 4c; TI-RADS R: 4b. (g) Papillary thyroid carcinoma. Solid and hypoechoic nodule with microlobulated margin and mixed calcification. TI-RADS H: 4c; TI-RADS P: 5; TI-RADS K: 4c; TI-RADS R: 5. (h,i) Follicular thyroid carcinoma. Predominantly solid nodule with hypoechoic halo and hypervascularity. TI-RADS H: 4c; TI-RADS P: 2; TI-RADS K: 3; TI-RADS R: 4a.

The TI-RADS categories were previously reported by Horvath et al.10, Park et al.11, Kwak et al.12 and Russ et al.13, and are referred to here as TI-RADS H, TI-RADS P, TI-RADS K and TI-RADS R, respectively. The classifications of the four TI-RADSs are summarized in Table 1.

Table 1 Four TI-RADS categories.

Statistical analysis

Statistical analyses were performed with SPSS software for Windows (version 20.0; Chicago, IL, USA) and MedCalc software (version 15.2, Mariakerke, Belgium). The independent two-sample t test was used to compare continuous data, including patient age and nodule size. The chi-square test was used to compare categorical data, including US features and patient sex. With adjustment for all variables, multivariate logistic regression analysis was performed to determine independent predictors of malignancy among the US characteristics that showed statistical significance. Odds ratios (ORs) with their 95% confidence intervals (CIs) were calculated to determine the relevance of all potential predictors of malignancy. The cut-off value for each TI-RADS, together with the corresponding sensitivity and specificity, was obtained from receiver operating characteristic (ROC) analysis at the maximum Youden index. Positive predictive value (PPV), negative predictive value (NPV) and accuracy were calculated from the diagnostic test 2 × 2 contingency tables. ROC curve analysis was performed to assess diagnostic performance. Sensitivity and specificity were compared with the McNemar test. The Z test was applied to compare the areas under the ROC curves (Azs). Statistical significance was set at a P value of less than 0.05.
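As an illustration of how the diagnostic indices described above relate to a 2 × 2 contingency table, the following minimal Python sketch computes sensitivity, specificity, PPV, NPV, accuracy and the Youden index, and selects the cut-off with the largest Youden index. The class and function names and all counts are hypothetical and are not the study's data; the analyses in this study were run in SPSS and MedCalc, not with this code.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # malignant nodules at or above the cut-off category
    fp: int  # benign nodules at or above the cut-off category
    fn: int  # malignant nodules below the cut-off category
    tn: int  # benign nodules below the cut-off category


def diagnostic_metrics(c: Confusion) -> dict:
    """Standard indices derived from a 2 x 2 diagnostic table."""
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "accuracy": (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn),
        "youden": sensitivity + specificity - 1.0,
    }


def best_cutoff(tables: dict) -> str:
    """Return the cut-off label whose 2 x 2 table maximizes the Youden index."""
    return max(tables, key=lambda k: diagnostic_metrics(tables[k])["youden"])


# Hypothetical counts for three candidate cut-offs (illustration only).
example = {
    "4a": Confusion(tp=95, fp=60, fn=5, tn=40),
    "4b": Confusion(tp=90, fp=30, fn=10, tn=70),
    "4c": Confusion(tp=60, fp=10, fn=40, tn=90),
}

if __name__ == "__main__":
    chosen = best_cutoff(example)
    print(chosen, diagnostic_metrics(example[chosen]))
```

A lower cut-off raises sensitivity at the cost of specificity; the Youden index weights the two equally, which is one common but not the only criterion for choosing a threshold.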

Inter- and intra-observer agreement were assessed with kappa values, interpreted according to the guideline of Landis and Koch: slight agreement (0.00–0.20), fair agreement (0.21–0.40), moderate agreement (0.41–0.60), substantial agreement (0.61–0.80), and almost perfect agreement (0.81–1.00)24.
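The Landis and Koch bands can be expressed as a small lookup, sketched below in Python. The function name is hypothetical; the "poor" label for negative kappa values comes from the original Landis and Koch scale rather than from the list above, and values falling between two printed bands (for example 0.205) are assigned to the lower band as a simplifying assumption.

```python
def landis_koch(kappa: float) -> str:
    """Map a kappa value to its Landis and Koch agreement label."""
    if kappa < 0.0:
        return "poor"  # below the bands listed in the text
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


if __name__ == "__main__":
    for value in (0.15, 0.55, 0.70, 0.85):
        print(value, landis_koch(value))
```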

Results

Of the 1011 TNs included in this study, 547 (54.1%) were diagnosed as benign and the remaining 464 (45.9%) as malignant. Patients with malignant nodules were significantly younger than those with benign nodules (mean age, 46.5 years ± 14.1 [age range, 13–84 years] vs 54.3 years ± 12.3 [age range, 18–83 years], respectively; P < 0.001). Malignant TNs were significantly smaller than benign TNs (mean size, 11.7 mm ± 8.2 vs 24.0 mm ± 14.2, respectively; P < 0.001). Patient sex showed no significant difference between benign and malignant nodules; the female-to-male ratios were 3.18 (416/131) and 3.14 (352/112), respectively (P = 0.501). Location of the TNs differed significantly between benign and malignant nodules, and location in the isthmus was associated with malignancy (P = 0.035) (Table 2). All 1011 TNs in 1011 patients were diagnosed by histopathological examination after surgery. The malignant nodules comprised conventional papillary thyroid carcinoma in 455 nodules, follicular thyroid carcinoma in seven nodules, medullary carcinoma in one nodule, and Hürthle cell carcinoma in one nodule. The benign nodules comprised nodular goiter in 413 nodules, Hashimoto’s nodule in 51 nodules, adenomatous goiter in 43 nodules, follicular adenoma in 35 nodules, and eosinophilic cell adenoma in five nodules.

Table 2 Basic demographic characteristics and conventional US features in predicting thyroid malignancy.

At univariate analysis, the following US features showed a significant association with malignancy: solid composition, hypoechogenicity, marked hypoechogenicity, homogeneous echotexture, microlobulated or irregular margin, microcalcification, mixed calcification and taller-than-wide shape (all P < 0.05, Table 2). At multivariate analysis, among the suspicious US features, marked hypoechogenicity was the most significant predictor (OR: 15.344, 95% CI: 5.313–44.313), followed by mixed calcification (OR: 13.753, 95% CI: 4.916–38.473), solid composition (OR: 11.085, 95% CI: 1.393–88.218), hypoechogenicity (OR: 6.736, 95% CI: 3.416–13.282), microlobulated or irregular margin (OR: 4.951, 95% CI: 3.216–7.621), microcalcification (OR: 4.761, 95% CI: 2.772–8.178) and taller-than-wide shape (OR: 2.630, 95% CI: 1.489–4.647) (P < 0.05 for all, Table 3).
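The width of each interval reflects the precision of the estimate on the logarithmic scale. The following Python sketch, which assumes that the reported intervals are Wald intervals from the logistic regression (the text does not state the interval type), recovers the approximate standard error of log(OR) from the bounds and checks that each OR sits at the geometric mean of its bounds. The function name is hypothetical; the numbers are those reported above.

```python
import math


def log_scale_summary(lower: float, upper: float, z: float = 1.96):
    """Return (geometric midpoint, SE of log OR) implied by a Wald 95% CI."""
    midpoint = math.sqrt(lower * upper)  # exp of the mean of the log bounds
    se_log_or = (math.log(upper) - math.log(lower)) / (2.0 * z)
    return midpoint, se_log_or


# (OR, lower bound, upper bound) as reported in the multivariate analysis.
reported = {
    "marked hypoechogenicity": (15.344, 5.313, 44.313),
    "mixed calcification": (13.753, 4.916, 38.473),
    "solid composition": (11.085, 1.393, 88.218),
    "hypoechogenicity": (6.736, 3.416, 13.282),
    "microlobulated or irregular margin": (4.951, 3.216, 7.621),
    "microcalcification": (4.761, 2.772, 8.178),
    "taller-than-wide shape": (2.630, 1.489, 4.647),
}

if __name__ == "__main__":
    for feature, (odds_ratio, lower, upper) in reported.items():
        midpoint, se = log_scale_summary(lower, upper)
        print(f"{feature}: OR={odds_ratio}, midpoint={midpoint:.3f}, SE={se:.3f}")
```

Under this assumption, the very wide interval for solid composition corresponds to a much larger standard error than for the other features, so its point estimate should be read with more caution than its rank in the list suggests.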

Table 3 Association between thyroid malignancy and various US features.

The malignancy rates of the four TI-RADSs differed significantly among categories (P < 0.001 for all). The malignancy rates of all TI-RADS categories were within the recommended ranges, except for TI-RADS P 2, TI-RADS K 3, TI-RADS R 3 and TI-RADS R 4a (Table 4). The correlation coefficients between category and malignancy rate for the four TI-RADSs were 0.712, 0.731, 0.775 and 0.733, respectively.

Table 4 Comparison of malignancy rates with four TI-RADSs.

The categories were dichotomized into positive and negative findings for FNAC using the cut-off values, and the diagnostic performance of the four TI-RADSs is listed in Table 5. Sensitivity and negative predictive value were higher for TI-RADS H, TI-RADS K and TI-RADS R than for TI-RADS P (P < 0.05 for all), whereas there were no significant differences among these three systems (P > 0.05 for all). The specificity, accuracy and Az of TI-RADS P were the highest among the four systems (P < 0.05 for all). Specificity, accuracy and Az were higher for TI-RADS K than for TI-RADS R (P = 0.003). The specificity, accuracy and Az of TI-RADS H and TI-RADS R were lower, with no significant difference between them (P = 0.101) (Tables 5, 6, Fig. 2).

Table 5 Diagnostic performances of four TI-RADSs.
Table 6 Pairwise comparisons of four TI-RADSs.
Figure 2

ROC curves of the four TI-RADSs. Sensitivity was higher for TI-RADS H, TI-RADS K and TI-RADS R than for TI-RADS P. Specificity was highest for TI-RADS P.

Another 30 thyroid nodules were used to assess inter-observer agreement; the weighted kappa values of the four TI-RADSs were 0.663 (95% CI: 0.446–0.830), 0.693 (95% CI: 0.496–0.861), 0.748 (95% CI: 0.565–0.914) and 0.705 (95% CI: 0.492–0.873), respectively. Intra-observer agreement was assessed for one of the two reviewers; the weighted kappa values of the four TI-RADSs were 0.781 (95% CI: 0.581–0.951), 0.829 (95% CI: 0.654–0.957), 0.874 (95% CI: 0.727–1.000) and 0.831 (95% CI: 0.651–0.958), respectively.
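For readers unfamiliar with the weighted kappa statistic, the sketch below shows one way to compute it for two sets of ordinal category ratings. It assumes linear disagreement weights; the weighting scheme used in this study is not stated, so this is an illustration rather than a reproduction of the analysis. The function name and the ratings are hypothetical.

```python
def weighted_kappa(rater_a, rater_b, categories):
    """Linearly weighted Cohen's kappa for two lists of ordinal ratings."""
    index = {c: i for i, c in enumerate(categories)}
    k = len(categories)
    n = len(rater_a)

    # Observed joint proportions of (rater A category, rater B category).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    disagree_observed = 0.0
    disagree_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 at the extremes
            disagree_observed += weight * observed[i][j]
            disagree_expected += weight * row[i] * col[j]

    return 1.0 - disagree_observed / disagree_expected


if __name__ == "__main__":
    # Hypothetical ratings of four nodules by two readers (illustration only).
    cats = ["3", "4a", "4b"]
    reader_1 = ["3", "3", "4a", "4b"]
    reader_2 = ["3", "4a", "4a", "4b"]
    print(round(weighted_kappa(reader_1, reader_2, cats), 3))
```

With weighting, a disagreement between adjacent categories (for example 4a versus 4b) is penalized less than one between distant categories, which suits ordinal scales such as TI-RADS.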

Discussion

TI-RADS H10 was developed in a prospective study using 10 variables and defines categories 1, 2, 3, 4a, 4b, 5 and 6. Recently, the authors prospectively evaluated the diagnostic accuracy of their TI-RADS and subdivided category 4 into 4a, 4b and 4c5. They integrated other factors including imaging findings, a nodule’s changes over time, previous FNAC results, different diffuse pathologies (e.g. Graves’ disease, Hashimoto’s thyroiditis, De Quervain thyroiditis) and varying clinical situations. These might be useful in the management of different classifications of thyroid nodules. Calcification (macrocalcification or microcalcification) and hypervascularity were significantly associated with malignancy in their study. In the present study, however, macrocalcification and hypervascularity were not identified as risk factors. The malignancy rates of all categories were within the recommended ranges.

Park et al. proposed their TI-RADS11 in a retrospective study with 12 aspects of TNs, adding size and lymph node abnormality and resulting in 5 categories, T-US 1–5, with increasing risk of malignancy. In the current study, size also differed significantly between benign and malignant nodules. Lymph node abnormality was a risk factor at univariate analysis but not at multivariate analysis. This result was probably attributable to the interference of other variables including microcalcification, microlobulated or irregular margin, and marked hypoechogenicity, which were all malignancy risk factors. The malignancy risk among category 2 nodules was 6.3%, which was lower than the recommendation (8.0–23.0%). None of the US features mentioned in category 2 was a risk factor in the present study, which was possibly the cause.

Kwak et al.12 created a predictive model based on US characteristics in a retrospective study that included 1658 nodules, in which the risk of malignancy increased with the number of suspicious US features, including solid composition, marked hypoechogenicity, hypoechogenicity, microcalcification, microlobulated or irregular margin, and taller-than-wide shape. Our study agreed with theirs that solid composition was a predictor of carcinoma. When reviewing images, we regarded a nodule as positive if it had at least one suspicious US feature. This approach is practical and convenient for the management of TNs in clinical practice. The malignancy rates of all categories were within the recommended ranges.
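The counting rule behind TI-RADS K can be sketched in a few lines of Python. The mapping from the number of suspicious features to a category (one feature: 4a; two: 4b; three or four: 4c; five: 5) is taken here as an assumption from the published Kwak system, since Table 1 is not reproduced in this text; it is consistent with the categories given for panels b, d and e of Fig. 1. Hypoechogenicity and marked hypoechogenicity are counted as a single characteristic, in line with the description of TI-RADS K as being based on five US characteristics. All identifiers are hypothetical, and nodules without suspicious features (categories 3 and below) are outside the scope of the sketch.

```python
# The five US characteristics counted by the sketch (names are hypothetical).
SUSPICIOUS_FEATURES = (
    "solid_composition",
    "hypoechogenicity_or_marked_hypoechogenicity",
    "microlobulated_or_irregular_margin",
    "microcalcification",
    "taller_than_wide_shape",
)


def count_suspicious(nodule: dict) -> int:
    """Count how many of the five suspicious characteristics are present."""
    return sum(1 for feature in SUSPICIOUS_FEATURES if nodule.get(feature, False))


def tirads_k_category(count: int):
    """Assumed mapping from feature count to category; None if no feature."""
    if count == 0:
        return None  # categories 3 and below are not modelled here
    if count == 1:
        return "4a"
    if count == 2:
        return "4b"
    if count <= 4:
        return "4c"
    return "5"


def is_positive(nodule: dict) -> bool:
    """A nodule is regarded as positive with at least one suspicious feature."""
    return count_suspicious(nodule) >= 1


if __name__ == "__main__":
    # Fig. 1e: solid, markedly hypoechoic nodule with microlobulated margin.
    panel_e = {
        "solid_composition": True,
        "hypoechogenicity_or_marked_hypoechogenicity": True,
        "microlobulated_or_irregular_margin": True,
    }
    print(tirads_k_category(count_suspicious(panel_e)), is_positive(panel_e))
```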

Russ et al. published their TI-RADS system13 based on 24 US characteristics. Their study was a retrospective analysis of 500 FNAC nodules from one observer at a single institution. In 2013, they prospectively evaluated the diagnostic accuracy of their categories on 4550 nodules with and without elastography19. Other authors have adopted it and developed their own classification systems25, 26. The malignancy risk among category 3 nodules was 2.6% (3/182), which was above the recommended malignancy rate (<2.0%). The surgical nature of the series might be responsible for this result. The malignancy risk among category 4a nodules in our study was 16.4% (42/256), which was above the recommended malignancy rate (2.0–10.0%). This may be explained by the fact that hypoechogenicity, a US feature of category 4a, was a malignancy risk factor at both univariate and multivariate analysis. The fact that the nodules in our study came from a surgical series might be another reason.

The present study suggests that solid composition, hypoechogenicity, marked hypoechogenicity, homogeneous echotexture, microlobulated or irregular margin, microcalcification, mixed calcification and taller-than-wide shape are independent US features for predicting thyroid malignancy, consistent with other published literature12, 14, 16, 27,28,29. The current study showed higher sensitivity and accuracy than previous studies10,11,12,13. A likely reason is that our findings are specific to a surgical cohort with histopathology results, whereas the previous studies focused on TNs evaluated by FNAC. TI-RADS P had higher diagnostic performance than the other three systems and the highest specificity, which is especially important in the management of TNs. Higher specificity can lower the rate of false-positive findings and eventually avoid overtreatment and reduce the number of unnecessary FNACs25. However, TI-RADS P had relatively lower sensitivity. For a tool used to select high-risk nodules for FNAC, high sensitivity is very important in clinical practice. The malignant nodules that were assigned to benign categories by TI-RADS P had US features including hypoechogenicity with halo sign, macrocalcification or predominant hyperechogenicity. Among these features, the presence or absence of a halo sign showed no significant difference at multivariate analysis, whereas hypoechogenicity is an important US feature for predicting thyroid malignancy. These may be the reasons for its lower sensitivity. Although TI-RADS P stratified nodules into categories, it was not easy to fit every thyroid nodule into the proposed equation when reviewing the US images (e.g. a predominantly solid nodule with halo sign). TI-RADS H, TI-RADS K and TI-RADS R achieved higher sensitivity in identifying nodules with high malignancy risk.
TI-RADS K and TI-RADS R recommend FNAC for thyroid nodules with one or more suspicious US features, which may have contributed to their higher sensitivity. Although Horvath et al. integrated many factors, this stereotypic US application was difficult for radiologists to use, and it was therefore not easy to apply in clinical practice12. The specificity of TI-RADS R was lower than that of TI-RADS K (P = 0.003). The specificity, accuracy and Az of TI-RADS H and TI-RADS R were lower, with no significant difference between them. Macrocalcification and iso-echogenicity are included in the malignant classifications of TI-RADS H and TI-RADS R, respectively, which may account for their lower specificity. Compared with the other three scoring systems, TI-RADS K is a simple and convenient predictive model based on five US characteristics, whereas the other three approaches involve 10, 12 and 24 aspects of TNs, respectively10,11,12,13. With TI-RADS K, a nodule is positive as long as it has at least one suspicious US feature. The malignancy rates of all TI-RADS categories were within the recommended ranges, except for TI-RADS P 2, TI-RADS K 3, TI-RADS R 3 and TI-RADS R 4a. The results indicate that the TI-RADSs are applicable both to the general population with thyroid nodules and to surgical series. The malignancy risks of TI-RADS K 3, TI-RADS R 3 and TI-RADS R 4a were higher in the surgical series than in the general population, whereas the malignancy risk of TI-RADS P 2 was lower. Inter-observer agreement was substantial for all four TI-RADSs. Intra-observer agreement was almost perfect for TI-RADS P, TI-RADS K and TI-RADS R, and substantial for TI-RADS H.

To our knowledge, this is the first study to compare different TI-RADSs by correlating US findings with the final histopathology of the surgical specimen. Consequently, the results on the diagnostic capacity of the classifications are not biased by the inherent inaccuracy of FNAC cytology results. In general populations, FNAC diagnoses include a proportion of undetermined lesions whose final results (benign or malignant) are unknown, since surgery is not performed on all of them. Furthermore, we collected information on the nonsuspicious nodules present in the surgical series and correlated pathology findings with nodules classified as benign patterns, whose non-malignant aetiology could not otherwise have been confirmed.

As new TI-RADS classifications are created, the TI-RADS system is continuously improved and modified according to new evidence, which in the future might include contrast-enhanced ultrasound30, 31, elastosonography findings31, 32, PET (positron emission tomography) findings, or other imaging techniques. The TI-RADS system allows clinicians to easily understand the malignancy risk of a thyroid nodule from the US report and to make more appropriate management decisions, such as follow-up, FNAC or surgery.

Our research has several limitations. Firstly, the study was a surgical series in which cancers were overrepresented (45.9%) compared with FNAC-based series (i.e. 4.0–5.0%)1, which may lead to selection bias. However, at present, histopathology is the only gold standard for the diagnosis of TNs33. Secondly, because of the retrospective design, the use of various US machines and operators may have limited image interpretation by the radiologists. However, all the US machines in this study were high-end instruments, and the images were reviewed by experienced radiologists. In addition, the US images were acquired and stored under the same protocol, which minimized this influence; still, a prospective study design is needed. Finally, this is a single-center experience in a tertiary referral hospital, and multi-center studies with large case series are needed. Further prospective studies are anticipated to verify our results.

Conclusion

In conclusion, all four TI-RADSs provide effective malignancy risk stratification for TNs. With its higher sensitivity, TI-RADS K, a simple predictive model based on five US characteristics, is practical and convenient for the management of TNs in clinical practice. The study also indicates that the TI-RADSs are applicable to surgical series as well as to the general population.