Introduction

The echogenicity of a thyroid nodule on ultrasonography (US) is an important descriptor for distinguishing malignant from benign nodules1,2,3,4. Previous studies have consistently reported that the malignancy risk of hypoechoic nodules is higher than that of iso- or hyperechoic nodules1,3,4,5. Marked hypoechogenicity is related to an increased risk of malignancy in thyroid nodules1,5,6,7,8 and has been adopted in several risk stratification systems (RSSs)9,10,11,12.

Despite its importance, the US definition of nodule echogenicity differs across risk stratification systems (RSSs)13. For nodules with heterogeneous echogenicity, guidelines have adopted different strategies: the Korean Thyroid Imaging Reporting and Data System (K-TIRADS) and American College of Radiology (ACR) TIRADS classified them based on the predominant echogenicity11,14, whereas the European Thyroid Imaging Reporting and Data System (EU-TIRADS) classified heterogeneous nodules as mildly hypoechoic nodules when they possessed any hypoechoic portion10. While the American Association of Clinical Endocrinologists/American College of Endocrinology/Associazione Medici Endocrinologi9, ACR11, EU-TIRADS10, and the Chinese Thyroid Imaging Reporting and Data System (C-TIRADS)12 distinguished between marked and mild hypoechogenicity, the American Thyroid Association15 and K-TIRADS14 did not make this distinction for nodule risk stratification. In these RSSs, including the 2016 K-TIRADS, nodules with a similar echogenicity to the anterior neck muscles (i.e., moderate hypoechogenicity) were grouped with nodules with mild hypoechogenicity9,10,11,12,14,15.

Our recent study demonstrated that nodule hypoechogenicity could be stratified as mild versus moderate to marked, and that nodules with heterogeneous echogenicity could be stratified by the predominant echogenicity of the solid portion7. Based on the results of that study, the 2021 K-TIRADS revised the definition of marked hypoechogenicity as hypoechoic or similar echogenicity relative to the anterior neck muscles16. However, evidence on the stratification of hypoechoic nodules remains insufficient because the previous work was a single-center study7. We therefore designed a multicenter study to determine whether this revised definition of nodule hypoechogenicity could effectively stratify the malignancy risk of thyroid nodules. This study aimed to validate the malignancy risk of thyroid nodules according to their echotexture and degree of hypoechogenicity in a multicenter cohort.

Results

Demographic and clinicopathologic characteristics

The demographic data are summarized in Table 1. The mean size of the nodules was 53.4 ± 12.7 mm (range, 10–100 mm). Of the 5601 thyroid nodules, 4512 (80.6%) were diagnosed as benign and 1089 (19.4%) as malignant. The 1089 malignant nodules included 989 (90.8%) papillary thyroid carcinomas (PTCs), 62 (5.7%) follicular carcinomas, 12 (1.1%) medullary carcinomas, 7 (0.6%) poorly differentiated carcinomas, 6 (0.6%) anaplastic carcinomas, 5 (0.5%) metastases, 4 (0.4%) unspecified malignancies, 3 (0.3%) lymphomas, and 1 (0.1%) squamous cell carcinoma. Patients with malignant nodules were significantly younger (P < 0.001), included a smaller proportion of female patients (P < 0.001), and had smaller nodules (P < 0.001) than patients with benign nodules.

Table 1 Demographic Data of 5601 Nodules in This Study.

When the echogenicity of heterogeneously echotextured nodules was determined by the predominant echogenicity, iso- or hyperechogenicity was the most common (64.0%) among all nodules. Among the 778 hypoechoic malignant nodules, 718 (92.3%) were PTCs, including 58 (7.5%) follicular variant PTCs, and 30 (3.9%) were follicular carcinomas. Among the 311 iso- or hyperechoic malignant nodules, 271 (87.1%) were PTCs, including 69 (21.9%) follicular variant PTCs, and 32 (10.3%) were follicular carcinomas.

Comparison of the malignancy risk between nodules with homogeneous and heterogeneous echotexture

Table 2 shows the malignancy risks of nodules classified according to their echogenicity and echotexture. Overall, the malignancy risk of homogeneous hypoechoic nodules was significantly higher than that of heterogeneous hypoechoic nodules (40.5 vs. 33.5%, P = 0.022). Heterogeneous hypoechoic nodules showed a significantly higher malignancy risk than heterogeneous iso- or hyperechoic nodules (33.5 vs. 15.8%, P < 0.001). Heterogeneous iso- or hyperechoic nodules showed a significantly higher malignancy risk than homogeneous iso- or hyperechoic nodules (15.8 vs. 6.7%, P < 0.001).

Table 2 Malignancy risk stratified by echogenicity and echotexture.

When we classified the nodules according to composition and the presence of suspicious features, there was no significant difference in malignancy risks between homogeneous hypoechoic and heterogeneous hypoechoic nodules in any subgroup (P ≥ 0.086). In contrast, heterogeneous hypoechoic nodules showed significantly higher malignancy risks than heterogeneous isoechoic nodules in all subgroups (P ≤ 0.017) except partially cystic nodules. The malignancy risks were not significantly different between heterogeneous iso- or hyperechoic nodules and homogeneous isoechoic nodules in all subgroups (P ≥ 0.05) except the subgroup of partially cystic nodules without suspicious features.

Risk stratification of thyroid nodules with heterogeneous echotexture

In terms of risk stratification, solid heterogeneous hypoechoic nodules with suspicious features fell within the high suspicion category (malignancy risk, 65.7%), as did solid homogeneous hypoechoic nodules with suspicious features (69.6%). The malignancy risks of solid heterogeneous iso- or hyperechoic nodules ranged within the intermediate suspicion category depending on the presence of suspicious US features (10.9% without and 36.3% with suspicious features). The malignancy risks of solid homogeneous iso- or hyperechoic nodules ranged within the low-to-intermediate suspicion categories, depending on the presence of suspicious US features (7.5% without and 26.7% with suspicious features).

In partially cystic nodules, the malignancy risks of hypoechoic or iso- or hyperechoic nodules (either homogeneous or heterogeneous) with suspicious features were stratified within the intermediate suspicion category (10.7–30.7%). The malignancy risks of all partially cystic nodules without suspicious features ranged within the low-to-intermediate suspicion category (3.6–13.3%), regardless of echogenicity and echotexture.

Malignancy risk according to the degree of predominant hypoechogenicity

Table 3 lists the malignancy risks of nodules categorized by their predominant degree of hypoechogenicity, for all nodules and for each subgroup. Markedly hypoechoic nodules demonstrated a significantly higher malignancy risk than moderately (P < 0.001) and mildly hypoechoic (P < 0.001) nodules. The malignancy risks of markedly and moderately hypoechoic nodules were significantly higher than that of mildly hypoechoic nodules (P < 0.001).

Table 3 Malignancy Risk Stratified by Degree of Hypoechogenicity and Predominant Echogenicity According to Composition and Suspicious Features.

When we categorized nodules according to composition and presence of suspicious features, there was no significant difference in malignancy risk between markedly and moderately hypoechoic nodules in any subgroup (P ≥ 0.48). In solid nodules, markedly or moderately hypoechoic nodules showed a significantly higher malignancy risk than mildly hypoechoic (P ≤ 0.016) and iso- or hyperechoic (P < 0.001) nodules, regardless of suspicious features.

In partially cystic nodules with suspicious features, moderately hypoechoic nodules showed a significantly higher malignancy risk than mildly hypoechoic (P ≤ 0.045) and iso- or hyperechoic nodules (P < 0.001). In partially cystic nodules without suspicious features, there was no significant difference in malignancy risk according to the degree of hypoechogenicity (P ≥ 0.116). However, moderately (P = 0.008) and mildly hypoechoic (P = 0.017) nodules showed a significantly higher malignancy risk than iso- or hyperechoic nodules in this subgroup.

Risk stratification of thyroid nodules according to the degree of hypoechogenicity

In solid nodules, the malignancy risks of nodules with moderate (73.3%) or marked hypoechogenicity (78.6%) with suspicious features were within the high suspicion category. Solid nodules with mild hypoechogenicity and suspicious features showed a malignancy risk (52.0%) slightly below the lower margin of the high suspicion category. Solid nodules with marked (31.3%), moderate (25.9%), and mild (17.4%) hypoechogenicity without suspicious features were stratified as intermediate risk.

In partially cystic nodules, marked (50.0%) and moderate (45.9%) hypoechogenicity showed slightly higher malignancy risks than the estimated range of the intermediate and low suspicion categories according to the presence of suspicious features. Partially cystic nodules with mild hypoechogenicity and iso- or hyperechogenicity were classified within the low and intermediate suspicion categories according to the presence of suspicious features.

Comparisons of malignancy risks among four nodule groups, based on composition and echogenicity

Table 4 shows the malignancy risks in the four groups of nodules categorized according to a combination of composition, predominant echogenicity, and presence of suspicious features. The malignancy risks differed significantly between these groups in the following decreasing order: solid hypoechoic, partially cystic hypoechoic, solid iso- or hyperechoic, and partially cystic iso- or hyperechoic. The malignancy risks differed significantly between all pairs of groups (all P < 0.001) except between partially cystic hypoechoic and solid iso- or hyperechoic nodules (P ≥ 0.122), regardless of the presence of suspicious features.

Table 4 Malignancy risk of four nodule categories based on composition and predominant echogenicity.

Reproducibility of nodule echotexture and hypoechogenicity

Interobserver agreement for the four categories of echogenicity (iso- or hyperechoic and mild, moderate, and marked hypoechogenicity) was substantial (κ = 0.79, 95% CI 0.76, 0.82). Agreement for the four categories combining nodule echotexture and echogenicity (homogeneous hypoechoic, heterogeneous hypoechoic, heterogeneous iso- or hyperechoic, and homogeneous iso- or hyperechoic) was also substantial (κ = 0.77, 95% CI 0.75, 0.81).

Discussion

Our study demonstrated no significant difference in malignancy risks between homogeneous and heterogeneous hypoechoic nodules in all subgroups, or between homogeneous and heterogeneous iso- or hyperechoic nodules in all subgroups except partially cystic nodules without suspicious features. Heterogeneous hypoechoic nodules showed a significantly higher malignancy risk than heterogeneous isoechoic nodules in all subgroups except partially cystic nodules. Our study validated that classifying nodules by their predominant echogenicity is a reasonable form of risk stratification. Regarding the degree of hypoechogenicity, moderately hypoechoic nodules showed malignancy risks similar to those of markedly hypoechoic nodules. In contrast, moderately hypoechoic nodules showed significantly higher malignancy risks than mildly hypoechoic nodules in all subgroups except partially cystic nodules without suspicious features. Based on our results, moderately hypoechoic nodules should be grouped with markedly hypoechoic nodules for risk stratification.

For nodules with heterogeneous echogenicity, the EU-TIRADS suggested that nodules with any hypoechoic component should be regarded as hypoechoic nodules and classified as intermediate risk10. However, in our study, the malignancy risks of heterogeneous isoechoic nodules were not significantly different from those of their homogeneous counterparts except in the subgroup of partially cystic nodules without suspicious features. This result aligns with the findings of our previous study7. Although the malignancy risks of heterogeneous isoechoic nodules were higher than those of homogeneous isoechoic nodules and of nodules overall, they ranged within the low-to-intermediate risk categories, depending on concurrent suspicious US features. Therefore, our results support the strategy provided by K-TIRADS and ACR TIRADS for assessing nodules with heterogeneous echogenicity.

In the EU-TIRADS10 and ACR-TIRADS11, moderate hypoechogenicity was classified as carrying a risk similar to that of mild hypoechogenicity. However, in this study, moderate hypoechogenicity showed a malignancy risk similar to that of marked hypoechogenicity. These results suggest that the previous definition of marked hypoechogenicity should be revised as hypoechoic or similar echogenicity relative to the anterior neck muscles, and they confirm the validity of the revised definition of marked hypoechogenicity in the 2021 K-TIRADS.

Our results are in line with those of our previous study in that the malignancy risks of moderately hypoechoic nodules are similar to those of markedly hypoechoic nodules7. However, the results in overall nodules were somewhat discrepant from the previous study7, demonstrating that the malignancy risks of markedly hypoechoic nodules were higher than those of moderately hypoechoic nodules6. In this cohort, concurrent suspicious US features occurred more frequently in markedly hypoechoic nodules than in moderately hypoechoic nodules (72.4% vs. 52.5%, P < 0.001). The higher prevalence of suspicious features in markedly hypoechoic nodules might have confounded the comparison of malignancy risks between these two groups.

In the subgroup of partially cystic nodules without suspicious features, the malignancy risks of most nodules ranged within the low-risk category, regardless of their predominant echogenicity. In contrast, the malignancy risks of most partially cystic nodules with suspicious features fell within the intermediate-risk category. Unlike solid nodules, partially cystic nodules without suspicious features showed no significant difference in malignancy risk between marked/moderate and mild hypoechoic nodules, and the difference between the various degrees of hypoechogenicity was diminished. Additionally, most partially cystic hypoechoic nodules showed mild hypoechogenicity (68.7–85.0%), and marked hypoechogenicity was very rare. We assume that in partially cystic nodules, malignancy risk is mainly determined by the presence of suspicious features and that the degree of hypoechogenicity has little impact.

Our study has several limitations. First, the reference standard for some benign nodules was a single benign biopsy result, which may have led to false-negative results. However, considering that the majority of false-negative results occur in high suspicion nodules17, the false-negative rate (FNR) might be negligible in this study (simulated FNR in nodules with one benign biopsy result, 1.4%). Second, we retrospectively assessed the nodules’ US features, which may limit the accuracy of interpretation. Regarding echogenicity, subtle echo changes could be misclassified on prerecorded US images. Moreover, focal suspicious features could be missed when images are archived during dynamic exploration, and this could potentially affect nodule categorization. However, a recent study reported that overall inter-exam agreement between real-time and retrospective US image interpretation for thyroid nodules was more than substantial with the 2016 K-TIRADS18. Although this method might be suboptimal, we speculate that retrospective evaluation of US images could be an alternative method for assessing malignancy risk in thyroid nodules. Third, although our study demonstrated that the proposed classification of echogenicity showed good reproducibility, determination of US lexicons can still be affected by interobserver variability19. In this context, artificial intelligence techniques might have a potential complementary role20,21. Future in-depth studies are needed to validate the reproducibility of this US lexicon among multiple readers. Fourth, the malignancy rate of this cohort (19.4%) is relatively high considering the prevalence of malignancy among all thyroid nodules. We speculate that this is because this cohort consisted mainly of nodules requiring biopsy, and the majority of institutions were tertiary referral hospitals. According to a recent meta-analysis, the average malignancy rate of published thyroid nodule cohorts was 27.8 ± 15.3% (range, 3.9–56.2%)22. Therefore, we consider the malignancy rate in this cohort acceptable given the specific aim of this study.

In conclusion, the malignancy risk of nodules with heterogeneous echotexture can be stratified based on predominant echogenicity. Additionally, nodule hypoechogenicity can be classified as mild vs. moderate to marked hypoechogenicity for malignancy risk stratification.

Materials and methods

The institutional review boards (IRBs) of the 26 participating centers (CHA Gangnam Medical Center, Chung-Ang University Hospital, Konkuk University Medical Center, Gyeongsang National University Hospital, Korea University Anam Hospital, Kosin University Gospel Hospital, Daejeon St. Mary’s Hospital, Dongguk University Ilsan Hospital, Seoul National University Hospital, Seoul Metropolitan Government Seoul National University Boramae Medical Center, Soonchunhyang University Seoul Hospital, Nowon Eulji Medical Center, Busan Paik Hospital, Inje University Haeundae Paik Hospital, Hanyang University Guri Hospital, Asan Medical Center, Ajou University Hospital, GangNeung Asan Hospital, Korea University Ansan Hospital, Seoul National University Bundang Hospital, National Cancer Center, Soonchunhyang University Bucheon Hospital, Gangwon University Hospital, Chonnam National University Hwasun Hospital, Gachon University Gil Medical Center and Seoul St. Mary’s Hospital) approved this study. This study was conducted in accordance with the Declaration of Helsinki. The requirement for informed consent was waived by the IRBs of the participating centers because of the retrospective nature of the review.

Study population

Patient data were retrospectively collected from 26 different hospitals in Korea (Thyroid Imaging Network of Korea registry, THINK). From June to September 2015, 22,775 patients underwent thyroid US at the 26 participating institutions. Among them, 16,679 patients were excluded due to a lack of reference standard test (biopsy or surgery) (n = 4304), thyroid nodules < 1.0 cm (maximal diameter, n = 12,130), suboptimal image quality (n = 245), or inconclusive/indeterminate biopsy results (Bethesda I; nondiagnostic or unsatisfactory or III; atypia of undetermined significance or follicular lesion of undetermined significance, n = 1015 patients with 1102 nodules)23,24. In addition, 59 isolated macrocalcifications and 48 purely cystic nodules were excluded because nodule echogenicity could not be assessed. We included a total of 5601 thyroid nodules in 4989 patients in this study (4101 women, 888 men; age range: 19–76 years) (Fig. 1). Malignant nodules were diagnosed based on the histopathological results after surgery (n = 927) or malignant (Bethesda VI) fine needle aspiration (FNA) or core-needle biopsy (CNB) results (n = 162). Benign nodules were diagnosed based on the histopathological results after surgery (n = 390), at least two benign FNA or CNB results (n = 594), or one benign FNA or CNB result (n = 3528). All nodules with a preoperative Bethesda V diagnosis that were classified as malignant were confirmed as thyroid cancer on surgical biopsy.
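The reference-standard counts quoted above can be cross-checked against the totals reported in the Results. The short Python sketch below is only an arithmetic check: the dictionary keys and the `percent` helper are illustrative names that do not come from the study, and no data beyond the counts stated in the text are used.

```python
# Counts quoted in the Study population paragraph (labels are illustrative).
malignant = {"surgery": 927, "bethesda_vi_biopsy": 162}
benign = {"surgery": 390, "two_or_more_benign_biopsies": 594, "one_benign_biopsy": 3528}
patients = {"women": 4101, "men": 888}

n_malignant = sum(malignant.values())
n_benign = sum(benign.values())
n_nodules = n_malignant + n_benign
n_patients = sum(patients.values())


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(n_nodules, n_patients)            # 5601 4989
print(percent(n_malignant, n_nodules))  # 19.4
print(percent(n_benign, n_nodules))     # 80.6
```

The sums reproduce the 5601 nodules, 4989 patients, and the 19.4% malignancy rate given in the text.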

Figure 1

Flowchart of the study. US = ultrasonography. Numbers are numbers of nodules unless otherwise specified.

US examination and image analysis

All US examinations were performed with a 10–14 MHz linear probe. US images were retrospectively reviewed by one of 17 experienced radiologists with 8–22 years of experience performing thyroid US using an online program (AIM AiCRO; http://study.aim-aicro.com). As 26 hospitals participated in this study, we could not list the US units of all institutions. Before the multicenter study began, we held training sessions to establish a baseline consensus regarding US criteria9,10,11,14,23,24. During a consensus meeting, the 17 radiologists evaluated images of biopsy-proven masses not included in the study and assessed the US criteria, including composition, echogenicity, margin, calcification, orientation (taller-than-wide), spongiform appearance, and intracystic echogenic foci with comet tail artifact. All of the reviewers, who were blinded to the FNA results and final diagnoses, then assessed the US features of the thyroid nodules. Reviewers determined the nodule hypoechogenicity by assessing the echogenicity of the solid component in a nodule. In the 2021 K-TIRADS, the nodule was defined as hypoechoic if it was hypoechoic relative to the normal thyroid parenchyma. The echotexture of the nodule was categorized as homogeneous or heterogeneous based on the uniformity of the nodule echogenicity7 (Fig. 2). Heterogeneous echotexture was defined as the nodule’s solid component showing two different portions of echogenicity (iso- or hyperechogenicity vs. hypoechogenicity). The echogenicity of heterogeneously echotextured nodules was determined by their predominant echogenicity. The nodules were classified into four groups: homogeneous hypoechoic, heterogeneous hypoechoic, heterogeneous iso- or hyperechoic, and homogeneous iso- or hyperechoic.

Figure 2

Thyroid nodules classified according to the echotexture and echogenicity. (A) A nodule with homogeneous hypoechogenicity. Diagnosis: Conventional papillary thyroid carcinoma. (B) A nodule with heterogeneous, predominant hypoechogenicity. Note internal iso- or hyperechoic solid portions constituting less than 50% of the nodule. Diagnosis: Conventional papillary thyroid carcinoma. (C) A nodule with heterogeneous, predominant iso- or hyperechogenicity. The hypoechoic solid portion accounts for less than 50% of the nodule. Diagnosis: Benign follicular nodule in core needle biopsy. (D) A nodule with homogeneous iso- or hyperechogenicity. Diagnosis: Benign follicular nodule in core needle biopsy.
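The four-group scheme can be written as a small decision rule. The following Python sketch is only an illustration: in the study, readers judged predominance visually, so the numeric input `hypoechoic_fraction` and the function name are hypothetical, and the handling of an exact 50/50 split is an assumption because the source does not say how such cases were assigned.

```python
def classify_echotexture(hypoechoic_fraction):
    """Assign one of the four study groups from the hypoechoic share of the solid component.

    hypoechoic_fraction is a hypothetical quantity in [0, 1]; the readers in
    the study made this judgment visually rather than by measurement.
    """
    if not 0.0 <= hypoechoic_fraction <= 1.0:
        raise ValueError("hypoechoic_fraction must lie between 0 and 1")
    if hypoechoic_fraction == 1.0:
        return "homogeneous hypoechoic"
    if hypoechoic_fraction == 0.0:
        return "homogeneous iso- or hyperechoic"
    # Assumption: an exact 50/50 split is assigned to the hypoechoic group.
    if hypoechoic_fraction >= 0.5:
        return "heterogeneous hypoechoic"
    return "heterogeneous iso- or hyperechoic"


print(classify_echotexture(0.7))  # heterogeneous hypoechoic
print(classify_echotexture(0.3))  # heterogeneous iso- or hyperechoic
```

The 50% threshold follows the figure legend, which describes the minority portion as less than 50% of the nodule.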

The degree of hypoechogenicity was categorized as mild (hypoechoic relative to the thyroid parenchyma, but hyperechoic relative to the anterior neck muscles), moderate (similar echogenicity to the anterior neck muscles), and marked (hypoechoic relative to the anterior neck muscles) (Fig. 3)5,7. We assessed other US characteristics regarding the composition and presence of suspicious features (punctate echogenic foci, nonparallel orientation, and irregular margin) of the thyroid nodules based on the 2021 K-TIRADS16. In the 2021 K-TIRADS, punctate echogenic foci were defined as punctate (≤ 1 mm) hyperechoic foci within the solid component of a nodule, nonparallel orientation as the anteroposterior diameter of a nodule being longer than its transverse diameter in the transverse plane, and irregular margin as a non-smooth edge with spiculation or microlobulation. The suggested malignancy risk in the 2021 K-TIRADS is as follows16: high suspicion, > 60%; intermediate suspicion, 10–40%; low suspicion, 3–10%; and benign, < 3%.
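The grading rule and the risk bands described above can be sketched in Python as follows. This is a minimal illustration, not the software used in the study. The function names and the string codes `"hypo"`, `"iso"`, and `"hyper"` are hypothetical. The low suspicion band is taken as 3–10%, a risk of exactly 10% is assigned to the intermediate band by assumption, and risks between 40% and 60% are reported as falling between categories because the quoted bands leave that interval undefined.

```python
_LEVELS = {"hypo", "iso", "hyper"}


def hypoechogenicity_degree(vs_thyroid, vs_muscle):
    """Grade the solid component from two visual comparisons.

    vs_thyroid: echogenicity relative to normal thyroid parenchyma.
    vs_muscle:  echogenicity relative to the anterior neck muscles.
    """
    if vs_thyroid not in _LEVELS or vs_muscle not in _LEVELS:
        raise ValueError("comparisons must be 'hypo', 'iso', or 'hyper'")
    if vs_thyroid != "hypo":
        return "iso- or hyperechoic"
    return {"hyper": "mild", "iso": "moderate", "hypo": "marked"}[vs_muscle]


def ktirads_risk_category(risk_percent):
    """Map an observed malignancy risk (%) onto the quoted 2021 K-TIRADS bands."""
    if not 0 <= risk_percent <= 100:
        raise ValueError("risk_percent must lie between 0 and 100")
    if risk_percent > 60:
        return "high suspicion"
    if 10 <= risk_percent <= 40:  # assumption: exactly 10% counts as intermediate
        return "intermediate suspicion"
    if 3 <= risk_percent < 10:
        return "low suspicion"
    if risk_percent < 3:
        return "benign"
    return "between intermediate and high suspicion"  # 40% < risk <= 60%


print(hypoechogenicity_degree("hypo", "iso"))  # moderate
print(ktirads_risk_category(65.7))             # high suspicion
print(ktirads_risk_category(52.0))             # between intermediate and high suspicion
```

The two risk values in the usage lines are malignancy risks reported in the Results. The second shows why a 52.0% risk is described there as lying just below the high suspicion category.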

Figure 3

Hypoechoic thyroid nodules with various degrees of hypoechogenicity. (A) Markedly hypoechoic nodule (hypoechoic relative to the anterior neck muscles). Diagnosis: Conventional papillary thyroid carcinoma. (B) Moderately hypoechoic nodule (similar echogenicity to the anterior neck muscles). Diagnosis: Conventional papillary thyroid carcinoma. (C) Mildly hypoechoic nodule (hypoechoic relative to the normal thyroid parenchyma, but hyperechoic relative to the anterior neck muscles). Diagnosis: Follicular variant papillary thyroid carcinoma.

The interobserver agreement on nodule echogenicity and echotexture was assessed on 1400 (25%) of the 5601 nodules by another radiologist (J.Y.L., with eight years of experience in thyroid imaging). These nodules were randomly selected and assessed by this reader in a blinded manner.
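For readers unfamiliar with the agreement statistic, the sketch below shows how an unweighted Cohen κ can be computed for two readers and mapped onto the interpretation bands used in this paper. It is written in plain Python for illustration only. The ratings in the usage example are invented toy data, not study data, and the label returned for negative κ values is an assumption because the quoted bands start at 0.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen kappa for two readers rating the same nodules."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both readers must rate the same non-empty set of nodules")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement: product of the two readers' marginal proportions per category.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both readers use a single category")
    return (observed - expected) / (1 - expected)


def interpret_kappa(kappa):
    """Apply the bands quoted in the statistical analysis section."""
    if kappa < 0:
        return "less than chance agreement"  # assumption: not covered by the quoted bands
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial"), (1.00, "almost perfect")]:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


# Toy ratings for eight nodules (invented for illustration).
reader_a = ["mild", "mild", "moderate", "marked", "iso", "iso", "mild", "marked"]
reader_b = ["mild", "moderate", "moderate", "marked", "iso", "iso", "mild", "moderate"]
kappa = cohen_kappa(reader_a, reader_b)
print(round(kappa, 3), interpret_kappa(kappa))  # 0.673 substantial
```

With these bands, the κ values of 0.79 and 0.77 reported in the Results both fall in the substantial range.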

Data analysis and statistical analysis

We calculated the frequency and malignancy risk of each nodule category based on its echotexture and degree of hypoechogenicity. Chi-squared or Fisher’s exact tests were used to compare the malignancy risk among groups. We performed a subgroup analysis to assess the difference between the malignancy risks of thyroid nodules according to their composition (solid vs. partially cystic) and the presence of any suspicious US features. This approach reflects the fact that the malignancy risks of thyroid nodules differ according to their composition, echogenicity, and presence of suspicious US features4. Chi-squared or Fisher’s exact tests were used to compare the malignancy risk between homogeneous hypoechoic, heterogeneous hypoechoic, heterogeneous iso- or hyperechoic, and homogeneous iso- or hyperechoic nodules. The interobserver agreement for the degree of echogenicity and echotexture was calculated using the Cohen κ statistic. All κ values were interpreted as follows: 0–0.20, slight; 0.21–0.40, fair; 0.41–0.60, moderate; 0.61–0.80, substantial; and 0.81–1.0, almost perfect agreement25. Statistical analyses were performed using IBM SPSS Statistics for Windows, Version 24.0 (IBM Corporation, Armonk, NY, USA, https://www.ibm.com/kr-ko/analytics/spss-statistics-software), and MedCalc (version 20.009, MedCalc Software Ltd, Ostend, Belgium, https://www.medcalc.org; 2022). A P value < 0.05 was considered statistically significant.
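The pairwise comparisons of malignancy risk described above amount to tests on 2 × 2 tables (malignant vs. benign counts in two nodule groups). The analyses in the study were run in SPSS and MedCalc. The plain-Python sketch below only illustrates the two tests named in the text. The rule for choosing between them (Fisher’s exact test when any expected cell count is below 5) is a common convention assumed here, since the source does not state its criterion, and the counts in the usage lines are hypothetical.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for the table [[a, b], [c, d]]."""
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("every row and column must contain at least one nodule")
    n = row1 + row2
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact test: sum the tables no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def compare_malignancy_risk(mal_1, ben_1, mal_2, ben_2):
    """Return (test name, P value) for two groups of malignant/benign counts."""
    n = mal_1 + ben_1 + mal_2 + ben_2
    rows = (mal_1 + ben_1, mal_2 + ben_2)
    cols = (mal_1 + mal_2, ben_1 + ben_2)
    smallest_expected = min(r * c / n for r in rows for c in cols)
    if smallest_expected < 5:  # assumed convention, not stated in the source
        return "fisher", fisher_exact_2x2(mal_1, ben_1, mal_2, ben_2)
    return "chi-squared", chi_squared_2x2(mal_1, ben_1, mal_2, ben_2)[1]


# Hypothetical counts: 10/30 malignant in one group vs. 20/30 in the other.
name, p_value = compare_malignancy_risk(10, 20, 20, 10)
print(name, p_value < 0.05)  # chi-squared True
```

Statistical packages may apply a continuity correction or a different definition of the two-sided exact P value, so results from this sketch need not match SPSS or MedCalc output digit for digit.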

Ethics approval

This retrospective study was approved by the institutional review boards of 26 participating centers.

Consent to participate

The requirement for patient informed consent was waived.