Introduction

Thyroid nodules are very common1. The clinical importance of thyroid nodules lies primarily in the possibility of thyroid cancer, which occurs in approximately 5% of all thyroid nodules2,3. Among the imaging modalities, high-resolution ultrasonography (US) is the most sensitive for the detection of thyroid nodules4. It has made it possible to distinguish thyroid tumors and predict their prognosis based on various ultrasound characteristics of the nodule5,6,7,8,9. However, there is no clear consensus on standardized terminology for thyroid US, and most of the characteristics are qualitative and subjective, which makes them difficult to define universally or apply clinically10.

Among ultrasound features, several studies have identified hypoechogenicity as an important finding suggestive of malignancy11,12. However, studies have revealed that approximately 30–55% of benign nodules are also hypoechoic and that, given the high prevalence of benign lesions, most hypoechoic nodules are benign, which decreases the usefulness of this US feature13,14,15. Marked hypoechogenicity, with a specificity of 92–94%, can be a more specific and more reliable criterion for a malignant thyroid nodule than hypoechogenicity in the broader sense15,16. A serious concern is that ultrasound echogenicity assessed by clinicians (US-E) has been described qualitatively and is potentially subject to intra-observer and inter-observer variability7. Thus, a quantitative echogenicity index (EI) that is more objective and measurable is desired for clinical use.

To overcome the shortcomings of subjective judgment of the sonographic characteristics used in diagnosis, we have previously proposed computerized quantification methods that characterize calcifications, heterogeneity and vascularity to make the diagnosis more objective17,18,19. The aim of this study was to collect and quantify additional US information; to this end, quantitative echogenicity indexes (EIs) are proposed to study echogenicity.

Materials and Methods

Participants

The Institutional Review Board of National Taiwan University Hospital approved this prospective study, and informed consent was obtained from all of the participants. The methods were carried out in accordance with the approved guidelines. From August 2007 to February 2011, 353 patients with 443 thyroid nodules were recruited; all underwent thyroidectomy because of thyroid carcinoma, a suspicious thyroid nodule, follicular neoplasm or symptomatic nodular goiter diagnosed by ultrasound and fine needle aspiration (FNA) cytology. The final diagnoses were based on histopathological examination of the surgical specimens, which were reviewed by pathologists. Nodules larger than the array width (5.2 cm) were excluded from the image assessment. Multinodular goiters without a separable nodule on ultrasound were also excluded. Therefore, 333 participants with 411 nodules were included in the final analysis.

Equipment and ultrasound procedures

All of the sonograms were acquired using a commercial ultrasound device (HDI 5000; Philips Healthcare, Bothell, WA) with a multifrequency linear probe (L12-5). The B-mode images, with a dynamic range of 170 dB, had a width of 5.2 cm and a depth of at least 3.9 cm.

The procedure was performed with the participant in the supine position and the neck hyperextended. The images were captured at the maximum diameter of the nodule. Image analysis was conducted off-line on a separate computer using the images in DICOM format. Quantification of echogenicity was performed using commercial software (AmCAD-UT, AmCad BioMed., Taiwan). The analysis method using the software is described in detail below.

Analysis of echogenicity

During the analysis of ultrasound images, the boundaries of the nodules were defined by two thyroid specialists (K. Y. Chen and M. H. Wu) without knowledge of the FNA cytology or surgical pathology results. To select the references for comparison with the nodule echogenicity20, the regions of the strap muscle and thyroid were also manually selected by the sonographers using computer software. The coordinates of the three defined regions, the nodule, thyroid parenchyma, and strap muscle regions, were recorded in the database separately for subsequent analysis. Examples of the images with the selected regions are shown in Fig. 1.

Figure 1

A representative image to delineate the regions of the nodule, thyroid, and strap muscle.

Next, the average gray values inside the selected regions of the nodule, thyroid and muscle, denoted as μnodule, μthyroid and μmuscle, respectively, were calculated. For the nodule, the anechoic areas and hyperechoic foci, clinically deemed to be cystic areas and calcifications, respectively, were removed before the average was calculated. The gray values of these pixels can be regarded as outliers based on our previous study18, and they do not contribute to the echogenicity of the nodule. The average gray value of the remaining part of the nodule (μnodule) was denoted as the echogenicity index of the nodule (EIN). According to the literature15,16,21, μthyroid and μmuscle can be used as references to analyze the nodule echogenicity. The nodule can be classified as showing “hypoechogenicity” when μnodule is smaller than μthyroid or “marked hypoechogenicity” when μnodule is smaller than μmuscle. The differences μnodule − μmuscle and μnodule − μthyroid were recorded as the adjusted EIs of the nodule and were denoted as EIN-M and EIN-T, respectively.
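The index definitions in this paragraph can be illustrated with a minimal sketch. It assumes the gray values of each region are already available as plain lists; the function names and the `low`/`high` cut-offs used to drop anechoic and hyperechoic pixels are hypothetical, because the text does not state how the software detects those outliers.

```python
from statistics import mean


def echogenicity_indexes(nodule, thyroid, muscle, low=None, high=None):
    """Return EIN, EIN-T and EIN-M from lists of gray values.

    `low` and `high` are hypothetical cut-offs for anechoic (cystic) and
    hyperechoic (calcified) pixels. How the software actually detects these
    outliers is not described in the text, so this is only an assumption.
    """
    kept = [g for g in nodule
            if (low is None or g > low) and (high is None or g < high)]
    mu_nodule = mean(kept)        # EIN: mean of the remaining nodule pixels
    mu_thyroid = mean(thyroid)    # reference 1: thyroid parenchyma
    mu_muscle = mean(muscle)      # reference 2: strap muscle
    return {
        "EIN": mu_nodule,
        "EIN-T": mu_nodule - mu_thyroid,
        "EIN-M": mu_nodule - mu_muscle,
    }


def classify(ei):
    """Map the adjusted EIs to the conventional echogenicity terms."""
    if ei["EIN-M"] < 0:
        return "marked hypoechogenicity"
    if ei["EIN-T"] < 0:
        return "hypoechogenicity"
    return "not hypoechoic"


# Toy gray values (not data from the study).
example = echogenicity_indexes([0, 30, 32, 34, 250], [40, 42], [18, 20],
                               low=5, high=200)
print(example, classify(example))
```

In this toy case the nodule is darker than the thyroid reference but brighter than the muscle reference, so it falls in the "hypoechogenicity" class rather than "marked hypoechogenicity".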

In addition to the comparison with the manually selected references described above, an automatically calculated reference for the echogenicity index was provided by the commercial software. Based on the anatomic knowledge that strap muscles are located mostly in the anterior region of the neck, the anterior region is defined as the area outside the contoured nodule and above the nodule center. Moreover, only those pixels in the anterior region with a gray value smaller than the average were included and defined as the outside reference, so that its gray level resembles that of the muscle, which is generally darker than other tissue. Figure 2 shows the outside references calculated using the software for the same example images as in Fig. 1.

Figure 2

Automatic references calculated using the software for the same example images as in Fig. 1.

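An indicator variable Rij is defined as:

$$R_{ij} = \begin{cases} 1, & \text{if pixel } (i, j) \text{ lies in the anterior region and } GR_{ij} < L \\ 0, & \text{otherwise} \end{cases}$$

where GRij is the gray value of the pixel (i, j), and L is the average of the non-zero gray values of all pixels in the anterior region. The average gray value of the outside reference (μref) was then calculated as follows:

$$\mu_{ref} = \frac{\sum_{i,j} R_{ij}\, GR_{ij}}{\sum_{i,j} R_{ij}}$$

Finally, the automatic EI was obtained by

$$EI_{(N-R)/R} = \frac{\mu_{nodule} - \mu_{ref}}{\mu_{ref}}$$

denoted as EI(N-R)/R, and used for further analysis.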
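The automatic reference described above can be sketched as follows. This is an illustrative reading of the prose, not the AmCAD-UT implementation: the function names are hypothetical, the nodule center is taken as the centroid of the contour mask, row 0 of the image is assumed to be the anterior (skin) side, and the final ratio (μnodule − μref)/μref is inferred from the name EI(N-R)/R.

```python
def outside_reference(image, nodule_mask):
    """Mean gray value of the automatically selected outside reference.

    image       : 2D list of gray values; row 0 is assumed to be the
                  anterior (skin) side, so "above" means a smaller row index.
    nodule_mask : 2D list of booleans, True inside the contoured nodule.
    """
    # Row coordinate of the nodule center (centroid of the mask, an assumption).
    nodule_rows = [i for i, row in enumerate(nodule_mask)
                   for inside in row if inside]
    center_row = sum(nodule_rows) / len(nodule_rows)

    # Anterior region: outside the nodule contour and above the nodule center.
    anterior = [
        image[i][j]
        for i in range(len(image)) if i < center_row
        for j in range(len(image[i])) if not nodule_mask[i][j]
    ]

    # L: average of the non-zero gray values in the anterior region.
    nonzero = [g for g in anterior if g != 0]
    level = sum(nonzero) / len(nonzero)

    # Keep only the pixels darker than L (indicator R_ij = 1). Whether
    # zero-valued pixels belong to the reference is not specified in the
    # text; this sketch follows the prose literally and keeps them.
    reference = [g for g in anterior if g < level]
    return sum(reference) / len(reference)


def automatic_ei(mu_nodule, mu_ref):
    """Assumed form of EI(N-R)/R: nodule minus reference, over reference."""
    return (mu_nodule - mu_ref) / mu_ref


# Toy 4x4 image with a 2x2 nodule in the middle (not data from the study).
image = [[10, 20, 30, 40],
         [10, 50, 50, 40],
         [10, 50, 50, 40],
         [60, 60, 60, 60]]
mask = [[False, False, False, False],
        [False, True, True, False],
        [False, True, True, False],
        [False, False, False, False]]
mu_ref = outside_reference(image, mask)
print(mu_ref, automatic_ei(50, mu_ref))
```

Passing μnodule in separately keeps the cyst and calcification removal step out of this sketch, since that step is described elsewhere in the Methods.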

Statistical analysis

Statistical analysis was performed using a software package (SPSS, version 12.0 for Windows; SPSS, Chicago, Ill.). Fisher’s exact test was used for comparisons of two binary variables, and Student’s t test was used for comparisons of quantitative variables. The ultrasound features were compared with the histological diagnoses to determine the sensitivity, specificity, negative predictive value, and positive predictive value. A p value less than 0.05 was considered to indicate statistical significance. A receiver operating characteristic (ROC) curve was also generated, and the area under the curve (AUC) was calculated to determine the diagnostic performance of the quantitative EI. In addition, multiple logistic regression analysis of the variables that were significant in the univariate logistic regression model was performed to determine independent US predictors of malignancy.
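The diagnostic measures named in this paragraph can be written out in a few lines. The sketch below is not the SPSS procedure used in the study; the function names and the toy numbers are illustrative assumptions. The AUC is computed through its rank interpretation, with a lower EI treated as the "positive" direction because malignant nodules are expected to have lower values.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard 2x2 measures; 'positive' means the US feature is present."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


def auc_lower_is_malignant(malignant_ei, benign_ei):
    """AUC as the probability that a randomly chosen malignant nodule has a
    lower EI than a randomly chosen benign nodule (ties count one half)."""
    wins = 0.0
    for m in malignant_ei:
        for b in benign_ei:
            if m < b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(malignant_ei) * len(benign_ei))


# Toy counts and EI values (not data from the study).
print(diagnostic_metrics(tp=8, fp=4, fn=2, tn=6))
print(auc_lower_is_malignant([-5, -2, 1], [-1, 3, 4]))
```

The pairwise loop is quadratic in the number of nodules, which is acceptable for a few hundred cases; a rank-based formula would be preferable for much larger samples.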

Inter-observer agreement for US characteristics was assessed using the Cohen kappa statistic. Kappa values were interpreted as follows: 0.00–0.20, slight agreement; 0.21–0.40, fair agreement; 0.41–0.60, moderate agreement; 0.61–0.80, substantial agreement; and 0.81–1.00, almost perfect agreement15,22.
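As a brief illustration of the kappa statistic for two binary ratings, the sketch below computes it from the four cells of an agreement table and maps the result onto the bands listed above. The function names and the toy counts are assumptions made for this example and are not data from the study.

```python
def cohen_kappa(both_pos, a_only, b_only, both_neg):
    """Cohen's kappa for two raters A and B giving a binary rating."""
    n = both_pos + a_only + b_only + both_neg
    observed = (both_pos + both_neg) / n
    a_pos = both_pos + a_only            # nodules rated positive by A
    b_pos = both_pos + b_only            # nodules rated positive by B
    expected = (a_pos * b_pos + (n - a_pos) * (n - b_pos)) / n ** 2
    return (observed - expected) / (1 - expected)


def interpret_kappa(kappa):
    """Agreement bands as listed in the text."""
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Toy table: 30 both positive, 10 + 10 discordant, 50 both negative.
k = cohen_kappa(30, 10, 10, 50)
print(round(k, 3), interpret_kappa(k))
```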

Results

Of the 333 patients in our study, 269 were female and 64 were male, with an average age of 48.37 years. The oldest patient was 81 years old, and the youngest was 11 years old. In total, 254 of 411 (61.8%) nodules were benign (225 were nodular goiter, and 29 were follicular adenoma), and 157 of 411 (38.2%) nodules were malignant, with 7 follicular thyroid cancers (FTCs), 7 medullary thyroid cancers (MTCs), 2 anaplastic thyroid cancers (ATCs), 1 lymphoma and 140 papillary thyroid cancers (PTCs).

Conventional ultrasound features of the benign and malignant nodules

Of the malignant nodules, 56.1% were smaller than 2 cm, 89.8% were US-E hypoechoic, 82.17% had an irregular margin, 43.31% had microcalcification, and 96.82% were heterogeneous. All of these ultrasound features showed significant differences between the malignant and benign tumors (Table 1).

Table 1 Analysis of different US Characteristics of Benign and Malignant Thyroid Nodules.

EI values of benign and malignant nodules

The average μthyroid, μmuscle and μref values were 41.31, 18.59 and 21.72, respectively. The EIN, adjusted EI (EIN-T and EIN-M) and automatic EI (EI(N-R)/R) values all differed significantly between the benign and malignant nodules, with lower values for malignant nodules (p < 0.001 for each; AUC = 0.735 for EIN, 0.7043 for EIN-T, 0.7698 for EIN-M and 0.77 for EI(N-R)/R) (Table 2).

Table 2 Analysis of EIs of Benign and Malignant Thyroid Nodules.

In a univariate logistic regression analysis, either US-E hypoechoic or low EIs (cut-off set at median or zero) were statistically significant predictors of thyroid malignancy (Table 3).

Table 3 Results of Analysis of US Characteristics and EIs for Detection of Malignant Thyroid Nodules.

US-E hypoechoic or EIN-T less than zero (equivalent to conventional “hypoechogenicity”) was combined with other significant features in a multiple logistic regression analysis to determine independent US predictors of malignancy. The analysis showed that each of them (US-E defined hypoechoic or EIN-T less than zero) was an independent predictor of thyroid malignancy (ORs: 3.51 and 3.69, respectively).

Diagnostic performance of EIs and conventional ultrasound features

The US-E hypoechogenicity had a sensitivity of 89.8%, specificity of 31.9% and accuracy of 54% in the diagnosis of malignant nodules. EIN-T less than zero had a sensitivity of 79.6%, a specificity of 52.4% and an accuracy of 62.8%. Among EIs, EIN-M less than zero, as defined by conventional “marked hypoechogenicity”, had the highest accuracy at 70.3% (Table 4).
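As a consistency check on these figures, accuracy can be written as a prevalence-weighted average of sensitivity and specificity. The worked arithmetic below uses the malignancy prevalence p = 157/411 reported in the Results and, for EIN-M, the sensitivity (33.1%) and specificity (93.3%) quoted in the Discussion. It is only a check on rounded percentages and is not part of the original analysis.

```latex
\[
\text{accuracy} = \text{sensitivity}\cdot p + \text{specificity}\cdot(1-p),
\qquad p = \frac{157}{411} \approx 0.382
\]
\begin{align*}
\text{US-E hypoechoic:}\quad & 0.898 \times 0.382 + 0.319 \times 0.618 \approx 0.540 \\
\text{EIN-T} < 0\text{:}\quad & 0.796 \times 0.382 + 0.524 \times 0.618 \approx 0.628 \\
\text{EIN-M} < 0\text{:}\quad & 0.331 \times 0.382 + 0.933 \times 0.618 \approx 0.703
\end{align*}
```

The three results reproduce the reported accuracies of 54%, 62.8% and 70.3%.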

Table 4 Diagnostic performance of different US Characteristics and EIs.

Agreement of the Echogenicity Characteristic of the Thyroid Nodules

Among the 411 nodules in our study, US-E and EIN-T disagreed on the echogenicity of 138 nodules. We compared hypoechogenicity as defined by the computer system (EIN-T less than zero) with that assessed by the clinician (US-E) and found only fair agreement (kappa value 0.25). The mean |EIN-T| of nodules on which the two definitions disagreed was significantly lower than that of nodules on which they agreed (p < 0.0001).

Because the strap muscle is thought to be a relatively consistent and reliable reference, we further classified nodules into four groups according to the quartile of the EIN-M value. Figure 3 shows the prevalence of cancer in the four EIN-M groups, and the prevalence of malignancy was significantly increased when the value of EIN-M was decreased.
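The quartile grouping described here can be sketched as follows. The cut points come from Python's `statistics.quantiles`, whose default interpolation may differ from the one used in SPSS, and the function name and toy values are illustrative assumptions only.

```python
from statistics import quantiles


def prevalence_by_quartile(ei_values, is_malignant):
    """Split nodules into four groups by EI quartile and return the
    proportion of malignant nodules in each group (lowest EI first)."""
    q1, q2, q3 = quantiles(ei_values, n=4)      # three quartile cut points
    groups = [[0, 0] for _ in range(4)]         # [malignant count, total]
    for ei, malignant in zip(ei_values, is_malignant):
        index = (ei > q1) + (ei > q2) + (ei > q3)   # 0 = lowest quartile
        groups[index][0] += int(malignant)
        groups[index][1] += 1
    return [m / total if total else None for m, total in groups]


# Toy EIN-M values and pathology labels (not data from the study).
ein_m = [-8, -6, -4, -2, 0, 2, 4, 6]
malignant = [True, True, True, False, True, False, False, False]
print(prevalence_by_quartile(ein_m, malignant))
```

In this toy example the proportion of malignant nodules falls from the lowest to the highest EIN-M quartile, which is the direction of the trend reported in Fig. 3.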

Figure 3

Prevalence of cancer in four groups of nodules classified according to the quartile of the EIN-M value. The prevalence of malignancy increased significantly as the EIN-M value decreased (p < 0.001).

EIs with different histology

EIN-M and automatic EI (EI(N-R)/R) values for the different histological types are shown in Fig. 4. The values were high in follicular adenoma and nodular goiter and low in PTC and FTC. There were significant differences among the follicular neoplasms, including between follicular adenoma and follicular carcinoma.

Figure 4

Echogenicity index (EIN-M and automatic EI (EI(N-R)/R)) values for lesions classified as nodular goiter (n = 225), follicular adenoma (n = 29), papillary thyroid cancer (n = 140), follicular thyroid cancer (n = 7), and others (medullary thyroid cancer (n = 7), anaplastic thyroid cancer (n = 2) and lymphoma (n = 1)).

Discussion

We proposed a computerized method to evaluate ultrasound echogenicity quantitatively. From our study, using EI values, a statistically significant difference was observed between the benign and malignant nodules. The results of this quantitative evaluation also supported the usefulness of echogenicity in the diagnosis of thyroid nodules. To our knowledge, this is the first study to report that the quantitative measurement of ultrasound echogenicity could be a helpful approach in the diagnosis of thyroid nodules using a computerized method.

The presence of microcalcifications, hypoechogenicity, irregular margins, and a solid composition with a heterogeneous pattern suggests malignant potential in thyroid nodules3,5,23,24. However, the sensitivity and specificity of these US findings vary in the literature5,25. Additionally, a common problem with these conventional US features is the lack of a standardized lexicon and terminology for characterization7,13, which leads to poor reliability in assessing features such as echogenicity, pattern of composition and border7,26. In addition, differing quality and levels of clinical experience in interpreting these findings lead to variable diagnostic accuracy.

We found in the current study that, among the clinician-assessed features, US-E hypoechogenicity and microcalcification, rather than irregular margin and a heterogeneous pattern, were independent predictors of malignancy. The frequency of US-E hypoechogenicity differed significantly between benign and malignant nodules, and US-E hypoechoic nodules included the majority (89.8%) of malignant nodules. Among the US markers studied, US-E hypoechogenicity had the highest OR. This is consistent with the findings of Moon et al.15. Additionally, EIN-T, calculated by the computer system, has the same meaning as traditionally defined hypoechogenicity when it is less than zero. Furthermore, we found EIN-T to be an independent predictive factor for thyroid malignancy. We thus confirmed the importance of echogenicity using both qualitative and quantitative methods.

Echogenicity was traditionally assessed or described by clinician judgment. Because both benign and malignant thyroid nodules exhibited a hypoechoic pattern to different degrees, it is difficult to detect subtle differences by qualitative assessment. Most US-E hypoechoic nodules are benign considering the high prevalence of benign lesions14, and the comparison of echogenicity without quantification does not provide much useful information7,27.

Our EIN-M, when less than zero, corresponds to the traditional term “marked hypoechogenicity”. We found that EIN-M (specificity: 93%; accuracy: 70.3%; AUC: 0.7698) was a more specific and reliable criterion for the diagnosis of malignant thyroid nodules than EIN-T (specificity: 52%; accuracy: 62.8%; AUC: 0.7043). This result is also consistent with those of other studies that found marked hypoechogenicity to be highly specific for diagnosing malignant nodules15,16.

Furthermore, in the present study, because a quantitative EIN-M value can be divided among different categories, we found it to be inversely correlated with the frequency of thyroid malignancy. When combined with other quantitative parameters, EIN-M should improve the US characterization of nodules and help to better establish risk groups and a reporting data system for thyroid lesions in the stratification of the malignant risk of nodules28,29.

Using quantitative analysis, we found that EIN-T (less than zero) had better specificity and accuracy but was less sensitive than US-E hypoechogenicity, indicating that more tumors were assessed as hypoechoic by clinicians than by the computerized system. The analysis also revealed only fair agreement between US-E hypoechogenicity and EIN-T (less than zero). This relatively low inter-observer reliability between the clinician and the computerized system was consistent with the findings of previous studies7,15. Nodules with a smaller difference between μnodule and μthyroid (low |EIN-T|) showed significantly more disagreement between the clinician and the computerized system on the definition of hypoechogenicity. This finding suggests that subtle differences in echogenicity are more readily distinguished by a computerized system than by visual assessment. EI seems more operator independent and more reproducible than the subjective US-E.

A lower EI value implies that the nodule is hypoechoic or markedly hypoechoic on grayscale sonography, which has been defined as a suspicious sonographic feature in several guidelines30,31. It reflects the fact that a larger proportion of hypoechoic and markedly hypoechoic nodules is found in the malignant group than in the benign group. The presence of hypoechogenicity, represented by EIN-T and US-E in this study, showed a relatively high sensitivity (79.6–89.8%) but a lower specificity (31.9–52.4%), whereas the presence of marked hypoechogenicity, represented by EIN-M, was very specific (93.3%) but not sensitive (33.1%). EIN-T and US-E, which compare the nodule against the thyroid parenchyma, have a higher sensitivity than EIN-M, which compares the nodule against the strap muscle, because the echo level of the thyroid parenchyma is usually much higher than that of the strap muscle. These results agree with those of a previous study15. The difference in sensitivity between EIN-T and US-E is due to the disparity between clinician perception and computer calculation: US-E is more sensitive, whereas EIN-T is more specific, for detecting malignancy. In other words, clinicians are more likely than the objective computerized index to judge the echo level of a nodule as lower than that of the surrounding thyroid parenchyma. In clinical practice, the interpretation of sonograms is subjective, inter-observer variability is unavoidable in the sonographic assessment of thyroid nodules, and sonographic interpretation is particularly affected by the operator’s experience1. Operators from a single institution with different levels of experience in thyroid imaging have been shown to exhibit significant inter-observer variability when differentiating benign and malignant thyroid nodules with grayscale sonography32,33.

With the automatic selection of the outside reference by the computer system, we can also calculate the automatic EI (EI(N-R)/R), which had an accuracy near 70% and an AUC near 0.77, consistent with the results for EIN-M. These findings suggest that manual steps in operating the software, such as selecting the reference ROI, could be further automated in the future.

Previous studies have identified certain ultrasonic features that predict follicular cancer34,35. The present study indicates that there are significant differences in the EI values between follicular adenoma and carcinoma. This result hints at a possible clinical application of EI in differentiating follicular neoplasms diagnosed by FNA cytology. A further prospective study is needed to confirm this finding.

This analysis of echogenicity can be easily and quickly performed within one minute. User-friendly quantification of ultrasound image echogenicity, as described in this paper, is feasible in routine clinical practice and can be used not only for diagnoses but also as a follow-up tool for a tumor.

Although the results obtained using this method for the quantitative measurement of ultrasonic echogenicity are promising, the diagnostic performance by this single feature is still not sufficiently accurate for diagnoses. It might be improved by combining it with other ultrasonic features of computerized methods. Therefore, future studies to combine the computerized EI values with other computerized ultrasonic features are needed.

In conclusion, most conventional US markers of malignancy have been shown to be significant; however, none has ensured both high sensitivity and high specificity. The proposed quantitative EI appears to be a promising advance over the conventional qualitative US-E, allowing a more reliable distinction between benign and malignant thyroid nodules.

Additional Information

How to cite this article: Wu, M.-H. et al. Quantitative analysis of echogenicity for patients with thyroid nodules. Sci. Rep. 6, 35632; doi: 10.1038/srep35632 (2016).