Quantitative analysis of echogenicity for patients with thyroid nodules

Hypoechogenicity has been described qualitatively and is potentially subject to intra- and inter-observer variability. The aim of this study was to clarify whether quantitative echoic indexes (EIs) are useful for the detection of malignant thyroid nodules. Overall, 333 participants with 411 nodules were included in the final analysis. Quantification of echogenicity was performed using commercial software (AmCAD-UT; AmCad BioMed, Taiwan). The coordinates of three defined regions, the nodule, thyroid parenchyma, and strap muscle regions, were recorded in the database separately for subsequent analysis. The results showed that hypoechogenicity, as defined by clinician-assessed ultrasound echogenicity (US-E), was an independent factor for malignancy. The EI, adjusted EI (EI N-T; EI N-M) and automatic EI (EI (N-R)/R) values differed significantly between benign and malignant nodules, with lower values for malignant nodules. All of the EIs showed similar sensitivity and specificity and had better accuracy than US-E. In conclusion, the proposed quantitative EI appears to be a promising advance over the conventional qualitative US-E, allowing for a more reliable distinction between benign and malignant thyroid nodules.

August 2007 to February 2011 who underwent thyroidectomy because of thyroid carcinoma, a suspicious thyroid nodule, follicular neoplasm or symptomatic nodular goiter diagnosed by ultrasound and fine needle aspiration cytology (FNA) results. The diagnoses were based on histopathological examination of the surgical specimens, which were reviewed by pathologists. Nodules larger than the array (5.2 cm) were excluded from the image assessment. Multinodular goiters without a separable nodule under ultrasound were also excluded. Therefore, 333 participants with 411 nodules were included in the final analysis.
Equipment and ultrasound procedures. All of the sonograms were acquired using a commercial ultrasound device (HDI 5000; Philips Healthcare, Bothell, WA) with a multifrequency linear probe (L12-5). The B-mode images, with a dynamic range of 170 dB, had widths equal to 5.2 cm, while the depths were at least 3.9 cm.
The procedure was performed with the participant in the supine position and the neck hyperextended. The images were captured at the maximum diameter of the nodule. Image analysis was conducted off-line on a separate computer using images in DICOM format. Quantification of echogenicity was performed using commercial software (AmCAD-UT; AmCad BioMed, Taiwan). The analysis method using the software is described below in detail.

Analysis of echogenicity.
During the analysis of ultrasound images, the boundaries of the nodules were defined by two thyroid specialists (K. Y. Chen and M. H. Wu) without knowledge of the FNA cytology or surgical pathology results. To select the references for comparison with the nodule echogenicity 20 , the regions of the strap muscle and thyroid were also manually selected by the sonographers using computer software. The coordinates of the three defined regions, the nodule, thyroid parenchyma, and strap muscle regions, were recorded in the database separately for subsequent analysis. Examples of the images with the selected regions are shown in Fig. 1.
Next, the average gray values inside the selected regions of the nodule, thyroid and muscle, denoted as μ nodule, μ thyroid and μ muscle , respectively, were calculated. For the nodule part, the anechoic area and hyperechoic foci, clinically deemed as the cyst area and calcifications, respectively, were removed before calculation of the average. The gray values of these pixels can be regarded as outliers based on our previous study 18 , and they do not contribute to the echogenicity of the nodule. The average gray value for the remaining part of the nodule (μ nodule ) was denoted as the echogenicity index of the nodule (EI N ). According to the literature 15,16,21 , μ thyroid and μ muscle can be used as references to analyze the nodule echogenicity. The ultrasound feature of the nodule can be classified as "hypoechogenicity" when μ nodule is smaller than μ thyroid or as "marked hypoechogenicity" when μ nodule is smaller than μ muscle . The differences between μ nodule and μ muscle and between μ nodule and μ thyroid were recorded, respectively, to represent the adjusted EI of the nodule and were denoted as EI N-M and EI N-T , respectively.
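The index definitions above can be illustrated with a minimal sketch. It assumes the gray values of each region are already available as plain lists, and it removes the anechoic and hyperechoic outliers with two cut-offs (`low_cut`, `high_cut`) that are hypothetical placeholders; the actual outlier rule follows the authors' previous study and is not reproduced here.

```python
from statistics import mean


def echogenicity_indexes(nodule, thyroid, muscle, low_cut, high_cut):
    """Return EI N, EI N-T and EI N-M from lists of gray values.

    low_cut and high_cut are hypothetical thresholds standing in for the
    removal of anechoic (cyst) and hyperechoic (calcification) pixels.
    """
    # Keep only nodule pixels that are neither anechoic nor hyperechoic.
    kept = [g for g in nodule if low_cut < g < high_cut]
    if not kept:
        raise ValueError("no nodule pixels left after outlier removal")

    mu_nodule = mean(kept)      # EI N
    mu_thyroid = mean(thyroid)  # reference: thyroid parenchyma
    mu_muscle = mean(muscle)    # reference: strap muscle

    ei_n_t = mu_nodule - mu_thyroid
    ei_n_m = mu_nodule - mu_muscle
    return {
        "EI_N": mu_nodule,
        "EI_N_T": ei_n_t,
        "EI_N_M": ei_n_m,
        "hypoechoic": ei_n_t < 0,           # darker than thyroid
        "markedly_hypoechoic": ei_n_m < 0,  # darker than strap muscle
    }


# Toy gray values, not study data.
result = echogenicity_indexes(
    nodule=[5, 60, 62, 58, 250],
    thyroid=[90, 100, 110],
    muscle=[40, 50, 60],
    low_cut=10,
    high_cut=200,
)
print(result)
```

For these toy values the nodule average after outlier removal is 60, so EI N-T is -40 (hypoechoic) and EI N-M is 10 (not markedly hypoechoic).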
In addition to the manually selected references described above, the commercial software provides an automatically calculated reference for the echogenicity index of the nodule. Based on the anatomic knowledge that strap muscles are located mostly in the anterior region of the neck, the anterior region is defined as the area outside the contoured nodule and above the nodule center. Moreover, only those pixels in the anterior region with a gray value smaller than the average were included and defined as the outside reference, so that its gray level resembles that of the muscle, which is generally darker than other tissue. Figure 2 shows the outside references calculated using the software for the same example images as in Fig. 1.
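The selection of the outside reference can be sketched as follows. This is an illustration under stated assumptions, not the AmCAD-UT implementation: the image and nodule contour are given as nested lists, "above the nodule center" is taken to mean a smaller row index than the mean row of the nodule mask, and the final normalization (μ nodule − μ ref)/μ ref is inferred from the name of the index, EI (N-R)/R.

```python
def anterior_pixels(image, nodule_mask):
    """Gray values outside the nodule and above (shallower than) its center.

    Assumption: the nodule center is the mean row index of the mask.
    """
    rows = [i for i, row in enumerate(nodule_mask) for inside in row if inside]
    center_row = sum(rows) / len(rows)
    return [
        image[i][j]
        for i, row in enumerate(image)
        for j, _ in enumerate(row)
        if i < center_row and not nodule_mask[i][j]
    ]


def automatic_ei(mu_nodule, anterior):
    """Automatic EI relative to the darker-than-average anterior pixels."""
    nonzero = [g for g in anterior if g > 0]
    if not nonzero:
        raise ValueError("anterior region has no non-zero pixels")
    threshold = sum(nonzero) / len(nonzero)  # average of non-zero gray values
    # Literal reading of the text: every anterior pixel darker than the average
    # is kept. Whether zero-valued pixels should be dropped is not specified.
    reference = [g for g in anterior if g < threshold]
    mu_ref = sum(reference) / len(reference)
    if mu_ref == 0:
        raise ValueError("reference average is zero; index undefined")
    return (mu_nodule - mu_ref) / mu_ref


# Toy 4x2 image: the nodule occupies the two deepest rows.
image = [[80, 20], [40, 60], [15, 15], [15, 15]]
mask = [[False, False], [False, False], [True, True], [True, True]]
ei_auto = automatic_ei(15, anterior_pixels(image, mask))
print(ei_auto)
```

A negative value means the nodule is darker than the muscle-like outside reference.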
An indicator variable R ij is defined as:

$$R_{ij} = \begin{cases} 1, & GR_{ij} < L \\ 0, & \text{otherwise} \end{cases}$$

where GR ij is the gray value of the pixel (i, j), and L is the average of the non-zero gray values of all pixels in the anterior region. The average gray value of the outside reference (μ ref ) was then calculated as follows:

$$\mu_{ref} = \frac{\sum_{i,j} R_{ij}\, GR_{ij}}{\sum_{i,j} R_{ij}}$$

where the sums run over the pixels of the anterior region. Finally, the automatic EI was obtained by

$$EI_{(N-R)/R} = \frac{\mu_{nodule} - \mu_{ref}}{\mu_{ref}}$$

Statistical analysis. Statistical analysis was performed using a software package (SPSS, version 12.0 for Windows; SPSS, Chicago, Ill.). Fisher's exact test was used for comparisons of two binary variables, and Student's t test was used for comparisons of quantitative variables. The ultrasound features were compared with the histological diagnosis results to determine the sensitivity, specificity, negative predictive value, and positive predictive value. A p value less than 0.05 was considered to indicate statistical significance. A receiver operating characteristic (ROC) curve was also generated, and the area under the curve (AUC) was calculated to determine the diagnostic performance of the quantitative EI. In addition, the variables that were significant in the univariate logistic regression model were entered into a multiple logistic regression analysis to determine independent US predictors for malignancy. Inter-observer agreement was assessed for US characteristics using the Cohen kappa statistic. Kappa values were interpreted as follows: 0.00-0.20, slight agreement; 0.21-0.40, fair agreement; 0.41-0.60, moderate agreement; 0.61-0.80, substantial agreement; and 0.81-1.00, almost perfect agreement 15,22 .
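As an illustration of the agreement statistic, the Cohen kappa for two raters can be computed from paired labels as sketched below. The function names and the toy ratings are assumptions for the example; the study itself used SPSS.

```python
def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    # Observed agreement: fraction of items given the same label.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    p_expected = sum(
        (rater_a.count(lab) / n) * (rater_b.count(lab) / n) for lab in labels
    )
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


def interpret_kappa(kappa):
    """Map a kappa value to the agreement categories listed in the text."""
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Toy ratings (1 = hypoechoic, 0 = not hypoechoic), not study data.
clinician = [1, 1, 1, 0, 0, 0, 1, 0]
computer = [1, 1, 0, 0, 0, 1, 1, 0]
kappa = cohen_kappa(clinician, computer)
print(kappa, interpret_kappa(kappa))
```

In this toy case the raters agree on 6 of 8 items while chance agreement is 0.5, giving a kappa of 0.5.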

Conventional ultrasound features of the benign and malignant nodules. Of the malignant nodules, 56.1% were smaller than 2 cm, 89.8% were US-E hypoechoic, 82.17% had an irregular margin, 43.31% had microcalcification, and 96.82% were heterogeneous. All of these ultrasound features showed significant differences between the malignant and benign tumors (Table 1).

EI values of benign and malignant nodules (Table 2).
In a univariate logistic regression analysis, both US-E hypoechogenicity and low EIs (cut-off set at the median or at zero) were statistically significant predictors of thyroid malignancy (Table 3).
US-E hypoechogenicity or EI N-T less than zero (equivalent to the conventional definition of "hypoechogenicity") was combined with the other significant features in a multiple logistic regression analysis to determine independent US predictors for malignancy. Each of them (US-E-defined hypoechogenicity and EI N-T less than zero) was an independent predictor of thyroid malignancy (ORs: 3.51 and 3.69, respectively).
Diagnostic performance of EIs and conventional ultrasound features. US-E hypoechogenicity had a sensitivity of 89.8%, a specificity of 31.9% and an accuracy of 54% in the diagnosis of malignant nodules. EI N-T less than zero had a sensitivity of 79.6%, a specificity of 52.4% and an accuracy of 62.8%. Among the EIs, EI N-M less than zero, equivalent to the conventional definition of "marked hypoechogenicity", had the highest accuracy at 70.3% (Table 4).
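The performance measures reported here follow from a 2x2 table of the index test against histology. The sketch below shows the arithmetic for a cut-off rule "EI below the cut-off is called malignant"; the function name and the eight toy nodules are illustrative assumptions, not study data, and both classes are assumed to be present.

```python
def diagnostic_performance(ei_values, is_malignant, cutoff=0.0):
    """Sensitivity, specificity, accuracy, PPV and NPV for 'EI < cutoff'."""
    tp = fp = tn = fn = 0
    for ei, malignant in zip(ei_values, is_malignant):
        predicted_malignant = ei < cutoff
        if predicted_malignant and malignant:
            tp += 1
        elif predicted_malignant and not malignant:
            fp += 1
        elif not predicted_malignant and malignant:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Toy values: three malignant and five benign nodules.
ei = [-30, -10, 5, -5, 20, 15, 12, 8]
malignant = [True, True, True, False, False, False, False, False]
performance = diagnostic_performance(ei, malignant, cutoff=0.0)
print(performance)
```

Moving the cut-off toward lower EI values trades sensitivity for specificity, which is the pattern seen when the reference changes from thyroid parenchyma to strap muscle.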

Agreement of the Echogenicity Characteristic of the Thyroid Nodules.
Among the 411 nodules in our study, US-E and EI N-T disagreed on the echogenicity of 138 nodules. We compared hypoechogenicity as defined by the computer system (EI N-T less than zero) with that defined by the clinician (US-E) and found slight agreement (kappa value 0.25). The mean |EI N-T | of nodules on which the two definitions of hypoechogenicity disagreed was significantly lower than that of nodules on which they agreed (p < 0.0001).
Because the strap muscle is thought to be a relatively consistent and reliable reference, we further classified the nodules into four groups according to the quartiles of the EI N-M value. Figure 3 shows the prevalence of cancer in the four EI N-M groups; the prevalence of malignancy increased significantly as the value of EI N-M decreased.
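The quartile grouping can be sketched as below. The split is rank-based, which is an assumption made for this example: the quartile convention and tie handling used in the study are not stated. The toy values are not study data.

```python
def prevalence_by_quartile(ei_values, is_malignant):
    """Malignancy prevalence in four rank-based groups of increasing EI."""
    n = len(ei_values)
    # Sort nodule indices from the lowest to the highest EI value.
    order = sorted(range(n), key=lambda k: ei_values[k])
    groups = [[], [], [], []]
    for rank, k in enumerate(order):
        groups[rank * 4 // n].append(is_malignant[k])
    # Fraction of malignant nodules in each group (None if a group is empty).
    return [sum(g) / len(g) if g else None for g in groups]


# Toy data: eight nodules, lower EI N-M more often malignant.
ei_n_m = [-40, -30, -20, -10, 0, 10, 20, 30]
malignant = [True, True, True, False, False, True, False, False]
prevalence = prevalence_by_quartile(ei_n_m, malignant)
print(prevalence)
```

The first entry corresponds to the lowest-EI quartile; a prevalence that falls from the first to the last entry is the inverse relationship described above.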
EIs with different histology. The EI N-M and automatic EI (EI (N-R)/R ) values for the different histological types are shown in Fig. 4. The values were high in follicular adenoma and nodular goiter and low in papillary thyroid carcinoma (PTC) and follicular thyroid carcinoma (FTC). There were significant differences among the follicular neoplasms, including between follicular adenoma and follicular carcinoma.

Discussion
We proposed a computerized method to evaluate ultrasound echogenicity quantitatively. In our study, the EI values differed significantly between the benign and malignant nodules. The results of this quantitative evaluation also supported the usefulness of echogenicity in the diagnosis of thyroid nodules. To our knowledge, this is the first study to report that the quantitative measurement of ultrasound echogenicity using a computerized method could be a helpful approach in the diagnosis of thyroid nodules.
The presence of microcalcifications, hypoechogenicity, irregular margins, and a solid composition with a heterogeneous pattern suggests a malignant potential for thyroid nodules 3,5,23,24 . However, the sensitivity and specificity of these US findings varied in the literature 5,25 . Additionally, there is usually no standardized lexicon or terminology for characterizing these conventional US features 7,13 , leading to poor reliability for some features such as the echogenicity, pattern of composition and border 7,26 . In addition, differences in clinical experience and in the interpretation of these findings lead to variable diagnostic accuracy.
We found in the current study that, among the clinician-assessed features, US-E hypoechogenicity and microcalcification, rather than irregular margin and a heterogeneous pattern, were independent predictors for malignancy. The frequency of US-E hypoechogenicity was significantly different between benign and malignant nodules, and the majority (89.8%) of malignant nodules were US-E hypoechoic. Among the US markers studied, US-E hypoechogenicity had the highest OR. This is consistent with the findings of Moon et al. 15 . Additionally, EI N-T , calculated by the computer system, when less than zero, has the same meaning as traditionally defined hypoechogenicity. Furthermore, we found EI N-T to be an independent predictive factor for thyroid malignancy. We thus confirmed the importance of echogenicity using both qualitative and quantitative methods.
Echogenicity has traditionally been assessed and described by clinician judgment. Because both benign and malignant thyroid nodules exhibit a hypoechoic pattern to different degrees, it is difficult to detect subtle differences by qualitative assessment. Most US-E hypoechoic nodules are benign considering the high prevalence of benign lesions 14 , and the comparison of echogenicity without quantification does not provide much useful information 7,27 . Our EI N-M , when less than zero, corresponds to the traditional term "marked hypoechogenicity". We found that EI N-M (specificity: 93%; accuracy: 70.3%; AUC: 0.7698) was a more specific and reliable criterion for the diagnosis of malignant thyroid nodules than EI N-T (specificity: 52%; accuracy: 62.8%; AUC: 0.7043). This result is also consistent with those in other studies that found hypoechogenicity to be highly specific for diagnosing malignant nodules 15,16 .
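The area under the ROC curve quoted for each index can be read as the probability that a randomly chosen malignant nodule has a lower EI than a randomly chosen benign one (the Mann-Whitney formulation of the AUC). A minimal sketch of that calculation follows; the function name and toy values are assumptions for illustration, not study data.

```python
def auc_lower_is_malignant(ei_malignant, ei_benign):
    """AUC for the rule 'lower EI suggests malignancy' via pairwise comparison."""
    wins = 0.0
    for m in ei_malignant:
        for b in ei_benign:
            if m < b:
                wins += 1.0   # malignant nodule correctly ranked as darker
            elif m == b:
                wins += 0.5   # ties count as half
    return wins / (len(ei_malignant) * len(ei_benign))


# Toy EI values for three malignant and five benign nodules.
auc = auc_lower_is_malignant([-30, -10, 5], [-5, 20, 15, 12, 8])
print(auc)
```

An AUC of 0.5 corresponds to no discrimination, and 1.0 to every malignant nodule having a lower EI than every benign nodule.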
Furthermore, because the quantitative EI N-M value can be divided into categories, we were able to show in the present study that it is inversely related to the frequency of thyroid malignancy. When combined with other quantitative parameters, EI N-M should improve the US characterization of nodules and help to better establish risk groups and a reporting data system for stratifying the malignancy risk of thyroid nodules 28,29 .
Using quantitative analysis, we found that EI N-T (less than zero) had better specificity and accuracy but was less sensitive than US-E hypoechogenicity, indicating that more tumors were assessed as hypoechoic by clinicians than by the computerized system. The analysis also revealed that US-E hypoechogenicity and EI N-T (less than zero) showed a slight agreement. This relatively low inter-rater reliability between the clinician and the computerized system was consistent with the findings of previous studies 7,15 . Nodules with a smaller difference between μ nodule and μ thyroid (low |EI N-T |) showed significantly more disagreement between the clinician and the computerized system on the definition of hypoechogenicity. This finding indicates that subtle differences can only be differentiated by computer systems. EI seems more operator independent and more reproducible than the subjective US-E.
A lower EI value implies that the nodule is hypoechoic or markedly hypoechoic on grayscale sonography, which has been defined as a suspicious sonographic feature in several guidelines 30,31 . It reflects the fact that a larger proportion of hypoechoic and markedly hypoechoic nodules are found in the malignant group than in the benign group. The presence of hypoechogenicity, represented by EI N-T and US-E in this study, showed a relatively high sensitivity (79.6-89.8%) but a lower specificity (31.9-52.4%), while the presence of marked hypoechogenicity, represented by EI N-M in this study, was very specific (93.3%) but not sensitive (33.1%). EI N-T and US-E, in which the comparison is made against the thyroid parenchyma, have a higher sensitivity than EI N-M , in which the comparison is made against the strap muscle, because the echo level of the thyroid parenchyma is usually much higher than that of the strap muscle. The results also agree with the previous study 15 . The difference in sensitivity between EI N-T and US-E is due to the disparity between clinician perception and the computer calculation: US-E is more sensitive, while EI N-T is more specific, for detecting malignancy. In other words, clinicians are more likely than the objective computerized index to judge the echo level of a nodule as lower than that of the surrounding thyroid parenchyma. In clinical practice, the interpretation of sonograms is subjective, inter-observer variability is unavoidable in the sonographic assessment of thyroid nodules, and sonographic interpretation is particularly affected by the experience of the operator 1 . Significant inter-observer variability has been shown among operators from a single institution with different levels of experience in thyroid imaging when differentiating benign and malignant thyroid nodules with grayscale sonography 32,33 .
With the automatic selection of the outside reference by the computer system, we can also calculate the automatic EI (N-R)/R , which had an accuracy near 70% and an AUC near 0.77, consistent with the result for EI N-M . These findings also suggest that manual steps in operating the software, such as selecting the ROI of the reference, could be further automated in the future.
Previous studies have identified certain ultrasonic features that predict follicular cancer 34,35 . The present study indicates that there are significant differences in the EI values between follicular adenoma and carcinoma. The result hints at a possible clinical application of EI in differentiating follicular neoplasms diagnosed by FNA cytology. A further prospective study will be needed to confirm this finding.
This analysis of echogenicity can be performed easily and quickly, within one minute. User-friendly quantification of ultrasound image echogenicity, as described in this paper, is feasible in routine clinical practice and can be used not only for diagnosis but also as a tool for the follow-up of a tumor.
Although the results obtained using this method for the quantitative measurement of ultrasonic echogenicity are promising, the diagnostic performance of this single feature is still not sufficiently accurate for diagnosis. It might be improved by combining it with other computerized ultrasonic features. Therefore, future studies that combine the computerized EI values with other computerized ultrasonic features are needed.
In conclusion, most conventional US markers of malignancy have been proven to be significant; however, none has ensured both high sensitivity and high specificity. The proposed quantitative EI appears to be a promising advance over the conventional qualitative US-E, allowing for a more reliable distinction between benign and malignant thyroid nodules.