A beneficial role of computer-aided diagnosis system for less experienced physicians in the diagnosis of thyroid nodule on ultrasound

Ultrasonography (US) is the primary diagnostic tool for thyroid nodules, but its accuracy is operator-dependent. It is widely used not only by radiologists but also by physicians with different levels of experience. The aim of this study was to investigate whether US with computer-aided diagnosis (CAD) can assist physicians in the diagnosis of thyroid nodules. A total of 451 thyroid nodules evaluated by fine-needle aspiration cytology and subsequent surgery were included, of which 300 (66.5%) were diagnosed as malignant. Physicians with less than 1 year (inexperienced, n = 10) or more than 5 years (experienced, n = 3) of US experience reviewed the US images of thyroid nodules with and without CAD assistance. The diagnostic performance of CAD was comparable to that of the experienced group and better than that of the inexperienced group. The AUC of CAD was higher for conventional papillary thyroid carcinoma (PTC) than for follicular thyroid carcinoma (FTC) and follicular variant PTC (0.925 vs. 0.499), independent of tumor size. CAD assistance significantly improved diagnostic performance in the inexperienced group but not in the experienced group. In conclusion, the CAD system showed good performance in the diagnosis of conventional PTC, and CAD assistance improved the diagnostic performance of physicians less experienced in US, especially in the diagnosis of conventional PTC.

www.nature.com/scientificreports/
Computer-aided diagnosis (CAD) systems have been developed and applied to US diagnostics in various medical fields, keeping pace with rapid advances in machine learning. Several recent studies showed that the diagnostic performance of machine-learning-based US CAD systems was comparable to that of expert radiologists [11][12][13][14][15] . However, a meta-analysis including four studies of the Samsung CAD system and one study of an independently developed CAD system from China demonstrated that the specificity and diagnostic odds ratio of CAD were lower than those of experienced radiologists, while its sensitivity was similar 16 . We recently developed another US CAD system for thyroid nodule diagnosis using a deep convolutional neural network (CNN) model 17 . This system showed diagnostic performance comparable to or higher than that of expert radiologists; however, its performance requires further validation in various clinical settings, and its appropriate clinical use remains to be explored.
Thyroid nodules are a common medical problem, and US is widely used to diagnose them not only by expert radiologists in hospitals but also by physicians in primary care clinics. However, whether the US CAD system benefits less experienced physicians or the primary care setting has not yet been fully studied. The aim of this study was to investigate the potential benefits of the US CAD system in the diagnosis of thyroid nodules for less experienced physicians.

Results
Clinical characteristics of thyroid nodules. The clinical characteristics of the thyroid nodules are presented in Table 1. Of the 451 enrolled thyroid nodules, 300 (66.5%) were surgically confirmed as malignant. Compared with benign nodules, malignant nodules were more frequently found in male patients (29.3% vs. 15.2%, p = 0.001) and were smaller on average (1.81 ± 1.0 vs. 2.52 ± 1.2 cm, p < 0.001). Mean patient age at diagnosis was similar between groups. The thyroid cancers were categorized as conventional papillary thyroid carcinoma (cPTC), follicular variant papillary thyroid carcinoma (fvPTC), follicular thyroid carcinoma (FTC), medullary thyroid carcinoma, poorly differentiated thyroid carcinoma, or anaplastic thyroid carcinoma. cPTC, fvPTC, and FTC accounted for 83.7%, 7.0%, and 7.3% of the malignant nodules, respectively. Tumor size was < 2 cm in 78.9% of cPTCs, whereas 65.1% of FTCs and fvPTCs combined (FTC/fvPTC) measured ≥ 2 cm (p < 0.001, Supplementary Table S1). Of the benign nodules, 38.4% were follicular adenomas, 31.8% were nodular hyperplasia, 23.8% were noninvasive follicular thyroid neoplasms with papillary-like nuclear features (NIFTP), and 6.0% were other benign lesions.
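The paragraph above reports p-values for group comparisons without naming the tests. As a hedged illustration only, the sketch below runs a Pearson chi-square test (no continuity correction) on a 2×2 table of sex versus final diagnosis. The counts (88 of 300 malignant and 23 of 151 benign nodules from male patients) are an assumption: they are back-calculated from the reported 29.3% and 15.2%, not reported by the authors, and the function name is hypothetical.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, two-sided p-value) with 1 degree of freedom.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 df the statistic is a squared standard normal,
    # so P(chi2 > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


# Rows: male, female; columns: malignant, benign.
# Hypothetical counts back-calculated from the reported percentages.
male_malignant, male_benign = 88, 23
female_malignant, female_benign = 300 - 88, 151 - 23
stat, p = chi_square_2x2(male_malignant, male_benign, female_malignant, female_benign)
print(f"chi2 = {stat:.2f}, p = {p:.4f}")
```

With these reconstructed counts the statistic is about 10.8 and p is about 0.001, in line with the reported value; a Yates-corrected or exact test would give a slightly different p.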
Diagnostic performance of thyroid US CAD. The diagnostic performance of the CAD system is presented in Table 2 and Fig. 1. Overall, the AUC was 0.855 (Fig. 1A), and the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy were 85.3%, 63.6%, 82.3%, 68.6%, and 78.0%, respectively (Table 2). In the subgroup analysis, the CAD system showed a higher AUC for thyroid nodules < 2 cm than for larger nodules (≥ 2 cm) (0.895 vs. 0.751, Fig. 1B). Because tumor size differed between histologic subtypes (Supplementary Table S1), we then analyzed the diagnostic performance of the CAD system according to histologic subgroup. Compared with FTC/fvPTC, cPTC showed a higher AUC (0.925 vs. 0.499, Fig. 1D,E). For cPTC, the CAD system also showed higher sensitivity (94.4% vs. 34.9%), PPV (85.3% vs. 26.8%), NPV (84.1% vs. 72.5%), and accuracy (85.0% vs. 56.3%). Interestingly, within the cPTC group, the diagnostic performance of the CAD system was similar regardless of size (AUC, 0.919 for nodules < 2 cm, Fig. 2A; 0.907 for nodules ≥ 2 cm, Fig. 2B).
Diagnostic performance of physicians before and after CAD assistance. Next, the diagnostic performance of the CAD system was compared with that of physicians divided into two groups according to their level of US experience (inexperienced and experienced) (Table 3). CAD assistance significantly improved the diagnostic performance of the inexperienced group, whereas no significant improvement was observed in the experienced group (Table 3).
A subgroup analysis was performed according to thyroid cancer subtype. The AUC of the physicians was higher for cPTC than for FTC/fvPTC (0.737–0.902 vs. 0.437–0.605). For cPTC, CAD assistance significantly improved the AUC for most physicians in the inexperienced group and for some in the experienced group; for FTC/fvPTC, no significant improvement with CAD assistance was observed (Supplementary Tables S2 and S3).
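All of the threshold-based metrics above derive from a single 2×2 confusion matrix, while the AUC summarizes how well the continuous CAD score ranks malignant above benign nodules across all thresholds. The sketch below makes both explicit. The confusion-matrix counts are an assumption, back-calculated from the 300 malignant and 151 benign nodules and the reported sensitivity and specificity; the rank-based (Mann-Whitney) AUC is one standard estimator, and the text does not say which software or method the authors used. Class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # malignant nodules called malignant
    fn: int  # malignant nodules called benign
    tn: int  # benign nodules called benign
    fp: int  # benign nodules called malignant


def metrics(c: Confusion) -> dict[str, float]:
    """Threshold-based diagnostic metrics, expressed as percentages."""
    total = c.tp + c.fn + c.tn + c.fp
    return {
        "sensitivity": 100 * c.tp / (c.tp + c.fn),
        "specificity": 100 * c.tn / (c.tn + c.fp),
        "ppv": 100 * c.tp / (c.tp + c.fp),
        "npv": 100 * c.tn / (c.tn + c.fn),
        "accuracy": 100 * (c.tp + c.tn) / total,
    }


def auc_mann_whitney(malignant: list[float], benign: list[float]) -> float:
    """Rank-based AUC: probability that a randomly chosen malignant nodule
    scores higher than a randomly chosen benign one (ties count 0.5)."""
    wins = 0.0
    for s_pos in malignant:
        for s_neg in benign:
            if s_pos > s_neg:
                wins += 1.0
            elif s_pos == s_neg:
                wins += 0.5
    return wins / (len(malignant) * len(benign))


# Counts back-calculated (assumption) from 300 malignant / 151 benign nodules
# and the reported sensitivity (85.3%) and specificity (63.6%).
overall = Confusion(tp=256, fn=44, tn=96, fp=55)
for name, value in metrics(overall).items():
    print(f"{name}: {value:.1f}%")
```

With these counts the function reproduces the reported 85.3%, 63.6%, 82.3%, 68.6%, and 78.0%, which is a useful internal consistency check on the values above. An AUC of 0.5 (for example, `auc_mann_whitney([0.5], [0.5])`) means the score separates the classes no better than chance, which is the context for the 0.499 reported for FTC/fvPTC.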

Discussion
In this study, the diagnostic performance for assessing the malignancy risk of thyroid nodules on US was compared between the CAD system and physicians with various levels of US experience, and the role of CAD assistance for physicians who were not board-certified radiologists was investigated. The AUC of the CAD system was 0.855 for all thyroid nodules and 0.925 for nodules diagnosed as cPTC, much higher than the AUC for nodules diagnosed as FTC/fvPTC. The diagnostic performance of physicians with less US experience was significantly lower than that of the CAD system, and CAD assistance improved their performance. Collectively, the present study demonstrated the beneficial role of US CAD assistance for physicians with limited US training.
US is the most sensitive and widely used diagnostic tool for thyroid nodule assessment. Malignant nodules (especially PTCs) have characteristic US features in terms of echogenicity, solidity, orientation, and the presence of microcalcification 3,18 . Nonetheless, the reported diagnostic value of US varies considerably across studies, with high inter-performer and inter-observer variability. Although several guidelines have been established by professional societies 2-6 , high inter-observer variability has been observed even among board-certified radiologists (κ = 0.51) 10 .

Table 3. Diagnostic performance of physicians with different levels of experience before and after CAD assistance. ACR-TIRADS 4 was used as the cut-off to calculate the diagnostic performance of physicians. CAD, computer-aided diagnosis; Before, physicians before CAD assistance; After, physicians after CAD assistance; PPV, positive predictive value; NPV, negative predictive value. Pᵃ, CAD vs. before; Pᵇ, before vs. after.
Table 3 columns: Inexperienced (Before (%), After (%), Pᵃ, Pᵇ); Experienced (Before (%), After (%), Pᵃ, Pᵇ).

… 14,15,19 . The present study demonstrated that a US CAD system built with a deep learning method (CNN) can provide useful diagnostic assistance to less experienced physicians. To mimic real-world practice, this study recruited physicians rather than radiologists and divided them into two groups according to their years of US experience. Although the experienced group was small (n = 3), diagnostic performance differed significantly between the inexperienced and experienced groups.
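The inter-observer figure quoted in the Discussion (κ = 0.51) is a chance-corrected agreement statistic. The sketch below computes it for the simplest case, assuming two raters and unweighted Cohen's kappa; multi-reader studies often use Fleiss' kappa instead, and ordinal categories such as TIRADS levels are sometimes analysed with a weighted kappa. The reader labels are hypothetical and not taken from this study.

```python
import math
from collections import Counter


def cohens_kappa(rater_1: list[str], rater_2: list[str]) -> float:
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_1) != len(rater_2) or not rater_1:
        raise ValueError("ratings must be paired and non-empty")
    n = len(rater_1)
    # Observed agreement: share of items given the same category by both raters.
    p_obs = sum(a == b for a, b in zip(rater_1, rater_2)) / n
    # Expected (chance) agreement from each rater's marginal category frequencies.
    c1, c2 = Counter(rater_1), Counter(rater_2)
    p_exp = sum(c1[k] * c2[k] for k in c1.keys() | c2.keys()) / n**2
    if p_exp == 1.0:
        return math.nan  # both raters used one identical category: kappa undefined
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical ACR-TIRADS levels assigned by two readers to the same 10 nodules.
reader_a = ["TR4", "TR5", "TR3", "TR4", "TR5", "TR2", "TR4", "TR5", "TR3", "TR4"]
reader_b = ["TR4", "TR4", "TR3", "TR5", "TR5", "TR2", "TR4", "TR5", "TR4", "TR4"]
kappa = cohens_kappa(reader_a, reader_b)
print(f"kappa = {kappa:.2f}")
```

Here the readers agree on 7 of 10 nodules (0.70), but 0.32 agreement is expected by chance from their marginal frequencies, so κ = (0.70 − 0.32) / (1 − 0.32) ≈ 0.56: raw agreement overstates how consistent the readers are.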
Since US is widely used both by well-trained radiologists and by physicians in general clinics, the present study represents a first step toward verifying the clinical use of the US CAD system. However, several points need to be considered before applying the present CAD system in primary care practice. In the development and validation of the current CAD system, both the training set and the study set of nodules were drawn from a tertiary referral hospital, whose case mix differs from that of primary care. Furthermore, all nodules enrolled in the present study were surgically diagnosed, which may introduce selection bias. Thus, further study is needed in primary care settings. In daily practice, we generally use K-TIRADS, which is based on a short decision-tree model, because it is easy and fast. However, to compare diagnostic performance with and without CAD assistance, we used ACR-TIRADS, a point-based scoring system with scores ranging from 0 to 14, because it showed the best sensitivity among the various TIRADS 20 . Further study is needed to determine whether CAD assistance can be widely applied in settings that use other TIRADS.
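ACR-TIRADS, as described above, adds points across five feature categories and maps the total to a TR level, and this study used ACR-TIRADS 4 as the cut-off for calling a nodule positive. The sketch below illustrates that scoring. It assumes the point values of the 2017 ACR TI-RADS white paper and reads "TR4 as cut-off" as TR4 or higher = suspicious; the category keys and function names are hypothetical, and the point table should be checked against the official chart before any real use.

```python
# Point values follow the ACR TI-RADS chart (2017 white paper); this is a
# simplified illustration and should be checked against the official chart.
COMPOSITION = {"cystic": 0, "spongiform": 0, "mixed": 1, "solid": 2}
ECHOGENICITY = {"anechoic": 0, "hyperechoic": 1, "isoechoic": 1,
                "hypoechoic": 2, "very_hypoechoic": 3}
SHAPE = {"wider_than_tall": 0, "taller_than_wide": 3}
MARGIN = {"smooth": 0, "ill_defined": 0, "lobulated_or_irregular": 2,
          "extrathyroidal_extension": 3}
ECHOGENIC_FOCI = {"none": 0, "large_comet_tail": 0, "macrocalcification": 1,
                  "peripheral_rim": 2, "punctate": 3}


def acr_points(composition: str, echogenicity: str, shape: str,
               margin: str, foci: list[str]) -> int:
    """Total ACR TI-RADS points for one nodule."""
    if composition == "spongiform":
        return 0  # chart note: no further points are added for spongiform nodules
    # Echogenic foci are "choose all that apply": points for each type present are summed.
    foci_points = sum(ECHOGENIC_FOCI[f] for f in set(foci))
    return (COMPOSITION[composition] + ECHOGENICITY[echogenicity]
            + SHAPE[shape] + MARGIN[margin] + foci_points)


def tr_level(points: int) -> int:
    """Map total points to a TR level (TR1 to TR5)."""
    if points >= 7:
        return 5
    if points >= 4:
        return 4
    if points == 3:
        return 3
    if points == 2:
        return 2
    return 1  # 0 points; a 1-point total is grouped with TR1 here (assumption)


def physician_call(points: int, cutoff_level: int = 4) -> int:
    """1 = suspicious for malignancy if the TR level reaches the cut-off (TR4 assumed)."""
    return int(tr_level(points) >= cutoff_level)


# A solid, hypoechoic, wider-than-tall nodule with smooth margins and punctate foci.
pts = acr_points("solid", "hypoechoic", "wider_than_tall", "smooth", ["punctate"])
print(pts, tr_level(pts), physician_call(pts))
```

In the example, 2 + 2 + 0 + 0 + 3 = 7 points gives TR5, which is above the TR4 cut-off and therefore counted as a malignant call. A point-based design like this lets each feature contribute independently, whereas a decision-tree system such as K-TIRADS assigns categories from feature combinations.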
The present study also has several limitations. First, the US CAD system was originally developed using nodules 1 cm or larger, so it cannot be applied to nodules smaller than 1 cm. Although the present study showed excellent CAD performance for the diagnosis of cPTC regardless of nodule size, an expanded CAD system will be needed for micro-nodules (< 1 cm), which are being identified with increasing frequency. Second, the CAD system offered no benefit for the diagnosis of FTC/fvPTC. Its AUC for FTC/fvPTC was 0.499, essentially chance level and similar to that of physicians regardless of experience. Unlike those of cPTC, the US characteristics of FTC/fvPTC are very heterogeneous and non-specific [21][22][23][24] , and contribute little to preoperative diagnosis 25 . In addition, the US CAD system used in the present study was trained on PTC-dominant material, as 96.5% of its nodules were PTCs. A challenge for further research is to develop a more advanced CAD system using artificial intelligence with sufficient data on FTC/fvPTCs.
In conclusion, the CAD system showed good diagnostic performance and had a beneficial assistive role for physicians with less US experience in assessing the malignancy risk of thyroid nodules, especially PTCs. Therefore, this US CAD system can be a useful tool to assist less experienced physicians in settings where PTC predominates.

Materials and methods
Study population. A total of 5581 US images of thyroid nodules from 4143 patients who had undergone fine-needle aspiration (FNA) at the Department of Endocrinology, Seoul National University Hospital, from April 2014 to June 2019 were consecutively recruited and reviewed. The inclusion criteria were as follows: (i) patients ≥ 20 years of age, (ii) a maximal nodule diameter ≥ 1 cm, and (iii) patients whose nodules were pathologically confirmed by surgery. Finally, 451 thyroid nodules were enrolled (Fig. 3). Thirteen physicians with various levels of US experience, none of them board-certified radiologists, were recruited from three referral hospitals. Ten were general physicians with less than 1 year of US experience (designated the 'inexperienced group'), and three were endocrinology faculty members with more than 5 years of experience in thyroid US imaging and FNA procedures (designated the 'experienced group'). This study was approved by the Institutional Review Board of Seoul National University Hospital (IRB No. 1911-039-1076). Written informed consent was obtained from each patient after full explanation of the purpose and nature of all procedures used. All methods were carried out in accordance with relevant guidelines and regulations.

Table 4. Comparisons of diagnostic performance between CAD and physicians before and after CAD assistance according to pathologic subtype. ACR-TIRADS 4 was used as the cut-off to calculate the diagnostic performance of physicians. CAD, computer-aided diagnosis; Before, physicians before CAD assistance; After, physicians after CAD assistance; PPV, positive predictive value; NPV, negative predictive value; cPTC, conventional papillary thyroid carcinoma; FTC, follicular thyroid carcinoma; fvPTC, follicular variant papillary thyroid carcinoma. Pᵃ, CAD vs. before; Pᵇ, before vs. after.
Table 4 columns: Inexperienced (Before (%), After (%), Pᵃ, Pᵇ); Experienced (Before, After).
Cytologic and histologic evaluation of thyroid nodules. Following the recommendations of the Korean Thyroid Imaging Reporting and Data System (K-TIRADS) 5 , all nodules were evaluated by US, and FNA was performed on suspicious nodules by experienced physicians. Cytology results were reported using the Bethesda System for Reporting Thyroid Cytopathology 26 by an expert pathologist with more than 10 years of experience at a tertiary hospital. Surgery was performed in patients with Bethesda categories IV, V, and VI. Patients with nodules in Bethesda categories II or III also underwent surgery if the nodule was large, if another nodule was confirmed as malignant, or if compressive symptoms were present. Noninvasive follicular thyroid neoplasm with papillary-like nuclear features (NIFTP) was classified as a benign lesion.
Ultrasonography examinations. US examinations were performed using high-resolution ultrasound machines (LOGIQ7; GE Healthcare, Milwaukee, WI, USA or Affinity 50G; Philips Healthcare, Bothell, WA, USA), each equipped with a linear high-frequency transducer (5–14 MHz). After patient screening, we selected a representative image of each thyroid nodule in which the elements of TIRADS (composition, echogenicity, shape, margin, echogenic foci) were clearly visible (Supplementary Fig. S1) and saved it as a JPEG file. A square region of interest for each nodule was drawn by an expert radiologist (J.Y.K.). After the CAD system calculated the cancer probability, the US images were reviewed by the 13 physicians, who scored each image twice using ACR-TIRADS 2 . First, each image was displayed for 30 s and scored without CAD assistance. Immediately after the first scoring, the CAD result, dichotomized as cancer (1) or benign (0), was shown to the physicians, who reviewed the same image again for 30 s and re-scored it. All physicians were blinded to the patients' clinical information and pathology results.
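This reading protocol yields two paired calls per physician per nodule, one before and one after seeing the CAD output. The section does not state which test produced the before-versus-after p-values; for paired binary outcomes (correct or incorrect call on the same nodule), McNemar's test is one standard option, while AUC differences are often compared with DeLong's method instead. The sketch below shows an exact McNemar test under that assumption; function names and counts are hypothetical.

```python
import math


def discordant_counts(truth: list[int], before: list[int],
                      after: list[int]) -> tuple[int, int]:
    """Count paired reads whose correctness changed after CAD assistance.

    b = correct before, wrong after; c = wrong before, correct after.
    Restrict the inputs to malignant (or benign) nodules to compare
    sensitivity (or specificity) rather than overall accuracy.
    """
    b = sum(1 for t, x, y in zip(truth, before, after) if x == t and y != t)
    c = sum(1 for t, x, y in zip(truth, before, after) if x != t and y == t)
    return b, c


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value: under H0 the b + c discordant
    pairs split as Binomial(b + c, 0.5); concordant pairs are ignored."""
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical: 3 calls got worse and 12 got better after seeing the CAD result.
print(f"p = {mcnemar_exact(3, 12):.4f}")
```

Only the discordant pairs carry information about a before/after difference, which is why nodules called correctly (or incorrectly) on both reads do not enter the test.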

US CAD system. To evaluate malignancy risk, we used our CAD system developed with a deep CNN model. The detailed development protocol of the US CAD system has been published previously 17 . Briefly, the algorithm was trained using 13,560 US images of thyroid nodules that were surgically or cytologically proven to be benign or malignant. For internal and external validation, surgically confirmed thyroid nodules were obtained from three tertiary hospitals, and these tests showed that the diagnostic performance of the CAD system was comparable to or higher than that of expert radiologists. When a US image is input, the CAD system outputs a cancer probability (%) and classifies the image as malignant or benign using a cut-off of 50% probability of malignancy.
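The CAD system outputs a continuous cancer probability that is then thresholded at 50%. The sketch below shows that dichotomization and a sweep over alternative cut-offs, illustrating the sensitivity and specificity trade-off that the ROC curve (and hence the AUC) summarizes. Whether a probability of exactly 50% counts as malignant is not stated, so using `>=` is an assumption; function names and example probabilities are hypothetical.

```python
def cad_call(prob_pct: float, cutoff_pct: float = 50.0) -> int:
    """Dichotomize the CAD cancer probability (in %): 1 = malignant, 0 = benign.
    Treating exactly 50% as malignant (>=) is an assumption."""
    if not 0.0 <= prob_pct <= 100.0:
        raise ValueError("probability must be between 0 and 100")
    return int(prob_pct >= cutoff_pct)


def roc_points(probs: list[float], labels: list[int],
               cutoffs: list[float]) -> list[tuple[float, float, float]]:
    """(cut-off, sensitivity, specificity) for each candidate cut-off."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for cutoff in cutoffs:
        calls = [cad_call(p, cutoff) for p in probs]
        tp = sum(1 for y, k in zip(labels, calls) if y == 1 and k == 1)
        tn = sum(1 for y, k in zip(labels, calls) if y == 0 and k == 0)
        points.append((cutoff, tp / positives, tn / negatives))
    return points


# Hypothetical CAD outputs for six nodules (label 1 = malignant on pathology).
probs = [92.0, 71.5, 48.0, 30.0, 55.0, 12.0]
labels = [1, 1, 1, 0, 0, 0]
for cutoff, sens, spec in roc_points(probs, labels, [40.0, 50.0, 60.0]):
    print(f"cut-off {cutoff:.0f}%: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

Lowering the cut-off from 60% to 40% in this toy example raises sensitivity from 0.67 to 1.00 while specificity falls from 1.00 to 0.67; the fixed 50% cut-off is one point on that curve.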