Deep neural network based artificial intelligence assisted diagnosis of bone scintigraphy for cancer bone metastasis

Bone scintigraphy (BS) is one of the most frequently used techniques for detecting cancer bone metastasis, and it imposes a heavy workload on nuclear medicine physicians. We therefore aimed to build an automatic image-interpretation system to assist physicians in diagnosis. We developed an artificial intelligence (AI) model based on a deep neural network using 12,222 cases of 99mTc-MDP bone scintigraphy and evaluated its diagnostic performance for bone metastasis. The AI model showed high diagnostic performance: the area under the receiver operating characteristic (ROC) curve (AUC) was 0.988 for breast cancer, 0.955 for prostate cancer, 0.957 for lung cancer, and 0.971 for other cancers. Applied to a new dataset of 400 BS cases, the model performed comparably to individual human physicians in classifying bone metastasis. Interpretation with AI consultation also improved the physicians' diagnostic sensitivity and accuracy. Overall, this AI model offers a valuable aid to nuclear medicine physicians for the timely and accurate evaluation of cancer bone metastasis.

Cohorts and network architecture for AI model. To obtain accurate testing results, the 12,222 images were randomly assigned to three cohorts: (a) a training cohort of 9776 patients for constructing the deep neural networks (DNNs), (b) a validation cohort of 1223 patients for optimizing the DNN hyperparameters, and (c) a testing cohort of 1223 patients for testing the performance of the model. As shown in Table 1, the images used in our study comprised 6021 cases of lung cancer, 1844 cases of prostate cancer, 2100 cases of breast cancer, and 2257 cases of other cancers (the 37 kinds of other cancers are listed in Supplementary Table S1).
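The random three-way assignment described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `split_cohorts`, the fixed seed, and the use of integer case indices are assumptions, and only the cohort sizes (9776 / 1223 / 1223) come from the text.

```python
import random


def split_cohorts(n_cases, n_val, n_test, seed=0):
    """Randomly assign case indices to training, validation and testing cohorts.

    Hypothetical helper: the seed and the index-based representation are
    illustrative assumptions, not details taken from the study.
    """
    if n_val + n_test >= n_cases:
        raise ValueError("validation and testing cohorts leave no training cases")
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)  # local RNG gives a reproducible shuffle
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


train, val, test = split_cohorts(12222, n_val=1223, n_test=1223)
print(len(train), len(val), len(test))  # 9776 1223 1223
```

Because the three cohorts are slices of one shuffled list, no case can appear in more than one cohort, which is the property that keeps the testing results independent of training.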
Then, we proposed a multi-input convolutional neural network (CNN) that accepts multiple images as input. The original images (DICOM format) were resized to 256 × 768, and the Hu matrix was normalized to [0,1] before being fed to the model. Previous studies indicate that fine-tuning pre-trained networks is an effective method for training CNNs 16,17. In this study, several ImageNet-pretrained networks were explored, and ResNet-50 was chosen to extract high-level features from the input images. The fully connected layer was removed from the end of the ImageNet-pretrained ResNet-50 so that the network could be used for feature extraction. The proposed network contains three parts. In the first part, ResNet-50 was employed to extract high-level features. In the second part, a max aggregation operator was used to aggregate the high-level features extracted from the two images. Because hotspots in the images usually appear at varying scales, three pooling layers with different kernel sizes, inspired by spatial pyramid pooling, were used to capture information at different scales. In the final part, two fully connected layers were applied to classify the features as metastasis or non-metastasis. The detailed network architecture is shown in Fig. 1.
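The normalization, max aggregation, and multi-scale pooling steps can be illustrated with a small pure-Python sketch. It is a toy under stated assumptions, not the published network: min-max scaling is assumed for the [0,1] normalization, the feature maps are single-channel 2D lists rather than ResNet-50 tensors, and the kernel sizes (1, 2, 4) and all function names are hypothetical.

```python
def min_max_normalize(image):
    """Scale a 2D list of pixel values to [0, 1] (min-max scaling is an assumption)."""
    flat = [v for row in image for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1  # avoid division by zero on a constant image
    return [[(v - lo) / span for v in row] for row in image]


def max_aggregate(feat_a, feat_b):
    """Element-wise maximum of two equally shaped 2D feature maps."""
    return [[max(a, b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(feat_a, feat_b)]


def max_pool(feat, k):
    """Non-overlapping k x k max pooling; assumes both dimensions are divisible by k."""
    rows, cols = len(feat), len(feat[0])
    return [[max(feat[r + i][c + j] for i in range(k) for j in range(k))
             for c in range(0, cols, k)]
            for r in range(0, rows, k)]


def pyramid_features(feat, kernel_sizes=(1, 2, 4)):
    """Pool at several kernel sizes and concatenate the flattened results."""
    out = []
    for k in kernel_sizes:
        out.extend(v for row in max_pool(feat, k) for v in row)
    return out


# Toy 4 x 4 feature maps standing in for the features of the two input images.
view_a = [[0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 5, 0], [0, 0, 0, 0]]
view_b = [[0, 0, 0, 0], [0, 3, 0, 0], [0, 0, 0, 0], [0, 0, 0, 2]]
fused = max_aggregate(view_a, view_b)
print(fused)  # [[0, 1, 0, 0], [0, 3, 0, 0], [0, 0, 5, 0], [0, 0, 0, 2]]
print(len(pyramid_features(fused)))  # 21 = 16 + 4 + 1 pooled values
```

The element-wise maximum keeps a hotspot that is visible in either image, and pooling at several kernel sizes summarizes the same map at fine and coarse scales before classification.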
Evaluation of AI performance. The performance of the automated AI model was evaluated by ROC analysis and AUC measurement using the testing cohort of 1223 cases. The cases were divided into four subgroups by cancer type: prostate cancer (15.13%), breast cancer (17.17%), lung cancer (49.22%), and other cancers (18.48%), and the sensitivity, specificity, accuracy, positive predictive value (PPV), and negative predictive value (NPV) were calculated for each subgroup. To investigate whether gender and age affected the results, the AUC values were compared between male and female patients and between patients aged < 60 years and ≥ 60 years.
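The subgroup AUC comparison can be sketched with the rank-based definition of the AUC, namely the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The function names, the record layout, and the scores below are made-up illustrations and are not data from the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Equivalent to the normalized Mann-Whitney U statistic; ties count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_subgroup(records):
    """records: iterable of (subgroup, label, score); returns {subgroup: AUC}."""
    grouped = {}
    for group, label, score in records:
        labels, scores = grouped.setdefault(group, ([], []))
        labels.append(label)
        scores.append(score)
    return {g: roc_auc(ls, ss) for g, (ls, ss) in grouped.items()}


# Made-up scores for two hypothetical age subgroups (label 1 = metastasis).
toy_records = [
    ("age<60", 1, 0.9), ("age<60", 0, 0.2),
    ("age>=60", 1, 0.6), ("age>=60", 0, 0.7),
    ("age>=60", 1, 0.8), ("age>=60", 0, 0.1),
]
print(auc_by_subgroup(toy_records))  # {'age<60': 1.0, 'age>=60': 0.75}
```

The pairwise comparison is quadratic in the number of cases, which is acceptable for a cohort of about a thousand cases; a sort-based rank computation would be preferable for much larger datasets.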
Then, an individual interpretation competition was carried out between the AI model and three nuclear medicine physicians, each with more than 5 years of experience. A new dataset of 200 cases with cancer bone metastasis and 200 cases without metastasis was randomly chosen from 2786 examinations with confirmed conclusions performed between July and October 2018 at West China Hospital. In this competition, the AI and the physicians were blinded to the ground truth and to the distribution of patients, and interpreted the cases based on the BS images only, without additional radiologic or medical information. To further estimate the potential value of the AI model, one hundred days later the three physicians were asked to re-interpret the same test cohort of 400 cases and to give their final judgement after consulting the AI's result. The time cost, diagnostic sensitivity, specificity, accuracy, PPV, and NPV of the AI system and the physicians were evaluated.
Performance of the AI model. After training and validation, our AI model achieved a diagnostic accuracy of 93.38% for cancer bone metastasis across the 1223 testing cases, which is better than that of other models in previous reports (Table 2). As shown in Fig. 2, in the subgroups divided by cancer type, our AI model showed high AUC values: 0.955 for prostate cancer, 0.988 for breast cancer, 0.957 for lung cancer, and 0.971 for other cancers. The age-based analysis indicated no significant differences in the diagnosis of bone metastasis in patients with breast cancer, lung cancer, or other cancers. However, in the prostate cancer group, the diagnostic performance differed significantly between patients aged ≥ 60 years (AUC = 0.938) and < 60 years (AUC = 0.992) (P < 0.05).
A probable reason is that patients in the prostate cancer group were older (71.0 ± 8.1 years) than those in the other groups (P < 0.01); benign conditions that become more common with age, such as osteophytes, arthrosis, osteoporotic fracture, and postoperative change, also appear as hot spots on BS and thus decrease the diagnostic accuracy for bone metastases. In addition, apart from the sex-related breast cancer and prostate cancer groups, there were no significant differences in the diagnosis of bone metastasis between male and female patients in the lung cancer and other cancer groups.
Nevertheless, 81 misdiagnosed cases were found in the testing cohort of 1223 cases (6.62%), including 38 false-negative (3.11%) and 43 false-positive (3.51%) cases (Supplementary Table S3). Lesion number, lesion size, and adjacent diffuse signal were the major influencing factors in the false-negative cases, whereas fracture, inflammation, degenerative change, and postoperative change were the main reasons for the false-positive cases. In the interpretation competition between the AI model and three qualified nuclear medicine physicians, the AI model took only 11.3 s to interpret the 400 cases, while the three physicians spent 116, 140, and 153 min, respectively, on the same work, which corresponds to a time saving of 99.88%. Compared with the best-performing of the three physicians, the AI model showed higher accuracy (93.5% vs. 89.00%) and sensitivity (93.5% vs. 85.00%) in identifying metastases across all cases (P < 0.001), while the specificity of the AI model (93.50%) and of the physician (94.50%) did not differ significantly. Furthermore, after consulting the AI result, physician-1 and physician-3 showed improved diagnostic performance, especially in finding missed lesions and reducing the false-negative rate.
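The reader-study figures can be related to confusion-matrix counts with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and the counts for the AI model (187 true positives and 187 true negatives) are inferred here from the reported 93.5% sensitivity and specificity on 200 positive and 200 negative cases, not quoted from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic metrics from confusion-matrix counts."""
    if min(tp + fn, tn + fp, tp + fp, tn + fn) == 0:
        raise ValueError("each metric needs a non-zero denominator")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def time_saving(ai_seconds, reader_minutes):
    """Fraction of reading time saved relative to one human reader."""
    return 1.0 - ai_seconds / (reader_minutes * 60.0)


# Counts inferred from 93.5% sensitivity and specificity on 200 + 200 cases.
ai_metrics = diagnostic_metrics(tp=187, fp=13, tn=187, fn=13)
print(round(ai_metrics["accuracy"], 3))  # 0.935

# Time saving against each physician's reported reading time.
for minutes in (116, 140, 153):
    print(minutes, round(100 * time_saving(11.3, minutes), 2))
```

With the reported times, the saving ranges from about 99.84% (116 min) to about 99.88% (153 min), so the quoted 99.88% matches the comparison with the slowest reader.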
In the detailed error analysis, we collected 13 cases that were interpreted correctly by the AI but misdiagnosed by all three physicians. Among these, 11 patients had small lesions (a few millimeters in diameter) or radioactive uptake with insufficient resolution, which were overlooked or judged as benign by the physicians (Supplementary Fig. S1). The other 2 patients, who had osteoporotic vertebral compression fractures, were misdiagnosed as having metastases by the physicians (Supplementary Fig. S2). Interestingly, 6 cases were misdiagnosed by the AI but correctly interpreted by all three physicians. One patient with diffuse skeletal metastasis and two patients with humerus metastases were misdiagnosed as benign by the AI (Supplementary Fig. S3). One patient with multiple fractures and one patient with postoperative bone change were misdiagnosed as having malignant lesions by the AI model (Supplementary Fig. S4), and the last misdiagnosis was caused by a catheter on the patient.

Discussion
Although various imaging modalities, such as PET/CT and multiparameter MRI, have been developed to detect skeletal metastasis, bone scintigraphy with 99mTc-MDP remains one of the most effective diagnostic techniques because of its considerable sensitivity and cost performance 21,22. Skeletal imaging accounts for 61.3% of the 2.09 million SPECT scans performed annually in China, and most of them are not fused with CT because of limited device utilization 23. Thus, interpreting planar BS images remains a challenge for nuclear medicine physicians in China. An automated system might be an effective tool to address this problem. In this study, we constructed an AI model with a deep neural network based on 12,222 cases to extract image features, and evaluated its efficiency in diagnosing cancer bone metastasis from BS images. This model improved both the diagnostic performance and the time cost of image interpretation, and the AI consulting system could potentially improve physicians' diagnostic skills, especially for younger physicians who lack experience. In addition, for the first time, lung cancer was analyzed as a separate subgroup, with a diagnostic accuracy of 93.36%, which seems promising for future clinical use. Generally, a deep neural network trained on a sufficiently large and valid dataset tends to yield better final outcomes in AI analysis 24. In this study, a dataset of 12,222 BS examinations from 40 cancer types, which is to date the largest single-center dataset for BS image interpretation, was used to construct the DNN for AI modeling. Compared with traditional methods using hand-crafted features, the multi-input deep convolutional neural network allows the AI model to follow the natural distribution of the data, reduces the influence of physicians' subjective judgment, generalizes better, and comes closer to the usual clinical environment.
For example, previous studies 15,25 usually excluded cases that could be misleading during training, such as patients with a large bladder, sternotomy, or fracture. In contrast, no atypical cases were excluded from our dataset, so that the AI model would come as close as possible to real clinical conditions. As expected, our AI model showed a higher AUC (0.964) than other BS diagnostic AI models in previous reports (0.858, 0.91, and 0.932) 18–20. Notably, although the AI model produced 8 false-negative cases involving small lesions in the testing cohort, it recognized small lesions better than the physicians in the subsequent competition. Although the AI model efficiently improved the detection of small metastatic lesions missed by human readers and helped reduce readers' error rates in BS interpretation, several limitations should be noted. First, the estimations by our AI model were based on BS images only. False-negative and false-positive cases still occurred, which may be due to small lesion number, small lesion size, lesions adjacent to physiological uptake such as the bladder, and diffuse skeletal metastasis manifesting as diffuse homogeneous uptake. Such cases are also difficult for nuclear medicine physicians to interpret from BS images alone. In real clinical work, however, the patient's medical records, such as injury history, surgical records, findings from other imaging modalities, and laboratory test results, must be considered to obtain an accurate BS interpretation. Accordingly, our team is currently constructing a new AI model based on fused SPECT/CT bone images, and we hope that the addition of fused reference CT and medical records will effectively reduce diagnostic errors.
Second, the model's unsatisfactory ability to recognize external objects on patients, such as a catheter, which physicians identify easily, remains a noticeable disadvantage. Third, our study focused only on the performance of the AI model in diagnosing the presence or absence of bone metastasis to assist nuclear medicine physicians' interpretation. However, a series of previous studies demonstrated that the bone scan index (BSI) calculated by artificial neural networks is an effective biomarker for predicting the prognosis or survival of some malignant cancers 26–29. Determining whether our AI model could, like BSI, benefit the assessment of prognosis or survival in some malignant cancers might require a greater focus on lesion-based analysis. Last but not least, the retrospectively acquired database for the present work was collected from only one hospital. The patients at our hospital might not be typical of those at other centers, and the findings might be relatively institution-specific. A prospective multi-center study will also be needed to evaluate whether the AI model shows satisfactory performance on BS images acquired with different gamma cameras, protocols, interpretive styles, and incidences of metastatic disease. These steps require considerable time to collect more clinical data and will be addressed in future work.

Conclusions
Our AI model achieved considerable time efficiency, accuracy, specificity, and sensitivity in the diagnosis of bone metastasis in patients with lung cancer, prostate cancer, breast cancer, and other cancers. With further assessment and validation, this model could support diagnostic workflows and help physicians improve the efficiency and accuracy of bone metastasis diagnosis, particularly in remote or low-resource areas, leading to a beneficial clinical impact.

Data availability
Data confirming the results of this study are presented in the manuscript and are available from the corresponding author upon reasonable request.