Using artificial intelligence (AI) to prevent and treat diseases is an ultimate goal in computational medicine. Although AI has been developed for screening and assisted decision-making in disease prevention and management, it has not yet been validated for systematic application in the clinic. In the context of rare diseases, the main strategy has been to build specialized care centres; however, these centres are scattered and their coverage is insufficient, which leaves a large proportion of rare-disease patients with inadequate care. Here, we show that an AI agent using deep learning, and involving convolutional neural networks for diagnostics, risk stratification and treatment suggestions, accurately diagnoses and provides treatment decisions for congenital cataracts in an in silico test, in a website-based study, in a ‘finding a needle in a haystack’ test and in a multihospital clinical trial. We also show that the AI agent and individual ophthalmologists perform equally well. Moreover, we have integrated the AI agent with a cloud-based platform for multihospital collaboration, designed to improve disease management for the benefit of patients with rare diseases.
Artificial intelligence (AI) holds great promise in computational medicine. Much attention has been focused on creating an expert medical robot with all-round ability and high diagnostic accuracy. However, a doctor’s process in determining medical diagnoses and treatments is so complex that it is difficult to fully model and simulate using conventional algorithms and data. Moreover, mistakes regarding vital decisions are not acceptable. An attractive alternative use of AI is in the reduction of costs and in improving the efficiency of the medical process. Several AI algorithms have been developed for medical purposes, such as screening, risk stratification and assisted decision-making.
Rare diseases, which affect approximately 10% of the world’s population, can be life-threatening or chronically disabling 5 . The current approach to addressing these diseases involves combining medical resources to establish specialized care centres 6 . However, these centres are tremendously expensive and geographically scattered, with weak connections to non-specialized hospitals, where care for these patients is relatively poor 7 . Therefore, missed or mistaken diagnoses, as well as inappropriate treatment decisions, are common among rare-disease patients and are contrary to the goals of precision medicine, especially in developing countries with large populations, such as China 8,9 .
Congenital cataracts (CC) is a typical rare disease that causes irreversible vision loss. Breakthroughs related to CC have contributed substantially to medical science.
To explore the feasibility of applying AI to the management of rare diseases, we used deep-learning algorithms to create CC-Cruiser, an AI agent involving three functional networks: (i) identification networks for screening for CC in populations, (ii) evaluation networks for risk stratification among CC patients, and (iii) strategist networks to assist in treatment decisions by ophthalmologists. The AI agent showed promising accuracy and efficiency in silico. We also conducted a multihospital clinical trial, website-based study and a ‘finding a needle in a haystack’ test to validate its versatility and utility. We then performed a test to compare the real-world performance of CC-Cruiser with individual ophthalmologists. We also established a cloud-based AI platform for rare diseases, and propose an operating mechanism for multihospital collaboration.
CC-Cruiser includes three types of functional network (Fig. 1a). The identification networks are intended to identify potential patients from a large population. Using evaluation networks, the AI agent provides comprehensive evaluations of disease severity (lens opacity) with respect to three different indices (opacity area, density and location) for risk stratification, and provides a reference for treatment decisions for CC patients. To assist ophthalmologists in decision-making, the strategist networks provide the final treatment decision (surgery or follow-up) on the basis of results from both the identification networks and the evaluation networks.
The training pipeline for CC-Cruiser’s deep-learning network is shown in Fig. 1b,c. The training set, which included 410 ocular images of CC of varying severity and 476 images of normal eyes from children, was derived from routine examinations conducted as part of the Childhood Cataract Program of the Chinese Ministry of Health (CCPMOH) 12 , one of the largest pilot specialized care centres for rare diseases in China. After an expert panel categorized the images, a deep convolutional neural network was used for training and classification (Methods).
In silico test
After applying K-fold cross-validation (K = 5) for the in silico test, optimized accuracy and the proportions of false positives and missed cases were recorded 15 . Using the identification networks, the AI agent distinguished between patients and healthy individuals with 98.87% accuracy. Using the evaluation networks, CC-Cruiser estimated the three risk indices (opacity area, density and location) with accuracies of 93.98%, 95.06% and 95.12%, respectively. Using the strategist networks, the agent provided treatment suggestions with an accuracy of 97.56%. The sums of the proportions of false positives and missed cases were below 10% for all three networks (Fig. 2a). All the evaluation indices are presented in Supplementary Table 1. We also calculated receiver operating characteristic (ROC) curves (Supplementary Fig. 1a) and confusion matrices (Supplementary Fig. 2a).
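The cross-validation protocol above can be sketched as follows. This is a minimal illustration in Python, not the authors' code: the toy scalar 'images' and threshold 'classifier' are placeholders standing in for the real ocular images and deep networks.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(samples, labels, train_fn, k=5):
    """Return per-fold accuracies: train on k-1 folds, test on the held-out fold."""
    folds = k_fold_indices(len(samples), k)
    accuracies = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f in folds if f is not folds[i] for j in f]
        model = train_fn([samples[j] for j in train_idx],
                         [labels[j] for j in train_idx])
        correct = sum(model(samples[j]) == labels[j] for j in test_idx)
        accuracies.append(correct / len(test_idx))
    return accuracies

# Toy stand-in: a 'classifier' that thresholds a scalar feature.
def train_threshold(xs, ys):
    return lambda x: int(x > 0.5)

data = [i / 100 for i in range(100)]    # scalar stand-ins for images
labels = [int(x > 0.5) for x in data]   # 'cataract' if feature > 0.5
accs = cross_validate(data, labels, train_threshold, k=5)
print(sum(accs) / len(accs))            # 1.0 for this perfectly separable toy data
```

In the real pipeline each fold's accuracy would come from retraining the convolutional networks on the remaining four folds; the averaging step is the same.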
Multihospital clinical trial
All the cases used in the in silico test were collected from the CCPMOH. To further investigate the versatility and utility of CC-Cruiser, we conducted a multihospital clinical trial and used the cases from non-specialized hospitals for testing. Between January 2012 and March 2016, three collaborating hospitals were involved in a phase-one trial for validation of the AI agent and platform, and 57 patients were enrolled from these collaborating hospitals (Methods). Taking the expert panel’s decision as a reference, the agent demonstrated 98.25% accuracy with the identification networks. With the evaluation networks, CC-Cruiser estimated opacity areas, densities and locations with accuracies of 100%, 92.86% and 100%, respectively. The strategist networks provided treatment suggestions with 92.86% accuracy. The sums of the proportion of accurate diagnoses, false positives and missed diagnoses are shown in Fig. 2b. All the evaluation indices are shown in Supplementary Table 1. We also report ROC curves (Supplementary Fig. 1b) and confusion matrices (Supplementary Fig. 2b).
Website-based study

We collected 53 images from website-based databases (Methods). Although the quality of the online cases varied significantly, CC-Cruiser detected cases with accuracy comparable to that observed in the previous tests. Using the identification networks, the accuracy was 92.45%; with the evaluation networks, the accuracies were 94.87%, 84.62% and 94.87% for opacity areas, densities and locations, respectively; and the strategist networks led to 89.74% accuracy. The detailed distributions of accurate detections, false positives and missed detections, together with the evaluation indices, are presented in Fig. 2c and Supplementary Table 1. ROC curves (Supplementary Fig. 1c) and confusion matrices (Supplementary Fig. 2c) are also available.
‘Finding a needle in a haystack’ test
To further validate the AI agent’s ability in a realistic rare-event setting, we collected from the CCPMOH a data set with a real-world ratio of rare-disease to normal cases. The data set consisted of 300 normal cases and 3 cataract cases of differing severity, and was divided into three rounds (100:1 ratio of normal:cataract cases) for three independent validations. The agent successfully excluded the normal cases, identified the three cataract cases (total dense cataract in round one, non-dense perinuclear cataract in round two and Y-suture cataract in round three), and provided accurate evaluations and treatment decisions (Fig. 2d).
We have compared the results from the AI agent and the expert panel — but, of course, in real-world clinical practice there is typically only a doctor and a patient rather than a panel. We have therefore performed a test to compare the real-world performance of CC-Cruiser against that of individual ophthalmologists.
A test paper consisting of 50 cases involving various challenging clinical situations was designed and approved by an expert panel in order to evaluate real-world performance (Fig. 3a and Supplementary Information). CC-Cruiser and ophthalmologists with three degrees of expertise (expert, competent and novice) independently completed the same test paper without any additional information (Methods).
The test results are presented in Fig. 3b. Ocular images of five representative cases that drew the most inaccurate detections, together with their reference labels, are provided in Fig. 3c. Using the identification networks (normal versus cataract), the AI agent successfully detected all potential patients among the 50 cases. All the ophthalmologists incorrectly flagged case 3 because they used the preceding case 2 (a normal case, see Supplementary Information) as a reference, and mistook the high illumination intensity for a cataract. The agent’s evaluation networks performed well (9 false positives and 4 missed detections) when compared with the individual ophthalmologists (expert: 5 false positives and 11 missed detections; competent: 11 false positives and 8 missed detections; novice: 12 false positives and 20 missed detections). The strategist networks also performed well, with 5 false positives and accurate treatment suggestions for all the patients in need of surgery (notably, no missed detections), which is comparable with the performance of the ophthalmologists (expert: 1 false positive and 3 missed detections; competent: 3 false positives and 1 missed detection; novice: 8 false positives and 1 missed detection). CC-Cruiser also provided accurate detections for all the normal and surgery cases. We therefore regard the performance of the AI agent as comparable to that of a qualified ophthalmologist.
To establish a cloud-based multihospital collaboration for the management of CC, we built the CC-Cruiser website-based platform. The website (https://www.cc-cruiser.com/version1) includes user registration, case upload, evaluation by the three networks, sample downloading, and interaction between patients and CCPMOH doctors. A demonstration video of the instructions for using the CC-Cruiser website and multihospital AI platform is provided as Supplementary Video 1.
Before using the website for the first time, a user must register on the system. Demographic information, including name, sex, age and contact information, is required to prevent unauthorized use and to ensure that the doctors in the CCPMOH can communicate with the patients. Training instructions are presented after registration to help new users (Supplementary Fig. 3a).
The user has the option to upload a new case to CC-Cruiser (Supplementary Fig. 3b). Following the upload function, the output is presented on the website, including the results of the three networks: primary screening (normal versus cataract), comprehensive evaluation (opacity area, density and location), and final treatment decision (surgery versus follow-up) (Supplementary Fig. 3c).
For users who wish to test the AI agent, we also provided 50 typical sample cases for download. These cases cover most clinical situations: normal cases, cataracts with no need for surgery, various severities of opacity, and high-risk cases that should receive urgent intervention (Supplementary Fig. 3d).
Email and daytime telephone services were offered to all registered patients. Cases determined as needing surgery were referred to the CCPMOH for fast-track action. The administrator of the CCPMOH has access to review all of the uploaded cases while ensuring data privacy, and contacts patients if needed (Supplementary Fig. 3e).
Cloud-based multihospital AI platform
A cloud-based platform for easier access to treatment for rare-disease patients is very much needed. We therefore created a multihospital platform that uses the CC-Cruiser website. As shown in Fig. 4 and Supplementary Video 1, when potential patients come to the non-specialized collaborating hospitals for ophthalmic evaluation, their demographic information, clinical data (images) and contact information are collected with their permission and immediately sent to the CC-Cruiser cloud platform. The AI agent then provides a comprehensive evaluation via the three networks and saves all of the information in a database. If the strategist networks recommend surgery, the fast-track notification system is triggered and an emergency notification is sent to doctors in the CCPMOH for immediate confirmation. Patients are then notified of the need for comprehensive examination at the CCPMOH. Additionally, to identify and prevent potential misclassification, on a weekly basis CCPMOH doctors check all cases (including normal and cataract) according to priority, determined on the basis of the evaluation by CC-Cruiser. The doctors can then communicate with the patients to confirm the need for timely surgery and to ensure that they fully understand the importance and urgency of their disease.
Because data heterogeneity is inevitable in clinical practice, the ability to tackle multisource and wide-format data is essential. Previous studies have investigated and made contributions to the classification of senile cataracts by using fundus images 16,17 . However, when compared to senile cataracts, the phenotype of CC is far more varied, which makes the classification of CC images more complex, influencing decision-making and patient prognosis. The ocular images used in this study are opaque lenses with varying degrees of opacity, and present distinct challenges. First, the opacity of CC is peculiarly complex, and there are no standard categories for the classification of CC morphology 18 . Second, the illumination intensity, angle and image resolution vary across different imaging machines and doctors, which is a source of significant heterogeneity in our data set, especially in that used for the website-based study. Third, eyelids, eyelashes and pupil size can obscure the lens. Nevertheless, the high accuracy of the three networks of CC-Cruiser in all of the tests demonstrates that the deep-learning-based agent can robustly recognize CC features in the real world. This highlights the advantages of deep-learning algorithms when facing realistic, clinically challenging situations.
Breakthroughs in AI, including those that led to the accomplishments of AlphaGo, have been achieved by using big data 19 . For rare diseases, it is difficult to obtain access to high-quality data and to develop models for disease management. The limited resources of patients and the isolation of the data in individual hospitals represent a bottleneck in data usage. Building a collaborative cloud platform for data integration and patient screening is an essential step in the development of treatments for rare diseases 12 . Our multicentre collaborative network to explore the management of rare diseases and the cloud-based AI platform for providing medical suggestions for non-specialized hospitals are designed to improve quality of care for rare diseases through pre-screening, evaluation and treatment suggestions. The platform will also collect and integrate precious rare-disease data, and the increasing size and coverage of the data set will extend the abilities of the AI agent.
The collaborative platform could be extended to the management of other rare diseases, and should be validated in different clinical scenarios. In fact, highly accurate diagnoses by AI have been primarily achieved in specific situations 20 . Further efforts are needed to explore the feasibility of clinical implementation more broadly, as well as whether the combination of AI and physician expertise could lead to improved quality of patient care, especially for intractable healthcare problems.
Data collection and labelling before agent training
The training data set, which included 410 ocular images of CC of varying severity and 476 images of normal eyes from children, was derived from routine examinations conducted as part of the CCPMOH 12 , one of the largest pilot specialized care centres for rare diseases in China. Images covering the valid lens area were eligible for training. There were no specific requirements for imaging pixels or equipment.
Each image was independently described and labelled by two experienced ophthalmologists, and a third ophthalmologist was consulted in the case of disagreement. The expert panel was blind and had no access to the deep-learning predictions. The identification networks involved two-category screening to distinguish between normal eyes and those with cataracts. There were no agreed-upon gold-standard criteria for CC assessment due to the complexity of the morphology of opacity 18 . For the function of the evaluation networks, the expert panel therefore defined three critical-lesion indices for comprehensive evaluation and treatment decisions. Opacity area was defined as ‘extensive’ when the opacity covered more than 50% of the pupil; otherwise, it was labelled as ‘limited’. Opacity density was defined as ‘dense’ when the opacity fully disrupted vision; otherwise, it was labelled as ‘non-dense’. Opacity location was defined as ‘central’ when the opacity fully covered the visual axis area; otherwise, it was labelled as ‘peripheral’. For the function of the strategist networks, patients showing dense opacity fully covering the visual axis were considered to require immediate surgery to remove the cloudy lens. For pre-processing, auto-cutting was employed to minimize noise around the lens, and auto-transformation was conducted to save the image at a size of 256 × 256 pixels.
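The pre-processing step can be illustrated with a minimal sketch. The bounding-box crop and nearest-neighbour resize below are illustrative stand-ins for the auto-cutting and auto-transformation described above, not the authors' implementation, and they operate on a toy grayscale image stored as a list of rows.

```python
def auto_crop(img, threshold=0):
    """Crop a grayscale image (list of rows) to the bounding box of
    pixels brighter than `threshold`, approximating 'auto-cutting'."""
    rows = [r for r, row in enumerate(img) if any(p > threshold for p in row)]
    cols = [c for c in range(len(img[0]))
            if any(row[c] > threshold for row in img)]
    return [row[cols[0]:cols[-1] + 1] for row in img[rows[0]:rows[-1] + 1]]

def resize_nearest(img, size=256):
    """Nearest-neighbour resize to a fixed size x size output."""
    h, w = len(img), len(img[0])
    return [[img[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]

# Toy 4x4 'image' with a bright 2x2 lens region in the centre.
img = [[0, 0, 0, 0],
       [0, 9, 8, 0],
       [0, 7, 6, 0],
       [0, 0, 0, 0]]
out = resize_nearest(auto_crop(img), size=256)
print(len(out), len(out[0]))  # 256 256
```

A production pipeline would use an image library for interpolation, but the two stages (crop away background noise, then normalize to 256 × 256 pixels) are the same.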
Deep-learning convolutional neural network
A deep convolutional neural network, derived from the winning model of the ImageNet Large Scale Visual Recognition Challenge 2014 (ILSVRC2014) 21 , a competition that has driven much of the progress in image recognition, was used for training and classification 22 . The model contained five convolutional or down-sampling layers in addition to three fully connected layers, an arrangement that has been demonstrated to adapt well to various types of data 23 . The first seven layers were used to extract 4,096 features from the input data, and a Softmax classifier 22,24 was applied to the last layer. The architecture of the deep convolutional neural network is provided in Supplementary Fig. 4 in the form of a diagram highlighting the arrangement of each layer.
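To make the layer arrangement concrete, the sketch below walks a 256 × 256 input through an AlexNet-style stack of five convolution/pooling stages and prints the spatial size after each stage. The kernel, stride and padding values are assumptions drawn from the AlexNet family of models, not values reported in this paper.

```python
def conv_out(size, kernel, stride, pad):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * pad - kernel) // stride + 1

# Assumed AlexNet-style stack for a 256x256 input; each entry is
# (name, kernel, stride, padding). Values are illustrative only.
layers = [
    ("conv1", 11, 4, 2), ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2),  ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),  ("pool5", 3, 2, 0),
]
size = 256
for name, k, s, p in layers:
    size = conv_out(size, k, s, p)
    print(f"{name}: {size}x{size}")  # ends with pool5: 7x7
```

The final feature map would then be flattened and passed through the fully connected layers that produce the 4,096 features feeding the Softmax classifier.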
Specifically, several techniques, including convolution and overlapping pooling, were applied within these layers.
In silico test and validation independence
K-fold cross-validation (K = 5) was applied for the in silico test. After in silico testing, the models trained on all of the training data sets were used for real-world pilot testing and the comparative test. The data sets for training were not used for testing. The trained deep-learning model was frozen before any validations. The deep-learning predictions were time-stamped, verified and saved by an individual who was blinded to the expert-panel labels to ensure that there was no information leak or double-dipping when comparing the predictions against expert panel labels.
Multihospital clinical trial
Three non-specialized collaborating hospitals (two sites in Guangzhou City and one in Qingyuan City) were involved in a phase-one trial to validate the utility of the AI agent and cloud-based platform. Fifty-seven eligible patients (43 normal and 14 with cataracts) were enrolled in this trial between January 2012 and March 2016. Informed consent was obtained from all subjects. All general medical practitioners participating in the pilot study received training in the use of CC-Cruiser before they were given access to the platform. Patient information was uploaded by the medical practitioners and evaluation results were collected from the platform. The research protocol was approved by the Institutional Review Boards/Ethics Committees of Sun Yat-sen University. The study was registered with ClinicalTrials.gov (identifier: NCT02748044).
Website-based study

We performed image searches using the search engines Google, Baidu and Bing throughout March 2016. The searches combined keywords such as ‘congenital’, ‘infant’, ‘pediatric cataract’ and ‘normal eye’, in the form of title words or medical subject headings. Two of the authors (E.L. and H.L.) completed the searches independently, then cross-checked and confirmed that all of the collected cases were congenital cataract or normal eye cases. When discrepancies arose, consensus was reached after further discussion. All confirmed images (13 normal and 40 cataracts) were subsequently sent to CC-Cruiser for evaluation.
‘Finding a needle in a haystack’ test
A total of 300 normal cases and 3 cataract cases of differing severity, collected from the CCPMOH, were pooled as a test data set reflecting real-world rare-event:normal ratios. The data set was divided into three rounds (normal:cataract at a ratio of 100:1) for three independent validations. We selected a total dense cataract (requiring timely surgery) in round one, a non-dense perinuclear cataract in round two and a Y-suture cataract in round three (the latter two are easily missed) to test whether the AI agent could pick up the ‘needle’ (1 cataract case) in the ‘haystack’ (100 normal cases).
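The construction of the three validation rounds can be sketched as follows; this is an illustrative Python sketch in which the case identifiers are placeholders, not the study's actual case records.

```python
import random

def make_rounds(normals, cataracts, seed=0):
    """Split 300 normal cases and 3 cataract cases into three rounds,
    each containing 100 normals and 1 cataract (a 100:1 ratio)."""
    rng = random.Random(seed)
    pool = list(normals)
    rng.shuffle(pool)
    return [pool[i * 100:(i + 1) * 100] + [cataracts[i]] for i in range(3)]

# Placeholder case records: (label, id).
normals = [("normal", i) for i in range(300)]
cataracts = [("total dense", 0), ("perinuclear", 1), ("Y-suture", 2)]
rounds = make_rounds(normals, cataracts)
for r in rounds:
    print(len(r), sum(1 for case in r if case[0] != "normal"))  # 101 1
```

Each round is evaluated independently, so a perfect result means flagging exactly one case per round while excluding the other hundred.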
Comparison between CC-Cruiser and individual ophthalmologists

A test paper consisting of 50 cases involving various challenging clinical situations was designed and approved by an expert panel to compare the real-world performance of the AI agent to that of individual ophthalmologists. Options for diagnosis (normal versus cataract), severity (extensive versus limited, dense versus non-dense, and central versus peripheral) and treatment decision (surgery versus follow-up) were required for each case. In this test, three ophthalmologists of varying expertise (expert, competent and novice) were asked to independently complete the same test paper as the AI agent without any additional information. The expert ophthalmologist was a professor with over 10 years of experience in the cataract department. The competent ophthalmologist was a resident who had finished both clinical training and specific training in ophthalmology, and the novice ophthalmologist was a student who had completed their theoretical studies in ophthalmology and had begun clinical practice. The test paper is available in the Supplementary Information.
The source code for the AI agent CC-Cruiser is available on GitHub at https://github.com/longerping/cc-cruiser.
The CC-Cruiser website is available at https://www.cc-cruiser.com/version1. The instructions for using the CC-Cruiser platform are provided in Supplementary Video 1. The authors declare that all other data supporting the findings of this study are available within the paper and its Supplementary Information.
How to cite this article: Long, E. et al. An artificial intelligence platform for the multihospital collaborative management of congenital cataracts. Nat. Biomed. Eng. 1, 0024 (2017).
The authors acknowledge the involvement of the China Rare-disease Medical Intelligence (CRMI) Collaboration, which currently consists of Zhongshan Ophthalmic Centre (H.L., E.L. and Y.L.), Guangdong General Hospital (J. Zeng), Qingyuan People’s Hospital (Y. Lu) and the First Affiliated Hospital of Guangzhou University of Traditional Chinese Medicine (X. Yu). The CRMI Collaboration may be extended with more participating hospitals in the future. This study was funded by the 973 Program (2015CB964600), the NSFC (91546101), the Guangdong Provincial Natural Science Foundation for Distinguished Young Scholars of China (2014A030306030), the Youth Pearl River Scholar Funded Scheme (H.L., 2016), and the Special Program for Applied Research on Super Computation of the NSFC-Guangdong Joint Fund (the second phase).
Supplementary Video 1: Instructions for using the CC-Cruiser website.