Introduction

Myopia has become a global public health issue, affecting the vision and even the psychological health of children and adolescents while imposing financial burdens on individuals, families, and society1. As myopia progresses, it increases the risk of ocular complications, visual impairment, and even irreversible blindness2. According to a 2022 report by the World Health Organization, over 2 billion people worldwide suffered from vision impairment caused by uncorrected refractive errors due to late detection1. Early identification of children at high risk of myopia onset and progression is needed for effective interventions to prevent vision loss and subsequent complications3.

In myopia onset prediction, ophthalmic measurements, including baseline spherical equivalent (SE) and axial length (AL)4,5, and lifestyle parameters, e.g., time spent outdoors6,7, have been used. Among these, the model using cycloplegic SE as a predictor was the most accurate, with an area under the receiver operating characteristic curve (AUC) of 0.93. However, cycloplegic SE-based approaches are of limited use in large-scale screening because they require more time and effort, particularly in rural settings, where cycloplegic drops, opticians, and optometrists are in short supply8. Furthermore, a prediction model that can guide intervention is important for personalized myopia control, yet none is currently available. Therefore, a predictive model that uses non-cycloplegic variables to estimate the risk of myopia progression could guide intervention strategies for children.

Deep learning-based artificial intelligence (AI) algorithms have recently achieved great success in retinal fundus image analysis in ophthalmic studies. Specifically for fundus images, deep learning has been applied to extract novel information, such as refractive errors with remarkable accuracy9. Recently, two studies in Singapore have used deep learning to detect high myopia and then to predict a 5-year high myopia risk in children10,11. While these studies demonstrated the feasibility of using retinal fundus images for myopia prediction, they come with certain limitations. The 5-year prediction horizon may be less clinically practical when compared to shorter-term predictions. Additionally, the focus on high myopia may limit their applicability to a more general population. Importantly, the previous studies mainly addressed myopia detection and high myopia risk prediction without considering interventions.

Apart from AI advances, improved camera hardware, such as the cameras built into modern smartphones, can acquire images in a cost-effective, portable, and convenient manner12. A variety of smartphone-based fundus imaging applications have been reported13,14. This technological progress may enhance the feasibility of myopia screening without the need for cycloplegia.

In this study, we developed a deep learning algorithm, named DeepMyopia, for retinal fundus image analysis in children and adolescents for myopia prediction. A large dataset acquired from a cohort with annual follow-up for 3 years was used. We found that DeepMyopia, as a decision support system, was capable of: (1) predicting myopia onset using non-cycloplegic ophthalmic parameters; (2) accurately identifying individuals at high risk of myopia, thereby guiding downstream intervention; and (3) reducing the burden of myopia. Our study serves as a proof of concept for establishing a robust and effective protocol for myopia prediction and management through large-scale screening and intervention guidance for myopic children.

Results

Study design

Our study is based in Shanghai, China, and the study pipeline is shown in Fig. 1. The DeepMyopia system was first validated on the primary dataset acquired from Shanghai and then on external independent datasets acquired from different regions of China, including Shanghai, Beijing, Hohhot, Urumqi, Kunming, Guangzhou, and Hong Kong.

Fig. 1: Schematic overview of DeepMyopia.

a Dataset used in this study. b Evaluation and application of DeepMyopia. DeepMyopia enables myopia detection, myopia onset prediction within a 3-year timeframe, as well as risk stratification of myopia onset. An emulating randomized controlled trial (eRCT) was performed based on the risk stratification, extending the applicability of DeepMyopia in public health. c Application scenarios for DeepMyopia. DeepMyopia could identify myopic children in the child population and predict the risk of future myopia onset in non-myopic children, thereby guiding intervention. d The architecture of the proposed DeepMyopia. AL axial length.

Prediction of myopia onset

To detect myopia, the training and validation datasets included baseline data from two large prospective studies. Subsequently, the developed fundus model was validated on seven independent datasets. Myopia detection achieved an AUC of 0.995 on the internal test set and AUCs ranging from 0.885 to 0.951 on the external test sets (Supplementary Fig. 1a). Building on the detection model, we developed four models to predict the future risk of myopia onset: (i) the fundus model, with retinal fundus images as input; (ii) the Cyc-metadata model, which included age, sex, and SE from cycloplegic examinations; (iii) the NonCyc-metadata model, which included age, sex, and AL from non-cycloplegic examinations; and (iv) DeepMyopia, a combination of (i) and (iii), which included retinal fundus images, age, sex, and AL.

The performance of all four models in 3-year myopia onset prediction is shown in Fig. 2 and Supplementary Table 1. On the internal test set, the AUCs of the fundus model were 0.870 (95% CI: 0.808–0.923), 0.801 (95% CI: 0.751–0.850), and 0.794 (95% CI: 0.741–0.843) for year 1, year 2, and year 3, respectively. DeepMyopia achieved AUCs of 0.908 (95% CI: 0.871–0.943) in year 1, 0.813 (95% CI: 0.763–0.864) in year 2, and 0.810 (95% CI: 0.757–0.859) in year 3. The Cyc-metadata model achieved AUCs of 0.947, 0.843, and 0.845, while the NonCyc-metadata model had AUCs of 0.672, 0.605, and 0.594 for years 1–3, respectively. No significant differences were observed between DeepMyopia and the Cyc-metadata model, whereas DeepMyopia outperformed the NonCyc-metadata model with significantly higher AUCs (Supplementary Table 2 and Supplementary Fig. 2).

Fig. 2: Performance of four models to predict myopia onset at various time points.

Receiver operating characteristic (ROC) curves and the corresponding areas under the curve (AUCs) depict the predictive performance for myopia onset across years 1 to 3: year 1 (a), year 2 (b), and year 3 (c) in the internal test set, and year 1 (d), year 2 (e), and year 3 (f) in the external test set. The fundus model used fundus images as input, while the Cyc-metadata model incorporated age, sex, and SE from cycloplegic examinations. The NonCyc-metadata model included age, sex, and AL from non-cycloplegic examinations. The combined model (DeepMyopia) integrated fundus images, age, sex, and AL.

Consistent results were obtained in the external test set (the Hong Kong Children Eye Study, HKCES), where the fundus model achieved AUCs of 0.767, 0.761, and 0.758; DeepMyopia achieved AUCs of 0.796, 0.808, and 0.767; the Cyc-metadata model achieved AUCs of 0.952, 0.839, and 0.778; and the NonCyc-metadata model achieved AUCs of 0.728, 0.627, and 0.568. These models also achieved competitive performance in high myopia prediction, as summarized in Supplementary Table 3.

Risk stratification of myopia onset

Accurate identification of children at risk of myopia is essential for the efficient management of myopia. Thus, we developed Cox proportional hazards (CPH) models using age, sex, AL, and risk scores derived from the retinal fundus images. The performance of the myopia risk stratification models is reported in Supplementary Table 4. For 1-year risk stratification, DeepMyopia achieved a high C-index of 0.82 (95% CI: 0.77–0.87), a significant improvement in stratifying myopia onset risk over the NonCyc-metadata model (C-index: 0.57, 95% CI: 0.46–0.66). For 2- and 3-year prediction, DeepMyopia also achieved high C-indices of 0.81 and 0.75, respectively, while the NonCyc-metadata model achieved C-indices of 0.60 and 0.59.
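As an illustration of the metric reported above, a concordance index can be sketched as follows. The text does not state which C-index estimator was used, so Harrell's pairwise definition, the function name `concordance_index`, and the toy inputs are all assumptions made for this sketch.

```python
def concordance_index(times, events, risk_scores):
    """Harrell-style C-index (assumed estimator, illustrative only).

    times: follow-up time for each child.
    events: 1 if myopia onset was observed, 0 if censored.
    risk_scores: higher score means higher predicted risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when child i had an observed onset
            # strictly before child j's last follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # earlier onset ranked riskier
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # ties count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy example (hypothetical values, not study data).
toy_times = [1, 2, 3, 3]
toy_events = [1, 1, 0, 0]
toy_scores = [0.9, 0.6, 0.2, 0.1]
toy_c_index = concordance_index(toy_times, toy_events, toy_scores)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of onset order, which is the scale on which the reported C-indices should be read.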

We stratified the non-myopic children into low- and high-risk groups based on the risk score in the developmental dataset. The Kaplan–Meier curves of the two groups were clearly separated in the internal test set (p < 0.001, Fig. 3a). DeepMyopia also discriminated between the low- and high-risk groups for developing myopia, with a significant separation in the HKCES dataset (p < 0.001, Fig. 3b).

Fig. 3: Kaplan–Meier plots to predict myopia onset according to risk stratification.

Red and blue curves represent the survival probabilities of the high- and low-risk groups, respectively, and shaded areas represent 95% confidence intervals (CI). The number of individuals at risk at each time point is presented. The curves show the probability of remaining non-myopic over time, as stratified by DeepMyopia, in the internal longitudinal test set (a) and the external longitudinal test set (b). Statistical significance was tested using log-rank tests.

DeepMyopia-assisted emulating randomized controlled trial (eRCT)

Further validation of risk stratification was performed using an intervention cohort. We conducted a screening-assisted eRCT (Fig. 1b) with a real-world cohort. High-risk individuals identified by DeepMyopia or the metadata models received the intervention of more outdoor time (≥120 min/day), while low-risk individuals were assigned less outdoor time (<120 min/day) as the control group. The models were compared using the myopia onset rate as the outcome measure. Figure 4 shows that interventions guided by DeepMyopia had the potential to prevent more cases of myopia onset. After adjusting for covariates (Fig. 4a–c), no significant difference in myopia onset was observed between DeepMyopia and the NonCyc-metadata model within the group identified as high risk (with more outdoor time) (adjusted relative reduction, ARR: 29.9%; 95% CI: −1.5%, 61.3%) (Fig. 4d). However, within the group with less outdoor time, DeepMyopia was associated with less myopia onset among the low-risk population, with an ARR of −24.8% (95% CI: −36.9%, −12.8%). Moreover, in the overall population, DeepMyopia-assisted interventions showed a net beneficial effect on myopia onset, with an ARR of −17.8% (95% CI: −29.4%, −6.4%) compared to NonCyc-metadata-assisted interventions. Importantly, DeepMyopia did not differ significantly in intervention benefit from the Cyc-metadata model (Supplementary Fig. 3, ARR: −2.3%; 95% CI: −15.9%, 11.2%).

Fig. 4: Emulating randomized controlled trial for DeepMyopia-assisted Intervention.

a The distribution of the partial hazard score over the control (orange) and the intervention (blue) groups, with DeepMyopia on the horizontal axis and the NonCyc-metadata model on the vertical axis. b The distribution of estimated propensity scores for DeepMyopia (orange) and the NonCyc-metadata model (blue), with DeepMyopia above the dashed line and the NonCyc-metadata model below it. c The standardized mean differences (SMD) of the top eight well-balanced covariates. The dashed line indicates the threshold of balancing. d Incidence of myopia and adjusted relative reduction of myopia incidence within subgroups with different time outdoors for DeepMyopia (purple) and the NonCyc-metadata model (pink). The error bars represent 95% confidence intervals (CI).

Effectiveness of DeepMyopia in reducing the disease burden of myopia

We used a Markov model to simulate the lifetime experience of myopia among children, which allowed us to evaluate the effectiveness of DeepMyopia-assisted intervention in reducing the disease burden of myopia (Fig. 5). At baseline, all children were non-myopic; they then progressively developed myopia and associated diseases. With DeepMyopia-assisted intervention, the model predicted a gain of 0.33 (95% CI: 0.11, 0.62) quality-adjusted life years (QALYs) per person and 5.90 (95% CI: 1.92, 11.19) avoided blindness years per 1 million persons compared to the NonCyc-metadata-assisted general intervention. Compared with a natural lifestyle with no active intervention, DeepMyopia-assisted intervention resulted in a gain of 0.75 (95% CI: 0.53, 1.04) QALYs per person and 13.54 (95% CI: 9.57, 18.83) avoided blindness years per 1 million persons.
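The general mechanics of such a cohort-level Markov simulation can be sketched as below. This is a minimal illustration only: every transition probability and utility weight in the code is a made-up placeholder rather than a value from this study or from the cited literature, and the sketch omits mortality and discounting, which a full health-economic model would normally include.

```python
STATES = ("non_myopic", "myopia", "pathologic_myopia", "blindness")

# PLACEHOLDER annual transition matrices (rows sum to 1); NOT study values.
PLACEHOLDER_NO_INTERVENTION = [
    [0.90, 0.10, 0.00, 0.00],
    [0.00, 0.98, 0.02, 0.00],
    [0.00, 0.00, 0.99, 0.01],
    [0.00, 0.00, 0.00, 1.00],
]
# Same placeholders with a lower assumed annual onset probability.
PLACEHOLDER_WITH_INTERVENTION = [
    [0.93, 0.07, 0.00, 0.00],
    [0.00, 0.98, 0.02, 0.00],
    [0.00, 0.00, 0.99, 0.01],
    [0.00, 0.00, 0.00, 1.00],
]
# PLACEHOLDER utility weights per state per year; NOT study values.
PLACEHOLDER_UTILITIES = (1.0, 0.95, 0.80, 0.50)


def step(distribution, transitions):
    """Advance the cohort distribution by one annual cycle."""
    nxt = [0.0] * len(distribution)
    for i, mass in enumerate(distribution):
        for j, prob in enumerate(transitions[i]):
            nxt[j] += mass * prob
    return nxt


def simulate(transitions, utilities, cycles):
    """Return final distribution, QALYs and blindness years per person."""
    for row in transitions:
        if abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each transition row must sum to 1")
    distribution = [1.0, 0.0, 0.0, 0.0]  # everyone starts non-myopic
    qalys = 0.0
    blind_years = 0.0
    for _ in range(cycles):
        qalys += sum(m * u for m, u in zip(distribution, utilities))
        blind_years += distribution[-1]
        distribution = step(distribution, transitions)
    return distribution, qalys, blind_years


_, qalys_none, blind_none = simulate(
    PLACEHOLDER_NO_INTERVENTION, PLACEHOLDER_UTILITIES, cycles=60)
_, qalys_int, blind_int = simulate(
    PLACEHOLDER_WITH_INTERVENTION, PLACEHOLDER_UTILITIES, cycles=60)
qaly_gain = qalys_int - qalys_none          # QALYs gained per person
blind_years_avoided = blind_none - blind_int
```

The comparison of two strategies then reduces to differencing the accumulated QALYs and blindness years, which mirrors how the gains above are expressed relative to a comparator.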

Fig. 5: Markov model for simulating the lifetime experience from normal to myopia, pathologic myopia, and myopic maculopathy-related blindness.

The transition probabilities indicate the incidence per year and were retrieved from the published literature21,41,42.

Discussion

The results of this study showed that DeepMyopia, an innovative deep learning model, was capable of accurately predicting myopia onset. In DeepMyopia, the inputs consist of retinal fundus images, AL, age, and sex. The ZEISS IOLMaster used for AL measurement does not require cycloplegia, contributing to the practicality of our model in real-world applications where cycloplegia may not be readily achievable. From a public health perspective, other limitations of cycloplegic refraction in a screening setting are that the procedure is time-consuming and is usually not well accepted by young children. DeepMyopia performed well in risk stratification of myopia onset, with a significant separation between low- and high-risk groups. By combining retinal fundus images with demographic and ophthalmic data, DeepMyopia demonstrated superior performance compared with the NonCyc-metadata model or the fundus model alone. DeepMyopia allowed for early detection and thus timely intervention to prevent future myopia complications and reduce the public health burden.

The fundus model’s myopia detection performance was evident from the achieved AUCs. By setting different operating thresholds15, we could fine-tune the model’s behavior to suit the deployment setting (Supplementary Fig. 1b). At a high-specificity operating point, the model was conservative, minimizing the risk of misclassifying individuals without myopia as myopic. If this threshold was exceeded, immediate attention and referral to an ophthalmologist would be warranted. This would help to reduce unnecessary cycloplegic examinations for children. Conversely, at a high-sensitivity operating point, the model was less conservative. Exceeding this threshold could prompt increased vigilance and earlier scheduling of the next screening for children at risk.

DeepMyopia, as a deep learning algorithm, showed its capability in accurately predicting the risk of myopia onset. While several studies have used different predictors to predict later myopia onset and achieved commendable accuracy4,5,6,7,16, these studies mainly relied on baseline cycloplegic SE, as cycloplegic autorefraction is considered the gold standard for myopia assessment. Unfortunately, the use of cycloplegia in a day-to-day clinical workflow or a large screening program involving hundreds of thousands of children would require great time commitment and effort. Young children may find it uncomfortable, and there can be adverse effects such as blurred near vision and photophobia.

Notably, our study showed that DeepMyopia achieved high predictive accuracy using only non-cycloplegic ocular parameters. It provided a reliable 1-year myopia onset prediction with an AUC of 0.908 and retained an AUC of 0.810 at 3 years. The incorporation of retinal fundus images significantly improved the predictive accuracy over the NonCyc-metadata model. In comparison with the Cyc-metadata model constructed using the optimal predictive factors, DeepMyopia exhibited competitive performance, with no statistically significant differences. This feature is beneficial for large-scale screening of children, offering a quick, easy, non-invasive, and reliable myopia assessment during routine medical check-ups.

Accurate identification of children at risk of myopia is crucial for effective myopia management, as it enables appropriate prevention strategies to be recommended. Encouraging outdoor activities has been recognized as a promising approach to prevent myopia onset in children17. However, implementing more outdoor time for school children may pose significant practical challenges, particularly within highly competitive educational systems18,19. While the primary benefits in clinical trials stem from the intervention itself, AI models such as DeepMyopia may bridge the gap between theoretical benefits and practical implementation. Our eRCT demonstrated DeepMyopia’s capacity to advance myopia management, as shown by the significant reduction in myopia onset (an ARR of −17.8%). Unlike other studies that focused on simulated clinical trials20, our study examined real-world applications of myopia prevention and control in children.

DeepMyopia’s intervention guidance is adjustable and could include measures such as atropine or red-light therapy, depending on individual risk profiles and the clinician’s assessment. This flexibility helps to optimize resource allocation and to ensure that individuals receive interventions tailored to their specific risk profiles.

Beyond assessing its predictive capabilities, we also evaluated the effectiveness of DeepMyopia in reducing the disease burden of myopia. The main novelty of our model is its capability to help find suitable candidates for early myopia intervention. Combined with non-cycloplegic refraction data, DeepMyopia has great potential to be useful in public health screening. Although it does not directly contribute to the effectiveness of interventions, identifying suitable candidates makes it possible to improve the cost-effectiveness of the intervention. While the prevention of blindness and improvements in QALYs primarily result from the benefits of the intervention itself, AI models play an important role. For example, a recent study has shown that annual AI telemedicine screening in China is highly cost-effective in both rural and urban areas21. Our study further supports these findings, indicating that DeepMyopia could improve health outcomes when compared to no intervention or general intervention strategies.

Explainability analysis was conducted to investigate the mechanism by which DeepMyopia predicts myopia and high myopia. Specifically, we adopted Grad-CAM22 to highlight the most predictive features in each image. High gradient values indicate areas in the image that have a significant impact on the model’s decision. When predicting myopia onset, the optic disc and the diffuse reflectance signal around the macula in baseline retinal images were highlighted (Fig. 6a). Tilted optic disc and reduced macular reflection were observed in the corresponding regions of the retinal fundus images taken within a year of myopia onset. The observed optic disc tilting was consistent with previous studies23. While evidence linking macular reflections to myopia is lacking, such reflections are commonly observed in fundus images of young, healthy individuals. Research has shown that increased AL in myopic eyes can affect the morphological and optical properties of the macula24, potentially resulting in reduced or lost reflectance signals around the macula. For high myopia onset prediction, the macula was highlighted (Fig. 6b). In retinal images taken in the year of high myopia onset, choroidal vessels were visible owing to thinning of the retinal layer in the macular area, resembling tessellated fundus25. This finding was consistent with the crucial role of the fovea in central vision, where its integrity correlates with the best-corrected visual acuity level9,26.

Fig. 6: Explainability of myopia onset predictions.

Representative fundus images and their corresponding saliency maps for predicting myopia (a) and high myopia (b) onset. The images were from a participant who developed myopia (a) or high myopia (b) during the follow-up period. From left to right: the fundus image at baseline, the saliency map overlaid on the fundus image, and the follow-up image with developed myopia or high myopia. The color gradients used in the saliency maps indicate the degree of importance of the corresponding regions in the fundus image for prediction. Specifically, the regions with warmer colors indicate higher importance, while the cooler colors indicate lower importance. Zoomed view shows lesion areas, which align well with the saliency maps. c SHAP-based model explainability, where each point in the bee swarm plot represents a participant. Different colors indicate the value of the feature, where red and blue represent high and low values, respectively. d Pie chart of the feature contributions, which were calculated by summing the SHAP values based on individual sets.

In addition, we conducted Shapley Additive exPlanations (SHAP)27 analysis, as shown in Fig. 6c, d, which revealed a positive correlation between the fundus feature and myopia onset prediction. SHAP analysis highlighted the significant role of the fundus, which contributed 49.06% of the predictive power. DeepMyopia might detect the early stage of myopia development through subtle morphological changes revealed by fundus images, including changes in retinal and choroidal thickness.

DeepMyopia had several strengths with clinical implications. (1) DeepMyopia was developed on a longitudinal cohort with up to 3 years of follow-up data, providing a solid foundation for model development and validation. (2) Objective data, including fundus images and non-cycloplegic ocular parameters at baseline, were employed, avoiding subjective recall bias and eliminating the need for repeated longitudinal follow-ups. (3) DeepMyopia was evaluated on a large cohort dataset consisting of images acquired from various sites. (4) We also demonstrated the comprehensive performance of DeepMyopia in prediction, risk stratification, and facilitating targeted interventions without cycloplegic refraction.

Despite the advantages of DeepMyopia, there are several limitations with regard to its clinical application. Firstly, we mainly focused on evaluating DeepMyopia with external cohorts using retinal images. Additional information, such as physiological and genetic information, which are important biomarkers for myopia development, could be included in further analyses. Secondly, the prediction horizon for myopia onset was limited to 3 years. We plan to evaluate the long-term effectiveness of DeepMyopia beyond this timeframe. Thirdly, DeepMyopia was initially tested on Chinese cohorts. Given the importance of global applicability, the adaptability and performance of DeepMyopia in diverse populations worldwide remain to be investigated. Fourthly, the shift from high-quality, professionally captured images to smartphone-captured fundus images poses challenges. Variations in image quality, resolution, and lighting conditions may affect DeepMyopia’s predictive accuracy. Moreover, access to AL measurement outside of eye care provider settings may be limited. However, new optical biometry devices are emerging, particularly in regions with high myopia prevalence such as China, potentially offering improved accessibility. Lastly, despite the benefits of DeepMyopia for myopia screening, especially in public health settings, the prevention of myopia and its consequences comes from the intervention rather than from DeepMyopia directly, although DeepMyopia should decrease cost and inconvenience for children who are not at risk. Future research should focus on adapting the model to diverse devices and ensuring reliable performance in real-world conditions.

In conclusion, this study presented a workable model for robust and effective myopia prediction and management. DeepMyopia, based on fundus image analysis and non-cycloplegic ocular parameters, demonstrated excellent performance in predicting myopia onset and provided a reliable and efficient tool for early detection and intervention guidance for children in the general population.

Methods

Ethical approval

The study protocol in Shanghai was approved by the Ethics Committee of Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine (approved in 2016 and 2018), and adhered to the tenets of the Declaration of Helsinki. The study procedures in Kunming, Guangzhou, Hohhot, Beijing, and Hong Kong were approved by the Ethics Committees of the First Affiliated Hospital of Kunming Medical University (approved in 2015), Zhongshan Ophthalmic Center (approved in 2007), the Affiliated Hospital of Inner Mongolia Medical University (approved in 2016), Beijing Friendship Hospital Pinggu Campus, Capital Medical University (approved in 2022), and the Chinese University of Hong Kong (approved in 2015), respectively. Written informed consent was obtained from all children and their legal guardians. All fundus images and clinical data were de-identified prior to model development. The study was registered at ClinicalTrials.gov (identifier: NCT05835115).

Definitions

Based on cycloplegic autorefraction results, the SE was calculated as the sum of the sphere and half of the cylinder. Myopia and high myopia were defined as SE ≤ −0.50 D and SE ≤ −6.00 D, respectively28. Myopia onset was defined as the occurrence of myopia at follow-up in a participant without myopia at baseline. High myopia onset was defined as the occurrence of high myopia at follow-up in a participant with low myopia at baseline. In our study, metadata was defined as the ophthalmic clinical data and demographic data.
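These definitions translate directly into a few lines of code. The sketch below is illustrative: the function and constant names are hypothetical, and "low myopia" is assumed to mean myopia that does not reach the high myopia cutoff.

```python
MYOPIA_CUTOFF_D = -0.50       # SE <= -0.50 D defines myopia
HIGH_MYOPIA_CUTOFF_D = -6.00  # SE <= -6.00 D defines high myopia


def spherical_equivalent(sphere_d, cylinder_d):
    """SE in diopters: sphere plus half of the cylinder."""
    return sphere_d + cylinder_d / 2.0


def refractive_category(se_d):
    """Classify an eye from its cycloplegic SE."""
    if se_d <= HIGH_MYOPIA_CUTOFF_D:
        return "high myopia"
    if se_d <= MYOPIA_CUTOFF_D:
        return "low myopia"  # assumed label for myopia below the high cutoff
    return "non-myopic"


def is_myopia_onset(baseline_se_d, followup_se_d):
    """Non-myopic at baseline and myopic (any degree) at follow-up."""
    return (refractive_category(baseline_se_d) == "non-myopic"
            and refractive_category(followup_se_d) != "non-myopic")


def is_high_myopia_onset(baseline_se_d, followup_se_d):
    """Low myopia at baseline and high myopia at follow-up."""
    return (refractive_category(baseline_se_d) == "low myopia"
            and refractive_category(followup_se_d) == "high myopia")


# Example: sphere -0.25 D with cylinder -0.50 D gives SE = -0.50 D.
example_se = spherical_equivalent(-0.25, -0.50)
```

Note that the cutoffs are inclusive, so an eye with an SE of exactly −0.50 D is classified as myopic.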

Development and validation datasets

We utilized the latest census data from a city-wide prospective survey (the Shanghai Child and Adolescent Large-scale Eye Study [SCALE])29 to pre-train DeepMyopia. In the survey, visual acuity testing and non-cycloplegic autorefraction were performed for all participating children. SCALE consisted of 1,638,315 individuals (age range 4–14 years) in kindergartens, primary schools, and junior high schools in Shanghai. The characteristics of the pre-training set are provided in Supplementary Table 5.

To detect myopia, the training and validation datasets included baseline data from two large prospective studies: (1) a school-based cohort (the Shanghai Time Outside to Reduce Myopia [STORM] trial)30; and (2) a population-based cohort, the High Myopia Registration Study (SCALE-HM)31. Cycloplegic autorefraction was performed. The data from the two studies were split into training (70%), tuning (10%), and internal test (20%) sets at the participant level. The tuning set was used to tune the hyperparameters during model development. The developed model was then validated on seven independent datasets. All external independent datasets included SE and retinal fundus images acquired with various digital retinal cameras. Demographic and ophthalmic information of the enrolled participants is provided in Table 1, Supplementary Table 6, and Supplementary Fig. 4.
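Splitting at the participant level means that all images from one child land in the same subset, which avoids leakage between training and test data. A minimal sketch of such a split is given below; the function name, the seed, and the identifier format are hypothetical, and the exact procedure used in the study is not described in the text.

```python
import random


def split_by_participant(participant_ids, fractions=(0.7, 0.1, 0.2), seed=0):
    """Assign unique participants to training, tuning and test sets.

    participant_ids may contain repeats (one entry per image);
    the split is made over unique participants only.
    """
    unique_ids = sorted(set(participant_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_ids)
    n = len(unique_ids)
    n_train = int(round(fractions[0] * n))
    n_tune = int(round(fractions[1] * n))
    train = set(unique_ids[:n_train])
    tune = set(unique_ids[n_train:n_train + n_tune])
    test = set(unique_ids[n_train + n_tune:])  # remainder, about 20%
    return train, tune, test


def subset_of(participant_id, train, tune, test):
    """Look up which subset every image of a participant belongs to."""
    if participant_id in train:
        return "training"
    if participant_id in tune:
        return "tuning"
    if participant_id in test:
        return "internal test"
    raise KeyError(participant_id)


# Hypothetical example: 100 participants with two images each.
image_owner_ids = [f"P{i:03d}" for i in range(100)] * 2
train_ids, tune_ids, test_ids = split_by_participant(image_owner_ids, seed=1)
```

Because membership is decided per participant, both eyes and all repeat images of a child are guaranteed to share a subset.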

Table 1 Baseline characteristics of participants in different datasets

To predict myopia onset, the STORM dataset was used for development (Supplementary Table 7). Since outdoor activities may affect the onset of myopia, only the control group was included. Participants with follow-up visits in HKCES32 were used as an external test set. HKCES is an ongoing territory-wide, population-based cohort study of Chinese children aged 6–8 years from primary schools in Hong Kong.

An emulating randomized controlled trial (eRCT) was conducted to evaluate the effectiveness of DeepMyopia in risk stratification of myopia onset. A total of 3303 participants with a mean age of 7.8 years were selected. These participants were non-myopes in the intervention group of the STORM study.

Image quality control

Retinal fundus images from the SCALE, STORM, and SCALE-HM studies were taken by swept-source optical coherence tomography (SS-OCT, DRI OCT Triton, Topcon, Tokyo, Japan) with an inserted 45-degree digital retinal camera. Retinal fundus images of the external datasets were taken by different digital retinal cameras, including CR-2 (Canon, Tokyo, Japan), TRC-NW400 (Topcon), VISUCAM 224 (Zeiss, Jena, Germany), VISUCAM 524 (Zeiss), Handheld Fundus Camera (MicroClear Medical, Suzhou, China), TRC-50DX (Topcon), and DRI OCT Triton (Topcon).

All images were first de-identified to protect sensitive private data. Subsequently, a retinal image quality assessment (RIQA) was conducted on four commonly used quality indicators: blurring, uneven illumination, low contrast, and artifacts. The images were categorized into three quality grades, “Good”, “Usable”, and “Reject”, based on the MCF-Net method33. Examples of excluded images are presented in Supplementary Fig. 5. The images that passed RIQA quality control were further reviewed manually by two ophthalmologists (ZQ and TC). A senior ophthalmologist (XX) adjudicated any disagreement between ZQ and TC.

Model development of the system

As DeepMyopia took both retinal fundus images and NonCyc-metadata as input, we designed a dual-branch pipeline for the pretraining procedure (Fig. 1d), with one branch for image feature processing and another for NonCyc-metadata processing. In the image feature processing branch, we conducted a two-step pretraining procedure starting from a ResNet-50 model34 pre-trained on the ImageNet dataset35. Firstly, to effectively capture intrinsic patterns from retinal fundus images, the model was trained in a self-supervised way with an InfoNCE loss36 on all available fundus images from the SCALE dataset. In this process, without relying on annotations, augmented views of the same sample were regarded as positive pairs and were pulled closer together in the embedding space, whereas negative pairs created from different samples were pushed apart. Secondly, to enhance the extraction of underlying refraction-related information, the model was subsequently fine-tuned to detect myopia from fundus images only. The model’s performance was tested on different test sets (Supplementary Fig. 1). Following this, the fundus model could be further fine-tuned for specific downstream tasks, such as myopia onset prediction. In the metadata branch pretraining, age, sex, and AL from the SCALE dataset served as input, and we predicted myopia using two multi-layer perceptrons (MLPs) in a supervised way. The first MLP acted as a metadata encoder, embedding NonCyc-metadata into an intermediate vector representation, which was used by the second MLP for myopia detection. Following this pretraining, the metadata encoder could be reused separately for myopia onset prediction in DeepMyopia.
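To make the contrastive objective concrete, a simplified InfoNCE loss is sketched below in plain Python. It is not the training code of this study: the temperature value, the use of the other samples in the batch as negatives, the one-directional form of the loss, and the function names are all assumptions made for illustration.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def info_nce(view_a, view_b, temperature=0.1):
    """Mean InfoNCE loss over a batch of paired embeddings.

    view_a[i] and view_b[i] are embeddings of two augmented views of
    sample i (the positive pair); view_b[j] with j != i are negatives.
    The temperature is an assumed value, not taken from the paper.
    """
    a = [l2_normalize(v) for v in view_a]
    b = [l2_normalize(v) for v in view_b]
    total = 0.0
    for i, anchor in enumerate(a):
        logits = [sum(x * y for x, y in zip(anchor, other)) / temperature
                  for other in b]
        # Numerically stable log-sum-exp over positives and negatives.
        peak = max(logits)
        log_denominator = peak + math.log(
            sum(math.exp(value - peak) for value in logits))
        total += log_denominator - logits[i]  # -log softmax of the positive
    return total / len(a)


# Toy embeddings (hypothetical): aligned pairs give a small loss,
# mismatched pairs give a large one.
aligned_loss = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
mismatched_loss = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

Minimizing this quantity raises the similarity of each positive pair relative to the negatives, which is the "pull together, push apart" behavior described above.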

For predicting myopia onset, all four models (fundus model, Cyc-metadata model, NonCyc-metadata model, and DeepMyopia) output risk scores, which were probabilities from a binary classification task on whether a non-myopic individual would become myopic within 1, 2, or 3 years. For the fundus model, an image-level risk score was generated from the baseline retinal image. To achieve this, a fully connected layer followed by a softmax layer was attached to the pretrained ResNet-50 model, which was trained with a batch size of 32 images, an initial learning rate of 5 × 10⁻⁵, and the Adam optimizer with a weight decay of 10⁻⁶. The eye-level retinal fundus score was the average of all image-level risk scores from the images of the respective eye. For the DeepMyopia model, NonCyc-metadata were first passed through the pretrained metadata encoder to form a NonCyc-metadata-based vector. This vector was combined with the eye-level retinal fundus score to generate risk scores. Stacking generalization37 was used to generate risk scores for the Cyc-metadata, NonCyc-metadata, and DeepMyopia models. It combined the predictions of multiple base models by training a meta-classifier on the outputs of these base models. Machine learning algorithms such as XGBoost, LightGBM, KNN, SVM, Random Forest, MLP, and Logistic Regression were considered as candidate base models and meta-models. Cross-validation was used to determine the optimal base models and meta-models as well as the optimal hyperparameters. The models with the best AUC on the internal tuning set were selected for further validation.
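The eye-level aggregation step described above is a simple average over image-level scores. A minimal sketch follows; the record layout (participant identifier, eye label, score) and the function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def eye_level_scores(image_scores):
    """Average image-level risk scores per (participant, eye).

    image_scores: iterable of (participant_id, eye, score) tuples,
    where eye is a label such as "OD" or "OS".
    """
    grouped = defaultdict(list)
    for participant_id, eye, score in image_scores:
        grouped[(participant_id, eye)].append(score)
    return {key: mean(values) for key, values in grouped.items()}


# Hypothetical scores: two images of one right eye, one of the left eye.
example_records = [
    ("P001", "OD", 0.25),
    ("P001", "OD", 0.75),
    ("P001", "OS", 0.90),
]
example_eye_scores = eye_level_scores(example_records)
```

Averaging per eye gives a single fundus score per eye regardless of how many images were captured, which is the quantity that is then combined with the metadata-based vector.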

Several image augmentation techniques were adopted during training. First, as the primary augmentation, the original 512 × 512 images were randomly cropped and resized using bicubic interpolation, with the crop size drawn from [0.08, 1.0] and the aspect ratio drawn from [0.75, 1.33]. Second, RandAugment38 was used to apply a series of random augmentations to each image. The specific augmentations were chosen randomly from a predefined set of pixel-level operations (such as posterize increasing, solarize increasing, contrast increasing, brightness increasing, and sharpness increasing) and affine transformations (such as rotation, translation, and shearing). The strategy has two parameters: the number of augmentations to apply and the strength of each augmentation. We used “rand-m9-mstd0.5-inc1” as the augmentation setting. Finally, RandomErasing39 was applied with p = 0.25 after normalization. It randomly selected a rectangle with a side drawn from [0.02, 0.33] of the whole image and an aspect ratio drawn from [0.3, 3.3], and the pixel values in that region were drawn from a normal distribution. In this way, a larger and more diverse dataset was fed to the model to improve generalizability.
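
The random erasing step can be illustrated with the NumPy sketch below. It is a simplified stand-in, not the implementation used in the study. In particular, it assumes that the range [0.02, 0.33] is the fraction of the image area to erase, as in common implementations of random erasing, whereas the text above describes it as a side; the function name `random_erase`, the `max_tries` retry limit, and the 224 × 224 input are likewise assumptions.

```python
import math
import numpy as np

def random_erase(image, rng, p=0.25, area_range=(0.02, 0.33),
                 aspect_range=(0.3, 3.3), max_tries=10):
    """Fill a random rectangle of a (C, H, W) image with Gaussian noise.

    Returns the input unchanged with probability 1 - p.
    Assumption: area_range is the fraction of the image area to erase.
    """
    if rng.random() >= p:
        return image
    channels, height, width = image.shape
    out = image.copy()
    for _ in range(max_tries):
        target_area = rng.uniform(*area_range) * height * width
        # Log-uniform aspect ratio, so tall and wide boxes are equally likely.
        aspect = math.exp(rng.uniform(math.log(aspect_range[0]),
                                      math.log(aspect_range[1])))
        h = int(round(math.sqrt(target_area * aspect)))
        w = int(round(math.sqrt(target_area / aspect)))
        if 0 < h < height and 0 < w < width:
            top = rng.integers(0, height - h + 1)
            left = rng.integers(0, width - w + 1)
            # Erased pixels are drawn from a standard normal distribution.
            out[:, top:top + h, left:left + w] = rng.normal(size=(channels, h, w))
            return out
    return out  # no valid rectangle found; image left unerased

rng = np.random.default_rng(0)
image = np.zeros((3, 224, 224), dtype=np.float32)  # stand-in normalized image
erased = random_erase(image, rng, p=1.0)           # p=1.0 forces erasing
erased_fraction = np.count_nonzero(erased[0]) / (224 * 224)
```

Because the noise is applied after normalization, standard normal values are on the same scale as the rest of the image.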

Risk stratification of myopia onset

For risk stratification, we trained CPH models on metadata variables and risk scores derived from the retinal fundus images. The fundus image-based risk score was the score predicted by the myopia prediction model at the first visit. The metadata and the fundus image-based risk score were combined to predict the risk of onset. High- and low-risk thresholds for developing myopia were determined from the cutoff point derived from the distribution of the partial hazard score in the training set. The cutoff was chosen to achieve a sensitivity of 0.8 with the highest specificity among all the groups40. This choice was intended to identify high-risk individuals accurately while reducing the chance of misclassifying them as low-risk. Supplementary Fig. 6 shows the distribution of partial hazard scores and the associated thresholds across the dataset.
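
One way to read the cutoff rule is sketched below: among all candidate cutoffs whose sensitivity is at least 0.8, pick the one with the highest specificity. This interpretation, the function name `choose_cutoff`, and the ten example scores are assumptions for illustration and are not taken from the study data.

```python
def choose_cutoff(scores, labels, min_sensitivity=0.8):
    """Return (cutoff, sensitivity, specificity) with the highest specificity
    among cutoffs whose sensitivity is at least min_sensitivity.

    An eye is called high-risk when score >= cutoff.
    labels: 1 = developed myopia, 0 = did not.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        sensitivity, specificity = tp / n_pos, tn / n_neg
        if sensitivity >= min_sensitivity and (best is None or specificity > best[2]):
            best = (cutoff, sensitivity, specificity)
    return best

# Hypothetical partial hazard scores and outcomes for ten eyes.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
labels = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
cutoff, sensitivity, specificity = choose_cutoff(scores, labels)
```

For these toy values the selected cutoff is 0.6, which captures four of the five eyes that developed myopia and excludes four of the five that did not.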

Emulating randomized controlled trial

As a proof of concept, our study focused on outdoor time as the intervention to demonstrate the efficacy of DeepMyopia in identifying high-risk children and thereby improving the effect of interventions for myopia control. Outdoor time was measured objectively with wrist-worn wearable devices. Potential confounders were balanced between the two groups of participants by inverse probability weighting (IPW) with a propensity score. The balance of each confounder was assessed using the standardized mean difference (SMD). Average treatment effects were estimated using ARR to evaluate the effect of the intervention on myopia onset in different subgroups.
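
The weighting, balance check, and effect estimate can be sketched as follows. The sketch assumes that propensity scores have already been estimated (for example, by a logistic regression of intervention status on the confounders) and uses a tiny hypothetical dataset with one binary confounder. The function names are illustrative, and the SMD uses one common pooled-variance definition that may differ from the one used in the study.

```python
import math

def ipw_weights(treated, propensity):
    """Inverse probability weights: 1/e for treated, 1/(1 - e) for controls."""
    return [1 / e if t else 1 / (1 - e) for t, e in zip(treated, propensity)]

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def group_stats(x, treated, weights, group):
    """Weighted mean and variance of x within one treatment group."""
    xs = [v for v, t in zip(x, treated) if t == group]
    ws = [w for w, t in zip(weights, treated) if t == group]
    m = weighted_mean(xs, ws)
    return m, weighted_mean([(v - m) ** 2 for v in xs], ws)

def weighted_smd(x, treated, weights):
    """Standardized mean difference of covariate x between the groups."""
    (m1, v1), (m0, v0) = (group_stats(x, treated, weights, g) for g in (1, 0))
    return (m1 - m0) / math.sqrt((v1 + v0) / 2)

def weighted_arr(outcome, treated, weights):
    """Absolute risk reduction: weighted control risk minus treated risk."""
    risk1, _ = group_stats(outcome, treated, weights, 1)
    risk0, _ = group_stats(outcome, treated, weights, 0)
    return risk0 - risk1

# Hypothetical data: 8 children, one binary confounder, known propensities.
confounder = [1, 1, 1, 1, 0, 0, 0, 0]
treated    = [1, 1, 1, 0, 1, 0, 0, 0]   # 1 = intervention group
propensity = [0.75, 0.75, 0.75, 0.75, 0.25, 0.25, 0.25, 0.25]
onset      = [0, 0, 1, 1, 0, 0, 0, 1]   # 1 = myopia onset

weights = ipw_weights(treated, propensity)
smd_before = weighted_smd(confounder, treated, [1.0] * 8)
smd_after = weighted_smd(confounder, treated, weights)
arr = weighted_arr(onset, treated, weights)
```

In this toy example the confounder is strongly imbalanced before weighting and exactly balanced afterwards, and the weighted ARR is 0.5.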

Explainability of AI predictions

To visualize the regions within the retinal fundus images that DeepMyopia prioritized when predicting myopia onset, we employed the Gradient-weighted Class Activation Mapping (Grad-CAM) method22. This approach generates visual explanations by highlighting the areas that contribute most to the model’s output. In addition, SHAP values27 were used to reveal the contribution of each feature in the CPH model.
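
The core Grad-CAM computation can be sketched in NumPy once the feature maps of a convolutional layer and the gradients of the class score with respect to them are available. Extracting these from the network is framework-specific and omitted here; the function name `grad_cam` and the 2 × 2 toy arrays are assumptions for illustration.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heat map from one convolutional layer.

    activations: (K, H, W) feature maps A_k of the chosen layer.
    gradients:   (K, H, W) gradient of the class score w.r.t. A_k.
    Returns an (H, W) map scaled to [0, 1].
    """
    # Channel weights: global-average-pooled gradients.
    alpha = gradients.mean(axis=(1, 2))                       # shape (K,)
    # Weighted sum of feature maps, then ReLU to keep positive evidence.
    cam = np.maximum(np.tensordot(alpha, activations, axes=1), 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam

# Toy example: channel 0 supports the class, channel 1 opposes it.
activations = np.array([[[1.0, 0.0], [0.0, 0.0]],
                        [[0.0, 0.0], [0.0, 2.0]]])
gradients = np.array([np.ones((2, 2)), -np.ones((2, 2))])
heatmap = grad_cam(activations, gradients)
```

The resulting map highlights only the top-left location, where the supporting channel is active; in practice the map is upsampled to the input resolution and overlaid on the fundus image.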

Statistical analysis

Participants’ demographic information was summarized using descriptive statistics. The data distribution was examined with the Kolmogorov–Smirnov test. Continuous variables were reported as means and standard deviations (SDs), and categorical variables were reported as frequencies and percentages. Both eyes of each participant were included in the analysis. The analyses were performed using Python (version 3.9) and R (version 4.2.2). A two-sided p < 0.05 was considered statistically significant.

The receiver operating characteristic (ROC) curve was drawn to calculate the AUC with its 95% confidence interval (CI) and the corresponding sensitivity and specificity. For risk stratification, we constructed Kaplan–Meier curves for each group and tested for significant differences between the groups using the log-rank test. A multistate Markov model was built to evaluate the effectiveness of DeepMyopia-assisted intervention. The primary outcomes were QALYs and avoided blindness years. The transition probabilities between states, age-specific mortalities, and utility values for QALYs were all retrieved from the published literature21,41,42.
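
For readers unfamiliar with the Kaplan–Meier estimator, a minimal pure-Python sketch is given below. It is not the analysis code used in the study; the function name `kaplan_meier` and the six follow-up records are hypothetical, and the log-rank test is not shown.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier survival estimate.

    durations: follow-up time per eye (e.g. years to onset or censoring).
    events:    1 if myopia onset was observed at that time, 0 if censored.
    Returns a list of (time, survival_probability) at each event time.
    """
    survival, curve = 1.0, []
    for t in sorted(set(durations)):
        # Eyes still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        onsets = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        if onsets:
            survival *= 1 - onsets / at_risk
            curve.append((t, survival))
    return curve

# Hypothetical follow-up of six eyes in one risk group.
durations = [1, 1, 2, 2, 3, 3]
events = [1, 0, 1, 0, 1, 0]
curve = kaplan_meier(durations, events)
```

With these records the estimated probability of remaining free of myopia falls to 5/6 after year 1, 0.625 after year 2, and 0.3125 after year 3.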

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.