Introduction

Myopia is a global public health concern. It is estimated that 57% of countries will have a myopia prevalence of more than 50% by 20501. The World Health Organization reported in 2019 that at least 2.2 billion people have a vision impairment, of whom at least 1 billion have a vision impairment that could have been prevented2. As myopia is currently difficult to cure completely, it is vital to prevent its onset and progression. Early and appropriate intervention can effectively mitigate the risks and consequences related to myopia3. The spherical equivalent (SE) is the basis for screening and diagnosing myopia4. Quantitative prediction of SE can indicate the specific changes in the progression of myopia, and help in designing targeted interventions in advance. Previous studies have reported a number of risk factors for the onset or progression of myopia, including age, gender, heredity and outdoor activities5,6,7. Matsumura et al.8 suggested that the historical progression of myopia is associated with future changes in visual acuity. Therefore, we believe that historical vision records, together with other demographic information, can be used to quantitatively predict SE.

In recent years, a growing body of research has considered the prediction of myopia or high myopia in different populations9. Most known studies used traditional models such as linear regression, support vector machines and decision trees10,11,12,13,14,15,16,17. In comparison, deep learning can be trained with complex and nonlinear parameters to learn data structures18, and is deemed to perform better than traditional models in a variety of medical prediction tasks19,20,21,22. However, there are only a few applications of deep learning in myopia prediction.

Spadon et al.23 argued that temporal dynamics provide valuable information in addition to static symptom observation. However, in typical vision records, the uneven distribution of time intervals between historical records makes the extraction of temporal features very difficult. This paper uses Time-Aware Long Short-Term Memory (T-LSTM) to capture the temporal features in irregularly sampled time series, and to quantitatively predict children’s and adolescents’ SE based on their variable-length historical vision records. Because it handles both irregular intervals and variable-length histories, the proposed method is applicable to vision records as they are typically collected.

Methods

Data description

The dataset for this study contains 232,244 historical vision records from 37,586 school-aged children and adolescents (aged 6–20 years) in Chengdu, China. The records were collected by Eye See Inc. from October 2019 to March 2022 through unscheduled refractive screening in schools. Eye See Inc. is a company in Chengdu, China, providing medical services for myopia prevention and control. As of December 2022, Eye See Inc. had completed myopia screening for more than 1,600,000 children and adolescents in more than 2,000 schools. A Tumbling E Logarithmic Visual Acuity Chart (under the National Standard of the People’s Republic of China No. GB11533-2011), a Slit Lamp Microscope (SL-3G, Topcon), an Auto Kerato-Refractometer (KR-800, Topcon), and an Optical Biometer (AL-Scan, Nidek) were used for data collection. Inclusion criteria: elementary, middle and high school students between the ages of 6 and 20. Exclusion criteria: students whose parents or guardians did not give consent, and students who were unable to cooperate with the examination or did not complete the examination due to intellectual or physical reasons. The following examinations were performed based on the standard clinical protocols: (1) distant vision examination; (2) slit-lamp microscope examination; (3) pre-cycloplegic objective refractive examination; (4) axial length measurement.

The associated myopia diagnostic criteria follow the Consensus on Myopia Management for Asia 2021, published by the Asia Optometric Management Academy (AOMA) and Asia Optometric Congress (AOC)4. Based on SE when the eye is relaxed, the criterion of myopia is \(SE \le -0.5\ (D)\), and the level of myopia is classified as follows: (1) low myopia: \(-3.0\ (D) < SE \le -0.5\ (D)\); (2) moderate myopia: \(-6.0\ (D) < SE \le -3.0\ (D)\); (3) high myopia: \(SE \le -6.0\ (D)\).
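The thresholds above translate directly into a small lookup. The following is a minimal sketch; the function name `classify_myopia` and the label strings are illustrative choices, not part of the study's code.

```python
def classify_myopia(se: float) -> str:
    """Map a spherical equivalent (in diopters) to the myopia level defined above."""
    if se > -0.5:
        return "no myopia"        # SE > -0.5 D
    if se > -3.0:
        return "low myopia"       # -3.0 D < SE <= -0.5 D
    if se > -6.0:
        return "moderate myopia"  # -6.0 D < SE <= -3.0 D
    return "high myopia"          # SE <= -6.0 D


levels = [classify_myopia(se) for se in (0.25, -0.5, -3.0, -6.0)]
```

Note that each boundary value belongs to the more severe class, because the upper bound of every interval is inclusive.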

The cleaned dataset contains 75,172 eyes (samples) of 37,586 children and adolescents. Each sample is associated with 2–6 records. The number of samples with 2, 3, 4, 5 and 6 records is 27,015, 18,732, 25,109, 4,314 and 2, respectively. The time interval between the first record and the last record for any sample ranges from 1 (\(< 1\ quarter\)) to 10 (\(\ge 9\ quarters, < 10\ quarters\)). Each record is associated with 16 features, as described in Table 1. Figure 1 shows the distributions of these features. An example of the time-series data for an adolescent is shown in Table 2.

Table 1 Feature description.
Figure 1 The distributions of features. In particular, as children and adolescents grow, the axial length of the eyeball gradually increases, so the overall distribution of axial lengths separates into three distributions, each corresponding to one age group.

Table 2 The time-series data for an adolescent whose Id is 1.

Data preprocessing

Firstly, to avoid imposing a spurious order on the categories through the original sequential encoding, one-hot encoding was performed for the unordered categorical features, namely correction method and gender. One-hot encoding creates a unit vector for each option within the categorical feature, where the dimensionality of the vector equals the number of categories24. For example, a possible one-hot encoding for gender is male: (1, 0), and female: (0, 1).
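As an illustration of this encoding, here is a minimal NumPy sketch. The helper `one_hot` and the category order are assumptions made for the example; only the gender encoding male: (1, 0), female: (0, 1) comes from the text.

```python
import numpy as np


def one_hot(values, categories):
    """Return one unit vector per value; the column order follows `categories`."""
    index = {category: i for i, category in enumerate(categories)}
    encoded = np.zeros((len(values), len(categories)), dtype=float)
    for row, value in enumerate(values):
        encoded[row, index[value]] = 1.0
    return encoded


# Gender encoded as male: (1, 0), female: (0, 1), as in the text.
gender = one_hot(["male", "female", "female"], categories=["male", "female"])
```

Each row sums to one, so no category is numerically "larger" than another, which is the point of replacing the sequential codes.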

After one-hot encoding, all features except Id and Check date were standardized to speed up the convergence of the model. Standardization rescales each feature to zero mean (\(\mu =0\)) and unit standard deviation (\(\sigma =1\))25, as

$$\begin{aligned} x^\prime =\frac{x\ -\ \mu }{\sigma }. \end{aligned}$$
(1)
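Equation (1) can be sketched in a few lines of NumPy. The function name `standardize`, the guard for constant columns, and the option of reusing previously computed statistics are assumptions of this example; the text does not say on which subset the mean and standard deviation were estimated.

```python
import numpy as np


def standardize(x, mean=None, std=None):
    """Apply Eq. (1) column-wise and return the scaled data with the statistics used."""
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=0) if mean is None else mean
    std = x.std(axis=0) if std is None else std
    std = np.where(std == 0, 1.0, std)  # assumption: leave constant columns unscaled
    return (x - mean) / std, mean, std


# Hypothetical values for two features (e.g. age in years, axial length in mm).
features = np.array([[7.0, 22.8], [10.0, 23.6], [13.0, 24.4]])
scaled, mu, sigma = standardize(features)
```

Passing the returned `mu` and `sigma` back in allows new records to be scaled with the same statistics.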

To increase the sample size, the historical records of a child or an adolescent were split into several samples, ensuring that all input data used for training and predicting is recorded before the label (i.e., the SE value). For example, a child’s or an adolescent’s 4 records (a, b, c, d) can be split into 11 samples as shown in Table 3.

Table 3 The enhanced samples from a child’s or an adolescent’s 4 records (a, b, c, d).

The number of samples in the dataset increased to 490,420 after the data preprocessing. The sample sizes are 277,035, 162,348, 46,709, 4,326 and 2 for sequence lengths of 1, 2, 3, 4 and 5, respectively. Samples with sequence length 5 are too few to be included in the training. The dataset was then stratified by sequence length, and each stratum was further divided into a training set (80%), a validation set (10%) and a testing set (10%).
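The sample counts reported above can be reproduced under one reading of the splitting rule: for every record used as the label, every non-empty, chronologically ordered subset of the earlier records forms one input sequence. This reading is an inference from the reported numbers (11 samples for 4 records, 490,420 samples in total), not a description taken from the authors' code, and the function name `split_records` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def split_records(records):
    """Enumerate (input_sequence, label_record) pairs with all inputs before the label."""
    samples = []
    for label_idx in range(1, len(records)):
        earlier = records[:label_idx]
        for length in range(1, len(earlier) + 1):
            # combinations() preserves the chronological order of `earlier`.
            for subset in combinations(earlier, length):
                samples.append((subset, records[label_idx]))
    return samples


samples = split_records(["a", "b", "c", "d"])  # 11 samples for 4 records

# Number of eyes per record count, as reported for the cleaned dataset.
eyes_by_record_count = {2: 27015, 3: 18732, 4: 25109, 5: 4314, 6: 2}
by_length = Counter()
for n_records, n_eyes in eyes_by_record_count.items():
    for inputs, _ in split_records(list(range(n_records))):
        by_length[len(inputs)] += n_eyes
total = sum(by_length.values())
```

Under this assumption, `by_length` matches the reported sample sizes for sequence lengths 1 to 5, which suggests that the input subsets need not be contiguous.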

LSTM

A Recurrent Neural Network (RNN) is a neural network structure that can effectively link contextual information to achieve long-term memory, but it suffers from vanishing or exploding gradients26,27. To address this problem, Hochreiter et al.28 proposed Long Short-Term Memory (LSTM), a variant of RNN that combines short-term memory with long-term memory through gate control. LSTM mitigates the vanishing gradient problem to a certain extent and allows long-term dependencies to be learned.

The standard LSTM unit (Fig. 2a) consists of a forget gate, an input gate, an output gate and a cell state. The current state \(h_t\) is influenced by the previous state \(h_{t-1}\) and the current input \(x_t\).

Forget gate:

$$\begin{aligned} f_t=\sigma \left( W_fx_t+U_fh_{t-1}+b_f\right) \end{aligned}$$
(2)

Input gate:

$$\begin{aligned} i_t=\sigma \left( W_ix_t+U_ih_{t-1}+b_i\right) \end{aligned}$$
(3)
$$\begin{aligned} \widetilde{C_t}=\tanh \left( W_cx_t+U_ch_{t-1}+b_c\right) \end{aligned}$$
(4)

Output gate:

$$\begin{aligned} o_t=\sigma \left( W_ox_t+U_oh_{t-1}+b_o\right) \end{aligned}$$
(5)
$$\begin{aligned} h_t=o_t\cdot \tanh \left( C_t\right) \end{aligned}$$
(6)

Cell state:

$$\begin{aligned} C_t=f_tC_{t-1}+i_t\widetilde{C_t} \end{aligned}$$
(7)

where \(\sigma\) and \(\tanh\) are the sigmoid and hyperbolic tangent activation functions, and \(W\), \(U\) and \(b\) are the learnable parameters.
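To make Eqs. (2)–(7) concrete, the following NumPy sketch runs one LSTM step at a time over a short sequence. The parameter dictionary, the random initialization and the sizes (16 inputs, 8 hidden units) are assumptions for illustration only and do not reflect the trained model.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, p):
    """One step of the standard LSTM, following Eqs. (2)-(7)."""
    f_t = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["b_f"])      # Eq. (2)
    i_t = sigmoid(p["W_i"] @ x_t + p["U_i"] @ h_prev + p["b_i"])      # Eq. (3)
    c_tilde = np.tanh(p["W_c"] @ x_t + p["U_c"] @ h_prev + p["b_c"])  # Eq. (4)
    o_t = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h_prev + p["b_o"])      # Eq. (5)
    c_t = f_t * c_prev + i_t * c_tilde                                # Eq. (7)
    h_t = o_t * np.tanh(c_t)                                          # Eq. (6)
    return h_t, c_t


def init_params(input_size, hidden_size, seed=0):
    """Randomly initialised W, U and b for the four gates (illustrative only)."""
    rng = np.random.default_rng(seed)
    p = {}
    for gate in "fico":
        p[f"W_{gate}"] = rng.normal(scale=0.1, size=(hidden_size, input_size))
        p[f"U_{gate}"] = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
        p[f"b_{gate}"] = np.zeros(hidden_size)
    return p


params = init_params(input_size=16, hidden_size=8)
h, c = np.zeros(8), np.zeros(8)
for x in np.random.default_rng(1).normal(size=(3, 16)):  # three records, 16 features
    h, c = lstm_step(x, h, c, params)
```

The step function has no notion of how much time passed between two records, which is the limitation the time-aware variant addresses.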

The standard LSTM assumes that sequential elements are equally spaced in time, and thus cannot handle sequences with irregular time intervals.

T-LSTM

T-LSTM (Fig. 2b) introduces time interval information into the standard LSTM, and attenuates the short-term memory according to the time intervals in order to capture the temporal dynamics of irregularly sampled sequential data29. T-LSTM accepts two inputs: the current record and the time elapsed since the previous record. T-LSTM differs from the standard LSTM primarily in the subspace decomposition of the memory of the previous time step, which adjusts the short-term memory according to the time intervals between records. The subspace decomposition does not change the effect of the current input on the current output, but changes the effect of the previous memory on the current output. Specifically, T-LSTM adds the following quantities to the standard LSTM: (1) Short-term memory \(C_{t-1}^S\), obtained from the memory of the previous time step, as

$$\begin{aligned} C_{t-1}^S=\tanh \left( W_dC_{t-1}+b_d\right) . \end{aligned}$$
(8)

(2) Discounted short-term memory \({\hat{C}}_{t-1}^S\), obtained by weighting \(C_{t-1}^S\) with a decay function \(g\) of the elapsed time \(\Delta _t\), as

$$\begin{aligned} {\hat{C}}_{t-1}^S=C_{t-1}^S\cdot g\left( \Delta _t\right) . \end{aligned}$$
(9)

(3) Long-term memory \(C_{t-1}^T\), which is the complementary subspace of the short-term memory, as

$$\begin{aligned} C_{t-1}^T=C_{t-1}-C_{t-1}^S. \end{aligned}$$
(10)

(4) Adjusted previous memory \(C_{t-1}^*\), obtained through combining discounted short-term memory and long-term memory, as

$$\begin{aligned} C_{t-1}^*=C_{t-1}^T+{\hat{C}}_{t-1}^S. \end{aligned}$$
(11)
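Equations (8)–(11) amount to a small adjustment applied to the previous cell state before the usual LSTM update. The sketch below assumes the decay \(g(\Delta_t) = 1/\log(e + \Delta_t)\), one of the decay functions suggested in the original T-LSTM work; the text does not state which \(g\) was used here, and the names `time_decay` and `adjust_memory` are illustrative.

```python
import numpy as np


def time_decay(delta_t):
    """Assumed decay g(delta_t) = 1 / log(e + delta_t); equals 1 when delta_t is 0."""
    return 1.0 / np.log(np.e + delta_t)


def adjust_memory(c_prev, delta_t, W_d, b_d, g=time_decay):
    """Return the adjusted previous memory C*_{t-1} of Eqs. (8)-(11)."""
    c_short = np.tanh(W_d @ c_prev + b_d)  # Eq. (8): short-term memory
    c_short_hat = c_short * g(delta_t)     # Eq. (9): discounted short-term memory
    c_long = c_prev - c_short              # Eq. (10): long-term memory
    return c_long + c_short_hat            # Eq. (11): adjusted previous memory


rng = np.random.default_rng(0)
c_prev = rng.normal(size=4)
W_d = rng.normal(scale=0.5, size=(4, 4))
b_d = np.zeros(4)

no_gap = adjust_memory(c_prev, 0.0, W_d, b_d)    # g = 1, memory unchanged
long_gap = adjust_memory(c_prev, 8.0, W_d, b_d)  # short-term part is attenuated
```

Substituting Eqs. (9) and (10) into Eq. (11) gives \(C_{t-1}^* = C_{t-1} - \left(1 - g(\Delta_t)\right) C_{t-1}^S\), so only the short-term component shrinks as the gap grows, while the long-term component passes through unchanged. The adjusted memory then takes the place of \(C_{t-1}\) in Eq. (7).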

Application of T-LSTM in myopia prediction

The input of each T-LSTM cell is the current record \(x_t\) and the time interval \(\mathrm {\Delta }_t\) between \(x_{t-1}\) and \(x_t\), and the output is the current state \(h_t\). In the myopia prediction model proposed in this paper, the input of each cell is changed to the current record \(x_t\) and the time interval \(\mathrm {\Delta }_{t+1}\) between \(x_t\) and \(x_{t+1}\), and the output is the next state \(h_{t+1}\). There are two kinds of inputs, namely records and time intervals. The records of an individual form an \(n\times 16\) matrix containing n checks, each with 16 features (after the one-hot encoding, the number of features related to gender and correction method becomes 4). Correspondingly, the time intervals of this individual form a vector containing n time interval values, the last of which equals the prediction duration. The final prediction is the output of the last step passed through a fully connected neural network. The structure of the model is shown in Fig. 2c. When performing myopia prediction, the SE at any future moment can be predicted by changing the value of the last time interval. The training parameters of the model are as follows: Learning Rate = 0.0001, Batch Size = 256, Optimizer is Adam Optimizer, Epochs = 500, RNN Layers = 1, T-LSTM Hidden Size = 1024, and Early Stopping Patience = 10.
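As a rough sketch of how the interval vector described above might be assembled, the helper below pairs each check with the time until the next check and appends the prediction duration as the last entry. Two points are assumptions of this example rather than statements from the text: a quarter is taken as 91 days, and each interval is binned in the same way as the first-to-last interval in the data description (1 for less than one quarter, 2 for at least one and less than two quarters, and so on). The names `quarter_bin` and `build_intervals` are hypothetical.

```python
from datetime import date, timedelta

DAYS_PER_QUARTER = 91  # assumption: the text does not define a quarter in days


def quarter_bin(days: int) -> int:
    """Bin an elapsed time in days: 1 for < 1 quarter, 2 for [1, 2) quarters, etc."""
    return days // DAYS_PER_QUARTER + 1


def build_intervals(check_dates, target_date):
    """Return n intervals: step t is paired with the gap to check t+1,
    and the last step with the gap to the prediction date."""
    following = list(check_dates[1:]) + [target_date]
    return [quarter_bin((nxt - cur).days) for cur, nxt in zip(check_dates, following)]


first = date(2020, 1, 1)
checks = [first, first + timedelta(days=100), first + timedelta(days=300)]
intervals = build_intervals(checks, target_date=first + timedelta(days=500))
```

Changing only `target_date` changes only the last entry of the vector, which is how a different prediction horizon would be requested from the same history.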

Figure 2 The structure of (a) LSTM, (b) T-LSTM and (c) T-LSTM in myopia prediction, where x denotes the temporal input data, C is the cell state representing the long-term memory, h is the hidden state representing the short-term memory, \(\mathrm {\Delta }_t\) is the time interval between records \(x_t\) and \(x_{t-1}\), \(\sigma\) is the sigmoid activation function, and tanh is the tanh activation function.

Metrics

The model’s cost function is the mean squared error (MSE) of SE, often referred to as the loss, which takes values in the range \([0, +\infty )\):

$$\begin{aligned} MSE=\frac{1}{m}\sum _{i=1}^{m}\left( y_i-{\hat{y}}_i\right) ^2, \end{aligned}$$
(12)

where \(y_i\) is the actual value, \({\hat{y}}_i\) is the predicted value, and m is the number of samples. Equation (12) is a smooth, continuous and everywhere differentiable function, and is thus convenient for the gradient descent algorithm. The prediction performance of the model is evaluated by the mean absolute error (MAE), which is the average of the absolute deviations, as

$$\begin{aligned} MAE=\frac{1}{m}\sum _{i=1}^{m}\left| \left( y_i-{\hat{y}}_i\right) \right| . \end{aligned}$$
(13)

It takes values in the range of \([0, +\infty )\). A smaller MAE indicates a better model.
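Both metrics are straightforward to compute. The sketch below implements Eqs. (12) and (13) with NumPy on made-up SE values, which are illustrative only and not taken from the dataset.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error, Eq. (12)."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def mae(y_true, y_pred):
    """Mean absolute error, Eq. (13)."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


# Hypothetical actual and predicted SE values in diopters.
actual = [-1.0, -2.0, -3.5]
predicted = [-1.25, -2.0, -3.0]
loss = mse(actual, predicted)
error = mae(actual, predicted)
```

Because the MSE squares each deviation, it penalizes the 0.5 D miss four times as heavily as the 0.25 D miss, whereas the MAE stays in diopters and is easier to compare with clinical thresholds.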

Ethics declarations

The experimental protocol was established according to the ethical guidelines of the Helsinki Declaration and was approved by the Human Ethics Committee of University of Electronic Science and Technology of China (No. 106142022101324706). Written informed consent was obtained from the participants or their guardians.

Results

After 405 training iterations, the model converges; the loss (i.e., MSE) during training is displayed in Fig. 3. The MAE of future SE is 0.103 ± 0.140 (D) on the testing set. The stratified MAE is shown in Table 4. When sequence lengths are 1, 2, 3 and 4, the corresponding MAE ranges from 0.115 (D) to 0.187 (D) for 2 to 10 quarters, 0.082 (D) to 0.109 (D) for 2 to 6 quarters, 0.071 (D) to 0.079 (D) for 2 to 4 quarters and 0.040 (D) for 2 quarters, respectively. When the levels of myopia are no myopia, low myopia, moderate myopia and high myopia, the corresponding means and standard deviations of MAE are 0.116 ± 0.127 (D), 0.100 ± 0.136 (D), 0.094 ± 0.147 (D) and 0.153 ± 0.237 (D), respectively. For the age groups 6 to 8, 9 to 11, 12 to 14, 15 to 17 and 18 to 20, the corresponding means and standard deviations of MAE are 0.121 ± 0.156 (D), 0.099 ± 0.132 (D), 0.091 ± 0.128 (D), 0.088 ± 0.134 (D) and 0.056 ± 0.074 (D), respectively. Four case examples are shown in Fig. 4. The prediction curves capture the trend of the SE changes well, although there are some unstable fluctuations that may result from sparse records. Overall, the longer the sequence length and the shorter the prediction duration, the smaller the prediction error. An MAE of SE within 0.75 (D) is considered to be a clinically acceptable prediction13. Based on the accuracy and robustness of the model, as well as the variance of the prediction performance, the model provides a clinically valuable prediction of children’s and adolescents’ vision in the short and medium term.

The results of T-LSTM, standard LSTM, Random Forest (RF), and Linear Regression (LR) are shown in Table 5. Since LSTM, RF and LR do not specifically deal with time intervals, the time intervals are treated as one additional feature added to the input records, and thus the input record of an individual in those models is an \(n\times 17\) matrix. Because RF and LR can only handle fixed-length sequences, separate models were trained for sequences of different lengths. As shown in Table 5, the overall MAE of T-LSTM is lower than that of the other three models. T-LSTM and LSTM outperform RF and LR because the former two models are able to capture long-term dependencies in the data, and T-LSTM outperforms LSTM because it can better capture temporal tendency by processing temporal features separately.

Figure 3 The change of MSE in the model. An epoch means training the neural network with all the training data for one cycle.

Figure 4 Four case examples of prediction using T-LSTM. In each example, the curve, apart from its starting point, shows the predicted values, and the data points denote the true values.

Table 4 The MAE of SE in the testing set for T-LSTM.
Table 5 Comparison of the MAEs of different models.

Conclusions

As the symptoms of myopia are not typical, they are often ignored by parents in the early stages of development. However, if low myopia is not controlled, it can lead to high myopia and very serious blinding ocular complications, such as posterior scleral and macular degeneration, as well as a substantially higher chance of developing cataracts and glaucoma30,31. The earlier the onset of myopia, the more likely the axial length of the eye will elongate, the faster myopia will progress, and the higher the final diopter32. The proposed model can quantitatively predict children’s and adolescents’ SE within two and a half years, and helps to identify the progression of myopia earlier so that targeted interventions and corrective measures can be taken. This is of great significance for the prevention and control of myopia.

As the development of myopia is affected by a number of complex factors, such as heredity, environment, and behaviors33,34, achieving accurate myopia predictions is challenging. Deep learning is able to infer new features from the limited sets of features contained in the training set, while avoiding complex feature engineering. This paper applied T-LSTM to capture the temporal features in irregularly sampled time series, which is more in line with the characteristics of real data and thus has higher applicability.

Discussion

To the best of our knowledge, only a very small number of studies include quantitative predictions of future visual acuity. Among them, Lin et al.13 achieved quantitative prediction of future SE in a study of nearly 130,000 people in Guangdong, China, 2018, where the MAE for 1 to 8-year SE prediction ranges from 0.253 to 0.799. This paper achieves higher prediction accuracy on a smaller dataset. In typical vision records, the uneven distribution of time intervals between historical records and the variable lengths of records make the utilization of temporal information very difficult for traditional methods. The proposed T-LSTM model is capable of handling sequences of variable length, and can capture temporal tendency well by separately processing temporal features, even if the time intervals are irregular. This study can indicate the trend of refraction and visual acuity in the next two and a half years. The results are useful not only for medical institutions compiling statistics, but also for parents, who can see the level of vision loss more intuitively. In this way, the predictions can guide guardians to take their children for timely myopia correction and early myopia prevention and control, which is more important and proactive than intervention by medical institutions after the fact, and will contribute to the prevention and control of early myopia in children and adolescents.

The current study has some limitations. Firstly, the follow-up period is short for analysis with T-LSTM. However, visual test datasets covering long time periods are rare and the current dataset is hard-won. In addition, even with the short period, T-LSTM shows a clear advantage over the benchmark methods, and even the standard LSTM outperforms linear regression. Secondly, the sample area is concentrated, and thus the sample is not sufficiently representative. Thirdly, the depth of longitudinal data still needs to be enhanced. Fourthly, myopia progression is related to many factors. For example, Juntae et al.35 have found that retinal factors also contribute to myopic progression. However, our dataset only contains visual screening records, and fundus images were not available in this study. Multimodal learning involving both fundus images and screening records may further improve the prediction accuracy.