Predicting intraocular pressure using systemic variables or fundus photography with deep learning in a health examination cohort

The purpose of the current study was to predict intraocular pressure (IOP) using color fundus photography with a deep learning (DL) model, or using systemic variables with a multivariate linear regression model (MLM), along with least absolute shrinkage and selection operator regression (LASSO), support vector machine (SVM), and Random Forest (RF). The training dataset included 3883 examinations from 3883 eyes of 1945 subjects, and the testing dataset included 289 examinations from 289 eyes of 146 subjects. With the training dataset, the MLM was constructed to predict IOP using 35 systemic variables, including 25 blood measurements. A DL model was developed to predict IOP from color fundus photographs. The prediction accuracy of each model was evaluated on the testing dataset through the absolute error and the marginal R-squared (mR²). The mean absolute error with MLM was 2.29 mmHg, which was significantly smaller than that with DL (2.70 mmHg). The mR² with MLM was 0.15, whereas that with DL was 0.0066. The mean absolute errors (between 2.24 and 2.30 mmHg) and mR² values (between 0.11 and 0.15) with LASSO, SVM and RF were similar to or poorer than those with MLM. A DL model to predict IOP using color fundus photography proved far less accurate than MLM using systemic variables.

Intraocular pressure (IOP) is the fluid pressure within the eye, and it is an important marker for many ophthalmological diseases, including glaucoma, one of the world's leading causes of irreversible blindness 1 . IOP results from the balance between the rate of aqueous humor production at the ciliary body and the rate of aqueous outflow from the eye through the conventional and uveoscleral pathways. The magnitude of IOP is primarily determined by local factors, such as the resistance of the trabecular meshwork and juxtacanalicular connective tissue [2][3][4] . However, in the conventional pathway, aqueous humor drains into Schlemm's canal and ultimately into the episcleral veins 2 , and thus IOP is also affected by exogenous (systemic) factors, as suggested by a recent study 5 . Indeed, we recently investigated the associations of various systemic factors with IOP using a dataset from a health examination program database, and found that several of these factors were significantly associated with IOP level, including age, percent body fat, systolic blood pressure (SBP), pulse rate, albumin, and hemoglobin A1c (HbA1c) 3 . The first purpose of the current study was to investigate how much of the variation in IOP can be explained by various systemic factors.
It would be beneficial to predict IOP accurately using only systemic factors, without tonometry, in various settings such as medical check-ups; however, IOP is presumably determined not only by systemic factors but also by local (ocular) conditions. Fundus photography is one of the most representative and basic ophthalmological examinations. There have been remarkable recent developments in artificial intelligence (AI) and its application to fundus photography. For instance, Poplin et al. showed that the sex of an individual can be identified from fundus photographs using a deep learning (DL) model.

We adopted a type of convolutional neural network (CNN) known as ResNet 6 to predict IOP from fundus photographs, following our previous studies in which glaucoma was diagnosed from fundus photographs 7,8,19 . Unlike a plain CNN, ResNet uses 'identity skip connections' that bypass one or more layers, so that the input of a block is added to its output and features are propagated to succeeding layers; this architecture is well known to be useful for image classification and feature extraction. The skip connections make it possible to train deeper and larger networks, which helps the network acquire more effective and abstract features without the degradation in training accuracy that occurs when plain networks are simply made deeper. In the current study, a ResNet model with 18 layers (ResNet18) was pre-trained on the ImageNet classification dataset 20 . This methodology is inspired by recent successes in fine-tuning deep neural networks 21 , whereby the parameters of a network are first derived from a different but large pre-training dataset and then used to initialize training on a new and smaller training dataset. We attempted to further improve the model by applying image augmentation to the training data 22 : all of the images in the training dataset were horizontally flipped. The last fully-connected layer in ResNet was used to output the predicted value of IOP. Images of left eyes were mirrored to the orientation of right eyes.
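The identity skip connection described above can be illustrated with a minimal NumPy sketch. It assumes dense weight matrices in place of the 3x3 convolutions and batch normalization used in real ResNet blocks, and `residual_block` is a hypothetical helper, not the authors' implementation.

```python
import numpy as np


def relu(x: np.ndarray) -> np.ndarray:
    """Element-wise rectified linear unit."""
    return np.maximum(x, 0.0)


def residual_block(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Two-layer residual block with an identity skip connection.

    The skipped layers compute a residual F(x) = w2 @ relu(w1 @ x); the
    input x is added back unchanged (the identity shortcut) before the
    final ReLU, so the block outputs relu(F(x) + x).
    """
    residual = w2 @ relu(w1 @ x)  # the layers that the shortcut bypasses
    return relu(residual + x)     # identity shortcut adds the input back


rng = np.random.default_rng(0)
x = rng.normal(size=4)
w1 = rng.normal(size=(4, 4))
w2 = rng.normal(size=(4, 4))
y = residual_block(x, w1, w2)

# If the skipped layers output zero, the block reduces to relu(x):
# the input still passes through the shortcut, which is why stacking
# such blocks does not by itself degrade what earlier layers learned.
y_zero = residual_block(x, np.zeros((4, 4)), np.zeros((4, 4)))
```

The zero-weight case shows the design intent: a block only has to learn a correction to its input rather than a full mapping.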
Details of the parameters used in ResNet were: learning rate, 0.01; batch size, 100; momentum, 0.9; and weight decay, 0.0001.
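Interpreting the 0.9 setting as the momentum coefficient, these settings correspond to stochastic gradient descent (SGD) with momentum and L2 weight decay. The sketch below shows one update step on a single scalar parameter; `sgd_momentum_step` is a hypothetical helper written in the style of PyTorch's SGD update with dampening set to 0, and the authors' actual framework is not stated here.

```python
def sgd_momentum_step(w, v, grad, lr=0.01, momentum=0.9, weight_decay=1e-4):
    """One SGD update with momentum and L2 weight decay (hypothetical helper).

    Update rule (PyTorch-style SGD with dampening = 0):
        g = grad + weight_decay * w   # weight decay adds an L2 gradient term
        v = momentum * v + g          # velocity accumulates past gradients
        w = w - lr * v                # step along the velocity
    """
    g = grad + weight_decay * w
    v = momentum * v + g
    w = w - lr * v
    return w, v


# Toy problem: minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, v = 0.0, 0.0
for _ in range(500):
    w, v = sgd_momentum_step(w, v, grad=2.0 * (w - 3.0))
```

Because of the weight-decay term, the iterate settles slightly below 3, at 6 / (2 + 0.0001), rather than exactly at the unregularized minimum.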

Models to predict IOP from systemic variables.
First, using the training dataset, a multivariate linear regression model (MLM) was built to predict IOP from 35 variables (age, sex, height, BMI, SBP, DBP, history of diabetes mellitus (DM), history of hypertension (HT), history of hyperlipidemia, past and current smoking habit, and 25 blood examination results). Using this model, IOP values in the testing dataset were predicted, and the absolute prediction error was calculated. Several other prediction models were also constructed using the following machine learning methods: (1) support vector machine (SVM) 23 , (2) Random Forest (RF) 24 , and (3) least absolute shrinkage and selection operator regression (LASSO) 25,26 . SVM regression uses a kernel function to perform regression in a high-dimensional feature (kernel) space, which yields accurate predictions even when the relationship is non-linear. RF consists of many decision trees (regression trees) and outputs the average of the predictions of all individual trees. Each tree is constructed from a different bootstrap sample of the original data (bootstrapping is repeated sampling with replacement until the original sample size is reached, so that duplicates are allowed). In LASSO, the sum of the absolute values of the regression coefficients is constrained or penalized, which shrinks the coefficients of uninformative variables toward, or exactly to, zero, so that the final model is less prone to overfitting and gives an accurate prediction. The details of each method are as follows.
1. Support vector machine: radial basis function kernel, penalty parameter = 1.0
2. Random forest: number of trees = 10,000, criterion = Gini index, minimum number of samples required to split an internal node = 2, minimum number of samples required at a leaf node = 1
3. LASSO: the optimum lambda value was chosen as the value giving the minimum prediction error in leave-one-out cross-validation within the training dataset.
www.nature.com/scientificreports/
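The MLM and LASSO steps can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: synthetic data stand in for the 35 systemic variables, lambda is chosen by leave-one-out cross-validation using squared error as the prediction-error criterion, and `fit_ols`, `fit_lasso` and `loo_lambda` are hypothetical helpers rather than the software used in the study.

```python
import numpy as np


def fit_ols(X, y):
    """Multivariate linear regression (ordinary least squares) with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[0], beta[1:]


def soft_threshold(z, t):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def fit_lasso(X, y, lam, n_iter=100):
    """LASSO by cyclic coordinate descent.

    Minimises (1 / (2n)) * ||y - b0 - X b||^2 + lam * ||b||_1.
    Centring X and y removes the (unpenalized) intercept from the updates.
    """
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    col_sq = (Xc ** 2).sum(axis=0) / n
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r_j = yc - Xc @ b + Xc[:, j] * b[j]  # residual without feature j
            rho = Xc[:, j] @ r_j / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
    return y_mean - x_mean @ b, b


def loo_lambda(X, y, lambdas):
    """Return the lambda with the smallest leave-one-out mean squared error."""
    n = len(y)
    cv_mse = []
    for lam in lambdas:
        sq_err = []
        for i in range(n):
            keep = np.arange(n) != i  # leave subject i out
            b0, b = fit_lasso(X[keep], y[keep], lam)
            sq_err.append((y[i] - (b0 + X[i] @ b)) ** 2)
        cv_mse.append(np.mean(sq_err))
    return lambdas[int(np.argmin(cv_mse))]


# Synthetic, hypothetical data: 5 standardized predictors, IOP-like outcome.
rng = np.random.default_rng(42)
n_train, n_test, p = 30, 20, 5
X = rng.normal(size=(n_train + n_test, p))
true_b = np.array([1.5, -1.0, 0.0, 0.0, 0.5])
y = 14.0 + X @ true_b + rng.normal(scale=1.0, size=n_train + n_test)
X_tr, y_tr, X_te, y_te = X[:n_train], y[:n_train], X[n_train:], y[n_train:]

b0_ols, b_ols = fit_ols(X_tr, y_tr)
lambdas = [0.01, 0.05, 0.1, 0.2, 0.5]
lam = loo_lambda(X_tr, y_tr, lambdas)
b0_las, b_las = fit_lasso(X_tr, y_tr, lam)

# Mean absolute prediction error on the held-out testing set.
mae_ols = np.mean(np.abs(y_te - (b0_ols + X_te @ b_ols)))
mae_las = np.mean(np.abs(y_te - (b0_las + X_te @ b_las)))
```

With lambda set to zero, the LASSO fit reduces to the ordinary least-squares fit; with a large lambda, every coefficient is shrunk to exactly zero and the model predicts the training mean.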
Subsequently, using these models, IOP values in the testing dataset were predicted, and absolute prediction errors were calculated.

Statistical analysis.
Absolute prediction errors were compared using a linear mixed model in which values were nested within subjects. The linear mixed model accounts for the hierarchical structure of the data by modeling measurements as grouped within subjects, which reduces the possible bias arising from the nested structure of the data 27,28 .
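One plausible specification of this comparison, written as a sketch under assumptions (a random intercept per subject only; the exact fixed- and random-effect structure is not reported here), is

```latex
|e_{ikm}| = \beta_0 + \beta_m + u_i + \varepsilon_{ikm}, \qquad
u_i \sim \mathcal{N}(0, \sigma_u^2), \qquad
\varepsilon_{ikm} \sim \mathcal{N}(0, \sigma_\varepsilon^2)
```

where $e_{ikm}$ is the prediction error of method $m$ for examination $k$ of subject $i$, $\beta_m$ is the fixed effect of method $m$ (with one method, for example MLM, as the reference so that its $\beta_m = 0$), and $u_i$ is the random intercept shared by all examinations of subject $i$. A difference in accuracy between two methods then corresponds to a non-zero difference between their $\beta_m$ coefficients.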
Furthermore, the association between the predicted and actual IOP values in the testing dataset was evaluated using the correlation coefficient. Again, considering the nested structure of the current dataset, the association was also evaluated using the marginal R-squared (mR²) value from the linear mixed model, following the method proposed by Nakagawa and Schielzeth 29 .
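In the Nakagawa and Schielzeth approach, the marginal R² of a linear mixed model is the share of total variance attributable to the fixed effects: mR² = σ²_f / (σ²_f + σ²_u + σ²_ε), where σ²_f is the variance of the fixed-effect predictions, σ²_u the random-intercept variance and σ²_ε the residual variance. The sketch below computes this from quantities that would come from a fitted mixed model (for example, actual IOP regressed on predicted IOP with a random subject intercept). `marginal_r2` is a hypothetical helper, it uses the population variance (some implementations use the n - 1 denominator), and the numbers in the example are illustrative, not results of this study.

```python
from statistics import pvariance


def marginal_r2(fixed_part, var_random, var_residual):
    """Marginal R^2 for a Gaussian linear mixed model.

    fixed_part   -- fixed-effect predictions (X @ beta) for each observation
    var_random   -- variance of the random intercept (sigma_u^2)
    var_residual -- residual variance (sigma_epsilon^2)
    """
    var_fixed = pvariance(fixed_part)
    return var_fixed / (var_fixed + var_random + var_residual)


# Hypothetical example: actual IOP = b0 + b1 * predicted IOP + u_i + e.
b0, b1 = 7.5, 0.5
predicted_iop = [12.0, 14.0, 16.0, 18.0]
fixed_part = [b0 + b1 * p for p in predicted_iop]
mr2 = marginal_r2(fixed_part, var_random=2.0, var_residual=5.0)
# var_fixed = 1.25, so mr2 = 1.25 / (1.25 + 2.0 + 5.0), about 0.15
```

A model whose fixed-effect predictions barely vary, as with near-constant predicted IOP, therefore yields an mR² close to zero regardless of how the remaining variance is split between subjects and residuals.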
The results of univariate analyses between various systemic parameters and IOP are summarized in Table 2. Among the 35 parameters, 28 showed a significant association with IOP when not adjusted for age and sex (p < 0.05). When adjusted for age and sex, 23 of the remaining 33 parameters showed a significant association with IOP.
The absolute prediction error with each method is shown in Table 3. Table 4 shows the results of the MLM obtained with the training dataset. Among the 35 parameters, 11 showed a significant association with IOP (p < 0.05), including height, BMI, age, sex, smoking habit, TP, HbA1c and SBP.
The mean squared error of the DL model on the validation dataset saturated within 100 epochs, as shown in Fig. 1. The predicted IOP values were derived from the model at epoch 100. The relationship between the IOP values predicted with each method and the actual IOP values is shown in Fig. 2a-e using Bland-Altman plots. The correlation coefficients and mR² values between predicted and actual IOP are shown in Table 5. Significant correlations were observed between actual IOP and the IOP values predicted with MLM, LASSO, SVM, and RF (p < 0.05), but not with the DL model using color fundus photographs (p = 0.16 or 0.17). With all models, there was a significant association between the difference between predicted and actual IOP and the mean of predicted and actual IOP (p < 0.001).
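The quantities behind a Bland-Altman plot, and the association between the difference and the mean tested above, can be sketched as follows. This is a minimal illustration: `bland_altman` is a hypothetical helper, the 1.96 × SD limits of agreement and the least-squares regression of the difference on the mean are standard choices assumed here, and the p-values reported in the text are not computed.

```python
import numpy as np


def bland_altman(predicted, actual):
    """Return (bias, 95% limits of agreement, proportional-bias slope).

    The difference is defined as predicted - actual, and the slope is
    from a least-squares regression of that difference on the mean of
    predicted and actual values.
    """
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    diff = predicted - actual
    mean = (predicted + actual) / 2.0
    bias = diff.mean()
    sd = diff.std(ddof=1)
    loa = (bias - 1.96 * sd, bias + 1.96 * sd)
    slope, _intercept = np.polyfit(mean, diff, 1)
    return bias, loa, slope


# A model that always outputs 15 mmHg, compared with hypothetical actual IOP.
actual_iop = np.array([10.0, 12.0, 14.0, 16.0, 18.0, 20.0])
bias, loa, slope = bland_altman(np.full(6, 15.0), actual_iop)
```

For a constant prediction c, the difference equals 2c - 2 × mean, so the slope is exactly -2; predictions that vary little with actual IOP therefore produce a strongly tilted, rather than horizontal, Bland-Altman cloud. Perfect predictions give a slope of zero.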
The absolute error associated with MLM is illustrated in Fig. 3.

Discussion
In the current study, IOP was predicted using a variety of modelling methods and different types of data. A considerably more accurate prediction of IOP was achieved using an MLM of systemic variables (mean absolute error = 2.29 mmHg and mR² = 0.15) than using a DL model with color fundus photography (mean absolute error = 2.70 mmHg and mR² = 0.0066). Machine learning methods (LASSO, SVM and RF) did not improve prediction accuracy. In the MLM, 11 variables were significantly associated with IOP. We recently reported that several systemic factors were associated with IOP level, including age, percent body fat, SBP, pulse rate, albumin, and HbA1c 30 . In the current study, younger age, higher SBP, and higher HbA1c were again significantly associated with increased IOP. The effect of age on IOP is controversial. Previous cross-sectional studies from Italy 31 and the United States 32,33 suggested a significant positive association between age and IOP; however, the inverse effect has also been reported in cross-sectional or longitudinal studies from other countries, mainly in Asia, including Japan [34][35][36][37][38][39] . The current study, conducted in Japan, also suggested a negative association between age and IOP. The significant positive association between higher SBP and IOP agrees with previous studies 33,35,[37][38][39][40][41][42][43][44][45][46] ; the speculated mechanism is that elevated ciliary artery pressure increases the filtration fraction of the aqueous humor, and that increased serum corticosteroids and sympathetic tone also elevate IOP 47,48 . The association between HbA1c and IOP also agrees with previous studies [33][34][35]37,39,[42][43][44]46,47,49,50 .
Several mechanisms have been proposed through which obesity may increase IOP, such as sympathetic hyperactivation, increased corticosteroid levels, excessive intraorbital adipose tissue, increased blood viscosity with high hemoglobin and hematocrit values, increased episcleral venous pressure and a consequent decrease in the facility of aqueous outflow, and transitory elevations in IOP resulting from breath-holding and thorax compression while tonometry is performed at the slit lamp in obese patients 47,[51][52][53][54] . Our previous study suggested that percent body fat was associated with increased IOP, whereas in the current study this was the case for BMI. Smoking status was significantly associated with elevated IOP, in agreement with a previous study 55 .
It is widely acknowledged that ordinary statistical models, such as linear or binomial logistic regression, may be over-fitted to the original sample, especially when the number of predictor variables is large. We have reported the usefulness of applying machine learning methods, compared with ordinary linear or logistic regression, in many applications, including diagnosing glaucoma from optical coherence tomography measurements [56][57][58][59] and predicting vision-related quality of life 60 and visual field (VF) progression 61-63 . Nonetheless, in the current study, the machine learning methods did not improve prediction accuracy compared with the MLM. This may be because the training dataset was quite large (5540 examinations), and therefore overfitting was less of a problem. Despite the significant association between predicted and true IOP, only a modest mR² value was obtained (up to 0.15). The coefficient of determination represents the proportion of the variance in the data that is explained by the model; for a simple linear regression fitted with an intercept, the correlation coefficient is the square root of the coefficient of determination. The mR² value shows the proportion of the variance explained by the fixed effects in the linear mixed model. Hence, the current results suggest that up to approximately 15% of the variance in IOP was explained by the MLM and the other machine learning models. In other words, IOP can be only partially explained by systemic factors, and the remaining part may only be described locally (using measurements from the eye). As shown in the Bland-Altman plots (Fig. 2), the differences between the predicted and actual IOP values were not distributed horizontally, but were correlated with the mean of these values. This is because prediction accuracy was relatively poor, and the predicted values were relatively constant regardless of the actual IOP value.
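The relationship between the correlation coefficient and the coefficient of determination holds exactly only in-sample for a simple linear regression with an intercept; on a separate testing set, predictions can keep the same correlation with the outcome while having a much lower R². The sketch below, on synthetic data with hypothetical helpers `r_squared` and `pearson_r`, illustrates both cases.

```python
import numpy as np


def r_squared(y, y_hat):
    """Coefficient of determination, 1 - SSE / SST."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    sse = np.sum((y - y_hat) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1.0 - sse / sst


def pearson_r(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]


rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 2.0 + 0.5 * x + rng.normal(size=100)

# In-sample simple linear regression with an intercept: R^2 equals r^2.
slope, intercept = np.polyfit(x, y, 1)
fitted = intercept + slope * x
in_sample_equal = np.isclose(r_squared(y, fitted), pearson_r(x, y) ** 2)

# Shifting every prediction by 3 leaves the correlation unchanged but
# inflates the squared errors, so 1 - SSE/SST can even become negative.
shifted = fitted + 3.0
```

This is one reason why a correlation coefficient on its own can overstate how much of the outcome a model actually reproduces.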
Furthermore, although it has been suggested that the Random Forest method is more useful than other machine learning methods [64][65][66] , this advantage over the other machine learning methods was not observed in the current study. This finding also supports the view that IOP can be only partially explained by systemic factors, and that its predictability cannot be considerably improved merely by applying machine learning methods such as Random Forest. A recent study revealed that DL could discriminate sex from fundus photographs with very high accuracy 6 . Meanwhile, we recently suggested that sex can be discriminated, at least to some extent (AUC = 77.9%), using a 'visible' machine learning method (LASSO) with clinically meaningful variables such as color intensities, tessellation, and geometrical information of the optic disc and retinal vessels. This implied that the DL model had learned a principle to discriminate sex from color fundus photographs. In contrast, the current study suggested that DL could not accurately predict IOP from fundus photographs, since only a poor association (mR² = 0.0066) was observed between the IOP predicted with this approach and the actual IOP.
Table 1. Subjects' demographic data. IOP intraocular pressure, SD standard deviation, BMI body mass index, SBP systolic blood pressure, DBP diastolic blood pressure, TP total protein, A/G albumin/globulin, AST aspartate aminotransferase, ALT alanine aminotransferase, γGTP γ-glutamyl transpeptidase, ALP alkaline phosphatase, HDL-C high-density lipoprotein cholesterol, LDL-C low-density lipoprotein cholesterol, HbA1c glycosylated hemoglobin A1c, WBC white blood cell, RBC red blood cell, BUN blood urea nitrogen, Na sodium, K potassium, Cl chloride, Ca calcium.
We also tried other DL architectures instead of ResNet18 (VGG16 67 and Inception-v3 68 ); however, the results were not improved (data not shown). This may suggest that color fundus photographs contain little information about IOP. The training dataset in this study was fairly large, but much smaller than representative DL datasets such as ImageNet (14,000,000 images) 20 and CIFAR-10 (60,000 images, https://www.cs.toronto.edu/~kriz/cifar.html), although we have recently suggested that glaucoma can be diagnosed from color fundus photographs with DL using an even smaller sample size (N = 3132) [7][8][9] . Better results might be observed if DL were applied to a larger dataset. The current results suggest that IOP can be only partially explained using systemic factors (15%, as suggested by the mR² value) or color fundus photography with DL (0.66%), which implies that IOP measurement with tonometry remains necessary. Nevertheless, the merit of accurately predicting systemic factors from a color fundus photograph, as shown in 69 , cannot be overestimated, for example in medical check-ups in developing countries where tonometry is unavailable. This is particularly true of smartphone-based fundus photography, since recent studies have suggested the usefulness of deep learning-assisted programs to screen for retinal diseases using a smartphone 70,71 . The current study had several limitations, the first of which was the use of non-contact tonometry, which is generally believed to be less reliable than Goldmann applanation tonometry (the repeatability coefficient of non-contact tonometry has been reported as ± 3.2 mmHg, whereas that of Goldmann applanation tonometry was between ± 2.2 and 2.5 mmHg) 72,73 , although IOP is usually measured with non-contact tonometry in health examinations outside eye clinics.
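To put these repeatability figures on the same scale as a measurement standard deviation, a repeatability coefficient can be converted into the implied within-subject SD. The sketch below assumes the Bland-Altman definition RC = 1.96 × √2 × S_w (about 2.77 × S_w); if the cited studies used a different definition, the conversion would change. `within_subject_sd` is a hypothetical helper.

```python
import math


def within_subject_sd(repeatability_coefficient):
    """Within-subject SD implied by a repeatability coefficient.

    Assumes RC = 1.96 * sqrt(2) * S_w, i.e. 95% of differences between
    two repeated measurements on the same eye fall within +/- RC.
    """
    return repeatability_coefficient / (1.96 * math.sqrt(2.0))


for label, rc in [("non-contact", 3.2), ("Goldmann (low)", 2.2), ("Goldmann (high)", 2.5)]:
    print(f"{label}: RC = {rc} mmHg -> S_w = {within_subject_sd(rc):.2f} mmHg")
```

Under this assumption, the loop prints an implied S_w of about 1.15 mmHg for non-contact tonometry and 0.79 to 0.90 mmHg for Goldmann applanation tonometry; for comparison, the mean absolute prediction error of the MLM was 2.29 mmHg.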
Further, central corneal thickness, which is known to induce measurement errors during tonometry 74,75 , was not measured. In addition, the usefulness of applying DL to color fundus photography in glaucomatous eyes should be investigated in a future study. The current study was based on a health examination cohort, and hence the vast majority of cases had normal IOP values. A further study is needed to investigate whether the current approach is more useful in eyes with higher IOP values. In particular, whether DL enables more accurate prediction of IOP with a larger dataset should be investigated further.
In conclusion, the current study, using a health examination cohort, suggested that IOP cannot be adequately predicted from clinical parameters or retinal photographs, even with state-of-the-art machine learning (ML) techniques. Further investigation of DL using a larger amount of data is needed.
Received: 20 July 2020; Accepted: 21 December 2020
Table 5. Correlation coefficients and mR² values between predicted and actual IOP. mR² marginal R-squared value (following the method proposed by Nakagawa and Schielzeth 29 ), MLM multivariate linear regression, LASSO least absolute shrinkage and selection operator regression, SVM support vector machine, RF random forest, DL deep learning.