Evaluation of the external validity of a joint structure–function model for monitoring glaucoma progression

The dynamic structure–function (DSF) model was previously shown to have better prediction accuracy than ordinary least squares linear regression (OLSLR) for short series of visits. The current study assessed the external validity of the DSF model by testing its performance in an independent dataset (Ocular Hypertension Treatment Study–Confocal Scanning Laser Ophthalmoscopy [OHTS–CSLO] ancillary study; N = 178 eyes), and also on different test parameters in a sample selected from the Diagnostic Innovations in Glaucoma Study or the African Descent and Glaucoma Evaluation Study (DIGS/ADAGES). Each model was used to predict structure–function paired data at visits 4–7. The resulting prediction errors for both models were compared using the Wilcoxon signed-rank test. In the independent dataset, the DSF model predicted rim area and mean sensitivity paired measurements more accurately than OLSLR by 1.8–5.5% (p ≤ 0.004) at visits 4–6. Using the DIGS/ADAGES dataset, the DSF model predicted retinal nerve fiber layer thickness and mean deviation paired measurements more accurately than OLSLR by 1.2–2.5% (p ≤ 0.007). These results demonstrate the external validity of the DSF model and provide a strong basis for developing it into a useful clinical tool.


Scientific Reports | (2020) 10:19701 | https://doi.org/10.1038/s41598-020-76834-4 | www.nature.com/scientificreports/

This encouraging finding raised the need to determine whether the DSF model could yield similar results when tested in different populations and with different tests and parameters. The present study was designed to evaluate the external validity of the DSF model. Using an independent dataset from the Ocular Hypertension Treatment Study-Confocal Scanning Laser Ophthalmoscopy (OHTS-CSLO) ancillary study 28 , the prediction error (PE) resulting from the prediction of global and sectoral RA-MS paired measurements was compared between the DSF model and OLSLR. We paired RA and MS to be consistent with the structure-function pairs used in the study that we sought to validate 26 . To determine the applicability of the DSF model to different parameters, we compared the PE obtained for the joint prediction of retinal nerve fiber layer thickness (RNFLT) and mean deviation (MD) in a resampled cohort of POAG eyes enrolled in the DIGS and ADAGES studies. RNFLT and MD were considered for this analysis because they are the most common and sensitive structural 29,30 and functional 31,32 parameters used by clinicians to monitor glaucoma progression. The objective was to ascertain whether the DSF model would perform well with new tests and parameters that may emerge as clinically useful in the future.

Results
Validation of the DSF model in an independent (OHTS-CSLO) dataset: prediction of RA and MS paired measurements. Figure 1 shows the median PE obtained for the DSF and OLSLR models for the prediction of global RA and MS paired measurements. When RA was predicted jointly with MS derived from the 30-2 static automated perimetry (SAP) test pattern, the median PE obtained for the DSF model was significantly lower (1.8-5.5%, p ≤ 0.004) than for OLSLR at the 4th-6th visits (Fig. 1a). For the joint prediction of RA with MS computed from the 24-2 SAP test pattern (Fig. 1b), the median PE for the DSF model was significantly lower (3.2-4.8%, p ≤ 0.001) than for OLSLR at the 4th-6th visits. For both types of RA-MS pairs, the difference in median PE between the two models was not significant at the 7th visit. On average, the DSF model had lower PE than the OLSLR model in 72% of the eyes at visit 4, 67% at visit 5, 62% at visit 6 and 53% at visit 7. Figure 2 shows comparisons of median PE between the two models for the prediction of sectoral RA and MS paired measurements. Except for predictions at the 7th visit, the median PE obtained for the DSF model was significantly lower (2.1-6.4%, p ≤ 0.002) than for OLSLR in all sectors considered.
Validation of the DSF model with different parameters in the DIGS/ADAGES dataset: prediction of RNFLT and MD paired measurements. Figure 3 shows comparisons of median PE between the DSF and OLSLR models for the prediction of RNFLT and MD paired measurements. From the 4th to 7th visit, the median PE obtained for the DSF model was significantly lower than for OLSLR by 1.2-2.5% (p ≤ 0.007).

The DIGS/ADAGES dataset included 393 POAG eyes, which were previously subclassified at baseline into glaucomatous optic neuropathy only (GON-alone; 121 eyes), glaucomatous visual field only (GVF-alone; 97 eyes) and both GON and GVF (175 eyes) 27 . Table 1 presents the comparison of median PE between both models for the three baseline classifications of POAG eyes. In eyes with GON only, the DSF model had significantly lower PE than OLSLR across all visits. Similar results were obtained in eyes with both GON and GVF, except at the 7th visit, where there was no significant difference in PE between the two models. In eyes with GVF only, the PE of the DSF model was always lower than that of OLSLR, but statistical significance was reached only at visit 5. The DSF model made more accurate predictions in a greater proportion of eyes than OLSLR (19-39% more for GON-alone eyes, 13-32% more for GVF-alone eyes and 14-36% more for eyes with both GON and GVF).

Discussion
Several mathematical models have been developed with the goal of improving the understanding of disease trajectories and aiding clinical decision making 19 . The majority of such models, however, do not make inroads into clinical practice, partly due to their lack of external validity 18,19 . In this study, we assessed the generalizability of the DSF model, which has been reported to have better predictive accuracy than the OLSLR model for series of 4-7 visits 26 . We assessed the performance of the DSF model in an independent dataset 28 and on different test indices (RNFLT and MD), and found that it predicted subsequent structure-function paired measurements more accurately than OLSLR for short series of up to 7 visits. Our results show that the DSF model can be generalized to different populations and test indices. This provides the basis to further develop the DSF model into a useful clinical tool for detecting and/or predicting glaucoma progression. The DSF model's superior predictive ability over short follow-up series also suggests that it could be applied to assess progression when limited data are available. This may lead to earlier detection of progression and inform clinical decisions to stop or slow vision loss. At present, however, the DSF model cannot determine the progression status of an eye. Developing it into a clinical tool will involve two crucial steps. The first step is to incorporate a robust statistical test into the DSF model to evaluate change in predicted measurements. The next step is to establish how the model's sensitivity compares to that of conventional methods used for assessing progression. The present study also expanded the analysis beyond predicting global structure-function paired measurements by comparing the performance of both models in estimating sectoral measurements. The DSF model predicted sectoral RA and MS paired measurements more accurately than OLSLR (Fig. 2).
Recent advances in ocular imaging, such as optical coherence tomography (OCT), have enhanced our ability to assess the optic nerve head and different retinal layers. The OCT-derived RNFLT has better sensitivity to detect early glaucomatous changes than the Heidelberg Retinal Tomograph (HRT)-derived RA [33][34][35] ; hence it has been widely adopted in both clinical and research settings. To ascertain whether the DSF model maintains its performance for different structure-function parameters, RNFLT was predicted jointly with MD for 393 POAG eyes selected from the DIGS/ADAGES dataset. We found that the median PE obtained for the DSF model was significantly lower than that for OLSLR (Fig. 3). This finding is consistent with the results obtained with RA and MS, both in the present study and in the previous one 26 . Furthermore, this finding suggests that the DSF model can be applied to other structure-function parameters that may eventually emerge as promising for identifying change in glaucoma. Additional analyses also showed that the DSF model obtained lower median PE than OLSLR for each POAG subclassification (Table 1) and had better prediction accuracy in a larger percentage of eyes. This finding supports the DSF model's potential as a valuable clinical tool.
Given that the detection of glaucoma progression is partly limited by measurement variability 3 , it is crucial to assess the impact of this variability on the performance of the DSF model. This impact was determined by comparing the median PE resulting from the joint predictions of RA with 30-2 MS (Fig. 1a) and with 24-2 MS (Fig. 1b). Heijl and colleagues found that, within the central 30° of the visual field, threshold sensitivities in the periphery were significantly more variable than those in the midperiphery 36 . This suggests that test indices (e.g. MS and MD) derived from the 30-2 test pattern may be more variable, because this pattern includes 22 additional test locations outside the area covered by the 24-2 test pattern. We found no statistically significant difference in the prediction accuracy of the DSF model when RA was predicted jointly with MS from either the 30-2 or the 24-2 test pattern (median PE difference = 0.08-1.1%, all p > 0.06). This observation is further illustrated with mean difference plots in Fig. 4. The closeness of the mean difference lines to zero suggests that the prediction accuracy of the DSF model was not adversely affected by differences in measurement variability between the two tests. Ramezani et al. reported that the use of MS from contrast sensitivity perimetry, a test with lower test-retest variability than SAP, did not improve the prediction accuracy of the DSF model 37 . These observations suggest that measurement variability may have little or no impact on the performance of the DSF model.

This study has limitations. The first limitation is the potential misestimation of parameters in percent of mean normal, given interindividual variations in structural measurements in the healthy population 27 coupled with the presence of floor effects 38,39 . Rescaling of parameters in percent of mean normal was, however, necessary in this study.
Because the DSF is a two-dimensional model applied to structural and functional components originally measured on different scales, these parameters had to be expressed on a comparable scale before they could be assessed jointly. Another limitation is that RA measurements were rescaled based on normative data obtained from a different cohort of 91 healthy eyes described elsewhere 40 . This was necessitated by the unavailability of RA measurements at baseline in the OHTS; measurement of RA with the HRT was later added to the OHTS protocol as an ancillary study 28 . The mean normal RA (1.44 mm²), computed from this separate dataset 40 , was within the range of average RA values (1.37-1.76 mm²) reported for healthy cohorts [41][42][43][44] . Of note, the rescaling was applied systematically to all participants and used to assess prediction accuracy in both models; hence any potential impact of the data source used for rescaling would have affected both models equally. Therefore, quantifying parameters in percent of mean normal and using different normative datasets did not selectively favor the performance of one model over the other.
In conclusion, we assessed the external validity of the DSF model by determining its performance in an independent dataset and also with different parameters. Consistent with the previous study 26 , the DSF model had better prediction accuracy than OLSLR over short series of visits. The current study also showed that the performance of the DSF model is generalizable to different structure-function parameters. These results suggest that the DSF model has good external validity, is generalizable and has the potential to eventually be used as a clinical tool for early detection of glaucoma progression.

Methods
Study design. The present study was a retrospective analysis of two datasets to evaluate the external validity of the DSF model. An independent dataset, selected from the OHTS-CSLO ancillary study 28 , was used to assess the performance of the DSF model. The OHTS-CSLO data were released through a data access agreement.

OHTS-CSLO dataset. We selected 178 eyes of 105 patients (mean age: 53 ± 7 years) from the OHTS-CSLO ancillary study (mean follow-up: 6.5 ± 0.6 years). The OHTS-CSLO study prospectively followed a cohort of ocular hypertensive patients with the HRT (Heidelberg Engineering GmbH, Dossenheim, Germany), disc photography and the SAP 30-2 full-threshold test (Humphrey Field Analyzer, Zeiss, Dublin, CA, USA) 28 . The eligibility and exclusion criteria used in the OHTS are detailed in the baseline paper 45 .

DIGS/ADAGES dataset. We included 393 eyes of 254 POAG patients (mean age: 64 ± 10 years) selected from the DIGS or ADAGES cohort. Described in detail elsewhere 27 , the DIGS and ADAGES are multicenter longitudinal studies that enrolled healthy participants, glaucoma suspects and glaucoma patients and prospectively monitored their retinal structure and function. Eligibility criteria included one good-quality stereoscopic photograph and a 24-2 SAP test at baseline, open angles, best-corrected visual acuity of 20/40 or better, spherical refraction within 5.0 diopters and cylinder correction within 3.0 diopters, no history of intraocular surgery (except for uncomplicated cataract or glaucoma surgery), and absence of comorbidities or use of medications that affect the visual field.
RNFLT and MD data for the current study. For the current study, we selected only patients with a POAG diagnosis at the DIGS/ADAGES baseline. Of the 393 eligible POAG eyes, 121 had GON-alone, 97 had GVF-alone, and the remaining 175 had both GON and GVF 27 . In addition, we required that each patient have a minimum of 7 visits, with RNFLT measurement taken with the Spectralis OCT (software version 5.

Rescaling structural and functional data to percent of mean normal. All measurements were rescaled to percent of the mean normal values 48,49 to ensure that structural and functional data were quantified on a comparable scale. For the OHTS-CSLO dataset, the mean normal MS value was obtained from the normal OHTS baseline SAP tests. For the mean normal RA value, we used a separate dataset of 91 healthy eyes 40 . For the DIGS/ADAGES dataset, the mean normal values for both RNFLT and MD were derived from 395 healthy eyes selected from the DIGS/ADAGES using the selection criteria explained above. The mean normal value for each parameter is presented in Table 2. For healthy individuals with a normal optic disc and intact vision, RA and MS values of approximately 100% of mean normal are expected. As an example of the conversion to percent of mean normal, for a patient with POAG with an RA of 1.05 mm² and an MS of 28 dB (630.96 1/L), the converted values are 72.9% and 64.9%, respectively.

Prediction of structure-function pairs. In the current study, the DSF and OLSLR models were independently applied to predict future RA-MS paired measurements from the OHTS-CSLO dataset. The two models were also used to predict future RNFLT-MD paired measurements from the resampled DIGS/ADAGES dataset.
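The percent-of-mean-normal rescaling described above reduces to a simple ratio. The Python sketch below reproduces the RA example using the 1.44 mm² normative mean RA reported for the separate healthy cohort, together with the dB-to-linear conversion implied by the MS example (28 dB corresponds to 630.96 1/L). The function names are hypothetical (the study's analyses were carried out in R), and because the normative MS value from Table 2 is not reproduced here, it is left as a parameter rather than guessed.

```python
# Minimal sketch of rescaling to percent of mean normal.
# Function names are hypothetical; they are not taken from the study's code.


def db_to_linear(sensitivity_db: float) -> float:
    """Convert a perimetric sensitivity from dB to linear units (1/L): 10^(dB/10)."""
    return 10 ** (sensitivity_db / 10)


def percent_of_mean_normal(value: float, mean_normal: float) -> float:
    """Express a measurement as a percentage of the healthy-population mean."""
    if mean_normal <= 0:
        raise ValueError("mean_normal must be positive")
    return 100.0 * value / mean_normal


# RA example from the text, using the 1.44 mm^2 normative mean RA.
print(f"RA: {percent_of_mean_normal(1.05, 1.44):.1f}% of mean normal")  # 72.9
# MS example: 28 dB expressed in linear units before rescaling.
# The normative MS (Table 2) is not reproduced here, so no MS percentage is computed.
print(f"MS: {db_to_linear(28):.2f} 1/L")  # 630.96
```

The same `percent_of_mean_normal` call applies to MS once it is in linear units and the corresponding normative value is supplied.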
The section below provides a description of how each model was used to predict future structure-function measurements. A detailed description of the DSF model is available in Hu et al. 26 . In Fig. 5, we briefly describe how the DSF model and OLSLR were applied to predict RA-MS paired measurements at the 5th visit.
Predictions by DSF model. The DSF model employs two vectors, a centroid and a velocity vector, to predict future structure-function paired measurements from preceding data. Whereas the centroid is an estimate of the current stage of the disease (the central location of the series of observed structure-function paired measurements), the velocity vector is a measure of the direction and speed at which the structure-function pairs are changing over time. Consider RA-MS paired values $X_1$, $X_2$, $X_3$ and $X_4$ measured at four visits at time points $t_1$, $t_2$, $t_3$ and $t_4$. To predict the RA-MS pair at the 5th visit (at time $t_5$) with the DSF model, the arithmetic mean of the first four observed data pairs is first calculated as the centroid ($C$):

$$C = \frac{X_1 + X_2 + X_3 + X_4}{4}.$$

The model then determines the velocity vector ($V$), computed as the average of the visit-to-visit rates of change:

$$V = \frac{1}{3}\left[\frac{X_2 - X_1}{t_2 - t_1} + \frac{X_3 - X_2}{t_3 - t_2} + \frac{X_4 - X_3}{t_4 - t_3}\right].$$

The expected paired values at the 5th visit ($P$) are derived by adding to the centroid (the current state of the disease) the average change in paired measurements over the interval to the next visit:

$$P = C + V\,(t_5 - t_4).$$

As shown in the left panel of Fig. 5, the predicted measurements are then compared to the observed values for the RA-MS pair at the 5th visit to estimate the error in prediction.
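The centroid, velocity and prediction equations can be transcribed directly. The Python sketch below generalizes them from four prior visits to any number n ≥ 2 (centroid over n pairs, velocity averaged over the n − 1 intervals); that generalization is an assumption for illustration, and the full model specification is in Hu et al. 26 . The function name `dsf_predict` and the example values are hypothetical, not study data.

```python
from typing import Sequence, Tuple

# (structure, function) pair, both in percent of mean normal, e.g. (RA, MS).
Pair = Tuple[float, float]


def dsf_predict(pairs: Sequence[Pair], times: Sequence[float], t_next: float) -> Pair:
    """Hypothetical transcription of the DSF step: P = C + V * (t_next - t_last)."""
    n = len(pairs)
    if n < 2 or len(times) != n:
        raise ValueError("need at least two visits and one time point per visit")

    # Centroid C: component-wise arithmetic mean of the observed pairs.
    cx = sum(s for s, _ in pairs) / n
    cy = sum(f for _, f in pairs) / n

    # Velocity V: average of the n - 1 visit-to-visit rates of change.
    vx = vy = 0.0
    for i in range(1, n):
        dt = times[i] - times[i - 1]
        vx += (pairs[i][0] - pairs[i - 1][0]) / dt
        vy += (pairs[i][1] - pairs[i - 1][1]) / dt
    vx /= n - 1
    vy /= n - 1

    # Prediction: centroid displaced by V over the interval to the next visit.
    step = t_next - times[-1]
    return (cx + vx * step, cy + vy * step)


# Hypothetical eye: four visits one year apart, then predict visit 5.
observed = [(100.0, 100.0), (98.0, 99.0), (96.0, 98.0), (94.0, 97.0)]
visit_times = [0.0, 1.0, 2.0, 3.0]
print(dsf_predict(observed, visit_times, 4.0))  # (95.0, 97.5)
```

Note that the prediction is anchored at the centroid of all prior visits rather than at the most recent observation, so for this steadily declining hypothetical eye the predicted pair lies closer to the earlier visits than the last observed pair projected forward would.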
Predictions by OLSLR model. OLSLR predictions were derived by fitting the model separately to the available series of structural and functional measurements. For example, to predict RA and MS measurements at the 5th visit, OLSLR was fitted separately to the first four RA measurements and to the first four MS measurements. The expected measurements at the 5th visit were estimated from the best fit lines for the RA and MS series, as shown in the right panel of Fig. 5.
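For comparison, the OLSLR step fits an ordinary least-squares line to each component separately and evaluates it at the next visit time. The sketch below uses the closed-form slope and intercept; the names `ols_extrapolate` and `olslr_predict` are hypothetical, and the example values are made up for illustration.

```python
from typing import Sequence, Tuple


def ols_extrapolate(times: Sequence[float], values: Sequence[float], t_next: float) -> float:
    """Fit y = a + b*t by ordinary least squares and evaluate the line at t_next."""
    n = len(times)
    if n < 2 or len(values) != n:
        raise ValueError("need at least two (time, value) points")
    mean_t = sum(times) / n
    mean_y = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be equal")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return intercept + slope * t_next


def olslr_predict(pairs: Sequence[Tuple[float, float]], times: Sequence[float],
                  t_next: float) -> Tuple[float, float]:
    """Predict the next (structure, function) pair with two independent OLS fits."""
    structure = [s for s, _ in pairs]
    function = [f for _, f in pairs]
    return (ols_extrapolate(times, structure, t_next),
            ols_extrapolate(times, function, t_next))


# Hypothetical eye: four visits one year apart, then predict visit 5.
observed = [(100.0, 100.0), (98.0, 99.0), (96.0, 98.0), (94.0, 97.0)]
print(olslr_predict(observed, [0.0, 1.0, 2.0, 3.0], 4.0))  # (92.0, 96.0)
```

The two series are fitted independently here, in contrast to the DSF model, which treats each structure-function pair as a single two-dimensional observation.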
Statistical analysis. The prediction accuracy of each model was assessed by determining the magnitude of the resulting PE in percent of mean normal. The magnitude of the PE was computed as the square root of the sum of the squared differences between the predicted and observed values for each component of the structure-function pair, i.e. the Euclidean distance between the predicted and observed pairs. Predictions were made from the 4th to 7th visit for global and sectoral RA-MS pairs, and for RNFLT-MD pairs. For each category of prediction, the Wilcoxon signed-rank test was used to determine whether the difference in median PE between the DSF model and OLSLR was statistically significant. The significance level was set at 0.05. All analyses were carried out in R 50 .

Figure 5. In the left panel, the DSF model is depicted in two-dimensional space with MS on the x-axis and RA on the y-axis (both expressed as % of mean normal). Numbers 1-5 (in gray text) represent the observed RA and MS measurements at the 1st to 5th visits. To predict the values of RA and MS at the 5th visit with the DSF model, the first 4 observed RA-MS pairs are used to estimate the centroid (C, solid red circle) and velocity vector (V, red arrow), which are in turn used to predict the paired measurement at the 5th visit (number 5 in red text). In the right panel, the first four observed RA and MS values are plotted separately over time. For the OLSLR prediction of RA and MS values at the 5th visit (number 5 in blue text), the expected value is estimated from the best-fit line for each series, as shown by the blue arrow. For both models, the error in prediction is estimated by comparing the predicted measurements (colored "5"s) to the observed measurement (gray "5").
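The PE and the paired comparison described under Statistical analysis can be sketched as follows. `prediction_error` computes the Euclidean distance between predicted and observed pairs; `wilcoxon_signed_rank` is a hand-written two-sided signed-rank test that drops zero differences, gives tied absolute differences their average rank, and uses the large-sample normal approximation without tie or continuity correction. That simplification is an assumption: R's `wilcox.test`, by default, computes an exact p-value for small samples without ties and otherwise applies a normal approximation with a continuity correction, so p-values from this sketch will not match the study's exactly. `share_lower` gives the proportion of eyes in which one model had the lower PE, as reported in the Results. All function names and PE values below are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def prediction_error(predicted: Tuple[float, float], observed: Tuple[float, float]) -> float:
    """Euclidean distance between predicted and observed pairs (% of mean normal)."""
    return math.hypot(predicted[0] - observed[0], predicted[1] - observed[1])


def wilcoxon_signed_rank(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Two-sided paired signed-rank test (normal approximation; sketch only).

    Returns (W+, W-, p). Zero differences are dropped; tied |d| get average ranks.
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks: List[float] = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    w_minus = sum(r for r, di in zip(ranks, d) if di < 0)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean_w) / sd_w
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))
    return w_plus, w_minus, p_two_sided


def share_lower(pe_a: Sequence[float], pe_b: Sequence[float]) -> float:
    """Proportion of eyes in which model A had a strictly lower PE than model B."""
    return sum(a < b for a, b in zip(pe_a, pe_b)) / len(pe_a)


# Hypothetical per-eye PEs at one visit (not study data).
pe_dsf = [3.1, 4.0, 2.2, 5.6, 3.3, 1.9, 4.4, 2.8]
pe_ols = [4.0, 3.5, 3.9, 7.1, 4.8, 2.0, 6.0, 3.4]
w_plus, w_minus, p = wilcoxon_signed_rank(pe_dsf, pe_ols)
print(f"W+ = {w_plus}, W- = {w_minus}, p = {p:.3f}")
print(f"DSF lower in {share_lower(pe_dsf, pe_ols):.0%} of eyes")
```

In practice, a vetted implementation such as R's `wilcox.test` (as used in the study) is preferable; the sketch only makes the ranking logic explicit.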

Data availability
The datasets analyzed in the current study are not publicly available due to the data sharing agreements issued by the primary sources of the two datasets. Information on submitting requests to access datasets from these studies is available from the corresponding author.