An inter-day assessment of the ABC parameters in the evaluation of progressive keratoconus

The progression of keratoconus is commonly determined by comparing the results of corneal tomographic measurements on different occasions. However, investigations on the repeatability of measurements are commonly performed within the same day, thus not taking the inter-day variation into account. The effect of keratoconus disease severity on the measurement error is also seldom considered. In this post hoc investigation, the parameters A, B and C in the Belin ABCD Progression Display were evaluated in relation to disease severity in intra-day and inter-day measurements. Four consecutive measurements were performed on 61 patients with keratoconus on the same day (intra-day). In another cohort, four consecutive measurements were obtained and then repeated 3 days later in 25 patients with keratoconus and 25 healthy controls (inter-day). The results suggest that the diagnosis of disease progression would benefit from inter-day measurements, and the stratification of the parameters A and C according to disease severity. It is also recommended that tomographic systems such as the Pentacam HR be modified to allow the comparison of both single measurements and the mean of replicate measurements of the parameters used in the assessment of progression of keratoconus.

www.nature.com/scientificreports/ investigated. As the ABCD Progression Display is integrated into the Pentacam HR, which is the most commonly used tomographic instrument in the management of keratoconus 5 , it can be assumed that this software is widely used in both clinical practice and scientific investigations. It is therefore important to evaluate whether our previous findings are also relevant for the ABC parameters. It is of particular interest to analyse the inter-day effect on the ABC parameters, as these are based on intra-day measurements. Values of the ABC parameters obtained in our previous investigations were therefore analysed.

Definitions and abbreviations.
• Within-subject standard deviation (S_w). The square root of the within-subject variance.
• Precision = 1.96 × S_w. The difference between a measurement and the true value should lie below this limit in 95% of the measurements.
• Repeatability coefficient (RC) = 2.77 × S_w. The difference between two measurements should lie below this limit in 95% of the pairs of observations.
• Coefficient of variation (CoV). S_w divided by the total subject mean.
• Intra-class correlation coefficient (ICC). The variance between subjects divided by [the variance between subjects + the variance within subjects].
• Prediction limit (PL) = 95% CI for differences between two future single measurements.
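To illustrate how the quantities defined above relate to one another, the following minimal Python sketch computes them from balanced replicate measurements. It is not the SPSS/SAS procedure used in this investigation: the function name `repeatability_metrics` and the example values are hypothetical, and the between-subject variance is estimated by the method of moments from a one-way analysis of variance, which is an assumption made here for simplicity.

```python
import math
from statistics import mean, variance


def repeatability_metrics(replicates):
    """Summarise repeatability for balanced replicate data.

    replicates: one list of repeated measurements per subject,
    all lists having the same length (at least 2).
    """
    n_reps = len(replicates[0])
    subject_means = [mean(r) for r in replicates]
    grand_mean = mean(subject_means)

    # Within-subject variance: mean of the per-subject sample variances.
    var_within = mean([variance(r) for r in replicates])

    # Between-subject variance component by the method of moments
    # (one-way ANOVA); truncated at zero if the estimate is negative.
    ms_between = n_reps * variance(subject_means)
    var_between = max((ms_between - var_within) / n_reps, 0.0)

    s_w = math.sqrt(var_within)
    return {
        "s_w": s_w,
        "precision": 1.96 * s_w,
        "rc": 2.77 * s_w,
        "cov_percent": 100.0 * s_w / grand_mean,
        "icc": var_between / (var_between + var_within),
    }


# Hypothetical values of parameter A (mm): three subjects, four replicates each.
example = [
    [7.30, 7.32, 7.31, 7.29],
    [6.90, 6.95, 6.92, 6.91],
    [7.80, 7.79, 7.81, 7.80],
]
for name, value in repeatability_metrics(example).items():
    print(f"{name}: {value:.4f}")
```

Confidence intervals for these estimates are not computed in this sketch.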

Results
The ICC showed high values for all the measured parameters in all intra-day and inter-day measurements in all the groups. Therefore, variability could be interpreted as resulting from differences between subjects rather than within subjects (Tables 1, 2 and 3).
Repeatability and disease severity. A correlation between the magnitude of a measured parameter and its SD indicates that the repeatability of the measurements changes with the magnitude of the parameter. For the parameters A and C, where lower values indicate more severe disease, a negative correlation means that the repeatability worsens with increasing disease severity. Disease severity was found to be significantly associated with measurement error for the parameters A and C, but not for B (the correlation for the parameter A was not significant in the group n = 61 for two-tailed CIs but was significant for one-tailed CIs). This correlation was more pronounced in inter-day measurements. One-tailed 95% CIs showed a stronger association than two-tailed 95% CIs (Table 3). The strongest association was seen in inter-day measurements of A in subjects with keratoconus (Spearman's rho = − 0.481, p = 0.007, Kendall's Tau-b = − 0.377, p = 0.009), followed by measurements of C in the same group (Spearman's rho = − 0.480, p = 0.008, Kendall's Tau-b = − 0.350, p = 0.02), and in C in the intra-day measurements in subjects with keratoconus (Spearman's rho = − 0.265, p = 0.02, Kendall's Tau-b = 0.174, p = 0.05). Nevertheless, in intra-day measurements of A in subjects with keratoconus the correlation was close to being significant (Spearman's rho = − 0.246, p = 0.03, Kendall's Tau-b = − 0.162, p = 0.07) (Tables 1 and 3). No significant association was found in measurements of B in subjects with keratoconus (Tables 1, 2 and 3). Neither was any significant association found between the repeatability and magnitude in the inter-day measurements of any of the parameters in the control group (Table 3).
No significant association was found in the intra-day measurements of any of the parameters in either the subjects with keratoconus or the control group, with the exception of parameter A on Day 0 and Day 3 in subjects with keratoconus (Spearman's rho = − 0.396, p = 0.025, Kendall's Tau-b = − 0.284, p = 0.047 and Spearman's rho = − 0.387, p = 0.028, Kendall's Tau-b = − 0.264, p = 0.065, respectively) and parameter B on Day 0 in the control group (Spearman's rho = 0.34, p = 0.048, Kendall's Tau-b = 0.230, p = 0.107) (Table 2). Figure 1 illustrates the mean values for each parameter for inter-day and intra-day measurements in subjects with keratoconus and the healthy control group.
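The association between the subject mean and the subject SD reported above can be sketched as follows. This is a minimal, standard-library implementation of Kendall's Tau-b and not the SPSS routine used in this investigation; the function names and the example values are hypothetical, and no p-values are computed.

```python
import math
from statistics import mean, stdev


def kendall_tau_b(x, y):
    """Kendall's Tau-b for two equally long sequences (handles ties)."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: not counted
            elif dx == 0:
                ties_x += 1  # tied in x only
            elif dy == 0:
                ties_y += 1  # tied in y only
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt(
        (concordant + discordant + ties_x) * (concordant + discordant + ties_y)
    )
    return (concordant - discordant) / denom if denom > 0 else float("nan")


def mean_sd_association(replicates):
    """Tau-b between each subject's mean and the SD of its replicates."""
    means = [mean(r) for r in replicates]
    sds = [stdev(r) for r in replicates]
    return kendall_tau_b(means, sds)


# Hypothetical values of parameter A (mm), four replicates per subject.
# Lower means (more severe disease) are given larger scatter here.
example = [
    [7.80, 7.81, 7.80, 7.79],
    [7.40, 7.42, 7.39, 7.41],
    [6.90, 6.95, 6.86, 6.92],
    [6.50, 6.58, 6.44, 6.53],
]
print(f"Tau-b (mean vs SD): {mean_sd_association(example):.3f}")
```

A negative value for A or C corresponds to the pattern described above, where the measurement error increases as the parameter decreases.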
Intra-day repeatability of measurements. In intra-day measurements in subjects with keratoconus, the best repeatability was found for parameter A, followed by C and B (Table 1) and the same happened when
Table 1. Descriptive statistics and repeatability of Pentacam measurements made on a single day in subjects with keratoconus. A = Anterior curvature of the 3 mm zone over the thinnest point (mm). B = Posterior curvature of the 3 mm zone under the thinnest point (mm). C = Thickness of the thinnest point on the cornea (μm). a Subject mean. b Subject SD versus subject mean.
Inter-day repeatability of measurements using a mean of replicates. The repeatability of inter-day measurements of the parameters A, B and C was better in the control group than in subjects with keratoconus (Table 3). It was a factor of 4 worse for A, a factor of 2 worse for B, and a factor of 1.2 worse for C in subjects with keratoconus. The best repeatability in the inter-day measurements was seen in the control group for parameter A (RC = 0.033 mm, 95% CI 0.024-0.042 mm, CoV 0.15%), followed by B (RC = 0.056 mm, 95% CI 0.041-0.072 mm, CoV 0.32%) and C (RC = 6.47 μm, 95% CI 4.68-8.27 μm, CoV 0.43%). In subjects with keratoconus, the repeatability in the inter-day measurements was best for parameter C (RC = 8.17 μm, 95% CI 5.91-10.4 μm, CoV 0.60%), followed by A (RC = 0.13 mm, 95% CI 0.092-0.16 mm, CoV 0.64%) and B (RC = 0.12 mm, 95% CI 0.088-0.16 mm, CoV 0.79%). When stratifying parameter A, subjects with keratoconus with a value below the median value for that parameter (7.33 mm) showed a repeatability about two times poorer than those with a value above the median (RC = 0.017 mm, 95% CI 0.10-0.23 mm vs. RC = 0.007 mm, 95% CI 0.040-0.0943 mm).
Repeatability was also approximately two times better when stratifying parameter C for subjects with keratoconus with a value of that parameter above the median value (482.5 μm) than for those with a value below the median (RC = 5.75 μm, 95% CI 3.54-7.96 μm vs. RC = 10.2 μm, 95% CI 6.10-14.2 μm) (Table 3).
Inter-day repeatability of measurements using single measurements (PLs). The PLs for single inter-day measurements in subjects with keratoconus were − 0.19 to 0.17 mm for parameter A, − 0.19 to 0.16 mm for B and − 12.5 to 12.9 μm for C. In the control group, the PLs for single inter-day measurements were − 0.04 to 0.05 mm for A, − 0.10 to 0.11 mm for B and − 10.6 to 12.9 μm for C (Table 4). When stratifying the parameters A and C according to the median value, the PLs for single inter-day measurements in subjects with keratoconus were − 0.25 to 0.22 mm for values of A below the median value, and − 0.11 to 0.081 mm for values above the
Table 2. Descriptive statistics and repeatability of intra-day Pentacam measurements in subjects with keratoconus and healthy controls. A = Anterior curvature of the 3 mm zone over the thinnest point (mm). B = Posterior curvature of the 3 mm zone under the thinnest point (mm). C = Thickness of the thinnest point on the cornea (μm). a Subject mean. b Subject SD versus subject mean.
Inter-day progression. In a randomised comparison between two measurements in each subject with keratoconus, six subjects (24%) showed progression: five according to one parameter (parameter A in three subjects and parameter B in two), and one according to both parameters A and B. In a second randomised comparison among the subjects with keratoconus, progression was indicated by one parameter in two of the subjects (8%). In one of these subjects parameter A indicated progression, while in the other B suggested progression. Two parameters (A and B) indicated progression in three of the subjects (12%), and all three parameters suggested progression in one of the subjects (4%).

Discussion
The results of this study demonstrate a statistically significant association between disease severity and measurement error in the parameters A and C, but not B, in the Belin ABCD progression display. This association was more pronounced in inter-day measurements than in intra-day measurements. One-tailed 95% CIs also showed a stronger association with disease severity than two-tailed 95% CIs. These findings suggest progression should be diagnosed based on limits stratified according to disease severity for the parameters A and C. There appears to be a threshold at 7.0 mm for A, i.e. approximately 48 D, at which the measurement error begins to increase. This threshold appears to be equivalent to that for Kmax, which is not surprising as they are based on the same measurements 12 . The association between measurement error and disease severity was statistically significant for both A (Kendall's Tau-b = − 0.377, p = 0.009) and Kmax (Kendall's Tau-b = 0.483, p = 0.0001), although the association for A was somewhat weaker. As a lower value of A indicates greater disease severity, Kendall's Tau-b is negative, whereas a lower value of Kmax indicates less severe disease. The threshold for C is at approximately 500 µm, below which measurements are more prone to error. It was also reported in a recent study that the repeatability of measurements of A, B and C deteriorated with increasing disease severity 11 . However, those calculations were based on intra-day measurements, and the association between deteriorating repeatability and disease severity was not investigated per se. An inter-day scenario is more appropriate as this reflects the clinical situation. Factors such as changes in the shape of the cornea due to diurnal variation or the natural biomechanical weakness of corneas affected by keratoconus could also lead to a deterioration in the repeatability of inter-day measurements.
However, other factors may improve the repeatability of measurements, such as learning effects among the patients. No association was seen between the measurement error and the magnitude of the measured parameters among healthy controls, and the repeatability of these measurements was clearly superior to that obtained in patients with keratoconus, in particular regarding the parameters A and B. Progression can be assessed in the ABCD progression display by comparing the results with the 80% or 95% CIs obtained from a reference cohort of patients with keratoconus, or from a reference cohort of healthy subjects. The latter could be appropriate in subjects with less severe keratoconus, as the repeatability of these measurements will probably be more similar to those in a healthy cohort than a general cohort of patients with all stages of disease. In fact, in the above-mentioned study 11 the repeatability of measurements of A, B and C was reported to be identical in healthy subjects and in subjects with subclinical keratoconus. However, there will be a threshold at which some subjects with keratoconus will be over-diagnosed as progressive if compared to a healthy cohort. If stratified limits were implemented in the detection of progression in keratoconus, there would be no need for a comparison with a healthy cohort. As well as considering the effects of disease severity, the thresholds at which progression could be detected were evaluated assuming two clinical scenarios: using one measurement on each occasion, and using the mean of replicate measurements (in this case the mean of four). This has been addressed in a few studies 12-14 but it is seldom considered in the enrolment of subjects in clinical studies on CXL, and there is no software in the Pentacam HR allowing for the comparison of mean values. In order to avoid unnecessarily narrow and erroneous prediction limits for single measurements, the variance between the four replicates was included in the statistical analysis 15 . This provided more accurate results and reduced the risk of over-interpreting the results as indicating progression. However, and as expected, it can be concluded that comparing the mean values obtained on each occasion further improves the ability to detect progression, and it is therefore recommended that appropriate software be developed for this purpose.

Table 4. Inter-day differences between single measurements of the parameters A, B and C with prediction limits for subjects with keratoconus and healthy controls (single measurements). A = Anterior curvature of the 3 mm zone over the thinnest point (mm). B = Posterior curvature of the 3 mm zone under the thinnest point (mm). C = Thickness at the thinnest point on the cornea (μm). τ² = squared between-subject mean variance between Day 0 and Day 3; σ₁² = squared within-subject mean variance on Day 0; σ₂² = squared within-subject mean variance on Day 3; α₁ − α₂ = difference between means on Day 0 and Day 3. Columns: Variance components; Mean difference; Lower prediction limit; Upper prediction limit. Row group: Subjects with keratoconus.
The ABCD progression display is based on one-tailed 80% and 95% CIs. On the one hand, one-tailed intervals seem logical, as only a decrease in the magnitude of the parameter indicates progression; but on the other, the parameters can increase or decrease, which suggests that two-tailed intervals are more appropriate. Two-tailed 95% CIs were used in this study, and 80% CIs were avoided. The 95% CIs of the non-stratified repeatability of measurements of the parameters A and B in this study were wider than those used in the ABCD progression display, suggesting that there is a risk of over-interpreting the results as indicating progression. Empirically, the proportion of false positive results in the inter-day scenario was 24% (n = 6), for one or more parameters. When this analysis was repeated the same proportion was obtained. This empirical analysis describes a one-to-one measurement scenario and the false positive results are explained by the fact that the 95% prediction limits (reflecting a one-to-one measurement scenario) are wider than the 95% CIs in the Belin ABCD progression display, in particular for parameters A and B. It is important to note that only subjects with Stage 1-2 keratoconus according to the Amsler-Krumeich classification (AKC) were included in this inter-day analysis. If subjects with Stage 3 disease had also been included, this would most likely have increased the proportion of false positive results due to the association between measurement error and disease severity. However, if the means of replicates were compared between days this would, as expected, reduce the number of false positive progressions. Unfortunately, this feature is not available in the Pentacam HR and could thus not be tested empirically.
When stratifying the parameters A and C above/below the median value, those with more advanced disease showed an approximately two times poorer repeatability for both the single measurements and the mean of replicates than those with less advanced disease. If comparing the limit in the ABCD Progression Display with the results for subjects with more advanced disease (bearing in mind that the whole cohort consisted of subjects with less advanced keratoconus) there would have been a further shift towards false positive results. However, in the group with the lower disease severity, the repeatability was close to the limit in the ABCD Progression Display for the scenario involving single measurements. If, on the other hand, the mean of replicate measurements is used, the 95% CIs of the repeatability of measurements of parameters A and C are below the limit in the ABCD Progression Display, leading to the risk of false negative results. In this case, it appears reasonable to compare this group with the suggested limits for a normal population in the Belin ABCD Progression Display. While the limits for parameter C are rather similar, the repeatability of the measurements of parameter A is still three times higher in the below-median group of keratoconus than in the normal population in the ABCD Progression Display, highlighting the difference in the repeatability between healthy subjects and subjects with keratoconus. The subjects included in the below-median group had Kmax values ranging from 44.8 to 48.6 D. The repeatability of the measurements in the healthy controls in this investigation was similar to that presented in the Belin ABCD Progression Display.
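Parameter A is expressed as a radius of curvature in mm, whereas Kmax is expressed in dioptres. The two scales can be related approximately with the conventional keratometric index of 1.3375, as in the minimal Python sketch below. The index is an assumption made here (the value used by the instrument is not stated in this text), and the function name is hypothetical; the conversion reproduces the correspondence between 7.0 mm and approximately 48 D mentioned in this Discussion.

```python
# Conventional keratometric index; assumed here, not taken from the instrument.
KERATOMETRIC_INDEX = 1.3375


def radius_mm_to_dioptres(radius_mm):
    """Convert a corneal radius of curvature (mm) to keratometric power (D)."""
    return (KERATOMETRIC_INDEX - 1.0) * 1000.0 / radius_mm


# Threshold for A and the median value of A reported in this investigation.
for radius in (7.0, 7.33):
    print(f"{radius:.2f} mm is about {radius_mm_to_dioptres(radius):.1f} D")
```

Because the relation is reciprocal, a lower radius corresponds to a higher dioptric power, which is why the correlations for A and Kmax have opposite signs.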
A possible weakness of this study is that the optimal time frame for comparing inter-day repeatability is unclear. We chose three days as we deemed this to be sufficient to allow for inter-day changes in corneal shape, but sufficiently short to avoid true disease progression. Males were overrepresented in the keratoconus groups, reflecting the gender difference in patients with keratoconus at our clinic 10 , and the healthy controls were not matched for sex or age. We believe that diurnal variation would not affect the measurements significantly. The measurements were in general obtained between 09:00 and 15:00. It has been suggested previously that the corneal thickness is significantly reduced within the first 1-2 h after awakening but then remains relatively unchanged during the daytime 16,17 . In fact, the diurnal variation of keratometric and corneal thickness measurements in subjects with keratoconus has been suggested to be clinically insignificant 18 if obtained between 09:00 and 17:00. We therefore believe that the results in this investigation are applicable in a daytime setting.
There is no gold standard for measuring progression in keratoconus, and thus measurement accuracy is of paramount importance, in both clinical practice and scientific investigations. As mentioned in the introduction, there is no consensus on the definition of progression. However, a consensus on which parameters should be used may be less important than understanding the repeatability and the dynamics of the parameters used and designing the investigation accordingly. This would be an important step towards facilitating the meta-analysis of data. More specifically, the use of reference data in the Belin ABCD Progression Display based on inter-day measurements should be considered. The association between measurement error and disease severity should also be considered for parameters A and C as this would allow progression to be diagnosed earlier in patients with less severe disease, and help avoid erroneous diagnosis of progression in those with more advanced disease. Furthermore, it is desirable that tomographic systems such as the Pentacam HR allow for the comparison of both single measurements and the mean of replicates for parameters used in the assessment of progression of keratoconus.
The findings of this investigation could be of interest for developers of software for the detection of progression in keratoconus, but may also be useful in clinical practice. The results of measurements of the A, B and C parameters are presented in the Progression Display and changes in the magnitude of the parameters between visits can be evaluated by comparing with the results of this investigation. However, clinicians would probably find it more practical to compare single measurements between visits as the mean of replicates would have to be calculated manually, as the current system does not allow for the comparison of mean values.

Subjects and methods
The studies were conducted at the Department of Ophthalmology at Skåne University Hospital, Lund, Sweden, according to the Declaration of Helsinki. The Regional Ethics Committee in Lund, Sweden, approved the studies (No. 2015/373).

Enrolment.
Patients with keratoconus fulfilling the inclusion criteria described below were enrolled consecutively after signing an informed consent form. The inclusion criteria were: keratoconus Stage ≤ 3 (Investigation 1) 10 and keratoconus Stage ≤ 2 (Investigation 2) 12 with no history of, and no current signs of, other ocular pathology, including ocular surface disease and external diseases such as dry eyes and atopy. Only subjects aged ≥ 18 years who had not undergone prior ocular surgery were recruited, and pregnant and breastfeeding women were excluded 10,12 . Contact lens wear was discontinued at least 2 weeks before the measurements were made 10,12 . Subjects with advanced keratoconus (Stage 4) were excluded from Investigation 1 10 due to the presence of corneal scarring. In Investigation 2 12 , patients with Stage 3-4 keratoconus were excluded as the purpose was to study subjects with less advanced disease. In both Investigation 1 10 and Investigation 2 12 , keratoconus was diagnosed clinically and by examination using the Pentacam HR. More specifically, the sagittal curvature pattern, posterior and anterior elevation maps, and corneal thickness pattern were assessed, in addition to information from the Belin-Ambrosio Enhanced Ectasia Display.
Sixty-one patients (Investigation 1) 10 and 25 patients (Investigation 2) 12 were enrolled. Only one eye was eligible for inclusion in 31 subjects in these investigations due to previous CXL, previous penetrating keratoplasty or too advanced stage of keratoconus. If two eyes were eligible for inclusion, both were examined (see "Examination" below). Computerised randomisation was performed in subjects where both eyes met the inclusion criteria to select one eye for inclusion in the study (41 right eyes and 45 left eyes). Seventy-six participants were males, and 10 females, and the mean age of all participants was 28 years (18-45 years).
Healthy controls (Investigation 2) 12 (n = 25) were enrolled from among medical students and residents in ophthalmology after signing an informed consent form. The inclusion criteria were: age ≥ 18 years, no history of any ocular pathology or previous ocular surgery. Pregnant and breastfeeding women were excluded. Ocular pathology was excluded by clinical examination and by examination using the Pentacam HR. Only one eye was eligible for inclusion in three subjects, due to scarring of the cornea. If two eyes were eligible for inclusion, both were examined and computerised randomisation was performed, as described above, resulting in 12 right eyes and 13 left eyes. Fourteen participants were male and 11 were female, and their mean age was 29 years (23-41 years).
Instruments. The Pentacam HR is a Scheimpflug-based tomographic system (Pentacam HR, version 1.20r10, Oculus Optikgeräte GmbH, Wetzlar, Germany). The technical features of this system have been described elsewhere 19 . The default setting of 25 pictures/s was used.
Examination. Measurements were made on a single day (Investigation 1) 10 and on two separate occasions (Investigation 2) 12 by the same examiner (IG). In the latter study, four consecutive measurements were made on Day 0, and four on Day 3. Subjects were instructed to blink between measurements, but not to lean back. Measurements were made during normal working hours without taking diurnal corneal variation into account. Only examinations deemed "OK" by the Pentacam were accepted. The right eye was examined first, then the left, if both eyes were eligible for inclusion. This represents normal clinical practice where both the patient's eyes are usually examined. When recruitment to the study was complete, computerised randomisation was performed to select one participating eye per subject.
Statistical methods and calculations. The values obtained from the four replicate measurements were used to calculate the repeatability in Investigation 1. The measurements obtained on Day 0 and Day 3 in Investigation 2 were averaged for each day, and used to calculate the inter-day repeatability in the clinical situation where the mean value of several measurements is used to assess progression. When calculating prediction limits in the clinical scenario where single measurements are used to assess progression, the variance between replicate measurements was included in the calculation to provide more accurate results.
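The difference between the two scenarios described above (mean of replicates versus single measurements) can be illustrated with a simple moment-based calculation. The sketch below is not the linear mixed-effect model used in this investigation; it substitutes a simpler approach for replicate data in which the variance of a difference between single measurements is obtained by adding back the within-day variance that averaging removes. The function name and the assumption of an equal number of replicates on both days are introduced here for illustration only.

```python
import math
from statistics import mean, variance


def inter_day_limits(day0, day3, z=1.96):
    """95% limits for inter-day differences (Day 3 minus Day 0).

    day0, day3: one list of replicates per subject for each day,
    with the same number of replicates (at least 2) throughout.
    """
    n_reps = len(day0[0])

    # Differences between the daily means of each subject.
    mean_diffs = [mean(b) - mean(a) for a, b in zip(day0, day3)]
    bias = mean(mean_diffs)
    var_mean_diffs = variance(mean_diffs)

    # Pooled within-subject variance of the replicates on each day.
    var_w0 = mean([variance(r) for r in day0])
    var_w3 = mean([variance(r) for r in day3])

    # Averaging n replicates leaves 1/n of the within-day variance in
    # var_mean_diffs; single measurements carry all of it.
    var_single = var_mean_diffs + (1.0 - 1.0 / n_reps) * (var_w0 + var_w3)

    half_mean = z * math.sqrt(var_mean_diffs)
    half_single = z * math.sqrt(var_single)
    return {
        "mean_of_replicates": (bias - half_mean, bias + half_mean),
        "single": (bias - half_single, bias + half_single),
    }
```

Under these assumptions the limits for single measurements are always at least as wide as those for the mean of replicates, which is consistent with the reasoning given above for including the variance between replicates.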
IBM SPSS Statistics 22 for Windows (IBM Corporation, Armonk, NY, USA) and SAS Enterprise Guide 6.1 for Windows (SAS Institute Inc., Cary, NC, USA) were used for statistical analyses. Results were considered statistically significant when the p-value was ≤ 0.05. Descriptive statistics are given as subject mean, standard deviation (SD), and minimum and maximum values. Repeatability was assessed by calculating the within-subject SD, precision, repeatability coefficient, intra-class correlation and coefficient of variation with associated confidence intervals (CIs) 20-22 . Kendall's Tau-b was used to assess the relationship between the mean and SD, and natural logarithm transformed data were analysed when appropriate. The limits of agreement (denoted prediction limits) were calculated including the variance of the replicates using a linear mixed-effect model 15 . In the empirical analysis of progression, one of the four measurements from each day in the inter-day data was selected at random for each subject, defining the baseline measurement (Day 0) and the follow-up measurement (Day 3). The procedure was repeated to confirm the results.
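The empirical analysis of progression described above can be sketched as follows. This is a minimal illustration and not the procedure implemented in the statistical software used here: the function name, the seed handling and the `limit` argument are hypothetical, and the limit must be supplied by the user since the limits of the ABCD Progression Display are not reproduced in this text. Progression is flagged when the parameter decreases by more than the limit, in line with the convention that a decrease indicates progression.

```python
import random


def flag_progression(day0, day3, limit, seed=0):
    """Flag apparent progression from one random measurement per day.

    day0, day3: one list of replicates per subject for each day.
    limit: hypothetical threshold for a decrease to count as progression.
    """
    rng = random.Random(seed)  # fixed seed so that a run can be repeated
    flags = []
    for baseline_reps, follow_up_reps in zip(day0, day3):
        baseline = rng.choice(baseline_reps)    # random Day 0 measurement
        follow_up = rng.choice(follow_up_reps)  # random Day 3 measurement
        flags.append(baseline - follow_up > limit)
    return flags


# Hypothetical values of parameter A (mm) for two subjects.
day0 = [[7.30, 7.32, 7.31, 7.29], [6.90, 6.95, 6.86, 6.92]]
day3 = [[7.31, 7.30, 7.29, 7.32], [6.80, 6.93, 6.88, 6.97]]
flags = flag_progression(day0, day3, limit=0.05, seed=1)
print(f"Flagged: {sum(flags)} of {len(flags)} subjects")
```

As no true progression is expected over three days, any subject flagged in such an analysis would be regarded as a false positive; repeating the call with a different seed corresponds to repeating the randomisation.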