On the impact of different approaches to classify age-related macular degeneration: Results from the German AugUR study

While age-related macular degeneration (AMD) poses an important personal and public health burden, comparing epidemiological studies on AMD is hampered by differing approaches to classify AMD. In our AugUR study survey, recruiting residents from in/around Regensburg, Germany, aged 70+, we analyzed the AMD status derived from color fundus images applying two different classification systems. Based on 1,040 participants with gradable fundus images for at least one eye, we show that including individuals with only one gradable eye (n = 155) underestimates AMD prevalence and we provide a correction procedure. Bias-corrected and standardized to the Bavarian population, late AMD prevalence is 7.3% (95% confidence interval = [5.4; 9.4]). We find substantially different prevalence estimates for “early/intermediate AMD” depending on the classification system: 45.3% (95%-CI = [41.8; 48.7]) applying the Clinical Classification (early/intermediate AMD) or 17.1% (95%-CI = [14.6; 19.7]) applying the Three Continent AMD Consortium Severity Scale (mild/moderate/severe early AMD). We thus provide a first effort to grade AMD in a complete study with different classification systems, a first approach for bias-correction from individuals with only one gradable eye, and the first AMD prevalence estimates from a German elderly population. Our results underscore substantial differences for early/intermediate AMD prevalence estimates between classification systems and an urgent need for harmonization.

Concordance of AMD status between two graders applying the Three Continent AMD Consortium Severity Scale.
Supplementary Table 10. Evaluation of the effect of single eye grading based on the 885 participants with both eyes gradable in a five-category interpretation.
Supplementary Table 11. Evaluation of the effect of single eye grading based on the 885 participants with both eyes gradable in a three-category interpretation.
) Answered neither the invitation letter nor a written reminder.

Supplementary Text 1. Assessment of participant data.
Smokers were categorized as current smokers (having smoked ≥ 1 cigarette per month), ex-smokers (having stopped smoking for ≥ 1 month), and never smokers (having smoked fewer than 100 cigarettes in their lifetime). Pack-years were calculated by multiplying the number of packs of cigarettes smoked per day by the number of years the person had smoked. The number of years of smoking was computed as the age at examination (for current smokers) or the age at which smoking stopped (for ex-smokers) minus 18, under the assumption that smoking started around the age of 18.
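The pack-year rule can be sketched in a few lines of Python. This is only an illustration of the stated formula: the function names, the status labels, and the clamping of negative durations to zero are assumptions and are not taken from the study's software.

```python
def smoking_years(status, age_at_exam, age_stopped=None, assumed_start_age=18):
    """Years smoked, assuming smoking started at `assumed_start_age` (18 in the text)."""
    if status == "never":
        return 0.0
    # Current smokers: count up to the examination; ex-smokers: up to quitting.
    end_age = age_at_exam if status == "current" else age_stopped
    if end_age is None:
        raise ValueError("ex-smokers need the age at which smoking stopped")
    # Assumption: a negative duration (end age below 18) is set to zero.
    return max(0.0, float(end_age - assumed_start_age))


def pack_years(status, packs_per_day, age_at_exam, age_stopped=None):
    """Pack-years = packs per day multiplied by years smoked."""
    return packs_per_day * smoking_years(status, age_at_exam, age_stopped)


# Hypothetical participants, for illustration only.
print(pack_years("current", 0.5, 74))
print(pack_years("ex", 1.0, 80, age_stopped=48))
```

For a current smoker aged 74 who smokes half a pack per day, the rule gives 0.5 × (74 − 18) pack-years.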
The metabolic parameters body-mass index (BMI), type 2 diabetes mellitus (T2DM), and hypertension were assessed by physical examination and interview: BMI (kg/m²) was computed as weight divided by squared body height, based on measured weight in kg (in light clothing, to the nearest 0.1 kg) and height in m (to the nearest 0.005 m). 1,2 T2DM was assessed as self-reported type 2 diabetes or reported intake of anti-diabetes therapy.
Hypertension was assessed according to previous work 3-5 as measured systolic blood pressure of ≥ 140 mmHg, diastolic blood pressure of ≥ 90 mmHg, or anti-hypertensive medication taken, given that the participants were aware of having hypertension.
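The three metabolic definitions can be written as simple derived variables. The following is a minimal sketch with hypothetical function and argument names. In particular, it assumes that the awareness condition applies only to the medication criterion, which is one possible reading of the definition above.

```python
def bmi(weight_kg, height_m):
    """Body-mass index in kg/m^2: weight divided by squared height."""
    return weight_kg / height_m ** 2


def has_t2dm(self_reported_t2dm, on_antidiabetic_therapy):
    """T2DM: self-reported diagnosis or reported anti-diabetes therapy."""
    return bool(self_reported_t2dm or on_antidiabetic_therapy)


def has_hypertension(systolic_mmhg, diastolic_mmhg, on_medication, aware):
    """Hypertension: elevated measured blood pressure, or medication intake.

    Assumption: medication only counts when the participant is aware of
    having hypertension; the measured thresholds count regardless.
    """
    elevated = systolic_mmhg >= 140 or diastolic_mmhg >= 90
    return bool(elevated or (on_medication and aware))


# Hypothetical participant, for illustration only.
print(round(bmi(81.0, 1.80), 1))
print(has_t2dm(False, True))
print(has_hypertension(128, 82, on_medication=True, aware=True))
```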
A history of non-AMD related eye diseases such as cataract, glaucoma, or diabetic retinopathy was assessed via self-report during a standardised face-to-face interview.
Supplementary Text 2. Acquisition and processing of color fundus images.
Color fundus images of the central retina were acquired using the automated DRS camera (Digital Retinography System; CenterVue, Padova, Italy). In line with the standard operating procedure of the NaKo study 6, we initially dispensed with mydriasis for practical reasons.
Consistent with previous observations that the quality of fundus photography depends on pupil size and that pupil size depends on age 7, we found the pupil size and thus the quality of fundus images to be insufficient for a substantial proportion of our elderly study cohort. We thus altered the protocol in January 2015 to administer a mild mydriasis (Mydriaticum UD, pharmaSTULLN, Stulln, Germany), after obtaining special written informed consent and providing explicit information about the consequences of mydriasis, such as a ban on driving and a small risk for acute angle-closure glaucoma (1 in 20,000 to 1 in 3,000) 8.

b) Only considering fundus lesions within 2 standard disc diameters (approx. 3000 μm) of the center of the macula/fovea.
c) For determination of drusen size, the shortest drusen diameter was compared to that of an average normal retinal vein at the disc margin, considered to be approximately one-twelfth disc diameter or approximately 125 μm, when the average disc diameter is taken as 1500 μm.

The standard procedure for assessing AMD disease status of a participant is to analyze each eye and utilize the result of the worse eye to define the AMD status of the participant.
However, in almost all epidemiological studies, some participants have a gradable image for only one eye, and the question arises whether these participants can be utilized for AMD grading and AMD prevalence estimation.
In our study with $n = 1040$ analyzed participants, $n_1 = 155$ participants were classified based on only one eye (one-eye participants), the other $n_2 = 885$ on both (two-eye participants). While we here assume an AMD classification system with five categories, where category one comprises the AMD-free participants and higher categories denote a more severe disease status, the approach presented here can be readily extended to grading systems with more or fewer categories. Utilizing the observed disease status of a sole eye as the disease stage for a one-eye participant will yield a less severe stage for the participant when the less affected eye was the sole observed eye. We assume in the following that the missingness process is random, i.e. independent of the disease status of the eye. Thus, the prevalence estimates based on the observed disease status of sole eyes of one-eye participants are biased in favor of lower disease categories. In the following, we present an approach to estimate the true prevalence for each AMD status $k \in \{1, \dots, 5\}$ based on two- and one-eye participants that adjusts for the misclassification in one-eye participants.
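The direction of this bias can be illustrated with a small Python sketch. The eye grades are invented for illustration, and the helper names are hypothetical. The sketch compares the stage distribution under the worse-eye rule with the distribution expected if only one randomly chosen eye per participant had been graded.

```python
from collections import Counter
from fractions import Fraction


def person_status(eye_a, eye_b):
    """Worse-eye rule: the higher grade among the gradable eyes (None = ungradable)."""
    grades = [g for g in (eye_a, eye_b) if g is not None]
    return max(grades) if grades else None


def worse_eye_distribution(pairs):
    """Relative frequency of each stage when both eyes are graded."""
    counts = Counter(person_status(a, b) for a, b in pairs)
    return {k: Fraction(c, len(pairs)) for k, c in sorted(counts.items())}


def single_eye_distribution(pairs):
    """Expected relative frequency of each stage when one eye per participant,
    chosen at random with probability 1/2, is the only graded eye."""
    counts = Counter()
    for a, b in pairs:
        counts[a] += Fraction(1, 2)
        counts[b] += Fraction(1, 2)
    return {k: c / len(pairs) for k, c in sorted(counts.items())}


# Invented (eye, eye) grades on a five-category scale.
pairs = [(1, 1), (1, 3), (2, 5), (4, 4)]
print(worse_eye_distribution(pairs))
print(single_eye_distribution(pairs))
```

In this toy sample, stage 1 is more frequent and stage 5 less frequent under single-eye grading than under the worse-eye rule, which is the shift toward lower categories described above.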
Let $Y \in \{1, \dots, 5\}$ be the true (potentially unobserved) AMD status of a participant (i.e. AMD status of the worse eye) and $Y^* \in \{1, \dots, 5\}$ the observed AMD status of one (randomly selected) eye. Thus, for the two-eye participants, observations on $Y$ and $Y^*$ are available; for the one-eye participants, only observations on $Y^*$. If we additionally assume that the two-eye participants are a random subsample of the participants, they represent an internal validation sample 1. In this case, the observed disease stage relative frequencies for two-eye participants are valid unbiased estimates of the AMD prevalence in the overall population.
An appropriate additional consideration of the potentially misclassified participants can, however, yield an unbiased estimate with smaller standard error compared to the scenario where only the two-eye participants are utilized.
We can describe the misclassification procedure in the one-eye participants by predictive values: For a five-category AMD status, let $P(Y \mid Y^*)$ be the 5 × 5 matrix of predictive values with entries $\lambda_{kl} = P(Y = k \mid Y^* = l)$, $k, l \in \{1, \dots, 5\}$, which denote the probabilities that the person's true AMD stage is $k$ (worse eye) given that the stage observed in one eye is $l$. Since the true AMD stage (worse eye) is always higher than or equal to the AMD stage observed in one eye, all entries above the main diagonal of this matrix are zero.
The estimate of the probability for the error-prone observed AMD stage is given by the observed relative frequency of AMD stage $l$ among the one-eye participants, $\hat{P}(Y^* = l)$.
To derive estimates of the predictive values, $\lambda_{kl}$, $k, l = 1, \dots, 5$, we can utilize the internal validation sample of two-eye participants, as we observe both $Y$ and $Y^*$ for these individuals: The $l$-th column of the matrix of predictive values, $P(Y \mid Y^*)_{\bullet l} = (\lambda_{1l}, \dots, \lambda_{5l})'$, represents the distribution of true (worse eye) disease stages in participants with at least one eye graded in disease stage $l$. This is exactly how we estimate the predictive values: For each $l = 1, \dots, 5$, we compute the relative frequencies of worse eye classifications of all two-eye participants with at least one single eye classified in disease stage $l$.
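The estimation rule for the predictive values can be sketched as follows. The function name and the toy data are hypothetical. The handling of a stage that no two-eye participant shows (the column is set to the identity, i.e. no reclassification) is an assumption added here so that every column sums to one.

```python
def estimate_predictive_values(eye_pairs, n_cat=5):
    """Return lam with lam[k-1][l-1] estimating P(Y = k | Y* = l).

    eye_pairs: (grade_eye_1, grade_eye_2) for two-eye participants,
    grades coded 1..n_cat.
    """
    counts = [[0] * n_cat for _ in range(n_cat)]
    for a, b in eye_pairs:
        worse = max(a, b)
        # A participant enters column l once if at least one eye has stage l.
        for l in {a, b}:
            counts[worse - 1][l - 1] += 1

    lam = [[0.0] * n_cat for _ in range(n_cat)]
    for l in range(n_cat):
        col_total = sum(counts[k][l] for k in range(n_cat))
        if col_total == 0:
            # Assumption: without any data for stage l, keep the stage as observed.
            lam[l][l] = 1.0
            continue
        for k in range(n_cat):
            lam[k][l] = counts[k][l] / col_total
    return lam


# Invented two-eye participants, for illustration only.
pairs = [(1, 1), (1, 1), (1, 2), (2, 2), (2, 3), (3, 3)]
for row in estimate_predictive_values(pairs):
    print([round(x, 3) for x in row])
```

Because the worse eye can never have a lower stage than either single eye, all counts fall on or below the main diagonal, so the zero pattern described above arises automatically.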
To derive a bias-corrected AMD stage prevalence among one-eye participants, we add the observed AMD stage relative frequency among one-eye participants, $\hat{P}(Y^* = l)$, multiplied with the estimated predictive values, $\hat{\lambda}_{kl}$, across $l \in \{1, \dots, 5\}$ in analogy to (2):

$$\hat{P}_{\text{one-eye}}(Y = k) = \sum_{l=1}^{5} \hat{\lambda}_{kl}\, \hat{P}(Y^* = l), \qquad k = 1, \dots, 5. \tag{3}$$

The overall estimate combines this corrected estimate with the estimate from the two-eye participants, each weighted by its share of the sample. Using a matrix notation, this can be expressed as

$$\hat{\pi} = \frac{n_2}{n}\, \hat{\pi}_{2} + \frac{n_1}{n}\, \hat{P}(Y \mid Y^*)\, \hat{\pi}^{*} \tag{4}$$

with the vectors $\hat{\pi}$ denoting the bias-corrected overall prevalence estimates (disease stage probabilities), $\hat{\pi}_{2}$ denoting the disease stage probability estimates for the two-eye participants, and $\hat{\pi}^{*}$ (as above) denoting the biased disease stage probability estimate of the one-eye participants; $n_1$ and $n_2$ are the numbers of one-eye and two-eye participants, and $n = n_1 + n_2$.

This approach to correct for misclassification is often referred to as adjustment using predictive values or adjustment using calibration probabilities. 14,15 Tenenbein (1972) 16 shows that (4) is the maximum likelihood estimate of the true class probabilities under misclassification with an internal validation sample (assuming a multinomial distribution of true class counts) and derives formulas to calculate asymptotic variances of the estimates using the delta method. Following maximum likelihood theory, the estimates are therefore asymptotically efficient. Kuha and Skinner (1997) 14 compare this approach to the adjustment using the misclassification probabilities $P(Y^* \mid Y)$ (matrix method) and show in an example that the latter is less efficient in situations where the adjustment using predictive values is adequate.
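A minimal sketch of the correction and combination step described above is given below. It assumes that the overall estimate weights the two-eye estimate and the corrected one-eye estimate by their sample shares. The function name, the three-category matrix, and all probabilities are invented for illustration; only the group sizes 885 and 155 come from the study.

```python
def bias_corrected_prevalence(pi_two, p_star_one, lam, n_two, n_one):
    """Combine two-eye estimates with predictive-value-corrected one-eye estimates.

    pi_two:     stage probabilities among two-eye participants (worse eye)
    p_star_one: observed single-eye stage frequencies among one-eye participants
    lam:        lam[k][l] estimates P(Y = k+1 | Y* = l+1)
    """
    n_cat = len(pi_two)
    n = n_two + n_one
    # Corrected one-eye distribution: sum over l of lam[k][l] * P(Y* = l).
    corrected_one = [
        sum(lam[k][l] * p_star_one[l] for l in range(n_cat)) for k in range(n_cat)
    ]
    # Assumption: weights are the sample shares of the two groups.
    return [
        (n_two * pi_two[k] + n_one * corrected_one[k]) / n for k in range(n_cat)
    ]


# Three categories for brevity; all probabilities are invented.
lam = [
    [0.80, 0.00, 0.00],
    [0.15, 0.90, 0.00],
    [0.05, 0.10, 1.00],
]
pi_two = [0.5, 0.3, 0.2]
p_star_one = [0.6, 0.3, 0.1]
estimate = bias_corrected_prevalence(pi_two, p_star_one, lam, n_two=885, n_one=155)
print([round(p, 4) for p in estimate])
```

Because each column of the predictive value matrix sums to one, the corrected one-eye distribution and the combined estimate remain proper probability vectors.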
To obtain bias-corrected AMD prevalence estimates standardized to the Bavarian population, the bias-adjusted disease stage probabilities for each sex and 5-year age-group can be estimated by (4) and then combined by a weighted sum, with weights corresponding to the proportion of the respective groups in the Bavarian population. In general, it would be possible to estimate different predictive values for each age-sex group, which would be important if the predictive values differed by age-sex group. However, because some age-sex groups contain only a few two-eye observations, these group-specific estimates turn out to be rather unstable. Therefore, we decided to assume common predictive values for all age-sex groups and to estimate them based on all two-eye observations.
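The weighted sum used for standardization can be sketched as below. The stratum labels, probabilities, and population counts are invented placeholders, not Bavarian population figures, and `standardize` is a hypothetical helper name.

```python
def standardize(stratum_estimates, population_counts):
    """Weighted sum of stratum-specific stage probabilities.

    stratum_estimates: {stratum: [p_1, ..., p_K]} bias-corrected estimates
    population_counts: {stratum: size of that stratum in the reference population}
    """
    if set(stratum_estimates) != set(population_counts):
        raise ValueError("strata of estimates and reference population must match")
    total = sum(population_counts.values())
    n_cat = len(next(iter(stratum_estimates.values())))
    result = [0.0] * n_cat
    for stratum, probs in stratum_estimates.items():
        weight = population_counts[stratum] / total  # population share of stratum
        for k in range(n_cat):
            result[k] += weight * probs[k]
    return result


# Invented numbers for two sex/age strata and three categories.
estimates = {
    ("men", "70-74"): [0.6, 0.3, 0.1],
    ("women", "70-74"): [0.4, 0.4, 0.2],
}
reference = {("men", "70-74"): 300, ("women", "70-74"): 100}
print([round(p, 4) for p in standardize(estimates, reference)])
```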
Variance estimates for the standardized prevalence estimates could again be derived asymptotically using the delta method; as an alternative, we propose a non-parametric bootstrap procedure.
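A generic non-parametric bootstrap with a percentile interval could look like the following sketch. The resampling unit (participants, drawn with replacement from the whole sample), the percentile method, the number of replicates, and the toy estimator are all assumptions made for illustration; in practice the estimator would be the full standardized, bias-corrected prevalence computation.

```python
import random


def bootstrap_ci(participants, estimator, n_boot=2000, level=0.95, seed=1):
    """Percentile bootstrap interval for `estimator` applied to `participants`."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(participants)
    estimates = []
    for _ in range(n_boot):
        # Draw n participants with replacement and re-estimate.
        resample = [participants[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    estimates.sort()
    alpha = (1.0 - level) / 2.0
    lower = estimates[int(alpha * n_boot)]
    upper = estimates[min(n_boot - 1, int((1.0 - alpha) * n_boot))]
    return lower, upper


def late_amd_share(statuses):
    """Toy estimator: share of participants in the highest category (5)."""
    return sum(1 for s in statuses if s == 5) / len(statuses)


# Invented person-level AMD stages for 100 participants.
statuses = [1] * 60 + [2] * 20 + [3] * 9 + [4] * 4 + [5] * 7
point = late_amd_share(statuses)
low, high = bootstrap_ci(statuses, late_amd_share)
print(point, low, high)
```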
lifestyle factors, metabolic parameters and self-reported eye diseases/conditions for all participants (n = 1,133) and separately for those without acquired fundus images, without any eye gradable, and for those constituting the analyzed sample (at least one eye gradable).
e) T2DM is defined as a self-reported diagnosis or anti-diabetes medication intake.
f) Hypertension is defined as actually measured systolic blood pressure of ≥ 140 mmHg, diastolic blood pressure of ≥ 90 mmHg or corresponding medication taken, given that the participants were aware of having hypertension.
g) History of cataract, glaucoma and diabetic retinopathy was assessed via self-report.
h) History of cataract surgery was assessed via self-report among those with reported cataract.
i) Pupil size per person is defined as the smaller pupil diameter of both eyes.

Supplementary Table 5. Observed relative frequencies of AMD status for two classification systems by sex and five-year age-groups. Shown are the observed frequencies (for men and women in parentheses) for each AMD status based on the Clinical Classification 11 and the Three Continent AMD Consortium Severity Scale 13 in the 1,040 analyzed individuals with at least one eye gradable.

When evaluating specific AMD features per eye (drusen, pigmentary abnormalities, details on GA or NV), we found the following (Supplementary

) For the Three Continent AMD Consortium Severity Scale, collapsing mild early AMD, moderate early AMD, and severe early AMD to "any early" AMD.

13 for the three categories "no AMD", "any early or intermediate AMD", and "late AMD".

Supplementary
These predictive values are computed based on the two-eye participants and used for the bias-correction of the relative AMD frequencies for the one-eye observations. The true AMD stage (defined as worse-eye AMD stage) and the observed AMD stage (one eye) are denoted by Y and Y*, respectively.
) Early AMD is defined as modified Rotterdam Study classification grades 1-2 ("early" AMD) and 3 ("intermediate" AMD).