Introduction

Development of anti-scarring therapies for local delivery to the eye is a current therapeutic goal for patients with conjunctival cicatrization. Objective assessment of worsening of conjunctival cicatrization, and knowledge of normal conjunctival fornix depth values, are essential requirements when evaluating the efficacy of anti-scarring therapies. In the prototypical scarring disorder ocular mucous membrane pemphigoid, it has been observed that scarring can progress despite apparent clinical control of inflammation.1, 2 Measurement of conjunctival cicatrization in mucous membrane pemphigoid, according to Mondino, Foster, Tauber and Rowsey, has focused only on inferior fornix depth.3, 4, 5 Progression of cicatrization by a reduction in superior fornix depth is often overlooked. Sight-threatening sequelae including lagophthalmos can ensue from upper subtarsal fibrosis and upper lid entropion. Inclusion of upper conjunctival fornix measurements ensures that the ocular surface is evaluated as a whole.

Early identification of any semblance of progressive cicatrization is the key to management of conjunctival scarring disorders. Clinically, one must seek increased conjunctival shrinkage or development of symblephara. Objective measurement of conjunctival fornix depth, with knowledge of the expected normal range of values, would allow earlier identification of conjunctival fornix shrinkage (Foster stage II),6 ideally before the development of (Foster stage III) symblephara.

The ideal measuring device should have excellent intra- and inter-observer reliability, be inexpensive and easily available, easy to use, simple to thoroughly disinfect between patients, and comfortable for the patient. Some have produced a plastic fornix depth measurer (FDM) engraved by jewelry software7 or a metal rod.8 Others have used a ruler in different aspects of gaze4 or the slit lamp beam.9

Schwab et al10 first published normal age-stratified data on inferior fornix depth using a short biconcave FDM. We developed a Moorfields modification of this FDM which is elongated, in order to allow measurement of the upper conjunctival fornix. Khan et al,11 using a modified plastic FDM, have measured conjunctival fornix depth in South Asian eyes, but not in patients who are ethnically white Caucasian, who constitute 86% of the UK population and also suffer from conjunctival scarring diseases.

This study was undertaken to evaluate intra- and inter-observer variability with the Moorfields FDM, and to establish normal central upper and lower conjunctival FD measurements according to age and gender, in an epidemiologic cross-sectional study of healthy white Caucasian eyes.

Materials and methods

Design and use of the FDM

Polymethylmethacrylate FDMs were created at Moorfields (designed by VS, SH and DC) using a hand-made plaster cast shaped to account for scleral curvature. A ruler is embedded within the plastic, with 2 mm black line gradations, and red lines indicating 10 and 20 mm. The maximum number of gradations is 15. The FDM device itself is approximately 42 × 11 mm in size (Figures 1a and b).

Figure 1

Moorfields conjunctival fornix depth measurer. (a, b) Polymethylmethacrylate biconcave fornix measurer constructed by hand with an embedded ruler. Black lines are at 2 mm intervals, red lines are at 10 mm intervals. (c, d) The fornix measurer was inserted after instillation of 1 drop of proxymetacaine hydrochloride 0.5%. Subjects were asked to look up for measurement of the lower fornix, and to look down for measurement of the upper fornix. A central fornix depth measurement was obtained by identifying which mark aligned with the posterior lid margin.

The FDM was sterilized according to our local National Health Service Trust protocol for sterilizing non-disposable applanation gonioscopy lenses, that is, cleaning with a mild detergent-soaked wipe in a circular motion for 20 s, rinsing with sterile water and drying with a non-linting tissue. After instillation of proxymetacaine hydrochloride 0.5% eye drops, patients were asked to look in the opposite direction to the placement of the FDM in the fornix, with the face in primary position: downgaze at 0600 hours to the floor for the upper fornix, and upgaze at 1200 hours to the ceiling for the lower fornix. The FDM was then gently inserted over the center of the pupil into the conjunctival sac (Figures 1c and d). Depth measurements were obtained by identifying which marks aligned with the posterior lid margin. Each mark represented 2 mm and if the lid margin fell between marks, an additional 1 mm was added to the total.
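The reading rule described above (2 mm per mark, plus 1 mm when the lid margin falls between marks) can be written out as a short calculation. The following is a minimal illustrative sketch only; the function and parameter names are hypothetical and are not part of the study protocol.

```python
MM_PER_MARK = 2   # each gradation on the embedded ruler represents 2 mm
MAX_MARKS = 15    # maximum number of gradations on the Moorfields FDM


def fornix_depth_mm(marks_counted: int, between_marks: bool = False) -> int:
    """Convert an FDM reading into a fornix depth in millimeters.

    marks_counted: number of 2 mm marks counted up to the posterior lid margin
                   (hypothetical parameter name).
    between_marks: True if the lid margin fell between two marks, in which
                   case 1 mm is added to the total.
    """
    if not 0 <= marks_counted <= MAX_MARKS:
        raise ValueError("marks_counted must be between 0 and 15")
    return marks_counted * MM_PER_MARK + (1 if between_marks else 0)


# Example: lid margin lies between the 7th and 8th marks -> 15 mm
example_depth = fornix_depth_mm(7, between_marks=True)
```

With this rule, every reading resolves to a whole number of millimeters, which is consistent with the 1 mm resolution discussed for the device.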

Care was taken to avoid stretching the fornix during measurements. No adverse effects were observed during the course of the study.

Ethical approval

Institutional research governance and ethics committee approval was obtained before commencing the study, and written informed consent was obtained from all participants. The study conformed to the tenets of the Declaration of Helsinki.

Subjects

A total of 252 ethnically white Caucasian subjects aged 20 to 80+ years were consecutively recruited into the study from the Outpatient and Casualty clinics of Moorfields Eye Hospital. All 252 subjects were measured by the same observer (GJ). Previous sample size calculations indicated that at least 240 subjects would need to be recruited.11 Before recruitment, each subject had an eyelid and ocular surface examination to exclude subtarsal fibrosis. Exclusion criteria were: non-white Caucasian ethnicity; any ocular surface pathology or any ocular disease requiring long-term topical treatment (eg, topical lubricants, intraocular pressure lowering medication, topical steroids); a history of eyelid surgery, or of surgery or trauma involving conjunctival incisions (eg, pterygium, vitreoretinal surgery); and ptosis or giant fornix syndrome.12

Validation of the FDM measurements

Masked independent measurements of upper and lower fornix depth in right and left eyes were undertaken by two observers on 49 of the 252 participants. All FD measurements were performed twice, with the first of the two measurements used for inter-observer comparison, and repeated 1 h later with masking to the previous data, to estimate intra-observer agreement.

Statistical analysis

Intra-observer and inter-observer comparisons were made using Bland–Altman plots of differences in measurements vs mean measurements, with 95% limits of agreement, calculated using Excel for Macintosh (Microsoft Office 2011). As described previously,7 a 10% threshold or tolerance was used as an allowance for intra-observer variation.
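For readers who wish to reproduce this type of analysis outside a spreadsheet, the Bland–Altman quantities can be sketched as follows. This is an illustrative sketch, not the calculation performed in the study (which used Excel): it assumes the limits of agreement are taken as the mean difference ±2 SDs, as reported in the Results, and that the sample standard deviation (n−1 denominator) is used. The function name and the example readings are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(observer1, observer2, k=2.0):
    """Return Bland-Altman summary values for paired measurements (mm).

    observer1, observer2: paired readings for the same eyes.
    k: multiplier for the SD of the differences (2 here, matching the
       '+/-2 SDs' convention; 1.96 is another common choice).
    """
    if len(observer1) != len(observer2) or len(observer1) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(observer1, observer2)]        # y-axis of the plot
    means = [(a + b) / 2 for a, b in zip(observer1, observer2)]  # x-axis of the plot
    bias = mean(diffs)    # mean difference between observers
    sd = stdev(diffs)     # sample SD of the differences (assumption)
    return {
        "means": means,
        "diffs": diffs,
        "bias": bias,
        "lower_limit": bias - k * sd,
        "upper_limit": bias + k * sd,
    }


# Made-up readings for illustration only (not study data)
result = bland_altman([10, 11, 12, 10], [10, 12, 11, 10])
```

By construction the two limits are symmetric about the mean difference, which gives a quick consistency check on any reported set of limits.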

For the epidemiological study, age-stratified and gender-stratified data were analyzed by two-way analysis of variance. Comparisons of data according to age or gender were analyzed by non-parametric methods with the Kolmogorov–Smirnov test. The analysis was performed using Stata V10 with P-values <0.05 taken as significant. No missing data were encountered.

Results

Demographics

The F : M ratio of subjects recruited was 1.3 : 1. The number of subjects recruited in each age decade is shown in Table 1.

Table 1 Estimated marginal means of upper and lower conjunctival fornix depths per age group and separated by gender

Intra-observer variation

All (100%; 49/49) of the intra-observer observations for the lower conjunctival fornix showed exact agreement for observer 1, and 94% (47/49) for observer 2. When allowing for 1 mm ‘tolerance’ (approximating to 10% of the normal lower fornix), 100% of intra-observer observations fell within 1 mm for both observers.

For the upper conjunctival fornix, 100% (49/49) of intra-observer observations showed exact agreement for observer 1, and 98% (48/49) for observer 2. When allowing for a 1.5 mm tolerance (approximating to 10% of the normal upper fornix according to that measured in South Asians11), 100% of intra-observer observations fell within 1.5 mm for both observers.
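The tolerance-based agreement figures above are simple proportions of paired readings whose absolute difference lies within the stated allowance. A minimal sketch of that calculation is given here; the function name and the example readings are hypothetical and do not come from the study data.

```python
def agreement_within(first, second, tolerance_mm):
    """Count paired readings that differ by no more than tolerance_mm.

    Returns (number within tolerance, total pairs, percentage).
    A tolerance of 0 corresponds to exact agreement.
    """
    if len(first) != len(second) or not first:
        raise ValueError("need equal-length, non-empty lists of readings")
    within = sum(1 for a, b in zip(first, second) if abs(a - b) <= tolerance_mm)
    total = len(first)
    return within, total, 100.0 * within / total


# Made-up repeat readings for illustration only (not study data)
first_reading = [10, 11, 12, 10]
second_reading = [10, 12, 11, 13]
exact = agreement_within(first_reading, second_reading, 0)       # exact agreement
tolerant = agreement_within(first_reading, second_reading, 1.0)  # within 1 mm
```

The same function applies to the lower fornix (1 mm allowance) and the upper fornix (1.5 mm allowance) by changing only the tolerance argument.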

Inter-observer variation

For the lower conjunctival fornix, inter-observer variation showed a mean difference in lower fornix measurements of 0.20 mm, with 95% limits of agreement (±2 SDs) of −1.36 to +0.95 mm (Figure 2a). Inter-observer agreement within the 10% allowance (approx ±1 mm) of total lower fornix depth was 96% (47/49).

Figure 2

Bland–Altman plots evaluating inter-observer variation of upper (a) and lower (b) central conjunctival fornix depth. The millimeter difference in assessment between observer 1 and 2 is plotted against mean millimeter measurement for each patient, and the mean±2 SDs. Each plot shows results for 49 individual assessments, but due to overlapping data points there appear to be fewer than 49 assessments.

For the upper fornix, inter-observer variation showed a mean difference in upper fornix measurements of 0.02 mm, with 95% limits of agreement (±2 SDs) of −1.33 to +1.28 mm (Figure 2b). By using an allowance of 10% (approx ±1.5 mm) based on Khan et al’s total upper fornix depth,11 there was agreement of 95% (46/49).

No significant difference between right and left eyes was found, for lower or upper fornix depth, using repeated measures analysis of variance to account for correlation between the measurements on the left and right eyes of each volunteer.

Upper and lower conjunctival fornix depths according to age and gender

The overall mean upper FD across our Caucasian study population was 15.6 mm (95% confidence interval (CI), 12.5–18.8), and the overall mean lower FD was 10.9 mm (95% CI, 8.0–13.7).

Mean upper and lower FD stratified according to age and gender is shown in Table 1. There was a progressive decline in FD with age: mean upper fornix depth was 16.3 mm±1.2 at age 20–29, and 15.0 mm±1.8 at age 80 (P=0.04). Mean lower fornix depth was 11.25 mm±1.5 at age 20–29, and 10.0 mm±1.3 at age 80 (P=0.04).

Females had significantly smaller FDs than males. Mean upper FD was 15.3 mm±1.6 in females and 16.2 mm±1.4 in males (P<0.001). Mean lower FD was 10.6 mm±1.3 in females and 11.3 mm±1.4 in males (P<0.001). Estimated marginal means of fornix depths per age group and separated by gender are shown in Table 1 and Figures 3a and b. When stratified according to age, lower fornix depth decreased with age (11.2 mm in the 20s to 10.2 mm in the 80s), and female subjects had smaller measurements across all decades examined (P=0.03). As this is a cross-sectional study, associations between the measurements and age carry a caveat: factors that may or may not influence forniceal depth (for example nutrition, height, smoking) may have been different for the current 80-year-old age group when they were 30 years old, compared with the current 30-year-old age group.

Figure 3

Estimated marginal means of (a) upper and (b) lower fornix depths, per age group and separated by gender with 95% confidence intervals (CIs). When stratified according to age, lower fornix depth decreased with age, and female subjects had smaller measurements across all decades examined (P=0.03).

Patient comfort and tolerance

The FDM measurement was well tolerated, with little or no discomfort reported by patients. There were no instances of ocular surface damage, visual alteration, or infection following the FDM measurement.

Discussion

In this study, we have shown that despite using a slightly different custom-made fornix depth measurer, our results for central upper and lower conjunctival fornix depth in 252 white Caucasian patients are similar to those published previously by Khan et al,11 where measurements were taken in 240 South Asian patients, and also similar to the results published by Schwab et al10 in 420 patients. Table 2 summarizes results from this study and the Khan and Schwab studies. There is no information regarding which ethnic group(s) the patients in the Schwab study belong to. All three FDMs give comparable results, and it appears to be less important which actual FDM is used than that the user gains experience and consistent technique using one single device. Fornix depth measurements appear to show a different spread of the 95% CIs between the three studies. A study directly comparing the FDM of Khan et al, the Moorfields FDM (which was used in this study) and the Schwab FDM, in the same population of patients, could help ascertain the reason for this.

Table 2 Summary of Fornix depth measurement data using different FDM devices10, 11

The FDMs used by Khan et al and by Schwab et al both have 2 mm gradations, as does the FDM used in this study. An advantage of 2 mm gradations is that they facilitate rapid reading of the measurement, by counting in 2s rather than in 1s from the beginning of the ruler. The FDMs used by Khan et al and Schwab et al have been found to be accurate to 1 mm, and this study found a similar tolerance of 1 mm for the Moorfields FDM.

Apart from fornix depth measuring gauges, a number of other methods for measuring conjunctival fornix depth have been reported. Kawakita’s study using a metal rod in an ethnically Japanese Asian population did not measure the central conjunctival fornix depth, and the measurements are marginally (1.5 mm) smaller in the upper fornix.8 However the number of normal eyes measured in Kawakita’s study is small, only 20, and the age range is large (38–80 years). A larger age and gender-stratified study in this ethnic population, measuring central conjunctival fornix depth, would help clarify these findings. As they have different orbits, different anatomical landmarks for adnexal structures and globe axial lengths, further data on conjunctival fornix depth from a healthy Chinese or Japanese Asian population, and a healthy Black/African/Caribbean/Black British population, would be ideal in future studies.

Reeves et al9 have reported measuring bulbar conjunctival fornix depth in mucous membrane pemphigoid scarred eyes using the slit lamp. Given that the maximum length of the slit beam on the slit lamp is sometimes only 8 mm, and a proportion of eyes with healthy conjunctiva have inferior fornix depth measurements of greater than 10 mm, we have not found this method to be useful, and it was not possible to use this method as a comparison for this study. There is also no published data regarding measurements in eyes with healthy conjunctiva using the Reeves method. The sensitivity of this method in detecting cicatricial progression may be reduced in early disease, because the tarsal conjunctiva is often involved first (Foster stage I6). Notably, Reeves et al describe increased variability in their method when there are lesser degrees of conjunctival involvement, and it has been found by others that inter-observer agreement for this method is less consistent.7 Furthermore, it is not possible to evaluate scarring of the upper conjunctival fornix with this method.

Reeves et al have commented that the tarsus is a relatively fixed structure and only the conjunctiva below the tarsus tends to shrink. Scarring along the tarsal plate is one of the earliest stages of cicatrizing conjunctivitis (Foster stage I). This scarring along the tarsal plate often causes vertical contracture and shortening of the tarsus, which is well documented and common in cicatrizing conjunctival disease. By measuring fornix depth with a depth gauge which uses the posterior lid margin as the reference point, one not only measures contracture of the fornix below the tarsus, but also contracture of the tarsus itself, which can manifest in early disease.

An alternative method described by Rowsey4 measures the distance between the lower limbus and the posterior edge of the retracted lower eyelid margin in three different gazes: dextroelevation, laevoelevation and central elevation. It suggests that the normal conjunctiva should be 15 mm in each observed area and that a decrease of 3 mm is indicative of disease progression. Rowsey’s small study of only four patients with scarred eyes did not evaluate intra- and inter-observer variations. The technique of putting the conjunctiva on tension is heavily reliant on the examiner and variations can be expected. In our clinical practice we have found it difficult to be consistently accurate in locating the 5 o’clock and 7 o’clock positions for measurement when using this method, and furthermore there are no published data using this method in healthy eyes.

Concerns about variability of the degree of pressure used, and variable flattening of the curved contour of the conjunctival fornix when measuring fornix depth with a FDM, are valid; however, we believe that with experience and practice using one device, this variability can be minimized, as indicated by the good intra- and inter-observer agreement found in this study.

Concerns about changes in the position and features of the conjunctival fornix with upgaze and downgaze compared with the primary position are valid, but (a) for the inferior fornix, all the inferior fornix measurements in the study of Khan et al were taken similarly in upgaze, yet the lower fornix measurement values reported are similar to those reported by Schwab et al, who do not specify that the measurements were taken in a particular gaze position, and (b) for the upper fornix, there are only data from Khan et al, who measured the upper fornix in downgaze. The position of the fornix is likely to change in downgaze, but provided all measurements are consistently taken in the same position of gaze (which they were, in both our study and the study of Khan et al), the measurements reported in both our study and that of Khan et al could be used by others who wish to evaluate upper conjunctival fornix depth.

If there was an attempt to measure the fornix in primary position, there would be variability in fornix measurements according to the degree of opening of the palpebral aperture. Measuring the upper fornix in downgaze minimizes any contraction of the levator muscle which could bring significant variability into the measurements. Similarly measuring the lower fornix in upgaze minimizes contraction of the lower lid retractors. Furthermore, measuring the fornix in primary gaze is very uncomfortable for the patient and risks causing corneal trauma.

We have not observed any problems with sterilization of the FDM following its use in over 300 subjects, and have had no problems with cleaning the device over the markings. The markings on the fornix depth gauge used in this study are embedded within the plastic device, not milled into the surface.

In designing this FDM, our intention was to facilitate similar FDMs to be made by ocular prosthetists in other eye departments, to encourage accurate and reproducible measurement of fornix depth, and detection of disease progression, by all corneal and general ophthalmologists looking after patients with cicatrizing conjunctivitis. We have found that FDM measurements in the Caucasian population are similar to the South Asian population, and that non-identical FDMs appear to give similar results.

The goal when designing a FDM is to give information on (1) the severity of scarring in relation to reference data in eyes with normal conjunctiva, and (2) progression of scarring over time. This is important for any individual diagnosed with ocular mucous membrane pemphigoid and other scarring conditions of the eye, and also for measuring the efficacy of anti-scarring therapies.