Introduction

Chronic graft-versus-host disease (cGVHD) is a major complication of allogeneic hematopoietic stem cell transplantation (HSCT)1, with features similar to autoimmune disorders. Symptoms usually emerge within 12 months after HSCT and may be restricted to a single organ or tissue, or they may appear at multiple sites2. Over the past 3 decades, the number of allogeneic HSCT procedures, and hence the number of cGVHD cases, has increased.

cGVHD often affects the eyes. In this paper, we define ocular GVHD as the dry eye arising after HSCT in cGVHD patients. As humans obtain 70–80% of their sensory information from vision, improving the diagnosis3,4,5,6 and treatment of ocular GVHD4,7,8,9,10,11,12 will substantially enhance patients' quality of life13,14. Manifestations of ocular GVHD are reported in nearly 30–50% of allogeneic HSCT recipients14,15,16,17,18,19,20. Ocular GVHD typically occurs within 6 months after HSCT. The dry eye can progress rapidly to a severe state16,17, with an increased risk for corneal ulcer and occasional perforation21,22.

GVHD occurring more than 100 days after transplantation was previously defined as cGVHD; however, this definition has been revised6,23. A research group of the National Institutes of Health (NIH) suggests that a diagnosis of cGVHD requires a distinction from acute GVHD (aGVHD) and the presence of at least one diagnostic manifestation of cGVHD or at least one distinctive manifestation confirmed by pertinent biopsy or other relevant tests in the same or another organ6. Importantly, these criteria mean that if ocular GVHD is the only cGVHD-associated symptom, cGVHD cannot be diagnosed.

There are several diagnostic criteria for assessing the severity of ocular GVHD. The NIH eye score is a clinical scoring system proposed as NIH consensus criteria in 2005, as part of a global assessment of cGVHD severity based on the number of organs involved and the degree of impairment of the affected organs6,23. The Japanese dry eye score, revised in 2006, is used in Japan for ocular GVHD as well as dry eye caused by other diseases24. It has three parts, which assess dry eye symptomatology, tear film abnormality and conjunctival and corneal epithelial damage24. More than 4,000 HSCT procedures are performed annually in Japan and the number is increasing. Thus, the Japanese dry eye score is widely applied. The dry eye workshop score (DEWS), reported in 2007, diagnoses dry eye based on dry eye symptomatology, tear film abnormality, conjunctival and corneal epithelial damage and lid/meibomian gland dysfunction7.

The NIH consensus criteria for organs other than the eyes25,26 and the NIH eye score were previously validated27,28,29. Inamoto et al. reported that the NIH eye score shows the greatest sensitivity to symptom changes among various common scales, including the global rating of eye symptoms, Lee eye subscale, Ocular Surface Disease Index (OSDI) and Schirmer test28. The results obtained using all the assessment tools except the Schirmer test are correlated with both provider-reported and patient-reported changes in ocular GVHD activity. Several studies have investigated the classification of ocular GVHD using the DEWS 2007 score19,30,31,32. However, the NIH eye score has not been compared with the Japanese dry eye score or the DEWS 2007 score with respect to the severity of dry eye diagnosed. The novel aspects of the present study are (1) that we compared the diagnostic rate and severity of ocular GVHD assessed by three grading scales: the NIH eye score, the Japanese dry eye score and the DEWS 2007 score and (2) that all of the ophthalmological examinations were conducted using standardized methods and schedules through the Keio BMT program; in addition, all of the HSCT patients were examined at the Keio dry eye clinic before receiving HSCT.

The purposes of this study were (1) to examine the diagnostic rates of ocular GVHD, including its severity and prognosis using the three above-mentioned grading scales at Keio University School of Medicine and (2) to assess the agreement among the three grading scales.

To the best of our knowledge, this is the first study to compare the severity of ocular GVHD as assessed by three grading scales and to determine the agreement among those scales.

Results

The demographic and clinical characteristics of the 82 patients who underwent HSCT are summarized in Table 1. According to the NIH consensus criteria, 57 patients had both systemic and ocular GVHD, 10 had only ocular GVHD, 10 had neither systemic nor ocular GVHD and the diagnosis for 5 patients was unknown.

Table 1 Demographic and clinical characteristics of ocular GVHD patients (total n = 82)

The diagnosis and severity of ocular GVHD in each of the 82 patients who underwent HSCT were graded using three different grading scales: the NIH eye score, the Japanese dry eye score and the DEWS 2007 score (Table 2). By these scales, ocular GVHD was diagnosed in 56 patients (68.3%), 51 patients (62.2%) and 52 patients (63.4%), respectively. The proportion of patients diagnosed with ocular GVHD was analyzed as shown in Table 3. The Kappa coefficient (K) for the proportion of patients diagnosed with ocular GVHD by the NIH eye score and Japanese dry eye score was K = 0.85 (95% CI: 0.75 to 0.98), by the NIH eye score and DEWS 2007 score was K = 0.89 (95% CI: 0.79 to 0.99) and by the Japanese dry eye score and DEWS 2007 score was K = 0.95 (95% CI: 0.92 to 1.00). Thus, all of the Kappa coefficients for the proportion of patients diagnosed with ocular GVHD showed good agreement, especially between the Japanese dry eye score and DEWS 2007 score.

Table 2 Number of patients with ocular GVHD of different severities, graded according to the NIH eye score, Japanese dry eye score and DEWS 2007 score (total n = 82)*
Table 3 Proportion of patients diagnosed with ocular GVHD

A Cochran-Armitage trend test was conducted to analyze the proportion of progressive cases obtained with the three grading scales. There was a significant relationship between the scores and prognosis of ocular GVHD using all three grading scales (p < 0.0001) (Table 4).

Table 4 Proportion of progressive cases using three grading scales

Discussion

In this study, we found that three grading scales, the NIH eye score, the Japanese dry eye score and the DEWS 2007 score, showed statistically good agreement in the diagnostic rate and the severity of ocular GVHD. The clinical manifestations and pathology of ocular GVHD have been previously studied in Japan. The natural course17, the baseline profiles of the ocular surface and the tear dynamics of ocular GVHD were reported32. Examination of the histopathologic features of ocular GVHD revealed stromal fibroblasts in the lacrimal glands, an increased expression of HSP4733 and morphological alteration of the conjunctival mucosal microvilli34. In contrast, the present study was conducted specifically to examine the severity and the agreement among different grading scales for assessing ocular GVHD.

In the present study, the NIH eye score, the Japanese dry eye score and the DEWS 2007 score indicated ocular GVHD in 68.3%, 62.2% and 63.4% of the HSCT patients, respectively. Similarly, Balaram et al.35 reported that 62% of HSCT patients (21 eyes out of 34 eyes) showed Schirmer test scores < 5 mm. A recent cross-sectional study of 40 allo-HSCT patients investigated the severity of ocular GVHD19. The authors reported that ocular GVHD was seen in 24 eyes (30% of all the patients), with dry eye severity of score 1 (10.0%), score 2 (2.5%) and score 3 (17.5%) as diagnosed by the DEWS 2007 score. In our study, ocular GVHD was diagnosed in 63.4% of HSCT patients by the DEWS 2007 score, and the proportions of each score were score 1 (14.6%), score 2 (28.1%), score 3 (11.0%) and score 4 (9.8%), indicating more severe cases. The frequency of ocular GVHD may differ between study groups, even among tertiary hospitals, and may differ again at a secondary hospital; for example, the patient profile might be different, or more patients may have a milder form of dry eye, because Keio University School of Medicine tends to have more severe cases. Ocular GVHD accompanies cGVHD manifestations in other organs in a large percentage of cases; however, it can also be the initial and/or the only manifestation of cGVHD30,36. In our study, there were 10 patients (12.2% of all patients) who had ocular GVHD without systemic cGVHD. Of these patients, some had severe manifestations of typical ocular GVHD, such as fluorescein and rose bengal staining scores of 6 points and a Schirmer test of 0 mm. Interestingly, one of these 10 patients later developed systemic cGVHD in the skin. As our study is retrospective, it is possible that some other patients developed cGVHD initially in the eyes and then developed systemic cGVHD soon afterwards, before their next ophthalmic examination, and so were included in the group of systemic cGVHD patients with ocular GVHD. Therefore, prospective research is needed to clarify the development of this disease.

It should be noted that the NIH consensus criteria, which require the involvement of an additional organ system, did not diagnose these 10 cases as cGVHD. Ocular GVHD can, however, alert clinicians to the possible presence of cGVHD in other organs36. Importantly, when ocular GVHD is present, but cGVHD manifestations do not occur in other organs, the NIH consensus criteria could result in a failure to diagnose patients with cGVHD, which could delay appropriate treatment.

There were some other interesting discrepancies among the scores obtained using the three grading scales. In our study, 42 patients (51.2% of all patients) were diagnosed with moderate ocular GVHD (score 2) by the NIH eye score. In other words, 75.0% of the 56 patients diagnosed with ocular GVHD by the NIH eye score were deemed to have moderate ocular GVHD (score 2). This percentage of moderate ocular GVHD patients is similar to the rate reported by Arai et al.27. On the other hand, we found that the ocular GVHD patients were more evenly divided into different groups of severity when assessed by the Japanese dry eye score and DEWS 2007 score than by the NIH eye score. In the NIH eye scoring system, the severity of ocular GVHD is assessed by the frequency of eye drop usage. For example, patients who need to use eye drops > 3 times per day or who need punctal plugs are diagnosed with moderate ocular GVHD (score 2). However, ophthalmologists instruct most of their ocular GVHD patients to use eye drops > 3 times per day. Hence, patients who have mild (score 1) or severe (score 3) ocular GVHD and use eye drops > 3 times per day could be included in the moderate ocular GVHD (score 2) group using the NIH eye score.

Some of the 42 cases diagnosed as moderate ocular GVHD by the NIH eye score were diagnosed as mild or severe cases by the Japanese dry eye score or the DEWS 2007 score. Notably, cases diagnosed as moderate or severe ocular GVHD by the NIH eye score are treated similarly23. Therefore, a case diagnosed as mild ocular GVHD by the other two methods but as moderate by the NIH eye score might be treated like a severe case. This could result in overtreatment, increasing the risk of cataract, glaucoma, corneal ulcer, or infection, induced by corticosteroid eye drop use. On the other hand, of the cases diagnosed as moderate ocular GVHD by the NIH eye score, 12 (14.6% of all the patients) were diagnosed as severe by the Japanese dry eye score. Since moderate and severe ocular GVHD by the NIH consensus criteria are treated similarly, the treatment in these cases would not necessarily differ as a result of their classification. However, an accurate evaluation of the severity of ocular GVHD is still important for transplantation teams, including hematologists and ophthalmologists29,37, since it affects the patient's health and prognosis. Thus, there were some interesting discrepancies in the assessment of ocular GVHD severity using the three grading scales, which indicate that inadequate assessments could lead to inappropriate treatment.

Jacob et al. reported the false positive and false negative rates of diagnosing ocular GVHD by Schirmer test without nasal stimulation to be 19.4% and 36.4%, respectively17,38. In addition, the presence of dry eye may not always be due to ocular GVHD; other causes of ocular surface damage include infectious keratitis induced by the use of immunosuppression, the use of anti-glaucoma eye drops to treat corticosteroid-induced ocular hypertension, side effects of medications used to treat organ systems other than the eyes, and conditioning treatments including total body irradiation5. Therefore, a comprehensive ocular evaluation is recommended rather than screening with the Schirmer test alone to establish a diagnosis of ocular GVHD. It is important to include other ocular evaluations in the NIH eye score.

Meibomian gland dysfunction (MGD) is one of the most frequent complications of ocular cGVHD17,32 and it is recognized more frequently in HSCT patients with cGVHD (63.0%) than in those without it (23.5%). Atrophic meibomian glands and excessive fibrosis are observed by in vivo laser confocal microscopy in cGVHD patients with severe dry eye39. The extent and severity of MGD are worse in patients with immune disorders such as Sjögren's syndrome and cGVHD. In cGVHD in particular, inflammatory cell infiltration and excessive fibrosis are observed around the meibomian glands, similar to the findings in the cGVHD lacrimal gland. Therefore, closer attention should be paid to the frequency and severity of MGD and to the relationship between dry eye and the meibomian glands.

In this study, we found that the diagnostic criteria in each of the three grading scales were similarly useful for obtaining a diagnosis of definitive ocular GVHD. However, our findings were insufficient to define a standard evaluation system for ocular GVHD that can be used by both ophthalmologists and internists. We recommend that a prospective study be performed before and after HSCT, if possible. A common system for evaluating ocular GVHD worldwide should be easy to perform, reliable and familiar to both ophthalmologists and internists. Therefore, we recommend using the parameters proposed by ophthalmologists at the chronic ocular GVHD consensus meeting, which include the OSDI, corneal fluorescein staining score, Schirmer test value and conjunctival injection, in future studies40. In this context, it is worth noting that Alvis reported that the best combination of tests to diagnose dry eye with high sensitivity, specificity and accuracy is the OSDI, tear film break-up time and Schirmer test31.

Multi-center validation research by ophthalmologists is needed. At present, the chronic ocular GVHD consensus meeting of ophthalmologists is working to improve the sensitivity and specificity of the diagnosis, and the treatment, of this serious disease40.

Methods

Patients and methods

Informed consent was obtained from all the participants and institutional ethics review board approval was obtained at Keio University School of Medicine (#2012-541). This study followed the tenets of the Declaration of Helsinki. Ethical guidelines for clinical studies from the Japanese Ministry of Health, Labor and Welfare indicate that, for studies which do not involve biological tissue and which involve reviewing medical records retrospectively, researchers do not need to obtain written informed consent from patients. Following the guidelines of the ethics committees, we posted a detailed written guideline and ethical statement of the present study in our outpatient clinic of ophthalmology. The notice included the following items: background of the study, purpose of the study, study design, privacy policy, freedom to withdraw, inclusion and exclusion criteria, the factors assessed in the medical records, advantages and disadvantages of participating in the study, disclosure of the data, presentation of the data at a conference or in a journal and contact information.

Patients who received bone marrow transplantation or peripheral blood stem cell transplantation at the Division of Hematology, Keio University School of Medicine, Tokyo, Japan between April 2004 and January 2010 were eligible for this study. Patients 20 years or older who underwent HSCT for the first time and survived 100 days after the transplant and those with sustained donor engraftment were included. The severity of ocular GVHD is affected by multiple factors, including donor-recipient gender difference and the stem cell source41,42. In this study, we included patients with ocular GVHD and excluded any patients with other risk factors for dry eye, so that the severity of ocular GVHD could be compared under standardized conditions. In particular, patients with a history of Sjögren's syndrome, rheumatic disease, diabetes mellitus, Graves' disease, other systemic or ocular diseases including glaucoma, a history of ocular surgery including LASIK, contact lens use, or drug use including psychotropic drugs, were excluded. Patients who had dry eye before HSCT were also excluded. Patients who underwent cord blood stem cell transplantation were excluded. In total, the records of 82 HSCT patients (38 women and 44 men) with a median age of 45.5 years (range, 19–61) were reviewed retrospectively.

All of the HSCT patients underwent standardized clinical and ophthalmological evaluations at the dry eye clinic at Keio University School of Medicine before HSCT and 3, 6, 9, 12, 18 and 24 months after HSCT, as arranged by the Keio BMT program transplant internist. The Keio BMT program was begun in 1994 to establish a collaboration between internists and ophthalmologists17. We used the data obtained approximately 6–9 months after HSCT to assess the severity of the ocular GVHD. We also assessed the degree of ocular GVHD 3–6 months after the onset of dry eye, for up to 24 months, to categorize the patients as no change, improved, or progressive. We previously found that severe dry eye with diminished reflex tearing appeared 3–6 months after the onset of dry eye17. A progressive case was defined as the worsening of objective ocular signs and/or the requirement for additional therapy. Some progressive cases underwent additional examinations every week.

Tear function examinations and ocular surface vital staining

To compare patients using the three grading scales, we used the values for tear function and ocular surface vital staining abnormality based on the Japanese diagnostic criteria, as reported previously24,43. For ocular surface double staining, 2 μl of a preservative-free mixture of 1% rose bengal and 1% fluorescein was instilled into the conjunctival sac with a micropipette, as reported previously44. To determine the tear film breakup time, the interval between the last complete blink and the appearance of the first corneal black spot in the stained tear film was measured three times and the mean value was calculated. The Schirmer I test was performed without topical anesthesia using standardized strips of filter paper (Alcon Inc., Fort Worth, TX, USA). Readings were reported in millimeters of wetting for 5 min.

Grading scales

NIH eye score

In the NIH consensus, the ocular criteria for diagnosis are defined as follows: “new ocular sicca documented by low Schirmer test values with a mean value of both eyes ≤ 5 mm at 5 minutes or a new onset of keratoconjunctivitis sicca by slit-lamp examination with mean values of 6 to 10 mm on the Schirmer test is sufficient for the diagnosis of chronic GVHD if accompanied by distinctive manifestations in at least 1 other organ.” The NIH eye score has a range of 0–3 (0 = non dry eye, 1 = mild dry eye, 2 = moderate dry eye, 3 = severe dry eye) (Appendix 1; Supplementary information)23.
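
The quoted ocular criterion can be expressed as a simple decision rule. The following Python sketch is illustrative only; the function and parameter names are hypothetical and are not part of the NIH consensus document. It implements the quotation literally, so a mean Schirmer value between 5 and 6 mm satisfies neither branch.

```python
def meets_nih_ocular_criterion(schirmer_right_mm, schirmer_left_mm,
                               new_kcs_on_slit_lamp,
                               distinctive_sign_in_other_organ):
    """Literal reading of the quoted NIH ocular criterion (hypothetical helper).

    schirmer_*_mm: Schirmer test wetting at 5 minutes for each eye.
    new_kcs_on_slit_lamp: new-onset keratoconjunctivitis sicca on slit-lamp exam.
    distinctive_sign_in_other_organ: distinctive manifestation in >= 1 other organ.
    """
    mean_schirmer = (schirmer_right_mm + schirmer_left_mm) / 2

    # Branch 1: new ocular sicca with mean Schirmer value of both eyes <= 5 mm.
    low_schirmer = mean_schirmer <= 5

    # Branch 2: new-onset KCS on slit lamp with mean Schirmer value of 6 to 10 mm.
    kcs_with_borderline_schirmer = new_kcs_on_slit_lamp and 6 <= mean_schirmer <= 10

    # Either branch is sufficient only if another organ shows distinctive signs.
    return (low_schirmer or kcs_with_borderline_schirmer) and distinctive_sign_in_other_organ
```

Under this reading, a patient with isolated ocular findings returns False regardless of the Schirmer value, which is the limitation of the NIH consensus criteria discussed in this paper.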

Japanese dry eye score

The Japanese dry eye criteria for diagnosis are: (1) disturbance of the tear film (Schirmer test ≤ 5 mm or tear film breakup time ≤ 5 seconds); (2) conjunctivocorneal epithelial damage (fluorescein staining score ≥ 3 points or rose bengal staining score ≥ 3 points); and (3) dry eye symptomatology. The presence of all three criteria is necessary for a diagnosis of definite dry eye disease (Appendix 2; Supplementary information)24. The severity of the dry eye with this system is scored with a range of 0–2 (0 = non dry eye, 1 = mild dry eye, 2 = severe dry eye). A score of 0 indicates non dry eye presenting no manifestations/symptoms, a score of 1 indicates symptoms, Schirmer test ≤ 5 mm, fluorescein score < 3 points and rose bengal score < 3 points; and a score of 2 indicates symptoms, Schirmer test ≤5 mm, fluorescein score ≥ 3 points and rose bengal score ≥ 3 points17,45.
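
The severity rules above can be sketched as a small classifier. This Python example is a minimal illustration with hypothetical names, not the scoring software used in this study. Two assumptions are made: score 0 is assigned only when symptoms and all abnormal findings are absent, and any combination not described in the text (for example, a fluorescein score ≥ 3 with a rose bengal score < 3) is returned as None rather than forced into a grade.

```python
from typing import Optional


def japanese_dry_eye_score(symptoms: bool, schirmer_mm: float,
                           fluorescein_score: int,
                           rose_bengal_score: int) -> Optional[int]:
    """Return 0, 1 or 2 following the severity rules stated in the text.

    Returns None for combinations the text does not describe (assumption).
    """
    low_tear_volume = schirmer_mm <= 5
    both_stains_high = fluorescein_score >= 3 and rose_bengal_score >= 3
    both_stains_low = fluorescein_score < 3 and rose_bengal_score < 3

    # Score 2: symptoms, Schirmer <= 5 mm, both staining scores >= 3 points.
    if symptoms and low_tear_volume and both_stains_high:
        return 2
    # Score 1: symptoms, Schirmer <= 5 mm, both staining scores < 3 points.
    if symptoms and low_tear_volume and both_stains_low:
        return 1
    # Score 0 (assumed definition): no symptoms and no abnormal findings.
    if not symptoms and not low_tear_volume and both_stains_low:
        return 0
    return None
```

For example, a symptomatic patient with a Schirmer value of 3 mm and staining scores of 4 and 5 points would be graded as score 2 under these rules.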

DEWS 2007 score

The DEWS 2007 score has a score range of 0–4 (0 = non dry eye, 1 = mild dry eye, 2 = moderate dry eye, 3 = severe dry eye, 4 = very severe dry eye) and is determined from 9 parameters, including symptoms, Schirmer test score, tear film breakup time and abnormalities in the conjunctiva, cornea, tear, lid and meibomian glands (Appendix 3; Supplementary information)7.

Since each grading scale has a different score range, we evaluated the reduction in reflex tearing by conducting Schirmer tests with nasal stimulation. We defined severe dry eye as reduced reflex tearing (Schirmer test with nasal stimulation < 10 mm), as Japanese dry eye score 2, or as DEWS 2007 scores 3 and 4. Therefore, there was overlap in the patients with Schirmer test with nasal stimulation ≥ 10 mm, between those with NIH score 2 and Japanese dry eye score 1; and in the patients with Schirmer test with nasal stimulation < 10 mm, between the patients with NIH score 2 and Japanese dry eye score 2.

Statistical analyses

Descriptive statistics, including the median and range for continuous variables and the frequencies and percentages of categorical variables, were calculated in assessing the demographic and clinical characteristics of ocular GVHD patients. Agreement of binary diagnoses between pairs of the three grading scales (NIH eye score, Japanese dry eye score and DEWS 2007 score) was evaluated with the Kappa coefficient, and its 95% confidence interval (95% CI) was estimated.
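
For readers unfamiliar with the Kappa coefficient, the following Python sketch shows how it can be computed from a 2 × 2 table of binary diagnoses made by two scales. The function name and the counts in the usage example are hypothetical and are not data from this study. The confidence interval uses a simple large-sample approximation of the standard error, which may differ from the method implemented in the statistical packages used here.

```python
import math


def cohen_kappa_2x2(both_pos, a_only, b_only, both_neg):
    """Cohen's kappa for two binary raters, with an approximate 95% CI.

    both_pos: diagnosed by scale A and scale B
    a_only:   diagnosed by scale A only
    b_only:   diagnosed by scale B only
    both_neg: diagnosed by neither scale
    Raises ZeroDivisionError if chance agreement equals 1.
    """
    n = both_pos + a_only + b_only + both_neg

    # Observed agreement: proportion of patients on the diagonal.
    p_observed = (both_pos + both_neg) / n

    # Chance agreement from the marginal proportions of each scale.
    a_pos = (both_pos + a_only) / n
    b_pos = (both_pos + b_only) / n
    p_expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)

    kappa = (p_observed - p_expected) / (1 - p_expected)

    # Simple large-sample standard error (an approximation).
    se = math.sqrt(p_observed * (1 - p_observed) / (n * (1 - p_expected) ** 2))
    ci = (kappa - 1.96 * se, min(1.0, kappa + 1.96 * se))
    return kappa, ci


# Hypothetical counts for illustration only.
kappa, ci = cohen_kappa_2x2(both_pos=20, a_only=5, b_only=10, both_neg=15)
print(f"kappa = {kappa:.2f}, 95% CI: {ci[0]:.2f} to {ci[1]:.2f}")
```

In this toy table the observed agreement is 0.70 and the chance agreement is 0.50, giving a Kappa of 0.40.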

The linear trend in an event proportion across a factor was tested by the exact Cochran-Armitage trend test. All tests were two-sided and a p-value < 0.05 was considered statistically significant. All data were analyzed with SPSS version 19.0 (SPSS Inc., Chicago, IL, USA) and SAS version 9.3 (SAS Institute Inc., Cary, NC, USA).
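
The logic of the trend test can be illustrated with a short Python sketch. Note that this example uses the asymptotic (normal approximation) form of the Cochran-Armitage statistic, not the exact test applied in this study, so p-values may differ for small samples. The function name, the default equally spaced scores and the counts in the usage example are assumptions made for illustration.

```python
import math


def cochran_armitage_trend(events, totals, scores=None):
    """Asymptotic Cochran-Armitage test for a linear trend in proportions.

    events: number of events (e.g. progressive cases) in each ordered group
    totals: number of patients in each ordered group
    scores: numeric score for each group (default 0, 1, 2, ...)
    Returns (z, two-sided p-value). Fails if the overall event proportion
    is 0 or 1, because the variance is then zero.
    """
    if scores is None:
        scores = list(range(len(totals)))

    n_all = sum(totals)
    p_bar = sum(events) / n_all  # pooled event proportion

    # Score-weighted deviation of observed from expected event counts.
    t_stat = sum(s * (r - n * p_bar) for s, r, n in zip(scores, events, totals))

    # Variance of the statistic under the null hypothesis of no trend.
    sum_ns2 = sum(n * s * s for s, n in zip(scores, totals))
    sum_ns = sum(n * s for s, n in zip(scores, totals))
    var_t = p_bar * (1 - p_bar) * (sum_ns2 - sum_ns ** 2 / n_all)

    z = t_stat / math.sqrt(var_t)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


# Hypothetical counts: events rise from 1/10 to 5/10 to 9/10 across grades.
z, p = cochran_armitage_trend(events=[1, 5, 9], totals=[10, 10, 10])
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

When the event proportion is identical in every group, the statistic is zero and the two-sided p-value is 1, as expected under the null hypothesis of no trend.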