‘Much of what medical researchers conclude in their studies is misleading, exaggerated, or flat-out wrong.’ That was the headline in an article titled ‘Lies, Damned Lies and Medical Science’ published in the November 2010 issue of the Atlantic Monthly.1 The story featured the contributions of Dr John Ioannidis, whose essay ‘Why most published research findings are false’2 deserves more attention among investigators involved in studies of chronic GVHD.
The modern era of allogeneic hematopoietic cell transplantation (HCT) began 40 years ago. Chronic GVHD was initially recognized as a major complication of HCT after survival beyond the first 3 months improved from ∼10% during the early 1970s to more than 50% at the end of the 1970s. Only four randomized trials were carried out to evaluate treatment for chronic GVHD between 1980 and 2004, when interest in the field was spurred by the National Institutes of Health Consensus Conference on Criteria for Clinical Trials in chronic GVHD. Clinicaltrials.gov now lists six phase III trials for treatment of chronic GVHD. Despite this progress, clinicians and investigators still have much to learn.
In 2009, a Consensus Conference on Clinical Practice in chronic GVHD was held in Regensburg, Germany, with the goal of summarizing the currently available evidence for diagnosis, immunosuppressive treatment and supportive management of chronic GVHD. The findings and recommendations regarding the diagnosis and treatment of pulmonary chronic GVHD are reported in the current issue. The Conference report by Hildebrandt et al.3 provides an excellent summary of literature describing symptoms and clinical manifestations of chronic GVHD involving the lungs, together with pulmonary function abnormalities, radiographic findings and pathologic findings that define the diagnosis. The remainder of this report, however, should be read with a critical eye.
Chronic GVHD of the lung, also known as bronchiolitis obliterans syndrome (BOS), is a rare complication of allogeneic HCT. As with most rare syndromes, clinical management of BOS has been based largely upon results from small to medium-sized case series reports in the literature, as the few large registry-based studies provide little insight into how this disease should be treated. The Conference report includes detailed algorithms intended to summarize guidelines for the diagnosis and management of BOS. Although such guidelines are useful, it should be recognized that most of the recommendations are based upon expert opinion, not data. We are concerned that these guidelines might not enable physicians to diagnose BOS effectively and efficiently. For example, we question whether a >5% annualized decrease in the forced expiratory volume in 1 s (FEV1) provides sufficient justification for an extensive medical evaluation. We have observed that ∼40% of patients have >5% annualized decline in FEV1 at day 100 (ref. 4), and 25% have >5% annualized decline in FEV1 at 1 year,5 whereas the prevalence of BOS has been estimated at ∼5.5%.6 We have also observed that 18% of patients have an FEV1 <80% with an FEV1/forced vital capacity ratio <0.7 before allogeneic HCT.7 The same study showed that many of these patients did not have clinically meaningful changes in lung function after HCT.
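The arithmetic behind this concern can be sketched as follows. As a simplifying best-case assumption that is not taken from the cited studies, suppose that every patient with BOS exceeds the >5% annualized decline threshold. The proportion of flagged patients who actually have BOS then cannot exceed the prevalence divided by the proportion flagged. The function name is hypothetical, and the percentages come from different reports, so the result is only a rough bound.

```python
def max_positive_predictive_value(prevalence, fraction_flagged):
    """Upper bound on the share of flagged patients who have the disease.

    Assumes (best case, an illustrative assumption) that every true case
    is flagged, so true positives equal the prevalence.
    """
    if not 0 < fraction_flagged <= 1:
        raise ValueError("fraction_flagged must be in (0, 1]")
    if not 0 <= prevalence <= 1:
        raise ValueError("prevalence must be in [0, 1]")
    return min(1.0, prevalence / fraction_flagged)


# Figures quoted in the text above.
bos_prevalence = 0.055   # estimated prevalence of BOS
flagged_day_100 = 0.40   # patients with >5% annualized FEV1 decline at day 100
flagged_1_year = 0.25    # patients with >5% annualized FEV1 decline at 1 year

ppv_day_100 = max_positive_predictive_value(bos_prevalence, flagged_day_100)
ppv_1_year = max_positive_predictive_value(bos_prevalence, flagged_1_year)

print(f"Upper bound at day 100: {ppv_day_100:.3f}")
print(f"Upper bound at 1 year:  {ppv_1_year:.3f}")
```

Even under this best-case assumption, no more than about 14% of patients flagged at day 100, and no more than 22% of those flagged at 1 year, could have BOS. The large majority of extensive evaluations triggered by this threshold would therefore be carried out in patients who do not have the disease.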
A few general principles deserve emphasis and would help to simplify the overall approach in evaluating pulmonary problems after HCT. All patients should have pulmonary function tests (PFTs) at baseline before HCT, and testing should be repeated at periodic intervals after HCT, especially among patients with chronic GVHD in other organ systems. We have suggested quarterly monitoring of the FEV1 by spirometry during the first year after HCT.4 More frequent monitoring of the FEV1 should be started if results suggest airflow decline, and further evaluation to assess airflow obstruction by pulmonary function testing is indicated if new impairment persists or progresses in the absence of infection. Evaluation of symptomatic patients should begin with physical exam and chest X-ray. If an infiltrate is detected, then further evaluation by conventional computed tomography scanning and bronchoalveolar lavage will probably be necessary. An unresolved issue requiring further study in this context is whether symptomatic patients should have bronchoalveolar lavage even in the absence of an infiltrate.
If serial PFTs show a clinically meaningful evolving obstructive pattern, a high resolution computed tomography scan can be used to help confirm the diagnosis of BOS by revealing air-trapping or other small airway features of BOS, such as bronchial wall thickening and bronchiectasis.8 Consensus regarding the definition of clinically meaningful change has not been reached. We and others have suggested that a 10% decrease in the FEV1 generally deserves further investigation, even though treatment might not be indicated.9 Decisions regarding the need for invasive procedures such as video-assisted thoracoscopic biopsy or open lung biopsy should be made according to the associated risks and the potential utility of the results in subsequent clinical management. As HCT already incurs significant financial costs, the evaluation should begin with less expensive and less invasive tests, such as spirometry, escalating to more expensive and riskier approaches such as bronchoscopy and lung biopsy only when an appropriate clinical indication exists and only when the knowledge gained will truly affect the treatment plan.
The portion of the Conference report that might attract the most attention is the summary of information regarding treatment of chronic GVHD involving the lung, together with a table indicating the strength of recommendations and the quality of evidence supporting various treatment approaches. The table makes it clear that the quality of evidence supporting clinical practices in the management of chronic GVHD involving the lungs is generally poor. No randomized trials have ever been conducted specifically to evaluate treatment for this disease, although some studies have documented objective improvement by comparing pulmonary function before and after treatment. All but one of the treatments reviewed by the conference had evidence ratings of III, indicating support from opinions of respected authorities based on clinical experience, descriptive studies or reports from expert committees. In many cases, the strength of the recommendations exceeds the quality of the supportive evidence.
The Regensburg Conference endorsed first-line systemic treatment with glucocorticoids for all patients, with the recommendation that treatment with a topically active inhaled steroid should generally be offered as well. Systemic steroid treatment is supported by grade II evidence from well-designed clinical trials without randomization, from cohort or case–controlled analytic studies or well-documented longitudinal studies. Topical steroid treatment is supported by two retrospective studies. The study by Bergeron et al.10 showed clinically significant improvement in the FEV1 after starting treatment with inhaled budesonide and formoterol in all seven patients who developed new fixed obstructive defects in lung function after HCT. Statistically significant improvement was demonstrated with the use of sophisticated mixed-effects models that accounted for intra-patient correlation of serial measures. The initial response after starting topical treatment was sustained in all but one patient. Despite the retrospective nature of the study, several features enhanced the credibility of the findings. The study was led by pulmonologists who clearly have a comprehensive understanding of lung disorders after HCT, and the diagnosis of obstructive airway disease was thoroughly characterized and documented. The study focused on patients who had no extra-thoracic signs of chronic GVHD and therefore did not require any change in immunosuppressive treatment, and the topical pulmonary treatment was identical for all patients.
The retrospective study by Bashoura et al.11 evaluated results of treatment with high-dose fluticasone in 17 patients with constrictive bronchiolitis after HCT. In all patients, the FEV1 declined during the interval from the evaluation before HCT to the evaluation before starting treatment. The FEV1 showed stabilization or improvement after 3–6 months of treatment in all but one of the patients. On the basis of the Wilcoxon's signed rank test, the authors concluded that the results showed a trend for improvement after treatment with fluticasone (P=0.057). The report did not describe criteria for the diagnosis of constrictive bronchiolitis. Treatment with fluticasone was not standardized, and the analysis did not account for the possible effects of any changes in concomitant treatment with other agents. The claim that lung function stabilized cannot be taken at face value, because the report did not show the shorter-term trajectory of lung function changes during the months immediately before beginning treatment with fluticasone. The improvement in FEV1 documented in the study by Bergeron et al.10 contrasts with the less striking improvement in the study by Bashoura et al.11 and suggests that the bronchodilator in the regimen used by Bergeron et al.10 had an essential role in the observed improvement.
The Regensburg Conference also endorsed extracorporeal photopheresis (ECP) and azithromycin for consideration in first-line treatment of chronic GVHD involving the lungs. Three publications were cited as specific support in considering ECP for first-line treatment of chronic GVHD.12, 13, 14 In our view, the evidence from these publications does not support this endorsement. One of the three studies was a single-arm prospective clinical trial evaluating the safety and efficacy of ECP for treatment of chronic GVHD and was not limited to patients with pulmonary involvement.12 Five of the patients enrolled in this study had documented restrictive pulmonary disease. Two of these patients had improvement in diffusion capacity without improvement in the restrictive defect. Two had no improvement, and in one additional patient with mild pulmonary involvement, lung function tests were not repeated. The other two studies were retrospective reviews. In one study,13 responses were observed in six of 11 patients with chronic GVHD involving the lung, but criteria for definition of pulmonary involvement and response were not provided. In addition, the report did not describe time to response or duration of response and did not account for possible effects of any changes in concomitant treatment with other agents.
The other study cited as supporting ECP focused specifically on patients with pulmonary involvement manifested as BOS defined according to NIH criteria.14 This report summarized percent predicted FEV1 and carbon monoxide diffusion capacity (DLCO) adjusted for hematocrit in nine patients on three occasions: before HCT, before ECP and at last follow-up. Comparisons of results before HCT and before ECP showed marked deterioration in FEV1 or DLCO or both between the two assessments. The authors suggested that six of the nine patients had stabilized lung function after ECP. The data reported in the study indicate a positive 1.3% average difference in the paired values of FEV1 (P=0.71, paired t-test) and a positive 6.6% average difference in the paired values of DLCO (P=0.01). Like the earlier study of fluticasone,11 the claim that lung function stabilized cannot be taken at face value. The annualized change in FEV1 from the onset of ECP to the end of follow-up should have been compared with the annualized change in FEV1 from the end of the previous treatment for BOS to the onset of ECP. The authors appropriately noted that the results were limited by the retrospective design of the study, the small sample size and short follow-up.
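The comparison of annualized changes suggested above can be illustrated with a minimal sketch. All values below are hypothetical and are not taken from the cited report; the function and variable names are likewise illustrative. FEV1 is expressed as percent of predicted, so the rates are in percentage points per year.

```python
DAYS_PER_YEAR = 365.25


def annualized_change(value_start, value_end, days_elapsed):
    """Change per year between two measurements, in the measurement's units."""
    if days_elapsed <= 0:
        raise ValueError("days_elapsed must be positive")
    return (value_end - value_start) * DAYS_PER_YEAR / days_elapsed


# Hypothetical patient (illustrative values only).
fev1_end_of_prior_treatment = 70.0  # % predicted, end of previous BOS treatment
fev1_at_ecp_start = 60.0            # % predicted, onset of ECP
fev1_at_last_follow_up = 58.0       # % predicted, end of follow-up
days_before_ecp = 120               # end of previous treatment to onset of ECP
days_on_ecp = 240                   # onset of ECP to end of follow-up

rate_before = annualized_change(
    fev1_end_of_prior_treatment, fev1_at_ecp_start, days_before_ecp
)
rate_during = annualized_change(
    fev1_at_ecp_start, fev1_at_last_follow_up, days_on_ecp
)

print(f"Annualized change before ECP: {rate_before:.1f} points/year")
print(f"Annualized change during ECP: {rate_during:.1f} points/year")
print(f"Difference (during - before): {rate_during - rate_before:.1f} points/year")
```

A claim of stabilization is informative only when the rate during treatment is clearly less negative than the rate immediately before treatment. A comparison anchored at the pre-HCT value cannot show this, because most of the decline may have occurred long before treatment began.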
In the abstract, the authors concluded that their results support the need for a larger prospective study to ‘confirm’ the impact of ECP on BOS. In the Discussion, they suggested that their findings should be ‘validated’ in larger prospective studies. This terminology reveals a bias that the use of ECP already has demonstrable value in the treatment of BOS. Such bias encourages physicians faced with decisions regarding treatment of BOS to adopt the presumption that ECP is effective, when in fact, this costly and time-consuming treatment might not produce any benefit for this indication. In addition, the presumption of efficacy discourages the impetus to undertake the difficult and expensive task of conducting a properly designed multi-center prospective study to address the question in an unbiased and scientifically rigorous way.
Evidence supporting the use of azithromycin in the treatment of BOS is stronger than acknowledged by the Conference report. The high-quality prospective study by Khalid et al.15 was focused specifically on patients with BOS and enrolled eight patients. The protocol pre-specified a 0.2 L increase in FEV1 and a 12% increase in forced vital capacity from the baseline evaluation to a subsequent evaluation at 12 weeks as clinically significant. According to these criteria, five of the eight patients had improvement in the FEV1, and six of the eight had improvement in forced vital capacity. Paired t-tests showed that the improvement was highly statistically significant for both measures. The patients who participated in this study did not have bronchoalveolar lavage to rule out infection before enrollment in the study, and the authors appropriately noted that the improvement could have resulted from anti-microbial effects as opposed to anti-inflammatory effects. The authors highlighted their results as ‘preliminary’ in the title and showed admirable restraint in concluding that the potential role of azithromycin in the treatment of BOS is intriguing and warrants further testing.
The single study cited by the Conference report as support for considering the use of azithromycin in first-line treatment of pulmonary GVHD15 differs strikingly from the three studies cited as support for considering the use of ECP for the same indication.12, 13, 14 The high-quality evidence from the azithromycin study far outweighs the ostensible numerical advantage of the much less persuasive ECP studies. In evaluating the literature, the strength of the evidence is not necessarily related to the number of publications recommending any given treatment. Reliable conclusions regarding the strength of the evidence can be deduced only by carefully evaluating the magnitude of benefit consistently observed in high-quality studies.
The Conference report appropriately endorsed thoraco-abdominal irradiation, infliximab, etanercept and topical cyclosporine only for experimental testing, reflecting concern about side effects or lack of experience. On the other hand, the endorsement of mTOR inhibitors, calcineurin inhibitors, mycophenolate mofetil, imatinib, pulsed high-dose steroids and montelukast as justified for use in second-line treatment of pulmonary chronic GVHD could be questioned because they are all supported only by reports from retrospective studies or small, uncontrolled clinical trials of uncertain quality. Clinicians feel an overwhelming imperative to ‘do something’ when faced with progressive deterioration in a patient with pulmonary GVHD, and the treatment choice is guided primarily by safety considerations under the aphorism, ‘Above all, do no harm.’ In fact, all such treatments are case-by-case experiments, as the existing literature cannot offer confident expectations of efficacy.
Several factors might account for the large number of publications and their generally questionable value regarding the treatment of pulmonary GVHD. Without effective treatment, pulmonary GVHD is typically characterized by a progressive course. Clinicians understandably resort to testing any available agent that might plausibly offer efficacy, but such empirical testing is unguided by any specific insight into the pathophysiology leading to the disease. Pressure to report results of inconclusive case series and retrospective studies reflects the mantra ‘Publish or Perish’ often used to describe academic life. In the absence of higher quality studies with more rigorous standards of evidence and interpretation, however, the mantra could become ‘Publish and Perish.’ Patients will perish because truly effective therapies have not been distinguished from those that are ineffective. Investigators will perish for lack of success by any true measure, and the field will perish from lack of credibility.
In assessing various treatments, the Regensburg conferees relied too heavily on expert opinion, an approach that has been shown to be extremely unreliable.16 An example of such thinking is the suggestion that topical treatment with a long-acting bronchodilator plus an inhaled glucocorticoid with or without montelukast should be used to prevent BOS before the diagnosis is established. This suggestion is based on the idea that early treatment with anti-inflammatory agents might help to prevent the development of irreversible fibrotic or other structural changes. The conferees should have emphasized that this recommendation is highly premature for clinical application, but might be appropriate for clinical trials, because scientific evidence to support the efficacy and safety of this approach does not yet exist. The recommendation has seductive appeal for clinicians because the risk of harm is very low, but if this recommendation is widely adopted, equipoise will vanish and the opportunity to determine the efficacy of this approach in appropriately designed clinical trials will forever be lost.
The Conference served a useful purpose by engaging investigators in a dialog to improve standards for diagnosis of pulmonary GVHD and for evaluation of results in clinical trials. Going forward, the most useful result of the Conference is the concluding suggestion that the field might be well served by a rigorous prospective trial comparing the role of ECP vs azithromycin added to first-line systemic steroid treatment for patients with BOS diagnosed according to a standardized definition. Other suggestions related to treatment of pulmonary chronic GVHD, however, only echo current mythology, biases and the wish for efficacy. Myths can contain elements of truth, and as noted by Ioannidis and his colleagues,17 wish bias does not necessarily mean that the defended beliefs are wrong. Without question, clinical trials for pulmonary chronic GVHD pose enormous challenges for clinical investigators. We should nonetheless have the courage to put unproven beliefs to the test, and we should also have the humility to recognize that underpowered, early-phase clinical trials infrequently yield true results that withstand the test of time.2
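The point about underpowered trials can be made concrete with the positive predictive value relation from the essay by Ioannidis,2 shown here in its simplest form without the bias term: PPV = (1 − β)R / ((1 − β)R + α), where R is the pre-study odds that the tested relationship is true, 1 − β is the power and α is the type I error rate. The sketch below is illustrative only; the pre-study odds of 1:4 and the power values are hypothetical assumptions and are not estimates for any trial in chronic GVHD.

```python
def positive_predictive_value(pre_study_odds, power, alpha=0.05):
    """Probability that a statistically significant finding is true.

    Simplest form of the relation (no bias term):
    PPV = power * R / (power * R + alpha)
    """
    if pre_study_odds <= 0:
        raise ValueError("pre_study_odds must be positive")
    if not 0 < power <= 1 or not 0 < alpha < 1:
        raise ValueError("power must be in (0, 1] and alpha in (0, 1)")
    true_positives = power * pre_study_odds
    return true_positives / (true_positives + alpha)


# Hypothetical scenario: one in five tested treatments truly works (odds 1:4).
assumed_odds = 0.25

ppv_well_powered = positive_predictive_value(assumed_odds, power=0.80)
ppv_underpowered = positive_predictive_value(assumed_odds, power=0.20)

print(f"PPV with 80% power: {ppv_well_powered:.2f}")
print(f"PPV with 20% power: {ppv_underpowered:.2f}")
```

Under these assumed inputs, a significant result from the well-powered study is true about four times in five, whereas a significant result from the underpowered study is no better than a coin toss, even before any allowance is made for bias.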
The Atlantic Monthly story concluded with the suggestion that scientists should recognize failures openly instead of disguising them as success.1 This behavior would make it possible to move expeditiously toward testing further ideas in succession until the very occasional breakthrough is reached. In the words of the author, ‘As long as careers remain contingent on producing a stream of research that's dressed up to seem more right than it is, scientists will keep delivering exactly that.’