Introduction

Mucopolysaccharidosis type II (MPSII), or Hunter syndrome, is an X-linked, progressive, multisystem disorder caused by deficiency of the lysosomal enzyme iduronate-2-sulfatase, encoded by the IDS gene.1 Deficiency of this enzyme results in accumulation of glycosaminoglycans (i.e., dermatan and heparan sulfates) in almost all cells, tissues, and organs, with resulting enlargement of cells and disruption of normal cell physiology.1, 2 Affected individuals are almost always male, but symptomatic carrier females have been reported.1, 2 MPSII has been identified in 0.3 to 1.3 per 100,000 live born males in mainly white populations3, 4, 5, 6 and about 2 per 100,000 in Taiwan.7

MPSII is characterized by significant clinical heterogeneity related to phenotype (i.e., continuum from severe to attenuated) and genotype (i.e., more than 450 unique IDS mutations).1, 8 Historically, management of MPSII has been supportive (e.g., surgery, respiratory support) or palliative. Hematopoietic stem cell transplantation (HSCT), previously used to treat other MPS disorders, was applied to MPSII in the 1980s.9 In 2006, the US Food and Drug Administration (FDA) approved intravenous (IV) infusion of idursulfase (Elaprase, Shire HGT, Lexington, MA) for enzyme replacement therapy (ERT) in confirmed MPSII patients 5 years and older.10 This exogenous enzyme preparation cannot cross the blood–brain barrier; thus, it cannot affect cognitive decline. Today, questions remain about the safety, efficacy, and appropriate use of both ERT2 and HSCT11 for the treatment of individuals with MPSII.

This report summarizes a pilot systematic evidence review (SER) commissioned by the American College of Medical Genetics and Genomics (ACMG) Foundation. This pilot study is the first step in a transition from expert opinion and consensus-based clinical recommendations to evidence-based recommendations, particularly important for rare diseases in order to build confidence in new interventions. Supporting evidence from the original SER is included in text and/or Supplementary Materials online.

The stated aims were threefold: (i) to evaluate the utility of standard SER methodologies for review of rare diseases and inform development of specific methods for ACMG; (ii) to identify and summarize available data on benefits and harms of selected interventions by outcome, to support the potential development of an evidence-based clinical guideline; and (iii) to identify gaps in knowledge that most affect decision making about these interventions. No evidence-based or consensus clinical guidelines have been developed in North America to address the safety, efficacy, and use of HSCT, ERT, or potential combination therapies for MPSII.

We report on three of four questions addressed in the original SER (Supplementary Table S1):

  • Key question 1 (overarching)—What interventions lead to improved clinical or patient-centered outcomes in children and adults diagnosed with MPSII?

  • Key question 2—What are the benefits and harms related to clinical and patient-centered outcomes of disease-specific treatments for MPSII, including ERT, HSCT, and combination therapies?

  • Key question 4—How do benefits and harms differ based on presymptomatic diagnosis, phenotype, or age? (Information was insufficient to assess key questions for females.)

Key question 3, not reported here, was a first look at potential benefits and harms of common surgical interventions. Strength of evidence was consistently low, as the impacts of surgeries on future health/quality of life (e.g., adenotonsillectomy on apnea/hypopnea index) were not addressed.

Materials and methods

For this pilot study, we conducted broad-based searches of published databases (MEDLINE, Cochrane) and gray literature (Google, websites) (Supplementary Table S2) through 29 September 2014, and updated through 31 December 2015. An inclusive approach was utilized due to low disease incidence, limited prior review of literature, and the need to assess the SER methods’ utility in this setting. Included were English-language articles/documents addressing one or more key questions, associated interventions, and outcomes. Study designs included randomized controlled trials (RCTs), nonrandomized trials, observational studies, registry data, SERs, and health technology assessments. Study subjects were males with enzymatically confirmed MPSII, of any age, phenotype, genotype, stage of progression, or family history. Interventions of interest were ERT and HSCT, with associated outcomes (Supplementary Table S3). Selected gray literature included FDA data assessments, existing clinical guidelines, and policy/coverage decision documents (Supplementary Table S4).

Two investigators (L.A.B. and H.R.M.H.) entered findings into a validated database (DistillerSR, Evidence Partners, Ottawa, ON, Canada), independently reviewed citations/abstracts from database and hand searches, and selected relevant full articles and documents for data extraction using preset criteria. Discrepancies were resolved through discussion or input from a third reviewer (G.E.P.). Quantitative data from individual studies were presented in tables and figures. For each outcome measure reported, we attempted to estimate effect size and direction, significance, and important covariates (e.g., age, phenotype, prior treatment). Effect-size estimates included percent change from study baseline to endpoint (weighted average, range), mean difference (95% confidence intervals), and P values. Composite outcomes were not addressed. When possible, pooled estimates were derived using a random effects model (Comprehensive Meta-Analysis, BioStat, Englewood, NJ).
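As a rough illustration of the pooling step, the sketch below implements a DerSimonian-Laird random-effects estimate. It is an assumption that this estimator resembles what the commercial software computed; the text states only that a random effects model was used. The function name and the input values are hypothetical and are not data from the included studies.

```python
import math

def dersimonian_laird(effects, variances):
    """Pool per-study effect estimates with a DerSimonian-Laird random-effects model.

    effects:   per-study effect estimates (e.g., mean differences)
    variances: per-study sampling variances (squared standard errors)
    Returns (pooled_effect, standard_error, tau_squared).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]                      # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q measures between-study heterogeneity
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0  # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]        # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2

# Hypothetical inputs for illustration only (not taken from the review)
effects = [-0.60, -0.45, -0.75]
variances = [0.010, 0.020, 0.015]
pooled, se, tau2 = dersimonian_laird(effects, variances)
low, high = pooled - 1.96 * se, pooled + 1.96 * se
print(f"pooled={pooled:.3f}, 95% CI {low:.3f} to {high:.3f}, tau^2={tau2:.4f}")
```

When the studies agree closely, the between-study variance is estimated as zero and the result reduces to the fixed-effect (inverse-variance) average.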

Quality (internal validity) of individual included studies (Supplementary Table S4) and the overall strength of evidence (Supplementary Table S5-A,B) for each outcome were individually evaluated by two reviewers (L.A.B. and G.E.P.).12 Criteria for quality assessment of findings from the gray literature (e.g., unpublished, web-based) were document type, source credibility, peer review level, potential for bias, generalizability, and dependability.12, 13, 14 We investigated all outcomes reported, but assessed their relative importance (critical, important, or of limited importance for decision making) to determine whether critical outcomes might be missing.15 Although a wide range of outcomes are important to patients and families, those evaluated primarily by clinicians and/or using medical records are referred to as “clinical,” and those with significant input from patients/family/caregivers as “patient-centered.”

Results

Database searches through 31 December 2015 identified 1,221 unique abstracts. A total of 356 articles and 27 gray-literature documents underwent full text review; 126 articles and 27 documents met inclusion criteria (Figure 1). These include two previous SERs of ERT studies.2, 16

Figure 1

Results of literature search. Flow diagram demonstrates the review and selection process for published articles and gray-literature documents identified by the original SER (through 31 December 2015). (a) The ERT box includes 25 studies, 2 SERs, 7 studies on MPSII prevalence and phenotypes, 4 on very early ERT, and 9 on other topics. The 11 gray-literature documents include the Shire package insert, 2 US Food and Drug Administration reports, and 8 gray documents included in Supplementary Table S10. (b) Genotype–phenotype association data will be reported in a separate paper in preparation. ERT, enzyme replacement therapy; HSCT, hematopoietic stem cell transplantation; SER, systematic evidence review.

Outcomes of ERT with IV idursulfase infusion

FDA approval of Elaprase was based on one industry-sponsored RCT.17 The literature search identified 24 additional studies having some quantitative data on 12 outcomes of ERT (Table 1). Studies variably enrolled patients with only attenuated MPSII,17, 18, 19, 20, 21, 22, 23, 24, 25 both phenotypes,23, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35 and only the severe phenotype.36, 37, 38 Only one study39 and the RCTs17, 20 had a control group; many results describe changes in outcome measures from study baseline to endpoints. Table 1 presents the characteristics for each study and specific outcomes addressed. Table 2 summarizes results. These results follow, ordered by strength of evidence (SOE), from highest to lowest.

Table 1 Characteristics of 25 clinical studies that provided data on 12 outcomes of interest in patients with MPSII who received intravenous infusions of ERT (Elaprase)
Table 2 Summary of results for outcomes in MPSII patients treated with ERT

Urinary glycosaminoglycan levels

Because of the high variability in absolute urinary glycosaminoglycan (uGAG) measurements between and within testing methods (Supplementary Figure S2), uGAG results from nine studies (330 patients) were standardized as a proportion of each group’s baseline level.17, 19, 20, 21, 22, 24, 28, 35, 36 This allowed visualization of relative changes in uGAG levels in treated patients (Figure 2). Results were consistent in effect size (percent uGAG reduction from baseline to 53 weeks or other endpoint) and direction, and demonstrated that:

  • The 0.5 mg/kg/week idursulfase dosage at 53 weeks in six studies17, 20, 21, 24, 28, 35 of 103 treatment-naive MPSII patients reduced uGAGs from baseline levels by a weighted average of 61% (range 43% to 80%), but was statistically significant in only two studies.17, 21

  • uGAG reductions were >68% in four studies (139 patients) reporting more than 1 year of treatment.19, 21, 28, 35

  • Both idursulfase doses (0.5 mg/kg/week or every other week) in the large RCT17 (64 cases, 32 controls) resulted in a significant uGAG reduction (P<0.0001), but the higher dose resulted in a larger reduction (P=0.039), demonstrating a dose response. Results were not significant in two other studies, but supported a positive relationship between dose and the extent of 1-year uGAG reduction.20, 22

  • uGAG reductions were not related to age (61 patients <5 years24, 28, 40 versus 10 adults21 or 145 patients with a range of ages)17, 35, 36 or phenotype (60 mainly severe patients28, 35, 36 versus 231 attenuated17, 19, 20, 21, 24).

  • The effect of treatment stabilized over time (1–3.5 years) in 3 studies (130 patients).19, 35, 36
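The standardization and weighted averaging described above can be sketched as follows. This is a minimal example that assumes the weighted average is weighted by study sample size, which the text does not specify. The function names and the (n, baseline, endpoint) values are hypothetical, not data from the nine studies.

```python
def relative_to_baseline(series):
    """Express each measurement as a percentage of the first (baseline) value."""
    baseline = series[0]
    return [100.0 * x / baseline for x in series]

def weighted_mean_reduction(studies):
    """Sample-size-weighted average percent reduction from baseline to endpoint.

    studies: iterable of (n_patients, baseline_level, endpoint_level) tuples.
    Units cancel, so studies using different assays can be combined.
    """
    total_n = sum(n for n, _, _ in studies)
    weighted = sum(n * 100.0 * (1.0 - end / base) for n, base, end in studies)
    return weighted / total_n

# Hypothetical studies: (n, baseline uGAG, endpoint uGAG); for illustration only
studies = [(10, 400.0, 160.0), (30, 250.0, 100.0), (20, 600.0, 180.0)]

print(relative_to_baseline([400.0, 200.0, 160.0]))
print(round(weighted_mean_reduction(studies), 1))
```

Because each group is scaled to its own baseline, absolute differences between testing methods drop out and only the relative change is compared.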

Figure 2

Relative uGAG levels reported in nine studies of MPSII patients by ERT dose and length of treatment. Relevant information for the included studies is summarized below the figure.17, 19, 20, 21, 22, 24, 28, 35, 36 The relative changes in uGAG measurements are shown on the vertical axis beginning at the baseline of 100% (initiation of treatment, i.e., a reduction of 0%). The x axis identifies the individual studies. Open circles (studies 1 and 2) represent patients receiving a placebo (controls). Thick lines indicate treatment with 0.5 mg/kg/week idursulfase, and thin lines represent alternative doses (lower or higher). Patients were treatment-naive in studies 1 to 8, but not 9. Relative uGAG reductions were observed by 3–4 months, ranging from 43% to 80% at 1 year at 0.5 mg/kg (studies 1–8). Study 9 compared two drugs in patients previously treated for 14 months with Elaprase (0.5 mg/kg idursulfase). After a 2-week “washout” period, subjects were assigned to one of three groups, continuing 0.5 mg/kg Elaprase or changing to 0.5 or 1.0 mg/kg Hunterase (idursulfase beta). (a) Percent of enrolled patients having an attenuated/mild MPSII phenotype; the percent with a severe phenotype can be derived. (b) FU (yrs) = duration of treatment/follow-up in years. ERT, enzyme replacement therapy; uGAG, urinary glycosaminoglycan.

This effect of ERT in the large RCT17 has also been reported as the mean difference of −207.4 (95% CI −284.8 to −129.9) between uGAG levels in treatment (0.5 mg/kg/week) and placebo arms at 1 year.2, 16, 17
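To show how a reported mean difference and confidence interval relate, the sketch below back-calculates the standard error from the interval quoted above. It assumes the 95% CI was built with a normal approximation (multiplier 1.96); the original analysis may have used a t-based interval, in which case the result would differ slightly. The helper name `se_from_ci` is hypothetical.

```python
def se_from_ci(lower, upper, z=1.96):
    """Recover the standard error from a symmetric confidence interval.

    Assumes the interval is estimate +/- z * SE (normal approximation).
    """
    return (upper - lower) / (2.0 * z)

# Mean difference and 95% CI as reported in the text
md, lower, upper = -207.4, -284.8, -129.9

se = se_from_ci(lower, upper)
z_stat = md / se  # how many standard errors the estimate lies from zero

print(f"SE ~ {se:.1f}, z ~ {z_stat:.2f}")
```

Under this assumption the interval excludes zero by a wide margin, consistent with the significant uGAG reduction reported for the RCT.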

The benefit of uGAG reduction was ranked as moderate SOE, based on an early effect in all studies (e.g., 3–4 months), consistent large effect size and direction, precision, generalizability to varying subpopulations of MPSII patients, and an observed dosage gradient. Although sometimes referred to as an “efficacy” variable,21, 22, 24, 33 decreasing excretion of uGAGs directly demonstrates the functional capacity of the infused enzyme to degrade GAGs, a critical initial observation. However, the sources of the excreted GAGs (e.g., circulating, liver/spleen, other tissues/organs), and the clinical impact, must be determined by documenting changes in short- and long-term clinical and patient-centered outcomes deemed critical or important.

Liver volume

Hepatomegaly was reported in 75% of untreated patients in the RCT17, but liver volume in 80% of the patients with hepatomegaly normalized after 1 year of treatment (no change in controls).17 In three studies (130 cases), liver volume (measured by imaging) was reduced by a weighted average of 25% (range 24% to 33%) within a few months (P<0.002 all studies),17, 19, 21 then remained relatively constant for 2 years19 (Supplementary Figure S3). A fourth study24 reported reduced liver size (P<0.001) using a different metric (surface area), but could not be directly compared. Based on limited data, this effect appeared consistent in attenuated patients regardless of enzyme dose.17, 19, 21

The potential benefit of liver volume reduction was based on consistently large and statistically significant effect sizes, even in small studies (moderate SOE). However, the impact of this finding on longer-term clinical outcomes (e.g., hospitalizations, mortality) is not clear. Liver volume reduction could be considered an intermediate patient-centered outcome (e.g., decreased discomfort, improved quality of life), but patient/family perspectives were not studied. Findings for spleen volume were consistent (data not shown).

Development of antibodies

Ten studies reported immunoglobulin G (IgG) antibody status,17, 18, 19, 20, 21, 22, 23, 24, 28, 36 one reported IgM status,17 four reported IgE status,17, 21, 31, 36 and five measured neutralizing IgG antibodies.18, 19, 22, 24, 28 Based on four studies (74 cases), 58% (95% CI 46–69%) of treated patients developed IgG antibodies in the first year (0.5 mg/kg/week).17, 20, 21, 24 Based on four studies (150 cases), a smaller proportion (33%, 95% CI 20–49%) had detectable IgG antibodies at 2 to 4 years (difference P=0.019) (Supplementary Table S7).18, 22, 23, 36 A subset of 130 of these patients (weighted mean 66%, range 41–100%) developed neutralizing antibodies (translates to about a quarter of all treated patients).18, 19, 22, 24, 28
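For readers who want to see how an interval around such a proportion can be formed, the sketch below computes a Wilson score interval. This is an illustration only: the review does not state which interval method was used, and the pooled estimate may come from a random-effects model rather than a simple binomial count. The count of 43 antibody-positive patients is a hypothetical value chosen to give roughly 58% of 74.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1.0 + z ** 2 / n
    center = (p + z ** 2 / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z ** 2 / (4.0 * n ** 2)) / denom
    return center - half, center + half

# Hypothetical count: 43 of 74 patients (about 58%) with IgG antibodies
low, high = wilson_interval(43, 74)
print(f"{43 / 74:.0%} (95% CI {low:.0%} to {high:.0%})")
```

The Wilson interval is used here because it behaves well for moderate sample sizes; a pooled random-effects interval would generally be somewhat wider when the studies disagree.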

Evidence supports the potential harm that about half of patients undergoing ERT will develop IgG antibodies, and some smaller proportion will develop neutralizing antibodies (moderate SOE). The presence of persistent IgG17, 18 or neutralizing antibodies19 may render treatment less effective by limiting reduction in uGAG levels17 (one study reported no difference21) or improvement in absolute forced vital capacity (FVC) results.19 Another study reported that smaller reductions in uGAG levels and liver volume after treatment may be associated with both neutralizing antibodies and the patient’s IDS genotype.41 The impact of IgG and neutralizing antibodies on clinical outcomes could not be determined due to conflicting evidence (low SOE).17, 18, 19, 21, 41

Three studies17, 21, 22 did not identify IgE using standard assays, but one31 found IgE antibodies in 21% of patients using a high-sensitivity assay. Anaphylaxis occurred in three of the IgE positive cases, but this association has not been confirmed. One study reported IgM antibodies in 3% of patients. Harms of IgM and IgE antibodies were unclear based on limited numbers of studies and patients (insufficient SOE).

Infusion-related reactions and serious adverse events

Six studies (213 cases) reported that about two-thirds of patients (median 63%, range 32–75%) experienced an infusion-related reaction (IRR) during, or within 24 h of, infusion in the first year of treatment (0.5 mg/kg/week), most within 3 months.17, 19, 20, 21, 24, 26 The RCT17 reported IRRs in 69% of patients, with a similar unexplained proportion experiencing IRR in the placebo group (66%) (Supplementary Table S8). Five studies (182 cases) reported that serious adverse events (SAEs) occurred in fewer patients (median 13%, range 2 to 25%) in the first year (0.5 mg/kg/week).17, 20, 21, 24, 26 The RCT reported 25% SAEs in patients, and an unexplained 28% in the placebo group (Supplementary Table S8).17 No patient withdrew based on these adverse reactions/events. Four deaths in 245 patients were reported as unrelated to ERT.17, 19, 21 The clinical impact of these harms is unclear, due to some small studies, variable definitions of reactions and adverse events, events in controls, and high unexplained variability in rates between studies (low SOE).

Six-minute walk test

The 6-minute walk test (6-MWT) evaluates the integrated response of cardiopulmonary and neuromuscular functions into a single outcome.17, 42 Significant pre- to posttreatment changes in walk distance reflect improvement or deterioration in these functions, attributed to treatment.42 The FDA approved 6-MWT as the primary efficacy measure for another MPS study, based on absence of other informative outcomes.43 Limitations associated with the 6-MWT include variability in administration; lack of appropriate reference ranges; and variable impacts of age, stature, impaired cognition, musculoskeletal limitations, medications, and comorbidities.42, 43 Five studies17, 19, 20, 21, 22 reported on patients with attenuated phenotypes (i.e., none severe). The average posttreatment 6-MWT distance was greater in all these studies (Supplementary Figure S4), but only the RCT reached statistical significance (approximate 1-year increase of 10%).17, 19 However, 40% of study enrollees were not tested at 36 months,17, 19 possibly biasing an estimate of effect. The impact of this benefit was unclear, based on small effect size, one controlled study, testing variability, possible attrition bias, and unknown association of 6-MWT with long-term health outcomes in patients with attenuated or severe phenotypes (low SOE).

Growth: height

Very young untreated MPSII patients were of average height, but as they aged their growth velocity lagged behind the general population. By age eight, four studies showed the average age-adjusted z-score was greater than or equal to −1.5.23, 25, 27, 30 Eight studies variably reported actual height, growth velocity, or standardized measurements (z-score normalized for age and sex).19, 23, 24, 25, 27, 30, 35, 39 Five studies reported data in age-adjusted z-scores;23, 27, 30, 35, 39 only one39 had an untreated control group (Supplementary Table S9). ERT did not reverse the downward trend, but appeared to result in a less steep decline in age-adjusted height. The benefit of increased height was unclear, due to small studies, small effect size, some attrition, and potential difficulty measuring height in patients with skeletal abnormalities (low SOE).

Pulmonary function: forced vital capacity

As expected due to growth, absolute forced vital capacity (FVC) results improved in three studies17, 19, 21 (significant at 1 year17 and 3 years19). However, after normalization for age and sex, studies found a small (nonsignificant) increase in FVC%17 or essentially no change19, 21 (Supplementary Figure S5). The benefit of increased FVC% was unclear, due to small effect size, nonsignificant findings, and potential impact of normalization on the outcomes (low SOE).

Joint range of motion

There was no consistent finding of improvement in joint range of motion (JROM) among enzyme-naive19, 21 or previously treated22 patients. It is reasonable to assume that improvements in JROM could improve function and quality of life. However, this benefit cannot yet be assessed, due to variability in the joints assessed and in the selection and reliability of measures, small effect sizes, and not accounting for multiple comparisons (insufficient SOE).

Physical disability/quality of life

Three studies19, 29, 34 reported modest improvements in attenuated patients with treatment, but another24 found only small changes. Two reported consistently smaller effects in patients with severe phenotypes.29, 34 Each of the four studies used entirely different survey tools with varying focus on clinical versus patient/family/caregiver perspectives. The beneficial outcome of improving quality of life could not be assessed due to limited studies, small effect size, some attrition, and one-time use of different (one unvalidated) survey tools (insufficient SOE).

Cardiac function

Three studies20, 21, 22 provided too few quantitative data to make conclusions about improvement of cardiac function based on treatment. The benefit of improving cardiac function could not be assessed, due to number, size and quality of studies, lack of controls and long-term follow-up, and attrition (insufficient SOE).

Sleep apnea

Two small studies20, 21 provided too few data to draw conclusions regarding the effect of ERT on obstructive sleep apnea (e.g., apnea hypopnea indices).33, 34 The benefit of improving sleep apnea could not be assessed due to size and quality of studies, inconsistent and nonsignificant findings, and attrition (e.g., patients with tracheostomy, continuous positive airway pressure excluded) (insufficient SOE).

Long-term outcomes

No studies addressed longer-term measurable ERT outcomes, such as causes and durations of hospitalizations, common quantifiable measures of progression (e.g., frequency/severity of respiratory infections, progressive need for additional treatment or support), repeated testing for patient-centered outcomes (e.g., function, pain, quality of life), and survival.

Possible impact of IV ERT cessation

Potential harmful effects of ERT cessation have been reported in 6 MPSII cases,44, 45 as well as 13 cases in other metabolic disorders (2 MPSI;46, 47 4 MPSVI,44 7 late-onset Pompe disease48). Outcome measures included reduced uGAGs and liver volume; improved 6-MWT and FVC%; and frequency/severity of respiratory infections, diarrhea, and sleep apnea. Improvements with treatment were lost soon after ERT cessation; some declined below pretreatment status. Restarting treatment in 11 patients improved outcomes, but only two cases (both MPSVI) recovered to levels observed with initial treatment. This evidence is preliminary, due to small numbers, lack of controls, and potential for bias related to case selection. However, the possibility of risk associated with ERT cessation may warrant further study.

Presymptomatic diagnosis of MPSII

It has been proposed that early diagnosis and initiation of treatment (e.g., family history, newborn screening) would improve ERT effectiveness.11, 49 Pathology studies of five 18- to 30-week fetuses reported identification of prenatal GAG storage in liver/spleen, heart, and central nervous system.50, 51 A case series40 provided additional evidence that developmental abnormalities can appear soon after birth and, possibly, prenatally. Eight cases treated at less than 3 years37, 38, 40 showed that ERT was tolerated in very young patients (i.e., none discontinued treatment). However, evidence on benefits and harms is limited by study quality, data limitations, and the possibility of confounding (insufficient SOE).

Potential benefits and harms of in-home idursulfase infusion

Two small studies reported about 6% (95% CI 2.9–9.8%) fewer missed infusions in patients moved from outpatient to in-home idursulfase infusion.52, 53 The benefit of fewer missed infusions was based on study quality, effect size, statistical significance, and consistency between studies, but selection protocols for patients offered in-home infusion raised questions about generalizability of findings (low SOE). The observation of no increase in harms related to IRRs52, 53, 54, 55 was unclear due to poor study designs, small numbers, and incomplete reporting (low SOE). Information on costs and experience of in-home ERT was based on two small parental studies (insufficient SOE).53, 56

Hematopoietic stem cell transplantation

Hematopoietic stem cell transplantation (HSCT) is not routinely used for MPSII treatment in the United States due to earlier reports of SAEs and death.9, 57 However, most HSCT studies were of low quality (Supplementary Table S9) and the SER summary results (78% survival, 62% event-free survival, 1982 to 2007) were derived from studies that may not have accounted for patients’ age, phenotype, advanced disease, neurological symptoms, donor status, or transplant protocols. Posttransplant enzyme levels were reported as normal in leukocytes, but low in serum; uGAG levels were variably decreased but did not normalize.58, 59, 60 HSCT for MPSII is routinely offered in Japan, with a reported 5-year survival rate of 88.5% (1990–2003).61 A 2009 HSCT study in MPSI (Hurler) patients reported that event-free transplant survival of 58% from 1994 to 2004 rose to 91% between 2005 and 2008.57 Improvements in pre- and posttransplant protocols, earlier transplants, new immunosuppression drugs, or better access to acceptable donors may have contributed to improved event-free survival.

Evidence on harms of HSCT is limited based on age and poor design of many studies (low SOE). SOE was low for uncertain clinical benefits, based on age and quality of some studies, qualitative results, and small treatment comparison studies.32, 34, 61 Evidence of benefits and harms of HSCT treatment for neurological effects is based on a small number of mainly qualitative reports (insufficient SOE).23, 24, 28

Combined therapies

One completed62 and one ongoing (AIM-IT) ERT study have combined monthly intrathecal infusion (idursulfase-IT) with weekly IV infusion of idursulfase. No identified study focused on outcomes of combined HSCT and IV ERT.

Discussion

The evidence showed that ERT with weekly IV idursulfase infusions generally reduced uGAG levels and liver/spleen volume in patients with MPSII. For some other outcomes, evidence was less clear. No evidence was found for long-term outcomes considered critical or important. Using pulmonary disease as an example, such outcomes could include health monitoring (e.g., frequency of respiratory infections), cause and duration of hospital visits, progression to supportive measures (e.g., ventilation), and survival.

Until evidence on the clinical impact of emerging treatments is available, the focus for MPSII treatments is on IV ERT. The sequential questions that follow highlight some information identified, as well as knowledge gaps.

Is treatment of MPSII patients with IV ERT effective?

If the clinical outcomes defining a treatment benefit were only reduced uGAGs and liver/spleen volume (presuming applicability to all MPSII subpopulations), then treatment is effective. However, if documented improvements in longer-term outcome measures (e.g., improved or stabilized cardiopulmonary function, reduced hospitalizations, survival) were used instead, then IV ERT has not yet been shown to be effective. Thus, the definition of effectiveness is dependent on the clinical outcome(s) selected as critical or important.

Which MPSII patients should be eligible for IV ERT?

The answer is contingent on selection of outcomes and whether treatment is considered effective (presumes safety and treatment effectiveness in MPSII subpopulations). If safe and effective, all patients would be offered treatment. If not, no patients would be offered treatment. We identified no evidence-based intermediate definition of criteria for rational selection of candidate patients that would support offering treatment to a subpopulation of patients (e.g., attenuated/severe, infants/children/adults).

When should IV ERT begin?

It has been suggested that initiation of IV ERT prior to 6 months of age might reduce both circulating GAGs and the smaller accumulations of GAGs in tissues, resulting in improved clinical outcomes. However, evidence is insufficient to support this premise. Current evidence shows that ERT was tolerated in infants (i.e., none discontinued treatment), and reduced uGAG levels. However, there was insufficient evidence on how the balance of benefits (e.g., improvements in clinical and patient-centered critical outcomes) and harms (e.g., longer course of treatment, increased potential for later development of neutralizing antibodies, as yet undiscovered harms15) might change were IV ERT to be moved closer to birth (e.g., via newborn screening). Consequently, the optimal approach for early treatment would likely be in the context of a research trial.

How long should ERT continue?

The only evidence-based indications for cessation of IV ERT identified were (i) detection of neutralizing antibodies that could render treatment ineffective, (ii) treatment-related adverse reactions that could not be controlled, and (iii) a patient/family decision. The only consistently used outcomes for monitoring ERT are uGAG levels and liver/spleen size. If uGAG levels and liver/spleen volume in all or a subset of patients normalized, then one might continue treatment or consider reducing the treatment dosage or frequency to determine the minimal effective dose. After 3 years of follow-up, many, but not all, studied patients had uGAG levels below the upper limit of normal.

Some have suggested initiating ERT for 6 to 18 months, then discontinuing if improvements were not observed.11, 49 However, for outcomes critical for decision making, it is not clear what measures, and at what levels, should define “improvement.” In addition, low-quality evidence suggests that there may be risk associated with cessation of ERT.44, 45, 46, 47, 48

Results in context

Figure 3 shows a hypothetical cohort of 100 males diagnosed with MPSII beginning 2 years of IV ERT between ages 5 and 10 years. This approximates the number of cases that would occur in this age range in the United States in 1 year. The potential benefits (rows 2 through 11) and associated harms (rows 12 through 14) are provided for those with severe versus attenuated phenotypes. The figure legend lists sources of estimates for benefits and harms. Some represent “reasonable estimates” given limited and relatively low-quality information. When data were insufficient to estimate effect, the result for effect reads “little if any” (based on the null hypothesis of no effect, and the likelihood that a large effect would be identified). In general, the benefits are listed from clinically less important outcomes (e.g., uGAG reduction) to more important outcomes (e.g., cardiopulmonary).

Figure 3

Modeled clinical outcomes for a hypothetical cohort of 100 male MPSII patients (5 to 10 years old) receiving ERT for two years. Potential outcomes are stratified by phenotype (severe versus attenuated). When possible, sources of estimates are identified from the Results and Supplementary Materials as indicated. (1) Supplementary Figure S1; MPSII patient survival by phenotype. (2) Supplementary Figure S2; absolute uGAG levels. (3) Supplementary Figure S3; relative liver volume change (%). (4) and (5) Supplementary Figure S4; 6-minute walk test (6-MWT) distance (meters). (6) Supplementary Table S8; Growth (Height). (7) Joint range of motion (JROM). (8) Supplementary Figure S5; Pulmonary function: forced vital capacity (FVC) and %FVC. (9) Cardiac function. (10) Sleep apnea. (11) Physical disability/Quality of life. (12) Supplementary Table S7, assumes an average SAE patient-specific rate of about 10%. (13) Supplementary Table S6; assumes IgG antibodies in about half of treated patients, with at least 41–100% of those patients developing neutralizing antibodies. (14) 2012 Health Technology Assessment23 estimate of $1.06 million per patient for 2 years of weekly infusions of 0.5 mg/kg idursulfase (converted to 2014 US dollars).
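The harm and cost assumptions in items (12) through (14) of the legend can be turned into expected counts for the whole cohort, as in the minimal sketch below. It applies the stated rates to all 100 patients without splitting by phenotype, since the legend gives no split; the variable names are hypothetical and the results are illustrative arithmetic, not figures read from Figure 3.

```python
COHORT_SIZE = 100                 # hypothetical cohort from the figure legend
SAE_RATE = 0.10                   # about 10% of patients with a serious adverse event
IGG_RATE = 0.50                   # about half of treated patients develop IgG antibodies
NEUTRALIZING_RANGE = (0.41, 1.00) # share of IgG-positive patients with neutralizing antibodies
COST_PER_PATIENT_MUSD = 1.06      # 2 years of weekly infusions, millions of 2014 US dollars

expected_sae = COHORT_SIZE * SAE_RATE
expected_igg = COHORT_SIZE * IGG_RATE
expected_neutralizing = tuple(expected_igg * share for share in NEUTRALIZING_RANGE)
total_cost_musd = COHORT_SIZE * COST_PER_PATIENT_MUSD

print(f"Patients with an SAE: about {expected_sae:.0f}")
print(f"Patients with IgG antibodies: about {expected_igg:.0f}")
print(f"Patients with neutralizing antibodies: about "
      f"{expected_neutralizing[0]:.1f} to {expected_neutralizing[1]:.1f}")
print(f"Two-year drug cost for the cohort: about ${total_cost_musd:.0f} million")
```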

Limitations of studies

Because MPSII is a rare disease with considerable clinical heterogeneity, it was not surprising that certain study designs, particularly RCTs, were difficult, but not impossible, to organize and successfully complete. In addition, maintaining a placebo or control group for more than 6 months was challenging. These constraints emphasize the need for strict attention to methodological, statistical, and reporting principles. Given that only one RCT may be done per rare disease treatment, it is critical that the trial address the most important questions concerning expansion of treatment.

Nearly all included studies lacked controls, as well as consistency in the selection of measures and validated scoring systems used to assess specific clinical outcomes (e.g., growth, joint range of motion, cardiac function, disability/quality of life). Some studies provided insufficient information to characterize study participants (e.g., phenotype, progression/clinical status, prior treatment), or did not clearly describe randomization and concealment methods. Others based effectiveness conclusions on indirect/surrogate outcomes, selected inappropriate patient subsets for result analysis (e.g., patients remaining after dropout, best responses from a time series), or made unclear adjustments to effect sizes and P values. One large study failed to report on all outcomes that were planned for study, raising the risk of selective outcome reporting.17

With the exception of the large RCT17, 19 and Hunter Outcome Survey reports,26, 30 studies included 40 or fewer patients and lacked statistical power to detect any but the largest treatment effects. Variation in definitions and measures selected (e.g., SAEs, cardiac function, JROM) made data pooling difficult. Stratification of ERT outcomes to determine the impact of multiple covariates (e.g., age, phenotype, prior treatment, length of treatment, dosage) was rarely possible. Attention to these and other methodological issues would have resulted in higher quality ratings of studies and SOE grades.
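The link between sample size and detectable effect can be made concrete with a standard power approximation. The sketch below is not an analysis performed in this review. It assumes a two-arm comparison with equal allocation, a two-sided test at α = 0.05, a normal approximation, and standardized effect sizes (Cohen's d) chosen purely for illustration; the function name is hypothetical. Most included studies were uncontrolled, so this serves only as a rough guide to why studies of 40 or fewer patients detect only large effects.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample(effect_size, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is the standardized mean difference (Cohen's d).
    The negligible contribution of the opposite tail is ignored.
    """
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)
    noncentrality = effect_size * sqrt(n_per_arm / 2)
    return z.cdf(noncentrality - z_crit)


if __name__ == "__main__":
    # 40 patients in total, split evenly between two arms (assumption).
    for d in (0.2, 0.5, 0.8, 1.0):
        print(f"d = {d}: power = {power_two_sample(d, 20):.2f}")
```

With 20 patients per arm, power under this approximation stays below 50% for a moderate effect (d = 0.5) and exceeds 80% only for effects near d = 1.0.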

About one-third of the 25 included studies (Table 1) were funded by the manufacturers of Elaprase or Hunterase. More than half included one or more authors who were current/past employees or paid consultants of these companies (Table 1). The potential for bias related to conflict of interest was therefore present. Authors of future (and ongoing) studies should consider making raw data available to other researchers for independent analyses. Such outside review may reduce the perceived bias.

Application of SER methods to rare diseases

The current SER for MPSII identified more studies than anticipated, and provided an overview of the quantity and quality of existing studies with relevance to the key questions. Observational studies were often small and of relatively poor quality, but provided data that could confirm and extend results from the large RCT. Gray literature provided contextual information on international interpretation of evidence on ERT and the resulting policy decisions (Supplementary Table S10).

As RCTs will probably remain uncommon, a comprehensive review of published literature is warranted for future rare disease SERs. However, the number of articles selected for review and the resources needed to review and extract the data can be constrained by narrowing the scope. For example, this can be done by refining four or fewer key questions that are highly focused on the information most critical to clinical guideline development and that address what physicians and patients/families want and need to know. Searches of the gray literature can be limited to review of web-posted SERs, regulatory reports, and existing clinical guidelines.

A recent publication63 proposed that establishing treatment effectiveness and supporting rapid translation into existing care systems requires new research approaches. These include emphasis on identifying long-term impacts of care (i.e., risks and benefits, clinical and patient-centered) and addition of new study designs (e.g., hybrid observational/pragmatic trials) that help account for other factors (e.g., cointerventions, clinical heterogeneity, medication) impacting outcomes, and support real-world critical decision making.63

Key gaps in knowledge

Current knowledge and continuing research allow clinicians to provide the best available objective information to MPSII patients and families on treatment options and their potential benefits and harms. Addressing key gaps in knowledge entails the following:

  • Engaging clinicians, patients, family members, researchers, and other stakeholders to develop a consensus list of outcomes that are ranked (e.g., critical, important but not critical, not important), in order to focus on those that provide the best evidence for decisions on effectiveness. Outcomes should be paired with effect measures for which data are feasible to collect, and that have clear definitions of what constitutes clinical and patient-centered “improvements.” Such consistency would allow for pooling of studies, increasing the power to conclude that findings are both statistically significant and reasonably consistent.

  • Collecting more information about the safety of IV ERT and further defining its effectiveness in improving short- and long-term outcomes.

  • Engaging the support of families and patients for longer-term follow-up studies. A robust long-term follow-up of individuals in the large RCT17, 19 could help resolve important knowledge gaps by collecting rates of hospitalizations, frequency of infections, and other documented outcomes.

  • Strengthening and expanding registries such as the Hunter Outcome Survey, using other registries (e.g., Cystic Fibrosis Patient Registry) as models for a curated collection of clinical, biochemical, and molecular data.

  • Further investigating the benefits and harms of initiating IV ERT soon after birth as part of formal research trials in patients diagnosed early (e.g., family history). Pilot trials should be of sufficient scale to adequately power research questions.

  • Reconsidering whether early HSCT (with or without adjunctive ERT) may now be a viable treatment option.
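The first item in the list above notes that consistent outcome measures would allow studies to be pooled, increasing power. A minimal sketch of inverse-variance (fixed-effect) pooling illustrates the mechanism. The effect estimates and standard errors below are invented for illustration and are not drawn from any study in this review; the function name is hypothetical, and the method assumes that all studies measure the same outcome on the same scale and share a common underlying effect.

```python
from math import sqrt


def pool_fixed_effect(estimates, standard_errors):
    """Inverse-variance weighted pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = sqrt(1.0 / total_weight)
    return pooled, pooled_se


if __name__ == "__main__":
    # Hypothetical change in an outcome measure from three small studies.
    effects = [30.0, 40.0, 50.0]
    ses = [25.0, 25.0, 25.0]
    pooled, pooled_se = pool_fixed_effect(effects, ses)
    print(f"pooled effect = {pooled:.1f}, SE = {pooled_se:.1f}")
```

In this hypothetical case, none of the three studies alone would yield a confidence interval excluding zero, whereas the pooled standard error is smaller by a factor of √3, which is the gain in precision that consistent outcome definitions would make possible.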

Conclusions

The evidence presented on intravenous ERT has limitations but provides some support for clinical guideline development on the treatment utility of Elaprase at 0.5 mg/kg/week in MPSII patients. The effects of key variables such as age and phenotype are known for several outcomes, but not for others. Important knowledge gaps for ERT-treated patients are data on long-term outcomes, practical measures of progression (e.g., frequency/severity of infections, need for ventilation support), benefits and harms of early treatment, and patient-centered outcomes (e.g., function, pain, quality of life). An important need is consensus on selection of critical outcomes and measures to evaluate treatment effectiveness, with clear definitions regarding what constitutes “improvement.”

With limited key questions and strict focus on critical outcomes, standard SER methods can be effective in developing evidence to support clinical guidelines for treatment of rare diseases. The number of RCTs is likely to remain low, but such trials are achievable. Observational studies and pragmatic trials can also contribute to knowledge of the impact of treatments, particularly in the context of health-care systems.63