The complexity of multiple sclerosis (MS) treatment means that doctors and decision-makers need the best available evidence to make the best decisions for patient care. Randomized controlled trials (RCTs) are accepted as the gold standard for assessing the efficacy and safety of any new drug, but conclusions of these trials do not always aid in daily decision-making processes. Indeed, RCTs are usually conducted in ideal conditions, so can measure efficacy only in restricted and unrepresentative populations. In the past decade, a growing number of MS databases and registries have started to produce long-term outcome data from large cohorts of patients with MS treated with disease-modifying therapies in real-world settings. Such observational studies are addressing issues that are otherwise difficult or impossible to study. In this Review, we focus on the most recently published observational studies designed to identify predictors of poor outcome and treatment response or failure, and to evaluate the relative and long-term effectiveness of currently used MS treatments. We also outline the statistical approaches that are most commonly used to reduce bias and limitations in these studies, and the challenges associated with the use of 'big MS data' to facilitate the implementation of personalized medicine in MS.
The repertoire of disease-modifying therapies for relapsing–remitting multiple sclerosis (MS) has broadened greatly in the past decade
Evidence-based recommendations from randomized clinical trials are insufficient to guide choices between most available MS drugs
The combination of increasing worldwide availability of and access to large MS registries and databases and the growing ability to share and analyse large datasets is enabling real-world observational studies to be conducted
Observational real-world studies are providing insights into predictors of MS treatment response, comparative effectiveness of disease-modifying therapies, and long-term treatment effectiveness that is useful for directing daily clinical practice
Several new statistical methods are available, and continue to evolve, to minimize biases and limitations of real-world observational studies, thereby optimizing their validity and reliability
In future, datasets from individual MS databases and registries should be aggregated into big data algorithms to develop new tools that will enable the implementation of personalized medicine
The past decade has seen extraordinary progress in the treatment of relapsing–remitting multiple sclerosis (RRMS). To date, 12 disease-modifying therapies (DMTs) have been approved for RRMS on the basis of their efficacy in randomized controlled trials (RCTs). Nevertheless, uncertainty in daily treatment decisions remains common because although RCTs can establish the efficacy of an intervention under ideal conditions1,2, they do not necessarily provide the best indication of its effectiveness in real-world practice.
The patients who are included in RCTs are not representative of real-world patient populations, mainly because these trials have to comply with restrictions in terms of participant age and comorbidities, and require participants to have high levels of disease activity at baseline. The limited follow-up periods of RCTs also make them unsuitable for evaluating long-term efficacy and safety outcomes of treatment3,4. Furthermore, RCTs are not routinely used to compare different drugs5 so provide no evidence to guide the choice between most drugs that are available for RRMS, and cannot establish the therapeutic value of most new drugs beyond that of existing drugs6. The increasing worldwide availability of and access to several large MS cohort studies and registries7,8,9,10,11,12,13 combined with a growing ability to collect, share and analyse large amounts of data14 are facilitating real-world observational studies that offer many advantages over RCTs15,16.
Real-world observational studies include large populations of patients who might benefit from a given treatment, as well as subgroups that are not typically included in initial RCTs. In addition, the follow-up periods in real-world observational studies can be sufficiently long to assess delayed risks and long-term benefits of a drug and treatment combinations and/or sequences. These studies also enable comparisons of effectiveness and safety among the increasing number of DMTs for MS, and the characterization of prognostic subgroups of patients with a view to developing personalized treatment strategies. Finally, observational studies are invaluable for capturing patient-reported and patient-centred outcomes that are expected to become increasingly important criteria for regulatory and pricing and reimbursement decisions17. Today more than ever, therefore, patients, physicians, industry and policy makers have an active interest in real-world observational studies, as they have the potential to answer the questions that are most relevant to daily treatment decision-making.
Despite their many potential benefits, real-world observational studies are subject to biases that must not be overlooked (Table 1). However, advances in statistical methodology have mitigated the impact of these biases (Table 1), and the findings of rigorous observational studies correspond well with those of clinical trials across the medical literature18.
Here, we first review the statistical methods that are most commonly used to reduce bias in observational studies before providing an overview of real-world observational studies published in the past 10 years that were based on data acquired from large MS cohort studies and registries, and which we consider to provide useful information for directing MS intervention strategies in daily practice. In particular, we focus on studies that have aimed to predict poor outcomes of MS and treatment responses or failure, and to compare the long-term effectiveness and safety of DMTs in current use. Finally, we consider the future challenges in using so-called big MS data.
Methodology of real-world studies
Observational studies have great potential for real-world comparisons of treatment efficacy and long-term outcomes, but come with a variety of analytical challenges3,4. Tracking large cohorts brings the advantages of power and generalizability, but these advantages can be offset by confounding and bias that result from differences in prognostic correlates between comparison groups of interest. In clinical trials, randomization balances treatment arms to avoid confounding, especially as a result of factors that are not measurable, whereas uncorrected bias in observational cohorts can limit the ability to attribute an observed difference to the treatment rather than selection effects, even within large data sets. However, a number of continuously evolving methodological and statistical strategies can, when appropriately applied to a sufficiently large data set, minimize the impact of biases and limitations, leading to more-reliable and replicable estimates of effects (Table 1). Some of these methods are discussed in this section.
Propensity score adjustment. An increasingly popular statistical approach to observational studies is propensity score adjustment via matching, stratification or weighting19,20,21,22. Propensity score adjustment has become prominent in the medical literature over the past 15–20 years, but before that time had been widely used in health economics23,24,25,26,27 and the analysis of large administrative health care and claims data sets28.
Propensity score adjustment involves matching a group of interest — typically a treatment cohort — to one or more comparator groups on the basis of similarities in patient, disease and paraclinical factors at baseline. In MS, these factors are often prognostic correlates of typical study end-points, such as relapse rate or disability progression. The propensity score is calculated for each patient using a multivariate model (typically a logistic regression), in which treatment assignment is the dependent variable, defined as a function of the baseline covariates which are the independent variables. The score for each patient estimates the probability of being treated or untreated according to baseline characteristics. Once the propensity score has been estimated, it can be used to create matched cohorts or to adjust the comparison of the primary outcome between treatment arms.
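The estimation and matching steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure of any cited study: the data are synthetic, the two standardized covariates (age and prior relapse count) are hypothetical, the logistic regression is fitted by plain gradient descent, and the greedy 1:1 matching with a caliper of 0.05 is one of several possible matching rules.

```python
import math
import random


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit a logistic regression by batch gradient descent.

    Returns [intercept, coef_1, ..., coef_p]. Covariates are assumed
    to be standardized so that a fixed learning rate is stable.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def propensity(w, x):
    """Estimated probability of receiving the treatment of interest."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def match_nearest(ps, treated, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching without replacement."""
    controls = {i for i, t in enumerate(treated) if t == 0}
    pairs = []
    for i, t in enumerate(treated):
        if t == 1 and controls:
            j = min(controls, key=lambda c: abs(ps[c] - ps[i]))
            if abs(ps[j] - ps[i]) <= caliper:
                pairs.append((i, j))
                controls.remove(j)
    return pairs


# Synthetic cohort (hypothetical): treatment is more likely in younger
# patients and in patients with more prior relapses.
random.seed(0)
X, treated = [], []
for _ in range(200):
    age = random.gauss(0, 1)
    relapses = random.gauss(0, 1)
    p_treat = 1.0 / (1.0 + math.exp(-(0.8 * relapses - 0.5 * age)))
    X.append([age, relapses])
    treated.append(1 if random.random() < p_treat else 0)

w = fit_logistic(X, treated)                # treatment is the dependent variable
ps = [propensity(w, x) for x in X]          # one score per patient
pairs = match_nearest(ps, treated)          # (treated index, control index)
```

Outcomes would then be compared within the matched pairs only. Treated patients without a control inside the caliper are dropped, which trades sample size for comparability.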
The ability to correct imbalances between treatment arms, at least with respect to the variables used to derive the propensity score, makes propensity score adjustment an attractive option for head-to-head treatment comparisons in real-world observational cohorts. Furthermore, when applied to 'big data', propensity score adjustments can be used to expand conventional two-way treatment comparisons into multi-arm comparisons that are better suited to analysis of the increasing range of treatment options available in the modern MS treatment era.
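Whether an adjustment has actually corrected an imbalance is usually checked covariate by covariate. One widely used diagnostic, which the text above does not name and which is shown here only as an illustrative assumption, is the standardized mean difference; an absolute value below about 0.1 is a common rule of thumb for acceptable balance.

```python
import math
import statistics


def standardized_mean_difference(treated_vals, control_vals):
    """Difference in means divided by the pooled standard deviation."""
    diff = statistics.mean(treated_vals) - statistics.mean(control_vals)
    pooled_sd = math.sqrt(
        (statistics.variance(treated_vals) + statistics.variance(control_vals)) / 2
    )
    if pooled_sd == 0:
        # No variability in either group: balanced only if the means agree.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / pooled_sd


# Hypothetical baseline EDSS scores in two arms after matching.
smd = standardized_mean_difference([2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
```

The same function can be applied before and after adjustment to show how much of the baseline imbalance was removed.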
Unlike randomization in clinical trials, propensity score adjustment does not account for unmeasured confounding. Methods are available for estimating the minimum effect size of an unmeasured confounder that would indicate that apparent treatment effects are in fact attributable to selection effects29,30, but this approach is not equivalent to adjusting the analysis for imbalance. Indeed, all methods used to quantify, identify and, where possible, correct for bias in observational studies require complete or near-complete data collection, underlining the importance of recording all factors relevant to disease and treatment in any big MS data repository or registry-based cohort. Furthermore, a propensity score adjustment to balance comparator treatment arms is typically implemented at a single point in time, such as treatment initiation; as a result, the technique ensures a good balance of prognostic correlates at baseline, but does not control for systematic differences introduced into the sample after enrolment.
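As an illustration of the first point, one published sensitivity measure is the E-value: the minimum strength of association, on the risk ratio scale, that an unmeasured confounder would need with both treatment and outcome to fully explain away an observed effect. The sketch below assumes the observed effect is expressed as a risk ratio; it is not necessarily the method used in the studies cited above.

```python
import math


def e_value(risk_ratio):
    """E-value for an observed risk ratio.

    Protective effects (risk ratio below 1) are inverted first, so the
    result is always at least 1.
    """
    if risk_ratio <= 0:
        raise ValueError("risk ratio must be positive")
    rr = risk_ratio if risk_ratio >= 1 else 1.0 / risk_ratio
    return rr + math.sqrt(rr * (rr - 1.0))


# A hypothetical halving of relapse risk (risk ratio 0.5).
ev = e_value(0.5)
```

A large E-value indicates that only a strong unmeasured confounder could account for the result, but as noted above it quantifies the concern rather than adjusting for it.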
Marginal structural modelling. Recent developments in marginal structural modelling (MSM) provide one potential solution to the problem of simultaneously managing time-invariant (baseline only) and time-varying confounding, particularly in long-term (>5-year) observational MS cohorts31,32,33. MSM is an extension of conventional propensity score adjustment in which treatment comparison groups are adjusted for systematic differences in critical confounders at baseline and during the observation period. This approach can more effectively manage overall confounding and, consequently, more effectively isolate true treatment effects.
MSM can be extended to model relapse and progression outcomes across complex MS treatment trajectories, in which patients might switch between multiple therapies over time as a result of insufficient efficacy, tolerability issues or reasons of convenience. Like multi-arm propensity score adjustments, MSM requires a sufficiently large sample for optimal performance.
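Marginal structural models are commonly fitted using inverse probability of treatment weights that are updated at every visit. The sketch below shows only the weight calculation for a single patient and assumes that the two sets of probabilities have already been estimated, for example with logistic regressions; the function name and inputs are hypothetical.

```python
def stabilized_weights(p_numerator, p_denominator):
    """Cumulative stabilized weights across visits for one patient.

    p_numerator[t]   : probability of the treatment actually received at
                       visit t given treatment history only.
    p_denominator[t] : the same probability given treatment history and
                       time-varying confounders (e.g. recent relapses).
    """
    weights, running = [], 1.0
    for num, den in zip(p_numerator, p_denominator):
        running *= num / den
        weights.append(running)
    return weights


# Hypothetical patient with two visits. At the second visit the observed
# treatment was unlikely given the confounders, so the weight increases.
w_patient = stabilized_weights([0.5, 0.5], [0.5, 0.25])
```

The weighted outcome model then estimates the treatment effect in a pseudo-population in which treatment at each visit is independent of the measured time-varying confounders.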
Use of 'big MS data'. Use of very large data sets from MS databases and registries — 'big MS data' — can overcome some of the statistical limitations of real-world observational studies. For example, although adjustment methods might perform poorly in insufficiently powered samples, big MS data sets — whether standalone or a combination of data from multiple sources — can be used to test these adjustment methods and validate their suitability for application to MS data. Big MS data can also be used to reliably identify early predictors of poor outcomes, which can then be used in the development of tools such as risk calculators and prognostic nomograms that can identify subsets of patients who would benefit most from timely intervention33.
Moreover, researchers have recently begun to use the tools that are commonly used in meta-analysis of clinical trials to identify and adjust for their heterogeneity as a means of dealing with differences between combined data from different sources when developing risk prediction tools. For example in one study, large data sets created by merging data from multiple sources and patient cohorts were used to successfully validate a new framework approach that combines meta-analytical heterogeneity adjustment with conventional multivariate regression for developing clinical prediction models34. This approach has not been used in MS, but has been successfully applied and validated for comparable compilations of observational cohorts in big oncology and cardiology data sets35.
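Two standard meta-analytical tools for identifying heterogeneity between sources are Cochran's Q and the I² statistic. The sketch below computes both from per-source effect estimates and their variances; it illustrates the general idea only and is not the framework validated in the study cited above. The input values are hypothetical.

```python
def heterogeneity(effects, variances):
    """Return (Cochran's Q, I-squared in percent) for per-source estimates."""
    weights = [1.0 / v for v in variances]          # inverse-variance weights
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i_squared


# Two hypothetical registries reporting the same coefficient.
q, i2 = heterogeneity([0.0, 2.0], [0.5, 0.5])
```

A high I² suggests that a pooled model should allow effects or baseline risks to vary between sources rather than treating the merged data as a single cohort.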
Large data sets can be further exploited to identify treatment responsive subpopulations of patients with MS. In one study, for example, a least absolute shrinkage and selection operator (LASSO) procedure was used to develop a parametric scoring system based on multiple patient and disease factors at baseline, a system that could be used for estimating differences in treatment response between groups of subjects36. Not only can such an approach be useful to the clinician for identifying subsets of patients who might benefit most from a particular treatment, it can also guide patient selection for clinical trials by providing a systematic and data-driven approach to identifying subgroups of patients that are likely to respond well to a treatment, either new or established.
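The variable selection performed by LASSO comes from an L1 penalty that shrinks weak coefficients exactly to zero. The following is a minimal coordinate descent sketch for a linear outcome, assuming centred covariates and no intercept; the cited study's scoring system is more elaborate, and the data here are hypothetical.

```python
def soft_threshold(rho, lam):
    """Shrink rho towards zero by lam; values within [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso(X, y, lam, n_iter=200):
    """Minimize (1 / 2n) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return beta


# Hypothetical data: the first covariate is strongly related to the
# outcome, the second only weakly.
X_demo = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
y_demo = [2.0, -2.0, 0.1, -0.1]
beta_demo = lasso(X_demo, y_demo, lam=0.1)
```

With this penalty the weak covariate is dropped from the score entirely, whereas an unpenalized fit (lam=0) would retain both.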
Finally, attempts are being made to combine grouped data from RCTs with individual patient data from observational studies, and one study has presented a unified modelling framework for this purpose37. Rather than simply pooling the evidence into an overall estimate of treatment effect after adjusting for potential confounding, the intention of this approach is to explore treatment effects in specific patient populations in the observational cohorts. Data from individual patients from observational cohorts can be used to build prediction models and to identify groups of patients with different levels of risk. A bivariate random effect model can then be used to combine baseline risk and treatment effects from observational cohorts and aggregated clinical trial data. This approach, which is based on a Bayesian hierarchical metaregression model, combines submodels that represent different types of data: that from individual patients and that aggregated from clinical trials.
Maximizing data quality
In multicentre and multinational observational cohort studies and disease registries, ensuring high data quality is a major challenge14. Data from the real world are inevitably less complete and possibly less accurate than data from controlled trials, so quality control is important. Good-quality studies should meet certain criteria for the evidence they provide to be considered reliable38 (Box 1).
Data standardization. The most fundamental requirement for maximizing data quality is that all centres and participants involved in a multicentre observational study use the same minimum data set based on the same definitions, with an appropriate training framework in place for people who record the data39. Clinical trials in MS conducted over the past 20 years have helped to promote standardization of basic aspects, such as diagnostic criteria, disease course and definition of attacks, and of short-term outcomes, such as attack rates, disability and MRI parameters. However, consensus is still lacking regarding the choice of instruments for measuring health-related quality of life and patient-reported outcomes, and how best to define and assess long-term outcomes in MS. Improved consensus on these issues is needed, and is likely to contribute to the evaluation of long-term benefits of current treatments and the development of successful treatments for progressive MS.
Data collection. Regardless of the data that are collected, several measures can be taken during the collection process to maximize their quality. First, a computerized, rather than paper-based, collection system should ideally be used to allow direct entry of data locally or online40, as double handling of data (for example, a clinician recording data on paper and office staff transcribing these data into a database) and the logistical complexity of using paper forms are known to double the error rate39. In addition, basic data quality checks can easily be built into data collection software: systems can be programmed so that they will not accept or will query nonsensical or improbable data entries. Automated data quality checks can also be exploited to identify gaps in data collection41. Addition of real-time data analysis tools, such as calculators of likely treatment responses, to data collection software can also enhance the value of engaging with the software for the clinician, thereby encouraging complete data collection.
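A built-in quality check of the kind described above can be as simple as a function that returns a list of queries for each record at the point of entry. The field names and rules below are hypothetical and chosen only for illustration; a real registry would define them in its minimum data set.

```python
from datetime import date


def check_visit(record, onset_date):
    """Return queries for one visit record; an empty list means accepted."""
    queries = []

    edss = record.get("edss")
    if edss is None:
        queries.append("EDSS not recorded")
    elif not 0 <= edss <= 10:
        queries.append("EDSS outside the 0-10 range")

    visit = record.get("visit_date")
    if visit is None:
        queries.append("visit date not recorded")
    elif visit < onset_date:
        queries.append("visit date precedes disease onset")

    # Absence of a relapse must be entered explicitly, not left blank.
    if record.get("relapse_since_last_visit") not in (True, False):
        queries.append("relapse status must be recorded as present or absent")

    return queries


onset = date(2010, 5, 1)
ok = check_visit(
    {"edss": 2.5, "visit_date": date(2015, 3, 2), "relapse_since_last_visit": False},
    onset,
)
flagged = check_visit({"edss": 12, "visit_date": date(2009, 1, 1)}, onset)
```

Returning queries rather than rejecting the record outright lets the clinician correct an entry during the consultation, which is when the information is easiest to verify.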
As data collection ideally takes place in real-time, during consultations, the size of the minimum data set should be kept small. Within the minimum data set, however, core outcome variables should always be recorded to ensure that the absence, and not just the presence, of a given event is recorded. This approach avoids the situation that can arise if only positive events are spontaneously reported; in this case, the 'not reported' category can include absent events and events that occurred but were not reported, with no provision made to distinguish between them.
In multicentre studies, the quality of data from each participating centre should be assessed, and each centre should be able to access its own data quality 'score' and ranking in relation to other centres; this approach would enable identification of systematic gaps in data capture and stimulate friendly competition, thereby driving improvements in data collection42. The presence of data quality metadata (for example, a measure of visit frequency, or of the proportion of records with a complete minimum data set) can also be useful in data analytics, in which they can serve as either covariates or prespecified inclusion or exclusion criteria for analyses or studies (for example, centres that report statistically outlying relapse rates or low visit frequencies could be excluded)43.
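One simple form of such a score is the proportion of records with a complete minimum data set, which can then be ranked across centres. The sketch below assumes a hypothetical three-field minimum data set and hypothetical centre names.

```python
REQUIRED_FIELDS = ("edss", "visit_date", "relapse_since_last_visit")


def completeness(records, required=REQUIRED_FIELDS):
    """Proportion of records in which every required field is present."""
    if not records:
        return 0.0
    complete = sum(
        all(r.get(field) is not None for field in required) for r in records
    )
    return complete / len(records)


def rank_centres(records_by_centre):
    """Return (centre, score) pairs sorted from best to worst."""
    scores = {c: completeness(rs) for c, rs in records_by_centre.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


full = {"edss": 2.0, "visit_date": "2015-03-02", "relapse_since_last_visit": False}
partial = {"edss": 2.0, "visit_date": None, "relapse_since_last_visit": False}
ranking = rank_centres({"centre_A": [full, partial], "centre_B": [full, full]})
```

The same score can be stored alongside the data as metadata and used as a covariate or as a prespecified inclusion threshold in later analyses.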
Policy and legislation. All large databases and registries are the result of long-standing collaborations between clinical and academic units, and other organizations. Each such network has an elaborate set of rules that govern data ownership and procedures for data export, and which must be respected. In addition, although data merging involves anonymized or pseudonymized data, patient consent is usually needed for scientific use of data, and scientific projects have to be cleared by ethics review boards. In observational studies, it is desirable for patient informed consent to allow a priori use of the pseudonymized data sets for all descriptive analyses and models that use this data set, so that specific analyses do not have to be individually approved. This type of blanket ethical approval and consent minimizes selection bias because, often, the research questions are unknown or unspecified at the time of data collection.
Real-world observational studies in MS
Disease progression and treatment response
The ability to efficiently predict responses to drug treatments before treatment starts would help with matching the right drug to the right patient, thereby minimizing the risk to benefit ratio and optimizing cost-effectiveness of treatment. In a long-lasting disease such as MS, observational studies are crucial for identifying early predictors of poor long-term outcomes44,45 and for gathering information about the safety and efficacy of treatments in patients with comorbidities3,46. Furthermore, the ability to accurately predict mid-term to long-term disability outcomes at various stages of disease is crucial for providing individualized treatment counselling.
Prediction of progression in CIS. Several studies have aimed to identify predictive factors in patients with clinically isolated syndrome (CIS) and indicate that early treatment is beneficial. In one large, single-centre, prospective observational study44, possible risk factors for conversion from CIS to a diagnosis of MS and accumulation of disability were evaluated in 1,058 patients who were followed up for a mean of 6.5 years. MRI features (number and topography of lesions) were identified as the main prognostic factor at disease onset; demographic characteristics (age and gender) and the presence of oligoclonal bands restricted to the cerebrospinal fluid (CSF) emerged as low-impact and medium-impact prognostic factors, respectively. This analysis also suggested that initiation of a DMT before a second attack can reduce the risk of disability accumulation. The benefit of early and sustained treatment in reducing the risk of disability progression was also found in a larger multicentre observational study of a cohort of ∼2,000 patients in the MSBase registry with CIS and early MS45, and in a previous observational study of 2,260 patients from the Italian MS registry with definite MS47.
Definite relapsing–remitting MS. Several studies of patients with RRMS have assessed whether clinical activity (relapses and disability progression, defined as a change in Expanded Disability Status Scale (EDSS) score) and/or MRI activity (new gadolinium-enhancing lesions and/or new or enlarging T2-weighted lesions compared with baseline) are valid and early predictors of non-response to DMTs48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63. One study analysed the predictive value of the presence of at least one clinical relapse or disability progression (an increase of 1 in EDSS score confirmed at 6 months) and active MRI lesions (at least three new T2-weighted or gadolinium-enhancing lesions) in patients with RRMS who were treated with IFN-β49. Patients who were positive for at least two of the three criteria after 1 year of treatment had a higher probability of experiencing disability progression or relapse activity during the subsequent 2 years. The isolated presence of relapses or MRI activity, however, did not predict the risk of new clinical activity or disease progression. Development of the so-called modified Rio Score was prompted by the study above56. This score takes into consideration the combination of relapses (one or more than one) and substantial MRI activity (>5 new T2-weighted lesions) over the first year of IFN-β treatment.
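The three-criterion rule described above for the first year of IFN-β treatment can be written as a small function. The thresholds are taken from the description in the text; the function and parameter names are hypothetical, and the sketch is not a validated clinical tool.

```python
def criteria_met(relapses, confirmed_edss_increase, new_active_lesions):
    """Count how many of the three first-year criteria are positive.

    relapses                : number of clinical relapses in year 1
    confirmed_edss_increase : EDSS increase confirmed at 6 months
    new_active_lesions      : new T2-weighted or gadolinium-enhancing lesions
    """
    return sum(
        [
            relapses >= 1,
            confirmed_edss_increase >= 1.0,
            new_active_lesions >= 3,
        ]
    )


def higher_risk(relapses, confirmed_edss_increase, new_active_lesions):
    """True when at least two of the three criteria are positive."""
    return criteria_met(relapses, confirmed_edss_increase, new_active_lesions) >= 2


relapse_plus_mri = higher_risk(1, 0.0, 3)
isolated_mri = higher_risk(0, 0.0, 5)
```

Consistent with the finding reported above, isolated relapses or isolated MRI activity do not reach the two-criterion threshold under this rule.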
More recently, the MRI in MS (MAGNIMS) network studied a large multicentre data set and found that the presence of relapses during the first year of IFN-β therapy was the main predictor of the risk of disability progression over the subsequent 3 years59. Moreover, isolated MRI activity, defined as the presence of at least three new T2-weighted lesions, also predicted an increase in disability. MRI activity alone during the first year of IFN-β treatment has also been associated with a poor clinical outcome in other studies52,53,57,58,59,61, although the exact cutoff associated with a poor outcome has differed between studies57,58,59,61,64.
The fact that different criteria (Table 2) have been proposed to define a suboptimal treatment response, however, highlights the controversy about the degree of clinical or MRI activity required to define a patient as a nonresponder and consequently switch to another treatment. These conflicting results might be explained by confounding factors, such as the timing of the reference scan in relation to initiation of treatment, the ongoing disease activity before the drug became effective, the lack of strict repositioning techniques when re-scanning and interobserver variability55,63.
As therapy becomes more effective, so-called no evidence of disease activity (NEDA) is increasingly being considered as a possible treatment goal. NEDA-3 is a composite score based on a 'zero tolerance' concept and is defined as no relapses, no sustained disability progression (measured with the EDSS), and no new or enlarging T2-weighted or T1 gadolinium-enhancing lesions detected with MRI. The definition of NEDA-3 is evolving to include brain atrophy (NEDA-4)65,66, as a meta-analysis demonstrated that the combination of focal T2-weighted lesion load over 2 years and whole brain volume loss in the second year explained 75% of the variance of disability progression among patients who were on DMTs over a period of 2 years (a greater proportion than that produced by each of these measures alone)67. In future, other metrics, such as patient-reported outcome measures or fluid biomarkers (for example, cerebrospinal fluid neurofilament levels) could also be incorporated into the definition of NEDA68,69. However, whether NEDA persists in patients who achieve it, and whether it accurately predicts long-term prognosis, is currently controversial70.
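Because NEDA-3 is a 'zero tolerance' composite, it reduces to a conjunction of three conditions over the observation period. A minimal sketch, with hypothetical parameter names:

```python
def neda3(relapses, sustained_edss_progression, new_or_enlarging_t2, gd_enhancing):
    """True only if no relapse, no sustained progression and no MRI activity."""
    no_relapses = relapses == 0
    no_progression = not sustained_edss_progression
    no_mri_activity = new_or_enlarging_t2 == 0 and gd_enhancing == 0
    return no_relapses and no_progression and no_mri_activity


stable = neda3(0, False, 0, 0)
single_new_lesion = neda3(0, False, 1, 0)
```

A single new T2-weighted lesion is enough to lose NEDA-3 status, which illustrates why a less stringent target may be more realistic in real-world settings.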
Clinical insights. Despite the uncertainties left by the observational studies in CIS and definite RRMS, they provide useful information for directing MS intervention strategies in daily practice. First, they have confirmed findings of RCTs that DMTs should be initiated in patients with CIS or early MS to maximize their impact on disease evolution71,72,73,74,75,76,77,78. In this context, the number and topography of brain MRI lesions are the best currently available prognostic factors for guiding treatment choice. Second, they have demonstrated that the combination of clinical relapses and MRI activity seems to be the best tool for defining nonresponders to DMTs, whereas the relevance of minimal MRI activity alone is controversial. In real-world settings, minimal degree of disease activity (MEDA) might be a more realistic goal than NEDA. An important weakness of these observational studies, however, is that the vast majority are based on patients receiving IFN-β treatment, and similar data from cohorts of patients receiving other DMTs are scarce55,79,80. This shortfall highlights the great need for large data sets of real-world data with long-term follow-up periods for each therapy.
Comparison of disease-modifying therapies
Logistical and ethical concerns mean that RCTs can rarely provide direct evidence about the comparative efficacy of DMTs. In the past decade, several observational studies that were based on data sets from large MS cohort studies or registries and used high-quality statistical approaches have addressed some of the most crucial questions about the relative effectiveness of MS drugs in current use.
DMTs in treatment-naive MS patients. First-line injectable immunomodulators for MS are glatiramer acetate, subcutaneous IFN-β1a, intramuscular IFN-β1a, and subcutaneous IFN-β1b (known collectively as BRACE, an acronym of their trade names). A network meta-analysis of data from RCTs has suggested that effectiveness in controlling relapse activity differs between BRACE therapies when used in treatment-naive patients81. In a real-world observational study82, six head-to-head comparisons of the four injectable therapies were performed, using propensity score matching and paired mixed models83, in a population of 3,326 treatment-naive patients with RRMS from the MSBase registry. A slightly lower relapse rate was observed among patients who were treated with glatiramer acetate or subcutaneous IFN-β1a than among patients who were treated with intramuscular IFN-β1a and IFN-β1b, in keeping with the results of most of the clinical trials. However, no discernible difference in disability outcomes was seen at the end of the 12-month follow-up period. The study design controlled for indication, selection and attrition bias, and unidentified confounders.
Unsurprisingly, use of real-world data to compare BRACE therapies with second-line therapies84 revealed marked differences in their relative effectiveness in treatment-naive patients with active RRMS (≥1 gadolinium-enhancing lesion on cerebral MRI at baseline or ≥1 relapse within the 12 months before baseline). A propensity score-matched analysis85 (366 patients per group) revealed superiority of natalizumab over BRACE therapies, with a difference of 0.4 relapses per year between the groups84. The difference in confirmed disability progression, however, was not significant. This study provided Class IV evidence that first-line use of natalizumab rather than IFN-β or glatiramer acetate for RRMS improves relapse outcomes.
The results from these studies (Table 3) indicate that the differences between BRACE therapies have only limited clinical relevance to reduction of disease activity, and no discernible relevance to short-term or mid-term disability outcomes. Moreover, they suggest that more-effective but riskier therapeutic strategies86, such as use of natalizumab, should be considered for treatment-naive patients with high levels of pretreatment disease activity.
DMTs after treatment failure. MS relapses are associated with accumulation of disability87,88,89, and on-treatment relapses herald a particularly poor prognosis in this respect51. Optimization of immunomodulatory therapy in patients with suboptimal disease control is, therefore, likely to improve long-term disease outcomes. Evidence from RCTs suggests that escalation to second-line DMTs (natalizumab or fingolimod) after treatment failure of first-line BRACE therapies is likely to provide better control of MS activity than switching therapy to a second drug in the same class90,91. Several real-world observational studies (Table 4) that used data from MS registries with propensity score matching analysis techniques have directly compared the effects of treatment switching and treatment escalation.
Two of these studies showed that in patients who had experienced relapses and/or progression of disability during treatment with injectable therapies, escalation of treatment to natalizumab or fingolimod reduced relapses by 0.38 and 0.1 relapses per year, respectively, and the risk of disability progression by 26% and 47%, respectively, compared with switching to another injectable therapy92,93. In both studies, measures of relapse and disability were compared between propensity score-matched treatment arms (869 patients on natalizumab versus 869 on BRACE therapies, and 148 patients on fingolimod versus 379 on BRACE therapies); these comparisons were performed using a Cox marginal model and negative binomial models or frailty proportional hazards models adjusted for MRI variables94. Sensitivity analyses confirmed that the results were sufficiently stable under different inclusion and matching criteria and unlikely to be influenced by unmeasured factors.
Observational studies have also provided direct comparisons of the benefits of escalation to natalizumab or fingolimod. In two direct head-to-head comparisons in patients with RRMS95,96 who did not respond to first-line therapies, escalation to natalizumab resulted in a lower relapse rate (by 0.2 relapses per year)95,96 and a higher probability of disability regression (by 180%)95, than did escalation to fingolimod. Both studies involved large, prospectively acquired cohorts (578 and 204 patients, respectively), and propensity score matching was used to select subpopulations with comparable baseline characteristics. Moreover, in the second study96, in the group that received natalizumab, a lower percentage of patients exhibited MRI activity, and a higher percentage of patients achieved NEDA-3 after 2 years of follow-up.
Natalizumab also proved superior to fingolimod in a French multicentre cohort study that involved a broad spectrum of patients with RRMS who exhibited EDSS scores of 0–5.5, a wide range of disease durations and variable prior disease activity (during first-line therapy in 88.5%)97. A brain MRI performed within the year before treatment initiation was available for all patients. The study included 326 patients who received natalizumab and 303 patients who received fingolimod. Two different methods were used for statistical analysis: logistic regression and propensity score inverse probability of treatment weighting. In the natalizumab-treated group, ∼10% more patients remained relapse-free over the initial 2 years of treatment than in the fingolimod-treated group, and 20% more patients exhibited no radiological signs of MS activity (gadolinium-enhancing lesions or new T2-weighted lesions). This study provides Class IV evidence that for patients with RRMS, natalizumab decreases the proportion of patients who experience at least one relapse within the first year of treatment to a greater extent than does fingolimod.
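Inverse probability of treatment weighting, the second method mentioned above, keeps every patient and reweights each one by the inverse of the probability of the treatment actually received. The sketch below assumes that propensity scores are already available and uses hypothetical values; it does not reproduce the cited analysis.

```python
def iptw_weights(ps, treated):
    """Weight 1/ps for treated patients and 1/(1 - ps) for comparators."""
    return [1.0 / p if t else 1.0 / (1.0 - p) for p, t in zip(ps, treated)]


def weighted_risk_difference(outcome, treated, weights):
    """Difference in weighted outcome proportions (treated minus comparator)."""

    def weighted_mean(arm):
        num = sum(w * y for y, t, w in zip(outcome, treated, weights) if t == arm)
        den = sum(w for t, w in zip(treated, weights) if t == arm)
        return num / den

    return weighted_mean(1) - weighted_mean(0)


# Hypothetical example: outcome 1 = relapse-free at 2 years.
ps_demo = [0.5, 0.5, 0.5, 0.5]
arm_demo = [1, 1, 0, 0]
relapse_free = [1, 0, 0, 0]
weights_demo = iptw_weights(ps_demo, arm_demo)
rd = weighted_risk_difference(relapse_free, arm_demo, weights_demo)
```

Patients whose observed treatment was unlikely given their baseline characteristics receive large weights, so extreme propensity scores are usually inspected, and sometimes truncated, before the outcome comparison.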
By contrast, a study from the Danish MS Treatment Register98 found no difference between the effects of natalizumab and fingolimod on clinical disease activity in patients with pretreatment relapse activity that was lower than in the studies above. The two treatment arms (464 patients in each treatment group) were propensity score matched by baseline covariates.
Interestingly, one study has shown that when switching from an injectable therapy to oral agents for reasons of convenience rather than failure of the injectable (that is, in patients with stable disease on the injectable therapy), no short-term improvement was observed in the control of clinical disease activity99. In this study, 396 patients who switched to oral agents were propensity score matched to 396 patients who remained on BRACE therapies. The proportion of patients who experienced at least one relapse in the first 1–6 months did not differ between the treatment arms, and neither did the mean annualized relapse rate or the rate of disability progression.
In combination, these observational studies of relative effectiveness suggest that, in clinical practice, for patients with first-line treatment failure, escalation therapy should be considered rather than switching to another BRACE therapy. They also show that differences between second-line therapies (natalizumab and fingolimod) are more apparent in patients with highly active disease, such as those with prior on-treatment breakthrough relapses, than in patients with relatively low prior disease activity.
DMTs after treatment discontinuation. Natalizumab treatment, although very effective (particularly in patients with highly active RRMS), is associated with an increased risk of progressive multifocal leukoencephalopathy (PML)100. This risk is high in patients who are seropositive for anti-JC virus (JCV) antibodies, and seems to increase with longer treatment exposure. For this reason, patients who are receiving natalizumab are routinely screened for anti-JCV antibodies, and many patients with stable RRMS have to interrupt natalizumab treatment suddenly owing to a positive JCV status. Several observational studies have been conducted to evaluate the risk of RRMS reactivation after discontinuation of natalizumab101,102,103,104, most of which have confirmed a high risk101,102,103. No clear consensus has been reached on how to manage natalizumab discontinuation in order to reduce this risk. To address this issue, several large observational studies have evaluated different switching strategies102,103,105,106,107.
The Italian prospective observational study TY-STOP105 showed continuation of natalizumab to be superior to switching to a first-line BRACE therapy. Three studies have assessed whether switching to fingolimod can prevent disease reactivation102,103,106. A French cohort study102 across 36 tertiary referral centres assessed the occurrence of MS relapses in 333 patients who made a planned switch from natalizumab to fingolimod. The risk of relapse was evaluated during the washout period and during the first 6 months of fingolimod therapy. Relapses occurred in 27% of patients during the washout period and in 20% of patients during the first 6 months of fingolimod therapy. Multivariate analyses showed that the occurrence of a relapse during the washout period was the only significant predictor of a relapse during fingolimod therapy. A washout period of <3 months was associated with a lower risk of relapse.
Similarly, a cohort study106 based on data from the MSBase registry used adjusted binomial regression to compare the risk of relapse in three different patient groups who commenced fingolimod: patients who switched from natalizumab (89 patients, usually to mitigate the risk of PML), patients who switched from BRACE therapies (350 patients), and patients who were naive to treatment (97 patients). The median follow-up period was 10 months. This study showed that the number of relapses in the 6 months before initiation of fingolimod, and a washout period of 2–4 months rather than no washout period, were independent predictors of the time to first relapse after initiation of fingolimod.
An Italian head-to-head study103 used a propensity score matching method108 to demonstrate that switching to fingolimod is superior to switching to BRACE therapies for controlling the risk of relapses after discontinuation of natalizumab. This study involved a prospectively acquired cohort of 433 patients with RRMS. A Poisson regression analysis showed a 64% reduction in relapse incidence with fingolimod. Patients who switched to fingolimod or BRACE were also matched for propensity score on a one-to-one basis at the switching date. Switching to fingolimod was associated with a 48% reduction in the risk of relapse in comparison with switching to BRACE. Sensitivity analyses of subgroups with different washout durations (less than or more than 3 months) confirmed a lower risk of relapses in patients who switched to fingolimod than in patients who switched to BRACE therapies (47% and 59% risk reduction, respectively). The strongest independent factors that influenced the relapse risk after initiation of fingolimod were a washout duration >3 months, the number of relapses experienced during and before natalizumab treatment, and the presence of comorbidities (for example, thyroid dysfunction, allergy, headache and others). The time to 3-month confirmed disability progression, however, did not differ between the two groups.
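Percentage reductions in relapse incidence such as those quoted above are usually derived from a rate ratio. The following sketch shows the arithmetic for a crude (unadjusted) comparison; the relapse counts and person-years are hypothetical values chosen only so that the result equals 64%, and the adjusted Poisson regression used in the study would account for covariates that this calculation ignores.

```python
def incidence_rate(events: int, person_years: float) -> float:
    """Events per person-year of follow-up."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return events / person_years


def rate_ratio(events_a: int, py_a: float, events_b: int, py_b: float) -> float:
    """Crude incidence rate ratio of group A relative to group B."""
    rate_b = incidence_rate(events_b, py_b)
    if rate_b == 0:
        raise ValueError("reference group has no events; ratio undefined")
    return incidence_rate(events_a, py_a) / rate_b


def percent_reduction(ratio: float) -> float:
    """Convert a rate (or hazard) ratio into a percentage reduction."""
    return (1.0 - ratio) * 100.0


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    rr = rate_ratio(events_a=18, py_a=100.0, events_b=50, py_b=100.0)
    print(f"rate ratio = {rr:.2f}, reduction = {percent_reduction(rr):.0f}%")
```

The same conversion applies to hazard ratios from Cox models: a hazard ratio of 0.52, for instance, corresponds to a 48% reduction in risk.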
In a subsequent Swedish cohort study107, the efficacy of rituximab was compared with that of fingolimod in patients with stable RRMS who had to switch from natalizumab owing to anti-JCV antibody positivity. Rituximab was more effective than fingolimod in reducing relapse frequency (a difference of 0.14 relapses per year) and the proportion of patients with gadolinium-enhancing MRI lesions (a difference of 23%). Multivariate logistic and Cox proportional hazards models were used to adjust for possible confounders, including age, sex, disability status, time on natalizumab, washout time, follow-up time and study centre. These models showed a 99% reduction in the risk of gadolinium-enhancing lesions and a 91% reduction in the risk of first relapse in rituximab-treated patients.
Taken together, these studies (Table 5) suggest that highly effective therapies should be considered for patients who are at high risk of disease reactivation owing to discontinuation of natalizumab, and that a washout duration that lasts >1–3 months is not acceptable in clinical practice. The results of these real-world observational studies are closely aligned with the outcomes that would be expected according to the relative effectiveness of the drugs identified in phase III RCTs. This alignment strongly suggests that the techniques used in the observational studies to reduce indication bias — particularly propensity score matching — are highly successful.
Long-term treatment effectiveness
The most important goal of MS therapy is to prevent or delay the development of long-term disability. Over the past decade, many real-world observational studies have been conducted in attempts to assess the long-term effects of the first approved BRACE therapies for the treatment of RRMS patients.
The first observational study to address this issue109 used inverse probability of treatment weighting propensity score-adjusted Cox regression models110 to show that, over the median follow-up period of 5.7 years, severe disability milestones (EDSS scores of 4.0 and 6.0) and conversion to secondary progressive MS were significantly delayed (1.7, 2.2 and 3.8 years, respectively) in patients who received IFN-β relative to those who were untreated.
The results of this study were confirmed in several subsequent real-world observational studies. An Italian study111, in which a Bayesian approach was used112 to adjust comparison groups for treatment imbalances, demonstrated that conversion to secondary progressive MS was delayed in patients who were treated with IFN-β or glatiramer acetate compared with those who were untreated. A Swedish study113 in which time-dependent Cox regression analysis was used to compare patients who received IFN-β (730 patients) with an historical cohort of untreated patients (186 patients) showed a trend towards a positive effect of first-line injectable therapy versus no treatment. A British Columbia cohort study that used time-dependent Cox regression analysis114 revealed a trend towards a benefit of IFN-β treatment on disability progression when treated patients were compared with an historical untreated cohort, but an opposite trend (towards inferiority) when treated patients were compared with a contemporary untreated cohort. The UK Multiple Sclerosis Risk Sharing Scheme115 assessed the long-term effectiveness and cost-effectiveness of IFN-β treatment. The cohort that received treatment was compared with an untreated cohort from British Columbia using two models: a continuous Markov model and a multilevel model116,117. Both models revealed that treated patients progressed more slowly on the EDSS than did untreated controls over a 6-year period, and cost–utility ratios were consistent with cost-effectiveness. Most recently, median quantile regression analysis of data from the MSBase registry has been used to evaluate the long-term effect of BRACE therapies in 2,466 patients over a follow-up period of at least 10 years. The results confirmed that BRACE therapies were independently associated with smaller increases in EDSS scores over 10 years51.
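The Markov property that underlies models of disability progression can be illustrated with a small sketch. Note the simplifying assumptions: the Risk Sharing Scheme analysis used a continuous Markov model, whereas the code below uses a discrete-time chain with yearly steps, and the three disability states and the transition probabilities are entirely hypothetical.

```python
from typing import List, Sequence


def step(dist: Sequence[float], transition: Sequence[Sequence[float]]) -> List[float]:
    """Advance a state distribution by one time step.

    transition[i][j] is the probability of moving from state i to state j;
    the next distribution depends only on the current one (Markov property).
    """
    n = len(transition)
    return [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]


def propagate(dist: Sequence[float], transition: Sequence[Sequence[float]], steps: int) -> List[float]:
    """Apply `step` repeatedly, e.g. once per year of follow-up."""
    current = list(dist)
    for _ in range(steps):
        current = step(current, transition)
    return current


# Hypothetical yearly transition probabilities between three disability bands
# (EDSS <4.0, EDSS 4.0-5.5, EDSS >=6.0); not estimated from any real cohort.
EXAMPLE_TRANSITIONS = [
    [0.90, 0.10, 0.00],
    [0.00, 0.85, 0.15],
    [0.00, 0.00, 1.00],
]

if __name__ == "__main__":
    start = [1.0, 0.0, 0.0]  # everyone begins in the mildest band
    print(propagate(start, EXAMPLE_TRANSITIONS, steps=6))
```

Comparing the distributions obtained under treated and untreated transition matrices is, in outline, how such models can express slower progression on the EDSS.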
Risk of bias
No real-world observational studies are free from criticism3,4, highlighting the difficulty of eliminating biases (Table 1), even with rigorous statistical analysis. The most likely sources of bias are the choice of control groups and the way in which the treated and untreated groups are assembled. In concurrent cohort studies, selection bias can lead to untreated control groups having better prognoses than treated groups51,109,111,114, whereas in a design that includes an historical control cohort, selection bias can amplify the treatment effect, favouring the treated groups113,115. Indeed, patients in historical control groups are likely to have had a worse prognosis than that of modern patients, owing to improvements in diagnostic technologies that enable earlier diagnosis, improvements in supportive care over time, and an increase in the diagnosis of patients with a benign course. An example of these opposing biases (one that masks the treatment effect and one that amplifies it) can be seen in the British Columbia cohort study discussed above114.
Another potential shortfall of real-world observational studies is immortal time bias, which can arise when the period between cohort entry and initial exposure to a drug (during which the outcome of interest has not occurred) is excluded from the analysis. The first observational study of IFN-β treatment described above109 was criticized for being at risk of immortal time bias, although sensitivity analyses that addressed the potential for residual bias as a result of unmeasured confounders showed the results to be stable under different scenarios. Despite these limitations, the majority of observational studies51,109,111,113 of long-term treatment outcomes conducted to date have consistently demonstrated that early and sustained treatment reduces the risk of long-term disability progression in RRMS.
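The mechanism of immortal time bias described above can be shown with a toy calculation. In the sketch below, the naive analysis discards the waiting period between cohort entry and treatment start, whereas the time-dependent analysis counts that period as untreated person-time. The `Patient` structure, the function name and all follow-up values are hypothetical, and the code assumes that any event in an eventually treated patient occurs after treatment start.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Patient:
    follow_up_years: float            # cohort entry to event or censoring
    treatment_start: Optional[float]  # years after entry; None if never treated
    event: bool                       # outcome observed at end of follow-up


def event_rates(patients: List[Patient], time_dependent: bool) -> Tuple[float, float]:
    """Return (untreated_rate, treated_rate) in events per person-year."""
    untreated_time = treated_time = 0.0
    untreated_events = treated_events = 0
    for p in patients:
        if p.treatment_start is None:
            untreated_time += p.follow_up_years
            untreated_events += int(p.event)
        else:
            treated_time += p.follow_up_years - p.treatment_start
            treated_events += int(p.event)
            if time_dependent:
                # The event-free waiting period belongs to the untreated state.
                untreated_time += p.treatment_start
    return untreated_events / untreated_time, treated_events / treated_time


if __name__ == "__main__":
    # Hypothetical cohort, for illustration only.
    cohort = [
        Patient(10.0, 2.0, True),
        Patient(10.0, 4.0, False),
        Patient(5.0, None, True),
        Patient(5.0, None, False),
    ]
    print("naive:         ", event_rates(cohort, time_dependent=False))
    print("time-dependent:", event_rates(cohort, time_dependent=True))
```

With these made-up numbers, the naive analysis gives an untreated rate of 0.10 versus a treated rate of about 0.071 events per person-year, suggesting a benefit, whereas the time-dependent analysis gives an untreated rate of 0.0625, so the apparent benefit disappears. Time-dependent Cox regression addresses the same problem within a regression framework.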
An important but as yet unresolved clinical challenge is the amelioration of disability accrual in progressive MS phenotypes. Potential reasons for the lack of evidence that DMTs are effective in progressive MS include the fact that RCTs have relatively short follow-up periods (typically 2–3 years) and the fact that we have no universal definition of progressive MS, especially secondary progressive MS. Use of large MS data sets holds the promise of addressing these limitations by providing long-term clinical follow-up data, defining reliable disability outcomes118, and enabling development of a unified clinical definition of secondary progressive MS that is informed by data119. Large numbers of patients with secondary progressive MS will continue to receive DMTs in clinical practice, thereby presenting the opportunity for comparisons of treatment effects in secondary progressive MS.
Safety of DMTs
Rare, delayed and/or long-term adverse effects of new drugs for the treatment of MS are unlikely to be established during most RCTs. The challenges involved in monitoring drugs once approved are numerous, and many questions are being raised about the efficacy and efficiency of currently used tools for detection of adverse events (largely spontaneous reporting of adverse events).
In the past 5–10 years, several national and international drug registries have been implemented to monitor the safety of new MS drugs120,121,122,123,124 (Box 2). In future, greater use of MS disease registries for post-marketing drug safety assessment would be desirable. Unlike drug registries, large disease registries can include information not only about products and procedures, but also about patients who receive different treatments or no treatment for the same clinical indications, enabling better evaluation of adverse event rates, consequences of long-term use, and/or effects of various combinations and sequences of treatments. Moreover, use of disease registries could provide a better understanding of how comorbidity alters the effectiveness and safety of DMTs125.
Potential for development
As discussed above, many questions that RCTs are unable to answer can be addressed by studies based on real-world data. However, despite the fact that cohort studies and registries typically include considerable numbers of patients, analyses are often limited by poor statistical power owing to insufficient numbers of patients. The small numbers are explained by several factors: the large number of DMTs available for MS and the resulting large number of treatment combinations; the need to correct or match for many variables to avoid bias; and the frequent problem of missing data as a result of centres that do not pass quality control steps. One obvious solution to this problem is to merge data from several sources.
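The link between patient numbers and statistical power can be sketched with a standard normal approximation for comparing two proportions (for example, the proportions of patients who relapse on two treatments). The function name and the example proportions are hypothetical, and the approximation ignores the negligible contribution of the opposite tail of the two-sided test.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1: float, p2: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided test comparing two proportions.

    Uses the normal approximation with equal group sizes.
    """
    if not (0.0 < p1 < 1.0 and 0.0 < p2 < 1.0):
        raise ValueError("proportions must lie strictly between 0 and 1")
    if n_per_group <= 0:
        raise ValueError("n_per_group must be positive")
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    se = sqrt(p1 * (1.0 - p1) / n_per_group + p2 * (1.0 - p2) / n_per_group)
    return NormalDist().cdf(abs(p1 - p2) / se - z_crit)


if __name__ == "__main__":
    # Hypothetical relapse proportions of 30% versus 20%.
    for n in (100, 250, 500):
        print(n, round(power_two_proportions(0.30, 0.20, n), 2))
```

Under these assumptions, a matched subgroup of 100 patients per arm would have well under 50% power to detect the difference, whereas 500 per arm would exceed 90%, which illustrates why pooling data across registries is attractive.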
Over the past 3 years, five leading MS databases and registries have been working together to explore opportunities for data sharing. Combined, these five sources (the Italian, Danish and Swedish MS registries, the French Observatoire Français de la Sclérose en Plaques network and the international MSBase) collect longitudinal data on >150,000 patients with MS. To date, this so-called BigMSData group has identified and agreed on a minimal set of parameters and initiated three pilot projects with joint data. The group expects to prove that data sharing between MS databases and registries is both possible and scientifically rewarding.
Moreover, MS databases and registries are expected to gradually merge with other data sources, such as administrative databases (for example, records of hospital admissions, deaths, social security, employment and income), MRI databases and gene sequencing data sets, allowing big data algorithms to be built for multiple research purposes. In the near future, socioeconomic parameters, patient-reported outcomes, quantitative MRI metrics (for example, volumetric MRI and automated detection of new T2-weighted lesions) and biomarkers (including genetic and neurodegenerative markers, such as neurofilament levels in cerebrospinal fluid) are likely to be integrated into clinical practice at large MS centres. Analyses of such large data sets will have a crucial role in the development of new healthcare innovations126, will help to improve the efficiency of research and clinical trials, and will contribute to the development of new tools that will enable physicians to deliver personalized medicine.
Pragmatic trials, or point-of-care trials (PCTs), are an alternative to RCTs, and raising awareness of their potential within the scientific community and among practitioners, policy makers and patients could help to maximize the benefits of real-world data. PCTs are effectively RCTs that are embedded into usual care, so have the advantage of being conducted in real-world settings while retaining the strengths of the experimental approach, including randomization127,128,129. In PCTs, comparators are active treatments of either the same or a different class as the treatment of interest. Compared with conventional RCTs, PCTs also involve broader inclusion criteria, include larger and more diverse patient populations, and consider a wider spectrum of clinically relevant health outcomes, most of which are patient-oriented. As a result, PCTs overcome many limitations of both RCTs and nonrandomized observational studies, producing results that can be generalized and applied in routine clinical practice settings130,131. Several obstacles currently hinder PCTs (technical issues and issues related to research governance procedures), but once these have been overcome, this approach can be fully exploited to provide the best method for real-world comparisons of alternative therapies.
Real-world observational studies in MS that are based on large clinical data sets collected in daily practice offer the scientific community and patients many benefits, and can address issues that are difficult or impossible to study with RCTs. In the past decade, studies of this kind have provided good-quality findings that are useful for directing and improving MS intervention strategies in daily practice (Box 3). Moreover, the majority of these studies have produced findings that correlate well with those of clinical trials, providing reassurance against criticisms of their overall validity. Ongoing development and application of new statistical methods for optimizing the validity and reliability of modelling in MS cohorts, while minimizing the impacts of confounding, bias and heterogeneity, is providing MS researchers with a promising set of tools that will enable more sophisticated and relevant analysis of 'big data' from MS registries worldwide.
We searched PubMed for papers published between January 2007 and June 2016 using the terms “multiple sclerosis”, “observational study”, “treatment effectiveness” and “treatment safety” in combination with “retrospective cohort study” or “prospective cohort study” and/or “registry” or “database”. The search was limited to English, full-length articles in the peer-reviewed medical literature. Reference lists of identified papers were searched for further relevant articles. Papers were selected for inclusion on the basis of their originality and relevance to the topic, impact of the journal in which they were published, and critical reading by the authors. The aim was to review the most influential work rather than all work that has been done.
- Prognostic nomograms
Graphical prediction tools designed to assess the risk of a future event based on specific patient and disease characteristics.
- Least absolute shrinkage and selection operator (LASSO) procedure
A regression analysis method that performs both variable selection and regularization (shrinkage of coefficients) to enhance the prediction accuracy and interpretability of the statistical model.
- Bayesian hierarchical metaregression model
A metaregression is a meta-analysis designed to assess factors associated with the size of the treatment effect; Bayesian hierarchical modelling allows estimation of the parameters of the metaregression.
- Inverse probability of treatment weighting
A weighting method that uses propensity scores to derive a synthetic sample within which the distribution of baseline prognostic confounding variables is independent of the treatment assignment; the weight given to a patient is the inverse of the probability that he or she would receive the treatment that he or she actually did receive.
- Progressive multifocal leukoencephalopathy
A viral encephalitis caused by JC virus, predominantly involving white matter and reported in patients being treated with certain immunosuppressive and immunomodulatory therapies.
- Bayesian approach
A method of statistical inference that allows prior information about a population parameter to be combined with evidence from a sample to guide the statistical inference process.
- Continuous Markov model
A model used in economics that is based on a stochastic process with the Markov property, which defines serial dependence between adjacent periods only; the model can be used to describe systems in which the next event depends only on the current state of the system.
- Multilevel model
A statistical model in which parameters vary at more than one level. Observational studies in which many observations are made per subject include two levels of variability: the variability between subjects and the variability within each subject over time.
- Cost–utility ratios
The outcomes of cost–utility analysis, a form of financial analysis used to guide decisions. The cost–utility ratio estimates the ratio between the cost of a health-related intervention and the benefit it produces in terms of the number of years lived in full health by the beneficiaries.
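As a minimal worked example of the cost–utility ratio defined above, the sketch below divides the incremental cost of an intervention by the incremental benefit it produces, expressed in quality-adjusted life years (QALYs). The function name and the figures are hypothetical and are not drawn from the UK Risk Sharing Scheme.

```python
def cost_utility_ratio(incremental_cost: float, incremental_qalys: float) -> float:
    """Cost per QALY gained for an intervention relative to its comparator."""
    if incremental_qalys <= 0:
        raise ValueError("incremental_qalys must be positive for a meaningful ratio")
    return incremental_cost / incremental_qalys


if __name__ == "__main__":
    # Hypothetical values: 30,000 extra cost units for 1.5 extra QALYs per patient.
    print(cost_utility_ratio(30_000.0, 1.5))  # 20000.0
```

The resulting cost per QALY is typically compared against a willingness-to-pay threshold set by the relevant health-care payer.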