Review Article | Published:

Treatment decisions in multiple sclerosis — insights from real-world observational studies

Nature Reviews Neurology volume 13, pages 105–118 (2017)


The complexity of multiple sclerosis (MS) treatment means that doctors and decision-makers need the best available evidence to make the best decisions for patient care. Randomized controlled trials (RCTs) are accepted as the gold standard for assessing the efficacy and safety of any new drug, but conclusions of these trials do not always aid in daily decision-making processes. Indeed, RCTs are usually conducted in ideal conditions, so can measure efficacy only in restricted and unrepresentative populations. In the past decade, a growing number of MS databases and registries have started to produce long-term outcome data from large cohorts of patients with MS treated with disease-modifying therapies in real-world settings. Such observational studies are addressing issues that are otherwise difficult or impossible to study. In this Review, we focus on the most recently published observational studies designed to identify predictors of poor outcome and treatment response or failure, and to evaluate the relative and long-term effectiveness of currently used MS treatments. We also outline the statistical approaches that are most commonly used to reduce bias and limitations in these studies, and the challenges associated with the use of 'big MS data' to facilitate the implementation of personalized medicine in MS.

Key points

  • The repertoire of disease-modifying therapies for relapsing–remitting multiple sclerosis (MS) has broadened greatly in the past decade

  • Evidence-based recommendations from randomized clinical trials are insufficient to guide choices between most available MS drugs

  • The combination of increasing worldwide availability of and access to large MS registries and databases and the growing ability to share and analyse large datasets is enabling real-world observational studies to be conducted

  • Observational real-world studies are providing insights into predictors of MS treatment response, comparative effectiveness of disease-modifying therapies, and long-term treatment effectiveness that is useful for directing daily clinical practice

  • Several new statistical methods are available, and continue to evolve, to minimize biases and limitations of real-world observational studies, thereby optimizing their validity and reliability

  • In future, datasets from individual MS databases and registries should be aggregated into big data algorithms to develop new tools that will enable the implementation of personalized medicine


The past decade has seen extraordinary progress in the treatment of relapsing–remitting multiple sclerosis (RRMS). To date, 12 disease-modifying therapies (DMTs) have been approved for RRMS on the basis of their efficacy in randomized controlled trials (RCTs). Nevertheless, uncertainty in daily treatment decisions remains common because although RCTs can establish the efficacy of an intervention under ideal conditions1,2, they do not necessarily provide the best indication of its effectiveness in real-world practice.

The patients who are included in RCTs are not representative of real-world patient populations, mainly because these trials have to comply with restrictions in terms of participant age and comorbidities, and require participants to have high levels of disease activity at baseline. The limited follow-up periods of RCTs also make them unsuitable for evaluating long-term efficacy and safety outcomes of treatment3,4. Furthermore, RCTs are not routinely used to compare different drugs5 so provide no evidence to guide the choice between most drugs that are available for RRMS, and cannot establish the therapeutic value of most new drugs beyond that of existing drugs6. The increasing worldwide availability of and access to several large MS cohort studies and registries7,8,9,10,11,12,13 combined with a growing ability to collect, share and analyse large amounts of data14 are facilitating real-world observational studies that offer many advantages over RCTs15,16.

Real-world observational studies include large populations of patients who might benefit from a given treatment, as well as subgroups that are not typically included in initial RCTs. In addition, the follow-up periods in real-world observational studies can be sufficiently long to assess delayed risks and long-term benefits of a drug and treatment combinations and/or sequences. These studies also enable comparisons of effectiveness and safety among the increasing number of DMTs for MS, and the characterization of prognostic subgroups of patients with a view to developing personalized treatment strategies. Finally, observational studies are invaluable for capturing patient-reported and patient-centred outcomes that are expected to become increasingly important criteria for regulatory and pricing and reimbursement decisions17. Today more than ever, therefore, patients, physicians, industry and policy makers have an active interest in real-world observational studies, as they have the potential to answer the questions that are most relevant to daily treatment decision-making.

Despite the many potential benefits of real-world observational studies, they are subject to biases that must not be overlooked (Table 1). However, advances in statistical methodology have mitigated the impact of these biases (Table 1), and the findings of rigorous observational studies correspond well with those of clinical trials across the medical literature18.

Table 1: Methods to adjust for bias in real-world studies

Here, we first review the statistical methods that are most commonly used to reduce bias in observational studies before providing an overview of real-world observational studies published in the past 10 years that were based on data acquired from large MS cohort studies and registries, and which we consider to provide useful information for directing MS intervention strategies in daily practice. In particular, we focus on studies that have aimed to predict poor outcomes of MS and treatment responses or failure, and to compare the long-term effectiveness and safety of DMTs in current use. Finally, we consider the future challenges in using so-called big MS data.

Methodology of real-world studies

Statistical methods

Observational studies have great potential for real-world comparisons of treatment efficacy and long-term outcomes, but come with a variety of analytical challenges3,4. Tracking large cohorts brings the advantages of power and generalizability, but these advantages can be offset by confounding and bias that result from differences in prognostic correlates between comparison groups of interest. In clinical trials, randomization balances treatment arms to avoid confounding, especially as a result of factors that are not measurable, whereas uncorrected bias in observational cohorts can limit the ability to attribute an observed difference to the treatment rather than selection effects, even within large data sets. However, a number of continuously evolving methodological and statistical strategies can, when appropriately applied to a sufficiently large data set, minimize the impact of biases and limitations, leading to more-reliable and replicable estimates of effects (Table 1). Some of these methods are discussed in this section.

Propensity score adjustment. An increasingly popular statistical approach to observational studies is propensity score adjustment via matching, stratification or weighting19,20,21,22. Propensity score adjustment has become prominent in the medical literature over the past 15–20 years, but before that time had been widely used in health economics23,24,25,26,27 and the analysis of large administrative health care and claims data sets28.

Propensity score adjustment involves matching a group of interest — typically a treatment cohort — to one or more comparator groups on the basis of similarities in patient, disease and paraclinical factors at baseline. In MS, these factors are often prognostic correlates of typical study end-points, such as relapse rate or disability progression. The propensity score is calculated for each patient using a multivariate model (typically a logistic regression), in which treatment assignment is the dependent variable, defined as a function of the baseline covariates which are the independent variables. The score for each patient estimates the probability of being treated or untreated according to baseline characteristics. Once the propensity score has been estimated, it can be used to create matched cohorts or to adjust the comparison of the primary outcome between treatment arms.
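The two steps described above (estimating the score, then matching on it) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any study cited in this Review: the function names, the gradient-descent settings, the caliper of 0.05 and the greedy 1:1 matching order are all hypothetical choices, and the baseline covariates are assumed to be standardized.

```python
import math

def fit_logistic(X, treated, lr=0.1, epochs=2000):
    """Fit P(treated | covariates) by plain gradient descent.

    X is a list of covariate rows, treated a list of 0/1 flags.
    Returns [intercept, coef_1, ..., coef_p].
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for row, t in zip(X, treated):
            z = beta[0] + sum(b * x for b, x in zip(beta[1:], row))
            err = 1.0 / (1.0 + math.exp(-z)) - t
            grad[0] += err
            for j, x in enumerate(row):
                grad[j + 1] += err * x
        beta = [b - lr * g / n for b, g in zip(beta, grad)]
    return beta

def propensity(beta, row):
    """Estimated probability of being treated, given baseline covariates."""
    z = beta[0] + sum(b * x for b, x in zip(beta[1:], row))
    return 1.0 / (1.0 + math.exp(-z))

def match_nearest(scores, treated, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    Returns (treated_index, control_index) pairs whose propensity
    scores differ by no more than the caliper.
    """
    controls = {i for i, t in enumerate(treated) if t == 0}
    pairs = []
    for i, t in enumerate(treated):
        if t == 1 and controls:
            j = min(controls, key=lambda c: abs(scores[c] - scores[i]))
            if abs(scores[j] - scores[i]) <= caliper:
                pairs.append((i, j))
                controls.remove(j)
    return pairs
```

In practice, the scores would be computed as `[propensity(beta, row) for row in X]` and passed to `match_nearest`; treated patients without a control inside the caliper are left unmatched, which trades sample size for better balance.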

The ability to correct imbalances between treatment arms, at least with respect to the variables used to derive the propensity score, makes propensity score adjustment an attractive option for head-to-head treatment comparisons in real-world observational cohorts. Furthermore, when applied to 'big data', propensity score adjustments can be used to expand conventional two-way treatment comparisons into multi-arm comparisons that are better suited to analysis of the increasing range of treatment options available in the modern MS treatment era.

Unlike randomization in clinical trials, propensity score adjustment does not account for unmeasured confounding. Methods are available for estimating the minimum effect size of an unmeasured confounder that would indicate that apparent treatment effects are in fact attributable to selection effects29,30, but this approach is not equivalent to adjusting the analysis for imbalance. Indeed, all methods used to quantify, identify and, where possible, correct for bias in observational studies require complete or near-complete data collection, underlining the importance of recording all factors relevant to disease and treatment in any big MS data repository or registry-based cohort. Furthermore, a propensity score adjustment to balance comparator treatment arms is typically implemented at a single point in time, such as treatment initiation; as a result, the technique ensures a good balance of prognostic correlates at baseline, but does not control for systematic differences introduced into the sample after enrolment.

Marginal structural modelling. Recent developments in marginal structural modelling (MSM) provide one potential solution to the problem of simultaneously managing time-invariant (baseline only) and time-varying confounding, particularly in long-term (>5-year) observational MS cohorts31,32,33. MSM is an extension of conventional propensity score adjustment in which treatment comparison groups are adjusted for systematic differences in critical confounders at baseline and during the observation period. This approach can more effectively manage overall confounding and, consequently, more effectively isolate true treatment effects.

MSM can be extended to model relapse and progression outcomes across complex MS treatment trajectories, in which patients might switch between multiple therapies over time as a result of insufficient efficacy, tolerability issues or reasons of convenience. Like multi-arm propensity score adjustments, MSM requires a sufficiently large sample for optimal performance.
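Marginal structural models are commonly fitted with inverse probability of treatment weights that are updated at every visit. The fragment below sketches only that weighting step for a single patient. It assumes that the visit-level treatment probabilities have already been estimated elsewhere (for example, with logistic models); the function and argument names are hypothetical.

```python
def stabilized_weights(treated, p_num, p_den):
    """Cumulative stabilized weights for one patient, one value per visit.

    treated: 0/1 treatment actually received at each visit
    p_num:   P(treated at visit | treatment history and baseline covariates)
    p_den:   P(treated at visit | same, plus time-varying confounders)
    """
    weights = []
    w = 1.0
    for a, pn, pd in zip(treated, p_num, p_den):
        # Use the probability of the treatment actually received.
        num = pn if a == 1 else 1.0 - pn
        den = pd if a == 1 else 1.0 - pd
        w *= num / den
        weights.append(w)
    return weights
```

A patient whose treatment at a visit was unlikely given their time-varying confounders receives a weight above 1, so that the weighted cohort resembles one in which treatment was unrelated to those confounders. The outcome model is then fitted on the weighted data.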

Use of 'big MS data'. Use of very large data sets from MS databases and registries — 'big MS data' — can overcome some of the statistical limitations of real-world observational studies. For example, although adjustment methods might perform poorly in insufficiently powered samples, big MS data sets — whether standalone or a combination of data from multiple sources — can be used to test these adjustment methods and validate their suitability for application to MS data. Big MS data can also be used to reliably identify early predictors of poor outcomes, which can then be used in the development of tools such as risk calculators and prognostic nomograms that can identify subsets of patients who would benefit most from timely intervention33.

Moreover, researchers have recently begun to apply the tools that are commonly used in meta-analyses of clinical trials to identify and adjust for heterogeneity, as a means of dealing with differences between combined data from different sources when developing risk prediction tools. For example, in one study, large data sets created by merging data from multiple sources and patient cohorts were used to successfully validate a new framework approach that combines meta-analytical heterogeneity adjustment with conventional multivariate regression for developing clinical prediction models34. This approach has not been used in MS, but has been successfully applied and validated for comparable compilations of observational cohorts in big oncology and cardiology data sets35.

Large data sets can be further exploited to identify treatment responsive subpopulations of patients with MS. In one study, for example, a least absolute shrinkage and selection operator (LASSO) procedure was used to develop a parametric scoring system based on multiple patient and disease factors at baseline, a system that could be used for estimating differences in treatment response between groups of subjects36. Not only can such an approach be useful to the clinician for identifying subsets of patients who might benefit most from a particular treatment, it can also guide patient selection for clinical trials by providing a systematic and data-driven approach to identifying subgroups of patients that are likely to respond well to a treatment, either new or established.
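To illustrate how a LASSO procedure selects a small set of baseline factors, the sketch below minimizes a least-squares loss with an L1 penalty by coordinate descent. It is a generic textbook formulation, not the scoring system of the study cited above. The function names and the number of sweeps are assumptions, and the covariates and outcome are assumed to be centred, with no all-zero columns.

```python
def soft_threshold(value, penalty):
    """Shrink value towards zero by penalty; return 0.0 inside the band."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0

def lasso_coordinate_descent(X, y, penalty, sweeps=200):
    """Minimise (1/2n) * sum((y - X b)^2) + penalty * sum(|b|)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            rho = 0.0   # covariance of feature j with the partial residual
            norm = 0.0  # squared length of feature j
            for row, target in zip(X, y):
                pred_without_j = sum(beta[k] * row[k] for k in range(p) if k != j)
                rho += row[j] * (target - pred_without_j)
                norm += row[j] ** 2
            beta[j] = soft_threshold(rho / n, penalty) / (norm / n)
    return beta
```

Coefficients of weakly informative factors are set exactly to zero, and the remaining non-zero coefficients can serve as the weights of a score. The penalty would normally be chosen by cross-validation.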

Finally, attempts are being made to combine grouped data from RCTs with individual patient data from observational studies, and one study has presented a unified modelling framework for this purpose37. Rather than simply pooling the evidence into an overall estimate of treatment effect after adjusting for potential confounding, the intention of this approach is to explore treatment effects in specific patient populations in the observational cohorts. Data from individual patients from observational cohorts can be used to build prediction models and to identify groups of patients with different levels of risk. A bivariate random effect model can then be used to combine baseline risk and treatment effects from observational cohorts and aggregated clinical trial data. This approach, which is based on a Bayesian hierarchical metaregression model, combines submodels that represent different types of data: that from individual patients and that aggregated from clinical trials.

Maximizing data quality

In multicentre and multinational observational cohort studies and disease registries, ensuring high data quality is a major challenge14. Data from the real world are inevitably less complete and possibly less accurate than data from controlled trials, so quality control is important. Good-quality studies should meet certain criteria for the evidence they provide to be considered reliable38 (Box 1).

Box 1: Assessing the quality of real-world observational studies

Several factors can affect the quality and reliability of real-world observational studies. For that reason, several parameters should be considered when assessing their quality. We used the following criteria to identify most of the studies included in this Review:

• Treatment details and primary outcomes are adequately recorded

• Primary outcomes are appropriate and objectively measured

• Confounders of treatment effect are adequately recorded and taken into account in the analysis

• The statistical methods for reducing bias are properly used

• Sensitivity analyses are used to explore residual confounding

• Study limitations are openly acknowledged and discussed

Data standardization. The most fundamental requirement for maximizing data quality is that all centres and participants involved in a multicentre observational study use the same minimum data set based on the same definitions, with an appropriate training framework in place for people who record the data39. Clinical trials in MS conducted over the past 20 years have helped to promote standardization of basic aspects, such as diagnostic criteria, disease course and definition of attacks, and of short-term outcomes, such as attack rates, disability and MRI parameters. However, consensus is still lacking regarding the choice of instruments for measuring health-related quality of life and patient-reported outcomes, and how best to define and assess long-term outcomes in MS. Improved consensus on these issues is needed, and is likely to contribute to the evaluation of long-term benefits of current treatments and the development of successful treatments for progressive MS.

Data collection. Regardless of the data that are collected, several measures can be taken during the collection process to maximize its quality. First, a computerized, rather than paper-based, collection system should ideally be used to allow direct entry of data locally or online40, as double handling of data (for example, a clinician recording data on paper and office staff transcribing these data into a database) and the logistic complexity of using paper forms are known to double the error rate39. In addition, basic data quality checks can easily be built into data collection software: systems can be programmed so that they will not accept or will query nonsensical or improbable data entries. Automated data quality checks can also be exploited to identify gaps in data collection41. Addition of real-time data analysis tools, such as calculators of likely treatment responses, to data collection software can also enhance the value of engaging with the software for the clinician, thereby encouraging complete data collection.

As data collection ideally takes place in real-time, during consultations, the size of the minimum data set should be kept small. Within the minimum data set, however, core outcome variables should always be recorded to ensure that the absence, and not just the presence, of a given event is recorded. This approach avoids the situation that can arise if only positive events are spontaneously reported; in this case, the 'not reported' category can include absent events and events that occurred but were not reported, with no provision made to distinguish between them.
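The built-in checks described above can be sketched as a small validation routine that queries improbable entries and insists that the absence of an event is recorded explicitly. The field names (`edss`, `visit_date`, `onset_date`, `relapse_since_last_visit`) are hypothetical, and the rule that EDSS lies between 0 and 10 in half-point steps is a simplifying assumption rather than a registry specification.

```python
from datetime import date

def check_visit(record, today):
    """Return a list of data-quality queries for one visit record."""
    problems = []

    edss = record.get("edss")
    if edss is None:
        problems.append("edss missing")
    elif not (0 <= edss <= 10) or (edss * 2) % 1 != 0:
        problems.append("edss outside 0-10 or not a half-point step")

    visit = record.get("visit_date")
    onset = record.get("onset_date")
    if visit is None:
        problems.append("visit_date missing")
    elif visit > today:
        problems.append("visit_date in the future")
    elif onset is not None and onset > visit:
        problems.append("onset_date after visit_date")

    # Absence of a relapse must be entered as False, not left blank.
    if record.get("relapse_since_last_visit") not in (True, False):
        problems.append("relapse status not recorded as yes/no")

    return problems
```

An empty list means that the record passes. A non-empty list could be shown to the clinician at the point of entry, so that the query is resolved during the consultation.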

In multicentre studies, the quality of data from each participating centre should be assessed, and each centre should be able to access its own data quality 'score' and ranking in relation to other centres; this approach would enable identification of systematic gaps in data capture and stimulate friendly competition, thereby driving improvements in data collection42. The presence of data quality metadata (for example, a measure of visit frequency, or of the proportion of records with a complete minimum data set) can also be useful in data analytics, in which they can serve as either covariates or prespecified inclusion or exclusion criteria for analyses or studies (for example, centres that report statistically outlying relapse rates or low visit frequencies could be excluded)43.
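One of the metadata measures mentioned above, the proportion of records with a complete minimum data set, can serve as a simple per-centre score. The sketch below is an illustration only: the record layout, the `centre` key and the function name are assumptions, not features of any particular registry.

```python
def completeness_by_centre(records, required_fields):
    """Score each centre by the share of its records that are complete.

    Returns (scores, ranking): a dict mapping centre to a proportion
    in [0, 1], and a list of centres ordered from best to worst.
    """
    totals = {}
    complete = {}
    for rec in records:
        centre = rec["centre"]
        totals[centre] = totals.get(centre, 0) + 1
        if all(rec.get(field) is not None for field in required_fields):
            complete[centre] = complete.get(centre, 0) + 1
    scores = {c: complete.get(c, 0) / totals[c] for c in totals}
    ranking = sorted(scores, key=scores.get, reverse=True)
    return scores, ranking
```

The same scores could be used as a covariate or as a prespecified exclusion threshold in later analyses.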

Policy and legislation. All large databases and registries are the result of long-standing collaborations between clinical and academic units, and other organizations. Each such network has an elaborate set of rules that govern data ownership and procedures for data export, and which must be respected. In addition, although data merging involves anonymized or pseudonymized data, patient consent is usually needed for scientific use of data, and scientific projects have to be cleared by ethics review boards. In observational studies, it is desirable for patient informed consent to allow a priori use of the pseudonymized data sets for all descriptive analyses and models that use this data set, so that specific analyses do not have to be individually approved. This type of blanket ethical approval and consent minimizes selection bias because, often, the research questions are unknown or unspecified at the time of data collection.

Real-world observational studies in MS

Disease progression and treatment response

The ability to efficiently predict responses to drug treatments before treatment starts would help with matching the right drug to the right patient, thereby minimizing the risk-to-benefit ratio and optimizing cost-effectiveness of treatment. In a long-lasting disease, such as MS, observational studies are crucial for identifying early predictors of poor long-term outcomes44,45 and for gathering information about the safety and efficacy of treatments in patients with comorbidities3,46. Furthermore, the ability to accurately predict mid-term to long-term disability outcomes at various stages of disease is crucial for providing individualized treatment counselling.

Prediction of progression in CIS. Several studies have aimed to identify predictive factors in patients with clinically isolated syndrome (CIS) and indicate that early treatment is beneficial. In one large, single-centre, prospective observational study44, possible risk factors for conversion from CIS to a diagnosis of MS and for accumulation of disability were evaluated in 1,058 patients who were followed up for a mean of 6.5 years. MRI features (number and topography of lesions) were identified as the main prognostic factor at disease onset; demographic characteristics (age and gender) and the presence of oligoclonal bands restricted to the cerebrospinal fluid (CSF) emerged as low-impact and medium-impact prognostic factors, respectively. This analysis also suggested that initiation of a DMT before a second attack can reduce the risk of disability accumulation. The benefit of early and sustained treatment in reducing the risk of disability progression was also found in a larger multicentre observational study of a cohort of 2,000 patients in the MSBase registry with CIS and early MS45, and in a previous observational study of 2,260 patients from the Italian MS registry with definite MS47.

Definite relapsing–remitting MS. Several studies of patients with RRMS have assessed whether clinical activity (relapses and disability progression, defined as a change in Expanded Disability Status Scale (EDSS) score) and/or MRI activity (new gadolinium-enhancing lesions and/or new or enlarging T2-weighted lesions compared with baseline) are valid and early predictors of non-response to DMTs48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63. One study analysed the predictive value of the presence of at least one clinical relapse or disability progression (an increase of 1 in EDSS score confirmed at 6 months) and active MRI lesions (at least three new T2-weighted or gadolinium-enhancing lesions) in patients with RRMS who were treated with IFN-β49. Patients who were positive for at least two of the three criteria after 1 year of treatment had a higher probability of experiencing disability progression or relapse activity during the subsequent 2 years. The isolated presence of relapses or MRI activity, however, did not predict the risk of new clinical activity or disease progression. Development of the so-called modified Rio Score was prompted by the study above56. This score takes into consideration the combination of relapses (one or more than one) and substantial MRI activity (>5 new T2-weighted lesions) over the first year of IFN-β treatment.
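The three-criterion rule from the IFN-β study described above can be written as a short classifier. This sketch encodes only the thresholds stated in the text (at least one relapse, an EDSS increase of at least 1 confirmed at 6 months, and at least three active MRI lesions after 1 year of treatment); it does not implement the modified Rio Score, and the function and argument names are hypothetical.

```python
def criteria_met(relapses, confirmed_edss_increase, active_lesions):
    """Count how many of the three first-year criteria are positive."""
    return sum([
        relapses >= 1,                    # at least one clinical relapse
        confirmed_edss_increase >= 1.0,   # EDSS increase confirmed at 6 months
        active_lesions >= 3,              # new T2 or gadolinium-enhancing lesions
    ])

def higher_risk(relapses, confirmed_edss_increase, active_lesions):
    """True if at least two of the three criteria are positive."""
    return criteria_met(relapses, confirmed_edss_increase, active_lesions) >= 2
```

Under this rule, a patient with isolated relapses or isolated MRI activity is not classified as being at higher risk, which matches the finding reported above.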

More recently, the MRI in MS (MAGNIMS) network studied a large multicentre data set and found that the presence of relapses during the first year of IFN-β therapy was the main predictor of the risk of disability progression over the subsequent 3 years59. Moreover, isolated MRI activity, defined as the presence of at least three new T2-weighted lesions, also predicted an increase in disability. MRI activity alone during the first year of IFN-β treatment has also been associated with a poor clinical outcome in other studies52,53,57,58,59,61, although the exact cutoff associated with a poor outcome has differed between studies57,58,59,61,64.

The fact that different criteria (Table 2) have been proposed to define a suboptimal treatment response, however, highlights the controversy about the degree of clinical or MRI activity required to define a patient as a nonresponder and consequently switch to another treatment. These conflicting results might be explained by confounding factors, such as the timing of the reference scan in relation to initiation of treatment, the ongoing disease activity before the drug became effective, the lack of strict repositioning techniques when re-scanning and interobserver variability55,63.

Table 2: MRI criteria for predicting RRMS IFNβ treatment non-response identified in real-world studies

As therapy becomes more effective, so-called no evidence of disease activity (NEDA) is increasingly being considered as a possible treatment goal. NEDA-3 is a composite score based on a 'zero tolerance' concept and is defined as no relapses, no sustained disability progression (measured with the EDSS), and no new or enlarging T2-weighted or T1 gadolinium-enhancing lesions detected with MRI. The definition of NEDA-3 is evolving to include brain atrophy (NEDA-4)65,66, as a meta-analysis demonstrated that the combination of focal T2-weighted lesion load over 2 years and whole brain volume loss in the second year explained 75% of the variance of disability progression among patients who were on DMTs over a period of 2 years (a greater proportion than that produced by each of these measures alone)67. In future, other metrics, such as patient-reported outcome measures or fluid biomarkers (for example, cerebrospinal fluid neurofilament levels) could also be incorporated into the definition of NEDA68,69. However, whether NEDA persists in patients who achieve it, and whether it accurately predicts long-term prognosis, is currently controversial70.
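The 'zero tolerance' definition of NEDA-3 given above reduces to a conjunction of three conditions over the observation period. The following sketch makes that explicit; the function and argument names are assumptions, and the way sustained progression is confirmed is left to the caller.

```python
def has_neda3(relapses, sustained_edss_progression,
              new_or_enlarging_t2, gd_enhancing):
    """True only if no clinical or MRI disease activity was recorded.

    relapses, new_or_enlarging_t2 and gd_enhancing are counts;
    sustained_edss_progression is a boolean flag.
    """
    no_relapses = relapses == 0
    no_progression = not sustained_edss_progression
    no_mri_activity = new_or_enlarging_t2 == 0 and gd_enhancing == 0
    return no_relapses and no_progression and no_mri_activity
```

A single new T2-weighted lesion is enough to lose NEDA-3 status, which is one reason why less stringent targets are also discussed.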

Clinical insights. Despite the uncertainties left by the observational studies in CIS and definite RRMS, they provide useful information for directing MS intervention strategies in daily practice. First, they have confirmed findings of RCTs that DMTs should be initiated in patients with CIS or early MS to maximize their impact on disease evolution71,72,73,74,75,76,77,78. In this context, the number and topography of brain MRI lesions are the best currently available prognostic factor for guiding treatment choice. Second, they have demonstrated that the combination of clinical relapses and MRI activity seems to be the best tool for defining nonresponders to DMTs, whereas the relevance of minimal MRI activity alone is controversial. In real-world settings, minimal degree of disease activity (MEDA) might be a more realistic goal than NEDA. An important weakness of these observational studies, however, is that the vast majority are based on patients receiving IFN-β treatment, and similar data from cohorts of patients receiving other DMTs are scarce55,79,80. This shortfall highlights the great need for large data sets of real-world data with long-term follow-up periods for each therapy.

Comparison of disease-modifying therapies

Logistical and ethical concerns mean that RCTs cannot provide evidence about the comparative efficacy of DMTs. In the past decade, several observational studies that were based on data sets from large MS cohort studies or registries and used high-quality statistical approaches have addressed some of the most crucial questions about relative effectiveness of MS drugs in current use.

DMTs in treatment-naive MS patients. First-line injectable immunomodulators for MS are glatiramer acetate, subcutaneous IFN-β1a, intramuscular IFN-β1a, and subcutaneous IFN-β1b (known collectively as BRACE, an acronym of their trade names). A network meta-analysis of data from RCTs has suggested that effectiveness in controlling relapse activity differs between BRACE therapies when used in treatment-naive patients81. In a real-world observational study82, six head-to-head comparisons of the four injectable therapies were performed, using propensity score matching and paired mixed models83, in a population of 3,326 treatment-naive patients with RRMS from the MSBase registry. A slightly lower relapse rate was observed among patients who were treated with glatiramer acetate or subcutaneous IFN-β1a than among patients who were treated with intramuscular IFN-β1a or IFN-β1b, in keeping with the results of most of the clinical trials. However, no discernible difference in disability outcomes was seen at the end of the 12-month follow-up period. The study design controlled for indication, selection and attrition bias, and unidentified confounders.

Unsurprisingly, use of real-world data to compare BRACE therapies with second-line therapies84 revealed marked differences in their relative effectiveness in treatment-naive patients with active RRMS (≥1 gadolinium-enhancing lesion on cerebral MRI at baseline or ≥1 relapse within the 12 months before baseline). A propensity score-matched analysis85 (366 patients per group) revealed superiority of natalizumab over BRACE therapies, with a difference of 0.4 relapses per year between the groups84. The difference in confirmed disability progression, however, was not significant. This study provided Class IV evidence that first-line use of natalizumab rather than IFN-β or glatiramer acetate for RRMS improves relapse outcomes.

The results from these studies (Table 3) indicate that the differences between BRACE therapies have only limited clinical relevance to reduction of disease activity, and no discernible relevance to short-term or mid-term disability outcomes. Moreover, they suggest that more-effective but riskier therapeutic strategies86, such as use of natalizumab, should be considered for treatment-naive patients with high levels of pretreatment disease activity.

Table 3: Real-world studies comparing disease-modifying therapies in treatment-naive patients with RRMS

DMTs after treatment failure. MS relapses are associated with accumulation of disability87,88,89, and on-treatment relapses herald a particularly poor prognosis in this respect51. Optimization of immunomodulatory therapy in patients with suboptimal disease control is, therefore, likely to improve long-term disease outcomes. Evidence from RCTs suggests that escalation to second-line DMTs (natalizumab or fingolimod) after treatment failure of first-line BRACE therapies is likely to provide better control of MS activity than switching therapy to a second drug in the same class90,91. Several real-world observational studies (Table 4) that used data from MS registries with propensity score matching analysis techniques have directly compared the effects of treatment switching and treatment escalation.

Table 4: Real-world studies comparing disease-modifying therapies after treatment failure

Two of these studies showed that in patients who had experienced relapses and/or progression of disability during treatment with injectable therapies, escalation of treatment to natalizumab or fingolimod reduced relapses by 0.38 and 0.1 relapses per year, respectively, and the risk of disability progression by 26% and 47%, respectively, compared with switching to another injectable therapy92,93. In both studies, measures of relapse and disability were compared between propensity score-matched treatment arms (869 patients on natalizumab versus 869 on BRACE therapies, and 148 patients on fingolimod versus 379 on BRACE therapies); these comparisons were performed using a Cox Marginal Model and negative binomial models or frailty proportional hazards models adjusted for magnetic resonance imaging variables94. Sensitivity analyses confirmed that the results were sufficiently stable under different inclusion and matching criteria and unlikely to be influenced by unmeasured factors.

Observational studies have also provided direct comparisons of the benefits of escalation to natalizumab or fingolimod. In two direct head-to-head comparisons in patients with RRMS95,96 who did not respond to first-line therapies, escalation to natalizumab resulted in a lower relapse rate (by 0.2 relapses per year)95,96 and a higher probability of disability regression (by 180%)95, than did escalation to fingolimod. Both studies involved large, prospectively acquired cohorts (578 and 204 patients, respectively), and propensity score matching was used to select subpopulations with comparable baseline characteristics. Moreover, in the second study96, in the group that received natalizumab, a lower percentage of patients exhibited MRI activity, and a higher percentage of patients achieved NEDA-3 after 2 years of follow-up.

Natalizumab also proved superior to fingolimod in a French multicentre cohort study that involved a broad spectrum of patients with RRMS who exhibited EDSS scores of 0–5.5, a wide range of disease durations and variable prior disease activity (during first-line therapy in 88.5%)97. A brain MRI performed within the year before treatment initiation was available for all patients. The study included 326 patients who received natalizumab and 303 patients who received fingolimod. Two different methods were used for statistical analysis: logistic regression and propensity score inverse probability of treatment weighting. In the natalizumab-treated group, 10% more patients remained relapse-free over the initial 2 years of treatment than in the fingolimod-treated group, and 20% more patients exhibited no radiological signs of MS activity (gadolinium-enhancing lesions or new T2-weighted lesions). This study provides Class IV evidence that for patients with RRMS, natalizumab decreases the proportion of patients who experience at least one relapse within the first year of treatment to a greater extent than does fingolimod.
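Inverse probability of treatment weighting can be sketched as follows. Each patient is weighted by the inverse of the estimated probability of receiving the treatment that he or she actually received, so that the weighted arms resemble the whole cohort. The example assumes precomputed propensity scores and a binary relapse-free outcome; all records are hypothetical and the function names are illustrative, not those of the cited study.

```python
from typing import List, Tuple

# Each record: (treated flag, propensity score, relapse-free flag).
Record = Tuple[int, float, int]


def iptw_weight(treated: int, propensity: float) -> float:
    """Inverse probability of treatment weight for one patient."""
    return 1.0 / propensity if treated else 1.0 / (1.0 - propensity)


def weighted_proportion(records: List[Record], arm: int) -> float:
    """Weighted proportion of relapse-free patients in one treatment arm."""
    weighted_successes = 0.0
    total_weight = 0.0
    for treated, propensity, relapse_free in records:
        if treated == arm:
            weight = iptw_weight(treated, propensity)
            weighted_successes += weight * relapse_free
            total_weight += weight
    return weighted_successes / total_weight


# Hypothetical records; propensity = estimated probability of receiving arm 1.
records: List[Record] = [
    (1, 0.50, 1), (1, 0.25, 1), (1, 0.80, 0),
    (0, 0.50, 1), (0, 0.20, 0), (0, 0.75, 0),
]
p_arm1 = weighted_proportion(records, arm=1)
p_arm0 = weighted_proportion(records, arm=0)
print(round(p_arm1 - p_arm0, 3))
```

Unlike matching, weighting retains every patient, but patients with propensity scores close to 0 or 1 receive very large weights, which is why weights are often inspected or stabilized in practice.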

By contrast, a study from the Danish MS Treatment Register98 found no difference between the effects of natalizumab and fingolimod on clinical disease activity in patients with pretreatment relapse activity that was lower than in the studies above. The two treatment arms (464 patients in each treatment group) were propensity score matched by baseline covariates.

Interestingly, one study has shown that when switching from an injectable therapy to oral agents for reasons of convenience rather than failure of the injectable (that is, in patients with stable disease on the injectable therapy), no short-term improvement was observed in the control of clinical disease activity99. In this study, 396 patients who switched to oral agents were propensity score matched to 396 patients who remained on BRACE therapies. The proportion of patients who experienced at least one relapse in the first 1–6 months did not differ between the treatment arms, and neither did the mean annualized relapse rate or the rate of disability progression.

In combination, these observational studies of relative effectiveness suggest that, in clinical practice, for patients with first-line treatment failure, escalation therapy should be considered rather than switching to another BRACE therapy. They also show that differences between second-line therapies (natalizumab and fingolimod) are more apparent in patients with highly active disease, such as those with prior on-treatment breakthrough relapses, than in patients with relatively low prior disease activity.

DMTs after treatment discontinuation. Natalizumab treatment, although very effective (particularly in patients with highly active RRMS), is associated with an increased risk of progressive multifocal leukoencephalopathy (PML)100. This risk is high in patients who are seropositive for anti-JC virus (JCV) antibodies, and seems to increase with longer treatment exposure. For this reason, patients who are receiving natalizumab are routinely screened for anti-JCV antibodies, and many patients with stable RRMS have to interrupt natalizumab treatment suddenly owing to a positive JCV status. Several observational studies have been conducted to evaluate the risk of RRMS reactivation after discontinuation of natalizumab101,102,103,104, most of which have confirmed a high risk101,102,103. No clear consensus has been reached on how to manage natalizumab discontinuation in order to reduce this risk. To address this issue, several large observational studies have evaluated different switching strategies102,103,105,106,107.

The Italian prospective observational study TY-STOP105 proved continuation of natalizumab to be superior to switching to a first-line BRACE therapy. Three studies have assessed whether switching to fingolimod can prevent disease reactivation102,103,106. A French cohort study102 across 36 tertiary referral centres assessed the occurrence of MS relapses in 333 patients who made a planned switch from natalizumab to fingolimod. The risk of relapse was evaluated during the washout period and during the first 6 months of fingolimod therapy. In total, 27% of patients experienced a relapse during the washout period, and 20% experienced a relapse during the first 6 months of fingolimod therapy. Multivariate analyses showed that the occurrence of a relapse during the washout period was the only significant predictor of a relapse during fingolimod therapy. A washout period of <3 months was associated with a lower risk of relapse.

Similarly, a cohort study106 based on data from the MSBase registry used adjusted binomial regression to compare the risk of relapse in three different patient groups who commenced fingolimod: patients who switched from natalizumab (89 patients, usually to mitigate the risk of PML), patients who switched from BRACE therapies (350 patients), and patients who were naive to treatment (97 patients). The median follow-up period was 10 months. This study showed that the number of relapses in the 6 months before initiation of fingolimod, and a washout period of 2–4 months rather than no washout period, were independent predictors of the time to first relapse after initiation of fingolimod.

An Italian head-to-head study103 used a propensity score matching method108 to demonstrate that switching to fingolimod is superior to switching to BRACE therapies for controlling the risk of relapses after discontinuation of natalizumab. This study involved a prospectively acquired cohort of 433 patients with RRMS. A Poisson regression analysis showed a 64% reduction in relapse incidence with fingolimod. Patients who switched to fingolimod or BRACE were also matched for propensity score on a one-to-one basis at the switching date. Switching to fingolimod was associated with a 48% reduction in the risk of relapse in comparison with switching to BRACE. Sensitivity analyses of subgroups with different washout durations (less than or more than 3 months) confirmed a lower risk of relapses in patients who switched to fingolimod than in patients who switched to BRACE therapies (47% and 59% risk reduction, respectively). The strongest independent factors that influenced the relapse risk after initiation of fingolimod were a washout duration >3 months, the number of relapses experienced during and before natalizumab treatment, and the presence of comorbidities (for example, thyroid dysfunction, allergy, headache and others). The time to 3-month confirmed disability progression, however, did not differ between the two groups.
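A percentage reduction in relapse incidence of this kind corresponds to an incidence rate ratio. The sketch below computes an unadjusted rate ratio from relapse counts and person-years, which is the quantity that a Poisson model with a single binary treatment term and a person-time offset would estimate; the published analysis may have included further covariates. The counts are hypothetical and were chosen only to reproduce a reduction of the same magnitude.

```python
def incidence_rate(relapses: int, person_years: float) -> float:
    """Relapses per person-year of follow-up."""
    return relapses / person_years


def percent_reduction(rate_new: float, rate_reference: float) -> float:
    """Percentage reduction implied by the incidence rate ratio."""
    rate_ratio = rate_new / rate_reference
    return (1.0 - rate_ratio) * 100.0


# Hypothetical counts, not taken from the cited study.
rate_fingolimod = incidence_rate(relapses=18, person_years=100.0)
rate_brace = incidence_rate(relapses=50, person_years=100.0)
reduction = percent_reduction(rate_fingolimod, rate_brace)
print(round(reduction, 1))
```

A rate ratio of 0.36 therefore reads as a 64% reduction, and a ratio of 0.52 as a 48% reduction.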

In a subsequent Swedish cohort study107, the efficacy of rituximab was compared with that of fingolimod in patients with stable RRMS who had to switch from natalizumab owing to anti-JCV antibody positivity. Rituximab was more effective than fingolimod in reducing relapse frequency (a difference of 0.14 relapses per year) and the proportion of patients with gadolinium-enhancing MRI lesions (a difference of 23%). Multivariate logistic and Cox proportional hazards models were used to adjust for possible confounders, including age, sex, disability status, time on natalizumab, washout time, follow-up time and study centre. These models showed a 99% reduction in the risk of gadolinium-enhancing lesions and a 91% reduction in the risk of first relapse in rituximab-treated patients.

Taken together, these studies (Table 5) suggest that highly effective therapies should be considered for patients who are at high risk of disease reactivation owing to discontinuation of natalizumab, and that a washout duration that lasts >1–3 months is not acceptable in clinical practice. The results of these real-world observational studies are closely aligned with the outcomes that would be expected according to the relative effectiveness of the drugs identified in phase III RCTs. This alignment strongly suggests that the techniques used in the observational studies to reduce indication bias — particularly propensity score matching — are highly successful.

Table 5: Real-world studies comparing disease-modifying therapies after discontinuation of natalizumab

Long-term treatment effectiveness

The most important goal of MS therapy is to prevent or delay the development of long-term disability. Over the past decade, many real-world observational studies have been conducted in attempts to assess the long-term effects of the first approved BRACE therapies for the treatment of RRMS patients.

The first observational study to address this issue109 used inverse probability of treatment weighting propensity score-adjusted Cox regression models110 to show that, over the median follow-up period of 5.7 years, severe disability milestones (EDSS scores of 4.0 and 6.0) and conversion to secondary progressive MS were significantly delayed (1.7, 2.2 and 3.8 years, respectively) in patients who received IFN-β relative to those who were untreated.

The results of this study were confirmed in several subsequent real-world observational studies. An Italian study111, in which a Bayesian approach was used112 to adjust comparison groups for treatment imbalances, demonstrated that conversion to secondary progressive MS was delayed in patients who were treated with IFN-β or glatiramer acetate compared with those who were untreated. A Swedish study113 in which time-dependent Cox regression analysis was used to compare patients who received IFN-β (730 patients) with an historical cohort of untreated patients (186 patients) showed a trend towards a positive effect of first-line injectable therapy versus no treatment. A British Columbia cohort study that used time-dependent Cox regression analysis114 revealed a trend towards a positive benefit of IFN-β treatment on disability progression when treated patients were compared with an historical untreated cohort, but an opposite trend, towards inferiority, when treated patients were compared with a contemporary untreated cohort. The UK Multiple Sclerosis Risk Sharing Scheme115 assessed the long-term effectiveness and cost-effectiveness of IFN-β treatment. The cohort that received treatment was compared with an untreated cohort from British Columbia using two models: a continuous Markov model and a multilevel model116,117. Both models revealed that treated patients progressed more slowly on the EDSS than did untreated controls over a 6-year period, and cost–utility ratios were consistent with cost-effectiveness. Most recently, median quantile regression analysis of data from the MSBase registry has been used to evaluate the long-term effect of BRACE therapies in 2,466 patients over a follow-up period of at least 10 years. The results confirmed that BRACE therapies were independently associated with smaller increases in EDSS scores over 10 years51.

Risk of bias

No real-world observational studies are free from criticism3,4, highlighting the difficulty of eliminating biases (Table 1), even with rigorous statistical analysis. The most likely sources of bias are the choice of control groups and the way in which the treated and untreated groups are assembled. In concurrent cohort studies, selection bias can lead to untreated control groups having better prognoses than treated groups51,109,111,114, whereas in a design that includes an historical control cohort, selection bias can amplify the treatment effect, favouring the treated groups113,115. Indeed, patients among historical control groups are likely to have had a worse prognosis than that of modern patients, owing to improvements in diagnostic technologies that enable earlier diagnosis, improvements in supportive care over time, and an increase in the diagnosis of patients with a benign course. An example of the opposing biases (one that masks the treatment effect and one that amplifies it) can be seen in the British Columbia cohort study discussed above114.

Another potential shortfall of real-world observational studies is immortal time bias, which can arise when the period between cohort entry and initial exposure to a drug (during which the outcome of interest has not occurred) is excluded from the analysis. The first observational study of IFN-β treatment described above109 was criticized for being at risk of immortal time bias, although sensitivity analyses that addressed the potential for residual bias as a result of unmeasured confounders showed the results to be stable under different scenarios. Despite these limitations, the majority of observational studies51,109,111,113 of long-term treatment outcomes conducted to date have consistently demonstrated that early and sustained treatment reduces the risk of long-term disability progression in RRMS.
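The mechanism of immortal time bias can be made concrete with a small numerical sketch. It assumes a cohort in which some patients start treatment only some years after cohort entry, and compares an analysis that discards the pre-treatment period with one that attributes that period to untreated person-time. All follow-up times and outcomes are hypothetical, and the crude event rates stand in for the regression models used in the actual studies.

```python
from typing import List, Optional, Tuple

# (years from cohort entry to end of follow-up,
#  years from cohort entry to treatment start, or None if never treated,
#  1 if the outcome occurred at the end of follow-up, else 0)
Patient = Tuple[float, Optional[float], int]


def event_rates(patients: List[Patient], keep_immortal_time: bool) -> Tuple[float, float]:
    """Return (treated rate, untreated rate) in events per person-year."""
    treated_years = 0.0
    untreated_years = 0.0
    treated_events = 0
    untreated_events = 0
    for follow_up, treatment_start, event in patients:
        if treatment_start is None:
            untreated_years += follow_up
            untreated_events += event
        else:
            # Time on treatment; the outcome can only occur after treatment start.
            treated_years += follow_up - treatment_start
            treated_events += event
            if keep_immortal_time:
                # Event-free time before treatment counts as untreated exposure.
                untreated_years += treatment_start
    return treated_events / treated_years, untreated_events / untreated_years


# Hypothetical cohort of four patients.
cohort: List[Patient] = [(10.0, 4.0, 1), (8.0, 2.0, 0), (6.0, None, 1), (4.0, None, 1)]
biased_treated, biased_untreated = event_rates(cohort, keep_immortal_time=False)
fair_treated, fair_untreated = event_rates(cohort, keep_immortal_time=True)
biased_ratio = biased_treated / biased_untreated
fair_ratio = fair_treated / fair_untreated
print(round(biased_ratio, 3), round(fair_ratio, 3))
```

In this toy cohort, discarding the pre-treatment period shrinks the untreated person-time and moves the rate ratio further from 1, so the treatment appears more beneficial than it does when the same event-free period is counted as untreated exposure.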

An important but as yet unresolved clinical challenge is the amelioration of disability accrual in progressive MS phenotypes. Potential reasons for the lack of evidence that DMTs are effective in progressive MS include the fact that RCTs have relatively short follow-up periods (typically 2–3 years) and the fact that we have no universal definition of progressive MS, especially secondary progressive MS. Use of large MS data sets holds the promise of addressing these limitations by providing long-term clinical follow-up data, defining reliable disability outcomes118, and enabling development of a unified clinical definition of secondary progressive MS that is informed by data119. Large numbers of patients with secondary progressive MS will continue to receive DMTs in clinical practice, thereby presenting the opportunity for comparisons of treatment effects in secondary progressive MS.

Safety of DMTs

Rare, delayed and/or long-term adverse effects of new drugs for the treatment of MS are unlikely to be established during most RCTs. The challenges involved in monitoring drugs once approved are numerous, and many questions are being raised about the efficacy and efficiency of currently used tools for detection of adverse events (largely spontaneous reporting of adverse events).

In the past 5–10 years, several national and international drug registries have been implemented to monitor the safety of new MS drugs120,121,122,123,124 (Box 2). In future, greater use of MS disease registries for post-marketing drug safety assessment would be desirable. Unlike drug registries, large disease registries can include information not only about products and procedures, but also about patients who receive different treatments or no treatment for the same clinical indications, enabling better evaluation of adverse event rates, consequences of long-term use, and/or effects of various combinations and sequences of treatments. Moreover, use of disease registries could provide a better understanding of how comorbidity alters the effectiveness and safety of DMTs125.

Box 2: Drug registries designed to monitor safety of new multiple sclerosis drugs

The Post-authorization non-interventional German safety of Gilenya (PANGEA) study120 is an ongoing prospective observational study of fingolimod, designed to collect data on the effectiveness and adverse events in 4,000 patients with relapsing–remitting multiple sclerosis (RRMS). A pharmacoeconomic substudy of 800 patients with RRMS will collect patient-reported outcome measures of disability, quality of life, compliance, treatment satisfaction and usage of resources during a 24-month observational phase.

The GoCARD study was a prospective, multicentre observational study designed to assess the cardiac safety profile of fingolimod in patients with RRMS during the 6 h after initiation or restarting of treatment121. This study involved 217 patients in 42 study centres. 78 patients had a profile that indicated a cardiac risk, but none of that group experienced bradycardia during the 6 h of observation. Overall, only four of all patients experienced bradycardia during the 6 h, and none exhibited a new or persistent onset atrio-ventricular block.

A multicentre retrospective observational study122 has evaluated the safety of oral dimethyl fumarate in the real world. Of 644 patients with MS treated with dimethyl fumarate, 143 withdrew owing to side effects: gastrointestinal discomfort (82 patients) and lymphopenia (34 patients) were the most frequently reported reasons, confirming safety results from randomized controlled trials.

The Sweden Immunomodulation and Multiple Sclerosis Epidemiology study123 analysed differences in 1-year drug survival among patients who initiated treatment with natalizumab (640 patients) or fingolimod (876 patients); 44% of people who initiated fingolimod had previously used natalizumab. This study demonstrated that both drugs are well tolerated, but fingolimod is less well tolerated, especially in patients switching from natalizumab.

The Tysabri (natalizumab) Observational Program (TOP)124 is an open-label, multinational, 10-year prospective study in clinical practice settings designed to evaluate long-term safety and effectiveness of natalizumab in patients with RRMS. The 5-year interim analysis on 4,821 patients followed-up for at least 4 years confirmed that natalizumab's overall safety profile in clinical practice is comparable with that in randomized controlled trials.

Potential for development

Data sharing

As discussed above, many questions that RCTs are unable to answer can be addressed by studies based on real-world data. However, despite the fact that cohort studies and registries typically include considerable numbers of patients, analyses are often limited by poor statistical power owing to insufficient numbers of patients. The small numbers are explained by several factors: the large number of DMTs available for MS and the resulting large number of treatment combinations; the need to correct or match for many variables to avoid bias; and the frequent problem of missing data as a result of centres that do not pass quality control steps. One obvious solution to this problem is to merge data from several sources.

Over the past 3 years, five leading MS databases and registries have been working together to explore opportunities for data sharing. Combined, the registries collect longitudinal data on >150,000 patients with MS from the Italian, Danish and Swedish MS registries, the French Observatoire Française de Sclérose en Plaque network and the international MSBase. To date, this so-called BigMSData group has identified and agreed on a minimal set of parameters and initiated three pilot projects with joint data. The group expects to prove that data sharing between MS databases and registries is both possible and scientifically rewarding.

Moreover, MS databases and registries are expected to gradually merge with other data sources, such as administrative databases (for example, records of hospital admissions, deaths, social security, employment and income), MRI databases and gene sequencing data sets, allowing big data algorithms to be built for multiple research purposes. In the near future, socioeconomic parameters, patient-reported outcomes, quantitative MRI metrics (for example, volumetric MRI and automated detection of new T2-weighted lesions) and biomarkers (including genetic and neurodegenerative markers, such as neurofilament levels in cerebrospinal fluid) are likely to be integrated into clinical practice at large MS centres. Analyses of such large data sets will have a crucial role in the development of new healthcare innovations126, will help to improve the efficiency of research and clinical trials, and will contribute to the development of new tools that will enable physicians to deliver personalized medicine.

Pragmatic trials

Pragmatic trials, or point-of-care trials (PCTs), are an alternative to RCTs, and raising awareness of their potential within the scientific community and among practitioners, policy makers and patients could help to maximize the benefits of real-world data. PCTs are effectively RCTs that are embedded into usual care, so have the advantage of being conducted in real-world settings while retaining the strengths of the experimental approach, including randomization127,128,129. In PCTs, comparators are active treatments of either the same or a different class as the treatment of interest. In relation to RCTs, PCTs also involve broader inclusion criteria, include larger and more diverse patient populations, and consider a wider spectrum of clinically relevant health outcomes, most of which are patient-oriented. As a result, PCTs overcome many limitations of both RCTs and nonrandomized observational studies, producing results that can be generalized and applied in routine clinical practice settings130,131. Several obstacles still currently hinder PCTs (technical issues and issues related to research governance procedures), but once these have been overcome, this approach can be fully exploited to provide the best method for the real-world comparisons of alternative therapies.


Data generated from real-world observational studies in MS that are based on large clinical data sets that are collected in daily practice offer the scientific community and patients many benefits and can address issues that are difficult or impossible to study with RCTs. In the past decade, studies of this kind have provided good-quality findings that are useful for directing and improving MS intervention strategies in daily practice (Box 3). Moreover, the majority of these studies have produced findings that correlate well with those of clinical trials, providing reassurance against criticisms of their overall validity. Ongoing development and application of new statistical methods for optimizing the validity and reliability of modelling in MS cohorts, while minimizing the impacts of confounding, bias and heterogeneity, is providing MS researchers with a promising set of tools that will enable more sophisticated and relevant analysis of 'big data' from MS registries worldwide.

Box 3: Key results from real-world observational studies in multiple sclerosis

Predictors of multiple sclerosis treatment response

• Brain MRI features (the number and topography of lesions) are the best available prognostic factors before starting treatment44,45.

• The combination of clinical relapses with MRI activity appears to be the best tool for defining nonresponders to first-line disease-modifying therapies, whereas minimal MRI activity (one or two new T2-weighted lesions) alone is controversial as an indication of treatment response49,52,56,57,59,61.

• With increasingly effective therapies, 'no evidence of disease activity' (NEDA) is being increasingly considered as a treatment goal65,66,67.

Comparative effectiveness of multiple sclerosis disease-modifying therapies

• In treatment-naive patients, the differences between first-line injectable therapies have limited clinical relevance82. More-effective second-line therapies (natalizumab and fingolimod), although riskier, should be considered for treatment-naive patients with higher pretreatment disease activity84.

• In patients for whom first-line treatment fails92,93,95,96,97 or who have a high risk of disease reactivation after suspension of treatment, highly effective therapies (natalizumab, fingolimod, rituximab)102,103,105,106,107 should be considered. A washout duration that lasts >1–3 months is not acceptable in clinical practice102,103,106.

Long-term effectiveness of multiple sclerosis disease-modifying therapies

• The majority of observational studies suggest that early and sustained treatment reduces the risk of long-term disability progression in relapsing–remitting multiple sclerosis51,109,111,113,115.

Review criteria

We searched PubMed for papers published between January 2007 and June 2016 using the terms “multiple sclerosis”, “observational study”, “treatment effectiveness” and “treatment safety” in combination with “retrospective cohort study” or “prospective cohort study” and/or “registry” or “database”. The search was limited to English, full-length articles in the peer-reviewed medical literature. Reference lists of identified papers were searched for further relevant articles. Papers were selected for inclusion on the basis of their originality and relevance to the topic, impact of the journal in which they were published, and critical reading by the authors. The aim was to review the most influential work rather than all work that has been done.


  1. 1.

    External validity of randomised controlled trials: “to whom do the results of this trial apply?” Lancet 365, 82–93 (2005).

  2. 2.

    , , , & How to assess the external validity of therapeutic trials: a conceptual approach. Int. J. Epidemiol. 39, 89–94 (2010).

  3. 3.

    & Can we measure long-term treatment effects in multiple sclerosis? Nat. Rev. Neurol. 11, 176–182 (2015).

  4. 4.

    & Observational data: understanding the real MS world. Mult. Scler. 22, 1642–1648 (2016).

  5. 5.

    Indirect comparisons: the mesh and mess of clinical trials. Lancet 368, 1470–1472 (2006).

  6. 6.

    , & Availability of comparative trials for the assessment of new medicines in the European Union at the moment of market authorization. Br. J. Clin. Pharmacol. 63, 159–162 (2007).

  7. 7.

    , , , & EDMUS, a European database for multiple sclerosis. J. Neurol. Neurosurg. Psychiatry 55, 671–676 (1992).

  8. 8.

    et al. MSBase: an international, online registry and platform for collaborative outcomes research in multiple sclerosis. Mult. Scler. 12, 769–774 (2006).

  9. 9.

    et al. Italian Multiple Sclerosis Database Network. Neurol. Sci. 27 (Suppl. 5), S358–S361 (2006).

  10. 10.

    , & The Norwegian Multiple Sclerosis Registry and Biobank. Acta Neurol. Scand. Suppl. 195, 20–23 (2012).

  11. 11.

    et al. EUReMS Consortium. Multiple sclerosis registries in Europe — results of a systematic survey. Mult. Scler. 20, 1523–1532 (2014).

  12. 12.

    & The Swedish MS registry — clinical support tool and scientific resource. Acta Neurol. Scand. 132, 11–19 (2015).

  13. 13.

    , & Registers of multiple sclerosis in Denmark. Acta Neurol. Scand. 132, 4–10 (2015).

  14. 14.

    & The rise of big clinical databases. Br. J. Surg. 102, e93–e101 (2015).

  15. 15.

    & A comparison of observational studies and randomized, controlled trials. N. Engl. J. Med. 342, 1878–1886 (2000).

  16. 16.

    , & Randomized, controlled trials, observational studies, and the hierarchy of research designs. N. Engl. J. Med. 342, 1887–1892 (2000).

  17. 17.

    , & Standardizing patient outcomes measurement. N. Engl. J. Med. 374, 504–506 (2016).

  18. 18.

    Propensity score studies are unlikely to underestimate treatment effects in critical care medicine: a critical reanalysis. J. Clin. Epidemiol. 68, 467–469 (2015).

  19. 19.

    & The central role of the propensity score in observational studies for causal effects. Biometrika 70, 41–55 (1983).

  20. 20.

    The relative ability of different propensity score methods to balance measured covariates between treated and untreated subjects in observational studies. Med. Decis. Making 29, 661–677 (2009).

  21. 21.

    et al. Observational studies: propensity score analysis of non-randomized data. Int. MS J. 16, 90–97 (2009).

  22. 22.

    in Multiple Sclerosis Therapeutics 4th edn Ch. 21 (eds Cohen, J. A. & Rudick, R.) 244–252 (Cambridge Univ. Press, 2011).

  23. 23.

    , & Indications for propensity scores and review of their use in pharmacoepidemiology. Basic Clin. Pharmacol. Toxicol. 98, 253–259 (2006).

  24. 24.

    Propensity-score matching in the cardiovascular surgery literature from 2004 to 2006: a systematic review and suggestions for improvement. J. Thorac. Cardiovasc. Surg. 134, 1128–1135 (2007).

  25. 25.

    , & Swedish Register of Cardiac Intensive Care (RIKS-HIA). Early statin treatment following acute myocardial infarction and 1-year survival. JAMA 285, 430–436 (2001).

  26. 26.

    , , , & Aspirin use and all-cause mortality among patients being evaluated for known or suspected coronary artery disease: a propensity analysis. JAMA 286, 1187–1194 (2001).

  27. 27.

    et al. Association between screening for osteoporosis and the incidence of hip fracture. Ann. Intern. Med. 142, 173–181 (2005).

  28. 28.

    et al. High-dimensional propensity score adjustment in studies of treatment effects using health care claims data. Epidemiology. 20, 512–522 (2009).

  29. 29.

    Discussing hidden bias in observational studies. Ann. Intern. Med. 115, 901–905 (1991).

  30. 30.

    , & Assessing the sensitivity of regression results to unmeasured confounders in observational studies. Biometrics 54, 948–963 (1998).

  31. 31.

    et al. Marginal structural Cox models for estimating the association between β-interferon exposure and disease progression in a multiple sclerosis cohort. Am. J. Epidemiol. 180, 160–171 (2014).

  32. 32.

    , , & A simulation study of finite-sample properties of marginal structural Cox proportional hazards models. Stat. Med. 31, 2098–2109 (2012).

  33. 33.

    & Simulating from marginal structural models with time-dependent confounding. Stat. Med. 31, 4190–4206 (2012).

  34. 34.

    , , , & A framework for developing, implementing, and evaluating clinical prediction models in an individual participant data meta-analysis. Stat. Med. 32, 3158–3180 (2013).

  35. 35.

    , , & Developing and validating risk prediction models in an individual participant data meta-analysis. BMC Med. Res. Methodol. 14, 3 (2014).

  36. 36.

    , , , & Effectively selecting a target population for a future comparative study. J. Am. Stat. Assoc. 108, 527–539 (2013).

  37. 37.

    , , & Bayesian evidence synthesis for exploring generalizability of treatment effects: a case study of combining randomized and non-randomized results in diabetes. Stat. Med. 35, 1654–1675 (2016).

  38. 38.

    & Cautionary tales in the interpretation of observational studies of effects of clinical interventions. Intern. Med. J. (2016). This article proposes criteria for identifying high quality observational studies.

  39. 39.

    , & Defining and improving data quality in medical registries: a literature review, case study, and generic framework. J. Am. Med. Inform. Assoc. 9, 600–611 (2002).

  40. 40.

    , , & Computer-assisted data collection in multicenter epidemiologic research. The Atherosclerosis Risk Communities Study. Control. Clin. Trials 11, 101–115 (1990).

  41. 41.

    & Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research. J. Am. Med. Inform. Assoc. 20, 144–151 (2013).

  42. 42.

    et al. Data quality monitoring and performance metrics of a prospective, population-based observational study of maternal and newborn health in low resource settings. Reprod. Health 12 (Suppl. 2), S2 (2015).

  43. 43.

    et al. Data quality evaluation for observational multiple sclerosis registries. Mult. Scler. (2016).

  44. 44.

    et al. Defining high, medium and low impact prognostic factors for developing multiple sclerosis. Brain 138, 1863–1874 (2015). The largest single-centre prospective study evaluating prognostic factors in patients with clinically isolated syndrome suggestive of MS.

  45.

    et al. Predictors of disability worsening in clinically isolated syndrome. Ann. Clin. Transl Neurol. 2, 479–491 (2015).

  46.

    et al. Recommendations for observational studies of comorbidity in multiple sclerosis. Neurology 86, 1446–1453 (2016).

  47.

    et al. Real-life impact of early interferon β therapy in relapsing multiple sclerosis. Ann. Neurol. 66, 513–520 (2009).

  48.

    et al. Defining the response to interferon-beta in relapsing-remitting multiple sclerosis patients. Ann. Neurol. 59, 344–352 (2006).

  49.

    et al. Measures in the first year of therapy predict the response to interferon beta in MS. Mult. Scler. 15, 848–853 (2009).

  50.

    et al. Predictors of long-term outcome in multiple sclerosis patients treated with interferon β. Ann. Neurol. 73, 95–103 (2013).

  51.

    et al. Predictors of long-term disability accrual in relapse-onset multiple sclerosis. Ann. Neurol. 80, 89–100 (2016).

  52.

    et al. Early predictors of non-response to interferon in multiple sclerosis. Acta Neurol. Scand. 126, 390–397 (2012).

  53.

    et al. Combining clinical and MRI predictors enhances prediction of 12-year disability in multiple sclerosis. Mult. Scler. (2016).

  54.

    et al. Early magnetic resonance imaging predictors of clinical progression after 48 months in clinically isolated syndrome patients treated with intramuscular interferon β-1a. Eur. J. Neurol. 22, 1113–1123 (2015).

  55.

    et al. Evaluating the response to glatiramer acetate in relapsing-remitting multiple sclerosis (RRMS) patients. Mult. Scler. 20, 1602–1608 (2014).

  56.

    et al. Scoring treatment response in patients with relapsing multiple sclerosis. Mult. Scler. 19, 605–612 (2013).

  57.

    et al. Interferon beta failure predicted by EMA criteria or isolated MRI activity in multiple sclerosis. Mult. Scler. 20, 566–576 (2014).

  58.

    , , , & Assessing treatment response to interferon-β: is there a role for MRI? Neurology 82, 248–254 (2014).

  59.

    et al. Assessing response to interferon-β in a multicenter dataset of patients with MS. Neurology 87, 134–140 (2016). The largest study assessing MRI criteria for predicting IFN-β treatment non-response in real-world studies.

  60.

    et al. Clinical markers of long-term disability in RRMS patients treated with interferon beta [poster]. Mult. Scler. 20 (Suppl. 1), P285 (2014).

  61.

    et al. Relationship between MRI lesion activity and response to IFN-beta in relapsing-remitting multiple sclerosis patients. Mult. Scler. 14, 479–484 (2008).

  62.

    et al. Reliability of classifying multiple sclerosis disease activity using magnetic resonance imaging in a multiple sclerosis clinic. JAMA Neurol. 70, 338–344 (2013).

  63.

    et al. Evidence-based guidelines: MAGNIMS consensus guidelines on the use of MRI in multiple sclerosis — establishing disease prognosis and monitoring patients. Nat. Rev. Neurol. 11, 597–606 (2015).

  64.

    , , , & Defining interferon beta response status in multiple sclerosis patients. Ann. Neurol. 56, 548–555 (2004).

  65.

    & Disease activity free status: a new end point for a new era in multiple sclerosis clinical research? JAMA Neurol. 71, 269–270 (2014).

  66.

    et al. Inclusion of brain volume loss in a revised measure of 'no evidence of disease activity' (NEDA-4) in relapsing–remitting multiple sclerosis. Mult. Scler. 22, 1297–1305 (2016).

  67.

    , & Treatment effect on brain atrophy correlates with treatment effect on disability in multiple sclerosis. Ann. Neurol. 75, 43–49 (2014).

  68.

    , , , & Towards the implementation of 'no evidence of disease activity' in multiple sclerosis treatment: the multiple sclerosis decision model. Ther. Adv. Neurol. Disord. 8, 3–13 (2015).

  69.

    et al. Fingolimod and CSF neurofilament light chain levels in relapsing–remitting multiple sclerosis. Neurology 84, 1639–1643 (2015).

  70.

    , , , & Evaluation of no evidence of disease activity in a 7-year longitudinal multiple sclerosis cohort. JAMA Neurol. 72, 152–158 (2015).

  71.

    et al. Intramuscular interferon beta-1a therapy initiated during a first demyelinating event in multiple sclerosis. CHAMPS Study Group. N. Engl. J. Med. 343, 898–904 (2000).

  72.

    et al. Effect of early interferon treatment on conversion to definite multiple sclerosis: a randomised study. Lancet 357, 1576–1582 (2001).

  73.

    et al. Long-term subcutaneous interferon beta-1a therapy in patients with relapsing-remitting MS. Neurology 67, 944–953 (2006).

  74.

    et al. Subgroups of the BENEFIT study: risk of developing MS and treatment effect of interferon beta-1b. J. Neurol. 255, 480–487 (2008).

  75.

    et al. Effect of glatiramer acetate on conversion to clinically definite multiple sclerosis in patients with clinically isolated syndrome (PreCISe study): a randomised, double-blind, placebo-controlled trial. Lancet 374, 1503–1511 (2009).

  76.

    et al. Long-term effect of early treatment with interferon beta-1b after a first clinical event suggestive of multiple sclerosis: 5-year active treatment extension of the phase 3 BENEFIT trial. Lancet Neurol. 8, 987–997 (2009).

  77.

    et al. Association between immediate initiation of intramuscular interferon beta-1a at the time of a clinically isolated syndrome and long-term outcomes: a 10-year follow-up of the Controlled High-Risk Avonex Multiple Sclerosis Prevention Study in Neurological Surveillance. Arch. Neurol. 69, 183–190 (2012).

  78.

    et al. Oral teriflunomide for patients with a first clinical episode suggestive of multiple sclerosis (TOPIC): a randomised, double-blind, placebo-controlled, phase 3 trial. Lancet Neurol. 13, 977–986 (2014).

  79.

    et al. Disease activity in the first year predicts longer-term clinical outcomes in the pooled population of the phase III FREEDOMS and FREEDOMS II studies [poster]. Neurology 84 (14 Suppl.), P7.239 (2015).

  80.

    et al. Is it time to target no evident disease activity (NEDA) in multiple sclerosis? Mult. Scler. Relat. Disord. 4, 329–333 (2015).

  81.

    et al. Immunomodulators and immunosuppressants for multiple sclerosis: a network meta-analysis. Cochrane Database Syst. Rev. 6, CD008933 (2013).

  82.

    et al. Comparative effectiveness of glatiramer acetate and interferon beta formulations in relapsing-remitting multiple sclerosis. Mult. Scler. 21, 1159–1171 (2015).

  83.

    Multivariate and propensity score matching software with automated balance optimization: the matching package for R. J. Stat. Software 42, 7 (2011).

  84.

    et al. Comparative efficacy of first-line natalizumab versus IFNβ or glatiramer acetate in relapsing-remitting MS. Neurol. Clin. Pract. 6, 102–115 (2016).

  85.

    et al. One-to-many propensity score matching in cohort studies. Pharmacoepidemiol. Drug Saf. 21 (Suppl. 2), 69–80 (2012).

  86.

    , , , & Disease-modifying therapies and infectious risks in multiple sclerosis. Nat. Rev. Neurol. 12, 217–233 (2016).

  87.

    , & Effect of relapses on development of residual deficit in multiple sclerosis. Neurology 61, 1528–1532 (2003).

  88.

    et al. Contribution of relapses to disability in multiple sclerosis. J. Neurol. 255, 280–287 (2008).

  89.

    et al. Contribution of different relapse phenotypes to disability in multiple sclerosis. Mult. Scler. (2016).

  90.

    et al. Comparison of fingolimod with interferon beta-1a in relapsing-remitting multiple sclerosis: a randomised extension of the TRANSFORMS study. Lancet Neurol. 10, 520–529 (2011).

  91.

    et al. Alemtuzumab for patients with relapsing multiple sclerosis after disease-modifying therapy: a randomised controlled phase 3 trial. Lancet 380, 1829–1839 (2012).

  92.

    et al. Comparative efficacy of switching to natalizumab in active multiple sclerosis. Ann. Clin. Transl Neurol. 2, 373–387 (2015).

  93.

    et al. Comparison of switch to fingolimod or interferon beta/glatiramer acetate in active multiple sclerosis. JAMA Neurol. 72, 405–413 (2015).

  94.

    Chi-squared goodness-of-fit tests for the proportional hazards regression model. Biometrika 67, 145–153 (1980).

  95.

    et al. Switch to natalizumab versus fingolimod in active relapsing-remitting multiple sclerosis. Ann. Neurol. 77, 425–435 (2015). The first comparative study evaluating the effectiveness of natalizumab and fingolimod after first-line treatment failure.

  96.

    et al. Natalizumab versus fingolimod in patients with relapsing-remitting multiple sclerosis non-responding to first-line injectable therapies. Mult. Scler. 22, 1315–1326 (2016).

  97.

    et al. Comparative efficacy of fingolimod versus natalizumab: a French multicenter observational study. Neurology 86, 771–778 (2016).

  98.

    , , & A comparison of multiple sclerosis clinical disease activity between patients treated with natalizumab and fingolimod. Mult. Scler. (2016).

  99.

    et al. Risk of early relapse following the switch from injectables to oral agents for multiple sclerosis. Eur. J. Neurol. 23, 729–736 (2016).

  100.

    et al. Risk of natalizumab-associated progressive multifocal leukoencephalopathy. N. Engl. J. Med. 366, 1870–1880 (2012).

  101.

    et al. Disease activity return during natalizumab treatment interruption in patients with multiple sclerosis. Neurology 76, 1858–1865 (2011).

  102.

    et al. Switching from natalizumab to fingolimod in multiple sclerosis: a French prospective study. JAMA Neurol. 71, 436–441 (2014).

  103.

    et al. Fingolimod versus interferon beta/glatiramer acetate after natalizumab suspension in multiple sclerosis. Brain 138, 3275–3286 (2015). The first comparative study demonstrating the superiority of fingolimod over BRACE therapy in controlling disease reactivation after natalizumab suspension in a real-world context.

  104.

    et al. Recurrence or rebound of clinical relapses after discontinuation of natalizumab therapy in highly active MS patients. J. Neurol. 261, 1170–1177 (2014).

  105.

    et al. Treatment of relapsing–remitting multiple sclerosis after 24 doses of natalizumab: evidence from an Italian spontaneous, prospective, and observational study (the TY-STOP Study). JAMA Neurol. 71, 954–960 (2014).

  106.

    et al. Fingolimod after natalizumab and the risk of short-term relapse. Neurology 82, 1204–1211 (2014).

  107.

    et al. Rituximab versus fingolimod after natalizumab in multiple sclerosis patients. Ann. Neurol. 79, 950–958 (2016).

  108.

    Reducing bias in a propensity score matched pair sample using greedy matching techniques. SAS (2001).

  109.

    et al. New natural history of interferon-beta-treated relapsing multiple sclerosis. Ann. Neurol. 61, 300–306 (2007). The first study addressing the issue of long-term effectiveness of IFN-β treatment in MS by using a propensity score technique.

  110.

    & Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study. Stat. Med. 23, 2937–2960 (2004).

  111.

    et al. Immunomodulatory therapies delay disease progression in multiple sclerosis. Mult. Scler. 22, 1732–1740 (2016).

  112.

    & Following a moving target — Monte Carlo inference for dynamic Bayesian models. J. R. Stat. Soc. B 63, 127–146 (2001).

  113.

    et al. Time to secondary progression in patients with multiple sclerosis treated with first generation immunomodulating drugs. Mult. Scler. 19, 765–774 (2013).

  114.

    et al. Association between use of interferon beta and progression of disability in patients with relapsing–remitting multiple sclerosis. JAMA 308, 247–256 (2012).

  115.

    et al. Effectiveness and cost-effectiveness of interferon beta and glatiramer acetate in the UK Multiple Sclerosis Risk Sharing Scheme at 6 years: a clinical cohort study with natural history comparator. Lancet Neurol. 14, 497–505 (2015). The first study assessing cost–utility ratios and cost-effectiveness in patients with MS treated with BRACE therapies over a 6-year period.

  116.

    & Estimation of the transition matrix of a discrete-time Markov chain. Health Econ. 11, 33–42 (2002).

  117.

    , , & Multistate Markov models for disease progression with classification error. J. R. Stat. Soc. D 52, 193–209 (2003).

  118.

    et al. Defining reliable disability outcomes in multiple sclerosis. Brain 138, 3287–3298 (2015).

  119.

    et al. Defining secondary progressive multiple sclerosis. Brain 139, 2395–2405 (2016).

  120.

    , & The PANGAEA study design — a prospective, multicenter, non-interventional, long-term study on fingolimod for the treatment of multiple sclerosis in daily practice. BMC Neurol. 15, 93 (2015).

  121.

    & Cardiac safety profile of first dose of fingolimod for relapsing–remitting multiple sclerosis in real-world settings: data from a German prospective multi-center observational study. Neurol. Ther. (2016).

  122.

    et al. Safety and efficacy of dimethyl fumarate in multiple sclerosis: a multi-center observational study. J. Neurol. 263, 1626–1632 (2016).

  123.

    et al. Comparative analysis of first-year fingolimod and natalizumab drug discontinuation among Swedish patients with multiple sclerosis. Mult. Scler. 22, 85–93 (2016).

  124.

    et al. Efficacy and safety of natalizumab in multiple sclerosis: interim observational programme results. J. Neurol. Neurosurg. Psychiatry 85, 1190–1197 (2014).

  125.

    et al. Examining the effects of comorbidities on disease-modifying therapy use in multiple sclerosis. Neurology 86, 1287–1295 (2016).

  126.

    , & Big data: the next frontier for innovation in therapeutics and healthcare. Expert Rev. Clin. Pharmacol. 7, 293–298 (2014).

  127.

    et al. A pragmatic-explanatory continuum indicator summary (PRECIS): a tool to help trial designers. J. Clin. Epidemiol. 62, 464–475 (2009).

  128.

    et al. The opportunities and challenges of pragmatic point-of-care randomised trials using routinely collected electronic records: evaluations of two exemplar trials. Health Technol. Assess. 18, 1–146 (2014).

  129.

    & Integrating randomized comparative effectiveness research with patient care. N. Engl. J. Med. 374, 2152–2158 (2016).

  130.

    A pragmatic view on pragmatic trials. Dialogues Clin. Neurosci. 13, 217–224 (2011).

  131.

    et al. Improving the efficiency and effectiveness of pragmatic clinical trials in older adults in the United States. Contemp. Clin. Trials 33, 1211–1216 (2012).

Author information


  1. Department of Basic Medical Sciences, Neurosciences and Sense Organs, University of Bari “Aldo Moro”, Piazza G. Cesare 11, 70124, Bari, Italy.

    • Maria Trojano
    • Pietro Iaffaldano
  2. Department of Neurology-Neuroimmunology and Multiple Sclerosis Centre of Catalonia (Cemcat), Hospital Universitari Vall d'Hebron, Universitat Autònoma de Barcelona, Passeig Vall d'Hebron 119–129, 08035, Barcelona, Spain.

    • Mar Tintore
    • Xavier Montalban
  3. Department of Clinical Neuroscience, Karolinska Institutet, Tomtebodavägen 18A, S-17177, Solna, Stockholm, Sweden.

    • Jan Hillert
  4. Department of Medicine, University of Melbourne, and Department of Neurology, Royal Melbourne Hospital, Grattan Street, Parkville, VIC 3050, Australia.

    • Tomas Kalincik
    • Tim Spelman
    • Helmut Butzkueven
  5. Department of Health Sciences (DISSAL), University of Genoa, Via Pastore 1, 16132, Genoa, Italy.

    • Maria Pia Sormani



Maria Trojano coordinated the article and edited the manuscript before submission. All authors made substantial contributions to writing the article and discussion of the content, and reviewed the manuscript before submission.

Competing interests

Maria Trojano has served on scientific Advisory Boards for Almirall, Biogen, Genzyme, Novartis and Roche; has received speaker honoraria from Almirall, Bayer, Biogen, Genzyme, Merck Serono, Novartis, Sanofi and Teva Pharmaceuticals; and has received research grants for her Institution from Biogen, Merck Serono and Novartis. Mar Tintore has received compensation for consulting services and speaking from Bayer, Biogen, Merck Serono, Novartis, Sanofi and Teva Pharmaceuticals. Xavier Montalban has received speaking honoraria and travel expenses for scientific meetings, has been a steering committee member of clinical trials or participated in advisory boards of clinical trials with Almirall, Bayer, Biogen, Genentech, Genzyme, Merck Serono, Novartis, Sanofi and Teva Pharmaceuticals. Jan Hillert has received honoraria for serving on advisory boards for Biogen, Genzyme and Novartis, and has received speaker's fees from Bayer, Biogen, Genzyme, Merck Serono, Novartis, and Teva Pharmaceuticals. He has served as principal investigator for projects sponsored by, or received unrestricted research support from Bayer, Biogen, Merck Serono, Novartis and Teva Pharmaceuticals. Tomas Kalincik has served on scientific advisory boards for Biogen, Genzyme, Merck, Novartis and Roche; has received conference travel support and/or speaker honoraria from BioCSL, Biogen, Genzyme, Merck, Novartis, Sanofi, Teva Pharmaceuticals and WebMD Global; and has received research support from Biogen. Pietro Iaffaldano has served on scientific advisory boards for Bayer and Biogen, and has received funding for travel and/or speaker honoraria from Biogen, Novartis, Sanofi and Teva Pharmaceuticals. Tim Spelman has received travel support, speaker honoraria and compensation for serving on advisory boards from Biogen and Novartis. Maria Pia Sormani received consulting fees from Biogen, GeNeuro, Genzyme, Merck Serono, Novartis, Roche, Teva Pharmaceuticals and Vertex. 
Helmut Butzkueven has served on scientific advisory boards for Biogen, Novartis and Sanofi and has received conference travel support from Biogen, Novartis and Sanofi. He serves on steering committees for trials conducted by Biogen and Novartis and has received research support from Biogen, Merck Serono and Novartis.

Corresponding author

Correspondence to Maria Trojano.


Prognostic nomograms

Graphical prediction tools designed to assess the risk of a future event on the basis of specific patient and disease characteristics.

Least absolute shrinkage and selection operator (LASSO) procedure

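A regression analysis method that performs both variable selection and regularization, shrinking some coefficients to exactly zero, to enhance the prediction accuracy and interpretability of the statistical model.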
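
To make the selection behaviour concrete, the following is a minimal Python sketch of the soft-thresholding step that underlies the LASSO. It assumes the simplified textbook case of an orthonormal design, in which each LASSO coefficient is the least-squares coefficient shrunk towards zero by the penalty and set to exactly zero if its magnitude does not exceed the penalty. The function name, the coefficients and the penalty value are hypothetical and are not taken from any study cited in this Review.

```python
def soft_threshold(coefficient, penalty):
    """Shrink a coefficient towards zero by `penalty`; return 0.0 if it is too small."""
    if coefficient > penalty:
        return coefficient - penalty
    if coefficient < -penalty:
        return coefficient + penalty
    return 0.0


# Hypothetical least-squares coefficients for three candidate predictors.
ols_coefficients = [2.5, -0.3, 0.8]
penalty = 0.5  # assumed tuning parameter, for illustration only

lasso_coefficients = [soft_threshold(b, penalty) for b in ols_coefficients]

# Predictors whose coefficients survive the shrinkage are the "selected" ones.
selected = [i for i, b in enumerate(lasso_coefficients) if b != 0.0]
print(lasso_coefficients, selected)
```

In this toy case the second predictor is dropped, which is how the procedure can yield a sparser and more interpretable model.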

Bayesian hierarchical metaregression model

A metaregression is a meta-analysis designed to assess factors associated with the size of the treatment effect; Bayesian hierarchical modelling allows estimation of the parameters of the metaregression.

Inverse probability of treatment weighting

A weighting method that uses propensity scores to derive a synthetic sample within which the distribution of baseline prognostic confounding variables is independent of the treatment assignment; the weight given to a patient is the inverse of the probability that he or she would receive the treatment that he or she actually did receive.
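
The weighting rule in this definition can be sketched in a few lines of Python. This is a minimal illustration that assumes the propensity scores have already been estimated by some model (not shown); the function name `iptw_weights` and the four-patient dataset are hypothetical.

```python
def iptw_weights(treated, propensity):
    """Return inverse-probability-of-treatment weights.

    treated    -- list of 0/1 flags (1 = patient received the treatment)
    propensity -- list of estimated probabilities of receiving the treatment
    """
    weights = []
    for t, p in zip(treated, propensity):
        if not 0.0 < p < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        # Probability of the treatment the patient actually received:
        # p for treated patients, 1 - p for untreated patients.
        prob_received = p if t == 1 else 1.0 - p
        weights.append(1.0 / prob_received)
    return weights


# Hypothetical toy data: two treated and two untreated patients.
treated = [1, 1, 0, 0]
propensity = [0.8, 0.25, 0.5, 0.2]
weights = iptw_weights(treated, propensity)
print(weights)
```

Patients who received a treatment that was unlikely given their baseline characteristics (here the second patient) get the largest weights, which is what balances the confounders in the weighted sample.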

Progressive multifocal leukoencephalopathy

A viral encephalitis caused by JC virus, predominantly involving white matter and reported in patients being treated with certain immunosuppressive and immunomodulatory therapies.

Bayesian approach

A method of statistical inference that allows prior information about a population parameter to be combined with evidence from a sample to guide the statistical inference process.
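
As a minimal sketch of how prior information and sample evidence are combined, the Python example below uses the standard conjugate update of a Beta prior with binomial data. The choice of a Beta prior, the function names and all numbers are assumptions made purely for illustration.

```python
def update_beta_prior(prior_alpha, prior_beta, successes, failures):
    """Combine a Beta(prior_alpha, prior_beta) prior with binomial data.

    Returns the parameters of the Beta posterior distribution.
    """
    return prior_alpha + successes, prior_beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical prior belief about a response rate: Beta(2, 2), centred on 0.5.
prior = (2, 2)
# Hypothetical sample: 7 responders and 3 non-responders.
posterior = update_beta_prior(*prior, successes=7, failures=3)

print(beta_mean(*prior))      # prior mean
print(beta_mean(*posterior))  # posterior mean, between the prior mean and 7/10
```

The posterior mean lies between the prior mean and the observed proportion, showing how the sample evidence updates rather than replaces the prior information.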

Continuous Markov model

A model used in economics that is based on a stochastic process with the Markov property, which defines serial dependence between adjacent periods only; the model can be used to describe systems in which the next event depends only on the current state of the system.
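
The Markov property described above can be illustrated with a small Python sketch. Note that it uses a discrete-time chain as a simplification, not the continuous-time model named in this glossary entry; the two states and the transition probabilities are hypothetical.

```python
def step(distribution, transition):
    """Advance a distribution over states by one period.

    transition[i][j] is the probability of moving from state i to state j,
    so the next distribution depends only on the current one.
    """
    n = len(distribution)
    return [
        sum(distribution[i] * transition[i][j] for i in range(n))
        for j in range(n)
    ]


# Hypothetical two-state system: state 0 = "stable", state 1 = "worsened".
# "Worsened" is treated as absorbing in this toy example.
transition = [
    [0.9, 0.1],
    [0.0, 1.0],
]

distribution = [1.0, 0.0]  # everyone starts in the "stable" state
after_one = step(distribution, transition)
after_two = step(after_one, transition)
print(after_one, after_two)
```

Each call to `step` needs only the current distribution, not the earlier history, which is the defining feature of the model.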

Multilevel model

A statistical model in which parameters vary at more than one level. Observational studies in which many observations are made per subject include two levels of variability: the variability between subjects and the variability within each subject over time.

Cost–utility ratios

The outcomes of cost–utility analysis, a form of financial analysis used to guide decisions. The cost–utility ratio estimates the ratio between the cost of a health-related intervention and the benefit it produces in terms of the number of years lived in full health by the beneficiaries.
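
The ratio defined above reduces to a simple division, sketched here in Python. The sketch assumes that benefit is expressed in quality-adjusted life years (QALYs), the usual unit for years lived in full health; the function name and the figures are hypothetical.

```python
def cost_utility_ratio(cost, qalys_gained):
    """Cost per quality-adjusted life year (QALY) gained."""
    if qalys_gained <= 0:
        raise ValueError("QALYs gained must be positive for the ratio to be meaningful")
    return cost / qalys_gained


# Hypothetical figures: an intervention costing 50,000 (any currency)
# that yields 2.5 additional QALYs.
ratio = cost_utility_ratio(cost=50_000, qalys_gained=2.5)
print(ratio)  # cost per QALY gained
```

A lower ratio means that less is spent for each year of full health gained.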
