MAGNIMS consensus recommendations on the use of brain and spinal cord atrophy measures in clinical practice

Early evaluation of treatment response and prediction of disease evolution are key issues in the management of people with multiple sclerosis (MS). In the past 20 years, MRI has become the most useful paraclinical tool in both situations and is used clinically to assess the inflammatory component of the disease, particularly the presence and evolution of focal lesions, the pathological hallmark of MS. However, diffuse neurodegenerative processes that are at least partly independent of inflammatory mechanisms can develop early in people with MS and are closely related to disability. The effects of these neurodegenerative processes at a macroscopic level can be quantified by estimation of brain and spinal cord atrophy with MRI. MRI measurements of atrophy in MS have also been proposed as a complementary approach to lesion assessment to facilitate the prediction of clinical outcomes and to assess treatment responses. In this Consensus Statement, the Magnetic Resonance Imaging in MS (MAGNIMS) study group critically reviews the application of brain and spinal cord atrophy measures in the clinical management of MS, considering the role of atrophy measures in prognosis and treatment monitoring and the barriers to clinical use of these measures. On the basis of this review, the group makes consensus statements and recommendations for future research.

The inflammatory component of multiple sclerosis (MS) pathology can be focal or diffuse and is associated with neurodegenerative processes that ultimately lead to irreversible tissue damage and neuronal loss 1 . Neurodegeneration was originally thought to be a late-stage phenomenon with limited clinical relevance, but it is now recognized as being associated with acute inflammation from the early stages of MS and as the main driver of irreversible disability [2][3][4][5] . In parallel with improvements in our understanding of the mechanisms of neurodegeneration, advances in imaging techniques have enabled in vivo assessment of brain and spinal cord area and volumes using MRI. Although brain and spinal cord volume loss observed with MRI cannot be equated with atrophy 6 , because the latter implies pathologically proven and irreversible tissue loss, changes in these MRI measures are associated with atrophy 7 and with the level of disability in MS 8,9 .
MRI-based quantification of inflammatory activity in MS (on the basis of lesion counts and lesion volumes) is established as the main efficacy outcome in phase II clinical trials 10 . Currently, brain and spinal cord volume measures have no role in the MS diagnostic criteria 11,12 or disease course classification 13 , but a body of evidence that these measures are valuable for early evaluation of treatment responses and prediction of disease evolution has been steadily growing, alongside improvements in methodology that could facilitate widespread implementation of these measures in clinical practice 14,15 . A key difficulty in this implementation is that translation of group-based results into actionable, patient-level information must be made with extreme caution.
In this Consensus Statement, we, on behalf of the Magnetic Resonance Imaging in MS (MAGNIMS) study group, provide specific recommendations for the implementation of brain and spinal cord atrophy measures in the clinical management of patients with MS and on the directions of future research to improve our knowledge in this field. The recommendations are based on a critical review of the literature and the personal experience of MAGNIMS study group members. We discuss the difficulties of translating group-based data into clinical application and highlight where particular caution is appropriate. We first discuss the role of atrophy measures in prognosis, then in treatment monitoring and, finally, the barriers to implementation in clinical practice. Each of these three sections comprises a review of the available evidence and a set of consensus guidelines.

Methods
A multicentre international panel on the implementation of brain and spinal cord atrophy measures in clinical practice convened in Barcelona, Spain, under the auspices of MAGNIMS, an independent European network of clinical research groups with a common interest in the study of MS with MRI. The panel was made up of experts in the diagnosis and management of MS, including neuroradiologists, neurologists, physicists, imaging methodologists and statisticians from MAGNIMS centres in seven countries, who were selected by the workshop organizers (with approval from all members of the Steering Committee) on the basis of their personal expertise. The purpose of this face-to-face meeting was to review and discuss all published data on brain and spinal cord atrophy in MS and to consider whether the previously published recommendations 16,17 on its use for diagnosis, prognosis and monitoring of patients with MS needed to be revised and updated in view of technical advances and numerous clinical studies of atrophy in MS. The panel agreed that updated recommendations were necessary. After this meeting, the panel members formulated specific recommendations in relation to the implementation of brain and spinal cord atrophy measures in clinical practice.
The authors of the Consensus Statement are members of the MAGNIMS Study Group. The network is independent of any other organization and, at the time of the workshop mentioned above, was run by a Steering Committee whose members were À. Rovira […] to discussions on specific topics, which were assigned to individuals according to each member's area of expertise. The initial draft was then circulated among all authors (who were all presenters and/or discussants at the meeting). Modifications were made iteratively until consensus was reached on all recommendations; all panel members agreed on the full contents of the final recommendations.
Defining and predicting MS severity
Evidence review
Global brain volume measures to define and predict MS severity.
The initial studies to investigate clinical correlates of brain atrophy in MS focused on patients with well-established disease and severe clinical manifestations, particularly in the cognitive sphere [18][19][20] , but later studies included disability, as measured with the Expanded Disability Status Scale 8 . Evidence from these studies made it clear that neurodegenerative processes occur in the earliest phases of MS 21 , even before the disease becomes symptomatic 22 .
Yearly global brain volume loss in healthy ageing individuals ranges from −0.05% at 20-30 years of age to −0.3% at 60-70 years of age 23 . A change of −0.4% per year has been proposed as the cut-off for pathological brain atrophy in MS 24 (Fig. 1), although care must be taken before applying this threshold as a marker of therapeutic efficacy owing to the phenomenon of pseudoatrophy (see Brain volume as an outcome measure in randomized clinical trials) 25,26 . Multiple studies have shown that short-term changes in brain volume (over as little as 1 year) predict clinical status (diagnosis of MS or disability status) at various follow-up times in clinically isolated syndromes 27,28 , relapsing-remitting MS (RRMS) 29 and primary progressive MS [30][31][32] , either in isolation or together with lesion-related parameters 33,34 .
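To make the −0.4% per year threshold concrete, the sketch below annualizes the percentage change between two global brain volume estimates and compares it with that cut-off. It is a simplified illustration, assuming linear change between two time points; registration-based tools estimate the percentage brain volume change directly rather than from two separate volumes, and the function names and example volumes are hypothetical.

```python
def annualized_pbvc(baseline_volume: float, follow_up_volume: float, years: float) -> float:
    """Annualized percentage brain volume change (aPBVC).

    Assumes the change is spread linearly over the follow-up interval.
    """
    if years <= 0:
        raise ValueError("follow-up interval must be positive")
    pbvc = 100.0 * (follow_up_volume - baseline_volume) / baseline_volume
    return pbvc / years


def exceeds_pathological_cutoff(apbvc: float, cutoff: float = -0.4) -> bool:
    """True when yearly loss is more negative than the proposed MS cut-off."""
    return apbvc < cutoff


if __name__ == "__main__":
    # Hypothetical volumes (mL) measured 2 years apart.
    rate = annualized_pbvc(1000.0, 990.0, 2.0)
    print(rate, exceeds_pathological_cutoff(rate))
    # aPBVC values reported in Fig. 1 for years 1-3 and for year 4.
    for value in (-0.089, -3.8):
        print(value, exceeds_pathological_cutoff(value))
```

As the text cautions, a value beyond the cut-off shortly after treatment initiation may reflect pseudoatrophy rather than neurodegeneration.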
Fig. 1 | Lesion load and brain atrophy in relapsing-remitting multiple sclerosis. a | Transverse T2-weighted fluid-attenuated inversion recovery images from a patient with highly active relapsing-remitting multiple sclerosis (MS) who started a disease-modifying therapy at baseline. The T2 lesion load (T2LL) is stable during the first 3 years of treatment, while the patient remained clinically stable (no relapses and no disability worsening), but markedly increases in the fourth year, after treatment discontinuation, in association with clinical activity (the rebound effect). b | Contrast-enhanced T1-weighted images from the same patient showing the change in brain parenchymal fraction (BPF) over time. The decrease in global brain volume in the first 3 years is mild (annualized percentage of brain volume change (aPBVC) −0.089%), but the volume loss in the fourth year is severe (aPBVC −3.8%), matching the change in T2LL and clinical evolution. The severe loss observed in year 4 is well beyond the −0.4% suggested as a pathological cut-off for brain volume loss in MS 24 . c | Graphical representation of the changes in BPF over time, emphasizing the dramatic loss of volume in year 4.
The findings above are group-based results, and translation of these findings to the individual level is not straightforward. In a study published in 2017, Sormani et al. 35 made the first attempt to define individual cut-off values for brain volume changes according to patients' baseline characteristics. Pooled baseline data from the placebo arms of two large international clinical trials that involved a total of 2,342 patients with RRMS showed that expected normalized brain volumes can be calculated for individuals from demographic (age and sex), clinical (Expanded Disability Status Scale score and disease duration) and neuroradiological (T2-weighted lesion volume) parameters. Deviation of the true brain volume from this expected value enabled classification of individuals with MS as having low, medium or high brain volume. Patients with low brain volume had a 2.4-fold higher risk of disability progression over the next 2 years than patients with high brain volume.
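The classification step described above can be sketched as a standardized deviation of the observed brain volume from the expected value. The regression model that produces the expected volume is not reproduced here; the expected volume and residual standard deviation are treated as inputs, and the z-score thresholds are hypothetical placeholders, not the values used by Sormani et al.

```python
def classify_brain_volume(observed: float, expected: float, residual_sd: float,
                          low_z: float = -0.5, high_z: float = 0.5) -> str:
    """Label an individual's brain volume as 'low', 'medium' or 'high'.

    expected: volume predicted from age, sex, EDSS, disease duration and
              T2 lesion volume by some fitted model (not shown here).
    residual_sd: spread of observed-minus-expected values in the reference cohort.
    low_z / high_z: hypothetical thresholds on the standardized deviation.
    """
    if residual_sd <= 0:
        raise ValueError("residual_sd must be positive")
    z = (observed - expected) / residual_sd
    if z < low_z:
        return "low"
    if z > high_z:
        return "high"
    return "medium"


# Hypothetical normalized volumes (mL) for one patient.
print(classify_brain_volume(observed=1450.0, expected=1500.0, residual_sd=50.0))
```

Standardizing by the residual spread means the same absolute deviation counts for less when the reference model predicts volumes poorly.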

Spinal cord atrophy measures to define and predict MS severity.
Early, seminal studies of cervical cord atrophy in MS suggested that cervical spinal cord area is an important marker of disability status in MS 9 . Further studies demonstrated that spinal cord area and volume are affected differently in different MS subtypes, with the most profound atrophy in cross-sectional studies being seen in patients with progressive MS [36][37][38][39] . Since 2015, an association between reduced cervical cord area and increased disability and motor dysfunction, independent of brain atrophy, has been confirmed [40][41][42][43] . An association between cord atrophy and reduced peripapillary retinal nerve fibre layer thickness has also been identified, indicating that cervical cord atrophy reflects, at least in part, global pathological processes and not only specific damage of long tracts 41 . Most studies of spinal cord area have focused on global cervical cord area measurements, but some work has highlighted that damage in particular locations in the spinal cord, such as the cervical grey matter 44 , the thoracolumbar segment 45 and the posterior and lateral cord segments 46 , is also relevant to disability.
Longitudinal studies indicate that atrophy rates in the spinal cord are higher than those in the brain, and higher in progressive MS than in established RRMS 47,48 . Higher rates of cervical cord area loss have been associated with disability progression, independent of other clinical and MRI parameters 30,47 , including spinal cord lesions 49 . However, as for brain atrophy, using such group-level evidence to inform clinical decisions at the individual level is not straightforward. Results that can be used at the individual level are slowly emerging; for example, Tsagkas et al. 43 have shown that a 1% increase in the annual rate of spinal cord atrophy increases the risk of disability progression by 28%, reinforcing the notion that spinal cord atrophy is a reliable and independent tool for monitoring disease progression.
Nature Reviews Neurology | volume 16 | March 2020 | 173
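One way to read the 28% figure is as a hazard ratio of 1.28 per 1% per year of additional cord atrophy. The sketch below assumes a proportional hazards model that is log-linear in the atrophy rate, so ratios multiply per unit; extrapolating beyond the range studied by Tsagkas et al. is an assumption, and the function name is hypothetical.

```python
HAZARD_RATIO_PER_1PCT = 1.28  # 28% higher risk per 1%/year faster cord atrophy


def relative_risk(extra_atrophy_pct_per_year: float,
                  hr_per_unit: float = HAZARD_RATIO_PER_1PCT) -> float:
    """Relative hazard of disability progression for a faster atrophy rate.

    Assumes a log-linear (proportional hazards) model, in which the hazard
    ratio for an increase of x units is hr_per_unit ** x.
    """
    return hr_per_unit ** extra_atrophy_pct_per_year


for delta in (0.5, 1.0, 2.0):
    print(delta, round(relative_risk(delta), 3))
```

Under this assumption, a 2% per year faster rate corresponds to a relative hazard of about 1.64, not 1.56, because the effects compound rather than add.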
Regional and tissue-specific brain volumetry measures to define and predict MS severity. Early cross-sectional studies of brain white matter and grey matter changes in patients with MS indicated that both white matter and grey matter loss occur early in the disease course, regardless of disease phenotype [50][51][52][53] . Evidence also indicates that grey matter damage can occur before white matter atrophy and independently of white matter lesions [54][55][56] . Further longitudinal studies have identified larger decreases in grey matter volumes than in white matter volumes [57][58][59] and have shown that grey matter damage is more relevant than white matter injury to clinical outcomes, both concurrent and forthcoming 56,[60][61][62] . Two studies (one in which cortical thickness was estimated 63 and one meta-analysis of voxel-based morphometry studies 64 ) have revealed statistically significant associations between disability end points and grey matter atrophy 65 , which occurs bilaterally, predominantly in the cingulate, precentral and/or postcentral gyri and in the thalami and basal ganglia. Despite these results, global brain volume changes seem to be more strongly associated with clinical outcomes than are regional changes. This observation is unexpected because grey matter loss is thought to underlie disability accumulation. Associations between grey matter volume change and disability accumulation might be masked by the high variability of regional segmentations, which makes clinical application of these regional measures inadvisable at present 62,66 .
Statements and recommendations
1. We recommend measurement of global brain volume to better gauge global disease burden in patients with MS because brain volume loss is associated with and predicts disability in all clinical MS phenotypes, including the earliest stages of the condition.
2. We recommend measurement of cervical cord area loss because this measure is associated with and predicts disability in all clinical MS phenotypes, including the earliest stages of the condition.
3. Grey matter volume changes in the brain are more pronounced and clinically relevant than white matter volume changes, even in the earliest stages of MS, but their exact relevance in clinical practice is unclear. We recommend further research to clarify this relevance.
4. Some cerebral grey matter regions (including the thalami, basal ganglia and specific cortical areas) are affected particularly strongly by atrophy in MS, but whether the pathological involvement of these areas is relevant in clinical practice remains unclear. We recommend further research to determine the clinical relevance of atrophy in these regions.

Monitoring therapeutic effect
Evidence review
Brain volume as an outcome measure in randomized clinical trials. Many trials of disease-modifying therapies for MS have included brain atrophy as an outcome measure (Table 1). Most early studies of interferon-β (IFNβ) and glatiramer acetate did not include preplanned brain volume measures as secondary MRI outcomes. Those that did include a sound comparison of brain volume changes between intervention arms or between intervention and placebo arms produced mixed results 67 .
The only study of IFNβ that provided evidence for a positive effect of treatment on brain atrophy was the ETOMS trial 68 . In this study, accrual of atrophy was reduced by 30% in patients with clinically isolated syndromes who received low-dose subcutaneous IFNβ-1a compared with patients who received placebo 68 . In several trials (particularly the trial of intramuscular IFNβ-1a in RRMS 69,70 ), negative results were at least partly attributed to a pseudoatrophy effect: brain volume loss linked to the presumed treatment-associated resolution of inflammatory activity and oedema. In the RRMS intramuscular IFNβ-1a trial, significant differences that favoured treatment with IFNβ-1a were only observed in the second year 69,70 . A post hoc analysis of grey matter and white matter atrophy during the 2 years of the trial confirmed this finding and indicated that pseudoatrophy of white matter contributed most to the observed effect 71 . The same effect has been described in observational studies of patients taking natalizumab 72 or IFNβ 73 , although more research is needed to confirm these findings. Results with glatiramer acetate were also mixed, although some non-primary analyses have suggested a positive effect of the treatment in patients who received glatiramer acetate from the beginning of the trial compared with those who received the treatment later 74 . Trials of IFNβ and of glatiramer acetate in progressive MS have been negative 75 or have also suggested a pseudoatrophy effect 76 .
Trials of natalizumab provided a clear demonstration of pseudoatrophy. In the AFFIRM trial 77 , brain volume decreases among patients who received natalizumab were larger in the first year than among patients who received placebo, but the observation was reversed in the second year. Subsequent clinical trials of newer drugs (including fingolimod, dimethyl fumarate, teriflunomide, ocrelizumab and alemtuzumab) have all incorporated brain volume measures as secondary or tertiary outcomes, and results have been positive overall 78 , although studies are not readily comparable. Of note, in studies of powerful anti-inflammatory drugs against active comparators, the trial drugs have been superior at decreasing accrual of atrophy [79][80][81] , indicating that the pseudoatrophy effect can be overcome by the beneficial effects of anti-inflammatory drugs on neurodegeneration in MS. Strategies to minimize the effect of pseudoatrophy on clinical measures include, but are not restricted to, obtaining baseline measurements once the anti-inflammatory effect is well established (for example, re-baselining with MRI at 6 or 12 months after treatment initiation) 82,83 .
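The re-baselining strategy can be illustrated by computing the annualized volume change from the first scan acquired at or after 6 (or 12) months instead of from the pretreatment scan. This is a minimal sketch under a linear-change assumption; the function name, scan schedule and volumes are hypothetical.

```python
from typing import Sequence, Tuple


def rebaselined_apbvc(scans: Sequence[Tuple[float, float]],
                      rebaseline_month: float = 6.0) -> float:
    """Annualized % brain volume change measured from a re-baseline scan.

    scans: (months since treatment start, brain volume) pairs, in any order.
    The first scan at or after `rebaseline_month` becomes the reference, so
    early pseudoatrophy-related volume loss is excluded from the estimate.
    """
    ordered = sorted(scans)
    candidates = [scan for scan in ordered if scan[0] >= rebaseline_month]
    if len(candidates) < 2:
        raise ValueError("need a re-baseline scan and at least one later scan")
    (t0, v0), (t1, v1) = candidates[0], candidates[-1]
    years = (t1 - t0) / 12.0
    return 100.0 * (v1 - v0) / v0 / years


# Hypothetical series: steep early drop (pseudoatrophy), then slower loss.
scans = [(0.0, 1500.0), (6.0, 1485.0), (18.0, 1480.545)]
print(rebaselined_apbvc(scans, rebaseline_month=0.0))  # anchored at treatment start
print(rebaselined_apbvc(scans, rebaseline_month=6.0))  # anchored at 6 months
```

With these made-up values, anchoring at treatment start gives roughly −0.86% per year, whereas the 6-month re-baseline gives −0.3% per year, which falls on opposite sides of the −0.4% cut-off.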
Further support for the clinical relevance of brain volume outcomes in trials of treatment for RRMS comes from a meta-analysis that included >13,500 patients from 13 different clinical trials 84 . The conclusion of the analysis was that the effect of a given therapy on changes in brain volume over 2 years is associated with the effect of the drug on disability outcomes and that this association is, at least in part, independent of its anti-inflammatory effect on active MRI lesions 84 . This close association between brain atrophy and disability outcomes in clinical trials has driven the adoption of brain volume change as a primary outcome in phase II trials in cohorts of patients with progressive MS 85,86 .

Spinal cord atrophy as an outcome in randomized clinical trials.
Despite the relevance of spinal cord atrophy to long-term disability, this measure has scarcely been used as an outcome in clinical trials 87 ; when it has been used, the results have been negative. For example, spinal cord atrophy was an outcome measure in an investigator-initiated study of lamotrigine for neuroprotection in secondary progressive MS, but no differences were seen between the treatment and placebo arms 88 . Spinal cord atrophy measures have been used in several other studies in progressive MS 89 , but the atrophy and clinical results […] 76,90 or were not published with the rest of the trial 91 .

Brain volume and spinal cord atrophy to monitor clinical treatment response.
The relevance of brain volume measures to the evolution of disability in MS clinical trials is beyond any doubt 84 . The evidence from trials is complemented by that from studies of individual-level data from clinical trials 92,93 and from observational studies of real-world cohorts 25,94 , which confirm a close association between brain volume changes with therapy and concurrent 95 or subsequent 96 disability progression. These studies also indicate that the association between brain volume loss and disability progression is independent of clinical and MRI inflammatory markers.
Most models for the prediction of disability progression have included brain volume change combined with either the appearance of new T2 lesions or the presence of clinical relapses 25,[92][93][94] . Brain volume changes have also been proposed as an addition to the 'no evidence of disease activity' 97,98 outcome measure so as to enable assessment of neurodegenerative as well as inflammatory processes, with the aim of achieving full remission that includes an absence of disease-specific neurodegeneration; the proposed cut-off for this measure is −0.4% change in volume per year 24 . In a potentially more realistic 'minimal evidence of disease activity' approach 99 , a less stringent cut-off has been suggested that would allow for pseudoatrophy-driven brain volume loss 25 . However, all these data need confirmation, and different cut-offs might be needed for different calculation methods and for different drugs or groups of drugs, according to the different temporal patterns of the brain volume effects of each drug 6,78 .
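The composite outcome described above can be sketched as a simple rule: no relapses, no new or enlarging T2 lesions, no gadolinium-enhancing lesions and no confirmed disability worsening, optionally combined with annualized brain volume loss no worse than −0.4%. The three non-atrophy components follow the commonly used 'no evidence of disease activity' definition; the field names are hypothetical, and the less stringent cut-off in the usage line is purely illustrative, not a published value.

```python
from dataclasses import dataclass


@dataclass
class FollowUp:
    relapses: int
    new_or_enlarging_t2: int
    gad_enhancing: int
    confirmed_disability_worsening: bool
    apbvc: float  # annualized % brain volume change over the same period


def no_evidence_of_disease_activity(f: FollowUp, include_atrophy: bool = True,
                                    atrophy_cutoff: float = -0.4) -> bool:
    """Composite 'no evidence of disease activity' check for one period."""
    inflammatory_quiet = (f.relapses == 0 and f.new_or_enlarging_t2 == 0
                          and f.gad_enhancing == 0)
    clinically_stable = not f.confirmed_disability_worsening
    if not (inflammatory_quiet and clinically_stable):
        return False
    if include_atrophy:
        # Brain volume component: loss must not exceed the chosen cut-off.
        return f.apbvc >= atrophy_cutoff
    return True


patient = FollowUp(relapses=0, new_or_enlarging_t2=0, gad_enhancing=0,
                   confirmed_disability_worsening=False, apbvc=-0.6)
print(no_evidence_of_disease_activity(patient))                        # strict cut-off
print(no_evidence_of_disease_activity(patient, atrophy_cutoff=-0.8))   # illustrative, less stringent
```

Keeping the cut-off as a parameter reflects the text's caveat that different thresholds may be needed for different calculation methods and drugs.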

Statements and recommendations
1. We recommend the use of whole-brain atrophy over a minimum period of 12 months as a secondary end point in clinical trials in MS, and even as a primary outcome measure in trials in the progressive forms of MS, to show the effects of the drug on the neurodegenerative component of the disease.
2. Ongoing and forthcoming trials are expected to include grey matter volume loss as an outcome measure, as atrophy in the grey matter compartment is more substantial and more clinically relevant than atrophy in the white matter and is likely to be less affected by pseudoatrophy; however, data on pseudoatrophy remain discordant and we recommend further research to clarify the contribution of grey matter atrophy.
3. Pseudoatrophy effects mostly occur within the first 6-12 months after initiation of any anti-inflammatory therapy, so we recommend re-baseline MRI at 6-12 months after initiation of any therapy to mitigate the impact of pseudoatrophy on outcome measures.
4. Associations between treatment effects on brain volume and disability have been demonstrated in clinical trials and indicated by evidence at the individual level, but we recommend further research to confirm these associations before brain volume can be considered for use as a treatment monitoring tool.
5. Use of spinal cord atrophy as a treatment monitoring tool in clinical trials and in clinical practice has been scarce, but the rate of spinal cord atrophy is faster than that of brain atrophy and methodological advances could improve reproducibility and reliability, so we recommend further research to establish the role of spinal cord atrophy for treatment monitoring.

Barriers to clinical implementation
Evidence review: technical barriers
Several technical aspects of image acquisition and quan tification can affect the measurement of brain and spinal cord volumes and thereby affect the accuracy of estimated values. These technical barriers are discussed below.
Acquisition protocols. The choice of acquisition parameters (such as repetition time, echo time, inversion time or flip angle) is usually based on image contrast, as assessed visually by an expert neuroradiologist. Changes in scan parameters, which tend to happen in a clinical environment, affect quantification and hamper reliable cross-sectional and longitudinal comparisons. Image contrast also depends greatly on the age of the population that undergoes MRI. The Alzheimer's Disease Neuroimaging Initiative 100 has made a large effort to homogenize acquisition protocols across vendors.

Gradient distortion.
In practice, the magnetic field gradients used in MRI are not perfectly linear across the field of view, which distorts the geometry of the image. Small displacements of the patient's head along the z-axis therefore have a notable effect on the estimated brain volume change 101 .
Positioning the patient identically across scanning sessions can minimize this effect, but this is time-consuming and difficult; a better solution is to apply approaches developed by MRI scanner manufacturers for 3D correction of the gradient nonlinearity effect 102 .
Intrascanner variability. Any MRI-derived measure is inherently variable, even when technical and physiological conditions are controlled [103][104][105][106][107][108] . Global estimates, such as the whole-brain volume, are the least variable (<1%) 106 , whereas measures of smaller structures, such as the amygdala, are much more variable (~5%) 104,105 . Such variability must be taken into account because changes that are smaller than the estimated variability cannot be reliably detected. This limitation is highly relevant to small grey matter structures and when follow-up periods are short, because the expected change is small 23 .
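A common way to turn measurement variability into a detection threshold is the smallest detectable change from test-retest statistics, 1.96 × √2 × SEM, where SEM is the standard error of measurement. This formula is a general test-retest convention rather than a method taken from the studies cited here, and treating the percentages above as SEM values is an assumption.

```python
import math


def smallest_detectable_change(sem: float, z: float = 1.96) -> float:
    """Smallest change distinguishable from measurement error between two scans.

    sem: standard error of measurement (same units as the measure, e.g. % volume).
    The sqrt(2) accounts for error in both the first and the second measurement.
    """
    return z * math.sqrt(2.0) * sem


def is_detectable(observed_change: float, sem: float) -> bool:
    """True if the change exceeds the smallest detectable change."""
    return abs(observed_change) > smallest_detectable_change(sem)


# Hypothetical SEMs (%): whole brain vs a small grey matter structure.
for label, sem in (("whole brain", 0.5), ("amygdala", 5.0)):
    print(label, round(smallest_detectable_change(sem), 2))
```

With a hypothetical whole-brain SEM of 0.5%, a single-patient change of −0.4% over 1 year would not be distinguishable from noise, which illustrates why short follow-up periods are problematic.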

Movement.
Movement of the patient during image acquisition generates characteristic artefacts that degrade image quality and substantially decrease estimated volumes 109 . Visual verification of image quality is therefore important: the problem is resolved when only images that an expert considers artefact-free are included in the analysis 109 . Various approaches have been developed to correct for movement, but an accurate method is still not available 110 .

Scanner system upgrades and interscanner variability.
Scanner upgrades are unavoidable, particularly during the course of longitudinal studies, and can affect image contrast even if the same acquisition parameters are used. Previous studies have shown that the system upgrade should be included as a variable in the statistical analysis 103,111,112 . Quantification methods based on the subtraction of images, rather than on differences in brain parenchymal fraction between two time points, seem to be more sensitive to system upgrades 113 , although no studies have been performed to confirm this observation. Reliable quantification of longitudinal changes in MS requires scans to be acquired with the same magnet and exactly the same sequence protocol. Variability between different scanners is greater than that of all the factors above combined 108 . If data acquired on different scanners need to be merged, a variable that accounts for the scanner should be included in the analysis.
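Including the upgrade "as a variable in the statistical analysis" can be sketched as an ordinary least-squares fit of volume = intercept + slope × time + offset × upgraded, so that a step change in measured volume at the upgrade is not absorbed into the atrophy slope. This single-patient linear model is a simplified assumption (longitudinal studies typically use mixed-effects models), and the function names and data are hypothetical.

```python
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_with_upgrade(times: Sequence[float], volumes: Sequence[float],
                     upgraded: Sequence[int]) -> List[float]:
    """Least squares for volume = b0 + b1*time + b2*upgraded (normal equations).

    Returns [intercept, slope per time unit, offset after the upgrade].
    """
    rows = [[1.0, float(t), float(u)] for t, u in zip(times, upgraded)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * v for r, v in zip(rows, volumes)) for i in range(3)]
    return _solve(xtx, xty)


# Hypothetical yearly volumes (mL); the scanner was upgraded before year 2.
times = [0, 1, 2, 3, 4]
upgraded = [0, 0, 1, 1, 1]
volumes = [1500.0, 1495.0, 1498.0, 1493.0, 1488.0]
print(fit_with_upgrade(times, volumes, upgraded))
```

In this made-up series the true loss is 5 mL per year, but the upgrade adds an apparent +8 mL; a fit on time alone would estimate only −2.6 mL per year, whereas the model with the upgrade term recovers the −5 mL slope.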

Evidence review: confounding factors
Numerous factors can have confounding effects on the quantification of brain volume (and its changes) and thereby cause overestimation or underestimation 114 . These factors are discussed below.
Age, sex and brain size. Several physiological factors influence brain volume estimations in healthy individuals. Studies of healthy elderly individuals have demonstrated ongoing brain volume loss, which tends to accelerate with age 115 . This age-related effect is particularly pronounced for specific CNS structures, such as the hippocampus 116 . Sex is another key factor in brain volume changes. Sex differences in global brain size in humans are well established; on average, the total volume of men's brains is ~10% larger than that of women's brains 117 . Differential patterns of age-related brain volume loss 118 and sex-specific differences in brain morphology have also been demonstrated 119,120 . Global and regional volumetric studies have suggested that hormonal status can contribute to these sex-related differences 121 .
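Head-size differences, such as the ~10% sex difference in total brain volume, are one reason to normalize volumes by intracranial volume. The sketch below computes the brain parenchymal fraction, (total intracranial volume − CSF volume) / total intracranial volume, expressed as a percentage; the example volumes are hypothetical. Normalization of this kind removes head size but not age-related or sex-specific differences in morphology, which require appropriate reference data.

```python
def brain_parenchymal_fraction(total_intracranial_volume: float,
                               csf_volume: float) -> float:
    """Brain parenchymal fraction (%) = (TIV - CSF) / TIV * 100."""
    if total_intracranial_volume <= 0:
        raise ValueError("intracranial volume must be positive")
    return 100.0 * (total_intracranial_volume - csf_volume) / total_intracranial_volume


# Two hypothetical individuals: raw parenchymal volumes differ by 10%
# (1,320 mL vs 1,200 mL), but both have the same proportion of CSF.
print(brain_parenchymal_fraction(1650.0, 330.0))
print(brain_parenchymal_fraction(1500.0, 300.0))
```

Both calls give a BPF of about 80%, showing how the ratio removes the effect of head size that raw volumes retain.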

Diurnal fluctuations and hydration state.
Studies of healthy individuals have shown that estimations of brain volume fluctuate with the time of scanning and the hydration state of the individual. Analysis of MRI data from patients with MS (n = 755, 3,269 scans) and from participants in the Alzheimer's Disease Neuroimaging Initiative (n = 834, 6,114 scans) revealed that time of day had a notable effect on estimates of the brain parenchymal fraction in both groups. Brain volumes were substantially larger in the morning 122 , and the effect size was comparable to the yearly rate of brain atrophy in MS and in healthy elderly people 122 . Similarly, in studies in which hydration status was manipulated by overnight thirsting and subsequent drinking of water, hydration-related changes in brain volume were as large as −0.55% on dehydration and +0.72% on rehydration 123 .

Lifestyle and risk factors.
Many lifestyle factors, including physical activity 124 , influence estimates of brain volume. A higher level of alcohol intake has been associated with a higher rate of brain atrophy over a 6-year period 115 and with a specific pattern of regional involvement of the white matter and grey matter 125 . A similar effect has been described for cigarette smoking and substance abuse (for example, marijuana use) 115,126 . Many systemic conditions, such as diabetes, chronic kidney disease, hypertension, obesity and vascular conditions, can also accelerate brain atrophy 115,127,128 .
The MS brain. All the confounding factors discussed above can interact with features of MS and affect estimates of brain atrophy in patients with the disease; these interactions can also affect comparisons between groups. For instance, more severe brain atrophy has been observed in patients with MS who have one or more cardiovascular risk factors 129 , although the impact of these factors on longitudinal assessments might be limited, as vascular risk factors were not associated with greater brain volume loss during 3.5 years of follow-up in the same study 129 . In addition, white matter lesions in MS influence the accuracy of most available software for estimation of atrophy because they alter the image intensity histogram and influence the detection of intensity borders between grey matter, white matter and cerebrospinal fluid (CSF). This effect can be minimized by use of lesion-filling techniques 130,131 , which replace lesion voxels in the image with voxels whose intensities closely resemble those of normal-appearing white matter.
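A minimal version of lesion filling replaces each lesion voxel with a value drawn from the intensity distribution of normal-appearing white matter (NAWM). The sketch below works on flattened intensity lists and assumes a single global Gaussian model of NAWM intensity; published lesion-filling methods are more elaborate (for example, sampling from local neighbourhoods), and the function name and data are hypothetical.

```python
import random
import statistics
from typing import List, Sequence


def fill_lesions(intensities: Sequence[float], lesion_mask: Sequence[bool],
                 nawm_mask: Sequence[bool], seed: int = 0) -> List[float]:
    """Replace lesion voxels with draws from the NAWM intensity distribution.

    All three sequences are flattened voxel arrays of equal length.
    Voxels flagged as lesion are never used to estimate the NAWM model.
    """
    if not (len(intensities) == len(lesion_mask) == len(nawm_mask)):
        raise ValueError("image and masks must have the same length")
    nawm = [v for v, keep, lesion in zip(intensities, nawm_mask, lesion_mask)
            if keep and not lesion]
    if len(nawm) < 2:
        raise ValueError("need at least two NAWM voxels")
    mu = statistics.fmean(nawm)
    sd = statistics.stdev(nawm)
    rng = random.Random(seed)  # seeded for reproducible filling
    return [rng.gauss(mu, sd) if lesion else v
            for v, lesion in zip(intensities, lesion_mask)]


# Hypothetical T1 intensities: two hypointense lesion voxels among NAWM.
image = [100.0, 102.0, 98.0, 101.0, 60.0, 55.0, 99.0]
lesions = [False, False, False, False, True, True, False]
nawm = [not flag for flag in lesions]
print(fill_lesions(image, lesions, nawm, seed=1))
```

After filling, the lesion voxels take intensities near the NAWM mean (about 100 here), so a subsequent tissue segmentation is less likely to misclassify them as grey matter or CSF.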
Pseudoatrophy. As discussed above, studies of the correlation between inflammatory disease activity (new T2 and/or gadolinium-enhancing lesions) and brain volume have shown that inflammation can cause a transient increase in brain volume. This increase can resolve dramatically following treatment with steroids 132 or other disease-modifying drugs, and the resultant reduction in brain volume can be erroneously interpreted as atrophy 133 .

Evidence review: volumetry tools
Several free-to-use online libraries of neuroimaging analysis software include fully automated pipelines for quantification of brain volume (Table 2). On the basis of the current literature relating to this software, these tools can be classified into two broad categories. The first are 'segmentation-based' tools, which use a priori localization-related and intensity information to classify the brain voxels of each MRI image without using information from brain MRI images taken at other time points. These tools do not enable direct evaluation of volumetric changes over time and are mostly used in cross-sectional analyses. The second are 'registration-based' tools, which enable comparison of brain MRI images from the same individual acquired over time and are based on an initial registration step; this type of software is used in longitudinal analyses 134 .
Brain parenchymal fraction
The percentage of the intracranial volume that is occupied by brain parenchyma, calculated as the total intracranial volume minus the volume of cerebrospinal fluid (CSF), divided by the total intracranial volume.

Volume 16 | March 2020 | 177 | Nature Reviews | Neurology

Most segmentation-based software packages provide measures of total brain volume, grey matter volume and white matter volume based on the partial volume estimation (PVE) of each tissue in each voxel. The initial step is assignment of the PVE to a given brain voxel on the basis of its intensity and the intensities of the surrounding voxels 113. To improve the segmentation, a priori spatial information can be included for each voxel, thereby increasing the probability that each voxel is assigned to a specific tissue type on the basis of its location 135, although the accuracy of this step strongly depends on the anatomical similarity between the MRI image and the a priori tissue maps used. To avoid problems due to anatomical mismatch with the atlas, only MRI images with high anatomical similarity should be used to provide the voxel location information 136. Use of different anatomical maps, such as tissue probability maps or structure labelling maps, can also offer improvements 137. Other approaches that do not depend on the PVE can provide a measure of cortical thickness by calculating the distances between pairs of voxels at the grey matter-white matter and grey matter-CSF interfaces, perpendicular to the grey matter-white matter surface. These methods tend to be more susceptible than some of the previously mentioned methods to low intensity contrast between tissues, because they rely heavily on the intensity gradients at tissue interfaces 138,139.
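The cross-sectional steps described above (intensity-based tissue assignment refined by an a priori atlas, followed by summation of tissue fractions into volumes) can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any package in Table 2: the Gaussian tissue intensity models, the function names and the toy voxel values are hypothetical; the surrounding-voxel term used by real tools is omitted; and the normalized posterior probabilities are treated as soft tissue fractions, a simplification of dedicated PVE models. The last function computes the brain parenchymal fraction as the total intracranial volume minus the CSF volume, divided by the total intracranial volume, assuming that every voxel passed in lies inside the intracranial cavity.

```python
import math

# Hypothetical Gaussian intensity models (mean, SD) for a T1-weighted image,
# in arbitrary units; real tools estimate these parameters from each image.
TISSUE_MODELS = {"CSF": (30.0, 10.0), "GM": (70.0, 10.0), "WM": (110.0, 10.0)}


def gaussian_likelihood(x, mean, sd):
    """Probability density of intensity x under one tissue's Gaussian model."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def tissue_fractions(intensity, prior):
    """Combine the intensity likelihood with an atlas prior via Bayes' rule.

    `prior` maps tissue name -> a priori probability at this voxel's location
    (from a tissue probability map). Simplification: the normalized
    posteriors are used as soft tissue fractions, and the surrounding-voxel
    term of real PVE models is omitted.
    """
    weighted = {
        tissue: gaussian_likelihood(intensity, mean, sd) * prior[tissue]
        for tissue, (mean, sd) in TISSUE_MODELS.items()
    }
    total = sum(weighted.values())
    return {tissue: w / total for tissue, w in weighted.items()}


def tissue_volumes(intensities, priors, voxel_volume_ml):
    """Sum soft tissue fractions over intracranial voxels to get volumes (mL)."""
    volumes = {tissue: 0.0 for tissue in TISSUE_MODELS}
    for intensity, prior in zip(intensities, priors):
        for tissue, fraction in tissue_fractions(intensity, prior).items():
            volumes[tissue] += fraction * voxel_volume_ml
    return volumes


def brain_parenchymal_fraction(volumes):
    """BPF = (total intracranial volume - CSF volume) / total intracranial volume.

    Assumes the total intracranial volume is GM + WM + CSF.
    """
    tiv = sum(volumes.values())
    return (tiv - volumes["CSF"]) / tiv


if __name__ == "__main__":
    flat_prior = {"CSF": 1 / 3, "GM": 1 / 3, "WM": 1 / 3}
    toy_intensities = [30.0, 70.0, 90.0, 110.0]  # hypothetical voxel values
    vols = tissue_volumes(toy_intensities, [flat_prior] * 4, voxel_volume_ml=0.001)
    print(vols, brain_parenchymal_fraction(vols))
```

With a flat prior, a voxel whose intensity lies midway between the grey matter and white matter means is split equally between them; a location-dependent prior that favours white matter shifts that voxel towards white matter, which is the role of the a priori spatial information described above.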
Registration-based software packages provide measures of changes in total brain volume, grey matter volume and white matter volume by comparing serially acquired MRI images from the same individual. A common preliminary step in most of these procedures is registration of all MRI images from the same individual into a common space. The first such software packages used in longitudinal analyses 113,140 registered two MRI images of the same individual and measured whole-brain volume change by analysing the shift of the parenchyma-CSF border over time. Newer approaches use different methods to enable assessment of grey matter and white matter volume changes. In one, the intensity information from neighbouring voxels at each time point is used for each voxel 141. In another, a new intensity harmonization scheme is applied to all MRI images from one individual, with the aim of assigning similar intensities to voxels with similar PVE content 142. Another approach, known as the Jacobian integration method 143,144, is based on local assessment of relative volumetric differences between two MRI images of the same individual, one of which is usually the baseline image; the net sum of all local volumetric changes provides an estimate of the total volume change over time. Finally, cortical thickness changes can be detected either by using a within-subject template (an MRI image created by merging all MRI images from one individual) to improve cortical thickness estimation at each time point, or by fitting a subject-specific cortical deformable model at each time point 145,146.
Assessment of spinal cord atrophy is more difficult than assessment of brain atrophy owing to particular anatomical features of the spinal cord (greater mobility and smaller dimensions than the brain) and imaging features (lower tissue contrast). Semi-automated (Cordial) 147 and automated (Spinal Cord Toolbox) 148 tools based on deformable models have now been developed. These promising new software tools still need to be extensively validated on independent datasets before they can be considered for use in clinical practice.
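The Jacobian integration method (refs 143,144) used by registration-based packages can be illustrated with a minimal sketch: given a mapping from baseline to follow-up coordinates, the determinant of its Jacobian at each baseline brain voxel gives the local follow-up/baseline volume ratio, and summing these local changes over the brain yields the net percentage volume change. The analytic mapping, the function names and the finite-difference step are hypothetical choices for illustration; real tools estimate a dense deformation field by nonlinear registration and compute the Jacobian on the voxel grid. The sketch also assumes equal voxel volumes, so voxel size cancels out.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def jacobian_matrix(phi, point, step=1e-3):
    """Central-difference Jacobian of the mapping phi at a 3D point (mm)."""
    jac = [[0.0] * 3 for _ in range(3)]
    for j in range(3):
        plus, minus = list(point), list(point)
        plus[j] += step
        minus[j] -= step
        f_plus, f_minus = phi(tuple(plus)), phi(tuple(minus))
        for i in range(3):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2.0 * step)
    return jac


def percentage_volume_change(phi, baseline_brain_voxels):
    """Jacobian integration over the baseline brain mask.

    det(J) at each voxel is the local follow-up/baseline volume ratio, so the
    net change is the sum of (det(J) - 1) over brain voxels, relative to the
    number of voxels (equal voxel volumes assumed).
    """
    dets = [det3(jacobian_matrix(phi, v)) for v in baseline_brain_voxels]
    return 100.0 * (sum(dets) - len(dets)) / len(dets)


if __name__ == "__main__":
    # Hypothetical mapping: uniform 1% shrinkage along each axis.
    def shrink(p):
        return tuple(0.99 * c for c in p)

    mask = [(x, y, z) for x in range(10) for y in range(10) for z in range(10)]
    print(percentage_volume_change(shrink, mask))  # close to 100 * (0.99**3 - 1)
```

Because the determinant captures volume change in all three directions at once, a 1% shrinkage along each axis corresponds to a volume loss of about 2.97%, not 1%.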
Academic software packages have important advantages over commercial software packages; notably, they have been validated in many studies under a wide range of MRI conditions over the past decade. However, they have the major limitation of being highly technically demanding, so their use is limited to centres that specialize in MRI processing. In addition, clinical application of software to support diagnosis or care is only permitted with products that have received the "Conformité Européenne" (CE) mark in Europe or FDA clearance in the USA. For this reason, translating imaging analysis tools developed by academic neuroimaging laboratories into clinical practice is challenging and almost infeasible.
In the past 10 years, several companies have introduced centralized MRI reading services, often using their in-house software for quantification of atrophy (Table 2). Four software packages have been approved for use in Europe, and three of these have also received FDA clearance in the USA. IcoBrain MS (Icometrix, previously MSmetrix) 149 quantifies cross-sectional volumes with software based on NiftySeg and quantifies longitudinal changes in grey matter and white matter with software that implements Jacobian integration. NeuroQuant (CorTechs Labs) 150 provides both cross-sectional and longitudinal quantification of atrophy 151, building on previously developed approaches 138. Biometrica MS (Jung Diagnostics) builds on Statistical Parametric Mapping, a software library for neuroimaging analysis, for atrophy measurement and on Lesion Segmentation Tool software for automatic lesion segmentation 152,153. Quantib Brain (Quantib) is a platform integrated into General Electric MRI scanners that can assess cross-sectional brain volumes and longitudinal changes in volume. IcoBrain MS and Biometrica MS are offered as remote analysis services, Quantib Brain can be run locally or on a vendor (General Electric) console, and NeuroQuant is available either as a remote analysis service or as a local installation. All four packages have the CE mark and, with the exception of Biometrica MS, FDA clearance. These certifications guarantee standardization of procedures and results, meaning that the software can be used as a medical device.
Importantly, the companies must provide the magnitude of the error in their results, and healthcare professionals should use this information to validate or discard the findings of analyses. All four commercial software packages have been evaluated scientifically to some extent, but not completely. To our knowledge, only MSmetrix has been validated by an independent group in the context of MS 154. Furthermore, the real-world clinical value of these software packages has not yet been assessed, and the procedures are not widely reimbursed (with a few exceptions, such as in the USA). Although promising, these analytical approaches should therefore be more extensively validated by expert groups in the field of MRI preprocessing, especially in the context of MS 134, before they can be considered for use in the routine clinical setting.
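As an illustration of how a reported error magnitude might be used to validate or discard a longitudinal finding, the following minimal sketch compares the observed percentage brain volume change between two scans with the error stated for the software, and annualizes the change linearly only when it exceeds that error. The function name, the parameter `reported_error_pct` and the example volumes are hypothetical; vendors may express their error differently (for example, per scan pair or per year), so the comparison rule here is an assumption rather than a validated criterion.

```python
def interpret_volume_change(baseline_ml, followup_ml, interval_years,
                            reported_error_pct):
    """Compare a two-scan percentage brain volume change with a reported error.

    All parameter names are hypothetical; `reported_error_pct` stands in for
    whatever error magnitude the software provider states for a scan pair.
    """
    if baseline_ml <= 0 or interval_years <= 0:
        raise ValueError("baseline volume and interval must be positive")
    change_pct = 100.0 * (followup_ml - baseline_ml) / baseline_ml
    if abs(change_pct) <= reported_error_pct:
        # The change cannot be distinguished from measurement error.
        return {"change_pct": change_pct, "annualized_pct": None,
                "interpretable": False}
    return {"change_pct": change_pct,
            "annualized_pct": change_pct / interval_years,  # linear annualization
            "interpretable": True}


if __name__ == "__main__":
    # Hypothetical volumes (mL) two years apart, with a hypothetical 0.5% error.
    print(interpret_volume_change(1200.0, 1188.0, 2.0, reported_error_pct=0.5))
```

Comparing the unannualized change with the error keeps both quantities on the same scale (one scan pair); annualizing first would make a small error look larger or smaller depending on the scan interval.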
Statements and recommendations
1. We recommend appropriate management of several scanner-related factors (including, but not limited to, variation in acquisition protocols, different scanner systems and upgrades, movement artefacts and gradient distortions) to ensure reliability of brain volume estimates, particularly at the individual patient level.
2. We recommend appropriate management of physiological and MS-related factors (including, but not limited to, age, sex, hydration status, time of day, steroid use and MS-related parenchymal alterations).
3. Brain volume measures are software dependent, so the use of software that has been approved as a medical device and independently evaluated in MS is a prerequisite; we recommend further research to validate existing software tools in MS and to assess their clinical value.

Conclusions and future directions
Based on the evidence reviewed, it is undisputed that brain volume changes and, to a lesser extent, spinal cord atrophy are helpful predictors of the evolution of MS before initiation of therapy, so these measures could be valid treatment decision tools. The evidence reviewed also supports the value of brain volume measures in monitoring the effects of MS drugs as part of the 'no evidence of disease activity' or 'minimal evidence of disease activity' outcome measures. However, several potential sources of substantial error remain, including, but not limited to, differential effects of drugs on brain volume measures, confounding physiological and technical factors, and the performance and value of volumetric tools. To make implementation of volume measurements in clinical practice feasible, these potential sources of error need to be accounted for and appropriately managed, and further research is needed to ensure the accuracy and reliability of the measurements.
Published online 24 February 2020
Deformable model
A mathematical construct capable of representing a broad range of shapes that can be used in MRI image registration or segmentation.

NiftySeg
An open-source image analysis tool developed at University College London to perform segmentation of MRI images.