Multiparametric magnetic resonance imaging for quantitation of liver disease: a two-centre cross-sectional observational study

LiverMultiScan is an emerging diagnostic tool using multiparametric MRI to quantify liver disease. In a two-centre prospective validation study, 161 consecutive adult patients who had clinically-indicated liver biopsies underwent contemporaneous non-contrast multiparametric MRI at 3.0 tesla (proton density fat fraction (PDFF), T1 and T2* mapping), transient elastography (TE) and Enhanced Liver Fibrosis (ELF) test. Non-invasive liver tests were correlated with gold standard histothological measures. Reproducibility of LiverMultiScan was investigated in 22 healthy volunteers. Iron-corrected T1 (cT1), TE, and ELF demonstrated a positive correlation with hepatic collagen proportionate area (all p < 0·001). TE was superior to ELF and cT1 for predicting fibrosis stage. cT1 maintained good predictive accuracy for diagnosing significant fibrosis in cases with indeterminate ELF, but not for cases with indeterminate TE values. PDFF had high predictive accuracy for individual steatosis grades, with AUROCs ranging from 0.90–0.94. T2* mapping diagnosed iron accumulation with AUROC of 0.79 (95% CI: 0.67–0.92) and negative predictive value of 96%. LiverMultiScan showed excellent test/re-test reliability (coefficients of variation ranging from 1.4% to 2.8% for cT1). Overall failure rates for LiverMultiScan, ELF and TE were 4.3%, 1.9% and 15%, respectively. LiverMultiScan is an emerging point-of-care diagnostic tool that is comparable with the established non-invasive tests for assessment of liver fibrosis, whilst at the same time offering a superior technical success rate and contemporaneous measurement of liver steatosis and iron accumulation.

reduce the economic burden of liver disease and save lives 5 . Despite this, existing diagnostic pathways for detection and onward referral of suspected CLD in primary care are based on traditional liver enzyme tests, which lack accuracy and contribute to late diagnosis, whilst staging of liver disease in secondary care and evaluation of drug efficacy in clinical trials remains anchored to the liver biopsy. Liver biopsy is commonly used to assess liver disease but has drawbacks, including sampling variability (especially in liver conditions where histological changes are patchy), interobserver disagreement and a potential risk of complications, including patient discomfort and anxiety 6 . A lack of robust, validated non-invasive tests has hindered drug development efforts for CLD; suitable alternative methods to biopsy could enable trial enrichment and/or read out as an early efficacy signal or surrogate endpoint.
Transient elastography (TE) (FibroScan; Echosens, Paris, France) is an established non-invasive test with good diagnostic accuracy for ruling out advanced fibrosis or cirrhosis (≥Metavir stage F3) and emerging prognostic capability 7 , but there is an average failure rate of 18·4% 8 and cut-off values of liver stiffness for the different stages of liver fibrosis are not well established 9 . Moreover, both operator-related and patient-related factors produce significant variations in liver stiffness measurements, limiting its potential use for monitoring disease progression 10 .
A range of serological tests are available for the assessment of liver fibrosis. These include simple marker panels based on routine blood tests with or without clinical parameters (e.g. AST to platelet ratio index (APRI) and Fibrosis-4 (FIB-4) test) and more complex fibrosis-orientated marker panels (e.g. Enhanced Liver Fibrosis (ELF; iQur Ltd., London, UK) test and FibroTest (BioPredictive, France) which are analysed using patented algorithms. Although these serological tests have been validated in many chronic liver diseases and may predict clinical outcomes 11,12 , they fail to classify a significant proportion of patients who fall into the 'grey zone' of indeterminate values.
Multiparametric MRI techniques have shown promise for the quantitative assessment of many chronic conditions in the heart, breast, prostate and musculoskeletal system, obviating the need for invasive tissue characterisation in many patients 13 . In the liver, MRI proton density fat fraction (PDFF) has been shown to be highly accurate and reproducible for the detection and quantification of hepatic steatosis, independent of field strength, and can detect changes in hepatic fat as small as 1% 14,15 . Magnetic resonance elastography (MRE) is a phase-contrast MRI technique that measures liver stiffness as a surrogate of fibrosis. MRE has high accuracy for the diagnosis of significant liver fibrosis and cirrhosis, but it is not yet known whether it is sufficiently sensitive or dynamic for the longitudinal monitoring of fibrosis progression/regression 16 . LiverMultiScan (LMS) (Perspectum Diagnostics Ltd., Oxford, UK) is a proprietary, CE-marked, Food and Drug Administration (FDA) 510(k) cleared multiparametric MRI methodology encompassing measurement of hepatic fibro-inflammatory injury, fat and iron 17 . A derived Liver Inflammation and Fibrosis (LIF) score has been shown in a single-centre CLD cohort to correlate with histological fibrosis staging, and a LIF score <2 had a NPV of 100% for a clinical outcome over a median follow up of 27 months 18 . We conducted a larger, independent, two-centre prospective validation study in an unselected secondary care population, with the primary objective of evaluating the ability of LMS to accurately measure hepatic inflammation and fibrosis, fat and iron compared to liver biopsy as the reference standard. The secondary objectives were to assess the performance of LMS in detecting clinically significant liver disease, to compare its diagnostic performance to TE and ELF, and to determine the reproducibility and repeatability of the technique.

Methods
Study design and population. This was a prospective two-centre validation study that represented level 1b evidence for diagnostic test assessment 19 and reported to the Standards for Reporting Diagnostic Accuracy 20 . 161 unselected consecutive adult patients booked for a standard-of-care liver biopsy to investigate known or suspected liver disease, including patients post liver transplantation, were included. Data collection took place at two large tertiary UK liver centres (Queen Elizabeth Hospital Birmingham and Royal Infirmary of Edinburgh) between February 2014 and September 2015. Patient exclusion criteria were inability or unwillingness to give fully informed consent, any contraindication to MRI, and liver biopsy targeted at a focal liver lesion. Participants underwent contemporaneous multiparametric MRI (LMS), TE and analysis of blood biomarkers including the ELF test, followed by liver biopsy performed within 2 weeks of non-invasive assessments. Reference MRI data were also collected from 22 male and female adult healthy volunteers with no known liver disease and body mass index (BMI) <25 kg/m 2 . All study investigations were performed in a fasted state (minimum of four hours).
The study was conducted in accordance with the ethical principles of the Declaration of Helsinki 2013 and Good Clinical Practice guidelines. It was approved by the institutional research departments and the National Research Ethics Service (14/WM/0010). The study was registered with the ISRCTN registry (ISCRTN39463479) and the National Institute for Health Research portfolio (15912). All patients and volunteers gave written informed consent.
Histological analysis of liver biopsy samples. All biopsies were reported by four independent expert liver histopathologists, and adequacy assessed using the definition of the Royal College of Pathologists 21 . All biopsies were staged for fibrosis using modified Ishak score (MIS) (scale 0-6; Supplementary Table 1) and collagen proportionate area (CPA) was calculated by morphometry after Picrosirius red staining, as previously described 22 . Liver inflammation was graded independently (none/minimal, mild, and moderate/severe) from histopathology reports by two assessors (NM, PJE) blinded to patient characteristics and non-invasive assessment data. Discordance was adjudicated by a third blinded observer (TJK). Liver fat was graded 0-3 based on the percentage of hepatocytes in the biopsy containing a fat globule: 0 (<5%); 1 (5-32%), 2 (33-66%), and 3 (>66%) 23 . Liver iron was detected using Perls' stain and semi-quantified using a five-tier grading system (0: no iron deposition to 4: severe iron deposition) 24 . As there is considerable interobserver variation in liver biopsy reporting, 45 Scientific REPORTS | (2018) 8:9189 | DOI:10.1038/s41598-018-27560-5 randomly-selected biopsies were independently re-scored by two liver histopathologists (SH, TJK) blinded to the clinical data and previous pathology reports, to generate five observer pairs. Magnetic resonance imaging and image analysis. Non-contrast MR acquisitions were performed with the patient in the supine position using a 3.0 tesla Siemens Verio MRI scanner system (Siemens Healthcare GMBH, Erlangen, Germany). MRI operators and image data assessors were blinded to the indication for liver biopsy and to the patients' clinical details. MRI acquisition protocols (transverse abdominal T1 and T2* maps, proton density fat fraction (PDFF, %)) and image analysis were performed as previously described 17 , but with a shorter acquisition time of approximately 10 minutes. Great care was taken to optimise the acquisition protocol to minimise the impact of breathing motion and other artefacts. This included extensive training of the MR technicians on healthy volunteers and strict quality control throughout the process. Full details of the MR protocol and reproducibility and repeatability studies are provided in Supplementary Methods. Blood analysis. All patients had full blood count, coagulation profile, serum biochemistry and ELF test measured prior to liver biopsy. Transient elastography. One-dimensional TE was performed using FibroScan by fully trained and certified operators, using either an M or XL probe for obese subjects to obtain ten valid readings, with a success rate of at least 60% and IQR <30% of the median result. XL probe was used in 81 patients (50% of study cohort). Controlled attenuation parameter (CAP) estimation was unavailable at the Royal Infirmary of Edinburgh.
Sample size calculation. Based on data from a previous study, the distribution of patients across the 4 groups (Ishak 0, 1-2, 3-4 and 5-6) was 9%: 52%: 17%: 22%. The pooled value for the difference in cT1 between sequential groups in this data were found to be approximately 90 ms. Due to the large differences in the standard deviations across the groups, the study was powered on the "worst case" pairwise comparison, based on the combination of the observed standard deviation and the proportional sample size. This was between Ishak 3,4 (SD = 57, 17% of patients) and Ishak 5,6 (SD = 90, 22% of patients).
For a comparison between these groups using an alpha level of 0.8% (i.e. 5% after adjustment for 6 comparisons), sample sizes of 12 and 22 for Ishak stages 3,4 and 5,6 respectively would be sufficient to detect a difference of 90 ms in cT1 at 80% power. Assuming that the distribution of cases was similar to the previous study, this meant that a sample size of 100 patients (9, 52, 17 and 22 in the four groups) would be sufficient to detect a difference between groups of 90 ms at 80% power and with 5% alpha. We targeted a total recruitment number of 150 to account for the ~10% of biopsies that yield inadequate samples for analysis and participant non-attendance.
Statistical analysis. Repeatability of the MRI data was assessed using Bland-Altman (B-A) plots, 95% limits of agreement, Pearson correlation coefficients, and paired t-tests, to assess the level of bias. The mean coefficient of variation (CoV) was also calculated, as the average of the CoV for each patient. Interobserver agreement for biopsies was assessed using quadratic weighted Kappa 25 . Associations between continuous and ordinal variables were assessed using the Jonckheere-Terpstra test of trend, followed by pairwise post-hoc tests, where significant. For comparisons between two groups, Mann-Whitney tests were used, whilst associations between two continuous variables were assessed using Spearman's correlation coefficient (r s ). Multivariable analysis was performed to assess the relationship between fibrosis and the diagnostic tests, after accounting for the effect of inflammation, using two-way ANOVA with inflammation, fibrosis and an interaction term as factors. TE values were log-transformed, prior to analysis, to improve model fit.
Diagnostic performance of tests was assessed using receiver operating characteristic (ROC) curve analyses. Where applicable, the areas under the ROC curves (AUROCs) were compared using the "roccomp" command in Stata 14 (StataCorp LP., College Station, TX), with significance tests followed by Bonferroni-adjusted post-hoc pairwise comparisons. Net reclassification index (NRI) was calculated as a percentage difference between cases correctly classified by the reference and alternative tests. Statistical analysis was performed using IBM SPSS Statistics for Windows version 22 (IBM Corp, Armonk, NY) and GraphPad Prism version 6.0 (GraphPad Software, USA). Variables were summarised as means (±standard deviation (SD)) if normally distributed and as medians with interquartile range (IQR) if not. A p-value of less than 0·05 was considered statistically significant. Data availability. The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Results
A total of 161 patients consented to participate and 156 were biopsied (Fig. 1). Seven biopsies (4·5%) were considered inadequate for interpretation and excluded from further analysis. Median biopsy length after processing was 25 mm (IQR 22-28). Interobserver agreement for histological assessment of liver fat, fibrosis and iron for the 45 samples that were reassessed was 71% (kappa = 0·81, almost perfect), 38% (kappa = 0·66, substantial), and 82% (kappa = 0·24, fair), respectively (Supplementary Table 2). The kappa statistic for iron was low, relative to the percentage agreement, due to the fact that most patients were in the 'no iron deposition' group, resulting in low inter-case variability 26 .
Baseline demographic and clinical characteristics of study participants with adequate biopsies are summarised in Table 1 affected 2·5% (n = 4) of patients, the remaining 96% (n = 154) all had valid cT1 and T2* data for assessment of fibro-inflammatory injury and liver iron, regardless of BMI. PDFF data were not acquired in the first 48 patients (31%). After updating the protocol, the subsequent 106 (69%) patients all had valid liver fat measurements. The MRI repeatability and reproducibility data are summarised in Table 2, showing significant inter-scan and inter-subject agreement and no evidence of a post-prandial effect. Subsequent to this study, we have performed an independent clinical evaluation of the test-retest performance of LMS using both healthy controls and patient volunteers (n = 46) (unpublished data). Consistent with the results reported in this manuscript, the 95% limits of agreement for the mixed healthy control/patient population was −60.5 ms to 49.5 ms with a bias (±SD) of −5.5 ms (±28.1 ms) and a mean CoV of 2.0%. The B-A plot showed no evidence that the test-retest performance is different in patients vs. controls.
The TE failure rate was 15% (n = 24) overall, 7·7% for patients with BMI < 30 and 25% for patients with BMI ≥ 30. One patient was unsuitable for TE due to the presence of ascites detected by MRI. Three patients (1·9%) had unavailable ELF data.
On multivariable analysis, after accounting for the effect of inflammation, ELF (p = 0·011), cT1 (p = 0·002) and TE (p < 0·001) remained significantly associated with fibrosis stage (Fig. 2). In addition, ELF was found to increase with the level of inflammation (p < 0.001), with values being 0·6 (SE = 0·3, p = 0·027) higher in patients with mild inflammation, and 1·2 (SE = 0·3, p < 0·001) higher in patients with moderate/severe inflammation,   Table 3. Diagnostic accuracy of multiparametric MRI, ELF and TE in detecting any, moderate-to-severe and severe liver fibrosis. Se, sensitivity; Sp, specificity; NPV, negative predictive value; PPV, positive predictive value. All AUROCs were significant at p < 0·001. *p-values comparing the AUROCs of multiparametric MRI vs. ELF vs. TE for the n = 117 patients with data available for each measure. Significant comparisons (bold) were followed by post-hoc pairwise comparisons, the p-values of which were Bonferroni corrected to account for multiple comparisons. These comparisons found TE to be significantly superior to both multiparametric MRI and ELF test in the detection of MIS ≥ 3 (p = 0.022, 0.005) and MIS ≥ 5 (p = 0.003, < 0.001). **TE cut off levels not included as they are specific to aetiology of liver disease. compared to those with no/minimal inflammation. No significant interaction effect between inflammation and fibrosis was detected for ELF (p = 0·641). For TE, a significant interaction effect between inflammation and fibrosis was detected (p = 0·050). As shown in Fig. 2, the increase in TE with fibrosis was similar for patients in the no/minimal and mild inflammation groups. However, for patients with moderate/severe inflammation, the increase in TE by MIS was not observed, with a geometric mean TE of 13·3 (95% CI: 5·4-9·2) for MIS 1-2 and 17·5 (95% CI: 13·3-23·1) in MIS 5-6. A similar interaction effect was also observed in the analysis of cT1 (p = 0·050), with the magnitude of the increase with MIS becoming smaller with increasing levels of inflammation.

Comparative performance of non-invasive tests to detect clinically significant liver disease.
Median (IQR) cT1 and LIF score for the 22 healthy volunteers was 761 (741-811) ms and 0.76 (0.60-1.15) respectively, which was similar to the median cT1 and LIF score in patients with no evidence of liver disease on liver biopsy (cT1 787 (757-866) ms and LIF 0.91 (0.71-1.88), both p = 0.276). All non-invasive measurements were significantly different between patients with normal liver histology or simple steatosis and those with fibrosis and/ or inflammation on biopsy (p < 0·0001 for all tests) ( Table 4). To compare diagnostic accuracy between tests, a complete case analysis was performed on the subgroup of patients for whom adequate liver biopsy, cT1, TE and ELF measurements were all available. No significant difference was detected between the predictive accuracy of the three tests for the differentiation between normal biopsies/steatosis and the presence of any degree of inflammation and/or fibrosis (p = 0.500, n = 117, Supplementary Fig. 4 and Table 4). Following exclusion of post-liver   transplant patients, cT1 showed superior predictive accuracy differentiating between these groups, although statistical significance was not achieved (p = 0.063, Table 4).
Comparative performance of non-invasive tests to detect significant liver fibrosis. Diagnosis of significant (moderate/severe) liver fibrosis identifies patients requiring close clinical follow up (including variceal and hepatocellular carcinoma surveillance in cirrhotics) and those most in need of therapeutic interventions to prevent progression of liver disease and/or decompensation, including participation in clinical trials. In order to compare the predictive accuracy of the tests in the context of clinical use, a set of net reclassification index (NRI) analyses were performed (Table 5). These compared the prognostic ability, with regards to significant liver fibrosis, between cT1, TE, ELF and the combination of TE and FIB-4, which was chosen as a 'conventional diagnostic test' that incorporates a non-patented serum score and TE. Based on the cut offs used, we found no significant difference between cT1 and either TE or ELF. The combination of TE/FIB-4 had poor predictive accuracy, with TE alone performing significantly better (p = 0.007).
We performed a subgroup analysis that excluded post liver transplant patients (Supplementary Table 5), which found no significant difference between cT1 and either TE or ELF. As per the analysis of the whole cohort, predictive accuracy of the TE/FIB-4 combination was poor, and inferior to TE alone.
Subgroup analyses were also performed to assess the predictive accuracy of cT1 in those cases with indeterminate ELF and TE values. For those with indeterminate ELF (>7.7 and ≤9.8, n = 76), cT1 maintained good predictive accuracy for diagnosing significant fibrosis, as in the whole cohort (Table 6). However, cT1 was not found to be predictive of significant fibrosis in the n = 36 patients with borderline TE (>7 and ≤13).

Discussion
Validated non-invasive tools for the diagnosis, stratification and monitoring of liver disease are an urgent requirement to streamline clinical care pathways and facilitate drug development. One of the emerging imaging-based technologies is LMS, a rapid non-contrast multiparametric MRI scan quantifying hepatic fibro-inflammatory injury, fat and iron. Here we describe the first independent validation study of LMS where, critically, its performance was also evaluated alongside other commonly used non-invasive biomarkers.
MRI scanners are available in most UK hospitals and, in contrast to MRE, LMS can be implemented on any modern clinical 1.5 or 3.0 tesla MR scanner with no additional hardware requirements. MRI with LMS is not contraindicated in patients with metal implants, as any modern (within the last 20 years) implants are almost 100% non-ferrous.
Crucially for an emerging diagnostic test, we showed that MRI with LMS had excellent repeatability and reproducibility (immediate test-retest and repeated over 10 weeks). Additionally, unlike other non-invasive imaging tests, quantification of fibro-inflammatory injury was unaffected by post-prandial state, potentially increasing ease of use and patient acceptability by avoiding pre-scan fasting. Immediate test-retest repeatability of LMS has also been confirmed in a mixed healthy volunteer/patient cohort (unpublished data).
We have validated, in a diverse unselected secondary care population, that LMS has good diagnostic accuracy for detecting fibro-inflammatory injury, fat and iron in the liver across a range of disease severity and aetiology when compared to the current imperfect 'gold standard' of liver biopsy. We acknowledge the unavoidable patient selection bias introduced by the requirement for liver biopsy, as well as the need for further studies to define applicability of LMS in other patient groups.  Based on the ROC analysis (Table 3), TE was superior to both multiparametric MRI and ELF test in the detection of moderate and severe fibrosis, whilst all three non-invasive tests were comparable for the detection of any fibrosis. Both for TE and LMS, a significant interaction effect between inflammation and fibrosis was detected. As T1 relaxation time is influenced by the presence of inflammation 27 and liver fat 28 , a more nuanced understanding of the individual and combined effects of fibrosis, inflammation, fat and iron on T1 relaxation time in different aetiologies of liver disease is required to refine the interpretation of LMS. However, the multiparametric MRI sequence is ideally suited to correct for these distinct histological variables.
LMS, ELF and TE had comparable predictive accuracy for the differentiation between normal biopsies/steatosis and the presence of any degree of inflammation and/or fibrosis. On a post hoc analysis, we have identified that liver transplantation may influence the diagnostic performance of cT1 in detecting significant liver disease, with a substantial but not significant improvement in diagnostic accuracy observed (AUROC 0.89 vs. 0.76) when post-transplant patients were excluded. To investigate this further, a prospective multicentre trial comparing the accuracy of LMS against a liver biopsy in the assessment of liver transplant recipients is ongoing (NCT03165201).
Following net reclassification analysis, cT1 was comparable to TE and ELF in diagnosing significant liver fibrosis, when using the proposed cut off of 875 ms. Moreover, our analysis suggested that cT1 could be used to increase the diagnostic yield in indeterminate ELF cases, although this was not true for indeterminate TE cases. These findings have particular relevance in NAFLD patients, in whom National Institute for Health and Care Excellence (NICE) guidance recommends the ELF test for the identification of liver fibrosis 29 , and where TE failure rates are highest due to obesity, currently necessitating further evaluation via alternative imaging modalities such as acoustic radiation force impulse (ARFI) imaging, shear wave elastography or by liver biopsy. A recent decision analytic model for NHS patients with suspected NAFLD suggested that inclusion of LMS either as an adjunct to or replacement for TE in clinical care pathways may lead to cost savings by reducing the number of liver biopsies 30 .
LMS demonstrated 98.1% technical success rate even in obese subjects, which compares favourably with the average TE success rate of 85%, even with the XL probe. The high technical success rate of LMS has been corroborated in 5000 subjects included in the UK Biobank cohort (96.4% for PDFF estimation) 31 .
In contrast to the previous study by Banerjee et al. 17 , we used the well-established modified Dixon method for PDFF to quantify liver fat, a technique that is widely applicable across different MRI platforms and is highly correlated with MR spectroscopy and also hepatic triglyceride levels 15 . Additionally, MRI-PDFF is more reliable than Fibroscan Controlled Attenuation Parameter in obese individuals 32 . We have recently reported a subgroup analysis of multiparametric MRI in NAFLD patients from our study cohort 33 .
LMS also incorporates gold standard MRI assessment of liver iron 34,35 performed with a high degree of accuracy (sensitivity and NPV of 83% and 96% respectively), with potential application in the diagnosis and monitoring of hereditary and acquired conditions leading to iron overload.
The optimal LMS cut-off points for stratification of different CLD stages will require further investigation in different research settings and may be disease-specific. However, based on the application of LMS MRI in the UK Biobank cohort, a LIF score of ≤0.82 (95% CI 0.72-0.95) has been validated as a normal value for the UK population. Thus, in a general population setting, a LIF score of ≤1 with its high sensitivity (86%) would strongly indicate the absence of significant fibro-inflammatory liver disease, allowing identification of low-risk patients who could be spared further investigations and unnecessary follow-up. In contrast, a LIF cut-off >2 identified CLD patients with an increased risk of clinical outcomes 18 and could therefore potentially be used to enrich trial populations prior to enrolment in interventional studies, or as a clinically meaningful surrogate endpoint. Given that emerging treatments (and potential combination regimens) for CLD may modulate fat and fibro-inflammatory components of liver injury to a variable extent, quantitative multiparametric MRI assessments may be well suited for drug development, especially in NASH.
In this first "in-the-field" study, LMS demonstrated good diagnostic accuracy for detecting fibro-inflammatory injury, fat and iron in the liver across a range of disease severity and aetiology and with excellent repeatability and reproducibility. Its diagnostic performance was comparable to existing well-validated non-invasive biomarkers. Overall, our findings support the further development and refinement of multiparametric MRI technology for assessment of liver disease.