A composite biomarker using multiparametric magnetic resonance imaging and blood analytes accurately identifies patients with non-alcoholic steatohepatitis and significant fibrosis

Non-alcoholic steatohepatitis (NASH) is a major health burden that lacks effective pharmacological therapies. Clinical trials enrol patients with a histologically defined NAFLD (non-alcoholic fatty liver disease) activity score (NAS) ≥ 4 and Kleiner-Brunt fibrosis stage (F) ≥ 2; however, screen failure rates following biopsy are often high. This study evaluated a non-invasive MRI biomarker, iron-corrected T1 mapping (cT1), as a diagnostic pre-screening biomarker for NASH. In a retrospective analysis of 86 patients with biopsy-confirmed NAFLD, we explored the potential of blood and imaging biomarkers, both in isolation and in combination, to discriminate patients with NAS ≥ 4 and F ≥ 2 from those without. Stepwise logistic regression was performed to select the optimal combination of biomarkers, diagnostic accuracy was determined using the area under the receiver operating characteristic curve (AUC), and the model was validated with fivefold cross-validation. Levels of cT1, AST, GGT and fasting glucose were all good predictors of NAS ≥ 4 and F ≥ 2, and the model identified the combination of cT1, AST and fasting glucose (cTAG) as superior to any individual biomarker (AUC 0.90 [95% CI 0.84–0.97]). This highlights the potential utility of the composite cTAG score for screening patients prior to biopsy to identify those suitable for NASH clinical trial enrolment.


Methods
Study design and setting. This was a retrospective analysis of data combined from two prospective, cross-sectional studies of the utility of MRI methods to evaluate liver disease. The CALM study 33 invited adult patients from two large tertiary UK liver centres (Queen Elizabeth Hospital Birmingham and Royal Infirmary of Edinburgh) who were scheduled for a standard-of-care liver biopsy to investigate known or suspected liver disease between February 2014 and September 2015. The RIAL/NICOLA study 34 invited all patients referred for liver biopsy at two UK study centres (Oxford and Reading) between March 2011 and May 2015. Exclusion criteria were inability or unwillingness to give fully informed consent, any contraindication to MRI, and liver biopsy targeted at a focal liver lesion. For the purpose of this analysis, only patients with a primary diagnosis of NAFLD or NASH who had not undergone liver transplantation were included. Patients underwent standard-of-care liver function blood tests and liver biopsy, and also underwent LiverMultiScan to measure cT1 and PDFF. We refer to the combination of the RIAL/NICOLA and CALM datasets as the 'original dataset'.
Both studies were conducted in accordance with the ethical principles of the Declaration of Helsinki 2013 and Good Clinical Practice guidelines. The RIAL study was approved by the institutional review departments at the University of Oxford and by the National Review Ethics Service (South Central; Ref: 11/H0504/2). The CALM study was approved by the institutional review departments at the University of Birmingham and by the National Review Ethics Service (West Midlands-The Black Country; Ref: 14/WM/0010). All participants gave written informed consent. The RIAL study was registered with clinicaltrials.gov (NCT01543646) and was sponsored by the University of Oxford. The CALM study was registered with the International Standard Randomised Controlled Trial Number registry (ISRCTN39463479) and the National Institute of Health Research portfolio (15912), and was sponsored by the University of Birmingham.
Histological analysis of liver biopsy samples. In both studies, all biopsies were reported by at least two liver histopathologists 34,35, and adequacy was assessed using the definition of the Royal College of Pathologists 36. Histology was graded according to the NASH Clinical Research Network (NASH-CRN) system for Kleiner-Brunt fibrosis, hepatocellular ballooning, lobular inflammation, steatosis and the composite NAS. All pathologists were blinded to patient characteristics and non-invasive assessment data. Biopsy scores used for the analysis were those collected as part of the three independent studies and were not re-read centrally.
Magnetic resonance imaging protocol. The LiverMultiScan MRI scanning protocol was installed, calibrated and phantom-tested in a standardised way on all MR systems used in these studies 37.
Statistical analysis. Statistical analysis was performed using R software version 3.5.3 38, and a p-value less than 0.05 was considered statistically significant. Case-wise (listwise) deletion was used so that only cases with complete data for NAS, Kleiner-Brunt fibrosis stage, cT1, PDFF, ALT, AST, albumin, bilirubin, GGT and fasting blood glucose were included. Descriptive statistics were used to summarise baseline participant characteristics: mean and standard deviation (SD) for normally distributed continuous variables, median and interquartile range for non-normally distributed continuous variables, and frequency and percentage for categorical variables. Differences between those with NAS ≥ 4 and F ≥ 2 on biopsy and those without were compared using Student's t-test with pooled (common) variance for continuous variables and Fisher's exact test for categorical variables. To discriminate patients with progressive NASH, defined as NAS ≥ 4 (with ballooning ≥ 1 and lobular inflammation ≥ 1) and Kleiner-Brunt fibrosis ≥ 2, from NAFLD patients not meeting these criteria, univariable logistic regression analysis was performed for all potential predictors: cT1, PDFF, and the serum measures acquired as part of standard clinical care (fasting glucose, AST, GGT, ALT, albumin and bilirubin). Stepwise logistic regression using the Akaike information criterion 39 was then performed to select the optimal combination of MRI and serum-derived predictors (model 1). Because the biomarkers differ in units, range and magnitude, all potential biomarkers were normalised to z-scores (linearly transformed to a mean of zero and a standard deviation of 1) so that their model estimates could be meaningfully compared.
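As an illustration of the normalisation step, the sketch below rescales a biomarker to z-scores in plain Python. It assumes the sample standard deviation (n − 1 denominator), which is what R's `scale()` uses by default; the cT1 values are hypothetical and are not study data. On this scale, an odds ratio from the logistic regression refers to a one-SD increase in the biomarker, which is what makes estimates for biomarkers measured in different units comparable.

```python
import statistics


def z_score(values):
    """Linearly rescale values to mean 0 and (sample) standard deviation 1."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1), as R's scale() uses
    return [(v - mu) / sd for v in values]


# Hypothetical cT1 values in ms, for illustration only
ct1_ms = [720.0, 780.0, 810.0, 850.0, 940.0]
ct1_z = z_score(ct1_ms)
print(ct1_z)
```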
Risk scores for having NAS ≥ 4 and F ≥ 2 were derived from the odds ratio estimates of the logistic regression model and are expressed as predicted probabilities. The overall diagnostic accuracy of the individual metrics and of model 1 was estimated as the area under the receiver operating characteristic curve (AUC) with 95% CI.
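The AUC can be computed without drawing the curve, using its equivalence to the Mann-Whitney statistic: the probability that a randomly chosen patient with NAS ≥ 4 and F ≥ 2 receives a higher score than a randomly chosen patient without, with ties counted as one half. The minimal sketch below is illustrative only; the scores are hypothetical, and it does not compute the 95% CI reported in the study.

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney formulation.

    labels: 1 = NAS >= 4 and F >= 2 on biopsy, 0 = otherwise.
    Returns the fraction of (positive, negative) pairs in which the positive
    case scores higher, counting ties as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores: one positive case (0.35) ranks below one negative (0.6)
print(auc([0.9, 0.8, 0.35, 0.6, 0.2, 0.1], [1, 1, 1, 0, 0, 0]))
```

The pairwise loop is quadratic in sample size, which is negligible for a cohort of 86 patients; a rank-based formula gives the same value faster for large datasets.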
Model 1 was validated using fivefold cross-validation: the dataset was randomly split into five equal subsamples, four of which were used to train the model and the remaining one to test it. This process was repeated five times, so that every subsample was used once as the test dataset, and the cross-validated AUC was taken as the mean of the five estimates. To investigate the potential effect of age and gender on discriminating patients with NAS ≥ 4 and F ≥ 2 from those without, model 1 was further adjusted for these variables (model 2) as a sensitivity analysis. The Wald test 40 was used to assess whether adding age and gender significantly improved the fit of model 1, and DeLong's non-parametric test 41 was used to compare the overall diagnostic performance of the nested models 1 and 2.
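A minimal sketch of the fivefold split described above, assuming a simple unstratified random partition (the text does not state whether folds were stratified by outcome). The function names are hypothetical; model fitting and the per-fold AUC are left out, and the cross-validated AUC would be the mean of the five fold AUCs. With 86 patients the folds cannot be exactly equal, so four folds hold 17 patients and one holds 18.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Randomly partition patient indices 0..n-1 into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # fixed seed for reproducibility
    # Fold i takes every k-th shuffled index, starting at position i
    return [idx[i::k] for i in range(k)]


def cross_validation_splits(n, k=5, seed=0):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    folds = k_fold_indices(n, k, seed)
    for i, test in enumerate(folds):
        # Training set: all indices from the other k - 1 folds
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


# Example with the 86 patients analysed (indices only, no data)
for train, test in cross_validation_splits(86):
    print(len(train), len(test))
```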

Results
A total of 362 biopsied patients were initially included in the dataset. After applying the exclusion criteria, 86 patients were included in the analysis (Fig. 2). The mean interval between biopsy and MRI was 66 days (SD 86 days, range 0–311). Overall, 39.5% of patients were classified in the NAS ≥ 4 and F ≥ 2 group (Table 1). The serum metrics bilirubin, albumin, GGT and ALT had similar distributions in both groups. AST and fasting glucose were significantly …
Figure 1. Example T2*, uncorrected T1, corrected T1 (cT1) and PDFF maps (from left to right) acquired using the LiverMultiScan protocol and generated using LiverMultiScan Version 3.1 software (Perspectum, Oxford, https://perspectum.com/products/livermultiscan).
Individually, cT1, AST and fasting glucose yielded AUCs of 0.73 (95% CI 0.62–0.84), 0.71 (95% CI 0.60–0.82) and 0.78 (95% CI 0.68–0.88), respectively, in discriminating those with NAS ≥ 4 and F ≥ 2 (Fig. 4). The composite of the three biomarkers, abbreviated as the cTAG risk score (as derived from model 1), yielded the highest diagnostic performance, with an AUC of 0.90 (95% CI 0.84–0.97) (Fig. 4). The diagnostic accuracy and test performance characteristics (sensitivity and specificity) for all potential cTAG cut-offs are displayed in Fig. 5. Selecting the cut-off to achieve at least 90% sensitivity gave a cTAG risk score threshold of 34%, with a sensitivity of 92% and a specificity of 79% (Table 2). Cut-offs for the individual biomarkers in cTAG that correspond to 90% sensitivity and 90% specificity are given in Tables S1 and S2, respectively, in the supplementary material.
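The cut-off selection rule above (the threshold giving at least 90% sensitivity) can be sketched as a scan from the highest score downwards, stopping at the first threshold that reaches the target; because sensitivity can only rise and specificity only fall as the threshold is lowered, this returns the eligible threshold with the highest specificity. This is an assumed reading of the rule, with a score at or above the threshold treated as positive, and the function name and example data are hypothetical.

```python
def cutoff_for_sensitivity(scores, labels, min_sensitivity=0.90):
    """Highest cut-off whose sensitivity is at least min_sensitivity.

    A score at or above the cut-off counts as test-positive; labels are 1 for
    NAS >= 4 and F >= 2 on biopsy and 0 otherwise. Returns a tuple
    (cutoff, sensitivity, specificity), or None if no cut-off qualifies.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative cases")
    # Lowering the cut-off can only raise sensitivity and lower specificity,
    # so the first cut-off that reaches the target maximises specificity.
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        sensitivity = tp / n_pos
        if sensitivity >= min_sensitivity:
            return cutoff, sensitivity, tn / n_neg
    return None  # only reached if min_sensitivity > 1


# Hypothetical risk scores (probabilities) and biopsy labels
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 0, 1, 1, 0, 0, 1, 0]
print(cutoff_for_sensitivity(scores, labels))
```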
Model 1 resulted in the following equation for the cTAG risk score of having NAS ≥ 4 and F ≥ 2. cT1, AST and fasting glucose (Gluc) are on the normalised scale.
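The fitted coefficients of model 1 are not reproduced in this section. For orientation only, the general form of a logistic-regression risk score on z-scored predictors is shown below, with placeholder coefficients β0 to β3 standing in for the unreported estimates; each exp(βk) is the odds ratio per one-SD increase in that biomarker.

```latex
\mathrm{cTAG} = \frac{100\%}{1 + \exp\!\left[-\left(\beta_0 + \beta_1\,\mathrm{cT1} + \beta_2\,\mathrm{AST} + \beta_3\,\mathrm{Gluc}\right)\right]},
\qquad \mathrm{OR}_k = e^{\beta_k}, \quad k = 1, 2, 3
```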
cTAG for trial enrichment. The potential impact on the screen failure rate, defined as the proportion of all screened individuals who pass pre-screening but do not meet the histological criteria of NAS ≥ 4 and F ≥ 2 after biopsy, was explored for all potential cut-offs (Fig. 5). In a worked example, a cTAG risk score cut-off of 34% resulted in a screen failure rate of 13%, compared with 61% without pre-screening. Using the same cTAG cut-off, 26% of patients who received a positive result failed to meet the histological criteria (false discovery rate, defined as 100% − PPV). The optimal cut-off for trial enrichment will ultimately depend on the required balance between screen failure and missed case rates.
Table 1. Descriptive statistics of demographic, serum and MRI metrics, described as mean and standard deviation for continuous variables or numbers and percentages for ordinal variables. Differences between those with NAS ≥ 4 and F ≥ 2 and those with NAS < 4 or F < 2 are considered significant when p < 0.05 and are highlighted in bold.
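To make the two rates in the trial-enrichment worked example concrete, the sketch below derives them from a 2×2 confusion matrix. It assumes that the screen failure rate is counted over everyone screened (false positives divided by all patients), whereas the false discovery rate is counted over screen-positive patients only; this reading is what allows a 13% screen failure rate and a 26% false discovery rate to coexist at the same cut-off. The function name and counts are hypothetical and are not the study's Table 2.

```python
def screening_metrics(tp, fn, fp, tn):
    """Test-performance and trial-enrichment rates from a 2x2 confusion matrix.

    Test-positive = cTAG at or above the cut-off; reference standard =
    biopsy NAS >= 4 and F >= 2.
    """
    n = tp + fn + fp + tn
    ppv = tp / (tp + fp)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": ppv,
        "npv": tn / (tn + fn),
        "false_discovery_rate": 1 - ppv,  # 100% - PPV
        # Assumed definition: screen-positives who fail biopsy, over all screened
        "screen_failure_rate": fp / n,
        # Without pre-screening, every histologically negative patient fails
        "screen_failure_rate_unscreened": (fp + tn) / n,
    }


# Hypothetical counts for 100 screened patients
for name, value in screening_metrics(tp=45, fn=5, fp=12, tn=38).items():
    print(f"{name}: {value:.1%}")
```

With these counts, pre-screening lowers the screen failure rate from 50% to 12%, while about 21% of screen-positive patients would still fail biopsy.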

Discussion
In this retrospective cohort study, we highlight the utility of a novel composite score, cTAG, that combines cT1, a non-invasive MRI-derived biomarker of liver disease, with standard serum biomarkers to identify patients with progressive NASH. In line with previously used definitions of this high-risk NASH group 42 and with common enrolment criteria for NASH clinical trials (e.g. Regenerate NCT02548351; Resolve-IT NCT02704403; Maestro-Nash NCT03900429), we targeted the identification of patients with histologically determined NAS ≥ 4 and F ≥ 2.
In this study, fasting glucose, AST, GGT and the MRI biomarker cT1 each had good diagnostic performance in discriminating those with progressive NASH. Somewhat surprisingly, given its common use as a screening tool and endpoint in NASH trials, PDFF showed poor performance in discriminating these patient groups. This may be explained by the reduction in PDFF observed with advanced fibrosis: PDFF drops significantly beyond F3, diverging from the positive relationship seen at earlier stages 27,29. Of the imaging and serum markers evaluated, the optimal combination of cT1, AST and fasting glucose (cTAG) demonstrated excellent performance in identifying patients with NAS ≥ 4 and F ≥ 2 in this dataset (AUROC = 0.90), which was corroborated by cross-validation (AUROC = 0.84). Modelling the impact of using the cTAG score to enrich the population selected for liver biopsy revealed a reduction in the screen failure rate from 61% to 13%, corresponding to an 87% higher chance of a selected patient meeting the histological criteria of NAS ≥ 4 and F ≥ 2. This highlights a potential role for this composite biomarker both in NASH clinical trials, to reduce the number of avoidable invasive liver biopsies, and in secondary clinical care, to evaluate NASH status. Health economic modelling has shown LiverMultiScan to be cost-effective for the detection of patients with NASH 35, and its value is likely to be even greater if the full cost of liver biopsy, including complications, is considered. Further research into the cost implications of non-invasive biomarkers for NASH in a variety of healthcare settings is warranted.
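One way to reproduce the 87% figure from quantities reported in the Results is to compare the probability that a biopsied patient meets the criteria with and without pre-screening: without it, this equals the prevalence of 39.5%; with a cTAG cut-off of 34%, it equals the PPV, 100% − 26% = 74%. This is an interpretation offered for clarity, not a derivation stated by the authors.

```latex
\frac{\mathrm{PPV}_{\,\mathrm{cTAG} \ge 34\%}}{\text{prevalence}}
= \frac{100\% - 26\%}{39.5\%}
= \frac{74\%}{39.5\%} \approx 1.87
```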
The utility of cT1 in distinguishing between these groups derives from the significantly higher cT1 in the progressive NASH group. To put the 79.8 ms difference in context, the reported standard deviation for cT1 reproducibility (the same patient scanned on different MRI scanners) is 41.4 ms 37, and 31.9 ms in a longitudinal test-retest study over 16 weeks 43. Although the data reported in this study are not longitudinal, the magnitude of the difference between the two risk groups relative to the reported repeatability and reproducibility supports the utility of cT1 as a sensitive biomarker for monitoring changes in disease state 44. Regarding the blood-based biomarkers, the usual pattern of abnormal liver enzymes in NAFLD is increased transaminases, with alanine aminotransferase (ALT) levels exceeding those of aspartate aminotransferase (AST). As NAFLD progresses to NASH and fibrosis, AST may increase, leading to a rise in the AST/ALT ratio 45,46; the γ-glutamyl transferase (GGT) level can also increase. Although ALT and AST are useful tests, they are not reliable predictors of NAFLD: up to 50% of patients with NAFLD can have normal AST and ALT levels 47,48. Similarly, insulin resistance is associated with NASH 49; HOMA-IR, for example, has been identified as an independent predictor of advanced fibrosis in patients with NAFLD 50, and the prevalence of NAFLD is estimated to be 60% in patients with type 2 diabetes 1,2. The liver is also the main site of glucose production in the fasting state, making fasting glucose a useful marker of insulin resistance. Patients with hepatic steatosis may have increased fasting glucose 51, although not consistently.
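Expressed as multiples of the reported variability, the 79.8 ms group difference in cT1 corresponds to roughly two to two-and-a-half standard deviations:

```latex
\frac{79.8\ \mathrm{ms}}{41.4\ \mathrm{ms}} \approx 1.93 \quad (\text{cross-scanner reproducibility SD}),
\qquad
\frac{79.8\ \mathrm{ms}}{31.9\ \mathrm{ms}} \approx 2.50 \quad (\text{16-week test-retest SD})
```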
www.nature.com/scientificreports/
These fluctuating patterns of biomarkers in NASH highlight the potential of combining imaging and circulating biomarkers to build a more accurate profile of underlying liver disease, an area that is rapidly evolving, as evidenced by the wealth of emerging research into combined biomarkers (e.g. FAST 42, ADAPT 52). Such approaches may not only match the prognostic performance of liver biopsy but could one day surpass it by forming a basis for precision medicine.
Whilst the results of this analysis are very promising, the research is not without limitations. A single fasting glucose measurement may be an inferior predictor of NAS ≥ 4 and F ≥ 2 compared with other indicators of glycaemic control, such as HbA1c, HOMA-IR, a previous diagnosis of diabetes or current use of antidiabetic medication, but none of these were consistently collected in the analysed dataset. Glucose levels vary with meals, exercise and antidiabetic medication, and whilst glucose measurements are commonly assumed to be fasting, they may in fact reflect recent food intake, which can introduce bias. However, other studies have suggested that fasting glucose is higher in patients with NASH, highlighting its potential use as a predictor of coexisting diabetes and NASH 51,53. Another consideration in studies of this nature is the potential for discordance with biopsy results due to a long interval between measurements. The mean interval in this study was 66 days, over which one would not expect any major change in liver health in the absence of pharmacological intervention; however, in some cases the interval was up to 10 months. Whilst fibrosis has been reported to take up to 7 years to progress 54, these intervals may affect the interpretability of the data in this cohort, and further validation studies should aim to minimise this interval where possible. Finally, the sample size analysed in this study was relatively small, and future studies with larger datasets are required to independently validate the diagnostic performance of the model.

Conclusion
While the individual biomarkers cT1, fasting glucose and AST yielded good discriminatory performance in identifying progressive NASH, a composite of all three, the cTAG score, greatly improved performance. The non-invasive, objective and organ-specific nature of cT1 complements these routinely measured blood analytes. Together, this highlights the potential utility of cTAG in identifying patients at increased risk of disease progression who would be suitable for pharmacological therapy, either as part of a clinical trial or in routine clinical practice as treatments become available. Further research, possibly exploring more stable and accurate markers of insulin resistance, is warranted to validate the model in independent cohorts with larger sample sizes and varying disease prevalence.
Figure 5. (A) Sensitivity, specificity, positive predictive value and negative predictive value for all possible cTAG score values. (B) Screen failure rate, missed case rate and proportion of patients identified for all possible cTAG score values.
Table 2. Confusion matrix to discriminate NAS ≥ 4 and F ≥ 2 patients, using a cTAG risk score of 34% as the cut-off. The negative predictive value (NPV) and positive predictive value (PPV) are also shown.