Comparison of conventional sonographic signs and magnetic resonance imaging proton density fat fraction for assessment of hepatic steatosis

This study correlated conventional ultrasonography (US) signs with the magnetic resonance imaging (MRI) proton density fat fraction (PDFF) to evaluate the diagnostic performance of US signs (alone or combined) in predicting the presence and degree of hepatic steatosis (HS). Overall, 182 subjects met the study inclusion criteria between February 2014 and October 2016. Four US signs were evaluated independently by two radiologists. MRI PDFF was defined as the average of 24 non-overlapping regions of interest (ROIs), obtained by drawing three ROIs within each of the eight liver segments, and served as the reference standard for evaluating the diagnostic accuracy of the US signs and their combinations. Diagnostic performance of US for HS was assessed using receiver operating characteristic (ROC) curve analyses. There was a strong positive correlation between some combinations of US signs and PDFF (σ = 0.780, p < 0.001). Using abnormal hepatorenal echoes to detect grade 1 or higher HS, the sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were 96.6%, 74.8%, 64.8%, and 97.9%, respectively (area under the ROC curve = 0.875). The sensitivity and NPV for detecting HS with US were good, and US may be considered a suitable screening tool for exclusion of HS.

Therefore, the purpose of our study was to correlate conventional US signs with MRI PDFF and to evaluate the diagnostic accuracy of each US sign and of their combinations in predicting the presence and degree of HS.

Study oversight.
We performed a retrospective cohort study at a single academic tertiary hospital. This study was approved by the institutional review board of Hanyang University Hospital and was registered in a clinical trials database (the Korea Clinical Research Information Service, https://cris.nih.go.kr/cris/index.jsp). All experiments were performed in accordance with relevant guidelines and regulations. Informed written consent was obtained from all subjects. MRI PDFF was used as the reference standard of diagnosis and grading of HS.
Subjects. We included subjects who underwent MRI PDFF between February 2014 and October 2016. First, we enrolled 126 adult subjects with nonalcoholic fatty liver disease (NAFLD) who had participated in two previous clinical trials (clinical trial numbers KCT 0001480 and KCT 0001588, https://cris.nih.go.kr/cris/index.jsp) that compared changes in MRI PDFF before and after the use of lactobacillus. For complete details of the inclusion and exclusion criteria of the two parent studies, see Appendix 1. We used baseline MRI PDFF values from the two clinical trials. Next, we included 283 consecutive subjects who underwent magnetic resonance cholangiopancreatography while hospitalized for suspected viral hepatic disorders and biliary disease. Of the total 409 subjects enrolled, 227 were excluded for the following reasons: (a) >7 day interval between MRI PDFF and abdominal US (n = 111), (b) inappropriate US (n = 21), (c) age <19 years (n = 4), (d) primary or secondary hepatic malignancy (n = 43), and (e) acute infectious disease (n = 48). Thus, a total of 182 subjects were included in our study cohort (107 men [mean age, 49.2 years; age range, 20-80 years] and 75 women [mean age, 58.1 years; age range, 20-91 years]). A flow diagram of the patient selection process is shown in Fig. 1.
Liver US examination. All US examinations were performed using a standardized scanning protocol on a Philips iU22 (Philips Healthcare, Andover, MA, USA) or Aixplorer (SuperSonic Imagine, Aix-en-Provence, France) scanner with a low-frequency convex transducer. Two radiologists (B.K.K. and M.K., who had 9 and 6 years of experience in abdominal imaging, respectively) independently assessed the US images using a picture archiving and communication system (PiView, Infinitt Co., Seoul, Korea).
The four US signs (abnormal hepatorenal echoes, loss of echogenicity of the portal vein, poor diaphragm visualization, and posterior beam attenuation) were evaluated to diagnose and determine the severity of fatty liver 19-21 . The observers were blinded to the clinical and histopathological data prior to analysis. On US, an abnormal hepatorenal echo was identified when the liver had higher echogenicity than the right renal cortex, loss of echogenicity of the portal vein was identified when the echogenic wall of the main portal vein was not visible in the right intercostal view, posterior beam attenuation was defined as impaired visualization of more than one-third of the hepatic parenchyma, and poor diaphragm visualization was defined as impaired visualization of more than half of the diaphragm in the right intercostal view (Fig. 2). After each radiologist had analyzed the images, a consensus was reached via discussion if they had differing opinions. Discrepancies between the two observers' independent readings of the four US signs were used for the interobserver agreement analysis.

MRI examination.
A 3T MRI scanner (Ingenia; Philips Healthcare, Best, The Netherlands) with a torso coil was used in all MRI examinations. A three-plane localization gradient echo (GRE) sequence was obtained first, followed by a 3D multiple-echo GRE sequence based on the mDIXON technique (mDIXON-Quant, Philips Medical Systems, Best, The Netherlands) performed in a single breath-hold. The mDIXON-Quant sequence used the following parameters: six TEs (first TE 0.98 msec, delta TE 0.8 msec), TR 6.3 msec, flip angle 3°, parallel imaging SENSE factor 2, number of signal averages 1, matrix size 300 × 300, field-of-view 350 × 350 mm, number of slices 60, and slice thickness 3 mm (50% interpolation). We used a very low flip angle to avoid T1 saturation, and a six-echo acquisition with seven-peak fat modeling to account for the spectral complexity of fat and T2* bias. This sequence automatically generated water, fat, fat fraction, R2*, and T2* maps.
Reference standard of fatty liver. Fatty liver was graded according to the following criteria 7,22,23 : Grade 0, PDFF less than 6.4%; Grade 1, PDFF equal to or greater than 6.4% and less than 16.3%; Grade 2, PDFF equal to or greater than 16.3% and less than 21.7%; and Grade 3, PDFF equal to or greater than 21.7%.
Statistical analyses. Baseline characteristics were calculated using descriptive statistics. Normality of the distribution was evaluated using a Shapiro-Wilk test. If the data were normally distributed, they were described as mean and standard deviation (S.D.). The mean MRI PDFF for each US sign and for their combinations was calculated. To correlate the combinations of US signs with the mean MRI PDFF, Spearman's correlation coefficient was calculated, together with box-and-whisker plots and simple correlation analysis.
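The reference-standard grading described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names `mean_pdff` and `steatosis_grade` are hypothetical, the cutoffs are the 6.4%, 16.3%, and 21.7% values stated in the grading criteria, and the per-subject PDFF is assumed to be the plain average of the ROI measurements (24 ROIs per subject in this study).

```python
from statistics import mean

# PDFF cutoffs (percent) for grades 1, 2 and 3, as stated in the grading criteria.
GRADE_CUTOFFS = (6.4, 16.3, 21.7)


def mean_pdff(roi_values):
    """Average PDFF (%) over the ROI measurements of one subject."""
    if not roi_values:
        raise ValueError("at least one ROI value is required")
    return mean(roi_values)


def steatosis_grade(pdff_percent):
    """Return the steatosis grade (0-3): the number of cutoffs reached."""
    return sum(pdff_percent >= cutoff for cutoff in GRADE_CUTOFFS)


# Hypothetical subject with 24 identical ROI readings of 5.0% PDFF.
subject_pdff = mean_pdff([5.0] * 24)
print(subject_pdff, steatosis_grade(subject_pdff))
```

Because each grade boundary is inclusive on its lower side ("equal to or greater than"), a value of exactly 6.4% is assigned grade 1 rather than grade 0.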
The sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and area under the receiver operating characteristic (ROC) curve (AUROC) with 95% confidence intervals were calculated using ROC curve analysis to evaluate the ability of US signs (alone and in combination) to predict the presence or degree of HS. In addition, we evaluated whether the diagnostic performance in detection of HS differed according to previously published MRI PDFF cutoff values. Interobserver agreement for each US sign was analyzed using Cohen's kappa (κ) statistics, interpreted as follows: < 0.20, poor; 0.21-0.40, fair; 0.41-0.60, moderate; 0.61-0.80, good; and 0.81-1.00, excellent. P values < 0.05 were considered statistically significant. We used SPSS Version 21.0 (SPSS, Chicago, IL, USA) and MedCalc for Windows (Version 14.12.0; MedCalc Software, Mariakerke, Belgium) for statistical analyses.
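As an illustration of the interobserver analysis, Cohen's kappa for two readers scoring one binary US sign can be computed directly from its definition, κ = (p_o − p_e) / (1 − p_e). The sketch below is not the SPSS or MedCalc procedure used in the study; the function names and the example readings are hypothetical, and values falling exactly on a band boundary (for example 0.20) are assigned to the lower band as an assumption, because the published bands leave those points unspecified.

```python
def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' binary readings (1 = sign present, 0 = absent)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of cases scored identically.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal rate of "present".
    p_a = sum(rater_a) / n
    p_b = sum(rater_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


def agreement_label(kappa):
    """Map kappa to the bands used in this study (boundary handling is an assumption)."""
    for upper, label in ((0.20, "poor"), (0.40, "fair"), (0.60, "moderate"), (0.80, "good")):
        if kappa <= upper:
            return label
    return "excellent"


# Hypothetical readings of one US sign in ten subjects.
reader_1 = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
reader_2 = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
print(round(cohen_kappa(reader_1, reader_2), 3))
```

Applied to the κ range reported for this study (0.759-0.858), `agreement_label` returns "good" for the lower bound and "excellent" for the upper bound.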
Data availability. The datasets generated and/or analyzed during the current study are not publicly available because of personal information protection but are available from the corresponding author on reasonable request. The technical appendix, statistical code, and dataset are available from the corresponding author at noshin@hanyang.ac.kr. All relevant data are within this paper and its Supporting Information files.
In subjects without any US signs, the PDFF was 2.6% (S.D. ± 1.22). When only abnormal hepatorenal echoes were found, the PDFF was 5.4% (S.D. ± 3.35). When abnormal hepatorenal echoes and loss of echogenicity of the portal vein were present but poor diaphragm visualization and posterior beam attenuation were absent, the PDFF was 13.8% (S.D. ± 9.67). When abnormal hepatorenal echoes, loss of echogenicity of the portal vein, and posterior beam attenuation were present but poor diaphragm visualization was absent, the PDFF was 22.2% (S.D. ± 7.29). When all four US signs were seen simultaneously, the PDFF was 26.2% (S.D. ± 7.75). These combinations were numbered from 1 to 5 in this order; there was a strong positive correlation between these combinations and the MRI PDFF (σ = 0.780, p < 0.001) (Fig. 3).
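The five ordered combinations form a nested scale: each step adds one sign to the previous combination, in the order given in the Table 5 note (hepatorenal echo, then portal vein, then posterior attenuation, then diaphragm). A minimal sketch of that mapping follows; the identifiers `SIGN_ORDER` and `combination_index` and the sign labels are hypothetical, and sign patterns that do not fit the nested order are returned as `None` because the text does not say how they were handled.

```python
# Order in which the signs accumulate across combinations 2 to 5.
SIGN_ORDER = (
    "hepatorenal_echo",
    "portal_vein_loss",
    "posterior_attenuation",
    "poor_diaphragm",
)

# Mean PDFF (%) reported in the text for combinations 1 to 5.
REPORTED_MEAN_PDFF = {1: 2.6, 2: 5.4, 3: 13.8, 4: 22.2, 5: 26.2}


def combination_index(signs_present):
    """Return the combination number (1-5) for a set of sign labels.

    Returns None when the signs do not follow the nested order
    (for example, portal vein loss without an abnormal hepatorenal echo).
    """
    flags = [name in signs_present for name in SIGN_ORDER]
    count = sum(flags)
    # Nested pattern: the signs present must be exactly the first `count` signs.
    if all(flags[:count]):
        return count + 1
    return None


example = {"hepatorenal_echo", "portal_vein_loss"}
index = combination_index(example)
print(index, REPORTED_MEAN_PDFF[index])
```

Treating the combination number as an ordinal score is what allows a rank correlation such as Spearman's coefficient to be computed against the continuous PDFF values.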
Among the combinations of US signs, abnormal hepatorenal echoes with loss of echogenicity of the portal vein showed the highest sensitivity, 72.9% (43/59), and the highest NPV, 88.2% (119/135), for the presence of fatty liver, without much difference in specificity and PPV compared with the other combinations of US signs (Table 3). However, the sensitivity and NPV were not higher than those for the abnormal hepatorenal echo alone.

Discussion
Conventional US signs of HS have been well described in the literature 19-21 but they have not been compared to MRI PDFF. We found positive correlations between some combinations of US signs and MRI PDFF values. In addition, the ability of 'abnormal hepatorenal echoes' to predict the presence of fatty liver (≥6.5% MRI PDFF) was high, with a sensitivity of 96.6% and an NPV of 97.9%. In a meta-analysis of the diagnostic performance of US for fatty liver, the performance of US in detecting histologic steatosis of grade 2 or higher was excellent, with a sensitivity of 85.7% and a specificity of 85.2% 17 . However, the diagnostic performance of US for histologic grade 1 steatosis was relatively low at 12.0-49.8% 15,16 . Previous studies have found that the sensitivity and specificity for detecting fatty liver of less than 10% by histology were 73.3% and 84.4%, respectively 17,18 . Another previous study reported that the sensitivity was merely 12% in subjects with a histologic hepatic fat content of 5-10% 16 . Even though it is difficult to make direct comparisons of diagnostic performance because of the different reference standards used, this study demonstrated that the sensitivity and NPV of 'abnormal hepatorenal echoes' in detecting steatosis of grade 1 or higher were good, with a sensitivity of 80.2-96.6% and an NPV of 80.9-97.9%, according to the threshold used in this study as well as previously published thresholds.
Prior reports of US diagnoses of fatty liver have been largely subjective 26,27 . In our study, we attempted to improve the objectivity of US assessment of HS. Abnormal hepatorenal echoes were observed in 25.2% of subjects (31/123) without fatty liver on MRI PDFF. This suggests that, while the NPV of abnormal hepatorenal echoes for diagnosing fatty liver was high, the PPV was low. High sensitivity and NPV are important for screening given their utility when ruling out fatty liver disease.
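The relationship between the false-positive fraction and the predictive values can be checked with a 2 × 2 table. The sketch below is a reconstruction under stated assumptions, not the authors' analysis: the 31 false positives among 123 subjects without fatty liver are stated in the text, the 59 subjects with fatty liver follow from the 182-subject cohort, and the split of those 59 into 57 true positives and 2 false negatives is inferred from the reported 96.6% sensitivity.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, PPV and NPV from the cells of a 2 x 2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Abnormal hepatorenal echoes vs. MRI PDFF (grade 1 or higher).
# fp = 31 and tn = 123 - 31 = 92 come from the text; tp = 57 and fn = 2 are
# inferred from the reported sensitivity and are an assumption.
metrics = diagnostic_metrics(tp=57, fn=2, fp=31, tn=92)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With these counts the four values round to 96.6%, 74.8%, 64.8%, and 97.9%, matching the figures reported for abnormal hepatorenal echoes. The PPV is pulled down by the 31 false positives, while the NPV stays high because only 2 of the 94 sign-negative subjects had fatty liver.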
NAFLD is the most common cause of unexplained liver enzyme elevation 28 . Ruling out fatty liver disease using US is very helpful in routine clinical practice for a physician examining patients with elevated liver enzymes. Moreover, it is very important for defining healthy controls. NAFLD is common but often missed in volunteers for clinical trials, despite its potential effect on subject safety and the validity of results 29 . Increasing awareness of NAFLD prevalence and ruling out NAFLD using US may ameliorate this problem.
A possible reason for the increased sensitivity of US in detecting grade 1 or higher steatosis compared to previous studies may be the severity and distribution of fatty liver in the patient cohort. In a previous study with a large number of liver transplantation donors, the sensitivity of US was low in subjects with fatty liver less than or equal to 30% by histology 15,16 , and other subsequent studies reported that sensitivity was also low in those with fatty liver less than or equal to 20% and 12.5% by histology, respectively 19,30 . Considering that mild fatty liver is a broad spectrum that consists of 5-33% macrosteatosis, different results may be obtained depending on the distribution of fat fraction among the subjects. In fact, the high sensitivity in our study may be attributable to the higher threshold of fatty liver detected by MRI PDFF (6.5%) than in previous studies 24,25 . However, the threshold used in our study was reconfirmed as having moderate to high sensitivity and high specificity in a recent study carried out by the same researchers, who compared liver biopsy and MRI PDFF in an independent cohort 22 .
The severity of fatty liver has been evaluated qualitatively as normal, mild, moderate, and severe using the US signs of loss of echogenicity of the portal vein, poor diaphragm visualization, and posterior beam attenuation 20,31,32 . We found that when only abnormal hepatorenal echoes and loss of echogenicity of the portal vein were observed, without poor diaphragm visualization or posterior beam attenuation, the MRI PDFF was 13.8% (S.D. ± 9.67), which was close to the value of grade 2 steatosis determined by MRI PDFF 6,7,22,23 . When abnormal hepatorenal echoes and loss of echogenicity of the portal vein were visible, the sensitivity and specificity in detecting grade 2 or higher steatosis were 100% and 85.9%, respectively, with an AUROC of 0.930. These results were comparable to those of a previous prospective study 19 . Furthermore, the diagnostic sensitivity and specificity for grade 3 steatosis were good when posterior beam attenuation was considered.
To overcome the limitation of conventional US with its qualitative nature, quantitative ultrasound (QUS) was developed to characterize tissue microstructure objectively. QUS is a technique that can assess HS by measuring fundamental liver tissue parameters, including attenuation coefficient and backscatter coefficient (BSC) 33,34 . Although QUS is useful for grading HS, it also has limited availability in most smaller clinical trials because it requires post-processing software and cannot be evaluated in real time. In our study, we attempted to compare MRI PDFF with conventional US signs; thus, no comparison with QUS was performed. Future prospective studies are needed to compare combinations of conventional US signs with QUS, MRI PDFF, and liver biopsy in a large cohort of patients for grading of HS.
There were several limitations to our study. First, conventional US signs may be operator-dependent and subjective, although the interobserver agreement in our study was good to excellent (κ value: 0.759-0.858) for evaluation of US signs measured independently by two radiologists. This may be because this study did not qualitatively evaluate the severity of fatty liver, as in previous reports 31 , but evaluated the agreement of each US sign independently. Our results may be promising, but must be validated prospectively in multi-center, community-based clinical trials before MRI PDFF can be adopted in a routine clinical setting. Second, the reference standard for evaluating HS was MRI PDFF. The gold standard for diagnosis of fatty liver is biopsy, but it is an invasive approach, and there are concerns about its accuracy because of sampling bias and poor interobserver correlations 4,35 . Recent studies have reported that MRI PDFF can measure hepatic fat content more objectively and accurately than biopsy 25,36 . However, the cutoff value of fatty liver by MRI PDFF differs depending on the researcher. Therefore, this study was performed using the most recent and most verified results. Third, the percentage of subjects with moderate or severe fatty liver included in this study was relatively low, at 15.4%. Finally, US signs for longitudinal follow-up of HS were not evaluated in this study; additional larger studies are needed to define these US signs in a longitudinal setting.
In conclusion, the sensitivity and NPV for the determination of HS by US using MRI PDFF as a reference standard were good at 96.6% and 97.9%, respectively, and US may be considered a suitable screening tool for the exclusion of fatty liver.

Table 5. Diagnostic performance of combinations of ultrasonography signs for predicting degree of fatty liver. Note: FL, fatty liver; SN, sensitivity; SP, specificity; PPV, positive predictive value; NPV, negative predictive value; AUC, area under the curve; CI, confidence interval; Liver echo, abnormal hepatorenal echo only; +Loss of portal vein, abnormal hepatorenal echo and loss of echogenicity of the portal vein; +Posterior attenuation, abnormal hepatorenal echo, loss of echogenicity of the portal vein, and posterior beam attenuation; +Poor diaphragm, abnormal hepatorenal echo, loss of echogenicity of the portal vein, posterior beam attenuation, and poor diaphragm visualization.

Table 6. Interobserver agreement of ultrasonography signs. Note: CI, confidence interval.