Multi-omic profiles of human non-alcoholic fatty liver disease tissue highlight heterogenic phenotypes

Non-alcoholic fatty liver disease (NAFLD) is a consequence of sedentary life style and high fat diets with an estimated prevalence of about 30% in western countries. It is associated with insulin resistance, obesity, glucose intolerance and drug toxicity. Additionally, polymorphisms within, e.g., APOC3, PNPLA3, NCAN, TM6SF2 and PPP1R3B, correlate with NAFLD. Several studies have already investigated later stages of the disease. This study explores the early steatosis stage of NAFLD with the aim of identifying molecular mechanisms underlying the etiology of NAFLD. We analyzed liver biopsies and serum samples from patients with high- and low-grade steatosis (also pre-disease states) employing transcriptomics, ELISA-based serum protein analyses and metabolomics. Here, we provide a detailed description of the various related datasets produced in the course of this study. These datasets may help other researchers find new clues for the etiology of NAFLD and the mechanisms underlying its progression to more severe disease states.


Background & Summary
With an estimated prevalence of about 30% in western countries, NAFLD is a major public health issue 1 . Sedentary life-style and excessive food consumption correlate with rate at which NAFLD cases appear. Epidemiologic studies showing a prevalence of the disease that differs between countries as well as between groups in the same country, appear to reflect an interplay of environmental and genetic factors in its etiology 1 . Additionally, polymorphisms in, e.g., APOC3, PNPLA3, NCAN, TM6SF2 and PPP1R3B, correlate with NAFLD 2,3 . Over-feeding directly induces insulin resistance 4 . Causality between steatosis and the metabolic syndrome of insulin resistance, obesity, and glucose intolerance, is still unresolved 5 . While the correlation between steatosis and insulin resistance is established there is debate about the relationship between steatosis and hepatic insulin resistance 6 . Samuel et al. showed that activated PKC-ϵ and JNK can induce insulin resistance via impaired IRS1 and IRS2 tyrosine phosphorylation in rats fed with high fat diet 7 . An investigation on the insulin-like growth factor (IGF) axis in the Nurses' Health Study 8 and another population study of 3863 people 9 addressed connections between the IGF axis, insulin resistance, diabetes risk and NAFLD. IGFBP3 is associated with various cancers and up-regulation of IGF1 receptor (IGF1R) is considered an early event in hepatocarcinogenesis 10 . Thus, the IGF axis might play an important role in a direct development of carcinoma from steatosis without the formerly assumed intermediary phase of cirrhosis 11 .
The progression of NAFLD from mild steatosis up to severe steatohepatitis and even liver cirrhosis and hepatocellular carcinoma, varies widely between individual patients. Insulin resistance, dysregulation of cytokines as a basis for inflammation, and oxidative stress appear to foster progression to steatohepatitis 12 . A two-step progression from simple steatosis to steatohepatitis and fibrosis has been proposed 13 , and suggests that after fat accumulation in the liver due to insulin resistance, lipids are peroxidized with cytokines and Fas ligand induced by excessive ROS. However, this two-step progression has been questioned 5 . We found that in fibroblasts derived from steatosis patients AKT/mTOR signaling was reduced and that the insulin-resistant phenotype is exhibited not only by insulin-metabolizing central organs, e.g., the liver, but also by skin fibroblasts 14 . Transcriptome data identified a regulatory network orchestrated by the transcription factor SREBF1 and linked to a metabolic network of glycerolipid and fatty acid biosynthesis. The downstream transcriptional targets of SREBF1 which include the phosphatidic acid phosphatase LPIN1 and LDLR, were also involved. Moreover, there is the possible involvement of ROS in disease progression. Houstis et al. 15 demonstrated that oxidative stress can induce insulin resistance and that anti-oxidants may ameliorate insulin resistance. Depletion of glutathione can improve insulin sensitivity in mice 16 . Glutathione is known as the body's master antioxidant, protecting cells against damage caused by numerous reactive intermediates 17 . Detoxification of these reactive metabolites results in the consumption of glutathione either via oxidation or conjugation. Maintenance of the intracellular glutathione level is thereby a critical liver function, which could be impaired following insult/injury or in steatosis and steatohepatitis.
Several other studies exploring various aspects of NAFLD have been published. A recent publication by Moylan et al. showed that it is possible to discriminate mild versus severe fibrosis stages of NAFLD patients via their gene expression profiles 18 . Another study from Speliotes et al. investigated NAFLD via a genome-wide association study (GWAS) approach 3 . Besides the most prominent association of PNPLA3 this study reported several other associations including one at locus 19p13.11 which is in strong linkage disequilibrium with a recently found steatosis-linked polymorphism in TM6SF2, transmembrane6 superfamily member 2 (refs 19,20) . A knockdown of TM6SF2 in human hepatoma cell lines and in mice led to an increase in lipid droplet area while overexpression led to a decrease 19 . Interestingly, the above mentioned genes associated with NAFLD in GWAS were not detected in a large-scale GWAS about obesity and insulin biology although the metabolic syndrome connects NAFLD and obesity 21  Although it is evident that a complex interplay of genetic and environmental factors contribute to the development of steatosis, to date there has not been a systemic study of the disease employing a multi-omic approach-transcriptome, ELISA-based proteome and metabolome. Therefore, the intention of this study is to provide a more comprehensive view of steatosis based on transcriptomic, metabolomic and protein biomarker profiles. Additionally, this should lay down the foundation for follow-up systems biology-based studies.
In the current study we analyzed patient liver biopsies and associated serum samples, from patients with the insulin resistance phenotype confirmed by the HOMA-IR model 24 . Here, we describe these valuable data sets deposited in public repositories, which might support other researchers in identifying new clues for the etiology of NAFLD and the mechanisms underlying its progression to more severe disease states.

Patient recruitment, sample collection and clinical measurements
All patients participating in this study were recruited in the Multidisciplinary Obesity Research (MORE) project at the Medical University of Graz, Austria or at the Interdisciplinary Adipositas Center at the Kantonsspital St Gallen, Switzerland. Patients with morbid obesity who admitted into hospital for treatment by bariatric surgery (gastric banding, gastric bypass, sleeve gastrectomy) were invited to participate in the study and to sign the informed consent. The study was approved by the institutional review board of the Medical University of Graz (reg. IRB00002556 at the Office for Human Research Protections of the US Departments of Health and Human Services) under license 20-143 ex 08/09. All experiments were performed in accordance with approved guidelines. Written informed consent was obtained from all participants. In the course of the bariatric surgery, samples of blood, skin and a liver biopsy were taken. Out of 18 patients (Table 1), 9 liver biopsies were of high quality enabling their use in the transcriptome analyses. Serum plasma was available from all the patients. The overall experimental design of this study is illustrated in Fig. 1. A pathological diagnosis of the liver phenotype, including liver steatosis grading based on H&E morphology, was performed by an experienced, board certified pathologist (CL). We simplified Kleiner's scoring scheme by condensing Steatosis grades 2 (34-66%) and 3 (> 66%) to our 'high-grade' while adopting grades 0 ('none') and 1 ('low') 25 . This simplification was made because the inter-patient-variability in this complex heterogeneous disease did not allow a more detailed grading on the omics levels. Two examples of liver biopsies are shown in Fig. 2a.

Illumina bead chip hybridization and data analysis
Microarray experiments were carried out on the Illumina BeadStation 500 platform (Illumina, San Diego, CA, USA). Briefly, 500 ng DNase-treated total RNA were used as input for amplification and biotin labeling reactions (Illumina TotalPrep RNA Amplification Kit, Ambion) prior to hybridization of the resulting cRNAs onto Illumina HumanHT-12_v4_BeadChips, washing, Cy3-streptavidin staining and scanning according to the manufacturer's instructions.

Transciptomics data analysis
Illumina data was processed via R/Bioconductor 26 and packages lumi 27 , limma 28 and biomaRt. Background-corrected log2-transformed data was normalized via quantile normalization from the lumi package.  Figure 1. Scheme of experiments for multi-omics comparison of steatosis grades. The scheme shows how the distinct severities of non-alcoholic fatty liver disease (NAFLD) are compared in terms of transcriptomics, metabolomics and potentially relevant parts of the proteome. Liver biopsies were taken from NAFLD patients and classified by pathologists as low-grade (5-33% steatosis area) and high-grade (>33% steatosis area). The transcriptome of liver biopsies were assessed on Illumina HumanHT-12 v4 BeadChips and on RT-PCR. Serum samples of these NAFLD patients and from healthy persons were taken and investigated at the protein level employing ELISA assays and at the metabolome level via Nuclear Magnetic Resonance (NMR). NMR spectra acquisition and processing NMR spectra from 18 plasma samples from morbidly obese patients that underwent different type of bariatric surgery and additionally have developed steatosis were collected (Table 1). 1H-NMR spectra were acquired using a Bruker spectrometer (Bruker Biospin). Unsupervised and supervised methods were used in order to identify a disease-related metabolomic profile that might contain a signature of steatosis.

Data record 1
The microarray experiments discussed in this publication were carried out on the Illumina BeadStation 500 platform (Illumina, San Diego, CA, USA). The data have been deposited in NCBI's GEO and are accessible through GEO Series accession number GSE46300 (Data Citation 1).

Data record 2
Metabolomic raw data from Nuclear magnetic resonance (NMR) measurements have been deposited at the MetaboLights database (http://www.ebi.ac.uk/metabolights) of the European Bioinformatics Institute (EBI) under MTBLS174 (Data Citation 2).

Technical Validation Transcriptomic data
Microarray data passed the proprietary Illumina quality controls. All samples were investigated in duplicates. Fig. 2b shows that-as would be expected-the duplicates cluster together demonstrating the validity of experiments in terms of whole-genome gene expression. The Pearson correlation coefficients of all samples versus each other were calculated with the intention to detect outliers. However, all correlation coefficients were greater than 0.98 and all correlation coefficients of duplicates even greater than 0.99 so that all samples passed this quality check (Table 3). Genes with significant differential gene expression were selected for validation via RT-PCR experiments (Fig. 2c). Genes were termed differentially expressed if the multiple-testing-corrected limma 28 P-value was less than 0.05, the ratio was less than 0.75 or greater than 1.33 and the gene was expressed (detection P-value less than 0.05) in at least one of both cases. Furthermore, we analysed clusters of genes differentially expressed in high versus low steatotic livers together with genes found in literature 19,23 and genome-wide association studies 3 (Fig. 2d). This analysis confirms high similarity between duplicates and clustering-to some extent but not fully-according to steatosis grade. Fig. 3a shows a plot of the first two components of the Principal Component Analysis (PCA) of the microarray data.  Table 2. Primer sequences for QRT-PCR validation of genes differentially expressed between high-grade and low-grade steatosis.   ELISA-based assay for biomarkers Samples below and above quantification limit as well as samples with coefficient of variation (cv) greater than 20% were marked in the measurements table (Data Citation 3). Independent validation of the ELISA-based measurements was checked by visual inspection of plots comparing disease states.

Metabolomics
Assignment of all metabolites were done manually, signals were assigned on template one-dimensional NMR profiles by using matching routines of AMIX 7.3.2 (Bruker BioSpin) in combination with the BBIOREFCODE Version 2-0-0 reference database and published literature when available. Additional confirmation was done using data provided in our lab -database containing spectra of standard pure compounds. To assess which metabolites (i.e., NMR peaks) were significantly different between different sets a univariate paired Wilcoxon test was used. A P-value ≤ 0.05 was considered statistically significant (P-value not corrected for multiple testing). Robust validation of statistical analysis results was done using a cross-validation technique. The accuracy of the classification was assessed by means of a single cross-validation scheme. The original data set was split into a training set (80% of the samples) and a test set (20% of the samples) prior to any step of statistical analysis. The number of PLS components was chosen on the basis of a 5-fold cross validation performed on the training set only, and the best model was used to predict the samples in the test set. The whole procedure was repeated 200 times with a Monte Carlo cross validation scheme, and the results averaged. Figure 3b shows a plot of unsupervised discrimination analysis and Fig. 3c shows separation of steatosis grades in a plot of supervised discrimination analysis (pls/ca: partial least squares/canonical analysis) of metabolites in patient plasma samples. The clustering of Fig. 3c results from a supervised PLS/CA based only on the metabolomic NMR profiles. The algorithm takes into account the supervised information relative to the 3 steatosis groups. Figure 4a shows the distribution of percentage parenchymal involvement in steatotic patients derived from Table 1. The percentage is converted to a scale from zero to one and plotted with the kernel density function from the R statistical package. Fig. 4b-d display distributions separated into groups of age above/below median (median = 45), body mass index (BMI) above/below median (median = 43) and gender. Fig. 4d would suggest a slight tendency for more severe steatosis in males. A similar trend has   been reported in a NAFLD study on Australian adolescents where 3.1% of males and only 2.2% of females had moderate to severe steatosis while 7.0% of males and 14.1% of females had mild steatosis 29 .

Usage Notes
All patients in this study underwent bariatric surgery. This should be taken into account when generalizing results although these are typical cases of morbid obesity which is connected to the metabolic syndrome including NAFLD. The sample size of these datasets-in particular the transcriptomics dataset-poses certain limits onto its usage. Due to its small size it will not enable rigorous analysis of gender effects. Therefore it would likely need to be combined with other data sources, such as data from Moylan et al. 18 and Du Plessis et al. 23 .