Integrative analysis of blood and gut microbiota data suggests a non-alcoholic fatty liver disease (NAFLD)-related disorder in French SLAdd minipigs

Minipigs are a group of small-sized swine lines, which show a broad range of phenotype variation and which often tend to be obese. The SLAdd (DD) minipig line was created by the NIH and selected as homozygous at the SLA locus. It was brought to France more than 30 years ago and maintained inbred ever since. In this report, we characterized the physiological status of a herd of French DD pigs by measuring intermediate phenotypes from blood and faeces and by using Large White (LW) pigs as controls. Three datasets were produced, i.e. complete blood counts (CBCs), microarray-based blood transcriptome, and faecal microbiota obtained by 16S rRNA sequencing. CBCs and expression profiles suggested a non-alcoholic fatty liver disease (NAFLD)-related pathology associated to comorbid cardiac diseases. The characterization of 16S sequencing data was less straightforward, suggesting only a potential weak link to obesity. The integration of the datasets identified several fine-scale associations between CBCs, gene expression, and faecal microbiota composition. NAFLD is a common cause of chronic liver disease in Western countries and is linked to obesity, type 2 diabetes mellitus and cardiac pathologies. Here we show that the French DD herd is potentially affected by this syndrome.

The LW and DD pigs differ by CBC parameters. CBCs were measured from blood sampled at 8, 20, 40, 60 and 100 days of age. Because some data were missing, the datasets comprised 23, 16, 24, 20 and 17 samples, respectively (Supplementary Table 1). Eighteen parameters were measured at each time-point (Supplementary Table 2, Supplementary Fig. 1).
Twelve parameters were differentially abundant between DD and LW pigs at least at one time-point. After quality control, five parameters were discarded as non-reliable. The other seven were split in two groups displaying opposite patterns (Supplementary Table 3A). The metrics of the first group showed higher values in DD pigs (Supplementary Table 3A). They were related to viscosity and platelet activation, and included hematocrit (hct), mean corpuscular volume (mcv), mean corpuscular hemoglobin (mch), hemoglobin (hgb), and platelet distribution width (pwd).
The hct and the derived metric mcv were significantly higher in DD pigs over the whole time-course of the experiment. In the case of mch, hgb and pwd, the minipigs also presented higher values during all the period, but some time-points lacked statistical support (Supplementary Table 3A).
The second group of differentially abundant CBC parameters was linked to inflammation and included metrics related to white blood cells. These parameters tended to be lower in DD pigs (Supplementary Table 3A). Monocyte absolute number (monum) and monocyte percentage (mopro) were decreased in DD pigs, but with some inconsistencies and low statistical support.
A principal component analysis (PCA) separated the two breeds, with the first component accounting for 47.9% of the total variability (Fig. 1A). Hierarchical clustering, instead, did not found a clear split (Supplementary Fig. 2A).
Blood transcriptomes of DD and LW pigs revealed a potential nAfLD syndrome in DD pigs. Blood transcriptome profiles were obtained from 60-day-old animals. After quality assessment, 21 samples were retained (Supplementary Table 1). A total number of 3,046 differentially expressed (DE) genes were identified (Supplementary Table 4), of which 1,405 were upregulated and 1,641 downregulated in DD pigs. The values of log 2 fold change (FC) ranged from 4.1 to −10.7, but were generally low. In fact, only 38 genes presented a log 2 FC higher than 1.5 or lower than −1.5. The 20 most upregulated and downregulated genes in DD pigs compared to LW pigs are listed in Table 1.
Several exploratory approaches were used to gather information about the structure of the dataset. A PCA was carried out using the expression values of all the genes. The first component accounted for 30.2% of the total variability and separated the two breeds (Fig. 1B). A clustering analysis confirmed this split, but three samples were misplaced ( Supplementary Fig. 2B).
As a third step, a Gene Set Enrichment Analysis (GSEA) 29 was carried out. Eleven MSigDB gene sets 30 were enriched at a FDR q-value < 0.05, 4 of which (or 36.4%) were related to NAFLD (Supplementary Table 7). The results were globally consistent with those of IPA and ClueGO, as in the case of the signatures "insulin resistance (pancreas β cells)" and "mTOR signalling".
The pathways related to NAFLD were evenly distributed across all the steps of the two-hit model and were often confirmed by two or more methods of analysis.
Because these three steps of analysis suggested the potential presence of an NAFLD-related pathology, we decided to explore the genes directly related to this disorder in more detail. For this, a literature-based meta-analytic approach was chosen. First, a non-redundant consensus list of 2,551 NAFLD-related genes was created (see methods). Among them, 1,679 were represented on the microarray and 386 were differentially expressed (Supplementary Table 4). The enrichment within the DE genes list was assessed using a Fisher's exact test and a hypergeometric test, both of which were significant (respectively 0.0195 and 0.0104).
DD and LW pigs differ by their faecal microbiota. The V3-V4 16S rRNA region was sequenced on 24 60-day-old animals. After quality checks, one low quality sample with less than 300 reads was discarded. Thus, 23 samples were retained for the analysis (Supplementary Table 1  The values of α-diversity (OTU richness and Shannon diversity) did not differ between the two breeds, while β-diversity analyses suggested that taxonomic abundance profiles were significantly different between the groups, with p-values of 0.001 for both the PERMANOVA and the ANOSIM analyses ( Supplementary Fig. 3). In agreement with the literature 22,31 , more than 95% of the sequences in both breeds were represented by the phyla Firmicutes and Bacteroidetes (Fig. 3A), followed by Proteobacteria, Actinobacteria and Spirochaetes. Firmicutes were significantly more abundant in DD minipigs, while Bacteroidetes and Proteobacteria were significantly less abundant (Supplementary Table 8). The two most abundant genera in both breeds were Prevotella and Lactobacillus (Fig. 3B). Peptococcus, Lactobacillus and Ruminococcus were significantly more abundant in DD pigs, while Anaerovibrio, Prevotella, Succinivibrio, Mitsuokella and Roseburia were significantly less abundant (Supplementary Table 9).
To further characterize the composition of DD pigs microbiota, we compared our results to three available datasets, the first one obtained on NASH-affected Ossabaw minipigs 32 and the other two obtained on obese Göttingen and obese Ossabaw minipigs, respectively 22 .
In the first comparison (i.e. the contrast between DD and LW pigs versus the contrast between NASH-affected Ossabaw and lean Ossabaw pigs), 14 taxa were common to the two studies, seven of which were significantly differentially abundant in both cases, and three of which showed the same direction of variation ( Supplementary  Information 2). This was the case with Firmicutes, which were more abundant in both the contrasts, and with Bacteroidetes and Roseburia, which were less abundant. In the second comparison (i.e. the contrast between DD and LW pigs versus the contrast between obese Göttingen and lean Göttingen pigs), seven taxa were shared by both studies. Only Firmicutes and Bacteroidetes showed statistically different abundances in the two studies, but with opposite directions of variation. The other five taxa were not significantly different in our study, and only three of them showed the same direction of variation. In the third comparison (i.e. the contrast between DD and LW pigs versus the contrast between obese Ossabaw and lean Ossabaw pigs) we found eight taxa common to both studies, six of which with the same direction of variation, i.e. Actinobacteria, Prevotellaceae, Clostridium, Streptococcus, Bacteroides and Prevotella. Of note, none of these contrasts was however statistically supported (Supplementary Information 2).

Figure 2.
Representation of the functionally connected networks obtained using ClueGO 2.3.4 28 . Each node corresponds to an enriched GO term, and each colour corresponds to a GO group. The same term can be included in several groups, and its size reflects its statistical significance (refer to Supplementary Table 6). The 13 nodes at the bottom left of the picture are not connected to the network. Line width is proportional to k-score values.
integration of the different layers of phenotype information. First, a multiple factor analysis (MFA) was performed using CBCs, blood transcriptome, and faecal microbiota data. The individual factor map (IFM) revealed a split between the breeds, with the first component accounting for 20.3% of the total variability (Fig. 1D). The variable group plot ( Supplementary Fig. 4) showed that CBCs gave the highest contribution to the first dimension of variability and the lowest contribution to the second dimension, while transcriptome data showed the opposite pattern. Microbiota data presented intermediate values.
Subsequently, the links between the different data layers were studied in a pairwise fashion with sPLS 33 and using the data obtained at 60 days. In the case of CBCs and transcriptome, 17 individuals were used. The results www.nature.com/scientificreports www.nature.com/scientificreports/ were visualized producing a clustered image map (CIM) 34 , showing that CBCs were split in two main groups, corresponding to the metrics which had already been found more abundant and less abundant in DD pigs (Fig. 4A).
The most relevant associations were represented by 20 genes and 12 CBCs. Some CBC metrics, which were not differentially abundant between DD and LW pigs, did show association with DE genes (Supplementary Table 3B). Indeed, eonum, eopro, grnum and wbc grouped with the parameters showing lower abundance in minipigs. The metric eopro was discarded as not reliable (See methods).
In the case of transcriptome and faecal microbiota, sPLS was performed on 20 individuals. A CIM (Fig. 4B) revealed that the most important associations concerned 20 genes and 18 taxa. The genes were split into three groups. The first included four genes with strong positive association with Firmicutes such as Peptococcus and Ruminococcus and strong negative association with Prevotella and Roseburia, while the second included six genes showing the opposite pattern. The third group showed weaker association with the taxa.
For CBCs and faecal microbiota, the sPLS was based on 20 individuals, but no significant association between the variables was found.

Discussion
The DD minipig herd maintained inbred in France does not seem especially prone to obesity, but to date there have been no attempts to characterize it in terms of metabolic diseases. Usually, these disorders are studied by feeding the animals with high-fat and high-sucrose diets. However, by a general point of view, minipigs can develop obesity even when they receive ad libitum standard chow 22 . This has been observed, for instance, in the DD minipig herd maintained in the United Kingdom (Mick Bailey, personal communication). In our study, the animals were fed ad libitum with classical pig diet and their physiological status was studied at 60 days by comparison to LW pigs. This latter commercial breed was chosen because of its lean meat percentage 20,35 . We combined the results from three data layers, i.e. CBCs, transcriptome profiles, and 16S rDNA sequencing, and our findings suggested that DD pigs were potentially affected by a disease belonging to the NAFLD spectrum.
First, we studied the overall data structure. The MFA showed a clear split between DD and LW pigs, which was globally confirmed by the PCA analyses individually performed on each dataset. In the case of hierarchical clustering, the split between the breeds was clearly supported by the transcriptome data alone. The transcriptome was also the most informative dataset in terms of contribution to the total variation. The microbiota contributed to a similar, but slightly lower extent, while the role of CBCs was more limited. These findings were in agreement with those obtained by Mach and co-workers 31 and was possibly related to the number of variables of each dataset, which was highest in the case of transcriptome and lowest in the case of CBCs.
Then, we studied each single dataset independently to understand the molecular mechanisms underlying our biological systems, and we used pairwise sPLS analyses to better characterize their interactions. Two important hematorheological parameters, i.e. hematocrit and hemoglobin, were significantly higher in DD pigs over almost the whole time-course, making the finding more robust. High levels of hematocrit and hemoglobin are associated with obesity and metabolic syndrome 36,37 , respectively, and both metrics are related to insulin resistance, NAFLD and NASH in humans [38][39][40] . Mean corpuscular volume was also higher in DD pigs; however, even though this parameter is increased in some liver pathologies, it does not seem to be linked to NAFLD 41 . The values of platelet distribution width and mean platelet volume were also consistently higher in DD pigs, indicating platelet activation. This process is linked to obesity and NAFLD in both pigs and humans 23,[42][43][44] . Platelets participate in the process of liver inflammation 45 and their activation is specifically predictive of cardiovascular diseases 46 , which are strongly associated with NAFLD 5,10 . Because these processes are deeply intertwined, it is difficult to state whether augmented values of pwd and mpv point to a NAFLD-related disorder, a cardiovascular disease, or a combination of both. However, in a multisystemic perspective, they might suggest the presence of cardiac diseases associated to NAFLD as comorbidities. In contrast, monocyte absolute number and monocyte percentage showed a trend towards decreased values in minipigs. Low levels of monocytes have been correlated to septic processes in patients affected by chronic liver diseases, possibly representing a response to an extensive and acute inflammatory process 47 . In any case, the low levels of statistical support did not allow us to draw strong conclusions in this regard.
Transcriptome profiling also suggested a disorder belonging to the spectrum of NAFLD in DD pigs. This result was especially interesting because the enriched processes were found across each step of the two-hit model 12,13 and because they were often confirmed by more than one analysis method. Furthermore, the pathways identified as enriched showed high levels of agreement with the literature available for humans 13,14,48 and for Bama pigs 15 . These findings indicated a global shift in gene expression and suggested an impact of the disease on the physiological state of the minipigs. The pathways detected using traditional functional analysis corresponded to different molecular processes involved in NAFLD, but almost none of them represented an NAFLD-centered gene set. Indeed, a specific enrichment was obtained only with ClueGO, because both the IPA and the GSEA databases lacked appropriate ontologies. Since we wanted to study the potential presence of NAFLD in a more targeted way, we followed a meta-analytic approach based on the literature and then we obtained a significant enrichment of our DE genes list. This supported our characterization of the transcriptome in terms of NAFLD, with particular relevance because it was based on experimental expression studies (See methods). Moreover, looking at the 20 highest and the 20 lowest expressed genes, we found a strong overrepresentation of NAFLD-related genes, which added up to 25% (i.e. 5 out of 20 genes). Indeed, the same value calculated on the whole list was just 12.7% (i.e. 386 out of 3,042 genes). The genes belonging to the pathways used for the other methods of enrichment analysis, instead, accounted for only 5% of the total. This indicated that the deepest changes in the transcriptome affected genes that were directly involved in NAFLD. It was the case, for instance, with IFITM1 (log 2 FC = 4.  www.nature.com/scientificreports www.nature.com/scientificreports/ In agreement with CBC data, a specific enrichment of pathways related to cardiac disorders was also found, like in the cases of "heme metabolism" (GSEA), "platelet activation" and "cardiomyopathies" (ClueGO). This finding is meaningful in a multisystemic perspective, which stresses the link between NAFLD and cardiovascular diseases 5 .
The sPLS provided further hints about the connections between the first two data layers (i.e. CBCs and blood transcriptome). On the one hand, it confirmed the importance of the differentially abundant CBCs in the definition of the NAFLD phenotype, showing strong association with some DE genes. Interestingly, the large majority of genes (18 out of 20) was not known to be related to NAFLD and could therefore represent new candidates. This was the case, for instance, with CFAP126 and HIST2HBE, which showed strong positive association with hematorheological parameters and strong negative association with monocytes, and which grouped with the well characterized SLC11A1 49 and TBXAS 6 genes. On the other hand, the sPLS also indicated that some non-differentially abundant CBC parameters were strongly associated with DE genes, even though their biological meaning was not easy to interpret. This happened, for instance, with the metric eonum. A loss of eosinophils was reported in the adipose tissue of patients affected by NASH 52 , but it is not straightforward to compare these results to ours. Also, NAFLD is usually associated with high wbc 53 and mpv 44 values, which contrasts with our findings.
The characterization of microbiota was less clear-cut. On the one hand, the comparison of our data to those obtained in NASH-affected Ossabaw pigs 32 and in obese Göttingen pigs 22 highlighted only limited similarities. On the other hand, the microbiota of our DD pigs shared some characteristics with that of the Ossabaw breed which, according to the authors, displayed the features associated with obesity 22 . However, the lack of statistical support did not allow us to draw robust conclusions on this point. The Firmicutes to Bacteroidetes ratio was higher in DD (2.7) than in Large White pigs (1.6), which was also consistent with the pattern observed in obese Ossabaw minipigs 22 and could provide some support to the presence of obesity 32,54 . However, these pieces of evidence were not sufficient to make robust inferences about microbiota data.
The sPLS analysis provided more details about the relationship existing between microbiota and DE genes. The genus Prevotella, for instance, was negatively associated with obesity in Ossabaw minipigs 22 and was similarly underrepresented in DD pigs. According to our results, the genus Prevotella exhibited negative association with the genes FUT1, LTB4R2, STX12 and DIRAS2, which could be therefore related to this pathology. Except for LTB4R2, these genes have never been associated with obesity or NAFLD and could be therefore new candidates for these disorders. The genera Roseburia and Anaerovibrio grouped with Prevotella, pointing to a potential link to the NAFLD spectrum; however, no evidence for the involvement of these bacteria in obesity was found in Ossabaw pigs 22 .
Taken together, our results suggest that the DD minipigs could potentially present symptoms of a pathology belonging to the NAFLD spectrum. It is important to refer to NAFLD spectrum, and not to NAFLD stricto sensu, because of the inherent complexity of this disorder and because we could not produce additional physiological data to allow a specific diagnosis. In any case, NAFLD is increasingly being regarded as a multisystem disease 5 , which implies that it its constitutively linked to a large array of comorbidities.
Even if all the data layers supported our hypothesis, each one highlighted a slightly different combination of pathologies. In the case of CBCs, the results pointed to the presence of cardiac diseases and NAFLD, with a stronger representation of the former. In the case of transcriptome, NAFLD seemed to be the primary disorder, while cardiac diseases appeared as comorbidities. Microbiota data, instead, suggested the potential presence of obesity, but this evidence was not strong.
Transcriptome was the most informative data layer, and its integration with CBCs and microbiota produced consistent results. Therefore, we can speculate that our tentative characterization of the datasets in terms of NAFLD could potentially be the most accurate.
It must be considered that some limitations affect our results. Our data were obtained on young individuals, while NAFLD-related disorders are typically studied on 4-to 30-month-old animals. Therefore, many symptoms were likely not completely apparent. Moreover, transcriptome and microbiota data were only obtained at 60 days, while more time-points could provide a better characterization. Nevertheless, our work adds useful information to characterize NAFLD-related disorders in pigs, providing a robust workflow for the analysis of intermediate phenotypes.

Materials and Methods
Animals and diet. Twelve LW (i.e. 6 pairs of siblings) and twelve DD pigs (i.e. 6 pairs of siblings) of the same age were used (Supplementary Table 1). The suckling piglets had access to a pre-starter feed from day 3 (18% crude protein DM, 7.5% crude fat DM, 1.55% lysine DM) and were weaned at day 28. They were fed a starter diet from day 29 to day 40 (19% crude protein DM, 7% crude fat DM, 1.45% lysine DM) and were fed a grower diet ad libitum (16.5% crude protein DM, 2.4% crude fat DM, 1.17% lysine DM) from day 41 to day 150.
A linear-mixed model was fitted to each parameter using the lme4 R package 55 . The breed was set as a fixed effect and the sow as a random effect. Data of each day were separately analysed.
The results obtained on 60-day-old individuals were used for the subsequent analyses, while the values obtained at the other time-points (i.e. 8, 20, 40 and 100 days) were used only as a means of validation. A metric was retained only if the direction of the change in the differential abundance at 60 days (i.e. positive or negative) was consistent with at least two out of five points.
The metrics retained were rescaled using the 'rescale' function of the scales R package (https://CRAN.R-project. org/package = scales) and analysed through a PCA and a hierarchical clustering using the FactoMineR R package and selecting the Pearson's correlation coefficient and the Ward method 56 .
RNA extraction and microarray processing. Total RNA from blood of 60-day-old individuals was isolated using the PAXgene Blood RNA Kit (Qiagen). RNA quality and integrity were determined using a NanoDrop 1000 (Thermo Scientific, Waltham, MA, USA) and a Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA, USA).
Microarray data analysis. The annotation of the pig array was updated using the Sscrofa 11.1 version of the pig genome (see Supplementary Information 1).
Probe intensities were background corrected using the "normexp" method, log 2 scaled and quantile normalized using the Limma R package 58 . The probes of the lowest quartile were filtered out, controls were discarded and the probes corresponding to genes were summarized.
The obtained expression matrix ("E1") was processed with the arrayQualityMetrics R package 59 for quality assessment. Three outliers and a technical bias concerning the third chip were detected. After outlier removal, the raw data were pre-processed again to obtain the "E2" expression matrix.
The differential analysis was performed using the Limma R package. A linear model was fitted for each gene, setting the breed and the chip as fixed effects, and comparing DD to LW pigs. The sow was included as a random effect using the "duplicateCorrelation" function. The p-values were Benjamini-Hochsberg corrected setting a threshold of 0.05.
An alternative linear model was tested, including also monocyte count as a fixed effect, but the small sample size (i.e. 17 individuals) determined a sharp decrease in the statistical power of detection.
The E2 expression matrix was further modified. In fact, this matrix was not corrected for the chip-related bias, which was treated as an effect by the Limma linear model. To perform other downstream analyses, this bias was corrected using the Limma "removeBatchEffect" function. The "E3" corrected matrix was used to perform a PCA and a clustering with FactoMineR 56 . functional analysis of blood transcriptome. Transcriptome functional analyses were performed using four approaches. First, the DE genes were analysed with Ingenuity Pathways Analysis 01.08 (September 2017 Release, IPA, www.ingenuity.com) to identify canonical pathways. Only those with a -log(p-value) > 1.75 and including at least 10 genes were considered.
The second analysis was realized using ClueGO 2.3.4 28 . A two-sided test was used to highlight enriched GO terms. Significance was set at an adjusted p-value of 0.05, the "GO fusion" option was used, the k-score was fixed at 0.2 and the global/local parameter at the sixth level. Three ontologies were selected: "Sus scrofa ontologies KEGG_06.07.2015", "GO_ImmuneSystemProcess_03.07.2015_09h08" and "GO_BiologicalProcess_03.07.2015_09h08".
In all these three cases, the percentage of pathways putatively related to the NAFLD spectrum was estimated by performing a comparison with a pathway consensus list obtained using 19 papers describing transcriptional profiling 6,8,[14][15][16]49,50,[60][61][62][63][64][65][66][67][68][69] or data meta-analysis 7,51 . A further review paper was also added 27 . More details are found in Supplementary Information 2. The fourth approach involved a meta-analytical screening of the literature relative to NAFLD and other related diseases. The same already listed 19 papers were used, and the available DE gene lists were retrieved and merged to obtain a non-redundant set, which was intersected with our DE gene list. The enrichment was assessed using a Fisher's exact test and a hypergeometric test.
Production and analysis of 16S data. DNA was extracted from frozen faecal samples as described by 70 .
Subsequently, the V3-V4 region of the 16S rRNA gene was sequenced using the 454 GS FLX Titanium platform (Roche, Pleasanton, CA, USA) as described by 71 . FastQC was used to perform quality checks on the raw data and the sequences were analysed with QIIME 1.9.1 72 as described previously 71 .

Scientific RepoRtS |
(2020) 10:234 | https://doi.org/10.1038/s41598-019-57127-x www.nature.com/scientificreports www.nature.com/scientificreports/ Species richness estimates (Chao1, observed OTUs, and abundance-based coverage estimators or ACE) and evenness indexes (Shannon and Simpson), were calculated using the phyloseq R package after the reads were rarefied 73 . For β-diversity, weighted UniFrac distances were first calculated on the OTUs table. Then, nonmetric multidimensional scaling (nMDS) and distances to centroid were calculated using the "betadisper" function of the vegan R package. Breed comparisons were assessed with PERMANOVA using the "adonis" function in the vegan R package 74 , and p-values were obtained using Monte Carlo random draws. Also, an ANOSIM was performed using the vegan R package to test the significance of the difference between the breeds.
A linear-mixed model was fitted on the OTUs table using the lme4 R package and setting the breed as a fixed effect and the sow as a random effect. Relative abundance (%) differences at phylum and lower taxonomic levels between DD and LW pigs were assessed using MetagenomeSeq 1.20.1 75 . A Kruskal-Wallis test was performed, and q-values were Benjamini-Hochberg adjusted. The differentially abundant taxa were compared to the results published in two recent papers 22,32 by following the procedure described in Supplementary Information 3. A PCA and a clustering were also performed. 16S data were submitted to SRA (accession number SRP144235).
Data integration. An MFA 76 was used to integrate CBCs, blood transcriptome, and 16S data using the FactoMineR R package 56 . The results were visualized using an individual factor map (IFM) and a variable groups plot.
The datasets were then studied in a pairwise fashion with sPLS 33 using the mixOmics R package (https:// cran.r-project.org/web/packages/mixOmics/) 77 . For each dataset pair, the analysis was performed using only the 60-day-old individuals for which the data belonging to both datasets were available. For CBCs and transcriptome and for transcriptome and 16S, the statistical models were optimized using the "perf " function and setting the "ncomp" parameter to two, and a CIM was used to visualize the data. No valid model was obtained for CBCs and 16S data.