Genome-wide association studies for 30 haematological and blood clinical-biochemical traits in Large White pigs reveal genomic regions affecting intermediate phenotypes

Haematological and clinical-biochemical parameters are considered indicators of the physiological/health status of animals and might serve as intermediate phenotypes to link physiological aspects to production and disease resistance traits. The dissection of the genetic variability affecting these phenotypes might be useful to describe the resilience of the animals and to support the usefulness of the pig as animal model. Here, we analysed 15 haematological and 15 clinical-biochemical traits in 843 Italian Large White pigs, via three genome-wide association scan approaches (single-trait, multi-trait and Bayesian). We identified 52 quantitative trait loci (QTLs) associated with 29 out of 30 analysed blood parameters, with the most significant QTL identified on porcine chromosome 14 for basophil count. Some QTL regions harbour genes that may be the obvious candidates: QTLs for cholesterol parameters identified genes (ADCY8, APOB, ATG5, CDKAL1, PCSK5, PRL and SOX6) that are directly involved in cholesterol metabolism; other QTLs highlighted genes encoding the enzymes being measured [ALT (known also as GPT) and AST (known also as GOT)]. Moreover, the multivariate approach strengthened the association results for several candidate genes. The obtained results can contribute to define new measurable phenotypes that could be applied in breeding programs as proxies for more complex traits.

cholesterol and other metabolite levels) or enzyme functions and activities (e.g. hepatic and muscle enzyme stress indicators). Therefore, the identification of DNA markers associated with these traits in animals might provide information to indirectly overcome the limited genetic progress that traditional livestock selection programs have on disease resistance, robustness and resilience 11 . In addition, variability in the basal levels of these parameters has been also associated with different predisposition to a wide range of human diseases, including cardiovascular, cancer, metabolic, infection and immune disorders (e.g. [12][13][14][15][16]. Haematological traits can be classified according to the blood cells from which they derive: erythrocyte-related traits, leukocyte-related traits and platelet-related traits. Clinical-biochemical traits include the amount of biochemical and mineral components of the blood and the activities of several enzymes. Most of blood parameters can be regarded as the result of simpler biological processes than classical performance and production traits and could be considered as intermediate (or internal) phenotypes 8 .
Several QTL studies investigating haematological and immune capacity traits have been reported in pigs. Most of these studies used F2 and backcross reference families having, as parental animals, pigs of different breeds and lines, including wild boars, commercial European breeds (Duroc, Landrace, Large White, Pietrain, Yorkshire) and Asian breeds or lines (Erhulian, Korean native pig, Meishan, Minzhu, Songliao Black Pig) [17][18][19][20][21][22][23][24][25][26] . A few other authors have applied genome-wide association studies in Chinese Sutai pigs and in the German Landrace population 27,28 . Results of these works reported few overlapping QTL regions, probably derived by the genetic heterogeneity of the investigated pig populations, differences among the analysed traits or parameters and the variety of the experimental designs.
This study reports the results obtained combining different genome-wide association strategies, i.e. single-marker (single-trait and multivariate associations) and Bayesian (windows-based single-trait) association studies, for a total of 30 haematological and blood clinical-biochemical parameters in Italian Large White heavy pigs. Based on different statistics, these methods present different pros and cons, and their combined use could overcome drawbacks derived by the population structure or by the experimental design. The single-maker approach is one of the most adopted genome scan methods. In its simplest version, this approach is implemented by means of a liner model relating, via an additive model, a phenotype y (measured in n individuals) to genotypes x (aa/aA/AA usually coded as 2/1/0, according to the number of copies of the minor allele a) 29 . Implemented as a linear mixed model (LMM), the regression model is augmented by (i) a random genomic effect g that account for sample relatedness (population stratification) via a relatedness matrix K [either a pedigree-based kinship matrix (A) or a genome-based matrix (G) also known as Genomic Relationship Matrix (GRM)] and (ii) additional fixed effects accounting for other confounding factors (e.g. sex, body weight) 29 . However, this approach fits one SNP at a time, and multiple testing correction of results is an issue that limits the power of this approach 30 . Moreover, due to imperfect linkage disequilibrium (LD), the effect of a putative gene/QTL can be only in part captured by a single SNP. The Bayesian approach, here implemented as multi-marker windows-based approach, can overcome these limits by fitting multiple markers simultaneously. Over the years several models have been proposed (e.g. BayesA, BayesB, BayesC, BayesD, BayesR) differing mainly in the distributional assumption of the SNP effect. In Bayesian analyses inference of marker association is based on the posterior distribution of markers effect, estimated using Markov Chain Monte Carlo (MCMC) methods and considering SNPs effects within a defined genome window 29 . Lastly, by analysing more phenotypes simultaneously, the multivariate method can increase the statistical power and identify pleiotropic loci 31 . In doing so, the multivariate linear mixed model (mvLMM) adds the cross-traits covariance as extra information 31 . Single-marker and Bayesian approaches have been already efficiently used together to detect QTLs affecting haematological and blood clinical-biochemical parameters in pigs (e.g. 28,[32][33][34].
Results obtained in this study were then compared to what was reported by previous works carried out in different pig populations using the same or similar blood derived phenotypes. Identified QTLs showed a low chromosome region overlap with those mapped in other studies. Some of them highlighted genomic regions harbouring genes that, for their functions, may be the obvious candidates explaining the detected genetic variability.

Methods
Animals, blood collection and analyses. All animals used in this study were kept according to the Italian and European legislations for pig production. All procedures described are in compliance with Italian and European Union regulations for animal care and slaughter. Performance testing of the animals was carried out under the national selection program of heavy pigs. Pigs were not raised or treated in any way for the purpose of this study. Animals were slaughtered in an approved commercial abattoir following the regular procedures for commercial pig slaughtering at the end of their production cycle. Pigs were sampled after slaughtering. For all these reasons no other ethical statements are needed.
Genome-wide association studies were conducted on a total of 843 performance tested Italian Large White pigs (278 castrated males and 565 gilts, obtained from 86 boars and 377 litters), slaughtered over 25 different days. These animals were part of the sib-testing program of the Italian Large White pig population that is based on triplets of pigs of the same litter (two females and one castrated male) that are individually performance tested at the Central Station of the National Pig Breeder Association (ANAS) for the genetic evaluation of a boar from the same litter (sib-testing). Pigs started their performance evaluation at 30-45 days of age until they reached 155 ± 5 kg live weight 35 . All animals were fed with the same standard commercial feed for fattening pigs under the production rules of the Parma and San Daniele dry-cured ham consortia. At the end of the test, animals were transported to the same commercial abattoir where they were slaughtered with standard procedures in the morning (07.00-08.00 a.m.; after overnight fasting of about 12 h) using electrical stunning. Blood was collected just after jugulation and exsanguination, into an EDTA containing tube and a serum separator tube with gel separator and clot activator (Vacutest Kima s.r.l.).

Data analyses. Data transformations.
The Box-Cox transformation method 38 was used to normalize each blood parameter as follows: The optimal value for λ was selected by maximum log likelihood, considering a uniform grid of 3,001 values of λ in an interval between −3 and +3. The effects of sex, weight and slaughtering date have been considered during the normalization process. Data normalization was done in R v. 3.0.2 39 by using the "MASS" and "CAR" packages.
Confounders removal and correlation network. Environmental and technical factors can affect the levels of blood parameters, so it is important to remove these confounding effects. Here, residuals derived from a linear regression model were considered. The basic model was: where y i is the level of the blood parameter for the i th animal [after the Box-Cox transformation; see Eq. (1)], β 0 is the intercept term, w i indicates the weight of the i th animal, s i is a dummy variable representing the sex of the i th animal, d i1 , …, d i(J−1) is a set of J = 25 dummy variables coding the blood collection date for the i th animal, while β w , β s and β Cj are the corresponding regression coefficients and ε i is the error term. Confounding effects are removed by computing the residuals: are the least squares estimates of model parameters. Dependences among measured blood parameters were investigated with a correlation network. We obtained a Pearson's correlation matrix R, whose generic entry r (i.e. the Pearson's correlation coefficient between the h th and the k th residual blood parameter level) is: where the test statistic t is given by: with n equal to the sample size (n = 843). Bonferroni correction was applied considering a nominal level of α = 0.05 and a total of ( ) m 2 correlation coefficients. This resulted in a threshold value of p-value = 1.15 × 10 −4 corresponding to a minimum |r| = 0.133. However, the network resulted highly connected, so we retained only correlation coefficients with |r| > 0.4 (medium correlation). Correlation coefficients and significances were computed in R with the function cor.test. The network was visualized and annotated with Cytoscape 3.0 40 .
Genome-wide association analyses. Three methods (described in detail below) were used for genome-wide association analyses. Before fitting the genome-wide association models, the effect of slaughtering date on each normalized blood parameter was removed by obtaining residuals from a linear regression model.
(1) Single-marker single-trait genome-wide association analysis Genome-wide association studies were performed examining each trait-SNP pair, hereafter denoted as (j, i), where j = 1, …, q (q = 30) and i = 1,…p (p = 45,536). Additive genetic models assuming a trend per copy of the minor allele were used to specify the dependency of each blood parameter on genotype categories. The following linear mixed effect model was specified: where y (n × 1) is a vector containing blood parameter (residuals of the normalized blood parameter) for the n th animal, W (n × k) is a covariate matrix with k = 3 (a column of 1 s, sex, and weight) and α is the k-dimensional vector of covariates effects, x (n × 1) is the vector containing genotypes for the i th SNP (coded as 0, 1, 2, according to the number of copies of the minor allele), β is the additive fixed effect of the i th SNP on blood parameter, g~N(0,σ 2 g K) is a multivariate Gaussian polygenic effect, with covariance matrix proportional to the relatedness matrix K (n × n) and e~N(0,σ 2 e I) is a multivariate Gaussian vector of uncorrelated residuals. The assessment of the association between each SNP and blood parameter was obtained by testing the null hypothesis H 0 :β = 0. Significance was tested by using the Wald test. All the models were fitted with GEMMA 41 after computing the relatedness matrix K as a centred genomic matrix (this matrix provides a good control for population structure 41 ). To account for multiple comparisons, we opted for the Bonferroni correction, which considered a total 45,536 SNPs, 30 phenotypes and a value of α = 0.05. We estimated a threshold of − = = . × 8 . However, the Bonferroni correction assumes independence among the performed tests, so that it is inherently conservative when applied to correlated phenotypes and genetic data that exhibits high linkage disequilibrium 42,43 . Therefore, to take in consideration also moderate associations, and balance the risk of Type I and Type II errors, in our analyses we considered a less conservative significance threshold of p-value = 5.0 × 10 −05 , as widely adopted in genome-wide association studies in farm animals (i.e 2,37,[44][45][46][47] ). Based on this threshold, for a QTL region related to the analysed trait, the SNP with the lowest p-value was considered as a "tag" SNP. Given the presence of multiple peaks on the same chromosome, tag SNPs for the trait "basophil count" (BASO) were identified by using Haploview 48 . The proportion of variance in phenotype explained by a given SNP (PVE) was computed as described in 49 . Briefly, PVE was estimated as follows: where: βˆ, βŝe( ) and MAF are the effect size estimate, the standard error of the effect size estimate and the minor allele frequency of a given SNP, respectively. N represents the sample size. GEMMA was used also to retrieve, for each trait, the chip heritability (or SNP heritability; h SNP 2 ) estimated by the whole set of available genotypes. QQplots and Manhattan plots were generated in R by using the "qqman" package.
(2) Multivariate genome-wide association analysis Multivariate genome-wide association (single-marker multi-traits) studies were performed with GEMMA by fitting the following multivariate linear mixed model: where y (n × d) is a matrix containing d blood parameters (residuals of the normalized parameters) for the n animals, W (n × k) is a covariate matrix with k = 3 (a column of 1 s, sex, and weight) and A (k × d) is the matrix of the corresponding coefficients including the intercept, x (n × 1) is the vector containing genotypes for the i th SNP (coded as 0, 1, 2, according to the number of copies of the minor allele), β is the www.nature.com/scientificreports www.nature.com/scientificreports/ additive fixed effect of the i th SNP for the d phenotypes, G (d × n) is a matrix of random effects, E (d × n) is matrix of residual errors, K (n × n) is the relatedness matrix, I (n × n) is the identity matrix, V g (d × d) is the symmetric matrix of genetic variance component, V e (d × d) is a symmetric matrix of environmental variance component and MN d×n (0, V 1 , V 2 ) denotes the d × n matrix normal distribution with mean 0, row covariance matrix V 1 (d × d), and column covariance matrix V 2 (n × n). Association between each SNP and blood parameters was obtained by testing the null hypothesis H 0 :β = 0. Wald test was used to test association significance. As relatedness matrix K we computed with GEMMA a centred genomic matrix. Significant threshold, QQ plots and Manhattans plots were defined or produced as reported for the single-marker single-trait analyses.
(3) Bayesian genome-wide association analysis (windows-based scan) As a first step for this approach, missing genotypes were imputed with Beagle v.4.1 50 . Bayesian analyses were performed with GenSel v.4 51 , using the method Bayes-C. This method uses all SNPs simultaneously and assumes a common variance for all SNPs. The model was as follows: where y (n × 1) is a vector containing blood parameter (residuals of the normalized parameter) for the n th animal, W (n × k) is a covariate matrix with k = 4 (sex, weight and two columns for relatedness), b is the vector of fixed effects, J is the number of SNPs, x j is the vector containing genotypes for the SNP j (coded as 0, 1, 2, according to the number of copies of the minor allele), β j is the random substitution effect for the j th SNP, which conditional on σ 2 β was assumed normally distributed N (0, σ 2 β ) when δ j = 1 but β j = 0 when δ j = 0, with δ j being a random 0/1 variable indicating the absence (with probability π) or presence (with probability 1 − π) of SNP j in the model, and e is the vector of the random residual effects assumed normally distributed N (0, σ 2 e ). Relatedness was accounted by considering the first two eigenvectors of the centred genomic relatedness matrix K, as computed in R by using the function "eigen". The Markov Chain Monte Carlo (MCMC) method, as implemented in GenSel, was used to obtain the posterior distributions of SNPs effects. This comprised a burn-in period of 1,000 iterations from which results were discarded, followed by 51,000 iterations from which results were accumulated to obtain the posterior mean effect of each SNP 28,47 . In the Bayesian variable selection, multiple-regression models with π = 0.995, about 200-250 SNPs were fitted simultaneously in each MCMC iteration. The cumulative effect of markers within 1 Mb non-overlapping genome windows was computed in GenSel. The window effect was expressed as the percentage of total genetic variance contributed by each window 28 . A total of 2,332 genomic windows were retrieved. The expected percentage of genetic variance explained by each genomic window (%Var) was equal to 0.043% (100/2,332). The genomic windows explaining more than the 0.043% of genetic variance were considered non-random associations. However, a more stringent and reliable threshold, ranging from 0.2 to 1.00 (which means from ~5X to ~25X the expected variance), is generally applied to consider as QTLs the identified regions, with a threshold value of %Var = 0.5 commonly adopted 28,47,[51][52][53][54][55] . Here, a medium moderate threshold, equal to 0.7 (~17X the expected variance), was used: (i) to declare the presence of QTLs and (ii) to confirm those identified via the single-marker single-trait approach. Manhattan plots representing the proportion of genetic variance of consecutive non-overlapping genomic windows were obtained with the "qqman" R package.

Functional annotation of QTL regions and comparative analysis with previous studies.
Different QTL regions for a specific parameter were defined considering significant SNPs that mapped at least one Mbp apart from another significant SNP. Functional analysis was carried out by (i) retrieving all annotated genes in Sscrofa11.1 from a region spanning ±1 Mbp the significant SNPs that identified the borders of that QTL and by (ii) evaluating the relevance of the genes in affecting the considered parameter through the scientific literature, selected combining as keywords the name of the gene under analysis (or synonyms) and the parameter itself (including alias and related traits). In addition, traits (or groups of similar traits defined by the correlation network analysis; see above) for which at least five QTLs were identified, gene enrichment analysis was carried out with NET-GE (http://net-ge.biocomp.unibo.it/enrich) 56 . NET-GE took as input all genes in the QTL regions or only the closest gene to the most significant SNP or only preselected genes based on their functions (as defined above), including at least one gene per QTL region. Over-representation analysis was carried out considering statistically enriched terms with a p-value < 0.05, Benjamini-Hochberg corrected (False Discovery Rate, FDR). Analyses run over the Gene Ontology (http://geneontology.org/), KEGG (http://www.kegg.jp/) and Reactome (https://reactome.org/) databases.
Comparative QTL mapping analysis across studies was obtained using the pigQTL database (https://www. animalgenome.org/cgi-bin/QTLdb/SS/index) 57 with direct evaluation on the related published literature in pigs.

Results
Blood parameters and networks. A total of 843 Italian Large White pigs were analysed for 30 blood-related parameters: 15 haematological traits and 15 clinical-biochemical parameters. Descriptive statistics for all these traits are reported in Supplementary Table S1.
A correlation network based on Pearson's correlation coefficients was used to study the dependence among these parameters. The modelling resulted in a network of 30 nodes (of which 10 were singletons) and 27 edges (|r| > 0.4; Fig. 1). This network was characterized by modules describing more complex traits, such as erythropoiesis or leukopoiesis. Three clusters emerged: (i) a large module evidencing two sub-clusters, one Genome-wide analyses. The proportion of phenotypic variance (PVE%) explained by SNPs (single-marker single-trait analysis) ranged from 3.8 for BASO, to 9.0 for LDL-Chol (Supplementary Table S2 Table S1). The average h SNP 2 Standard Error (SE) associated to heritability estimates was equal to 0.060 and ranged from 0.028 to 0.071 (Supplementary Table S1). Genome-wide association analyses were carried out using three approaches (single-marker single-trait analysis, multivariate analysis and Bayesian analysis). The single-marker single-trait approach could account for the population structure of the pigs of the sib-testing program 29,30 . Bayesian analysis was applied to overcome the multiple testing correction to declare significant results in an experimental design in which the power was limited by the number of animals for which blood traits could be measured 30 . Lastly, because of the similarity of several blood phenotypes, the multivariate approach was carried out to boost the power of genome scans by detecting pleiotropic loci 31 .
In the single-marker single-trait analysis, a total of 51 unique tag SNPs was associated to all the 15 haematological traits and to 12 out of 15 clinical-biochemical traits investigated in this work. Table 1 includes only the top/tag markers associated with the investigated traits. Supplementary Table S2 reports all the identified SNPs (no. = 190) associated to these blood-related phenotypes.
The Bayesian analysis (windows-based single-trait) reported a total of 22 windows with an explained additive genetic variance >0.7% each, for twelve different traits ( Table 2). Fifteen of these 22 windows (~70%) overlapped significant SNPs reported in the single-marker analysis.
The single-marker multi-trait analysis reported a total of 13 tag SNPs associated to five out of six groups of haematological traits (Table 3; no significant associations were obtained for the [NEUTRO-WBC] cluster). Supplementary Table S3 reports all the identified SNPs (n = 49) associated to these clusters of blood parameters. Comparing each set of blood parameters against the related single-trait analyses, five SNPs were in QTL regions not highlighted by the other two single-trait methods (i.e. single-marker and Bayesian approaches).
By combining results from the single-trait and the multivariate genome-wide association scans, a total of 52 unique QTL regions were identified (Supplementary Table S4). These regions were distributed on all autosomes  www.nature.com/scientificreports www.nature.com/scientificreports/ except on porcine chromosome (SSC) 11. Four QTL regions were reported only by multi-trait groups, 41 were detected only by single-traits and the remaining seven were detected by both multi-trait and single-trait analyses. A total of 23 QTL regions were for haematological traits (for all 15 parameters), 28 were for clinical-biochemical traits (for all 14 parameters; ALP levels were not statistically associated to any genomic marker) and one (on SSC3, position 12-18 Mbp) was shared between haematological and clinical-biochemical parameters (LDL-Chol, T-Chol and EOSINO).
Information on QTLs identified by other studies on the same traits matching the same chromosome regions reported in this work is included in Tables 1-3. Only six out of 52 QTL regions have been identified by other studies for the same or similar traits (Tables 1 and 2).
Manhattan plots constructed over-imposing genome-wide association results for all considered haematological parameters and for all clinical-biochemical traits, carried out using single-marker single-traits linear models, multivariate scan and Bayesian (windows-based single-trait) analyses, are reported in Fig. 2. Manhattan plots produced separately for each trait are reported in Supplementary Fig. S1 (single-marker single-trait), Supplementary  Fig. S2 (multivariate approach) and Supplementary Fig. S3 (Bayesian analysis). Q-Q plots are reported in Supplementary Fig. S4 and genomic inflation factors (λ) are included in Supplementary Table S5. Erythrocyte traits. A total of 21 SNPs/windows were identified for the seven investigated erythrocyte-related traits (in single-trait or multivariate analyses), with some SNPs/windows shared among the 30 analysed phenotypes: (i) two for RBC; (ii) three for HGB, one detected with the single-marker approach (single-trait and multi-trait) and two identified with Bayesian analysis); (iii) two for HCT; (iv) two for MCV, one of which (on SSC8) confirmed by the Bayesian method; (v) two for MCH, one of which (on SSC14) confirmed by all the three approaches; (vi) two for MCHC; (vii) three for RDW; (viii) three for the multi-trait group [MCH-MCHC-MCV-RDW], one of which not detected with the other single-trait methods and (ix) three for the multi-trait set [RBC-HGB-HCT], all of them detected also with the single-marker single-trait analysis. Combining all these QTLs, a total of 13 QTL regions (affecting one or more traits) were identified (Tables 1-3; Supplementary Table S4; Fig. 2).
Three traits (RBC, MCV and MCH), that quantify red blood cells (RBC) or define ratios including the number of red blood cells at the numerator or denominator of the formulas (see Supplementary Table S1), identified significant markers in the same SSC5 region (mapping at position ~15 Mbp). This region was also identified by the multivariate analysis of the two erythrocyte-related clusters, i.e. [RBC-HGB-HCT] and [MCH-MCHC-MCV-RDW], confirming the presence of a QTL region for erythrocyte traits. A few genes highly expressed in bone marrow (ADCY6, TUBA1B, TUBA1C and KMT2D), that could be considered candidates for red blood cell related parameters, are annotated in this chromosome region. Porcine chromosome 5 had another QTL for MCHC at position ~86 Mbp, close to the TMPO gene highly expressed in bone marrow.
Bayesian approach identified a QTL for HGB in a window on SSC7 (at position ~50 Mbp) that contains several annotated genes. One of which, EFL1, is ubiquitously expressed in bone marrow. Another gene is ALPK3, harbouring variants that have been associated in humans to the mean corpuscular haemoglobin concentration (MCHC) 58     www.nature.com/scientificreports www.nature.com/scientificreports/ also a QTLs for RBC count on SSC7 (position 27-28 Mbp). This genomic region contains GCLC, a gene coding for the catalytic subunit of the glutamate-cysteine ligase (GCL). Mutations in the GCLC gene, causing GCL deficiency, have been linked to human haemolytic anaemia 59 .
A QTL for MCV was identified on SSC8 by the single-marker single-trait and Bayesian methods, centred on marker DRGA0008367 (position ~17.4 Mbp). An annotated gene included in the detected Bayesian window is PPARGC1A, which encodes for a regulator involved in fibre muscle type formation, blood pressure and cellular cholesterol homeostasis and fat deposition.
The RDW QTL identified on SSC12 (position ~40 Mbp) is marked by a SNP positioned within the UNC45B gene, whose known functions and restricted heart expression might not be related to RDW. A QTL on SSC13 for HGB positioned at ~177 Mbp (close to ROBO2 gene) was also identified with the multi-trait [RBC-HGB-HCT] group.  www.nature.com/scientificreports www.nature.com/scientificreports/ Two QTLs were located on SSC14. One at position ~2.9 Mbp was only identified using the multi-trait analysis with the cluster [MCH-MCHC-MCV-RDW]. On the same chromosome, two related traits (MCH and MCHC) showed significant markers at position ~6.5 Mbp, close to BIN3, a gene reported to be highly expressed in bone marrow. Moreover, this genomic region contains DOK2, a gene which was suggested to regulate the differentiation of primitive erythrocytes in zebrafish embryos 60 , and XPO7, which was associated to the levels of MCV, MCH and RBC in humans 61,62 . This region was also confirmed by the [RBC-HGB-HCT] multi-trait set.
Two QTLs for RDW were mapped on SSC16 (one at position ~33.2 Mbp and one at position ~67.4 Mbp) with the single-marker single-trait analysis. Two QTLs were also reported for HCT on SSC7 and SSC18.
Leukocyte traits. Ten QTLs were identified for the six leukocyte traits using the single-marker single-trait-approach, one of which (basophil count, on SSC14) was also identified with the Bayesian method (Tables 1 and 2).
Only one QTL was identified for each lymphocyte count (LYMPHO; on SSC2), neutrophil count (NEUTRO, on SSC4), white blood cell count (WBC; on SSC6), basophil count (BASO; on SSC14) and monocyte count (MONO; on SSC15). Several closest annotated genes have functions that might be indirectly linked to the reported QTLs. For example, the SNP that marked the NEUTRO QTL on SSC4 is within the RB1CC1 gene, that is involved in autophagy biopathways. The basophil QTL region identified the largest number of significant markers (n. 101 of which 21 have been considered as tag SNPs), spanning a region of SSC14 from position ~66.4 to ~73.4 Mbp. In this region, the Manhattan plot (Fig. 2) evidences two close peaks potentially underlying two QTLs for this trait (but not completely separated to formally consider them two distinct QTL regions). This large region also includes the highly significant tag SNP (ALGA0079529, p-value = 6.53 × 10 −09 ) that is within the STOX1 gene. STOX1 encodes for a DNA binding protein involved in preeclampsia. Other genes whose function might better support their candidacy in affecting basophil number are embedded in this large SSC14 QTL region: CCNC that is involved in the regulation of human hematopoietic stem/progenitor cell quiescence 63 ; EGR2 that controls adaptive immune responses by temporally uncoupling expansion from T cell differentiation 64 ; LIN28B that encodes for pluripotency factor implicated in driving a fetal hematopoietic program 65 ; JMJD1C which encodes for a hematopoietic transcription factor and that human gene variants have been associated with WBC 66 .
Eosinophil count (EOSINO) showed five QTLs on four different chromosomes (two on SSC3, two on SSC7 and one on SSC10). Eosinophils have a key role in the allergic inflammatory response. Therefore, it is interesting to note that the first significant region for EOSINO on SSC3 is identified by a SNP that is within the RBFOX1 gene, whose variability in humans has been associated with food allergy 67 .
Platelet traits. Two platelet traits have been investigated in this work showing a total of three QTLs, two for PLT (on SSC1, also detected with the Bayesian approach, and on SSC4) and one for MPV (on SSC12). The identified regions contain several genes whose known functions can be indirectly considered to have a putative role in platelet activity, level or development. For example, the PLT significant marker on SSC4 is within the ZBTB7B gene that encodes for a key regulator of lineage commitment of immature T-cell precursors, involved in several other functions.    www.nature.com/scientificreports www.nature.com/scientificreports/ in the human APOB gene cause familial hypercholesterolemia that is characterized by pathogenic elevated LDL-cholesterol levels and atherosclerosis (e.g. 68 ).
Significant SNPs identifying other LDL-Chol and T-Chol QTLs were very close to or within genes that, based on their functions, might be directly involved in affecting cholesterol related traits. For example, ALGA0004272 (SSC1 at position ~72.3 Mbp), that identified a QTL for LDL-Chol, is close to the ATG5 gene (position 72,345,004-72,520,448) which is one of the genes required for formation of the autophagic isolation membrane and engulfment. Macrophage specific Atg5-deficient mice were demonstrated to have decreased cholesterol efflux and for this reason the encoded protein is considered a key element in affecting cholesterol level in blood [69][70][71] . Another QTL for LDL-Chol on SSC1 identified by SNP ALGA0008284 (position ~228.9 Mbp) indicates the PCSK5 gene as the most plausible candidate (position 228,854,608-229,308,906; including the mentioned marker). Genetic variations in the human PCSK5 were shown to modulate high-density lipoprotein cholesterol levels with impact on other cholesterol fractions 71 .
The ADCY8 gene, that is involved in a metabolic pathway associated with high-density cholesterol in human cohorts 72 , was tagged by ALGA0022970 (positioned within the porcine ADCY8 on SSC4). This marker was associated with T-Chol (single-marker single trait) and HDL-Chol (windows-based) in our study. For the same trait, we identified a QTL on SSC7, tagged by a SNP (MARC0003814; position ~17.2 Mbp) that is close to two candidate genes: CDKAL1 (position 15,910,494-16,674,989) and PRL (position: 17,449,463,970). Variants in the human CDKAL1 gene have been associated with cholesterol efflux capacity 73 . The PRL gene, coding for the hormone prolactin, is well known to regulate cholesterol stores in male and female gonads and plasma total cholesterol concentration (e.g. 74 ).
The multi-trait analysis of the three cholesterol-related traits, i.e. T-Chol, LDL-Chol and HDL-Chol, strengthened the association of the marker DIAS0000055 in the APOB gene (p-value = 8.55 × 10 −12 ) and confirmed the QTLs on SSC1 (first region) and SSC4 and thus the potential involvement of the ATG5 and ADCY8 genes in affecting blood cholesterol level. Moreover, the multi-trait analysis for the same cholesterol traits highlighted a new QTL on the SSC2 (ALGA0013564, position ~43.6 Mbp) near the SOX6 gene (position 42,452,778-43,068,323). An in vivo mouse study demonstrated that Sox6 is involved in the regulation of serum and liver triglyceride as well as serum cholesterol levels 75 .
Three NEFA QTLs were reported in three different porcine chromosomes (SSC2, SSC9 and SSC14). The SSC2 marker associated with NEFA level is close to the SLC25A46 gene (position 115,529,496-115,555,452) which encodes for a member of the mitochondrial solute carrier family involved in several metabolic pathways, including fatty acid oxidation 76 . The QTL on SSC9, identified with a marker at position ~5.7 Mbp, is close to the TRIM21 gene (position 5,783,322-5,789,812 bp) which regulates the acetylated form of the fatty acid synthase (FASN), a key enzyme in the fatty acid pathway 77 . The significant marker on SSC14 is close to a few genes (CDK1, PSMA4 and RHOBTB1) whose role might indirectly affect blood content of NEFA.
Genes annotated in the SSC16 region in which a QTL for blood triglyceride (TG) levels was assigned did not have any obvious or known functions that could be related to this biochemical trait.

Metabolism and protein related traits.
A QTL for glucose content was identified with a marker on SSC7 (ALGA0110857) close NRXN3 gene (position 101,132,780-102,779,809) which has been associated with human obesity and energy balance 78 .
Several genes were within (i) the two QTL regions (on SSC5 and SSC9) identified for urea blood content, (ii) the QTLs identified on SSC14 and SSC17 for total bilirubin content, (iii) the SSC6 QTL for total blood protein and (vi) the SSC7 QTL for the Albumin-Albumin-Globulin-ratio cluster. However, their known functions could not be attributed to a direct role on the observed effects.

Enzyme traits.
No markers were associated with alkaline phosphatase (EC 3.1.3.1, ALP), even if some SNPs on SSC2 were just below the significant threshold.
Alanine aminotransferase [EC 2.6.1.2, ALT or ALT1; also known as alanine transaminase (AAT1) or glutamate-pyruvate transaminase (GPT)] activity showed three QTLs, one on SSC4, one on SSC18 and another one on an unassigned scaffold (NW_018084979.1; Table 1). The QTL on SSC4 was marked by a SNP (ALGA0029783, position ~1,16 Mbp) close to the porcine GPT gene (position 297,747-302,498 bp), supporting a direct role of the gene encoding the analysed enzyme in the identified QTL related to its function.
The activity of aspartate aminotransferase [EC 2.6.1.1, AST; also known as aspartate transaminase or serum glutamic oxaloacetic transaminase (sGOT)] was significantly associated with a marker on SSC14 (INRA0046629 at position ~110.4 Mbp) GOT1 gene (position 110,608,422-110,635,901). GOT1 encodes for the cytoplasmic form of the enzyme whose activity was also measured, supporting again a direct involvement of this gene in the identified QTL related to its function. AST showed a high degree of correlation (r = 0.844) with creatine kinase (CK), so these two parameters were jointly analysed in the multivariate GWA scan using the [AST-CK] cluster. Creatine kinase and AST are two enzymes mostly present in muscle cells whose increased level in serum is an indicator of muscle stress or damage, whereas a high AST level is considered to be derived from liver damage 81,82  www.nature.com/scientificreports www.nature.com/scientificreports/ human CPN1 gene have been associated with plasma levels of both ALT and AST enzymes 83 . Moreover, variations in the human CPN1 gene have been also associated with blood CK levels 12 .
Another [AST-CK] QTL was located on SSC2 (position ~8. 3-8.4 Mbp) but in this region no annotated genes might be directly involved in functions producing altered blood AST and/or CK levels.
Gene enrichment. Gene enrichment analysis was performed with NET-GE, which performs both a standard and a network-based analysis (the latter taking advantage of protein-protein interaction networks to better define biological functions) 56,84 . When applied for traits for which at least five QTL regions were identified in the genome association analyses, NET-GE highlighted statistically significant terms when genes were preselected based on their functions for cholesterol related traits. A gene list identified as mentioned above (including ADCY8, APOB, ATG5, CASTOR2, MYO18B, NPAP1, PCSK5, STAB2, SOX4 and SOX6) showed significant terms for the Gene Ontology Molecular Function (GO:MF) and the Reactome databases. The GO:MF highlighted four terms involving two genes of the input set. Two terms were leaves of the GO:MF hierarchy: low-density lipoprotein particle binding (GO:0030228, p-value = 0.012) and lipoprotein particle receptor activity (GO:0030169, p-value = 0.026). Over the Reactome database, two terms (involving four genes of the input set) were retrieved: (i) scavenging by Class H Receptors (R-HSA-3000497, p-value = 3.61 × 10 −04 ), related to the vesicle-mediated transport, and (ii) deactivation of the beta-catenin transactivating complex (R-HSA-3769402, p-value = 0.025), related to a transduction pathway. Overall, these terms confirm the biological mechanisms underlying the identified QTLs related to blood cholesterol levels. For the other traits for which five or more QTLs were identified (i.e. eosinophil count and the multi-trait group [MCH-MCHC-MCV-RDW]) no significantly enriched terms were shown.

Discussion
Sub-optimal farming practices and environmental stressors might reduce animal response to adverse conditions and increase disease susceptibility. To link physiological aspects of the animals to their potential production performances in many different conditions, intermediate (internal) phenotypes should be measured and used to describe the underlying fine biological mechanisms related to these aspects. In this context, blood parameters provide biomarkers with several applications in animal sciences: (i) to detect and monitor the pathophysiological status of the animals (diagnostic and monitoring biomarkers) and, in normal or challenging conditions, (ii) to indicate the potential to develop a disease or a sensitivity to an exposure (susceptibility/risk biomarkers), (iii) to identify animals that might experience a favourable or unfavourable effect from a particular condition or exposure (predictive biomarkers), and (iv) to identify likelihood of an adverse condition or disease progression 85 . Hematopoietic cells are responsible for a range of different functions including oxygen and dioxide transport (erythrocytes), immunosurveillance (leukocytes), clotting/homeostasis maintenance (platelets) and vary substantially among healthy animals. Lipids, energy related metabolites, proteins and enzyme activities in the serum are indicators of metabolic disorders, cardiovascular disease risks, oxidative stress, and many different pathological conditions (e.g. 86 ).
Genetic variability in these blood parameters can be important to describe the potentials of the animals to cope with infective agents and stressing conditions. This information could contribute to design new strategies to overcome the limited effectiveness of the traditional selection programs to improve disease resistance, tolerance and resilience 87 . In addition, considering the importance of blood related traits in human medicine, results obtained in pigs might further strengthen the usefulness of this animal model in translational-biomedical applications for related aspects.
Despite the relevance of blood measures in animals, just few studies (compared to what has been reported for carcass and meat production and performance traits) have investigated at the genetic level these parameters in pigs. In this study, three different approaches for association analyses -single-marker single-trait, single-marker multi-trait (multivariate analysis) and multi-marker single-trait (Bayesian method) analyses -were adopted to dissect the genetic variability of 15 haematological traits (erythrocyte-, leukocytes-and platelet-related parameters) and 15 clinical-biochemical traits (lipids, metabolism and protein, and enzyme traits) in an Italian Large White heavy pig population. The use of different approaches was able to improve the output of this genome scan study, overcoming, at least in part, the limited power of the experimental design due to the small investigated population (blood traits were measured for a total of 843 animals). The multiple testing correction in the single-marker single-trait analysis should balance the risk of Type I and Type II errors and for this reason we applied a significance threshold that was already used in several other works (e.g. 2,37,[44][45][46][47] ) that needed to deal with this question that is quite common in livestock. To overcame in part this problem, we also applied another genome wide association method that used a windows-based approach (i.e. Bayesian approach). The use of genome windows could also counteract the problem of imperfect linkage disequilibrium established between markers nearby a gene affecting the QTL 30 . In addition to these two methods, a multivariate approach was added to the study with the objective to increase the statistical power and identify pleiotropic loci 31 , considering that several blood traits are correlated to each other (|r| > 0.4).
Similarly to other genome-wide association studies in livestock species (e.g. 27,28,32,47 ), the combination of results derived by several genome-wide association approaches was able to refine and confirm QTL regions, as also observed in our study which highlighted genomic regions harbouring almost obvious candidate genes.
To our knowledge, this study in Italian Large White pigs reported results for the largest number of blood traits in any single genome-wide studies carried out in pigs so far. Most of the previous genome-wide studies investigated only cellular traits [18][19][20][21][22][23][24][25][26][27][28]88,89 or clinical and biochemical parameters 6,7,33,34,[90][91][92] . Results obtained by all these works showed few shared QTLs for the same traits (e.g. 33 ). This is also what we obtained comparing the QTL maps for the same or similar traits already reported by others to the results we obtained in Italian Large White pigs. From a total of 52 QTLs that we identified in this heavy pig breed, only 10% were located in chromosome regions that have been already shown to harbour QTLs for the same or similar blood parameters included in www.nature.com/scientificreports www.nature.com/scientificreports/ the current work. This can be explained by the heterogeneity of the studies carried out so far (including animals analysed at different ages) which might reflect, in turn, the heterogeneity of the results: most previous works have studied QTLs in Asian breeds or crosses between Asian and European breeds and few works have been carried out on pure European breeds only. Large White pigs have been previously investigated in just one genome-wide association study as pure breed 89 and in a F2 based study as parental animals 24 .
A few QTL regions showed pleiotropic effects, in particular for haematological traits. This could be expected since the different blood parameters (cell count, volumetric measurements or haemoglobin levels) can serve as intermediate descriptors of erythropoiesis. MCHC, MCV, RBC shared the QTL on the SSC5, MCH and MCHC shared the QTLs on the SSC5 and SSC14 while HCT and HGB shared the QTL on the SSC18. Genes highly expressed in bone marrow are annotated in most of these QTL regions, suggesting that several genetic factors affecting haematopoiesis could determine the observed effects on erythrocyte traits, as also reported in humans and rodents (e.g. 92 ).
Relationships among erythrocyte traits were also evidenced from the analysis of the correlation network which identified two phenotypic modules: [RBC-HGB-HCT] and [MCH-MCHC-MCV-RDW]. Results based on these multi-trait groups confirmed, in most cases, single-trait analyses, also strengthening the observed associations. In a few cases, they highlighted additional QTL regions that were just below the significance threshold in the single-marker analyses. This is also obtained for the other multi-trait modules.
Leukocytes are fundamental players of the primary defence mechanisms against pathogen agents and their counts (in the different components) are used as clinical marker of inflammation status. In humans, high WBC count has been associated with cancer mortality and all-cause mortality and several common diseases (e.g. 93,94 ). Eosinophil count and LDL-Cholesterol levels showed the largest number of QTLs for a single haematological and clinical-biochemical trait (i.e. five) in the investigated Italian Large White pig population. However, the most significant QTL was reported for basophil count on SSC14. It is worth to mention that this region might actually harbour two QTLs as two close peaks are evident from the Manhattan plot of the single-marker single-trait analysis (even if we could not formally separate them). The same region was reported to harbour a QTL for lymphocyte count in German Landrace pigs 28 . Genetic heterogeneity in this chromosome regions could act at the haematopoiesis level affecting stem cells which might lead to the development of these two classes of leukocytes in the two pig breeds: basophil in Italian Large White and lymphocytes in the German Landrace breed.
For some lipid and enzyme measures, mapped QTLs were able to directly pinpoint candidate genes based on their functions related to the expected effect on the analysed phenotype or that could contribute to explain the biological mechanisms underlying their genetic variability. For example, QTLs reported for cholesterol traits (with estimated SNP heritability of about 0.35-0.38) highlighted several genes involved in the metabolism, homeostasis, transport and regulation of its forms (i.e. ADCY8, APOB, ATG5, CDKAL1, PCSK5, PRL and SOX6) as deduced from literature information already available in humans and mice. In particular, variability in the porcine APOB gene has been already associated with the level of blood cholesterol in four months-old pigs 95,96 . These studies are considered among the first examples that demonstrated the usefulness of the pig as animal model for cardiovascular diseases, as variability in this gene is associated with atherosclerosis risk. In pigs, other works already identified QTL for Total-cholesterol, LDL-cholesterol, LDL/HDL-cholesterol ratio and TC in the APOB region 33,96 . Age-specific associations were reported in Duroc pigs, which showed significant results only in fattening animals (190 days) and not in the post-weaning phase (45 days 32 ). The study in Italian Large White pigs was based on just one time point (i.e. slaughtering age of the animals at 270 days), that might confirm an effect on adult pigs of this region for cholesterol measures. However, no association with TC was reported on this SSC3 region in the Italian Large White population. Another interesting aspect that might have age-related implications, is that in the adult Italian Large White pigs, no QTLs for HDL-cholesterol were observed. This is in contrast to what was reported in adult Duroc pigs which showed several QTLs for this cholesterol fraction 32,90 .
Most of the genes located in QTL regions for cholesterol traits in the Italian Large White breed (all except APOB) were not identified in corresponding genome wide association studies for the same traits in human cohorts. However, they were associated to fat deposition/obesity traits, cardiovascular disorders and other diseases in several human studies 97 , (see https://www.ebi.ac.uk/gwas/home). The genome-wide association study that was carried out in this heavy pig population captured the effect of genetic variability on cholesterol parameters that could be explained by a detailed analysis of the literature. Thus, across species inference might be important to better inform the functional relevance of genes involved in several biological mechanisms 98 . Gene enrichment analysis confirmed their role in biological processes and molecular mechanisms involving cholesterol.
QTLs for several other traits analysed in this study identified candidate genes that might be directly involved in explaining their genetic variability. For example, a QTL for alanine aminotransferase activity (the enzyme also known as glutamate-pyruvate transaminase or GPT) identified a SSC4 region in which the gene encoding for this enzyme is annotated. Thus far, only another study analysed this serum parameter in a genome scan, which was based on an F2 intercross between Landrace and Korean pigs, showing QTLs in different chromosomes (i.e. SSC1, SSC5 and SSC7 6 ). A QTL for the activity of another enzyme [aspartate aminotransferase (AST), also known as aspartate transaminase or serum glutamic oxaloacetic transaminase (sGOT or GOT)] was on a SSC14 region in which the GOT1 gene (that encodes for the measured enzyme) in annotated. Reiner et al. 99 . reported a suggestive AST QTL on SSC14 in a Pietrain/Meishan F 2 family infected by the common protozoan parasite Sarcocystis miescheriana. This QTL became significant after the acute burden of the parasite infection 100 . Subsequent sequence characterization of the GOT1 gene reported a SNP in the 5′ flanking region that resulted associated to AST even in non-challenged and healthy pigs, further supporting the effect of this region in affecting the activity of this enzyme 96 . Association analysis with the [AST-CK] trait group in the Italian Large White pigs increased largely the significant results observed for the single-trait analysis (based on AST). This result confirmed the presence of genetic variability in this SSC14 region that may play an important role on response to stressing or tissue damaging conditions, according to the diagnostic and predictive potential of these two blood biomarkers 81

Conclusions
This study provided new insights into the genetic factors affecting haematological and clinical-biochemical traits in pigs. These traits can be considered intermediate or internal phenotypes. They are simpler than production or external traits and could be useful to dissect the complex genetic architecture of disease resistance and resilience of the animals to stressing and adverse environmental conditions. Combining different approaches (singlemarker with single-trait and multi-trait analyses and Bayesian multi-marker method), this study identified QTL regions for 29 out of 30 analysed blood biomarkers which highlighted promising candidate genes, some of which encode for the analysed enzymes or are directly involved in the biological mechanisms that may explain the variability of the measured parameters. The obtained results can contribute to explore new avenues to overcome the limited genetic progress for disease resistance and related traits that selection programs are currently experiencing due to the difficulties in defining measurable phenotypes for these traits linked to genetic variability in commercial pig populations.

Data Availability
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.