Genome-wide association study and genomic heritabilities for blood protein levels in Lori-Bakhtiari sheep

Serum protein levels are related to physiological and pathological status of animals and could be affected by both genetic and environmental factors. This study aimed to evaluate genetic variation of serum protein profile in sheep. Blood samples were randomly collected from 96 Lori-Bakhtiari ewes, a heavy meat-type breed. Total protein, albumin, globulin, α1, α2, β and γ globulins and IgG levels were measured in blood serum. The samples were genotyped using the Illumina OvineSNP50 BeadChip. The studied traits adjusted for age, birth type, birth season and estimate of breeding value for body weight were considered as pseudo-phenotypes in genome-wide association analysis. In the GWAS model, the first five principal components were fitted as covariates to correct the biases due to possible population stratification. The Plink, R and GCTA software were used for genome-wide association analysis, construction of Q-Q and Manhattan plots and estimation of genetic variances, respectively. Noticeable genomic heritabilities ± SE were estimated for total and γ globulins (0.868 ± 0.262 and 0.831 ± 0.364, respectively), but other protein fractions had zero or close to zero estimates. Based on the Bonferroni adjusted p values, four QTLs located on 181.7 Mbp of OAR3, 107.7 Mbp of OAR4, 86.3 Mbp of OAR7 and 83.0 Mbp of OAR8 were significantly associated with α1, β, β and γ globulins, respectively. The results showed that the PKP2, IGF2R, SLC22A1 and SLC22A2 genes could be considered as candidate genes for blood serum proteins. The present study showed significant genetic variations of some blood protein fractions.

Genotyping and quality control. Genomic DNA was extracted from the EDTA-containing blood samples, using DNP TM Kit (CinnaGen Inc, Iran). The samples were sent to Illumina laboratory and genotyped using the Ovine SNP50 BeadChip (Illumina Inc., CA, USA), which detected a total of 48,054 SNPs in the genome. Quality control process was performed using R 28 and Plink 1.90 beta 29 software, whereby samples with a GenCall (GC) score < 0.6 and a call rate < 0.99 and variants with minor allele frequencies (MAF) < 0.05, genotype call rates < 0.95 and significant deviation from Hardy-Weinberg equilibrium ( p < 10 −6 ) were removed from the analysis.

Statistical analyses.
In the first analysis, the phenotypic records of different serum protein fractions were subjected to a general linear model 4 as follows: In this model, y ijkl is an observation, μ, A i , B j and S k are overall mean and effects of age (2-3, 4, 5, 6 and + 7 years), birth type (1 or 2) and birth season (winter or spring), respectively, β is regression coefficient of the observed parameter on estimate of breeding value for body weight (EBV), as an indicator of genetic potential of body weight and e ijkl is residual effects.
EBVs for body weight were obtained using 15859 test-day body weight records of 4402 individuals, collected during 29 years in the studied population. The EBVs were estimated based on Average-Information algorithm of restricted maximum likelihood (AI-REML), using an animal mixed model fitting animal birth year, birth month, birth type, sex and quadratic regression coefficient of body weight on age, as fixed effects and direct additive genetic and permanent environmental and maternal additive genetic effects as random effects.
The general linear and animal mixed models were analyzed using Proc GLM of SAS 30 and Wombat software 31 , respectively.
Residuals of the general linear model (1), as adjusted records, were considered as pseudo-phenotypes in genome-wide association analysis. The model used for GWAS, fitted random SNP effects and the first five principal components (PCs) as covariates to account the biases due to possible population stratification. The GWAS was carried out as a single-SNP regression and the SNPs were fitted separately. The genome-wide association p-values were adjusted by Bonferroni adjustment method. The Plink 1.90 beta software 29 was used for genomewide association analysis. The qqman package of R 32 was used to create quantile-quantile (Q-Q) and Manhattan plots. Genomic heritabilities and contributions of the significant SNPs in genetic variation of the studied traits were estimated, based on AI-REML algorithm, using GCTA software 33 . Gene annotation. Possible candidate genes, located within 50 kbp distances from the detected significant SNPs, based on Bonferonni adjusted p-values, were identified based on SNPchiMp V.3 ovine SNPs genome map 34 , using BioMart tool of Ensembl database (www. ensem bl. org). The published QTLs around the significant SNPs were also searched using Animal QTL database (www. anima lgeno me. org/ QTLdb/ sheep).

Approval for animal experiments. The experimental protocols were approved by the Biomedical Ethics
Committee of Bu-Ali Sina University. All methods were carried out in accordance with relevant guidelines and regulations. The authors also complied with the ARRIVE (Animal Research: Reporting of In Vivo Experiments) guidelines.

Genome-wide association analysis.
Multi-dimensional scaling (MDS) plots of the genotyping data, based on the first three PCs did not show any obvious classification of the sampled animals. However, a slight stratification was observed based on the first two PCs. Whereby a few animals, as the first cluster (black dots) had a slight distance from the others (Fig. 1). This slight stratification is probably due to interfamily differences and import of rams from other populations. Estimates of genomic inflation factor (λ) in the association analysis for total protein, albumin, globulins, albumin/globulins ratio, α1 globulin, α2 globulin, β globulin, γ globulin and IgG were 1.01, 1.00, 1.02, 1.00, 1.00, 1.00, 1.03, 1.07 and 1.00, respectively. Q-Q plots of GWAS −log10 (p values) for the studied traits are presented in the Fig. 2.
In genome-wide association analysis, a total of six SNPs had genome-wide p values < 10 −5 . However, based on Bonferroni adjusted p values, four SNPs, including one SNP on chromosome 3 (rs411530530), two SNPs on chromosomes 7 (rs429230884) and 4 (rs401001039), and one SNP on chromosome 8 (rs427910139) had significant associations (p < 0.05) with α1, β and γ globulins, respectively (Table 1). Other SNPs did not show any significant association with the studied traits. Manhattan plots of GWAS − log10 (p values) for the studied traits are presented in the Fig. 3.

Genes and QTLs annotation.
Based on BioMart tool of Ensembl database (www. ensem bl. org), a total of five genes were found within 50 kbp distances from the significant SNPs. The genes found were plakophilin 2 (PKP2) gene on chromosome 3, ENSOARG00000017510 on chromosome 7, and three genes, including insulin like growth factor 2 receptor (IGF2R), solute carrier family 22 members 1 and 2 (SLC22A1 and SLC22A2, respectively) on chromosome 8. No gene was found within 50 kbp intervals from the significant SNP on chromosome 4 ( Table 2). The genes surrounding the significant SNPs on chromosomes 3, 4, 7 and 8 are illustrated in Fig. 4. Based on the Animal QTL database (Animal QTLdb), no QTL associated with the studied blood serum proteins was found around the significant SNPs. However, some QTLs associated with immunoglobins A (IgA), E (IgE) and G (IgG) were found in the Animal QTL database.

Discussion
Total protein and globulins, albumin, albumin to globulins ratio and γ globulin averages were in the range, reported for different populations, such as Merino sheep 13,35 , Karakul and Tzurcana ewes 10,13 , Balami ewes 14 , Lori-Bakhtiari and Mehraban sheep 36 and Santa Inês ewes 37 . However, averages of α1, α2 and β globulins levels were to some extent lower than those reported in literature 9,13,36 . The observed differences are probably due to different environmental, physiological, health, age and genetic conditions of the studied populations. For example, it has been found that sick animals may have higher levels of α and β globulins 9 .    In the present study, moderate to high genomic heritabilities were estimated for total globulins (0.868), albumin/globulin ratio (0.227), α1 globulin (0.264) and γ globulin (0.831), which indicates considerable genetic effects on these protein fractions and probably their potential use as biomarkers for genetic selection. Other fractions, with negligible heritabilities, including total protein, albumin, α2 globulin, β globulin and IgG are likely proper guides to animal management 38 . The present study is probably the first published attempt to estimate genomic heritabilities of the blood protein levels. However, based on high standard errors of the heritability estimates, which was due to limited number of the sampled animals, more studies are still needed to clarify exact genetic bases of blood serum protein variations.
A few genome-wide association studies have been conducted on blood serum proteins. A GWAS on Korean population revealed six loci, including TNFRSF13B, FADS1, GALNT2, IRF4, HLA-DBP1 and SLC31A1, associated with albumin/globulin ratio 20 . In a GWAS on Japanese human population, based on two significant SNPs for total protein and one significant SNP for albumin, five genes, including TNFRSF13B, RPL13A, RPS11, FCGRT and RCN3 and one gene, GCKR were suggested as candidate genes for total protein and albumin levels, respectively 22 . Associations of the TNFRSF13B and GCKR genes with total protein and albumin levels were also confirmed in another GWAS on Japanese people 24 . In a GWAS meta-analysis on east-Asian human population, one SNP, located within the RRPS11 gene, was significantly associated with blood plasma albumin level 23 . Almost www.nature.com/scientificreports/ different candidate genes for blood protein fractions were found in the present and previous GWASs, which is probably due to population-specific associations 22 and different genetic architectures of the studied populations and individuals 43 . The plakophilin 2 gene (PKP2), located on OAR3, encodes the plakophilin 2 protein, which is mainly found in myocardium cells the heart wall. This protein is a found in desmosome structures, a component of intercellular adhesive junction 44,45 . The compromised junctional integrity probably contributes to disease pathophysiology 46 . Although PKP2 serves as a critical scaffold for protein kinase Cα (PKCα). Thus, more global functions in cellular homeostasis are expected for PKP2. It has been found that PKP2 knockdown would result in increased PKC substrate phosphorylation, and this association is probably the reason for pathogenesis of congenital defects due to PKP2 deficiency 46 . In several studies, the PKP2 gene was associated with canine atopic dermatitis, a chronic inflammatory skin disease 47,48 . It has been found that the PKP2 impacts the β-catenin activity, a main participant in canonical Wnt signaling and associates with SOX2 and SOX9 expressions, as Wnt target genes, which suggests a signaling role of plakophilin 2 by regulation of Wnt signaling pathway 49 . On the other hand, it is demonstrated that the Wnt signaling is essential in pathogenesis of some diseases 50 . Thus, the PKP2 gene could be considered as a candidate gene for blood serum proteins.
The ENSOARG00000017510 gene, is located on 86.3 Mbp of OAR7, in a 47.2 kbp distance from the SNP rs429230884 (Table 2). Based on the Ensembl database (www. ensem bl. org), the ENSOARG00000017510 is a protein-coding gene, which encodes an integral component of membrane. No evidence for phenotypic association of this gene with diseases or other traits was found in literature. It seems that more studies are needed to understand the molecular, cellular and biological functions of this gene.
The insulin like growth factor 2 receptor (IGF2R) is locates on OAR8 and encodes a highly conserved transmembrane glycoprotein receptor which regulates the insulin-like growth factor 2 (IGF2) level and this function is necessary for embryonic development in mammals 51 . However, there are some evidences for IGF2R functions in immunity such as regulation of HIV infection and chemokine expression 52 , overexpression of IGF2R in osteosarcoma cells 53 and increase of regulatory T cell functions in reducing of other effector T cells activities and suppression of food allergic effects on intestinal inflammation 54 . Moreover, it has been found that some viral infections are associated with IGF system 52 . Therefore, the IGF2R is probably a candidate gene for blood serum proteins and immune system activity.
The significant SNP on OAR8 (rs427910139) is located within and in 41 kbp distance from the solute carrier family 22 members 1 (SLC22A1) and 2 (SLC22A2), respectively (Fig. 4, part D). The SLC22A1 and SLC22A2 encode organic cation transporters, with crucial roles in elimination of endogenous organic cations, drugs and toxins 55 . Associations of SLC22 members 1-3 with drug disposition, response and generally pharmacodynamics are well known 56,57 . There are several evidences for associations of SLC22A1 and SLC22A2 with diseases. For example, the SLC22A2 is associated with hypertension 58 and SLC22A1 and SLC22A2, both contribute in disposition pathways for fluoroquinolone antimicrobials 59 . In a GWAS on Korean human population, another member of the solute carrier families (SLC31A1) was significantly associated with blood serum albumin/globulin ratio 20 . Therefore, both SLC22A1 and SLC22A2 could be considered as candidate genes for blood serum protein profile and probably resistance to diseases.

Conclusion
The QTLs on 181.7 Mbp of OAR3, 107.7 Mbp of OAR4, 86.3 Mbp of OAR7 and 83.0 Mbp of OAR8 in the present study are probably the first QTLs reported for α1, β and γ globulins. Moreover, the PKP2, IGF2R, SLC22A1 and SLC22A2 genes could be considered as candidate genes for blood serum proteins. Moderate to high genomic heritabilities were estimated for total globulins (0.868), albumin/globulin ratio (0.227), α1 globulin (0.264) and γ globulin (0.831). This study showed considerable genetic variation in blood serum protein profile, especially total and gamma globulins. This study is probably the first GWAS on blood serum protein profile in animals. However, more studies with larger sample sizes and use of high-density SNP chips would probably result in detection of more genomic regions associated with blood serum protein profile in sheep.