Genome wide association study identifies novel candidate genes for growth and body conformation traits in goats

Pakistan is third largest country in term of goat population with distinct characteristics of breeds and estimated population of 78.2 million. Punjab province has 37% of country’s total population with seven important documented goat breeds namely Beetal, Daira Din Pannah, Nachi, Barbari, Teddi, Pahari and Pothwari. There is paucity of literature on GWAS for economically important traits i.e., body weight and morphometric measurements. Therefore, we performed GWAS using 50 K SNP Chip for growth in term of age adjusted body weight and morphometric measurements in order to identify genomic regions influencing these traits among Punjab goat breeds. Blood samples were collected from 879 unrelated animals of seven goat breeds along with data for body weight and morphometric measurements including body length, body height, pubic bone length, heart girth and chest length. Genomic DNA was extracted and genotyped using 50 K SNP bead chip. Association of genotypic data with the phenotypic data was performed using Plink 1.9 software. Linear mixed model was used for the association study. Genes were annotated from Capra hircus genome using assembly ARS1. We have identified a number of highly significant SNPs and respective candidate genes associated with growth and body conformation traits. The functional aspects of these candidate genes suggested their potential role in body growth. Moreover, pleiotropic effects were observed for some SNPs for body weight and conformation traits. The results of current study contributed to a better understanding of genes influencing growth and body conformation traits in goat.

www.nature.com/scientificreports/ traits and environmental effects in different production systems and geographical regions. Consequently the selection process led to goat breeds specialized for milk, meat, fiber or dual purposes 12 . Body weight is also an important trait for successful farming of small ruminants because of its economic value. Therefore, identification of genomic regions for understanding variations in body weight and other morphological traits is quite valuable for selection purpose 13 . Pakistan has 78.2 million goats making it third largest goat producing country in the world. Goats play a vital role in uplifting the economy of poor farmers in Pakistan. Milk and meat production obtained from goats in Pakistan is 965 and 748 thousand tonnes respectively 14 . Punjab is the largest province of Pakistan with variety of goat breeds. According to the Livestock Census; Punjab has the highest goat population which is 37% followed by Sindh (23%), Balochistan (22%) and KPK (18%) respectively 15 . Beetal, Daira Din Pannah, Nachi (Bikaneri), Barbari, Teddi, Pahari (Kajli) and Pothwari are the documented goat breeds of the province 16 . Goat rearing is commonly meant for meat production as primary objective while milk is also consumed domestically and hair of goats are used for making rugs by poor families. In Pakistan, only fresh meat is usually sold. Hundred thousands of goats are sacrificed on Eid-ul-Azha where Beetal and Kamori breeds are preferred due to their larger size while Teddi which is one of the smallest goat breed is also common because of its highly prolific nature. Moreover, goat rearing is also encouraged by religious customs in certain areas of the country i.e. in Kafiristan goat sacrifice is a custom to celebrate deaths, therefore large flocks of goat are common in these areas 17 .
Despite of having rich diversity of goat breeds in the country; no GWAS has been carried out in Pakistani goats regarding growth and conformation traits. However, Punjab goat breeds have been recently documented using 50 K SNP chip 18 . Current study was aimed to fulfill the gap by finding significant genomic regions or SNPs and relevant genes influencing body weight and morphometric measurements among goat breeds of Punjab province.

Results
Descriptive statistics and quality control. Genotyping of DNA samples was performed using 50 K SNP Bead Chip which coverd 53,347 SNPs distributed across the whole Caprine genome. After quality control filters; 36,861 SNPs were left while 11,181 SNPs were removed due to Hardy Weinberg Equilibrium, 3963 removed due to minor allele threshold and 1342 removed due to missing genotypes. Numbers of SNPs present before and after quality controls with average distance in kb (1 kb = 1000 base pairs) at each chromosome are given in Table 1. A total of 52 individuals were removed after quality control measures thus, 827 individuals were used for final GWAS analysis.
Significantly associated SNPs and identified genes. Manhattan plot for body weight revealed that there were two highly significant SNPs for body weight as indicated in Table 2 and Fig. 1. The validity of P value was determined using quantile-quantile (QQ) plots (Fig. 2). There were two significant SNPs viz. snp24590-scaffold25-1223464 (Chromosome 8) and snp45231-scaffold617-879437 (Chromosome 16) significant deviated from the rest of the SNPs. Moreover, after False Discovery Rate with Benjamin-Hochberg procedure (FDR-BH) correction application, these SNPs remained significant for body weight as indicated in Table 2. Raw P values and FDR-BH corrected Stage 1 and Stage 2 P values of these SNPs associated with body weight and respective genes are given in Table 2. There were 914 and 798 genes in the respective regions for snp24590-scaffold25-1223464 and snp45231-scaffold617-879437 respectively (Table 2) while the closest genes in physical location were Translation initiation factor IF-2-like (IF2) and Plasma membrane calcium-transporting ATPase 4-like respectively. Manhattan and QQ plotting for body length, chest length, heart girth and pubic bone length shared these two SNPs with body weight trait ( GWAS analysis for body height trait revealed a third significantly affecting SNP other than two aforementioned SNPs viz. snp12189-scaffold1454-532653 located on chromosome 8 mentioned in Table 2 with raw, Stage 1 and Stage 2 FDR-BH P values. There were 914 genes in the region while the closest gene was Transducin-like [transducin-like] enhancer protein 4 (TLE4) that was found + 24,297 bp away from the SNP.
Likewise, GWAS analysis for body height trait generated two additional significantly affecting SNPs other than two aforementioned SNPs on chromosome 8 and 16 viz. snp28833-scaffold310-5443065 located on chromosome 5 and snp2733-scaffold1079-1194419 located on chromosome 7 given in Table 2 with raw, Stage 1 and Stage 2 FDR-BH P values. There were 1498 and 1557 genes in the respective regions of these SNPs. The snp28833-scaf-fold310-5443065 was within WASH Complex Subunit 4 (WASHC4) gene while snp2733-scaffold1079-1194419 was located within Rho-related BTB domain-containing protein 3 (RHOBTB3) gene.
Finally, there were five SNPs identified as result of GWAS analysis for pubic bone out of which two on chromosome 8 and 16 as detailed above and mentioned in Table 2. Moreover, among three remaining SNPs one was located on chromosome 10 while two were located on chromosome 6. These SNPs were snp31031-scaffold343-1182947, snp27986-scaffold30-2051903 and snp58134-scaffold94-7205823 respectively where closest genes located were 34,050 bp away within gene and −54,447 bp away respectively. Respective genes for SNPs located on chromosome 6 were EPH Receptor A5 (EPHA5) and Spermatid protein-like (THEGL) while Unknown ncRNA for chromosome 10 SNP as detailed in Table 2 along with raw, Stage 1 and Stage 2 FDR-BH P values.
Major chromosomal regions, prominent genes and polygenic risk score (PRS). Both chromosome 6 and 8 are harboring two SNPs in the same region with 786 and 914 genes in a window of 100 MB respectively given in Table 2

Discussion
Demand of goat meat is increasing due to ever increasing human population with improved living and national standards 16,19 . Body weight and morphometric measurements are economically important traits for genetic improvement of meat type goats 20 . Keeping in view, the rapidly increasing demand of goat meat and importance of growth trait current study was performed using Illumina 50 K SNP Bead Chip to identify SNPs and genetic variations influencing growth in term of age adjusted body weights and body conformation traits. www.nature.com/scientificreports/ GWAS is widely applicable in identification of casual genes using single nucleotide polymorphism. GWAS is an ideal technique for discovery of major genes especially for complex traits and a novel way to study the genetic mechanism of these traits. Moreover, identification of genomic regions and their genes regarding economically important traits in livestock paved the way for selection of genetic markers and candidate genes which influence the trait of interest. Marker assisted selection is faster, more reliable and economical as compared to traditional selection as it enables selection of younger animals with desirable traits on the basis of highly significant genomic regions and their candidate genes. Such an approach may accelerate the breeding progress and enhance the economic prospects for selection of best breeding animals 21 .
It is quite evident from the previous studies that body weight appears to be the function of linear body measurements 22 . We identified two highly significant SNPs influencing body weight. Moreover, 2, 3, 4, 4 and 5 highly significant associations were observed for heart girth, height, body length, chest length and pubic bone length respectively. Almost more than half of these SNPs were found in the coding regions of the genes. Generally different SNPs affect different traits but this study explored two of the nine SNPs affecting more than one trait as previously stated byWu et al. 23 in GWAS of body conformation traits in Chinese Holstein cattle population. We observed that there were two SNPs (45231-scaffold617-879437 on chromosome 16 and snp24590-scaf-fold25-1223464 on chromosome 8) which were significantly associated with all the six traits under investigation in the current study. Moreover, candidate genes identified for growth were also influenced body conformation traits and vice versa which might be due to genetic correlation among studied traits and also because of a pleiotropic effect of these SNPs and their respective genes as previously suggested in GWAS of body conformation traits in the Chinese Holstein cattle population 23 .
In the current study, it was observed that more than half of highly significant SNPs were located in the coding regions of the genes influencing growth in term of age adjusted body weight and morphometric measurements. Moreover, current study identified a number of the most promising candidate genes which might play a critical role in growth and body conformation traits on the basis of highly significant SNPs. The genes identified in the current study may broadly be categorized as those involved in 1. cell growth and development; 2. cellular transportation, transcription and translation and 3. fertility.
The candidate genes involved in cell growth and development mechanism are EPH Receptor A5 (EPHA5), Rho-related BTB domain-containing protein 3 (RhoBTB3), Spermatid protein-like (THEGL) and Transducinlike enhancer protein 4 (TLE4). EPH Receptor A5 (EPHA5) was a candidate gene for body length in the present study. EPHA5 along with EFNA5 mediates communication between pancreatic islet cells to regulate glucosestimulated insulin secretion. Insulin plays a key role in utilizing sugar in the body which is needed for proper growth, metabolism and tissue repair in the body secretions [24][25][26] . RhoBTB3 is proposed to have a role in the tissue development during embryonic life 27 while THEGL is involved in all stages of male gonads development such as sperm maturation, its fertilization and subsequent embryonic development. Moreover it has also been reported to have a high level of expression from adolescence to adulthood suggesting key role in the organs development and onset of puberty 28 . TLE4 supresses Pax7-mediated Myf5 transcriptional activation through inducing Myf5 enhancer to continue latency. Loss of TLE4 function results into Myf5 upregulation thus, showing its role in growth through transcription suppression 29 . TLE4 is also a critical regulator in haematopoiesis 30 and bone development 31 .
The candidate genes involved in cellular transportation, transcription and translation mechanism were RhoBTB3, WASH Complex Subunit 4(WASHC4), Plasma membrane calcium-transporting ATPase 4-like (ATP2B4) and Translation initiation factor IF-2-like (IF2) as appeared in current GWAS analysis.
RhoBTB3 is involved in cell cycle thus has a role in mitosis and eventually also in development as mentioned in the last paragraph 32 . WASHC4 is a gene encoding a component of WASH complex that is responsible for  www.nature.com/scientificreports/ transport of endosomes within the cell 33 ; moreover, a mutation of this gene resulted in remarkable developmental disorders and skeletal muscles dysmorphism 34 . ATP2B4 is an enzyme that is calcium/calmodulin-regulated and magnesium-dependent. It catalyzes the hydrolysis of ATP along with the calcium transportation out of cell as well as has a role in sperm motility 35 . Translation initiation factor 2 (IF2) is required for GTP/GDP-binding protein whose principal role is to interact with initiator fMet-tRNA and to position it correctly in the ribosomal P site, thus enhancing the rate and fidelity of translation initiation 36 .
The candidate genes involved in fertility mechanism are Spermatid protein-like (THEGL), Plasma membrane calcium-transporting ATPase 4-like whose role is discussed previously.
The functional aspect of aforementioned genes have been predominantly studied in human beings. However, it is suggested that these genes or their variants most probably have had more or less similar functions in other mammals i.e., goats. Moreover, in the current GWAS analysis number of highly significant SNPs were identified representing eight the most promising candidate genes which might play a key role in the growth in term of age adjusted body weight and morphometric measurements in goat. Moreover, these candidate genes seem to play a key role in growth, metabolism, cellular transportation, and fertility. This study also suggested pleiotropic effects for body weight and conformation traits of couple of SNPs and their candidate genes.

Methods
Ethics declarations. This study involved a questionnaire-based survey of farmers as well as blood sampling and phenotypic data recording from their animals. Participants provided their verbal informed consent for animal blood sampling as well as for the related survey questions. Collection of blood samples and all methods were performed in accordance with the relevant guidelines and regulations for animals by veterinarians adhering to the regulations and guidelines on animal husbandry and welfare as per international norms as per approved experimental protocols by the institutional Ethics Committee of PMAS-Arid Agriculture University, Rawalpindi, Pakistan. Moreover questioning of farmers were performed in accordance with the relevant guidelines and regulations for human subjects as per approved experimental protocols by institutional Ethics Committee of PMAS-Arid Agriculture University, Rawalpindi, Pakistan.  Table 3. Each goat was assigned a particular ID for blood collection, body weight and linear body measurements. Sampling was done by ensuring that the sampled individuals were unrelated and there was effective representation of each breed 37 . The pictures of seven documented goat breeds of Punjab and five strains of Beetal are given in Figs. 3 and 4. Finally study was carried out in compliance with the ARRIVE guidelines.

Sampling.
Linear body measurements and body weight. Linear body measurements from all the breeds were taken which include body length, body height, pubic bone length, heart girth and chest length in centimeter (cm) as described previously 16,38 . Body weight was taken in Kilogram using weighing scale from all animals sampled. DNA extraction. DNA was extracted using organic method of DNA extraction as previously reported 39 .
Briefly, 1 mL of blood was taken and 2 mL of lysis buffer was added. It was then vortexed and mixed well and samples were centrifuged for 5 min at 1500 g (2690 RPM). Previous step was repeated with 3 mL of Lysis buffer. Supernatant was discarded and 300 uL of DNA extraction buffer was added followed by addition of 5 µL of 10 mg/mL Proteinase K and 20 uL of 10% SDS. Samples were vortexed gently and kept at 65 ℃ in the incubator overnight. After that, samples were placed in − 20 ℃ freezer until completely frozen. Samples were thawed after freezing and 120 uL of 5 M NaCl was added and mixed. Samples were centrifuged for 15 min at the rate of 6000 RPM and 1 mL of 100% Ethanol was added in new conical tube while clear solution was added to get DNA in 50 uL of DNase/RNase free water. Biological samples of DNA were kept at 4 ℃ to resuspend overnight and then frozen at − 80 ℃. Concentration and optical density of DNA samples were measured using Nanodrop (Quawell Q 5000). Two stage association for body weight and morphometric measurements and polygenic risk score (PRS) estimation. After pruning and quality control measures two stage SNPs association analysis was carried out as methodology adapted from Shi et al. 41 . In Stage 1, association study was performed for age adjusted body weights in Punjab goat breeds. Same procedure was used for identifying SNPs and respective genes or genomic regions regarding morphometric measurements. Linear model was used for finding significant SNPs influencing body weight. Based on False Discovery Rate of Benjamini and Hochberg test (FDR-BH); a significance level was defined as 10e−8 for the traits under study. The identified SNPs were further carried to Stage 2 of analysis on all animals. Identified SNPs were used to calculate polygenic risk score (PRS) using Genome-wide Complex Trait Analysis (GCTA) 42 and Plink 1.9 43 . Manhattan and quantile quantile plots. Manhattan and Quantile Quantile (QQ) plots were made in R program using CMplot package (https:// github. com/ YinLi Lin/ CMplot). For SNPs with P values less than 10e−8, the Q-Q plot represented highly significant deviations from the distribution under the null hypothesis indicating a strong association of these SNPs and respective regions with body weight and morphometric measurements.
Genes annotation. Genes annotation was performed with latest goat genome assembly Capra hircus ARS1.
We identified genes listed on www. genec ards. org that were functionally responsible for the phenotypic effects.

Statistical analysis.
Morphometric measurements data i.e. body weight, body length, body height, pubic bone length, heart girth and chest length were analyzed using Proc mixed of SAS University Edition. Any wrong and biologicaly impossible information was not included in the final analysis. Raw data mean, standard deviation, minimum and maximum values for the studied traits are given in Table 4. The mixed linear model was used to describe the data as previously reported 44 and model equation narrated as under.
where y i is the phenotypic value of the ith individual, age is the effect of age in days on body weight while γ is regression coefficient, α is the fix effect of breed including Beetal, Teddi, Daira Din Panah, Pahari, Pothwari and Barbari and sex of each individual including male and female. Z indicates matrix of SNP effect, β is SNP effect vector, W is indicator of polygenic residual, µ is the polygenic residual effect vector and ε i is the residual with variance σ 2 ε .