Genomic prediction for growth using a low-density SNP panel in dromedary camels

For thousands of years, camels have produced meat, milk, and fiber in harsh desert conditions. For a sustainable development to provide protein resources from desert areas, it is necessary to pay attention to genetic improvement in camel breeding. By using genotyping-by-sequencing (GBS) method we produced over 14,500 genome wide markers to conduct a genome- wide association study (GWAS) for investigating the birth weight, daily gain, and body weight of 96 dromedaries in the Iranian central desert. A total of 99 SNPs were associated with birth weight, daily gain, and body weight (p-value < 0.002). Genomic breeding values (GEBVs) were estimated with the BGLR package using (i) all 14,522 SNPs and (ii) the 99 SNPs by GWAS. Twenty-eight SNPs were associated with birth weight, daily gain, and body weight (p-value < 0.001). Annotation of the genomic region (s) within ± 100 kb of the associated SNPs facilitated prediction of 36 candidate genes. The accuracy of GEBVs was more than 0.65 based on all 14,522 SNPs, but the regression coefficients for birth weight, daily gain, and body weight were 0.39, 0.20, and 0.23, respectively. Because of low sample size, the GEBVs were predicted using the associated SNPs from GWAS. The accuracy of GEBVs based on the 99 associated SNPs was 0.62, 0.82, and 0.57 for birth weight, daily gain, and body weight. This report is the first GWAS using GBS on dromedary camels and identifies markers associated with growth traits that could help to plan breeding program to genetic improvement. Further researches using larger sample size and collaboration of the camel farmers and more profound understanding will permit verification of the associated SNPs identified in this project. The preliminary results of study show that genomic selection could be the appropriate way to genetic improvement of body weight in dromedary camels, which is challenging due to a long generation interval, seasonal reproduction, and lack of records and pedigrees.

For thousands of years' camels have produced meat, milk, and fiber in harsh desert conditions. There are 35 million camels globally (FAO, 2019), 95% of which are dromedaries 1 . The innate characteristics of adaptability and sustainability of the productions can be in antagonism as already proven for example in the bovine species 2 . Thus, the development of a modern genetic improvement program for the productivity of the dromedary should be accompanied by a profound understanding of its genome and the mechanisms of inheritance of the characters of interest 3 . The ability to blend together the adaptability to hot climates and its innate rusticity with an efficient production capacity would make this animal an excellent alternative in marginal environments 3 . In local population, the lack of phenotypic records and pedigrees, small herd size and missing connectedness, and genetic evaluations are the main limitations 4 . Genomic approaches can be beneficial to reduce the impact of these problematic 5 .
Next-generation sequencing platforms have prepared suitable approaches for genome wide association studies and genomic selection at the whole-genome level 6 . By using genotyping-by-sequencing (GBS) method can be produced many genome wide markers, which is a that supports GWAS 7,8 . GBS has been widely used in plant and animal breeding for genome-wide association analysis, genomic diversity studies, and genomic selection 9 . The availability of reference genome assemblies coupled with GBS enables us to explore in greater detail of dromedary populations and to identify genetic associations with different phenotypic traits 10 . Genome-wide association studies (GWAS) are used to screen the whole genome for target genes that correlate with phenotypic traits, using SNPs and an important method for identifying candidate genes for important economic traits in livestock 11 . GWAS have a greater capability than QTL mapping to detect causal SNPs in a smaller genetic range 12 . In recent years, many genes and molecular markers, regulate important traits, were identified using GWAS in livestock animals like pigs, cattle, sheep, and chickens 13,14 . Despite its unique potential and increased contribution to food security, comparatively less attention has been paid to camels compared to other livestock species 3 .
Body growth is an economical important trait in dromedaries. The birth and weaning weight, gain per day, and body weights at different ages are used to reflect the growth and development. Growth in weight is a heritable trait and an important index of selection 17 . Although meat production and its functionality are strategic topics in camel breeding, which is reflected by the increasing interest of stakeholders and consumers in camel products, few studies have been produced in this field.
In Iran, there are about 140,000 dromedary camels (FAO, 2019) that are divided into four basic types: meat type, milk type, dual purpose and riding camels 18 . Camels in the central desert of Iran usually belong to the meat type with a large and heavy muscular head, short neck, large hump, wide posterior parts and firm body 19 . The camels are kept by the villagers located around the desert. The camels are gathered usually in spring. While the young camels are sold the remaining herds are returned to the desert. During the summer, the camels need more water, so they return to the villages every day. The size of herds may vary from 4-5 up to 100-150 animals.
The present study has been designed to gain first knowledge on body weight traits at different age in the Iranian dromedary population, and to understand their underlying genomic characteristics. We collected a large number of phenotypic measurements in 96 dromedaries in the Central Desert of Iran as well as genotypes by GBS approach. We applied GWAS to identify candidate genes for growth related traits and estimated GEBVs for selected SNPs. This study is a first step towards systematic genomic breeding and selection in dromedaries for the benefit of local communities depending on camels as resource for income and food.

Materials and methods
Ethics statement. All of the animal procedures were performed in strict accordance with the guidelines and regulations proposed by the Animal Science Research Institute of Iran. All the animal experiments were approved by ethics committee of the Animal Science Research Institute of Iran under the number ASRI-34-64-1357-005-970,180. Blood samples were collected during qualified veterinary treatment within the framework of governmental programs aimed at the animal identification, monitoring of health, and parentage confirmation of the dromedary populations included in our study. No other kind of tissue was used in this study.
Animals and sample collection. Yazd province with the area of 129,285 km 2 (49,917 sq mi) is situated at an oasis where the Dasht-e Kavir desert and the Dasht-e Lut desert meet and located in 31° 53′ 41.28″ N, 54° 21′ 25.2″ E (31.8948, 54.357). Data on 51 herds of dromedaries were collected in 2018. The herd size mean was 89.90, range from 11 to 400 heads among 4279 camels, 16% were younger than one year, and named Hashi, while 12% were older than ten years. The proportion of females (Arvaneh), males (Lok), male calves (Hashi), and female calves (Hashi) were 76%, 9%, 6%, and 9%, respectively. Among pregnant camels, 49% were older than ten years. The ratio of pregnant camels to all female camels was 46%. A total of 964 calving was registered between January to May 2018, distributed over 22% in January, 28% in February, 27% in March, 15% in April, and 8% in May. Generation intervals in females and males were estimated at 7.84 and 5.91 years, respectively. Among 256 male calves, we recorded 96 samples from 5 regions including: Bafgh (n = 41) Bahabad (n = 8), Khatam (n = 17), Mehriz (n = 8), and Ardakan (n = 22). Characteristics of the sampled herds and the rangeland plants are presented in Table 1.
Phenotypic measurements. Data were recorded at the morning before grazing on the pasture. Camels were kept in closed area at night, which is called Garch. The animal identification was inferred via three-digit ear tags. Due to large distance in remote regions and transport difficulties, we constructed a portable weighting scale, consisting of 13 pieces of iron, a digital scale for 2000 kg, and one chain crane (Fig. 1). The meta data collected for any calf included: ID number, characteristics of owner, geographical region, recording date, birth date, parental names, and body weight. We recording intervals were approximately three months, with the first record starting in the calving season, the second during the summer, and the third at weaning season at the beginning of autumn. We collected 252 body weight records from 96 calves in different times during 2018. The 18 calves belong to National Research and Development Station on Dromedary Camel (Bafgh), measured in 8 -times, the others were recorded 2 or 3-times including: Bafgh (n = 164), Bahabad (n = 8), Khatam (n = 26), Mehriz (n = 9), Ardakan (45).
For adjusting the body weight trait, the growth trend was estimated using linear regression model (Eq. 1). Analysis of covariance (ANCOVA) of body weight was did among five sampling regions using SPSS v.22 software 20 . The mean of differences was compared using LSD test. Daily weight gain was calculated by two or three-times records. DNA extraction, SNP discovery and genotyping. Blood samples were collected from the jugular vein in EDTA tubes during routine veterinary treatment on the pasture. The genomic DNA was extracted using the modified salting-out method 21 and, after elution, was quantified using spectrophotometry and checked for quality on a 1% agarose gel. Finally, DNA samples were adjusted to a concentration of 50 ng/µl for subsequent steps. The samples were genotyped-by-sequencing using two restriction enzymes combination, EcoR1 and HinF1 (New England Biolabs, Ipswich, MA, USA), and paired-end (150 bp) sequencing (10 X) on the Illumina HiSeq 2000 platform by Persian Bayangene Research and Training Center (Shiraz, Iran).
The sequence reads were mapped to the dromedary reference genome assembly on chromosome level (GCA_000803125.3 [1];) by using the BWA-MEM algorithm of Burrows-Wheeler Aligner (BWA) 22 ;. PCR duplicates were detected by utilizing Picard tools and disregarded in downstream analyses both by GATK 23 and SAMtools 24 . SNPs were called across the GBS data using GATK.
(1)  25 in R for data manipulation and quality control as for producing input file objects for the other analysis, after that using ape and poppr package 26 we carried out K-means clustering and discriminant analysis of principal components (DAPC), while all graphics were produced by means of RColorBrewer 27 .
Linkage disequilibrium analysis (LD) and SNP-based haplotype blocks estimating. TASSEL 5.0 28 was used to calculate the linkage disequilibrium (LD) (r 2 ) for all pairwise loci. Haplotype blocks (HAP) were constructed in Haploview 29 .
Genome-wide association studies and candidate genes prediction. The association between the SNPs and the traits were tested using mixed linear models with PCA and kinship matrix in TASSEL software 28 .
The regions of this study and age at weighting date were used as a fixed and covariate effect, respectively. The statistical analysis model, the MLM-PCA + K analysis, was expressed as: where y was phenotype value; α was the vector of SNP effects; β was vector of population structure effects based on PCA; u was vector of kinship background effects; e was vector of residual effects; X, P, Z were incidence matrix relating the individuals to fixed marker effects α, fixed principal component (PC) effects β, random group effects u, respectively. The suggestive significant Bonferroni P-value thresholds were set (−log p value > 3.9) using the GEC software tool 30 . The associated SNPs (−Log p value > 3) was traced in NCBI and the candidate genes were detected by blasting to the dromedary camel's genome (GCA_000803125.3). We considered genes associated with the respective SNPs, if they were located either within the exon/ intron of a gene or within a flanking region of 100 kb up-and downstream.
Bayesian genomic prediction. The estimation of Genomic Breeding values (GEBVs) was performed with the BGLR package including BRR, Bayes A, B, and C approach (nIter = 100 k, Burn In = 5 k) 31 . two sets of SNPs were used to predict GEBV: (1) all 14,522 SNPs and (2) the 99 associated SNPs (-Log p value ≥ 2.5 from GWAS). The prediction accuracy was estimated using the average Pearson's correlation (r) and regression (b) coefficient between the GEBVs and observed values [32][33][34] . The replicated training-testing approach (10 replications) was used for evaluation of the models. We also used 3:1 size ratio of training set and validation set randomly selected from the 96 camels, which is a three-folds cross-validation, and repeated 100 times for evaluation of models by the 99 associated SNPs 35,36 .

Result
Phenotypic statistics of body weight traits. The distribution of 252 body weight records is visualized in Fig. 2. The growth trend of data suggested linear relationship between age and body weight (R adj 2 = 0.63). Analysis of covariance for body weight records showed significant (p < 0.05) differences among camels from five sampled regions ( Table 2). The body weight of camels in Ardakan was higher than the others (except Mehriz, because camels in addition to grazing on the pasture were fed by hand. The camels of Bafgh and Bahabad didn't have significant differences. It is necessary to adjust the sampling region effect in GWAS and genomic selection. The adjusted birth weight, daily weight gains and body weight of the 96 camels from five regions of the Iranian central desert are shown in Table 2. The descriptive statistics including the mean, standard error (SE), coefficient of variation (CV) are presented in Table 3. The Pearson correction between daily gain with body weight (r = 0.63) was more than birth weight (r = 0.21). Also, the birth and body weight were correlated (r = 0.36) (Fig. 3).

Summary of genotyping data
A total of 14,522 SNPs resulted after filtering and were used for final analysis. The largest number of SNPs was identified on chromosome 9 (n = 1829) followed by chromosome 19 (n = 1655), and the smallest number of SNPs was found on chromosomes 22 (n = 20) and chromosomes 23 (n = 16) (Fig. 4). The average MAF of all SNPs was 0.19, after QC (MAF > 0.05) (Fig. 5). Average observed heterozygosity was 0.25 ± 0.03.
Genome wide association study. The MLM-PCA + K statistical model considering the covariates composed of population structure and kinship matrix was used for GWAS to prevent false positivity. In GWAS, the p value should be less than Bonferroni correction by using α/Me, where α = 0.05 and Me = effective number of SNPs. After applying a Bonferroni correction (1.2 × 10 -4 ), no SNPs correlated with the growth traits (Fig. 6). However, this was expected given the limited number of samples used in our study (n = 96). A number of 28 SNPs were found to be associated with birth weight, daily gain, and body weight of dromedaries (p value < 0.001) ( Table 4). For birth weight, 12 correlated SNPs (p value < 0.001) were detected on the chromosomes 7, 8,9,11,19, and 34 were annotated to 9 genes (Table 4). For daily gain, 7 correlated SNPs (p value < 0.001) on the chromosomes 10, 16, 12, 19, and 14, were annotated to 11 genes (Table 4). For body weight, 9 correlated SNPs (p value < 0.001) on the chromosomes 11, 8, 19, X, 14, and 18 were annotated to 16 genes ( Table 4). The most significant associated SNP with birth weight, daily gain, body weight, was located on chromosome 8, 10, and 11, respectively (-log p value = 3.81, 3.41, and 3.76, respectively). Out of the 36 genes potentially associated with www.nature.com/scientificreports/ peak SNPs [the genes listed in Table 4], two genes harboured the SNPs in their exon/ intron regions. Another 11 genes were detected in flanking regions of less than 30 kb up-and downstream of the respective SNP. Four genes were identified in 30-50 kb regions and 12 genes lay 50-100 kb up-and downstream of the potentially associated SNP. A total of 99 SNPs were associated with the three traits (birth weight, daily gain, and body weight) at p value < 0.002 (Table 5). Twelve haplotype blocks and 80 tag SNPs were predicted among the 99 associated SNPs with LD (D́ > 0.8) (Fig. 7). Majority haplotype blocks contained two SNPs and only two blocks contain 4, and 6 markers. The haplotype-traits association analysis showed the haplotype 2, 3, 4, 5, 9, and 12 associated with birth weight (Table 6), while four haplotypes of them associated with daily weight gain. The haplotype 8 containing six SNPs didn't associate with body weight, while The haplotype 4 was the most important haplotype associated with body weight (Table 6).

Genomic Selection Based on BGLR.
Based on BRR model in BGLR, the GEBVs of each trait (birth weight, daily gain, and body weight) was estimated using all 14,522 SNPs. The GEBVs also were predicted using the 99 associated SNPs based on BRR, Bayes A, B, and C models. The averaged correlation (r) and regression (b) coefficient between the observed and the GEBVs predicted from two SNPs sets: (1) all 14,522 SNPs and (2) the    (Table 7). the GEBVs was less biased based on the 99 associated SNPs. The accuracy of using the 99 associated SNPs also evaluated by cross-validation (3 folds and 100 replications) ( Table 8). The accuracy of the BRR model was more than Bayes A, B, and C (r > 0.65) based on the 99 associated SNPs. the accuracy of GEBVs of body weight was less than birth weight and daily gain based on the 99 associated SNPs (Fig. 8).

Discussion
In this project, we performed a GWAS for growth traits in dromedary camels using the genotypes of 96 calves of the central desert of Iran. this is the first GWAS for birth weight, daily gain, and body weight of camels using GBS. The final goal of genetic mapping is to identify associated markers with phenotype 37,38 . A number of 14,522 SNPs, generated using GBS, prepared the GWAS in dromedary camels. We identified 36 candidate genes associated    42 . Performance of genomic selection was determined by the prediction accuracy [43][44][45] . Until now, Genomic prediction has been conducted in many animal species. The accuracy of GEBVs for economic traits in beef cattle was predicted range 0.38 to 0.40 46,47 . Also, ranged from 0.18 to 0.33 for growth traits in New Zealand sheep breeds 48 It was reported range from 0.40 to 0.50 for important traits in pig 49 .
The statistical model, marker density, LD, and sample size influenced on selection accuracy 43 . The accuracy of GEBVs for growth traits was reported 0.391 (GBLUP) and 0.379 (Bayes Lasso) in Yak 50 . Because of low samples size in this research, it was suggested to predict unbiased GEBVs using the associated SNPs from GWAS. The accuracy of GEBVs of birth weight, daily gain, and body weight based on the 99 associated SNPs was 0.62, 0.82, and 0.57, respectively. Using the most significant associated SNPs, the reduced SNP panels were developed for many traits 51 . It was resulted by the Bayes models, that some fraction of the SNPs has zero effect on the trait 51 . The beef industry has been focused on collections of informative SNPs for subsets of traits that have the most economic effect and market opportunity 51 . The 600 SNPs (20 markers / chromosome) in Bovine relatively had the same predictive ability rather than 50 K SNP 52 , and 90% of 50 k SNPs had zero effect 52 . Garrick et al. 2011 reported that using only 384 SNPs by low costing can be achieved predictive ability for interest traits, so that the accuracies were range from 0.30 to 0.60 for growth, meat quality, and carcass weight 53 . The accuracies were predicted 0.28, 0.29, 0.39, and 0.43 using 50, 100, 150 or 200 SNPs for marbling in Angus 52 . the haplotype-traits association analysis may also provide additional power 54 .
Phenotyping especially pedigree data is now the principal limitation in camel breeding. Genomic predictions do not reliant on pedigree data 55 , therefore it can be suggested in camel breeding because of changing into intensive farming. Establishing of training populations across countries provides an opportunity to increase training data size and genomic data in dromedaries. www.nature.com/scientificreports/