Introduction

For thousands of years' camels have produced meat, milk, and fiber in harsh desert conditions. There are 35 million camels globally (FAO, 2019), 95% of which are dromedaries1. The innate characteristics of adaptability and sustainability of the productions can be in antagonism as already proven for example in the bovine species2. Thus, the development of a modern genetic improvement program for the productivity of the dromedary should be accompanied by a profound understanding of its genome and the mechanisms of inheritance of the characters of interest3. The ability to blend together the adaptability to hot climates and its innate rusticity with an efficient production capacity would make this animal an excellent alternative in marginal environments3. In local population, the lack of phenotypic records and pedigrees, small herd size and missing connectedness, and genetic evaluations are the main limitations4. Genomic approaches can be beneficial to reduce the impact of these problematic5.

Next-generation sequencing platforms have prepared suitable approaches for genome wide association studies and genomic selection at the whole-genome level6. By using genotyping-by-sequencing (GBS) method can be produced many genome wide markers, which is a that supports GWAS7,8. GBS has been widely used in plant and animal breeding for genome-wide association analysis, genomic diversity studies, and genomic selection9. The availability of reference genome assemblies coupled with GBS enables us to explore in greater detail of dromedary populations and to identify genetic associations with different phenotypic traits10. Genome-wide association studies (GWAS) are used to screen the whole genome for target genes that correlate with phenotypic traits, using SNPs and an important method for identifying candidate genes for important economic traits in livestock11. GWAS have a greater capability than QTL mapping to detect causal SNPs in a smaller genetic range12. In recent years, many genes and molecular markers, regulate important traits, were identified using GWAS in livestock animals like pigs, cattle, sheep, and chickens13,14. Despite its unique potential and increased contribution to food security, comparatively less attention has been paid to camels compared to other livestock species3.

Body growth is an economical important trait in dromedaries. The birth and weaning weight, gain per day, and body weights at different ages are used to reflect the growth and development. Growth in weight is a heritable trait and an important index of selection17. Although meat production and its functionality are strategic topics in camel breeding, which is reflected by the increasing interest of stakeholders and consumers in camel products, few studies have been produced in this field.

In Iran, there are about 140,000 dromedary camels (FAO, 2019) that are divided into four basic types: meat type, milk type, dual purpose and riding camels18. Camels in the central desert of Iran usually belong to the meat type with a large and heavy muscular head, short neck, large hump, wide posterior parts and firm body19. The camels are kept by the villagers located around the desert. The camels are gathered usually in spring. While the young camels are sold the remaining herds are returned to the desert. During the summer, the camels need more water, so they return to the villages every day. The size of herds may vary from 4–5 up to 100–150 animals.

The present study has been designed to gain first knowledge on body weight traits at different age in the Iranian dromedary population, and to understand their underlying genomic characteristics. We collected a large number of phenotypic measurements in 96 dromedaries in the Central Desert of Iran as well as genotypes by GBS approach. We applied GWAS to identify candidate genes for growth related traits and estimated GEBVs for selected SNPs. This study is a first step towards systematic genomic breeding and selection in dromedaries for the benefit of local communities depending on camels as resource for income and food.

Materials and methods

Ethics statement

All of the animal procedures were performed in strict accordance with the guidelines and regulations proposed by the Animal Science Research Institute of Iran. All the animal experiments were approved by ethics committee of the Animal Science Research Institute of Iran under the number ASRI-34–64-1357–005-970,180. Blood samples were collected during qualified veterinary treatment within the framework of governmental programs aimed at the animal identification, monitoring of health, and parentage confirmation of the dromedary populations included in our study. No other kind of tissue was used in this study.

Animals and sample collection

Yazd province with the area of 129,285 km2 (49,917 sq mi) is situated at an oasis where the Dasht-e Kavir desert and the Dasht-e Lut desert meet and located in 31° 53′ 41.28″ N, 54° 21′ 25.2″ E (31.8948, 54.357). Data on 51 herds of dromedaries were collected in 2018. The herd size mean was 89.90, range from 11 to 400 heads among 4279 camels, 16% were younger than one year, and named Hashi, while 12% were older than ten years. The proportion of females (Arvaneh), males (Lok), male calves (Hashi), and female calves (Hashi) were 76%, 9%, 6%, and 9%, respectively. Among pregnant camels, 49% were older than ten years. The ratio of pregnant camels to all female camels was 46%. A total of 964 calving was registered between January to May 2018, distributed over 22% in January, 28% in February, 27% in March, 15% in April, and 8% in May. Generation intervals in females and males were estimated at 7.84 and 5.91 years, respectively. Among 256 male calves, we recorded 96 samples from 5 regions including: Bafgh (n = 41) Bahabad (n = 8), Khatam (n = 17), Mehriz (n = 8), and Ardakan (n = 22). Characteristics of the sampled herds and the rangeland plants are presented in Table 1.

Table 1 Characteristics of sampling herds and the rangeland plants.

Phenotypic measurements

Data were recorded at the morning before grazing on the pasture. Camels were kept in closed area at night, which is called Garch. The animal identification was inferred via three-digit ear tags. Due to large distance in remote regions and transport difficulties, we constructed a portable weighting scale, consisting of 13 pieces of iron, a digital scale for 2000 kg, and one chain crane (Fig. 1). The meta data collected for any calf included: ID number, characteristics of owner, geographical region, recording date, birth date, parental names, and body weight. We recording intervals were approximately three months, with the first record starting in the calving season, the second during the summer, and the third at weaning season at the beginning of autumn. We collected 252 body weight records from 96 calves in different times during 2018. The 18 calves belong to National Research and Development Station on Dromedary Camel (Bafgh), measured in 8 -times, the others were recorded 2 or 3-times including: Bafgh (n = 164), Bahabad (n = 8), Khatam (n = 26), Mehriz (n = 9), Ardakan (45).

Figure 1
figure 1

The portable weighting scale (Company: Fuzhou Kejie Intelligent Technology Co.,Ltd. model: OCS-XZ-2, 2000 Weigh Capacity (Kg), Accuracy Class III, + /- 1 division (least count).This were made from 13 pieces of iron, hanging digital scale 2000 kg, and one chain crane.

For adjusting the body weight trait, the growth trend was estimated using linear regression model (Eq. 1). Analysis of covariance (ANCOVA) of body weight was did among five sampling regions using SPSS v.22 software20. The mean of differences was compared using LSD test. Daily weight gain was calculated by two or three-times records.

$${y}_{ij}=\mu +{\tau }_{i}+\beta \left({x}_{ij}-\stackrel{-}{x} \right)+{\epsilon }_{ij}$$
(1)

\({y}_{ij}\): body weight; \(\mu :\) Effect of mean; \({\tau }_{i}\): sampling regions includes: (Bafgh, Bahabad, Khatam, Ardakan, and Mehriz); \(\beta :\) The regression coefficient; \({x}_{ij}\): Age of recording; \(\stackrel{-}{x}\): average of recording Ages.

DNA extraction, SNP discovery and genotyping

Blood samples were collected from the jugular vein in EDTA tubes during routine veterinary treatment on the pasture. The genomic DNA was extracted using the modified salting-out method21 and, after elution, was quantified using spectrophotometry and checked for quality on a 1% agarose gel. Finally, DNA samples were adjusted to a concentration of 50 ng/µl for subsequent steps.

The samples were genotyped-by-sequencing using two restriction enzymes combination, EcoR1 and HinF1 (New England Biolabs, Ipswich, MA, USA), and paired-end (150 bp) sequencing (10 X) on the Illumina HiSeq 2000 platform by Persian Bayangene Research and Training Center (Shiraz, Iran).

The sequence reads were mapped to the dromedary reference genome assembly on chromosome level (GCA_000803125.3[1];) by using the BWA-MEM algorithm of Burrows–Wheeler Aligner (BWA)22;. PCR duplicates were detected by utilizing Picard tools and disregarded in downstream analyses both by GATK23 and SAMtools24. SNPs were called across the GBS data using GATK.

Population structure andd genetic diversity

A quality control (QC) steps, and genome-wide diversity (observed and expected heterozygosity), as well as admixture analyses were performed using TASSEL V5.023. Variants with a minor allele frequency (MAF) below 0.05 and call rate below 0.95 were removed. Of the 41,897 SNPs, 256 markers were monomorphic, and 27,375 markers were deleted because of MAF < 0.05. The final data set consisted of 14,522 SNPs and 96 individuals. To investigate population structure, we used vcfR package25 in R for data manipulation and quality control as for producing input file objects for the other analysis, after that using ape and poppr package26 we carried out K-means clustering and discriminant analysis of principal components (DAPC), while all graphics were produced by means of RColorBrewer27.

Linkage disequilibrium analysis (LD) and SNP-based haplotype blocks estimating

TASSEL 5.028 was used to calculate the linkage disequilibrium (LD) (r2) for all pairwise loci. Haplotype blocks (HAP) were constructed in Haploview29.

Genome-wide association studies and candidate genes prediction

The association between the SNPs and the traits were tested using mixed linear models with PCA and kinship matrix in TASSEL software28. The regions of this study and age at weighting date were used as a fixed and covariate effect, respectively.

The statistical analysis model, the MLM-PCA + K analysis, was expressed as:

$$y=\alpha X+\beta P+UZ+e$$

where y was phenotype value; α was the vector of SNP effects; β was vector of population structure effects based on PCA; u was vector of kinship background effects; e was vector of residual effects; X, P, Z were incidence matrix relating the individuals to fixed marker effects α, fixed principal component (PC) effects β, random group effects u, respectively. The suggestive significant Bonferroni P-value thresholds were set (−log p value > 3.9) using the GEC software tool30. The associated SNPs (−Log p value > 3) was traced in NCBI and the candidate genes were detected by blasting to the dromedary camel’s genome (GCA_000803125.3). We considered genes associated with the respective SNPs, if they were located either within the exon/ intron of a gene or within a flanking region of 100 kb up- and downstream.

Bayesian genomic prediction

The estimation of Genomic Breeding values (GEBVs) was performed with the BGLR package including BRR, Bayes A, B, and C approach (nIter = 100 k, Burn In = 5 k)31. two sets of SNPs were used to predict GEBV: (1) all 14,522 SNPs and (2) the 99 associated SNPs (-Log p value ≥ 2.5 from GWAS). The prediction accuracy was estimated using the average Pearson’s correlation (r) and regression (b) coefficient between the GEBVs and observed values32,33,34. The replicated training—testing approach (10 replications) was used for evaluation of the models. We also used 3:1 size ratio of training set and validation set randomly selected from the 96 camels, which is a three-folds cross-validation, and repeated 100 times for evaluation of models by the 99 associated SNPs35,36.

Result

Phenotypic statistics of body weight traits

The distribution of 252 body weight records is visualized in Fig. 2. The growth trend of data suggested linear relationship between age and body weight (Radj2 = 0.63). Analysis of covariance for body weight records showed significant (p < 0.05) differences among camels from five sampled regions (Table 2). The body weight of camels in Ardakan was higher than the others (except Mehriz, because camels in addition to grazing on the pasture were fed by hand. The camels of Bafgh and Bahabad didn’t have significant differences. It is necessary to adjust the sampling region effect in GWAS and genomic selection.

Figure 2
figure 2

Box plot of 255 body weight records (left plot) and growth trend (right plot) of 96 dromedary camels in five regions of the central desert of Iran.

Table 2 ANCOVA of body weight (kg) among five regions of the central desert of Iran.

The adjusted birth weight, daily weight gains and body weight of the 96 camels from five regions of the Iranian central desert are shown in Table 2. The descriptive statistics including the mean, standard error (SE), coefficient of variation (CV) are presented in Table 3. The Pearson correction between daily gain with body weight (r = 0.63) was more than birth weight (r = 0.21). Also, the birth and body weight were correlated (r = 0.36) (Fig. 3).

Table 3 The descriptive statistics of body weight traits of dromedary camels.
Figure 3
figure 3

Correlation coefficients among birth weight, daily gain, and body weight in dromedary camels.

Summary of genotyping data

A total of 14,522 SNPs resulted after filtering and were used for final analysis. The largest number of SNPs was identified on chromosome 9 (n = 1829) followed by chromosome 19 (n = 1655), and the smallest number of SNPs was found on chromosomes 22 (n = 20) and chromosomes 23 (n = 16) (Fig. 4). The average MAF of all SNPs was 0.19, after QC (MAF > 0.05) (Fig. 5). Average observed heterozygosity was 0.25 ± 0.03.

Figure 4
figure 4

Number of SNPs/Chromosome.

Figure 5
figure 5

The Minor Allele Frequency distribution of 14,522 SNPs.

Genome wide association study

The MLM-PCA + K statistical model considering the covariates composed of population structure and kinship matrix was used for GWAS to prevent false positivity. In GWAS, the p value should be less than Bonferroni correction by using α/Me, where α = 0.05 and Me = effective number of SNPs. After applying a Bonferroni correction (1.2 × 10–4), no SNPs correlated with the growth traits (Fig. 6). However, this was expected given the limited number of samples used in our study (n = 96). A number of 28 SNPs were found to be associated with birth weight, daily gain, and body weight of dromedaries (p value < 0.001) (Table 4). For birth weight, 12 correlated SNPs (p value < 0.001) were detected on the chromosomes 7, 8, 9, 11, 19, and 34 were annotated to 9 genes (Table 4). For daily gain, 7 correlated SNPs (p value < 0.001) on the chromosomes 10, 16, 12, 19, and 14, were annotated to 11 genes (Table 4). For body weight, 9 correlated SNPs (p value < 0.001) on the chromosomes 11, 8, 19, X, 14, and 18 were annotated to 16 genes (Table 4). The most significant associated SNP with birth weight, daily gain, body weight, was located on chromosome 8, 10, and 11, respectively (-log p value = 3.81, 3.41, and 3.76, respectively). Out of the 36 genes potentially associated with peak SNPs [the genes listed in Table 4], two genes harboured the SNPs in their exon/ intron regions. Another 11 genes were detected in flanking regions of less than 30 kb up- and downstream of the respective SNP. Four genes were identified in 30–50 kb regions and 12 genes lay 50–100 kb up- and downstream of the potentially associated SNP.

Figure 6
figure 6

Manhattan plots and q-q plots of birth weight, daily gain, and body weight traits for Dromedary Camels. (A) birth weight; (B) daily gain; (C) body weight. The dotted horizontal line represents the set significant threshold (–log10 p value = 3). Red dots in the q–q plots represent the -log p-value of the entire study and the grey line represents the expected values under the null hypothesis of no association.

Table 4 Genome-wide association studies (GWAS)-identified significant single-nucleotide polymorphisms (SNPs) (− log p value > 3), associated traits, and candidate genes.

A total of 99 SNPs were associated with the three traits (birth weight, daily gain, and body weight) at p value < 0.002 (Table 5). Twelve haplotype blocks and 80 tag SNPs were predicted among the 99 associated SNPs with LD (D́ > 0.8) (Fig. 7). Majority haplotype blocks contained two SNPs and only two blocks contain 4, and 6 markers. The haplotype-traits association analysis showed the haplotype 2, 3, 4, 5, 9, and 12 associated with birth weight (Table 6), while four haplotypes of them associated with daily weight gain. The haplotype 8 containing six SNPs didn’t associate with body weight, while The haplotype 4 was the most important haplotype associated with body weight (Table 6).

Table 5 List of SNP markers associated with camel growth at −Log p value ≥  2.5.
Figure 7
figure 7

the Haplotype Blocks of the 99 associated SNPs with birth weight, daily gain, and body weight.

Table 6 the haplotype-traits association analysis.

Genomic Selection Based on BGLR

Based on BRR model in BGLR, the GEBVs of each trait (birth weight, daily gain, and body weight) was estimated using all 14,522 SNPs. The GEBVs also were predicted using the 99 associated SNPs based on BRR, Bayes A, B, and C models. The averaged correlation (r) and regression (b) coefficient between the observed and the GEBVs predicted from two SNPs sets: (1) all 14,522 SNPs and (2) the 99 associated SNPs using BGLR package (BRR model). The accuracy of GEBVs were more than 0.65 base on all 14,522 SNPs, but the regression coefficients for birth weight, daily gain, and body weight were 0.39, 0.20, and 0.23, respectively (Table 7). the GEBVs was less biased based on the 99 associated SNPs. The accuracy of using the 99 associated SNPs also evaluated by cross- validation (3 folds and 100 replications) (Table 8). The accuracy of the BRR model was more than Bayes A, B, and C (r > 0.65) based on the 99 associated SNPs. the accuracy of GEBVs of body weight was less than birth weight and daily gain based on the 99 associated SNPs (Fig. 8).

Table 7 The averaged correlation (r) and regression coefficient (b) between the observed values and the GEBVs predicted from two SNPs sets: (1) all 14,522 SNPs and (2) the 99 associated SNPs using BGLR package (BRR model).
Table 8 The GEBVs accuracy predicted from the 99 SNPs using BGLR package.
Figure 8
figure 8

The accuracy of predicted GEBVs from the 99 SNPs using BGLR package.

Discussion

In this project, we performed a GWAS for growth traits in dromedary camels using the genotypes of 96 calves of the central desert of Iran. this is the first GWAS for birth weight, daily gain, and body weight of camels using GBS. The final goal of genetic mapping is to identify associated markers with phenotype37,38. A number of 14,522 SNPs, generated using GBS, prepared the GWAS in dromedary camels. We identified 36 candidate genes associated with camel’s growth using Annotation of the genomic region (s) within ± 100 kb of the associated 28 SNPs. The candidate genes for birth weight of camels are ACTR3B, RPL32, XRCC2, SERAC1, TBX15, RNF114, IFNL1, SPATA2, SNAI1. For daily gain trait, EFCAB5, NSRP1, Slc6a4, ITGA7, OR6C2, RIPOR3, PTPN1, PARD6B, BCAS4, MOCS3, TRAPPC9 as candidate genes were identified. EMX2, FAM184A, MCM9, ASF1, CSE1L, ARFGEF2, TSR2, WNK3, LPAR4, RTL3, PEG10, PBDC1, TRAPPC13, UNKNOWN, DEXI, TSPYL4, CIITA were for camel body weight. The QTL regions and the genes: ADAMTSL3, CAPN2, CAPN2, FABP6, ZEB2 were detected using GWAS in Colombian Brahman cattle that influencing growth and weight traits39. Lu et al.11 (2020) reported SLCO2A1, LY6K, RALYL, AADACL3, C17H4orf45, BICC1, SHROOM2 as candidate genes for birth weight in Chinese Fine-Wool Sheep11. The candidate genes for Dry Matter Intake, Average Daily Gain, and Metabolic Body Weight based on the imputed 7.8 M WGS in chattels were reported: SNORA70, B3GALT1, DDR2, GPR37, SYT1, LYZL1, RGS2, F13A1, SNORA31, LCORL, DPH6, PARD3, MOS, CRB1, CUL1, CCND2, ARRDC3, PLAG1, STC2, CARD11, TMEM72, SCGB1A1, ERICH6, ARRDC3, GALNT14, PLAG1, ERGIC1, AP3S2, A1CF40. An, B. et al. (2020)41 identified the candidate genes for growth traits in Simmental beef cattle, including SOX2, SNRPD1, RASGEF1B, EFNA5, PTBP1, SNX9, SV2C, PKDCC, SYNDIG1, AKR1E2, PRIM2, SLC37A1, LAP3, PCDH7, MANEA, LHCGR, P2RY1, MPZL1, LINGO2, CMIP, and WSCD1. LOC101903200, PARP4, GPA33, NADK, PREX2, FRMD4B were identified as candidate gene for carcass weight in commercial Hanwoo cattle42.

Performance of genomic selection was determined by the prediction accuracy43,44,45. Until now, Genomic prediction has been conducted in many animal species. The accuracy of GEBVs for economic traits in beef cattle was predicted range 0.38 to 0.4046,47. Also, ranged from 0.18 to 0.33 for growth traits in New Zealand sheep breeds48 It was reported range from 0.40 to 0.50 for important traits in pig49.

The statistical model, marker density, LD, and sample size influenced on selection accuracy 43. The accuracy of GEBVs for growth traits was reported 0.391 (GBLUP) and 0.379 (Bayes Lasso) in Yak50. Because of low samples size in this research, it was suggested to predict unbiased GEBVs using the associated SNPs from GWAS. The accuracy of GEBVs of birth weight, daily gain, and body weight based on the 99 associated SNPs was 0.62, 0.82, and 0.57, respectively. Using the most significant associated SNPs, the reduced SNP panels were developed for many traits51. It was resulted by the Bayes models, that some fraction of the SNPs has zero effect on the trait51. The beef industry has been focused on collections of informative SNPs for subsets of traits that have the most economic effect and market opportunity51. The 600 SNPs (20 markers / chromosome) in Bovine relatively had the same predictive ability rather than 50 K SNP52, and 90% of 50 k SNPs had zero effect52. Garrick et al. 2011 reported that using only 384 SNPs by low costing can be achieved predictive ability for interest traits, so that the accuracies were range from 0.30 to 0.60 for growth, meat quality, and carcass weight53. The accuracies were predicted 0.28, 0.29, 0.39, and 0.43 using 50, 100, 150 or 200 SNPs for marbling in Angus52. the haplotype-traits association analysis may also provide additional power54.

Phenotyping especially pedigree data is now the principal limitation in camel breeding. Genomic predictions do not reliant on pedigree data55, therefore it can be suggested in camel breeding because of changing into intensive farming. Establishing of training populations across countries provides an opportunity to increase training data size and genomic data in dromedaries.

Conclusion

Body weight is a critical economic trait for camels and it is necessary to plan the breeding program. Detection of the genomic regions associated with growth is important for MAS (marker assisted selection) or GS (genomic selection). This is the first genome-wide association study using GBS on dromedary camels, and identifies markers associated with growth traits. This could help to plan breeding programs for genetic improvement in dromedary camels. Further studies using a larger sample size and collaboration of stakeholders will allow confirmation of the associated SNPs and candidate genes identified in this project. Because of long generation interval, no artificial insemination, and seasonal reproduction, genomic selection based on the 99 SNPs associated with growth trait could be a first step into the direction of genetic improvement for body weight in Iranian dromedaries.