Genome-wide detection of CNVs in Chinese indigenous sheep with different types of tails using ovine high-density 600K SNP arrays

Chinese indigenous sheep can be classified into three types based on tail morphology: fat-tailed, fat-rumped, and thin-tailed sheep, of which the typical breeds are large-tailed Han sheep, Altay sheep, and Tibetan sheep, respectively. To unravel the genetic mechanisms underlying the phenotypic differences among Chinese indigenous sheep with tails of three different types, we used ovine high-density 600K SNP arrays to detect genome-wide copy number variation (CNV). In large-tailed Han sheep, Altay sheep, and Tibetan sheep, 371, 301, and 66 CNV regions (CNVRs) with lengths of 71.35 Mb, 51.65 Mb, and 10.56 Mb, respectively, were identified on autosomal chromosomes. Ten CNVRs were randomly chosen for confirmation, of which eight were successfully validated. The detected CNVRs harboured 3130 genes, including genes associated with fat deposition, such as PPARA, RXRA, KLF11, ADD1, FASN, PPP1CA, PDGFA, and PEX6. Moreover, multilevel bioinformatics analyses of the detected candidate genes were significantly enriched for involvement in fat deposition, GTPase regulator, and peptide receptor activities. This is the first high-resolution sheep CNV map for Chinese indigenous sheep breeds with three types of tails. Our results provide valuable information that will support investigations of genomic structural variation underlying traits of interest in sheep.

have fat deposits in the buttocks that look like a tail; thus, Altay sheep are known as fat-rumped sheep, while Tibetan sheep have a thin tail the shape of a flattened cone 13 . To understand the genetic basis of these three types of tails, we aimed to identify the potential contribution of CNV to phenotypic variation. Therefore, the primary aim of this study is to detect CNV in Chinese indigenous sheep breeds with different types of tails. Genome-wide CNV detection was carried out by high-density 600K SNP genotype arrays. The PennCNV program was used to analyse the data and validate the CNVRs by qPCR. To the best of our knowledge, this is the first sheep CNVR map constructed based on high-density 600K SNP genotype data. We also compared our results with those from previous studies of sheep CNV. This study provides a comprehensive map of CNV in the ovine genome that will guide studies aimed at elucidating the causes of complex traits.

SNP genotyping.
A total of 120 sheep were genotyped using the Illumina Ovine SNP 600K BeadChip.
Samples with low-quality signal intensity were excluded based on the CNV quality control filter criteria. A total of 110 individuals from three sheep breeds (36 large-tailed Han sheep, 37 Altay sheep, and 37 Tibetan sheep) were analysed. Table 1 Table S4). The ratio of CNVRs on 26 pairs of autosomal chromosomes ranged from 1.16% to 82.50%. Chromosome 3 had the most CNVRs (42), as shown in Table 2. No CNVRs were identified on chromosome 25 or 26 in Tibetan sheep (Supplementary File 2: Fig. S1).  To provide insight into the functions of the genes in the identified CNVRs, GO and KEGG pathway analyses were performed using the DAVID tool. After the conversion of Ovis aries Ensembl gene IDs to human orthologue Ensembl gene IDs using BioMart, 3016 corresponding human gene IDs remained in DAVID. 'Human' was selected as the background for subsequent analysis. After Bonferroni correction, 50 GO terms were identified as significantly enriched (Supplementary File 2: Table S5). The significantly enriched GO terms were involved in thyroid hormone receptor activator activity (GO: 0010861) and coactivator activity associated with energy expenditure (GO: 0030375) 14 . In addition, some significantly enriched GO terms were involved in GTPase regulator activity (GO: 0030695), GTPase activator activity (GO: 00005096), regulation of GTPase activity (GO: 0043087), regulation of Ras protein signal transduction (GO: 0046578), skeletal system development (GO: 0001501), embryonic morphogenesis (GO: 0048706), embryonic limb morphogenesis (GO: 0048598), transcription regulator activity (GO: 0030528), and other biological processes. KEGG pathway analysis revealed that the genes in the identified CNVRs were enriched in 2 significant pathways: ribosome and notch signalling (Supplementary File 2: Table S6).

Difference between fat-tailed and thin-tailed sheep breeds.
Among the breeds that were selected for this study, large-tailed Han sheep and Altay sheep are fat-tailed and fat-rumped sheep, respectively, while Tibetan sheep are thin-tailed sheep. There were many differences in the CNV of fat-tailed and thin-tailed sheep. Fat-tailed sheep harboured 211 CNVRs covering 56 Mb of the sheep genome sequence and 123 genes. After GO enrichment analysis, some genes associated with fat were identified (Table 3). Seven genes associated with fat were identified in the CNVRs of large-tailed Han sheep, including peroxisome proliferator-activated receptor-α (PPARA), retinoic X receptor A (RXRA), Kruppel-like factor 11 (KLF11), adipocyte determination and differentiation factor 1 (ADD1), fatty acid synthase (FASN), phosphoprotein phosphatase 1 catalytic subunit A (PPP1CA), and platelet-derived growth factor alpha (PDGFA). In Altay sheep, 5 genes in CNVRs were associated with fat deposition, including peroxin 6 (PEX6), RXRA, FASN, PPP1CA, and PDGFA. Tibetan sheep are thin-tailed sheep native to a high-altitude plateau. Sixty-six CNVRs were identified in Tibetan sheep; however, no CNVRs were detected on chromosome 25 or 26. In Tibetan sheep, only 1 gene related to fat deposition was found within the identified CNVRs: RXRA. Two genes related to adaptation to high altitude were found within the identified CNVRs in Tibetan sheep: α-ketoglutarate-dependent dioxygenase alkB homologue 5 (ALKBH5) and nuclear prelamin A recognition factor-like (NARFL).

Comparison with other studies on CNV in sheep.
Our results were compared to those of previous reports on sheep genomic CNV. The first study on sheep genome CNV was conducted by Fontanesi et al. 15 using a tiling oligonucleotide array with approximately 385,000 probes that was used to detect 135 CNVRs from 11 ewes belonging to 6 different Italian dairy or dual-purpose sheep breeds. Eleven of the 135 CNVRs identified were also identified in our results (Supplementary File 1: Table S7, Table 4). Liu 16 identified 238 CNVRs by assessing 329 individuals from 3 breeds using the Ovine SNP 50 BeadChip, but only 12 of these CNVRs on 10 chromosomes were also identified in our results (Supplementary File 1: Table S8). The most recent study on CNV in sheep was carried out by Ma 17 , who identified 111 CNVRs by subjecting 160 individuals from 8 breeds to analysis using the Ovine SNP 50 BeadChip; only 8 of these CNVRs on 4 chromosomes were also identified in our results (Supplementary File 1: Table S9).

Discussion
In recent years, with the development of high-throughput genotyping technology, CNV detection using SNP chip data has been conducted in humans and animals [18][19][20] . Several algorithms for inferring CNV based on SNP chip data have been developed, including PennCNV, cnvPartition, and QuantiSNP. No single approach can capture all CNV; thus, one may be complementary to another. However, most studies on CNV using SNP arrays in animals and humans have used only PennCNV software [21][22][23] , especially for high-density SNP data [22][23][24][25][26][27][28] . PennCNV detects CNV more reliably than do some other algorithms 29 because PennCNV can incorporate multiple sources of information, including total signal intensity and allelic frequency SNPs 30,31 . When multiple algorithms are used to detect CNV, determining the appropriate number of CNVs may be difficult. If we accept only CNVs that are commonly detected by all algorithms, many CNVs are missed, but many false-positive CNVs are identified if we accept all CNVs detected by different algorithms. Finally, high-density SNP chips and stricter filtering criteria could ensure a high positive validation rate. Although we used only PennCNV to detect CNVs, two strict criteria were adopted to reduce the risk of false-positive results: SD of LRR < 0.30 and BAF = 0.01. These strict criteria and high-density SNP array data likely resulted in the high qPCR confirmation percentage of 80% in the present study. In our study, most of the predicted positive samples identified by the PennCNV program agreed well with the qPCR experiments and showed a high level of copy number concordance between them. The discrepancy between PennCNV and qPCR findings may indicate uncertain boundaries of CNV based on the SNP array, as well as the potential impacts of SNPs and small indels.
It is notable that few of the CNVRs that were identified in this study were identified in other sheep CNV studies, similar to the results of other CNV studies in humans and mammals 26,36 . The inconsistency between results from different studies might be due to the following reasons: (1) different genetic backgrounds and sample sizes; (2) different chip platforms (Illumina SNP chips or CGH arrays); and (3) SNP chips with different densities and CNV-calling algorithms.
In this study, we selected three sheep breeds distributed in different parts of China with obviously different types of tails: Altay sheep, large-tailed Han sheep and Tibetan sheep. Altay sheep live in the Gobi Desert, where the annual average temperature and extreme minimum temperature are 4.0 °C and −42.7 °C, respectively, and the ground is snow-covered for 200-250 days of the year. In such an environment, fat tends to be deposited on the rump of the animal, forming a substantial buttock that is an adaptation to the harsh environment of the Gobi Desert. Our results indicate that the process of adaptation to extreme climates in Altay sheep is principally

Table 4. CNVRs in common with previous studies in sheep.
mediated through complex, integrated metabolic responses, as has been confirmed in rodents 33 . Climate has a significant impact on animal fitness and physiology, especially in ruminants 34 . Climate factors, including solar radiation, temperature, UV radiation, humidity, and precipitation, have direct impacts on sheep; some of these climate factors, such as temperature and precipitation, also directly affect the quality, quantity, and digestibility of food, thus impacting sheep indirectly 35 . Therefore, long-term thermal stress can lead to energy metabolic adaptation, as well as cold and heat tolerance, in particular breeds 36 . At the same time, sheep with different types of tails (fat-tailed and thin-tailed) also follow basic thermoregulatory principles to conserve or dissipate body energy during different seasons. The ability to deposit fat in the tail is beneficial in extremely harsh environments and seasons. Large-tailed Han sheep are primarily produced in the hinterland of the northern Chinese plain, which has clear seasonal variation, with cold, dry winters and hot, rainy summers. We found some candidate genes involved in endocrine regulation, consistent with the impact of day length and seasonal timing on sheep physiology and evolutionary fitness. The tails of large-tailed Han sheep show hypertrophy and have a fan-like shape, sagging below the hock and dragging on the ground in some individuals. During periods of famine, the tails of fat-tailed sheep can be used as food.
Thin-tailed Tibetan sheep live in the mountainous region of the Tibetan plateau at an altitude of 3000-5000 metres. Tibetan sheep are relatively strong and tall, with a thin tail in the shape of a flat cone. We found that fat-tailed and fat-rumped sheep had more CNVRs (371 and 301, respectively) than did thin-tailed sheep (66). In the CNVRs of fat-tailed sheep, we detected some genes associated with fat. In the CNVRs of thin-tailed sheep, we found two genes (ALKBH5 and NARFL) associated with altitude adaptation. It is generally believed that the first wild sheep were thin-tailed sheep, from which fat-tailed sheep were produced as the result of selection by humans and by nature 37 . These results indicate that natural and artificial selection for fat deposition in sheep tails may lead to CNV alterations.
In this study, we identified genes in CNVRs associated with fat synthesis and metabolism in fat-tailed (7 genes), fat-rumped (5 genes), and thin-tailed (1 gene) sheep. RXRA was identified in CNVRs in all three types of sheep, indicating that it may play an important role in the maintenance of basic fat metabolism. PPARA, RXRA, KLF11, ADD1, FASN, PPP1CA, PDGFA, and PEX6 were only identified in fat-tailed and fat-rumped sheep, indicating that these genes may cause tail fat deposition. PPARA is a member of the PPAR family. Peroxisomes play an important role in fatty acid metabolism. Unsaturated fatty acids bind to PPARA, which is highly expressed in the liver, skeletal muscle, heart, and kidney 38 , where it activates genes that are involved in fatty acid metabolism. RXRA is a member of the nuclear hormone superfamily and forms heterodimers with a number of nuclear receptors, including the thyroid hormone receptor, vitamin D receptor, GR, and PPAR 39 . RXRA can mediate heterodimerization with retinoic acid receptors (RARs) and is essential for the functional activation of RARs by their ligands 40 . RXRA plays a role in lipid homeostasis 41,42 and foetal development 39 . KLF11 (Kruppel-like factor 11) is a member of the Kruppel-like factor family and is expressed in many human tissues 43 . Loft 44 reported that KLF11 is a browning transcription factor in human adipocytes. PPAR directly induces KLF11 and, together with KLF11, activates and maintains the brite-selective gene programme. ADD1 (adipocyte determination-and differentiation-dependent factor 1), a member of the basic helix-loop-helix (bHLH) family of transcription factors, is associated with adipocyte differentiation and cholesterol homeostasis 45 . ADD1 plays an important role in the process of adipocyte differentiation 46 , as well as in the activation of PPAR and C/EBPs. FASN (fatty acid synthase) plays a key role in the de novo synthesis of fatty acids in mammals. FASN catalyses reaction steps in the conversion of acetyl-CoA, malonyl-CoA, and NADPH into saturated fatty acids. FASN is expressed in the mammalian liver and adipose tissue, where it regulates body fat deposition and fatty acid anabolism 47 . PPP1CA (phosphoprotein phosphatase 1 catalytic subunit alpha isozyme) is an isoform of PPP1C 48 , which was first identified as the enzyme that converts phosphorylase A into phosphorylase B 49 . A number of roles have been attributed to PPP1C in the regulation of important cellular events. PDGFA (platelet-derived growth factor A) is a PDGF subunit 50 . PDGF has been directly implicated in developmental and physiological processes 51 . PEX6 (peroxin 6) is a member of the PEX family. Peroxidase is involved in fatty acid metabolism, glycometabolism, toxic material degradation, and oxygen concentration regulation. The biosynthesis and proliferation of the peroxisome are regulated by PPARs and PEX. Because super-long-chain fatty acids cannot enter mitochondria 52 , they must be oxidized into short fatty acids by peroxidase. Some of the differences may be due to selection because of different environments. We also identified Rab family protein TBC1D13, a GTPase activator that plays an important role in increasing GTPase activity 53 . GTP hydrolysis is catalysed by GTPases, after which GDP serves as an energy source in metabolic reactions within the cell.
To better understand the molecular function of these candidate genes, we examined their GO classification. Many of the genes that were detected in our study were consistent with prior expectations because they are involved in adipose tissue energy balance, skeletal system development, and metabolic activities. Two intriguing candidate GO terms were thyroid hormone receptor activator activity (GO: 0010861) and coactivator activity (GO: 0030375) related to energy expenditure. These observations should be explored and verified in future studies.

Conclusions
We performed genome-wide CNV detection based on ovine high-density genotyping data from 120 sheep with three types of tails and provide the highest resolution CNV map of the sheep genome produced to date. Among the 490 CNVRs, 93 were gains, 390 were losses, and 7 were losses and gains within the same region. The length of the CNVRs ranged from 100.11 to 804. 18  KLF11, ADD1, FASN, PPP1CA, PDGFA, and PEX6. Moreover, multilevel bioinformatics analyses of the detected candidate genes showed that they were likely involved in fat deposition, GTPase regulation, and peptide receptor activities. This study is an important step towards the generation of a CNV map of the ovine genome and provides an important resource for studies of ovine genomic variation. In addition, our findings provide valuable information that will facilitate studies investigating genomic structural variation underlying traits of interest in sheep, including types of sheep tails, fat deposition in sheep tails, and environmental adaptability. Genomic DNA samples were extracted from blood using the TIANamp Blood DNA Kit (Tiangen Biotech Co. Ltd., Beijing, China). The purity and concentration of genomic DNA were measured using a NanoVue Spectrophotometer.

Materials
Genotyping and quality control. The genomic DNA of each specimen was genotyped with the Illumina Ovine SNP 600 BeadChip (Illumina Inc., CA, USA), which contains 604,715 SNPs spanning the ovine genome with an average distance of 4.28 Kb.
To increase the accuracy of CNV inference, stringent quality control criteria were applied: (1)  Detection of CNV. Similar to previous studies 27 , chromosomes X and Y were excluded from the analysis.
PennCNV software was only used to detect CNV on autosomes and was downloaded from http://penncnv.openbioinformatics.org/en/latest/user-guide/download/. The PennCNV algorithm incorporates multiple information sources, including the total signal intensity of the log R ratio (LRR), B allele frequency (BAF), and population frequency of the B allele (PFB) 54 . The LRR and BAF of each SNP for every sample were exported from Illumina GenomeStudio software. The PFB was generated based on the BAF of each SNP marker. Genomic waves were adjusted with the sheep GC model file, which was generated by calculating the GC content of 1-Mb genomic regions surrounding each marker (500 Kb on each side), after the program argument 'gcmodel' was used to adjust the results. According to previous research using high-density SNP chips to detect CNV in humans 55,56 , the CNV filter was based on three criteria: (1) the CNV must contain ten or more consecutive SNPs; (2) the length of the CNV must be at least 100 K; and (3) the CNV must be present in at least one animal.
Finally, following the method reported by Redon et al., CNVRs were obtained by merging CNVs with overlapping regions that were identified across all samples 7 . Experimental validation of CNVRs by quantitative PCR. Quantitative real-time PCR (qPCR) was used to validate the CNVRs that were detected in this study. First, qPCR was performed to validate 10 randomly selected CNVRs that were identified by PennCNV using genomic DNA of the same sheep as analysed by the Illumina chip. The GAPDH gene was chosen as a reference gene for all of the qPCR experiments 15 . The primers were designed using the Primer 3 webtool (http://frodo.wi.mit.edu/primer3/). All of the PCR primers were designed based on NCBI reference sequences (Supplementary File 1: Table S10). PCR reactions were prepared using the Power SYBR Green PCR reagent kit (Takara Bio). PCR was carried out in a total volume of 20 μL containing 1 μL of DNA (approximately 60 ng), forward primer and reverse primers (2 μL total), 10 μL of Master Mix (2×), and 15 μL of water. PCR was performed with the following cycling conditions: 5 min at 95 °C, followed by 40 cycles at 95 °C for 10 sec and 60 °C for 10 sec. All of the reactions were run in triplicate and included blank controls. The 2 −ΔΔCt method was used to calculate the copy number [57][58][59] , where Ct is the threshold cycle. The copy number of the target gene in the test sample was calculated as 2 × 2 −ΔΔCt , where ΔΔCt was defined as in a previous report 60 . The corresponding equation was where C t is the threshold cycle, sample A is the tested individual, and sample B is the control individual.
Gene detection and functional analysis. Genes harboured in the inferred CNVRs were obtained from the Ensembl Genes 64 Database using BioMart software based on the Ovis aries (Oar_v3.1) gene sequence assembly. The Database for Annotation, Visualization and Integrated Discovery (DAVID) (http://david.abcc.ncifcrf. gov/) was used to perform the gene ontology (GO) 61 enrichment analysis and Kyoto Encyclopedia of Gene and Genome (KEGG) 62 pathway analysis. To better understand the functions of the genes within the detected CNVRs, the Ovis aries Ensembl gene IDs were converted into human orthologue Ensembl gene IDs using BioMart because the annotation of the sheep genome is limited.