Convergent genomic signatures of domestication in sheep and goats

The evolutionary basis of domestication has been a longstanding question and its genetic architecture is becoming more tractable as more domestic species become genome-enabled. Before becoming established worldwide, sheep and goats were domesticated in the fertile crescent 10,500 years before present (YBP) where their wild relatives remain. Here we sequence the genomes of wild Asiatic mouflon and Bezoar ibex in the sheep and goat domestication center and compare their genomes with that of domestics from local, traditional, and improved breeds. Among the genomic regions carrying selective sweeps differentiating domestic breeds from wild populations, which are associated among others to genes involved in nervous system, immunity and productivity traits, 20 are common to Capra and Ovis. The patterns of selection vary between species, suggesting that while common targets of selection related to domestication and improvement exist, different solutions have arisen to achieve similar phenotypic end-points within these closely related livestock species.

MSMC does not have sufficient resolution for very recent time points preventing us from saying much about the demographic history of these species over the last 1,000 years. For the earliest period (i.e., before 300 kya), the ancestral Ne of Capra appears larger than that of Ovis, but current methods prevent us from seeing further back in time and better characterise that period. We must also caveat the fact that the mutation rate used is not goat/sheep specic, what might aect the interpretation of the timing (see Methods).

Genetic diversity within groups
Among Ovis groups, the IROO group displayed 24.9 million polymorphic SNPs, while the IROA, MOOA and wpOA groups had respectively 20.7, 21.4 and 21 million polymorphic SNPs (Supplementary Table 1). Although the IROO group presented the lowest sample size with 13 individuals, it also showed the highest level of mean nucleotide diversity (π=2.68x10 -coefficient in the IROO group (F=0.08) was significantly lower (p-value<10 - 4 Table 1). Nucleotide diversity was higher for IRCH (π=1.65x10 -3 ) than for IRCA (π=1.53x10 -3 ) wpCH (π=1.54x10 - For the following analyses of genetic structure and admixture we removed the SNPs showing high LD (see Methods), keeping 6,155,224 and 6,511,536 SNPs for Ovis and Capra, respectively. Using sNMF 2 we estimated the number of genetic clusters in the data and the admixture between them. The best partition of samples into clusters for each genus was estimated with the cross-entropy criterion resulting in K=2 for Ovis and K=3 for Capra.
Nevertheless, we further investigated the effect of higher values of K to assess how the partition of individuals changed for different levels of the genetic structure. For both Ovis and Capra, the clustering analysis first separated the wild animals from the domestics. For K=2, the domestic and the wild Ovis animals belonged clearly to two different clusters. For K=3, the domestics were split in different clusters according to their geographic origins, with the wpOA individuals being assigned to the cluster representing either IROA or MOOA or being admixed between both. For K=4 and K=6 two inbred and one other IROO individuals were clustered apart, while at K=5 a new cluster appeared with various levels of admixture in the wpOA group ( Supplementary Fig. 7a). For Capra, the clustering for K=2 showed one cluster representing the domestic animals, while the second cluster regrouped 8 IRCA animals and the remaining 12 animals were admixed. At K=3, the European goat breeds belonged to a distinct cluster, and at K=4 the Iranian and the Moroccan goats were clustered separately.
Then, for K=5 and K=6 the admixed IRCA individuals were largely assigned to two new clusters of respectively 7 and 3 individuals ( Supplementary Fig. 7b). Notably, when increasing the number of clusters we found no admixture between wild and domestic animals for both Ovis and Capra.

Tests for admixture
As the genetic clustering clearly separated wild and domestic individuals, we rooted the TreeMix tree by the split between both lineages. In sheep, most of the wpOA individuals were more closely-related to MOOA than IROA individuals, that clearly clustered separately ( Supplementary Fig. 3a). In goats, the IRCH and MOCH groups were also clearly separated, and while the European breeds from wpCH were close to the MOCH group, the Australian breeds clustered with IRCH ( Supplementary Fig. 3b). Further post-processing with the dedicated TreeMix R package 3 confirmed that the topologies recovered in absence of migration edges explained nearly all differences in the allelic frequencies among groups (99.1% and 98.2 of the total variance for Ovis and Capra, respectively, Supplementary Table   3). Moreover the residual values were high only for intra-species pairs of individuals (i.e., not between wilds and domestics, see residuals heat maps in Supplementary Figure 3). We also explored the possibility of minor admixture events, adding an increasing number of migration edges, from 1 to 4 (Supplementary Table 3). All the migration edges involved individuals from the same species (no wild-domestic migration). For Ovis no sensible increment of the variance was observed. The maximum increment of the variance in Capra (>99%) was reached by adding a single migration edge between domestics, which was inferred from one Moroccan individual to a European breed. A formal test of admixture based on the f3 statistics 4 , however, showed no significant results for any combination of groups (Z-scores > 0, Supplementary Table 6), providing no evidence for any recent admixture among the wild and domestic species within each genus. It is worth noting that this finding does not reject ancient introgressions of genetic material, as domestication and intensive selective breeding may have eroded the molecular signature of admixture.

• Supplementary Note 4: Detection of selection signatures
The subsets of variants filtered on allelic frequencies (see Methods) were 22,134,330 for the 53 Ovis samples and 12,412,758 for the 58 Capra samples used in this analysis.

hapFLK results
We filtered out from hapFLK results the genomic regions for which the haplotype clustering was not congruent among the domestics. Thus, we retained a total of 8,498 SNPs for Ovis and 10,571 SNPs for Capra with q-values<10 -2 from the FDR framework applied to the whole Supplementary Data 4). In 10 cases where a gene was found in the homologous regions, the genes most likely impacted by selection were the same between Ovis and Capra (Table   1). In 4 other cases the genes were different but involved in similar phenotypic effect in livestock, while the genes from 3 regions corresponded to different classes. Three regions were intergenic in both Capra and Ovis. Four genes common to sheep and goat (HMGl-C, KITLG, MTMR7, and NBEA) show clear pleiotropic effects. HMGI-C is involved in body size in sheep 6 and was also described as being responsible of dwarf size in chicken 7 . MTMR7 is expressed in the central nervous system in human 8 and is also involved in fatty acid composition in pig 9 . KITLG is known to have an effect on coat coloration in numerous mammal species 10 , is associated to litter size in goat 11 and is also implicated in nerve cells development and mast cells development, migration and function 12 . NBEA is associated to wool crimp in sheep 13 but is also known to regulate neurotransmitter receptor trafficking to synapses in human 14 and was suspected to play a role in docility in cattle 15 .
In 21 regions among the 46 detected with hapFLK in Ovis, 148 SNPs showed combined FLK p-values<10 -4 . According to VEP, they were mostly located in intergenic regions (92 SNPs) and intronic sequences (44 SNPs), whereas 12 SNPs were upstream or downstream genes and two SNPs were exonic (one missense and one synonymous, see Supplementary Table   5). This distribution was not significantly different from that of the whole set of SNPs initially tested. In Capra, 928 SNPs located in 29 out of the 44 regions detected with hapFLK showed combined FLK p-values<10 -4 . They were found in intronic sequences (544 SNPs), in intergenic regions (296 SNPs), upstream or downstream genes (190 SNPs). Within coding sequences, only one missense variant was found. There was here a clear enrichment for variants located in non-coding regions close to genes (intronic, upstream gene and downstream gene) compared to the whole set of SNPs analysed (Chi-square test; p-value<2x10 -16 ; see Supplementary Table 5).

• Supplementary Note 5 : Comparison between selection detection methods
Following a reviewer's comment for exploring and discussing the differences between our approach and that of Naval- Sanchez et al. (2018) 16 who studied a sampling including some identical individuals, we provide the following information about the impact of the method and sampling on the detection of signatures of selection.
Our study was specifically designed to detect selected sweeps in common between sheep and goat. This stems from both the sampling (ie. working on traditionally-managed domestics with the same origin and wild populations from the cradle of domestication) and the choice of the method (ie. hapFLK with stratified FDR approach). Naval Sanchez et al. 16 used an other approach based on Fst and pi (nucleotide diversity variation) to contrast the genomes of wild mouflon to that of a worldwide panel of 43 breeds (each represented by only a few individuals). This experimental design is very different from ours and designed to detect different selective processes. The method used by Naval-Sanchez et al. for detecting selective sweeps on 20 kb windows combined the mean Fst value and the difference in nucleotide diversity between wilds and domestics -estimated by ln(pi_wilds/pi_domestics).
Combining the two variables make this method relaxed in a statistical sense. The thresholds applied for selecting the outlying windows (e.g., p-values on z-scores of 10 -2 in Naval Sanchez et al. 16 does not reflect the true level of significance due to the non-independence of the two distributions that are combined. This relationship between Fst and ln(pi-ratio) can be seen on Supplementary Figure 9 (shape of the scatter-plot). The selection of outliers with the Fst/pi-ratio approach is thus very likely to be less stringent. hapFLK, however, is based on the difference in haplotype frequencies between populations and takes into account the hierarchical structure of the sampling. Its detection power has been proved to be greater than that of the Fst approach 6 , is only slightly affected by migration and is not affected by bottlenecks. Even if Fst approaches are more commonly used for detecting selective sweeps, our study is much more suited to the hapFLK approach, even more when considering our sampling made of populations with different drift effects.
Whatever this better theoretical support for the hapFLK approach, we analysed the same dataset with the two methods (HapFLK versus Fst/pi-ratio) in order to assess how the detection method would affect the result. We also assessed the effect of the sample set on the detection of selected genomic regions. As hapFLK is designed to detect selection among hierarchically structured populations, it cannot be applied to the worldwide panel, which are too heterogeneous as they are composed of breeds with very different demographic histories admixture, bottlenecks,...), and each breed is represented by a few individuals.
Thus we applied the Fst/pi-ratio method on our sheep dataset and performed the following To conclude, the Fst/pi method is not adequate to analyse our dataset, as the result might be affected especially by the differential effects of drift in the populations. However, it confirmed most of the sweeps detected with hapFLK but appeared to be more permissive. Moreover, the sweeps that we describe are confirmed by the haplotypes shown on Supplementary For each group the sample size, the number of segregating SNPs (S), the average nucleotide diversity (π), the mean inbreeding coefficient (F) and the mean genetic load over the whole genome are indicated. The p-value of the difference between the wild group and each domestic group is given in brackets (one-sided t-test for F and load, Mann-Withney test for π). For π, the Mann-Whitney tests between all pairs of groups for each genus all showed p-values < 2.2e-16. For mean inbreeding F and mean genetic load, the p-values of the onesided t-tests on individual values between the wild group and each of the domestic group correspond to: ns: p-value>10-1 ; a: 5x10-2<p-value<10-1 ; b: 10-2<p-value<5x10-2 ; c: p-value<10-2 .
The differences between the distributions of SNPs with FLK p-values<10 -4 and all SNPs used for detecting selection signatures are tested with a Chi-square test.

Supplementary Fig. 4: Midpoint rooted Neighbor-Joining trees based on the % of iden>ty between sequences for homologous regions under selec>on in both
Ovis and Capra. The 2n haplotypes are represented as leaf/row in the panels. The color bars (center panel) relates each haplotype to its group (blue: wild, green: iranian domes>cs, red: moroccan domes>cs, orange: worldpanel). The right panel depict all SNPs in the considered haplotype (brown and blue squares represent minor alleles. Black squares represent missing data). IROO                  and not with the Fst/ln(pi-ratio) method.
Among them six were close to at least a significant 20 kb region (green stars). Some regions with low SNP densities (< 20 SNPs) were filtered out in the Fst/ln(pi-ratio) method