Uncovering Sub-Structure and Genomic Profiles in Across-Countries Subpopulations of Angus Cattle

Highlighting genomic profiles for geographically distinct subpopulations of the same breed may provide insights into adaptation mechanisms to different environments, reveal genomic regions divergently selected, and offer initial guidance to joint genomic analysis. Here, we characterized similarities and differences between the genomic patterns of Angus subpopulations, born and raised in Canada (N = 382) and Brazil (N = 566). Furthermore, we systematically scanned for selection signatures based on the detection of autozygosity islands common between the two subpopulations, and signals of divergent selection, via FST and varLD tests. The principal component analysis revealed a sub-structure with a close connection between the two subpopulations. The averages of genomic relationships, inbreeding coefficients, and linkage disequilibrium at varying genomic distances were rather similar across them, suggesting non-accentuated differences in overall genomic diversity. Autozygosity islands revealed selection signatures common to both subpopulations at chromosomes 13 (63.77–65.25 Mb) and 14 (22.81–23.57 Mb), which are notably known regions affecting growth traits. Nevertheless, further autozygosity islands along with FST and varLD tests unravel particular sites with accentuated population subdivision at BTAs 7 and 18 overlapping with known QTL and candidate genes of reproductive performance, thermoregulation, and resistance to infectious diseases. Our findings indicate overall genomic similarity between Angus subpopulations, with noticeable signals of divergent selection in genomic regions associated with the adaptation in different environments.

www.nature.com/scientificreports www.nature.com/scientificreports/ material, Brazil also performs national genetic evaluation including Angus purebred local records, which allows the identification of superior Brazilian sires. The progeny originated from locally selected animals tend to be more adaptable to tropical conditions and more robust to environmental changes 7 .
Combining international data of the same breed with the employment of proper adjustment factors has long been referred to as a promising strategy improving accuracies for cattle genetic analysis [8][9][10] , including the Angus breed 11 . Genomic evaluation also benefits from this data combining strategy as dense SNP genotypes will overcome the lack of common ancestry between subpopulations when the pedigree is not deep enough 12 . The genomic prediction relies on the linkage disequilibrium (LD) between the genotyped markers and quantitative trait loci (QTL), which may not persist between subpopulations, or may even be in reversed phase. Although it is expected that LD phases extend across large distances for subpopulations of the same breed in different countries 13 , this information is yet to be adequately assessed in Angus cattle.
The geographic division of Angus subpopulations may increase the diversity and sub-structure within the breed, albeit they originated from a unique domestication center. Since, differentiation in allelic frequencies, haplotypes diversity and LD tend to increase over time due to reduced common ancestry between subpopulations, different environmental pressures, and potential differences in selection schemes 13,14 . Furthermore, selection, either natural or human-oriented, intensifies the differentiation between subpopulations by driving particular genomic regions towards special patterns in each specific geographical location 15 .
Here, we characterize differences and similarities in the genomic profile of Angus from Canada and Brazil, by assessing the genomic level of relationship, inbreeding, linkage disequilibrium and persistence of linkage phase in both subpopulations. In addition, we scan for selection signatures that may shed light on the molecular basis of selected traits and divergent adaptation mechanisms on the two subpopulations. Although the current study has been focused on the population genomic aspects, our results are relevant to genomic predictions, especially to further collaborative analyses incorporating the studied herds.

Results
Population substructure and relationship. Here, we evaluated a sample of 948 Angus cattle (382 from Canada and 566 from Brazil) with genotypic information of 31,483 autosomal markers, spanning up to 2.5 GB of the bovine genome in an average spacing of 78.6Kb (ranging from 0.006Kb to 3,049Kb). The overall averages of observed heterozygosity (0.353 ± 0.012; 0.350 ± 0.009) and minor allele frequency (MAF: 0.268 ± 0.141; 0.263 ± 0.144) did not differ between Brazilian and Canadian subpopulations ( Table 1). The genomic relationship between samples ranged from 0.5 to 0.81 and the average estimated for pairs of individuals from different subpopulations was 0.577 ± 0.021. The average genomic relationship within the Canadian subpopulation was slightly higher than the Brazilian average ( Table 1).
The proportion of variance in genomic relationship matrix (GRM) explained by the first and second principal components was 21.72% and 6.07%, respectively (Fig. 1). In line with the genomic relationship averages, principal component analysis (PCA) revealed greater diversity within the Brazilian samples than within Canadians and depicted two closely connected clusters representing the subpopulations. Although the Brazilian subpopulation included black and red coat colored samples, a clear extra population substructure was not observed within this subpopulation due to coat color variation (Fig. 1). It is noteworthy that the substructure that allowed distinguishing the two clusters representing the Angus Brazilian and Canadian subpopulations is not comparable with the stratification level involving different breeds ( Supplementary Fig. S1). The level of differentiation between the two subpopulations, measured through the average of Wright's F ST statistics, was equal to 0.072 ± 0.021.
Averages of genomic inbreeding estimated using diagonal elements of GRM and runs of homozygosity (ROH) were similar in the two subpopulations (Table 1), being slightly higher in the Canadian. The correlation between these two metrics was equal to 0.9415, with F GRM estimates being overall higher than F ROH (Table 1, Fig. 2). Table 2 summarizes the distribution of ROH lengths by subpopulations. The mean length of ROHs and the proportion of segments longer than 8 Mb were higher in the Brazilian subpopulation. All samples presented at least one ROH segment longer than 8 Mb. In total, 59 and 63% of samples in the Canadian and Brazilian subpopulations, respectively, presented at least one ROH segment longer than 16 Mb. Linkage disequilibrium. We used phased haplotypes to estimate linkage disequilibrium (LD), through r 2 statistic. The number of SNPs, averages of SNP distances and r 2 of adjacent SNPs per chromosome are presented in Supplementary Table S1. The r 2 of all pairwise adjacent markers averaged 0.20 and 0.21 in the Brazilian and Canadian subpopulations, respectively ( Table 3). The r 2 average decreased according to the increase in distance www.nature.com/scientificreports www.nature.com/scientificreports/ classes of adjacent markers and increased according to the increase in the MAF threshold (Table 3). There was a smaller increase in r 2 by rising the MAF threshold from 0 to 1%, than rising it from 1 to 5%.    www.nature.com/scientificreports www.nature.com/scientificreports/ With all combinations of syntenic SNPs, the overall average of r² kept higher than 0.2 for markers distanced up to 50Kb in both subpopulations (Fig. 3A). At distances around 1 Mb, the markers pairwise had r² averages equal to 0.06 and 0.07 in the Brazilian and Canadian subpopulations, respectively. Although the LD averages in the Canadian subpopulation were the largest for almost all distance classes, the slope of decay were consistent between the two subpopulations. The persistence of the phase between the two subpopulations remained high at a scale of nearly 0.50 up to a distance of 5 Mb (Fig. 3B) indicating agreement between linkage phases in the two subpopulations.
Candidate regions under selection within and across subpopulation. The ROH scan revealed 3 and 2 regions with at least two consecutive SNPs with autozygosity score above the empirical threshold (99.9 percentile) in the Canadian and Brazilian subpopulations, respectively (Fig. 4, Supplementary Tables S2). The F ST and varLD tests (between-population comparative methods) identified 5 regions each, with at least two consecutive windows exceeding the empirical threshold (Fig. 4, Supplementary Tables S3). The length of these regions ranged from 160 Kb to 1910Kb and presented some overlapping between them. Table 4 lists 6 selection signatures, which represent genomic regions concordant between the 99.9 percentile of at least two of the independent tests, hence more likely to be true positive signals 16 .
Two ROH islands common to both subpopulations were detected at BTA13:63.77-65.25 and BTA14:22.81-23.57 (Fig. 4A,B), surrounding candidate genes of stature, muscle development and lipid metabolisms, such as MYH7B (myosin heavy chain 7B), GDF5 (growth differentiation factor 5), PLAG1 (PLAG1 zinc finger) and XKR4 (XK related 4) [17][18][19][20] . A further ROH island exclusively detected in the Canadian subpopulation, BTA7:37.84-38.64, overlapped with a signal of divergent selection highlighted via F ST (Table 4, Fig. 4A,C), indicating it as a region that has been targeted by selection only in the Canadian subpopulation. This genomic region encompassed several genes encoding integral components of membrane related to reproductive traits. The ARL10 (ADP-ribosylation factor-like 10) gene is suggested to play important role in bovine pre-implantation embryo development 21    www.nature.com/scientificreports www.nature.com/scientificreports/ breeds, such as the number of insemination per conception, fertility index and the interval between first and last insemination 22 .
Selection signatures revealed by the overlaps of the between-population differentiation methods (F ST, and varLD), at BTA7:21.31-21.89 and BTA18:11.75-11.94, are putatively related to adaptation into adverse conditions across the different environments (Table 4, Fig. 4C,D). The selection signature BTA7:21.31-21.89 contained several adaptive immunity-related genes, including the IRF1 (interferon regulatory factor 1) and IL4 (Interleukin-4) genes, which control the differentiation of naive T helper (Th) cells and activation of cell-mediated   www.nature.com/scientificreports www.nature.com/scientificreports/ or antibody-mediated immune responses 23,24 . In addition, the KIF3A (Kinesin family type 2 member 3 A) gene, mapped at this region, is a candidate to innate tick-resistance typical of some breeds 25 . Beyond immunity-related genes, the selection signature BTA18:11.75-11.94 comprised the COX4I1 (cytochrome c oxidase subunit 4I1) that is related to thermoregulatory efficiency. This gene represents a candidate gene to body temperature regulation, which was previously associated with cold tolerance 26 , and was up-regulated in the liver of cows experimentally submitted to cold-stress 27 .
A further selection signature identified via F ST and varLD, at BTA18:13.76-15.40, was the longest signal detected in the current study and comprised the MC1R gene, major gene to coat color determination in Angus 28,29 . Interestingly, this signal was detected even when the selection signature scan was re-performed keeping only samples of black coat color in both subpopulations (Supplementary Fig. S2). Other genes of this region that are plausible of being causing this signal include DNAJA2 (DnaJ heat shock protein family member A2) that is involved in cellular responses to heat stress 30 , and some key players of immune responses, such as CYBA (cytochrome b-245 alpha chain) and CDK10 (cyclin dependent kinase 10) [31][32][33] .
The selection signatures common between Brazilian and Canadian subpopulation (BTA13:63.83-64.87 and BTA14:24.43-26.20) comprised 41 genes, whereas the regions representing signals of divergent selection (ROH islands exclusively detected in one of the two subpopulation, F ST, and varLD signals) comprised 74 genes. Significant enrichment of particular terms or biological processes was not detected in any of these gene lists.

Discussion
Accentuated differences between the overall genomic diversity of two populations allow two assumptions, being (1) recent gene flow or adaptive genetic variation in the subpopulation of higher diversity, or (2) genetic drift, reduced effective population size or increased inbreeding in the population of lower diversity 34,35 . Although the genomic diversity patterns of the two subpopulations studied here were quite similar, the Brazilian samples showed higher heterozygosity, broader dispersion in PCA and lower averages of genomic relationship, inbreeding, and linkage disequilibrium than Canadian samples. The slightly lower diversity observed in the Canadian subpopulation is most likely due to the sample origin and its commercial purpose. The Canadian genotypes were obtained from a single experimental herd of reduced effective population size, in comparison with Brazilian samples that represented commercial herds, more opened to the eventual adoption of imported semen.
The observed heterozygosity within each Angus subpopulation was in agreement with averages varying from 0.357 to 0.399, reported in taurine breeds genotyped with high-density SNP panels 34,36 , but higher than the average of 0.28 previously reported to 42 Angus samples 37 . Presumably, this difference is due to the greater number of genotyped individuals studied here. Similar to previous results , the correlation between F GRM (with base allele frequency fixed to 0.5) and F ROH was high, with greater estimates obtained to F GRM due to the non-distinction between IBS and IBD alleles as done by F ROH 38,39 . Long ROHs (>8,000 Kb) were observed in both subpopulations, indicating relatively recent relatedness within both of them. Autozygous segments longer than 8,000 Kb and 16,000 Kb indicate nearly 6 and 3 generations, respectively, since the common ancestor that gives origin to both haplotype copies 40 .
The population structure depicted in PCA showed a great overlap between the clusters of the Brazilian and Canadian samples. One reason for this continued genomic connection resides in the recent common ancestry, also evidenced by the relative high LD phase persistence across the subpopulations. This recent common ancestry is most likely due to the importation of semen from Canada into Brazil, as well as the extensive importation of U.S. Angus sires into both countries (ASBIA 2018, http://www.asbia.org.br/wp-content/uploads/2018/10/ INDEX-ASBIA-2017_completo.pdf).
The averages of r² considering all SNP pairs spaced up to 50Kb showed agreement with estimates previously reported for taurine breeds (r² ~ 0.25) 41,42 and were higher than averages previously reported for indicine and taurine-indicine cross-bred cattle (r² ~ 0.18) 41,43 . The LD is a decisive parameter to define the required markers density for successful genomic prediction and genome-wide association studies 10,44 , being suggested that r² values of 0.2 are required to achieve high prediction accuracies (>0.8) 45 . Considering the bovine genome size of 2,875 Mb, and the averages of r² higher than 0.2 for adjacent SNPs spaced up to 100Kb in both Angus subpopulations, the minimum genotyping density required to these subpopulations would be 28,750 SNPs. The distribution of 31,483 SNPs common between two different commercial arrays maintained some gaps longer than 100Kb along chromosomes. Thus, the imputation of genotypes to the non-common SNPs would be a feasible strategy to enhance the marker density in both subpopulations, fulfilling long gaps and assuring great prediction accuracies in further studies. Furthermore, a single reference population genotyped with a higher density of markers could supply reference haplotypes to both subpopulations, due to the strong preserved phase between these two subpopulations.
The persistence of phase across populations may indicate allele-QTL preserved phases, being another very important issue to joint-genomic analysis. Porto-Neto et al. 46 reported considerable gains in genomic prediction accuracies in five cattle traits using only SNPs of consistent linkage phase between two distinct populations. The linkage phase correlation between the Brazilian and Canadian subpopulations stayed high for genomic distances extended to 5 Mb, corresponding well with estimates of approximately 0.7 reported for subpopulations of a single breed 42 . This persistence was significantly higher than estimates involving different breeds (approximately 0.45) 41 . Therefore, combining the two subpopulations studied here to perform genomic prediction would result in gains in reliability, as it would enable increased reference population without the effect of SNPs in reversed phase masking the marker-QTL association.
The suitability of ROH islands for the selection signature scanning was reinforced here, due to several overlaps with QTL of traits undergoing selection and previously reported selection signatures of beef cattle. The two ROH islands found common between the Brazilian and Canadian samples suggested common selection events on BTA13 and BTA14, at genomic regions harboring genes known to be associated with stature in mammals 19,47,48 , www.nature.com/scientificreports www.nature.com/scientificreports/ thus putatively influential on growth traits undergoing selection in both subpopulations. These two signals were previously reported as selection signatures of Angus 49 and were significantly associated with the body weight of Brahman 50 , in a study based on the reference UMD3.1. Reasonable candidate genes of BTA13 includes the GDF5 gene, which belongs to a family of bone morphogenetic proteins that stimulate bone formation and regulate growth 51 , and MHY7B that is expressed in the skeletal muscle and downregulated in double-muscled breeds 20 . The region of BTA14 comprising PLAG1 and XKR4 genes is well-known for its pleiotropic effect on economically important traits of beef cattle, such as backfat thickness, ribeye area, body and carcass weight 5254,55 , as well as the serum level of the growth-related hormones, IGF1 53 .
An impressive signal of genomic differentiation between the subpopulations was identified at BTA7, corresponding to the ROH island exclusive of the Canadian subpopulation that partially coincided with F ST results. The rs110428791 SNP, which is an intronic variant of gene UNC5A mapped at this selection signature, was previously associated with reproductive performance in taurine breeds, in a study based on the reference genome UMD3.1 22 . In addition, homologous genes of RNF44 (ring finger protein 44), and UIMC1 (ubiquitin interaction motif containing 1), have been associated with reproductive risks and duration of reproductive life in women 56,57 . These findings suggest selection pressures in Canadian subpopulation with a balance between growth and reproductive traits.
The signal of divergent selection detected at BTA7:21.31-21.89 is most likely related to differences in the ability of quicker and strong response towards pathogen invasions across the subpopulations raised in different environments. This genomic region was previously identified as a QTL associated with resistance to viral load in Holstein cattle 58 , and it comprises genes that showed differential expression between susceptible and resistant hosts to cattle tick (KIF3A) 25 , mange infested and uninfested animals (IL4, IL5 and IL13) 59 , healthy and with mastitis cows (IL4) 60 . In addition, this selection signature was associated with the adaptation of taurine breeds to harsh environments 61 and of indicine breeds to not so harsh, but still challenging environmental conditions 62 .
Two highlighted regions of BTA18 overlapped with selection signatures previously reported in Russian taurine breeds adapted to harsh weather 61 and harbored candidate genes to immune and thermoregulatory efficiency. It is noteworthy that the longest selection signature (BTA18:13.76-15.40), encompassed the major gene for coat color variation in Angus, MC1R 28 . We were able to refuse the likelihood of this signal being related to the presence of red Angus samples (recessive homozygous) in the genotyped Brazilian subpopulation. However, we didn't estimate frequencies of recessive allele among the samples of black coat color, since our genotyping information did not include the causal mutation for pigmentation in Angus, rs109688013 29 . Nevertheless, none of the two countries perform selection based on coat colors. Thus, it is not discarded the hypothesis of this signal being caused by other genes in this region, related to better performances on selected traits under challenging environmental conditions. Furthermore, this region has been indicated as QTL of resistance to infection in dairy cattle 32,63 , and revealed as a selection signature of cattle breeds with coat color phenotypes not determined by MC1R, such as Brown Swiss, Hanwoo and Nguni 33,64,65 . Further investigation adopting denser panels or even sequencing data could narrow down this region and empower the identification of the causal mutation(s) behind this signal.
Despite the potential of parallel selection in promoting differentiation of geographically isolated subpopulations of the same breed 13 , the Angus subpopulation from Canada and Brazil still present substantial genomic connection likely due to common ancestry (mainly influenced by the use of semen from common sires) and similarity between the selection schemes, primarily focused on growth-related traits in both countries 7,66 . Nevertheless, signatures of divergent selection pinpointed particular loci, related to reproductive performance, thermoregulation, and resistance to infectious diseases. Therefore, specific population subdivision might have been caused by both adaptation to distinct environments and particularities of selection criteria applied in each country, such as an additional emphasis on breeding indexes that combines fertility and growth traits in Canada 67-69 .

Methods
Genotyping information. Care and Use Committee approval was not required for the current study since it was performed with previously genotyped samples.
The genotypic data consisted of 382 and 566 Angus born and raised in Canada and Brazil, respectively. The Canadian subpopulation, previously described in Chen et al. 70  All the genotyped samples presented call-rate higher than 0.90. The SNPs from both arrays were remapped according to the new bovine assembly, ARS-UCD 1.2 71 , using publicly available coordinates provided at https:// www.animalgenome.org/repository/cattle/UMC_bovine_coordinates. SNPs presenting different positions among the different coordinate files aligned into the ARS-UCD 1.2 (N = 12) were filtered out from both arrays during the remapping process. Then, non-autosomal markers, markers presenting call-rate lower than 0.90 or showing a significant deviation from Hardy-Weinberg equilibrium (p < 10 −5 ) were removed in both subpopulations datasets. Imputation of missing genotypes and haplotype phasing was performed per subpopulation using the Fimpute v3 72 . Finally, only non-monomorphic markers, common between the two panels were kept to further analyses (n = 31,483).

Genomic relationships and inbreeding coefficients.
Relatedness between all pairs of individuals was estimated using the GRM computed according to method 1 in VanRaden 73 , with base allele frequency fixed to 0.5 39,74 . www.nature.com/scientificreports www.nature.com/scientificreports/ Genomic inbreeding coefficients based on GRM (F GRM ) were estimated by subtracting one from its diagonal elements. Runs of homozygosity based inbreeding coefficients (F ROH ) were estimated as the sum of lengths of ROHs of each individual divided by the length of the genome covered by the SNPs 75 . The ROH discovery was conducted using PLINK v1.9 76 , by sliding a 30 SNPs window in one SNP interval throughout each autosomal chromosome. One heterozygous genotype was permitted per window to account for occasional genotyping errors. The minimum length of an ROH segment was set to 1,000Kb, where a density of at least 1 SNP every 120Kb and no marker-intervals longer than 1,000Kb were required, following the recommended criteria of smaller density genotypes 77 .
SNP autozygosity scores were calculated per subpopulation and expressed as a proportion of animals in which each SNP appeared in an ROH. Genomic regions harboring at least two adjacent SNPs falling into the 99.9 th percentile were assigned as ROH islands. The positions of the first and the last SNP of these regions were assumed as the start and end point of ROH islands.
To assess the influence of ROH detection using a fixed sliding window (approach implemented in PLINK software 76 ) in our results, the ROH detection was repeated with an approach based on run detection in consecutive SNPs, implemented in the package detectRUNS 78 of R software v3.4.4 79 . This procedure revealed F ROH values highly correlated (0.98) with those estimated in PLINK 76 , and essentially the same ROH islands ( Supplementary  Fig. S3, Tables S4).
Population substructure. Principal component analysis was applied to the GRM to investigate the population sub-structure. Additionally, we performed PCA after adding publicly available genotypes of Holstein 80 and Nelore 81 to compare the stratification level between the subpopulations of a single breed with the stratification across different breeds (Supplementary Fig. S1).
Analyses of linkage disequilibrium. Phased haplotypes were used to estimate the LD extent and persistence of phase, applying publicly available scripts at https:// www.msu.edu/~steibelj/JP_files/LD_estimate.htm. A measurement of LD was estimated for all syntenic SNP pairs by the genotype squared correlation (r 2 ), as proposed by Hill and Robertson 82 . The decay of LD was then analyzed in each subpopulation using r² averages within bins according to the physical distances of SNP pairs, including 10 Kb intervals starting from 10 to 100 Kb, and the intervals 100-200 Kb, 200-500 Kb and 500-1000 Kb. To assess the effect of MAF on LD, the r 2 was calculated with SNPs of minimum MAF higher than 0.01 (N = 30,657) and 0.05 (N = 28,048) in both subpopulations.
The consistency of the linkage phase for pairs of loci across subpopulations was analyzed through genotype correlation (r ij ). We estimated r ij between all pairs of SNPs spaced for shorter intervals than 10 Mb in both subpopulations. Then, the persistence of phase was estimated within intervals of 100Kb, starting from 0.1 to 5 Mb, as Pearson's correlation of r ij for each subpopulation, as follows:R where R m,n is the correlation of phase between r ij in the subpopulation m (Brazilian) and n (Canadian); r m ( ) and r n ( ) are the averages of r ij within interval p in the subpopulation m and n, respectively; and s (m) and s (n) are the standard deviation of r ij in the subpopulation m and n, respectively 83 . Contrasts between regional patterns of LD of the two subpopulations were performed through the statistic varLD 84 , implemented in varLD software 85 . The varLD method consists of building symmetric LD matrices in each subpopulation, with r 2 of all pairs of SNPs within windows defined based on a fixed number of markers, sliding in one SNP interval. Next, eigenvalues of each symmetric matrix are computed per subpopulation, and raw varLD scores assigned as the sum of absolute differences between ranked eigenvalues of the homologous matrices. Finally, each raw varLD score is compared against the distribution with all windows values. Since one of the premises of varLD statistic is the establishment of windows encapsulating the same number of SNPs across the genome 84 , we choose to define windows of 15 SNPs (mean length of 1095.9 ± 389.9 Kb), instead of windows with fixed length size. Genomic regions with at least two consecutive windows falling in the 99.9 th percentile of varLD were considered indicative of divergent selection. Preliminary analyses carried out using a range of SNP numbers (15, 20, 25 and 30) yielded consistent results related to the majority of signals, with the shortest boundaries of the detected signals associated with the windows of 15 SNPs (Supplementary Fig. S4).
Allelic frequency differentiation. The proportion of total variance in allele frequencies due to differences between subpopulations was quantified by applying the Wright's fixation index (F ST ), estimated according to Cockerham and Weir 86 using the "fsthet" package 87 in R software, v3.4.4 79 . F ST and heterozygosity were averaged in sliding windows set equal to those of varLD procedure (15 SNPs, sliding in one SNP interval). To control false positive discoveries in F ST test the empirical thresholds were defined based on the F ST -heterozygosity outliers approach 87 . Windows were grouped into 20 bins according to heterozygosity and genomic regions were considered as putative signals of divergent selection if at least two sequential windows fell into the 99.9 th percentile scores in its heterozygosity's bin. Preliminary analyses carried out using windows defined based on physical size (1000 Kb, 750 Kb of overlapping) and SNP number (15 and 30) yielded consistent results (Supplementary Fig. S5).
Genes and QTL identification. Genomic regions identified into the top 99.9 percentile of more than one of the tests (ROH, varLD, and F ST ) were considered selection signatures, as they are more likely of being true positive signals 16,88 . Known QTL overlapping the selection signatures were assessed in the CattleQTL database -release 40 89 , with QTL positions lifted from the former reference genome UMD3.1.1 to the new reference ARS-UCD 1.2. The Ensembl cow gene -set 99 90 , which is based on the new ARS-UCD1.2 reference, was used for gene prospection.
www.nature.com/scientificreports www.nature.com/scientificreports/ Two lists of candidate genes were evaluated as over-representation of biological processes and molecular pathways, using the DAVID v.6.7 91 and PANTHER v.13.1 92 . One list included 26 genes surrounded by ROH islands common for the two subpopulations and a further list included 73 genes of genomic regions divergently selected (ROH islands exclusively detected in one of the two subpopulations, F ST and varLD signals).

Data availability
The datasets used and/or analyzed during the current study are available upon reasonable requests to the corresponding authors.