High-density Genotyping reveals Genomic Characterization, Population Structure and Genetic Diversity of Indian Mithun (Bos frontalis)

The current study aimed at genomic characterization and improved understanding of genetic diversity of two Indian mithun populations (both farm, 48 animals and field, 24 animals) using genome wide genotype data generated with Illumina BovineHD BeadChip. Eight additional populations of taurine cattle (Holstein and NDama), indicine cattle (Gir) and other evolutionarily closely related species (Bali cattle, Yak, Bison, Gaur and wild buffalo) were also included in this analysis (N = 137) for comparative purposes. Our results show that the genetic background of mithun populations was uniform with few possible signs of indicine admixture. In general, observed and expected heterozygosities were quite similar in these two populations. We also observed increased frequencies of small-sized runs of homozygosity (ROH) in the farm population compared to field mithuns. On the other hand, longer ROH were more frequent in field mithuns, which suggests recent founder effects and subsequent genetic drift due to close breeding in farmer herds. This represents the first study providing genetic evidence about the population structure and genomic diversity of Indian mithun. The information generated will be utilized for devising suitable breeding and conservation programme for mithun, an endangered bovine species in India.


Population diversity parameters -Observed H o , expected H e , F IS and ROH. Observed (H o ) and
expected (H e ) heterozygosities estimated in the Indian mithun population (farm and field population) ranged from 0.25 to 0.17, and 0.25 to 0.18, respectively ( Table 2). The inbreeding coefficient estimates (F IS ) based on the observed versus expected number of homozygous genotypes, were found to be 0.06 ± 0.02 and 0.02 ± 0.01 for the farm and field animals ( Table 2). The mean F ST estimate between farm and field mithun populations was 0.03 ± 0.01.

Runs of homozygosity.
Runs of homozygosity (ROH) in the autosomes of 48 farm mithun and 24 field mithun animals were determined using PLINK1.9 19 and consisted of 139,350 SNPs in each of the farm and field data set after quality control.
The total length of ROH per animal was averaged within population in six sized windows: 250 kb-1 Mb Summary statistics of ROH observed are outlined in Table 3 (farm) and 4 (field). Average numbers of ROH per animal for these six categories were 1793.0, 182. 8 the field population has more ROH. Proportions of very long ROH are higher in the farm populations. This suggest an older history that may be shared between the farm and field populations, a period of relative depression of the effective population size for the field population, followed by a period of relatively higher close breeding in the farm population. The largest total length of autosomal ROH observed corresponded to approximately 27% of the genome length for farm and 23% for field animals. The average number of ROH of different lengths in farm and field mithuns are ploted to show their frequency distribution (Fig. 1).
Admixture Analysis. Unsupervised clustering analysis using ADMIXTURE software program were carried out for 10 populations (total 209 samples; Table 1) including one indicine cattle breed, Gir and two taurine breeds (Holstein and NDama) to identify possible indicine and taurine introgression in the mithun population. The analysis was done using 50k markers. The admixture analysis results of farm and field mithuns do not show any population structure (details not presented). The admixture analysis including other species grouped mithuns from farm and field together, and    Phylogenetic analysis using Treemix. Treemix 20 analysis was used to study the population splits and subsequent gene flows. We constructed a phylogenetic tree of all the bovine species without adding any migrations and assuming wild buffalo (OWB) as outgroup using Treemix analysis (Fig. 6). Threepop reveals no evidence of admixture (the smallest test statistics was +2.76). We also analysed to fit 0-5 migrations in Treemix, but there was no evidence of migration (details not presented). While two mithun populations and gaur were in one single clade, the indicine breed, Gir, was in one group, two taurine breeds were in one group, Bali cattle was in a group by itself, and yak and bison formed another group. The taurine breeds Holstein and NDama animals formed a separate group. Gir, an indicine cattle breed, branched out from taurine as expected.

Discussion
India has the largest mithun population (~97.57%) in the world (0.30 million, 19 th Livestock Census, 2012). Myanmar (0.96%, approx. 3000), Bangladesh (0.32%, approx. 1000), China (0.96%, approx. 3000) and Bhutan (0.18%, approx. 570) also home small populations of mithun in Asia 2 . Information regarding genetic diversity and population structure of Indian mithun remains scanty. A conservation programme for mithun has been initiated at Khonoma and Thevopisu villages, Nagaland in their native breeding tracts in India. Genetic improvement in mithun through a systematic breeding program could not be initiated under field conditions due to the small number of animals, uncontrolled mating, and their scattered distribution across remote and inhospitable hilly terrain.
Genetic diversity of two groups of Indian mithun population (farm and field groups) were assessed by expected heterozygosity (H e ), observed heterozygosity (H o ) and estimates of inbreeding (F IS and F ST ). F ST between two populations was low (0.03) indicating a close genetic connection between them 21 . Mithun is assumed to be a close evolutionary relative of gaur 3,22,23 . Hence, the BovineHD BeadChip was used to genotype mithuns and available literature on cattle was referred for comparison in the present study. Genetic diversity in terms of average heterozygosity in both mithun populations (0.17-0.25) was found to be similar to zebu and African taurine cattle 14 , and was in the same range as for taurine cattle from West Africa (from 0.18 for African Lagune to 0.22 for Somba cattle) or East Africa (0.24 for Sheko cattle) 10 . However, SNPs that are included in the analyses are polymorphic within Mithun, but originate from bovine and are thus expected to be old SNPs (polymorphic in the common ancestor of Mithun and Bos taurus) leading to biased allele frequency spectrum.   We used F IS as a measure to study inbreeding within farm and field mithun populations. Estimated F IS values indicated low levels of inbreeding. F IS was lower in the field (0.02) than in the farm population (0.06). We see some evidence of mating between relatives in the farm population, while there is little evidence of mating among close relative in the field population. The F IS values in our study were similar to that reported 24 in mithuns from Bangladesh using the Illumina BovineSNP50 BeadChip, mean expected heterozygosity of 0.148 ± 0.14 with a heterozygote deficiency of 0.06 (F IS ).
Lower heterozygosity values H o and H e of mithun population in our study compared to taurine and indicine cattle breeds as reported by the Bovine HapMap Consortium 12,13 and Korean cattle populations 16 may also be attributed to ascertainment bias in SNP discovery of the BovineHD BeadChip. Previously low heterozygosity estimates of Tunisian cattle populations were likewise attributed to ascertainment bias in the design of the BovineSNP50 BeadChip as polymorphic sites of African origin present in the genome of Tunisian cattle were not included in the chip 14 . The Bovine HapMap Consortium reported that nucleotide diversity in (Brahman) cattle, a crossbred of indicine breeds, is more than twice that observed within Holstein and Angus breeds 25,26 as expected.
Our study revealed a higher number of short and medium ROH (250 kb-1 Mb, 1-2 Mb, 2-4 Mb, and 4-8 Mb) than longer categories (8)(9)(10)(11)(12)(13)(14)(15)(16) Mb and >16 Mb) in both the farm and field mithun population. This reflects ancestors shared between the parents long ago, can be indicative of selective sweep, ancient inbreeding or bottleneck.  The farm population showed greater ROH length and numbers compared to field population, which was as per expectation due to higher level of inbreeding and small population size parental population in the farm. Similar observations in various cattle breeds were reported as short ROH are generally due to older haplotype relatedness, while longer ROH result from more recent inbreeding [27][28][29] .
Length and frequency of ROH provide information about demographic history as well as recent inbreeding in individuals 27,30,31 . Long ROH indicates consanguinity between an individual's parents 27 . Shorter ROH indicate more distant demographic and selective events after repeated fragmentation of chromosome segments by recombination 27 . In particular, recent inbreeding resulting from the mating of closely-related ancestors leads to a high occurrence of long ROH. On the other hand, very long ROH sometimes occur in outbred populations 32 .
In human, short ROH (<1.5 mb) is due to ancient linkage disequilibrium through inheriting parental common haplotypes, whereas, bigger ROH is due the history of related parents in recent times 33 . Level of inbreeding can be estimated accurately in cattle population without knowledge of the pedigree using Illumina BovineHD SNP genotyping assay 34 . F ROH , the genomic inbreeding coefficient, was found to be highly correlated with pedigree inbreeding in a small group of populations 30,34 .
Estimates of mean F ROH for the various categories (250 kb-1Mb, 1 Mb-2 Mb, 2 Mb-4 Mb, 4 Mb-8 Mb, 8 Mb-16 Mb and >16 Mb) in the farm and field mithun population were generally low, probably indicating larger effective population sizes in the past. We observed relatively higher mean F ROH for the 250kb-1Mb category in the farm mithun (0.335). Such short ROH reflect events at least 25 generations ago. With generation interval of approximately 5-10 years in mithun, this translates to selection 100 or 200 years in the past. It seems that probably mithun population has experienced a population bottleneck in the past.
Our results fit well with other genotyping studies in various breeds of cattle using Illumina BovineHD BeadChip and validated through Illumina BovineSNP50 BeadChip, viz. average autozygosity calculated from ROH (F ROH ) with lengths above 1 Mb was in intensely selected Brown Swiss breed (0.151-0.156) 33,35 and in Holstein cattle (0.116) 29 , while lower F ROH for unselected cattle breeds Pinzgauer and Tyrolean Grey domestic cattle (0.062 and 0.066, respectively) 35 , and the lowest in unselected, preserved Polish Red breed (0.057) 29 .
We found Indian mithun studied here constitute a genetically uniform group. Influence from taurine and indicine cattle was comparatively minor. Admixture analysis detected a small proportion of admixture of Indian mithun with indicine or taurine cattle. However, based on Treemix analysis there was no evidence of direct gene flow to Indian mithun from cattle, indicating the local tribal practice of crossing mithun with cattle under the field conditions. This crossing probably was with a cattle population not closely related to the breeds represented in our study. Treemix results indicated there was considerable genetic similarity between Indian mithun and gaur, and consistently placing mithun and gaur in the same clade.

Conclusion
To our knowledge, this is first study aiming to assess the genetic structure of Indian mithun and population diversity using the BovineHD BeadChip SNP array. Our results provide information about the genomic diversity, population structure and origin of Indian mithun inhabiting North Eastern Hilly region. We found evidence of admixture of taurine and indicine in some mithuns, signifying crossing with cattle under field conditions. We also showed that the Indian mithuns are having distinct genetic characteristics and common ancestry with gaur. There was a substantial amount of inbreeding detected as ROH, which has to be considered in future sustainable breeding and conservation programs for the species. Overall introgression from other bovine species into Indian mithun was limited. Our results are consistent with the mithun being domesticated from a population related to the extant gaur. Our results do not support the hypothesis that mithun originated from crossing gaur bulls with indigenous cattle.
Our study provides a comprehensive picture of the genetic structure and population diversity of Indian mithuns and their phylogenetic relationship with other bovine species.

Methods
Study Populations and sampling. The mithuns were selected randomly from a randomly mating population maintained at the ICAR-NRC on Mithun research farm, Medziphema, Nagaland (n = 48) and a diverse field population (n = 24) from four locations of North Eastern Hill Region (Nagaland, Arunachal Pradesh, Manipur and Mizoram) falling in the native mithun breeding tracts of India. All the experiments were performed in accordance with relevant guidelines and regulations approved by the Institutional Animal Ethics Committee, ICAR-NRC on Mithun, Nagaland. BeadChip (777 k). DNA extraction. Blood samples were collected from jugular vein in a vacutainer tube (BD) containing EDTA by a qualified veterinarian and were transported to the laboratory in cool pack as soon as possible. The DNA was isolated using a standard blood DNA isolation kit (Promega #A1620) as per the manufacturer's instructions.

DNA processing and genotyping with BovineHD
Genotyping and Quality Control. Genomic DNA from each mithun was quantified to assure a concentration of at least 50 ng/µl genomic DNA required by the Illumina ® Infinium ® SNP genotyping platform. A total of 72 mithun (39 males and 33 females) were genotyped by Sandor Lifesciences Pvt. Ltd., Banjara Hills, Hyderabad, India with the Illumina ® BovineHD BeadChip assay, following the manufacturer's protocol. Genotypes were called using the validated standard cluster file provided by the manufacturer. Only the autosomal SNPs were considered in this study.
Samples and marker based quality control was performed using the Illumina's Genome Studio software (https://support.illumina.com/array/arraysoftware/genomestudio/documentation.html). Genome Studio was also used to generate PLINK1.9 data files (in.ped and.map format) 19 for further analyses. The SNPs located on sex chromosomes, not polymorphic or without known position in the cattle genome were excluded from further analysis. The Bos taurus genome assembly (UMD3.1; http://www.ensembl.org/Bos_taurus/Info/Index) was used as a reference genome due to the absence of a published mithun genome.
To identify closely related individual, a pair-wise identity-by-state (IBS) distance analysis was performed using PLINK1.9 19 . No closely related individuals were detected, based on a significance test criterion of whether two individuals belong to the same population (i.e. do not merge clusters that contain significantly different individuals). SNP markers showing deviation from Hardy-Weinberg proportions (HWP) based on parental genotype data (p < 0.001) were removed. The attributes considered for quality control finally include filtering of SNPs with call rate ≥95%, MAF ≥5% and HWP ≥0.001 using PLINK1.9 software. After this filtration 139,350 polymorphic SNPs remained for analysis.
Population genetics analysis. The parameters estimated to study the genomic diversity in mithun populations included estimates of heterozygosity, detection of runs of homozygosity (ROH) and estimates of inbreeding coefficients.
The observed (H o ) and expected (H e ) heterozygosities, as well as the estimates of inbreeding for mithun populations were estimated using PLINK1.9 software 19 . The F ST was estimated using PLINK software, which uses method introduced by Weir and Cockerham (1984) 36 .
Detection of autozygosity in genomic region. Runs of homozygosity (ROH) were detected to determine the extent of autozygosity in the mithun populations. A ROH is defined as a contiguous length of homozygous genotypes. A ROH of sufficient length indicates the two copies of the chromosome in this region are identical-by-descent (IBD) 30 . The length and frequency of ROH is useful to get insights into the history of inbreeding of an individual and the population. PLINK1.9 19 was used to identify ROH by running a sliding window that scans the genomic distribution of SNP data to identify stretches of homozygous SNPs. A minimum number of 10 consecutive homozygous SNP and zero heterozygotes were allowed in each window. A maximum gap between SNP of 1,000 kb was allowed. With high-density SNP data this approach mostly detects truly autozygous segments 37 . Moreover, this strategy is particularly suitable for livestock populations because they have much higher levels of autozygosity than model organisms making identification of longer ROH easy 35,37 .
Detected ROH were classified in windows: ROH in the range of 250 kb- In the present study, genomic autozygosity, F ROH of each individual was estimated as the sum of length of autosomal ROH divided by the total length of the autosomes covered by markers 34 .
Population structure and origin of Mithun. The main goal of the present study was genomic characterization of two mithun populations (farm and field). Another eight Illumina ® BovineHD BeadChip or BovineSNP50 BeadChip genotype data sets were collected from previously published work, available at [http://widde.toulouse. inra.fr/widde/] 38 Table 1). The wild gaur (Bos gaurus, OGR, n = 6) was included assuming contribution of gaur to mithun genomes as its presumed wild relative; while yak (Bos grunniens, OYK, n = 4), bison (Bison bison, OBB, n = 4) and Bali cattle (Bos javanicus, BLI, n = 20) were taken into account to find out possible genetic contribution towards mithun genome, if any. As it is assumed that there might be some introgression due to crossing between mithun and cattle under the field conditions, the indicine cattle, Gir (Bos indicus) genotype data was included. Taurine cattle (NDama, NDA, n = 23; Holstein, HOL, n = 20) were included to identify any taurine introgression and anoa, the wild buffalo (Bubalus depressicornis, OWB, n = 10) was included as an outgroup in the present study.
Genotype data from a total of 10 populations were included in subsequent analysis (Table 1). Mithun genotype data were merged with the other data into a single dataset by retaining markers shared between the HD and the 50k panels. The final data included total 209 samples and 38,587 SNPs. Markers and individuals were removed from the dataset using PLINK1.9 software 19 if they did not have call rate of at least 95%. No filtering was done at this step for MAF, except SNPs monomorphic across all breeds. Summary statistics such as heterozygosity and inbreeding coefficient were computed and results are presented ( Table 2). For principal component analysis, the dataset included total 144 samples keeping the maximum number of individuals set to 20 (Table 1). Population structure was inferred in two ways: principal component analyses (PCA) by smartpca from the Eigenstrat package 39 and ADMIXTURE v. 1.23 18 . PCA uses an orthogonal transformation to convert a set of correlated variables into a set of linearly uncorrelated variables called principal components and maps individuals onto these major axes of variation. In contrast, ADMIXTURE provides maximum likelihood estimates of individual ancestries.
Phylogenetic analysis was carried out using Treemix 20 and wild buffalo as an outgroup with between 0 and 5 migrations allowed. For Treemix analysis we used 21,939 SNPs polymorphic in mithun out of 38,587 SNPs used in PCA and ADMIXTURE analysis. 18 samples (3 Bali cattle, 9 mithuns from farm and 6 mithuns from field) showing admixture were removed from the Treemix analysis. Nodes robustness was estimated with 500 bootstrap replicates and plotted using the Treemix bootstrap function of BITE 40 . Admixture detection was done using the F3 statistics computed using threepop program from the Treemix software package.

Ethics approval and consent of mithun owners. Approval of Institutional Animal Ethics Committee
was obtained for this study including collection of mithun blood samples and DNA extraction from the Institute research farm. Blood samples of the mithun were collected from the farm and with the permission of the mithun owners from the field by qualified veterinarians. Data availability. Data supporting this paper were generated by ICAR-NRC on Mithun. The phenotype and genotype data are available with the Data Cell of the Institute and should be requested directly from the corresponding authors or the ICAR-NRC on Mithun.