Conservation priorities for endangered Indian tigers through a genomic lens

Tigers have lost 93% of their historical range worldwide. India plays a vital role in the conservation of tigers since nearly 60% of all wild tigers are currently found here. However, as protected areas are small (<300 km2 on average), with only a few individuals in each, many of them may not be independently viable. It is thus important to identify and conserve genetically connected populations, as well as to maintain connectivity within them. We collected samples from wild tigers (Panthera tigris tigris) across India and used genome-wide SNPs to infer genetic connectivity. We genotyped 10,184 SNPs from 38 individuals across 17 protected areas and identified three genetically distinct clusters (corresponding to northwest, southern and central India). The northwest cluster was isolated with low variation and high relatedness. The geographically large central cluster included tigers from central, northeastern and northern India, and had the highest variation. Most genetic diversity (62%) was shared among clusters, while unique variation was highest in the central cluster (8.5%) and lowest in the northwestern one (2%). We did not detect signatures of differential selection or local adaptation. We highlight that the northwest population requires conservation attention to ensure persistence of these tigers.


I) SNP calling and selecting samples for analysis
The total number of samples (tissue/ blood) obtained for the study was 54 (table S1). Depth was initially calculated in the program Stacks 1 from the log file generated at the end of the ref_map.pl script. For each sample, mean coverage depth value was extracted from this file, and has been plotted in figure S1. The number of unique loci identified per sample has been plotted in figure S2. Here, four samples (Corbett -1, Sunderbans, Melghat and NSTR) did not have any reads matching the catalog (in Stacks) and hence were not plotted. For the SNPs called in Stacks, relatively few SNPs passed the minimum depth criteria for the allowed level of missing data (depth=5, missing data allowed=30%). Therefore, we used Some of the samples discarded from our dataset were single representatives from their protected areas. We attempted to retain these samples with low data (very few reads mapping to the tiger genome), for a second datasetretaining more individuals at a few SNPs common across the samples. To do this, we identified regions present in each sample that were in common with SNPs in vcf file from the samples finally used. This vcf file was called allowing for 50% missing data to capture more SNPs. Bedtools was used to identify regions in common with the vcf file for each of these samples (Table S2). The bedfiles were then compared to look for overlap between the samples. However, with the exception of one region that was found to be common between 2 samples, there were no overlaps among the samples with low data.
Since no common regions could be identified, regions identified for each sample were combined into a single bedfile, and these regions were extracted from the larger vcf file containing all the samples. However, this led to a vcf file having over 11,000 SNPs at 20% missing data allowed per SNPs. The missing data for the samples with low data however was extremely high. Hence we were unable to incorporate these samples into the analysis.   Low data Sunderbans_1 Low data Sunderbans_2 Low data Bandhavgarh_1 Low data Chandrapur Low data Kanha_2 Low data Melghat Low data Nagpur Aligns with N cluster, even though geographically part of CI. However, protected area of origin unverifiable. Pench_4 Low data Tadoba Low data NSTR Low data BRT Low data Wayanad_1 Low data Wayanad_6 Low data

II) Estimation of summary statistics (pair-wise relatedness, inbreeding and He) in each population
Relatedness and inbreeding were estimated using the software PLINK 4. The p^ estimator from PLINK indicated that the samples included a recapture (p^ ~ 0.9). Therefore, one individual from this pair (Ranthambore_2) was removed from all analyses. Pairwise relatedness is plotted as a heat map in figure S4. As can be seen, pairs in Ranthambore have higher relatedness. Inbreeding coefficients for individuals were grouped into 5 regions and have been plotted as a boxplot in figure S5. While the inbreeding coefficient is highly variable within each group, Ranthambore (NW) shows overall high values. Expected heterozygosity for a five cluster scenario is presented in Table S6.

IV) Isolation by distance
The presence of Isolation by distance was tested using a Mantel test in Adegenet 6 . As the relationship did not appear to be linear at all distance classes, a Mantel's correlogram was performed. Figure S6: Correlation of genetic distance (DPS) with geographic distance at different size classes of geographic distance. On the X axis, size classes of geographic distance are shown; Mantel correlation is given on the Y axis.

VI) Private allele richness analysis repeated assuming five clusters
When considering five clusters, resampling could only be done up to 6 times as the sample size from NE was a limiting factor. While estimates of private allele richness may not have reached an asymptote for all populations, NW consistently has lower richness. Central India was also tested separately to test if the variation had been completely sampled ( Figure  S10).

VII) Distribution of strength of association in the network
Community detection in network analysis (NetStruct 8 ) assigns each individual to a community. However, individuals may differ in their affinities to the assigned community, reflecting similarity to other clusters. In figure 3a (i), two central Indian tigers are assigned to the NW cluster. However, it can be seen in figure S11 (i) that the strength of association of these individuals to the NW cluster is very low. Similarly, the N and CI clusters at higher thresholds have individuals which have low affinity to the assigned cluster. This likely reflects hierarchical structure, as at lower thresholds, N, NE and CI are assigned to a single C cluster. Figure S11: Strength of Association Distribution for each inferred cluster (NetStruct). Cluster numbers are shown on the X axis and association to the cluster on the Y axis. Colours have been matched to figure 3 for clarity. The genetic similarity thresholds at which community detection was performed in each instance are depicted above each panel and correspond to the networks in Figure 3a.

VIII) Genetic differentiation in peninsular India examined with the inclusion of a sample from Nagarjunsagar Srisailam Tiger Reserve (NSTR)
As discussed in the main text, our data suggests high differentiation between central and southern India. However, the only sample from NSTR, a population straddling the gap between central and southern India, could not be included in the analysis as it had very few reads remaining after filtering. From the final set of SNPs, the NSTR sample had only 88 SNPs. Admixture 9 analysis with NSTR included in the data still supports three clusters -NW, C and SI. NSTR derives nearly equal proportions of ancestry from C and SI (figure S12). However, as NSTR has a very large proportion of missing data, this is far from conclusive. We also repeated the analysis with only these 88 SNPs. In this case, K = 1 had the highest support, indicating that 88 SNPs have low power to detect structure.
Wayanad_4 Satyamangalam Figure S14: PCA of climate and vegetation data for all points. Red points represent regular points across India that fall outside protected areas. Names of tiger reserves are specified on the plot enclosed in rectangles.

X) Population subdivision re-examined after removal of related individuals
As mentioned in the main text, it is difficult to sample wild tigers for tissue and our sampling is opportunistic. This has resulted in the inclusion of some related individuals in the data (Fig. S4). It also revealed the presence of a recapture which was then discarded from the data. While removing all related individuals for all analyses would have reduced the sample size quite a bit, we re-assessed population structure using just the unrelated individuals (p^ < 0.2) -23 samples. For this reduced sample size, Admixture was unable to detect any structure. Therefore, we used FastStructure 10 for this purpose. We also analyzed the larger dataset with this algorithm to ensure that the results are comparable to that obtained from Admixture. Both the figures are shown below. For both analyses, FastStructure identified the model complexity that maximizes the marginal likelihood to be 4. For the larger dataset, K=3 was identified as the most likely value of K to describe the data, whereas 4 was the most likely value of K with FastStructure. However, the admixture proportions for each individual at K=4 were similar in both analysis. Figure S17: Genetic clusters inferred at K=4 through FastStructure. All samples from the main study are included here. Figure S18: Genetic clusters inferred at K=4 through FastStructure for only unrelated individuals (p^ < 0.2).
XII) Inference of demographic history using a site frequency spectrum-based method -DADI.
To derive the site frequency spectrum, SNP calls can directly be used. However, in the case of low coverage data such as ours, it is better to use methods that derive the site frequency spectrum (SFS) from genotype likelihoods 11 . Therefore, we first used the programs ANGSD 12 and realSFS to filter the bam files and obtain the site allele frequency likelihoods. These are used to estimate the EM optimized folded SFS. This was imported it into the dadi 13 environment. We began with single population models to test for population size change in the past. However, none of the models could fit the data well and parameter estimates did not converge and would mostly hit against the specified bounds. A possible reason could be low sample sizes of the populations in question. A simulation-based study suggests that detecting recent bottlenecks would require more than 10 individuals per population 14 . It is possible that given the very recent bottleneck experienced by tigers, it would require a larger dataset than ours to examine the demographic history of these populations. Below is the fit of the data to a 2 epoch model from our dadi runs (Fig. S19). As can be seen from the figure, the sfs shows a huge deficit of low frequency alleles. The fit of the data to other one population models was similar.