Although edge-tolerant species are known to benefit from habitat fragmentation, less is known about the population genetic impacts fragmentation may exert on edge-tolerant species. We examined the landscape genomic structure of an edge-tolerant forest-dependent bird species, the Striped Tit-Babbler Mixornis gularis, in the heavily urbanized island of Singapore to determine if two centuries of fragmentation have led to signs of isolation and loss of population-genetic diversity in different parts of the island. We obtained a high-quality complete reference genome with 78x coverage. Using almost 4000 SNPs from double-digest RAD-Sequencing across 46 individuals, we found that the population has likely experienced a recent contraction in effective population size and presently exhibits low population genetic diversity. Using empirical and simulation-based landscape genomic analyses, we also found that the subtle population genetic structure observed in the Striped Tit-Babbler population in Singapore is likely driven by isolation by distance resulting from limited dispersal. Our results demonstrate that population genetic impoverishment and subdivision can accumulate at relatively rapid rates in edge-tolerant bird species such as the Striped Tit-Babbler as a result of fragmentation, and that subtle spatial genetic structure can be detected over fine spatial and temporal scales using relatively few multilocus genomic SNPs.
Anthropogenic habitat fragmentation is a key driver of biodiversity loss worldwide1,2,3. Organisms living in fragmented landscapes experience reduced gene flow between remnant habitat patches4, which may result in reduced genetic diversity, increased likelihood of inbreeding depression, higher susceptibility to stochastic environmental change, and an elevated risk of localised extirpation4.
The time lag between physical fragmentation and the manifestation of deleterious effects and localised extirpation results in fragmented landscapes incurring an extinction debt that may only be realised after decades3,5,6,7,8. Sensitive species often go rapidly extinct after initial habitat loss, followed by the gradual decline of less sensitive species due to the effects of isolation and other ecological factors9,10,11,12. Assessing and predicting fragmentation impacts is therefore challenging as organisms respond differentially depending on their life history traits, as well as the spatial and temporal scale of fragmentation.
While edge-tolerant species are assumed to be well-adapted to habitat fragmentation – many studies show a positive correlation between the abundance of edge-tolerant species and fragmentation effects13,14,15,16 – few have explored the population genetic impacts of fragmentation on these ostensibly resilient species, especially at fine spatial and temporal scales (see Harrisson et al.17 for an exception). The time lag between fragmentation and the detection of its effects, compounded by the relative abundance of edge-tolerant species, may result in (1) allele frequency-based efforts using few marker loci failing to detect subtle signals of isolation and genetic divergence at shallow temporal and spatial scales and (2) edge-tolerant species being overlooked in long-term conservation plans.
In this study, we combine RAD-Sequencing18 with whole-genome sequencing and individual-based landscape genomic approaches19,20 to investigate the impact of fragmentation on the population genetic structure of an abundant edge-tolerant passerine, the Striped Tit-Babbler (Mixornis gularis), in Singapore. A highly social insectivore, the Striped Tit-Babbler is a widespread resident of dense scrub and secondary woodland habitats across Southeast Asia21. Its affinity for disturbed and degraded habitats means that the species is abundant across its range and is not considered a species of conservation concern22. Although little is known about the species’ breeding or dispersal ecology, it is thought that the Striped Tit-Babbler breeds cooperatively23, and like other babblers (family Timaliidae) is a weak disperser on account of its short wings and sedentary habits24, which should increase the species’ susceptibility to fragmentation. The city-state of Singapore provides an ideal landscape for investigating this owing to its history of intensive habitat fragmentation spanning approximately 200 years, with agriculture-driven fragmentation dominating for the first 150 years, followed by urbanisation-driven secondary fragmentation25. This has given rise to a heterogeneous landscape consisting primarily of a heavily streetscaped urban matrix with young and maturing secondary forest fragments interspersed throughout. As one of the few woodland-dependent songbird species to have maintained healthy population levels in Singapore (4,000 to 10,000 individuals, mean estimated population density of 0.94 individuals ha−1 forest26 (Fig. S19, Supplementary Information)) in spite of extensive habitat loss and fragmentation, the Striped Tit-Babbler has likely benefited from the forest edges and secondary forests created by early fragmentation. 
However, it is not known whether the Striped Tit-Babbler population in Singapore constitutes multiple isolated subpopulations or a single metapopulation.
Using thousands of genome-wide SNPs and a newly-sequenced genome, we analyzed 46 Striped Tit-Babblers sampled from forest patches across Singapore’s north-south axis (Fig. 1) to characterize the population-genetic effects of recent secondary fragmentation in a mosaic of patches of different size and age. We used approximate Bayesian computation to assess the demographic history and effective population size (Ne) of the Striped Tit-Babbler population in Singapore, and calculated individual and population-based divergence statistics to investigate the degree of genetic subdivision within the population. These results were compared against models of landscape structure to assess how landscape configuration affects genetic connectivity between forest fragments. To test the robustness of our SNP dataset and the informativeness of multilocus SNPs, we also assessed if varying the number of SNPs used would affect downstream population genetic inferences.
We ringed a total of 66 Striped Tit-Babblers across 19 unique sampling localities. Accounting for historical ringing records (unpublished data, National Parks Board of Singapore), 26 individuals were recaptured across 37 recapture events between the years 2010 and 2014. All recaptures occurred at the original locality of capture, suggesting a highly sedentary lifestyle. One recapture occurred eight years after the original ringing, suggesting that the species is relatively long-lived in the wild.
After filtering out low-quality and clonally duplicated reads, a total of 86 Gb of sequence data (78x coverage) was obtained for de novo assembly of the Striped Tit-Babbler genome. Assembly quality and completeness were assessed for each of the three candidate assemblies by checking read pair coverage and supporting evidence (Table 1). Based on standard contiguity metrics, the ALLPATHS-LG assembly outperforms the other two assemblies, with an N50 of 3 Mb and 9542 scaffolds, making it the assembly with the fewest but longest sequences (Table 1). In addition, to evaluate assembly correctness, we used FRCurves to plot regions of suspected mis-assemblies (features) against the coverage depth (Fig. S7, Supplementary Information). Here too the ALLPATHS-LG assembly performs best, with the FRCurve indicating better genome coverage with fewer suspect errors introduced relative to the ABySS and SOAPdenovo assemblies. This Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under the accession QVAJ00000000. The version described in this paper is version QVAJ01000000.
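For readers less familiar with the contiguity metric quoted above, N50 is the scaffold length L such that scaffolds of length L or longer together contain at least half of all assembled bases. The following is a minimal Python sketch of that definition. The function name `n50` and the toy scaffold lengths are illustrative assumptions; they are not taken from the NouGAT pipeline or from Table 1.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    N50 is the largest length L such that scaffolds of length >= L
    together contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example with hypothetical lengths (bp), not the real assembly.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_scaffolds)
```

Because N50 rewards a few long scaffolds, it is usually read alongside the scaffold count, as in the comparison above.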
We obtained 386,020,580 paired-end Illumina reads of 100 base-pair length each across 47 libraries (46 individuals plus one replicate specimen), of which 325,488,846 reads (84.3%) were retained after quality control, filtering, and trimming. The number of retained reads per individual ranged from 4,225,764 to 14,316,443, and applying further quality control and aligning these reads to the reference genome resulted in 3,017,218 to 9,898,116 reads per individual successfully mapping to the reference genome. Assembling mapped reads into loci, calling SNPs, filtering for no missing data, and filtering for linkage disequilibrium resulted in an output SNP matrix containing 3849 loci. Bayescan did not detect any loci under selection, and while some SNP loci may nonetheless be closely linked to loci with fitness effects, we assumed that all loci were neutral in the absence of better methods for testing this. Filtering for half-sibs resulted in 11 individuals being pruned, giving a reduced SNP matrix comprising 35 individuals. Comparing inbreeding coefficients showed that most individuals exhibit relatively low levels of inbreeding (Table S2; Fig. S9, Supplementary Information; mean TrioML coefficient = 0.0176), although individuals from the Admiralty Park subpopulation appear to be significantly more inbred (mean TrioML coefficient = 0.101), with one individual exhibiting an extreme TrioML coefficient of 0.160 (Fig. S9; Table S2, Supplementary Information).
Population Genetic Structure and Genetic Diversity Statistics
The first three PC axes (comprising 4.00%, 3.78%, and 3.72% of the total variance, respectively) indicate that most of the Striped Tit-Babblers sampled fall into a single cluster with little to no substructure (Fig. 2). However, the PCA plot also indicates that all the individuals sampled from Admiralty Park fall out of the central cloud as outliers (Fig. 2). Because the Sentosa Island individuals clustered together with those sampled from the Southern Ridges (Fig. 2B), we subsequently merged the two groups into a single “Southern” subpopulation (Fig. 1).
The AMOVA results show that 97.1% of the genetic variation was contained within individuals, while 2.3% of genetic variation occurred among subpopulations (FST = 0.023, p = 0.001), indicative of low levels of population genetic subdivision. Within-subpopulation inbreeding was only marginally statistically significant and accounted for only 0.6% of the total genetic variation (FIS = 0.006, p = 0.045), suggesting relatively low levels of non-random mating within subpopulations.
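The three percentages above can be cross-checked against the standard hierarchical relationship between F-statistics, (1 − FIT) = (1 − FIS)(1 − FST). The among-subpopulation share corresponds to FST, the among-individual share to FIT − FST, and the within-individual share to 1 − FIT. The Python sketch below applies this textbook identity to the reported FIS and FST. It does not reproduce GenoDive's estimators, and the function name is an illustrative assumption.

```python
def total_inbreeding(f_is, f_st):
    """Return FIT from the identity (1 - FIT) = (1 - FIS) * (1 - FST)."""
    return 1.0 - (1.0 - f_is) * (1.0 - f_st)


# Values reported in the AMOVA paragraph.
F_IS = 0.006
F_ST = 0.023

f_it = total_inbreeding(F_IS, F_ST)
within_individuals = 1.0 - f_it          # share of variance within individuals
among_individuals = f_it - F_ST          # share among individuals within subpopulations
among_subpopulations = F_ST              # share among subpopulations
```

Under these assumptions the three shares come to roughly 97.1%, 0.6%, and 2.3%, matching the reported partition.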
Using the SNP matrix containing all 46 individuals, the overall mean nucleotide diversity (π) for polymorphic loci was 0.2311, with within-subpopulation π values ranging from 0.2093 (Admiralty) to 0.2306 (Central Catchment) (Table 2). The overall observed heterozygosity value (Hobs) was 0.2269, with within-subpopulation values ranging from 0.2096 (Admiralty) to 0.2294 (Central Catchment) (Table 2). For both fixed and polymorphic loci, πoverall was 0.0026, with within-subpopulation values ranging from 0.0023 (Admiralty) to 0.0026 (Central Catchment), while Hobs was 0.0025 with a within-subpopulation range of 0.0023 (Admiralty) to 0.0025 (Central Catchment and Southern) (Table 2). The similarity of within-subpopulation π and Hobs values across subpopulations suggests that Striped Tit-Babbler subpopulations in Singapore do not differ significantly from one another, although the Admiralty Park subpopulation appears to consistently exhibit the lowest levels of genetic diversity. We obtained similar π and Hobs values for both the kin-filtered and unfiltered analyses, suggesting that π and Hobs are both largely unaffected by filtering for relatedness (Table 2).
Interestingly, the mean Admiralty Park TrioML inbreeding coefficient (Fig. S9; Table S2, Supplementary Information) is consistent with Hobs (Table 2). Since the mean genome-wide heterozygosity of a population (H) and realised mean level of inbreeding (F) can be related in the equation H = H0 (1 − F), assuming the Central Catchment subpopulation to be H0 on account of it being the most genetically diverse subpopulation in Singapore, solving for F for both the unfiltered and kin-filtered datasets gives 0.09 and 0.10 respectively, both of which are close to the empirically derived TrioML mean inbreeding coefficient of 0.101 (Table 2; Fig. S9; Table S2, Supplementary Information).
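The rearrangement used here is F = 1 − H/H0. A short Python sketch with the unfiltered Hobs values quoted in the text (0.2096 for Admiralty Park and 0.2294 for the Central Catchment) is given below. The function name is an illustrative assumption, and the kin-filtered calculation is omitted because those heterozygosity values appear only in Table 2.

```python
def inbreeding_from_heterozygosity(h, h0):
    """Solve H = H0 * (1 - F) for F, given observed and baseline heterozygosity."""
    if h0 <= 0:
        raise ValueError("baseline heterozygosity must be positive")
    return 1.0 - h / h0


# Unfiltered dataset: Admiralty Park versus Central Catchment (taken as H0).
f_unfiltered = inbreeding_from_heterozygosity(0.2096, 0.2294)
```

This gives F of about 0.086, which rounds to the 0.09 reported for the unfiltered dataset.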
Pairwise FST values calculated using the full (n = 46) SNP matrix ranged from 0.0124 (FST-CentralCatchment-Southern, 95% CI: 0.0105–0.0144) to 0.0606 (FST-Southern-Admiralty, 95% CI: 0.0525–0.0688), with relatively low pairwise FST values between the Central Catchment and the Southern subpopulation indicative of overall low population subdivision between the two largest forest patches along the north-south axis of Singapore (Table 3; Fig. S20, Supplementary Information). In contrast, moderately high (>0.05) pairwise FST values between Admiralty Park and all other forest patches suggest increased population subdivision between Admiralty Park and any other sampled forest patch. Filtering the SNP matrix for kin resulted in lower absolute FST values, ranging from 0.0073 (FST-CentralCatchment-Southern, 95% CI: 0.0052–0.0094) to 0.0261 (FST-Southern-Admiralty, 95% CI: 0.0169–0.0352), although relative pairwise differences between subpopulations remained largely the same (Table 3).
Recalculating population genetic statistics, pairwise population differentiation and mean inbreeding coefficients using varying numbers of randomly subsampled loci, we found that the Hobs, π, pairwise FST, and TrioML inbreeding coefficients were precise for 500 or more SNP loci, even for sample sizes as small as three individuals (Figs. S10–S13, Supplementary Information).
Analyses of Gene Flow
We identified two first-generation migrants: one individual from Sentosa island (sample K1120) within the Southern subpopulation and one from Admiralty Park (sample K1104), both of which were inferred to have originated from the Central Catchment. This pattern is indicative of recent gene flow from the Central Catchment toward peripheral subpopulations.
Demographic History of the Striped Tit-Babbler
PCA for pre-evaluation of scenarios revealed that the observed data fall within the prior space of the contraction scenario (Scenario 2; Fig. S14). Model comparison with both rejection and logistic regression revealed overwhelming evidence of recent population decline (Fig. S15, Supplementary Information). Further posterior predictive checks using default parameters revealed that the population decline scenario has no associated error, further adding confidence to our scenario choice (Table S4, Supplementary Information). Plots of prior versus posterior revealed that for the best scenario, all posterior sampling of parameters fell within the prior space (Fig. S16, Supplementary Information). Although the confidence intervals are wide, median parameter estimates suggest that the present Ne is roughly 200-fold (more than two orders of magnitude) lower than the ancestral Ne (approximately 544 effective contemporary individuals compared with 107,000 effective ancestral individuals), and that the population suffered a decline approximately half a century ago (Table 4).
Independently estimating Ne from the observed data using the linkage disequilibrium method, the full (n = 46) dataset results in a population-wide Ne of 101.5 (95% CIs: 100.8–102.2). Using the kin-filtered (n = 35) dataset results in a population-wide Ne of 596.1 (95% CIs: 569.4–625.4).
Landscape Genetic Analyses
Spatial autocorrelation analysis indicated that the Striped Tit-Babbler exhibits significantly positive spatial autocorrelation for the first distance class (0–1 km) and significantly negative spatial autocorrelation for the fourth, and the seventh to twelfth distance classes (Fig. 3), consistent with observational inferences about the poor dispersal ability of the Striped Tit-Babbler24. Spatial autocorrelation in the second and third distance classes did not differ significantly from zero (Fig. 3). The spatial autocorrelation analysis remained largely invariant to the exclusion of the Admiralty Park individuals (Fig. 3B).
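To illustrate how pairs of sampled individuals are grouped before an autocorrelation coefficient is computed for each class, the Python sketch below assigns pairwise Euclidean distances to distance classes. It covers only the binning step, not the autocorrelation statistic itself. The equal 1 km class width, the half-open intervals, the planar coordinates in kilometres, and all names and toy coordinates are assumptions made for illustration; only the first class (0–1 km) is specified in the text.

```python
import math
from collections import defaultdict


def distance_class(distance_km, width_km=1.0):
    """Return a 1-based class index; class 1 covers [0, width), class 2 [width, 2*width), etc."""
    if distance_km < 0:
        raise ValueError("distance must be non-negative")
    return int(distance_km // width_km) + 1


def pairs_by_class(coords_km, width_km=1.0):
    """Group every pair of sites by the distance class of their separation."""
    classes = defaultdict(list)
    names = sorted(coords_km)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            d = math.dist(coords_km[a], coords_km[b])
            classes[distance_class(d, width_km)].append((a, b))
    return dict(classes)


# Hypothetical planar coordinates in km, not real sampling localities.
toy_sites = {"site_A": (0.0, 0.0), "site_B": (0.0, 0.5), "site_C": (3.0, 4.2)}
toy_classes = pairs_by_class(toy_sites)
```

In a full analysis, a genetic autocorrelation coefficient would then be estimated from the pairs in each class and tested against a permutation-based null.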
Optimising landscape parameters showed that the optimised model with urban resistance set to 90 and managed vegetation resistance set at 50 (R2adj = 0.0343) outperforms the preliminary model, with resistance values of 60 and 40 respectively (R2adj = 0.0231). The proportion of genetic variation explained by spatial factors for the optimised resistance model (R2adj = 0.0344, p = 0.000001) was higher than the Euclidean model (R2adj = 0.0275, p = 0.000001) and preliminary resistance model (R2adj = 0.0331, p = 0.000001), indicating that the optimised landscape resistance model provides a marginally better explanation for the spatial genetic signal observed in the Striped Tit-Babblers (Table S8, Supplementary Information). While this suggests that an IBD + IBR model best explains the landscape genetic structure of Striped Tit-Babblers, the model is closely correlated with the null IBD model (R2 = 0.8009, P = 0.0001), which may inflate the likelihood of type I error.
Landscape Genetic Simulations
We discarded seven of 100 replicate MCMC landscape genetic simulations due to stochastic extinctions of the Sentosa individuals. We observed an overall increase in FST for all pairwise subpopulation comparisons over time (Fig. 4), although the Admiralty Park subpopulation exhibits a higher accumulation rate of pairwise differentiation (Fig. 4).
Testing the CDPOP simulations for Type I errors, we detected false positive signals of IBD + IBR in 73.9% of the simulated IBD-only genotypes at generation 10, as well as in 93.97% of the simulated genotypes at generation 20, 98.9% of the simulated genotypes at generation 30, 95.6% of the simulated genotypes at generation 40, and 97.8% of the simulated genotypes at generation 50.
Based on the first genome assembly of the Striped Tit-Babbler Mixornis gularis, combined with population-genomic analyses and landscape genetic simulations from 46 individuals across Singapore, we detected a pattern of reduced Ne and subtle but noticeable IBD-driven population subdivision along Singapore’s North-South axis. That these patterns were detected at fine spatial and temporal scales illustrates the utility of genome-wide multilocus SNPs in illuminating the genetic impacts of habitat fragmentation prior to the manifestation of deleterious physical effects.
Coalescent simulations strongly indicate that the Striped Tit-Babbler population has experienced a recent contraction in Ne across Singapore, with median parameter estimates suggesting a decline of more than two orders of magnitude approximately 46 years ago. While there is uncertainty over the precision of these estimates, the estimated time of population contraction coincides with a period of intensive urbanisation in Singapore, and the reduction in habitat extent during this period was likely a major contributor to the contraction in Ne.
The population-wide Ne estimate of ~596 derived from the kin-filtered dataset using the linkage-disequilibrium method is consistent with the DIYABC-derived contemporary population-wide median Ne estimate of 544. The results indicate that contemporary Ne, and by extension overall genetic variability, is low relative to the census population size, with a conservatively estimated Ne/N ratio of 0.15 (assuming Ne = 596 and a census population size of 4,000 (Fig. S19, Supplementary Information)26). Although the Ne/N ratio is expected to be low in birds due to unequal sex ratios27, especially in cooperative breeders, the Striped Tit-Babbler population in Singapore nonetheless exhibits an Ne/N ratio 50% less than that observed in other similarly sedentary and cooperatively breeding birds such as the Splendid Fairy-wren (Malurus splendens) (Ne/N = 0.3)28 and Darwin’s Medium Ground Finch (Geospiza fortis) (Ne/N = 0.31)27,29, suggesting that the Striped Tit-Babbler population is likely more susceptible to the impacts of fragmentation than other species with similar life histories.
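The arithmetic behind the Ne/N comparison is simple enough to restate as a short Python sketch. It uses the values given in the text: Ne = 596, a census size between 4,000 and 10,000, and a Splendid Fairy-wren ratio of 0.3. The function and variable names are illustrative assumptions.

```python
def ne_to_census_ratio(ne, census_size):
    """Return the ratio of effective to census population size."""
    if census_size <= 0:
        raise ValueError("census size must be positive")
    return ne / census_size


# Lower census bound gives the larger, more conservative ratio.
ratio_conservative = ne_to_census_ratio(596, 4_000)
# Upper census bound gives an even smaller ratio.
ratio_upper_census = ne_to_census_ratio(596, 10_000)
# Tit-babbler ratio expressed as a fraction of the fairy-wren ratio.
relative_to_fairy_wren = ratio_conservative / 0.3
```

The ratio is about 0.15 at the lower census bound and about 0.06 at the upper bound, so the figure quoted in the text is the more favourable end of the range.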
AMOVA and principal component analyses (Fig. 2) indicate that most of the sampled individuals fall into a single cluster with little variation between individuals, pointing to the presence of at least modest levels of recent gene flow (as inferred from the presence of first-generation migrants) and an overall shallow population genetic structure30,31. However, significant pairwise FST values between subpopulations and reduced Hobs and π values in the Admiralty Park subpopulation point toward the existence of some population genetic subdivision between the forest patches along the north-south axis of Singapore.
PCA results (Fig. 2) show that the individuals sampled from Admiralty Park emerge as outliers despite strict filtering conditions to account for low coverage loci, kinship bias, linkage disequilibrium, and loci under selection. Additionally, significantly higher overall TrioML inbreeding coefficients of the Admiralty Park samples (Fig. S9, Supplementary Information), consistent with Hobs, as well as the presence of one individual exhibiting an inbreeding coefficient one order of magnitude higher than the mean Singapore-wide inbreeding coefficient (Table S2, Supplementary Information), provide evidence of a subpopulation in relative isolation.
We consider it unlikely that the signal of population subdivision observed in Admiralty Park is an artifact of low sampling coverage or of bias from PCR clones32,33: our study employs thousands of marker loci, and datasets of this magnitude have been shown to produce accurate FST estimates with sample sizes as low as two individuals34; furthermore, we sampled approximately 4–6% of the total Admiralty Park subpopulation26 (Fig. S19, Supplementary Information), and conducted triplicate PCR reactions to minimise the biasing effect of PCR clones.
In addition, Admiralty Park individuals consistently emerge as outliers on the PCA plot even when the highly inbred individual is replaced with its less inbred kin (sample K1105; Table S2, Supplementary Information) for the same set of SNP loci (Fig. 2B), strengthening confidence in our data.
The low genetic diversity of Admiralty Park and its comparatively high subdivision from the Central Catchment are surprising given their close proximity (minimum Euclidean distance of approximately 2.25 km) and the relatively recent isolation of Admiralty Park (approximately 20–40 years based on satellite imagery; Fig. S18, Supplementary Information). The heterogeneity of population genetic structure observed in the Striped Tit-Babbler population in Singapore indicates that spatial context plays a significant role in affecting inter-patch gene flow.
We find that the Striped Tit-Babbler population exhibits significant positive spatial autocorrelation over short distances (<1 km), even after the potentially aberrant Admiralty Park individuals are excluded (Fig. 3B). While empirical data on natal dispersal distances of tit-babblers are limited, this result is similar to that of the relatively closely related Abbott’s Wren Babbler (Turdinus abbotti)35 obtained from radio telemetry methods (200–700 m)36. Limited dispersal and highly sedentary behaviour in this species (inferred from recapture data) likely contribute to the statistically significant signal of IBD observed in the dbMEM analysis. Simulating the effects of IBD using forward-in-time landscape genetic simulations further shows that a pattern of relatively stronger population differentiation from the Central Catchment emerges in the Admiralty Park subpopulation within relatively few generations compared to the Southern subpopulation (Fig. 4), consistent with empirically observed population genetic structure (Table 3). Our landscape genetic analyses strongly suggest that IBD is a primary driver of subtle population genetic structure in the Striped Tit-Babbler in Singapore.
In Admiralty Park, limited dispersal (Fig. 3), small local Ne (estimated at 10 effectively breeding individuals based on a conservative Ne/N ratio of 0.15), and the paucity of forest patches within dispersing distance of other proximate source forests (Fig. 1) have likely resulted in the rapid accumulation of population subdivision over relatively few generations. In contrast, the weaker population genetic structure observed in the south may be attributable to the existence of “stepping stone” habitat patches (Fig. 1).
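One way to see that a local Ne of about 10 is plausible is a back-of-envelope calculation from figures stated elsewhere in the text: three individuals were sampled at Admiralty Park, representing approximately 4–6% of that subpopulation, and the conservative Ne/N ratio is 0.15. The Python sketch below carries out that calculation. It is an assumed reconstruction for illustration, not necessarily the calculation used to obtain the reported figure, and the function name is hypothetical.

```python
def local_effective_size(n_sampled, sampled_fraction, ne_over_n=0.15):
    """Estimate local Ne from a sample count, its fraction of the census, and an Ne/N ratio."""
    if not 0 < sampled_fraction <= 1:
        raise ValueError("sampled_fraction must be in (0, 1]")
    census_size = n_sampled / sampled_fraction
    return ne_over_n * census_size


# Three birds sampled, assumed to be 6% or 4% of the local census size.
ne_low = local_effective_size(3, 0.06)    # census of about 50 birds
ne_high = local_effective_size(3, 0.04)   # census of about 75 birds
```

Under these assumptions the local Ne falls between roughly 7.5 and 11, bracketing the value of 10 given above.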
As for the IBR signal observed in the dbMEM analysis, the high proportion of IBD-only landscape genetic simulations showing false positive signals of IBD + IBR suggests that the sampling strategy adopted in this study lacks sufficient power for discriminating between IBD and IBR, although IBR may still apply in our study system. This outcome highlights the importance of rigorous sampling design in landscape genetic studies37.
Our observation that a minimum of 500 to 1,000 loci are needed for precise estimates of most population genetic statistics (Figs S10–S13, Supplementary Information) suggests that relatively few loci are needed to derive precise estimates of population genetic statistics. This is consistent with the findings of Kardos et al.38 and Nazareno et al.34, who find that using large numbers of loci does not necessarily improve the statistical power of population genetic analyses34,38.
While the Striped Tit-Babbler population in Singapore is not presently threatened with extirpation39, an examination of genome-wide SNPs reveals that the species has experienced recent decline in Ne, has relatively low population-genetic diversity, and exhibits weak population genetic structure consistent with the effects of limited dispersal, suggesting that the population is susceptible to the effects of habitat fragmentation. Detecting these population-genetic signals in an abundant and edge-tolerant species at such fine spatial and temporal scales likely reflects the initial stages of fragmentation usually detected only in more sensitive species or at later stages in the local extirpation process. Other relatively edge-tolerant and abundant forest-dwelling species, avian or otherwise, may likewise be shown to experience such fragmentation effects once genome-wide data are applied. Our results are especially significant in the context of Southeast Asian biodiversity conservation owing to the rate at which forest habitats are being degraded and fragmented across the region40.
Our study on Striped Tit-Babblers in Singapore applies ddRAD-Seq and individual-based landscape genomic techniques to elucidate spatial genetic structure in an overlooked edge-tolerant forest-dwelling avian species across heavily fragmented tropical forest patches. We demonstrate that extremely fine-scale population genetic structure on the order of 2.3% of molecular variance can be detected using thousands of genome-wide marker loci. We hope that further investigations in this vein will lead to a better understanding of the fine-scale spatial effects of fragmentation and hopefully lead to solutions to ameliorate these impacts.
We conducted mist netting at forest fragments across Singapore between May 2013 and September 2014 (Fig. 1). We collected blood samples via brachial venipuncture and subsequently stored them at 4 °C. In addition, all mist-netted birds were uniquely ringed for recapture studies.
Additional DNA samples were obtained from muscle and liver tissues stored at the Lee Kong Chian Natural History Museum and from the carcass collection of the NUS Avian Evolution Laboratory. A total of six tissue samples were obtained, five from the cryogenic collection dating to October 2006 and one from the carcass collection from March 2014.
RAD-Seq Library Preparation
We extracted DNA using the Exgene Clinic SV kit (GeneAll Biotechnology) per the manufacturer’s protocol for blood and body fluid DNA extraction, with minor modifications for samples stored in 100% ethanol. For muscle and liver tissue samples, we extracted DNA as per the Animal Tissue protocol for the Exgene Clinic SV kit. Extracted DNA samples were eluted into molecular-grade water and stored at −20 °C.
We prepared double digest RAD-Seq libraries for each sample based on a modified FASSST protocol developed by Tay et al.41 and Tin et al.33, using combinatorial barcodes derived from Peterson et al.18 and the restriction enzymes EcoRI (NEB) and MspI (NEB). We conducted triplicate PCR reactions per sample to reduce the likelihood of PCR bias highlighted by Tin et al.33 and to maximise the yield of adapter-ligated fragments. We produced 47 successful double digest RAD-Seq libraries, inclusive of one replicate specimen, pooled in equimolar volumes. Pooled libraries were sequenced on one Illumina HiSeq2000 lane at BGI Shenzhen, producing 100 bp paired-end reads.
Whole Genome Sequencing and Assembly
Genomic DNA was extracted from fresh tissue from one Striped Tit-Babbler individual using the KingFisher™ Duo extraction robot (Prime Magnetic Particle Processor) and the KingFisher Cell and Tissue DNA Kit, following the manufacturer’s protocol. Preparation of libraries, sequencing and the assembly of the de novo genome were performed by Science for Life Laboratory (SciLifeLab) in Stockholm. Short-insert-sized (180 bp) and mate-pair (5 and 8 kb) DNA libraries were constructed. All libraries were sequenced on the Illumina HiSeq2500 platform with a 2 × 126 setup in RapidHighOutput mode. Paired-end sequence data from the genomic DNA libraries were quality-checked, assembled, and evaluated using the NouGAT pipeline42.
ddRAD-Seq Read Processing and Alignment
We analysed raw sequence reads with FastQC to determine average quality scores across all reads. We used the process_radtags pipeline in Stacks v1.3 (refs 43, 44) to demultiplex sequence reads, filter out low quality reads, and trim raw sequence reads to 90 bp. Single nucleotide errors within the barcode were corrected by the software. We used Bowtie2 v2.2.5 (ref. 45) to align the RAD reads to the Striped Tit-Babbler reference genome. Successfully aligned reads were processed with SAMtools v1.0 (ref. 46) to filter out reads with mapping quality score <25, exclude improperly paired reads, and convert the files to BAM format.
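The alignment and filtering steps can be sketched as a small Python helper that builds, but does not run, the Bowtie2 and SAMtools argument lists for one sample. The exact options used in the study are not reported, so the following are assumptions: the file naming scheme, the index prefix `tit_babbler_genome`, the thread count, and the use of `-f 2` (keep reads flagged as properly paired) to express "exclude improperly paired reads". The mapping quality cutoff of 25 comes from the text.

```python
def alignment_commands(sample, index_prefix, min_mapq=25, threads=4):
    """Build (but do not run) Bowtie2 and SAMtools argument lists for one sample."""
    reads_1 = f"{sample}.1.fq.gz"        # hypothetical naming of demultiplexed reads
    reads_2 = f"{sample}.2.fq.gz"
    sam_path = f"{sample}.sam"
    bam_path = f"{sample}.filtered.bam"

    # Paired-end alignment against a prebuilt Bowtie2 index.
    bowtie2_cmd = [
        "bowtie2", "-p", str(threads), "-x", index_prefix,
        "-1", reads_1, "-2", reads_2, "-S", sam_path,
    ]
    # Keep reads with MAPQ >= min_mapq that are properly paired; write BAM.
    samtools_cmd = [
        "samtools", "view", "-b", "-q", str(min_mapq), "-f", "2",
        "-o", bam_path, sam_path,
    ]
    return [bowtie2_cmd, samtools_cmd]


commands = alignment_commands("K1104", "tit_babbler_genome")
```

With both tools installed, each list could be passed to `subprocess.run(cmd, check=True)` in turn. Keeping the commands as data makes them easy to log or inspect before execution.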
We used the ref_map.pl pipeline in Stacks v1.3 to assemble reference-aligned reads into loci for SNP calling, ensuring that only loci with no missing data were reported in the output SNP matrix. We used default parameters in Bayescan47 to detect loci under selection, and used PLINK v1.9 (ref. 48) to identify loci in linkage disequilibrium for subsequent filtering.
We used Coancestry49 to determine the relatedness coefficients between individuals as well as the inbreeding coefficients for all 46 individuals. We filtered out one individual from each pair with an inferred relatedness greater than that of half-sibs (Queller and Goodnight50 relatedness coefficient >0.25) using GenoDive51. We used the TrioML method52 to estimate inbreeding coefficients as it performs better than other estimators, especially for populations with high inbreeding and closely related individuals52,53.
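The kin-filtering rule (drop one individual from every pair whose relatedness exceeds 0.25) leaves open which member of a pair is removed. The Python sketch below shows one plausible greedy rule: process the most related pairs first, and within a pair drop the individual involved in more above-threshold pairs. This tie-breaking rule, the function name, and the toy relatedness values are assumptions for illustration and are not a description of GenoDive's procedure.

```python
from collections import Counter


def prune_kin(pairwise_r, threshold=0.25):
    """Return the set of individuals to drop so that no retained pair exceeds the threshold.

    pairwise_r maps (individual_a, individual_b) tuples to relatedness coefficients.
    """
    close = {pair: r for pair, r in pairwise_r.items() if r > threshold}
    # Number of above-threshold pairs each individual takes part in.
    degree = Counter(ind for pair in close for ind in pair)
    removed = set()
    for (a, b), _ in sorted(close.items(), key=lambda kv: kv[1], reverse=True):
        if a in removed or b in removed:
            continue  # this pair is already broken up
        # Drop the better-connected individual; on a tie, drop the second one.
        removed.add(a if degree[a] > degree[b] else b)
    return removed


# Hypothetical relatedness values, not taken from Table S2.
toy_r = {("A", "B"): 0.50, ("B", "C"): 0.30, ("C", "D"): 0.10}
toy_removed = prune_kin(toy_r)
```

In the toy data, removing individual B alone resolves both close pairs, which retains more samples than removing one individual per pair independently.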
Analysis of Population Genetic Structure
To explore inter-individual differentiation, we performed a PCA with the kin-filtered SNP matrix using the dudi.pca function in the ade4 package54 in R v3.2.2 (R Core Development Team, 2015). The results of the PCA were plotted using the dudi.plot function in the R package Momocs55.
To calculate genetic diversity statistics, we grouped the 46 samples into three putative subpopulations based on the PCA results (Fig. 2), comprising one isolated forest patch (Admiralty Park: n = 3), and two large but fragmented networks of forest patches (Southern: n = 16, and Central Catchment Nature Reserve: n = 27) (Table 2). We conducted an analysis of molecular variance (AMOVA) in GenoDive, and calculated population genetic statistics for both the full (n = 46) and kin-filtered (n = 35) datasets using the populations module in Stacks. We calculated pairwise Weir and Cockerham’s FST56 using the R package diveRsity. As the removal of kin may improve the performance of FST estimations at the expense of precision57, we calculated pairwise FST values using both the full (n = 46) SNP matrix and the kin-filtered (n = 35) SNP matrix.
To test the robustness of our population genetic inferences, we used custom bash scripts to randomly subset the original SNP matrix for 2 to 3,500 loci, and recalculated population genetic statistics based on these reduced-loci SNPsets for a total of 100 independent subsamples per number of SNP loci being tested. The subsampled SNPsets were further used to calculate pairwise FST values (using the R package diveRsity) and mean TrioML values (using Coancestry via the R package related)58 for each subpopulation.
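The subsampling design can be sketched as follows. The original analysis used custom bash scripts, so this Python version is a stand-in under stated assumptions: the function names `subsample_loci` and `subset_matrix`, the 0/1/2 genotype coding, and the toy matrix are hypothetical. For a given number of loci, it draws 100 independent random subsets without replacement and extracts the matching columns of the SNP matrix.

```python
import random


def subsample_loci(n_loci_total, n_keep, n_replicates=100, seed=0):
    """Draw `n_replicates` independent subsets of `n_keep` locus indices."""
    rng = random.Random(seed)  # fixed seed so the subsets are reproducible
    return [
        sorted(rng.sample(range(n_loci_total), n_keep))  # no repeats within a subset
        for _ in range(n_replicates)
    ]


def subset_matrix(snp_matrix, loci):
    """Keep only the columns (loci) listed in `loci`; rows are individuals."""
    return [[row[j] for j in loci] for row in snp_matrix]


# Toy matrix: 4 individuals x 6 loci, genotypes coded as alternate-allele counts.
toy = [
    [0, 1, 2, 0, 1, 1],
    [1, 1, 0, 0, 2, 1],
    [2, 0, 1, 1, 1, 0],
    [0, 2, 1, 0, 0, 1],
]
replicates = subsample_loci(n_loci_total=6, n_keep=3)
first_subset = subset_matrix(toy, replicates[0])
```

Each reduced matrix would then be passed to the downstream estimators (pairwise FST and mean TrioML), and the spread of estimates across the 100 replicates indicates how sensitive the statistics are to the number of loci.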
Analysis of Inter-Population Gene Flow
To detect the presence of gene flow between subpopulations, we used GENECLASS259 to identify first generation immigrants within each putative subpopulation. We used the L_home likelihood statistic since some source populations may not have been sampled.
Analysis of Historical Population Demography
To understand the historical demography of the Striped Tit-Babbler population in Singapore, we performed coalescent simulations, compared different models of historical demography, and further estimated parameters in DIYABC60. We assigned all samples to a single population and performed simulations to test three demographic scenarios, in which the Striped Tit-Babbler population in Singapore has (1) maintained a constant Ne, (2) experienced a recent decline, or (3) experienced a recent expansion (Fig. S14).
We used NeEstimator v2.0161 to independently estimate the contemporary Ne of Striped Tit-Babblers in Singapore using the linkage disequilibrium method. As Ne calculations can simultaneously be biased by the presence of kin and overestimated under aggressive purging of kin57, we ran NeEstimator for both the full (n = 46) and kin-filtered (n = 35) SNP matrices without pruning for loci under linkage disequilibrium (5481 loci), to explore the range of possible Ne values for the Striped Tit-Babbler population.
Landscape Connectivity Modelling
To explore changes in forest contiguity over time, we conducted a supervised classification of Landsat 5 TM and Landsat 8 OLI/TIRS imagery (USGS), using the maximum likelihood method in ArcMap 10.0 to produce four land use maps of Singapore for the years 1989, 1997, 2005, and 2013.
We used Circuitscape v4.0.562,63 to model connectivity between extant forest patches in 2013. We defined preliminary resistance parameters of each habitat type along a scale of 1 (no resistance) to 100 (maximum resistance) (Table S6, Supplementary Information) based on the known ecology and habitat requirements of the Striped Tit-Babbler24,39,64. We ran Circuitscape v4.0.5 in pairwise mode to generate an exploratory current map showing the likely dispersal pathways between habitat fragments.
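The quantity underlying a pairwise circuit-theoretic analysis is the effective resistance between two locations on a resistance surface. The sketch below is not Circuitscape itself. It is a small pure-Python illustration under stated assumptions: the function name `effective_resistance` is hypothetical, cells are connected to their four neighbours only, and each connection takes the mean resistance of the two cells it joins. The resistance values in the toy grid are invented and do not correspond to Table S6.

```python
def effective_resistance(grid, source, sink):
    """Effective resistance between two distinct cells of a connected grid."""
    rows, cols = len(grid), len(grid[0])
    index = {(r, c): r * cols + c for r in range(rows) for c in range(cols)}
    n = rows * cols
    # Build the graph Laplacian from edge conductances (1 / mean cell resistance).
    lap = [[0.0] * n for _ in range(n)]
    for (r, c), i in index.items():
        for dr, dc in ((0, 1), (1, 0)):  # right and down neighbours
            nb = (r + dr, c + dc)
            if nb in index:
                j = index[nb]
                g = 1.0 / ((grid[r][c] + grid[nb[0]][nb[1]]) / 2.0)
                lap[i][i] += g
                lap[j][j] += g
                lap[i][j] -= g
                lap[j][i] -= g
    s, t = index[source], index[sink]
    keep = [k for k in range(n) if k != t]  # ground the sink (voltage 0)
    # Augmented system: reduced Laplacian times voltages = 1 A injected at source.
    aug = [[lap[i][j] for j in keep] + [1.0 if i == s else 0.0] for i in keep]
    m = len(aug)
    for col in range(m):  # Gaussian elimination with partial pivoting
        pivot = max(range(col, m), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, m):
            f = aug[row][col] / aug[col][col]
            for k in range(col, m + 1):
                aug[row][k] -= f * aug[col][k]
    volts = [0.0] * m
    for row in range(m - 1, -1, -1):  # back substitution
        tail = sum(aug[row][k] * volts[k] for k in range(row + 1, m))
        volts[row] = (aug[row][m] - tail) / aug[row][row]
    return volts[keep.index(s)]  # voltage at source under unit current


# Hypothetical 3 x 3 surfaces: 1 = forest, 100 = urban.
all_forest = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
urban_band = [[1, 100, 1], [1, 100, 1], [1, 100, 1]]
r_forest = effective_resistance(all_forest, (0, 0), (0, 2))
r_urban = effective_resistance(urban_band, (0, 0), (0, 2))
```

Here `r_urban` exceeds `r_forest` because every route between the two corner cells must cross the high-resistance band. Pairwise values of this kind, computed between sampling locations, form the resistance distance matrix that an isolation-by-resistance model compares against genetic distances.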
Landscape Genomic Analyses
We used GenAlEx65 to estimate the extent of spatial autocorrelation between the multilocus genotypes of the individuals sampled, using a distance class size of 1 km for 999 permutations and 1,000 bootstraps. In addition, we conducted distance-based Moran’s eigenvector map (dbMEM) analyses using the R package MEMGENE66, which account for potential spatial autocorrelation in the data, to determine the proportion of the spatial genetic signal explained by isolation by distance (IBD) and isolation by resistance (IBR) models.
We refined the resistance model by generating 45 alternative models such that resistance(urban) > resistance(managed vegetation), and using Circuitscape v4.0.5 and MEMGENE to select the optimal model for which the proportion of genetic variation explained by the corresponding resistance distance matrix was highest. We used the optimised resistance model to generate the final landscape connectivity map using Circuitscape v4.0.5.
Forward-in-time Landscape Genomic Simulations
To test for false positive signals of IBR and assess the validity of our population genetic analyses, we used CDPOP67 to simulate the effects of landscape structure on the population genetic structure of the Striped Tit-Babbler.
We defined relaxed life history parameters based on the known biology of the Striped Tit-Babbler and closely related species; these relaxed parameters should result in a slower accumulation of intrapopulation differentiation. We applied them to a spatial dataset of 5761 Striped Tit-Babblers (per the species’ estimated population density, inclusive of the original 46 samples) and simulated the effects of IBD on the population for 99 randomly generated neutral loci over 100 generations, sampling every 10 generations, with 100 MCMC replicates per simulation.
We sampled individual genotypes at the locations of the original 46 individuals and calculated the change in pairwise FST between putative subpopulations using the R package diveRsity for all sampled generations. We also conducted dbMEM analyses using MEMGENE for generations 10 to 50, excluding MCMC replicates for which an entire subpopulation goes extinct, to compare the variance explained by IBD and IBD + IBR models and ascertain the Type I error rate.
We acknowledge the National Parks Board of Singapore for facilitating fieldwork under permit NP/RP13-019-2. All field and lab work was conducted in accordance with regulations outlined by the National University of Singapore’s Office of Safety, Health, and Environment.
The Striped Tit-Babbler genome and all short read sequences have been accessioned on both the NCBI Genomes and Short Read Archive with the following BioProject accession number: PRJNA392017. Raw Stacks outputs, CDPOP working files, and custom scripts are available from the corresponding author on request. All other data analysed during this study are included in the Supplementary information files.
We thank Dr. N. Ng, K.R. Sadanandan, G. Tang, G. Low, Dr. Y.C. Tay, R. Kolandavelu, Y.F. Chung, R. Teo, S.H. Yeo, X.Z. Li, H.Y. Teo, P. Sim, S. Yap, Dr. W.S. Symes, Dr. N. Tassie, and A. Deverakonda for field and lab support. We also thank Dr. E.L. Webb and Sakinah Y. for GIS and mapping assistance, Dr. T.L. Schmidt for assistance with CDPOP configuration, and Dr. A. Srivathsan for scripting support. We also thank our reviewers, including Dr. M. Kardos, for their comments and suggestions. Additionally, we thank the National Parks Board of Singapore and the Lee Kong Chian Natural History Museum for facilitating access to Striped Tit-Babbler samples. The authors would also like to acknowledge support from Science for Life Laboratory, the National Genomics Infrastructure, NGI, and Uppmax for providing assistance in massive parallel sequencing and computational infrastructure, as well as the Southeast Asian Biodiversity Genomics Center (SEABIG) and the National University of Singapore High Performance Computing Centre for additional computational support. This work was funded by National University of Singapore start-up funds from the Department of Biological Sciences (R-154-000-583-651), and from the National Parks Board, Singapore (R-154-000-A11-490), as well as from a Singapore Ministry of Education Tier I grant (R-154-000-658-112). The genome assembly work was also supported by the Swedish Research Council (grant number 621-2013-5161 to PE). DJXT and BC are funded by SEABIG (R-154-000-648-646 and R-154-000-648-733).