The evolutionary history of bears is characterized by gene flow across species

Kumar, Vikas; Lammers, Fritjof; Bidon, Tobias; Pfenninger, Markus; Kolter, Lydia; Nilsson, Maria A.; Janke, Axel

doi:10.1038/srep46487

Article
Open access
Published: 19 April 2017

Scientific Reports volume 7, Article number: 46487 (2017)

Abstract

Bears are iconic mammals with a complex evolutionary history. Natural bear hybrids and studies of a few nuclear genes indicate that gene flow among bears may be more common than expected and not limited to polar and brown bears. Here we present a genome analysis of the bear family with representatives of all living species. Phylogenomic analyses of 869 megabase pairs divided into 18,621 genome fragments yielded a well-resolved coalescent species tree despite signals for extensive gene flow across species. However, genome analyses using different statistical methods show that gene flow is not limited to closely related species pairs. Strong ancestral gene flow between the Asiatic black bear and the ancestor to polar, brown and American black bear explains uncertainties in reconstructing the bear phylogeny. Gene flow across the bear clade may be mediated by intermediate species such as the geographically widespread brown bears, leading to large amounts of phylogenetic conflict. Genome-scale analyses lead to a more complete understanding of complex evolutionary processes. Evidence for extensive inter-specific gene flow, found also in other animal species, necessitates shifting the attention from speciation processes achieving genome-wide reproductive isolation to the selective processes that maintain species divergence in the face of gene flow.

Introduction

Ursine bears are the largest living terrestrial carnivores and have evolved during the last five million years, attaining a wide geographical distribution range (Fig. 1). Bears are a prominent case where conflicting gene trees and an ambiguous fossil record¹ make the interpretation of their evolutionary history difficult². Introgressive gene flow resulting from inter-species mating is believed to be rare among mammals³. However, some 600 mammalian hybrids are known⁴ and the importance of hybridization has started to gain attention in evolutionary biology⁵. Yet, our knowledge of the extent of post-speciation gene flow is limited, because few genomes of closely related species have been sequenced.

**Figure 1: Approximate geographic distribution of extant bears according to IUCN data.**

In bears, natural mating between grizzlies (brown bears, Ursus arctos) and polar bears (Ursus maritimus) results in hybrid offspring, the grolars⁶. Genome-scale studies in brown and polar bears find that 8.8% of individual brown bear genomes have a polar bear origin⁷. Additionally, the brown bear mitochondrial (mt) genome was captured by polar bears during ancient hybridization⁸ and polar bear alleles are distributed across brown bear populations all over the world by male-biased migration and gene flow^7,9,10.

Polar and brown bears belong to the sub-family Ursinae, which comprises six extant, morphologically and ecologically distinct species¹¹, but hybridization among some ursine bears is possible. A natural hybrid has also been reported between the Asiatic black bear (Ursus thibetanus) and the sun bear (Ursus malayanus)¹². In captivity more bear hybrids are known, some of which have been fertile⁴. Despite limited population sizes for most bears and apparently distinct habitats, morphology and ecology, molecular phylogenetic studies have been unable to unequivocally reconstruct the relationship among the six ursine bear species². In particular, the evolution of the American (Ursus americanus) and Asiatic black bears is difficult to resolve, despite their being geographically separated (Fig. 1).

Evidence from the fossil record, morphology and mitochondrial phylogeny suggested a closer relationship between the Asiatic and the American black bears^13,14,15. In contrast, autosomal and Y-chromosomal sequences support a grouping with the American black bear being sister group to the brown/polar bear clade^2,9,16. Another conflict between mitogenomics, morphology and autosomal sequence data is the position of the morphologically distinct sloth bears (Ursus ursinus). Mitochondrial DNA (mtDNA) analyses and morphological studies placed sloth bears outside of all other ursine bears, while nuclear gene analyses favor a position close to sun bears^2,15,17. A study of nuclear introns with multiple individuals for each ursine species was unable to reconstruct a well-supported species tree and suggested that incomplete lineage sorting (ILS) and/or gene flow caused the complexities in the ursine tree². However, previous molecular studies did not have access to genome data from all bear species and were thus limited to single loci.

The genomic era allows detailed analyses of how gene flow from hybridization affects genomes, and has revealed much more complex evolutionary histories than previously anticipated for many species, including our own^18,19,20. Multiple genomic studies on polar bears, brown bears and the giant panda^10,21,22,23 have led to a wealth of available genomic data for these species. We investigated all living Ursinae and Tremarctinae bear species based on six newly sequenced bear genomes and published ones. Methods specifically developed to deal with complex genome data^24,25 and gene flow^18,26 are applied to resolve and understand the processes that have shaped the evolution of bears.

Results

The sequenced individuals were morphologically typical for the respective species. Mapping Illumina reads against the polar bear genome²³ yielded an average coverage of 11X. Supplementary Tables 1 and 2 detail the sequencing and assembly data, and provide accession numbers of the included species. As a basis for subsequent analyses, non-overlapping 100 kb Genome Fragments (GFs) were extracted from polar bear scaffolds > 1 megabase (Mb). These have presumably a higher assembly quality than smaller fragments and still represent > 96% of the genome (Supplementary Fig. 1). Heterozygous sites, gaps, repetitive sequences, and transposable element sequences were removed from GF alignments (Supplementary Fig. 2). Pedigrees (Supplementary Fig. 3) and genome-wide heterozygosity plots (Supplementary Fig. 4) show that the sequenced individuals are neither hybrids nor, compared to wild specimens, severely inbred.

Network analysis depicts hidden conflict in the coalescent species tree

GFs larger than 25 kb, representing the majority of the length distribution (Supplementary Fig. 2), contain on average 104 substitutions among Asiatic bears (Supplementary Fig. 5). Phylogenetic topology testing on real and simulated sequence data shows that GFs with this information content significantly reject alternative topologies (Supplementary Figs 6 and 7). For subsequent coalescence, consensus, and network analyses, only GFs > 25 kb were used and the results are thus based on firmly supported Maximum Likelihood (ML) analyses.

A coalescent species tree utilizing 18,621 GFs > 25 kb (869,313,834 bp) resolved the relationships among bears with significant support for all branches (Fig. 2A, Supplementary Fig. 8). In the coalescent-based species tree, sun and sloth bears are sister group to the Asiatic black bear, and the American black bear groups with polar and brown bears. The spectacled bear is, consistent with previous results^2,16, placed as sister taxon to Ursinae. The well-resolved coalescent species tree appears to be without conflict from genomic data.

**Figure 2: A coalescent species tree and a split network analysis from 18,621 GF ML trees.**

However, a network analysis²⁷ gained from the same 18,621 GFs identifies conflicting phylogenetic signal (Fig. 2B). The square and cuboid-like structures indicate alternative phylogenetic signals, particularly among brown and polar bears, but also among the Asiatic bears. The brown bear from the Admiralty, Baranof, and Chichagof (ABC) islands groups in different arrangements with other brown and polar bears, consistent with gene flow between the two species^7,8,23. When the threshold level for depicting conflicting branches is reduced in the network analysis, the signal becomes increasingly complex, illustrating the conflict among 18,621 ML-trees (Supplementary Fig. 9). Still, the network analysis agrees with the species tree when the spectacled bear is the outgroup. The phylogenetic conflict can be caused by incomplete lineage sorting (ILS) or gene flow, but less likely from lack of resolution due to the strong phylogenetic signal of each GF (Supplementary Figs 6 and 7). Analyses of 8,050 protein coding sequences (10,303,323 bp) and GFs from scaffolds previously identified as X chromosomal (total 74 Mb)²², conform to the species tree and networks (Supplementary Fig. 10). Finally, the paternal side of bear evolution based on Y chromosome sequences²⁸ for available genomes is consistent with the inferred species tree (Supplementary Fig. 11).

The Bayesian mtDNA tree (Fig. 3, Supplementary Fig. 12) conforms to previous studies^2,15 and, with 38 complete bear mt genomes, represents the largest taxonomic sampling to date. However, several nodes of the mtDNA tree differ notably from the coalescent species tree (Fig. 2A). In the mtDNA tree, the brown bears are paraphyletic, because the brown bear mt genome introgressed into the polar bear population⁸. The extinct cave bear (Ursus spelaeus) is the sister group to polar and brown bears. The American black bear is the sister group to the Asiatic black bear, and the sloth bear is the sister group to all ursine bears. The topological agreement of the mtDNA tree with previous studies and the placement of the new individuals corroborate that the studied individuals are representative of their species.

**Figure 3: Phylogenetic relationship among the bears using mtDNA genomes.**

Finally, a consensus analysis based on GF ML-trees (Supplementary Fig. 13) produces a tree that is identical to the coalescent species tree, but highlights that numerous individual GF trees support alternative topologies (Supplementary Table 3). Inspection of the individual 18,621 GF ML topologies shows that 38.1% (7,086) support a topology where Asiatic black bear is the sister group to the American black/brown/polar bear clade. The Asiatic black bear groups in different arrangements with the two other Asiatic bears: 18.7% (3,474) of the branches support a grouping with the sun bear, and 7.5% (1,394) with the sloth bear.
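The reported proportions can be checked directly from the counts given above. The following is a small arithmetic sketch in Python; the dictionary labels are shorthand introduced here for illustration and are not names used in the study.

```python
# Counts reported in the text for the 18,621 GF ML trees.
TOTAL_GF_TREES = 18_621

# Labels are illustrative shorthand, not identifiers from the study.
topology_counts = {
    "Asiatic black bear sister to American black/brown/polar clade": 7_086,
    "Asiatic black bear + sun bear": 3_474,
    "Asiatic black bear + sloth bear": 1_394,
}


def percent_support(count: int, total: int = TOTAL_GF_TREES) -> float:
    """Return the share of GF trees supporting a grouping, in percent."""
    return round(100.0 * count / total, 1)


support = {name: percent_support(n) for name, n in topology_counts.items()}

for name, pct in support.items():
    print(f"{pct:5.1f}%  {name}")
```

The three groupings listed here together account for roughly 64% of the GF trees; the remaining trees support other arrangements (Supplementary Table 3).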

Gene flow among bears is common

Seemingly conflicting phylogenetic signals in evolutionary analyses can be explained by incomplete lineage sorting (ILS) or gene flow among species. In contrast to the largely random process of ILS, gene flow produces a bias in the phylogenetic signal, because it is a directed process. The D-statistic measures the excess of shared polymorphisms of two closely related lineages with respect to a third lineage¹⁸ and can thus discriminate between gene flow and ILS. The test assumes that the ancestral population of the in-group taxa was randomly mating and recently diverged²⁹. These assumptions might be compromised in wide-spread, structured species like bears. However, speciation is rarely instantaneous, but is rather preceded by a period of population divergence. This should not compromise the test as long as there was a panmictic population ancestral to the progenitor populations of the eventual daughter species at some point in time, which is a reasonable assumption.
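The logic of the D-statistic can be illustrated with a minimal Python sketch. It assumes a four-taxon arrangement (P1, P2, P3, outgroup), treats the outgroup allele as ancestral, and counts only biallelic sites; the function names and the toy input are hypothetical and do not reproduce the software used in this study.

```python
from typing import Iterable, Tuple


def abba_baba_counts(sites: Iterable[str]) -> Tuple[int, int]:
    """Count ABBA and BABA site patterns.

    Each site is a 4-character string with the bases of
    (P1, P2, P3, outgroup). The outgroup base is taken as allele A.
    """
    abba = baba = 0
    for p1, p2, p3, out in sites:
        if len({p1, p2, p3, out}) != 2:
            continue  # skip invariant and multi-allelic sites
        if p1 == out and p2 == p3:
            abba += 1  # P2 and P3 share the derived allele
        elif p2 == out and p1 == p3:
            baba += 1  # P1 and P3 share the derived allele
    return abba, baba


def d_statistic(abba: int, baba: int) -> float:
    """D = (ABBA - BABA) / (ABBA + BABA); zero is expected under ILS alone."""
    total = abba + baba
    if total == 0:
        raise ValueError("no informative sites")
    return (abba - baba) / total


# Toy data for illustration only.
toy_sites = ["ACCA", "ACCA", "CACA", "AAAA", "ACGA"]
n_abba, n_baba = abba_baba_counts(toy_sites)
d_value = d_statistic(n_abba, n_baba)
```

Under ILS alone the two patterns are expected to be equally frequent, so D is close to zero. A significant excess of one pattern points to gene flow between P3 and either P1 or P2.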

The D-statistics analyses find evidence of gene flow between most sister bear species (Fig. 4, Supplementary Tables 4 and 5 and Supplementary Fig. 14). Regardless of whether the spectacled bear or the giant panda is used as outgroup, the involved species and relative signal strengths of gene flow in the tested topologies remain the same (Supplementary Table 6). The D-statistic is limited to four-taxon topologies, and gene flow signals are therefore difficult to interpret when they occur between distant species, as the test cannot determine whether a signal is direct, indirect, or ancestral. To take more complex gene flow patterns into account, and to determine the direction of gene flow, we applied the recently introduced D_FOIL-statistics²⁶. This method uses a symmetric five-taxon topology and has specifically been developed to detect and differentiate gene flow signal among ancestral lineages²⁶.

**Figure 4: Graphical summary of gene flow analyses using D and D_FOIL statistics on a cladogram.**

In agreement with the phylogenetic conflict and D-statistics, the D_FOIL statistics find gene flow between the ancestor of the American black bear/brown/polar bear clade and the Asiatic black bear (Fig. 4, Table 1). The Etruscan bear was geographically overlapping with other bear species and was, like the Asiatic black bear, widely distributed³⁰. It has been identified in fossil layers of Europe 2.5 Ma − 1.0 Ma^1,31. The wide geographical distribution would explain the nearly equally strong gene flow from Asiatic black bear into brown bear also observed in the D-statistics (Supplementary Fig. 14). Finally, there is a gene flow signal between the American and Asiatic black bears. The gene flow could have taken place either on the American or Asiatic side of the Bering Strait and is consistent with mitochondrial capture between the species² (Fig. 3). Most of the weaker gene flow signals in Fig. 4 (dashed lines) do not necessarily reflect direct species hybridization and are possibly remnants of ancestral gene flow not detected due to allelic loss, or signals of indirect gene flow by ghost lineages or intermediate species. Permutations of species for the D_FOIL analysis including other polar, sloth and brown bear individuals show that the results are taxon independent (Table 1).

Table 1: Gene flow detected by the D_FOIL analyses, which are based on a five-taxon topology.

PhyloNet³² has been developed to detect hybridization events in genomic data while accounting for ILS. We applied the ML approach implemented in PhyloNet³² to detect hybridization among bear species. Due to computational constraints we sampled 4,000 ML trees from putatively independent GFs, using one representative individual per species. The ABC island brown bear was chosen as an additional representative for brown bears and as a positive control, because its population hybridized with polar bears^7,8,28. The outgroup, the spectacled bear, was removed to reduce the computational complexity and because previous analyses using D-statistics and D_FOIL did not detect gene flow between tremarctine and ursine bears. The complex phylogeny requires exceptional computational time, so we analyzed only networks with up to two reticulations. The resulting PhyloNet network with the highest likelihood (Supplementary Fig. 15) shows reticulations between ABC island brown bear and polar bears, and also between the Asiatic black bear and the ancestral branch to American black, brown and polar bears. It is noteworthy that the second reticulation has a high inheritance probability (41.8%), which agrees with the strongest gene flow signal identified by D_FOIL analyses (Fig. 4, Table 1). Due to computational limits, so far only the two reticulations that represent the strongest hybridization signals were identified. For three or more reticulations the network space becomes extremely large.

Additional analysis using CoalHMM³³ supports the findings of gene flow from the D-, D_FOIL, and PhyloNet analyses (Supplementary Fig. 16). It shows that a migration model fits most pairwise comparisons significantly better than ILS, and is robust under a broad range of parameters (Supplementary Figs 17 and 18). Thus, gene flow among bears throughout most of their history is the major factor generating conflicting evolutionary signals.

Estimation of divergence times and population splits

The phylogenomic divergence time estimates (Fig. 5) are older than previous estimates based on nuclear gene data², but consistent with those from mtDNA data¹⁵ (Supplementary Table 7). The number of heterozygous sites differs among species and individuals, and is highest in the Asiatic black bear genome and, as expected², lowest in the polar bears and spectacled bears (Supplementary Fig. 4). It is noteworthy that the average numbers of heterozygous sites differ between the two sun bears, which may reflect different population histories.

**Figure 5: Phylogenomic estimates of divergence times.**

Estimates for past changes in effective population size (N_e) using the pairwise sequentially Markovian coalescent (PSMC)³⁴ are shown in Fig. 6 (Supplementary Fig. 19). While PSMC plots from low-coverage genomes may vary and may not be fully accurate, the plots inferred for the brown, polar and American black bear are very similar to those previously published for higher-coverage genomes (Supplementary Fig. 20)¹⁰. The demographic histories of the Asian bear individuals vary widely and have not overlapped in bootstrap analyses since 100 ka (Supplementary Fig. 21).

**Figure 6: Historical effective population sizes (N_e) from pairwise sequentially Markovian coalescent (PSMC) analyses of the newly sequenced bear genomes.**

Discussion

Previously, nuclear gene trees and mitochondrial trees have been in conflict^14,15,16, and a forest of gene trees made it difficult to conclusively reconstruct the relationships among bears, in particular among Asiatic bears². Now, phylogenomic analyses resolve a solid coalescent species tree and provide a temporal frame of the evolutionary history of the charismatic ursine and tremarctine bears and allow a glimpse into their demographic history.

According to the PSMC analyses, the Asiatic black bear has maintained a stable and relatively high long-term N_e since 500 ka (Fig. 6). This is consistent with its wide geographic distribution and its high degree of heterozygous sites in the genome². The effective population size of the Asiatic black bear declined some 20 ka, correlating with the end of the later part of the ice age. By contrast, the spectacled bear maintained a relatively low long-term effective population size, consistent with its lower population diversity^2,35. The demography of the two sun bear individuals has differed strikingly since 100 ka. As the bootstrap replicates do not overlap, the different curves support a hypothesis of separate population dynamics (Supplementary Fig. 21). Their distinct mitochondrial lineages (Fig. 3) might indicate that the two sun bear individuals belong to the described subspecies U. m. malayanus (Sumatra and Asian mainland) and U. m. euryspilus (Borneo), respectively³⁶. The ancestor of extant sun bears might have settled in the Malay Archipelago during marine isotope stage (MIS) 6. In the following Eemian interglacial, Borneo became isolated, thereby giving rise to different environmental conditions and to a distinct sun bear subspecies, but without samples from multiple individuals from known locations and high-coverage genomes, this remains speculative.

Multi-species coalescent methods, which take phylogenetic conflict into account, are becoming increasingly important in genomic analyses³⁷. However, when analyzing GFs > 25 kb, phylogenetic conflict is not caused by noise, but by evolutionary signal and should not be ignored³⁸. Phylogenetic networks show that the evolutionary histories of numerous GFs, i.e. various regions of the genome, are significantly different: the phylogenetic signal not only differs drastically, but does so with statistically significant support. This is also evident from a large-scale evolutionary analysis of insertion patterns of transposable elements in the bear genomes, which yields a similarly complex history of bears³⁹. Compared to a study based on 14 loci², we were able to fully resolve the species relationships among Ursidae. In addition, the genome analyses show that the conflicting relationships reported in that study² appear to be the result of gene flow, which is not limited to sister species. It is important to realize that bifurcating species trees, even coalescence based, can only convey a fraction of the evolutionary information contained in entire genomes and that network analyses are needed to identify underlying conflict in the data^24,38. The analyses of the ursine phylogeny suggest that gene flow, and not incomplete lineage sorting, is the major cause of the reticulations in the evolutionary tree. These two processes can be distinguished from each other by methods and programs like D-statistics, D_FOIL and PhyloNet^18,26,32 that are specifically developed for this task.

Some of the inferred gene flow between bear species appears weak or episodic and thus requires further corroboration by additional sampling of individuals. Population analyses show that American black bears are divided into two distinct clades that diverged long before the last glacial maximum, indicating a long and isolated evolutionary history on the North American continent⁴⁰. Thus, it is unlikely that American black bears came into contact with the Asiatic sun and sloth bears⁴⁰. Likewise, introgressive gene flow between south-east Asiatic bear species and polar bears requires an explanation, because they have been evolving in geographically and climatically distinct areas from the time when polar bears diverged from brown bears and began parapatric speciation in the Arctic. It is therefore possible that some gene flow events occurred through an intermediate species. The brown bear has been shown to distribute polar bear alleles across its range⁷ and may therefore be a plausible vector species for genetic exchange between Asiatic bears and the polar or American black bear. The brown bear is a likely extant candidate, because it has been and is geographically widespread⁴¹. Furthermore, the geographical range of brown bears overlaps with all other ursine bear species (Fig. 1), they have reportedly migrated several times across continents and islands⁴¹, and numerous brown bear hybrids with other bears in either direction are known⁴. While the Asiatic black bear was also widely distributed across Asia and had, like the brown bear¹⁰, a large effective population size (Fig. 6), a migration of the Asiatic black bear into North America has not been shown. Likewise, migration of the American black bear in the opposite direction, from the American to the Asian continent, is not evident from fossil data.

The D_FOIL and PhyloNet analyses^26,32 are powerful tools to detect ancestral gene flow, such as the prominent signal between the Asiatic black bear and the ancestor to the American black, brown and polar bears (Fig. 4, Table 1). In fact, gene flow during the early ursine radiation from extinct bear species, such as the Etruscan bear or the cave bear, is expected to have left signatures in the genomes of their descendants, thus causing conflict in a bifurcating model of evolution.

Speciation as a selective rather than an isolation process

There is no question that bears are morphologically, geographically and ecologically distinct and they are unequivocally accepted as species even by different species concepts⁴². Yet, our genome-wide analyses identify gene flow among most ursines, making their genome a complex mosaic of evolutionary histories. Increasing evidence for post-speciation gene flow among primates, canines, and equids^19,20 suggests that interspecific gene flow is a common biological phenomenon. The occurrences of gene flow and to a lesser extent ILS, of which a fraction in the phylogenetic signal cannot be excluded, suggest that the expectation of a fully resolved bifurcating tree for most species might be defied by the complex reality of genome evolution. Recent genome-scale analyses of basal divergences of the avian⁴³, and even metazoan⁴⁴ tree share the same difficulties to resolve certain branches as observed for mammals⁴⁵. Detecting gene flow for these deep divergences is difficult and therefore most of the reticulations and inconsistent trees have so far been attributed to ILS⁴⁶.

The recent discoveries of gene flow by introgressive hybridization in several mammalian species^19,20 and in bears over extended periods of their evolutionary history have a profound impact on our understanding of speciation. If, in fact, gene flow across species is frequent and can last for several hundred thousand years after divergence, evolutionary histories of genomes will be inherently complex and phylogenetic incongruence will depict this complexity. Therefore, speciation should not only be viewed as achieving genome-wide reproductive isolation but rather as selective processes that maintain species divergence even under gene flow⁴⁷.

Materials and Methods

Genome sequencing, mapping and creation of consensus sequences

Prior to sampling, DNA extraction and evolutionary analyses, pedigrees from zoo studbooks and the appearance of the individuals confirmed that these individuals are not hybrids (Supplementary Fig. 3). DNA was extracted from blood samples by standard phenol/chloroform protocols in a pre-PCR environment, on different occasions to avoid confusion, and yielded between 1 and 6 μg of DNA for each of the six bear individuals (Supplementary Table 1). Paired-end libraries (500 bp) were made by the Beijing Genome Institute (BGI) using Illumina TruSeq and sequencing was done on an Illumina HiSeq2000, resulting in 100 bp reads. Samples that had been taken by a veterinarian for routine diagnosis and stored for later analyses in accordance with the ethical guidelines of the respective institutions (see Acknowledgements) were used opportunistically for DNA isolation, in accordance with the best ethical and experimental practice of the Senckenberg Natural Research Society.

Raw reads were quality-trimmed by Trimmomatic⁴⁸ with a sliding window option, a minimum base quality of 20 and a minimum read length of 25 bp. The assembled polar bear genome²³ was used for reference mapping using BWA version 0.7.5a⁴⁹ with the BWA-MEM algorithm on scaffolds larger than 1 Mb. Scaffolds shorter than 1 Mb were not included in the mapping and analyses, due to potential assembly artefacts⁵⁰ and to reduce the computational time in downstream analyses. Duplicate Illumina reads were marked by Picard tools version 1.106 (http://picard.sourceforge.net/) and the genome coverage was estimated with Samtools version 0.1.18⁵¹.
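The principle of sliding-window quality trimming can be sketched in a few lines of Python. This is a simplified illustration and not Trimmomatic's implementation: the window width of 4 bases is an assumption (the text does not state it), while the quality threshold of 20 and the minimum length of 25 bp are taken from the description above.

```python
from typing import List, Optional


def sliding_window_trim(
    quals: List[int],
    window: int = 4,        # assumed window width, not given in the text
    min_q: float = 20.0,    # minimum mean base quality (from the text)
    min_len: int = 25,      # minimum read length in bp (from the text)
) -> Optional[int]:
    """Return how many leading bases to keep, or None to discard the read.

    The read is scanned from the 5' end; it is cut at the start of the
    first window whose mean quality falls below min_q.
    """
    keep = len(quals)
    for start in range(max(len(quals) - window + 1, 0)):
        win = quals[start:start + window]
        if sum(win) / window < min_q:
            keep = start
            break
    return keep if keep >= min_len else None


# Toy Phred scores: 40 good bases followed by 10 poor ones.
kept_length = sliding_window_trim([30] * 40 + [10] * 10)
```

Reads whose retained part is shorter than the minimum length are dropped entirely, which is why the function returns `None` in that case.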

Freebayes version 0.9.14–17⁵² called Single Nucleotide Variants (SNVs) using the option of reporting the monomorphic sites, with the additional parameters --min-mapping-quality 20, --min-alternate-count 4, --min-alternate-fraction 0.3 and --min-coverage 4, and with insertion/deletion (indel) realignment. A custom Perl script created consensus sequences for each of the mapped bear individuals from the Variant Call Format (VCF) files, keeping the heterozygous sites and removing indels. In order to complete the taxon sampling of the ursine bears, reads from six previously published genomes (Supplementary Table 1) were selected on the basis of geographic distribution, availability and sequence depth, and SNVs were called as described above. For the two high-coverage ( > 30X) genomes, the SNV calling parameter --min-coverage was set to one-half of the average read depth after marking duplicates. Genome error rates⁵³ were calculated on the largest scaffold (67 Mb) for all bear genomes, confirming a high quality of the consensus sequences (Supplementary Methods and Supplementary Fig. 22).

Data filtration, simulation of sequence length and topology testing

The next step was to create multi-species alignments for further phylogenetic analysis from all 13 bear individuals. In order to create a data set with reduced assembly and mapping artefacts, genome data was masked for TEs and simple repeats¹⁹ using the RepeatMasker⁵⁴ output file of the polar bear reference genome available from http://gigadb.org/²³. Since the polar bear reference genome RepeatMasker output file did not contain the simple repeat annotation, we repeat-masked the polar bear reference genome with the option (-int) to mask simple repeats. Next, all bear genomes were masked with bedtools version 2.17.0⁵⁵ and custom Perl scripts. Non-overlapping, sliding window fragments of 100 kb were extracted using custom Perl scripts together with the program splitter from the EMBOSS package⁵⁶ (Supplementary Fig. 1), creating a dataset of 22,269 GFs from 13 bear individuals. Heterozygous sites and repeat elements were all marked “N” and removed using custom Perl scripts. The minimum sequence length of GFs needed for phylogenetic analysis was evaluated by estimating how much sequence data is needed to reject a phylogenetic tree topology using the approximately unbiased (AU) test⁵⁷. Only sufficiently long sequences can differentiate between alternative trees with statistical significance. The evaluation was done in two separate analyses: (a) with a simulated data set and (b) on a data set of 500 random GFs (Supplementary Methods).
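The fragment extraction and filtering can be sketched as follows. This Python example is a hedged illustration, not the custom Perl scripts used in the study: it assumes that a masked position ("N" or a gap) in any individual removes the whole alignment column, and it applies the 25 kb minimum length that is used for the phylogenetic analyses. All function names are hypothetical.

```python
from typing import Iterator, List, Tuple

WINDOW = 100_000   # fragment size from the text (100 kb)
MIN_LEN = 25_000   # minimum retained length used for the phylogenetic analyses


def windows(length: int, size: int = WINDOW) -> Iterator[Tuple[int, int]]:
    """Yield non-overlapping [start, end) windows along a scaffold."""
    for start in range(0, length, size):
        yield start, min(start + size, length)


def strip_masked_columns(alignment: List[str]) -> List[str]:
    """Drop every column in which any sequence is masked (N) or gapped.

    Assumption: column-wise removal across all individuals.
    """
    keep = [
        i for i in range(len(alignment[0]))
        if all(seq[i] not in "Nn-" for seq in alignment)
    ]
    return ["".join(seq[i] for i in keep) for seq in alignment]


def genome_fragments(
    alignment: List[str], size: int = WINDOW, min_len: int = MIN_LEN
) -> List[List[str]]:
    """Cut an alignment into windows, strip masked columns, filter by length."""
    fragments = []
    for start, end in windows(len(alignment[0]), size):
        frag = strip_masked_columns([seq[start:end] for seq in alignment])
        if len(frag[0]) >= min_len:
            fragments.append(frag)
    return fragments


# Toy alignment of two individuals, with tiny window sizes for illustration.
toy = ["ACGTNACNNA", "ACGTAACG-A"]
toy_fragments = genome_fragments(toy, size=5, min_len=4)
```

In the toy call, the second window retains only three unmasked columns and is therefore discarded, mirroring how short GFs are excluded from the tree analyses.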

Phylogenetic analysis using Genomic Fragment (GF), coding and mitochondrial sequences

For phylogenetic analysis, all GFs with length < 25 kb were removed from the initial 22,269 GFs, resulting in a data set consisting of 18,621 GFs (mean sequence length of 46,685 bp and standard deviation of 9,490 bp). The dataset was then used to create a coalescent phylogenetic species tree. First, the selected GFs were used to create individual ML-trees using RAxML version 8.2.4⁵⁸. The best fitting substitution model was selected on 10 Mb of genomic data using jModelTest 2.1.1⁵⁹ available in RAxML version 8.2.4⁵⁸ and applied to all ML analyses. From 18,621 ML trees, ASTRAL²⁵ constructed a coalescent species tree. For bootstrap support of the coalescent species tree, GF ML trees were bootstrapped 100 times, generating a total of 1,862,100 ML trees. The bootstrapped ML-trees and the coalescent species tree were used as input in ASTRAL²⁵ using default parameters to generate bootstrap support. The consense program in Phylip version 3.69⁶⁰ built a majority-rule consensus tree from the 18,621 ML-trees. SplitsTree version 4⁶¹ created a consensus network from the 18,621 GF ML-trees with various threshold settings (5%, 7%, 10% and 30%), to explore the phylogenetic conflict among the bear species. Similarly, phylogenetic analyses of nuclear protein coding sequences (CDS) and mitochondrial genomes were done with the giant panda genome as outgroup (Supplementary Methods).

Gene flow analysis using D-statistics and the D_FOIL-method

The program ANGSD⁶² was used for admixture analysis (D-statistics) among the ursine bears using the spectacled bear-Chappari as outgroup. The reads of the other bears were mapped to the consensus sequence of the spectacled bear as described above. In addition, indel realignment was done using GATK version 3.1–1⁶³. All possible four-taxon topologies of the bear species, including sun bear-Anabell, brown bear-Finland, brown bear-ABC, polar bear-2, American black bear, Asiatic black bear and sloth bear, were used for gene flow analysis with D-statistics. A block jackknife procedure (with 10 Mb blocks), with the parameters -minQ 30 and -minMapQ 30, was used to assess the significance of the deviation from zero. We also mapped the sun bear-Anabell, the Asiatic black bear and the sloth bear against the giant panda genome (ailMel1) http://hgdownload.soe.ucsc.edu/goldenPath/ailMel1/bigZips/ and repeated the analyses described above to investigate whether the outgroup choice affected our conclusions. In addition, we analyzed the data using D_FOIL-statistics²⁶ to detect signatures of introgression. For this analysis we assumed the coalescent species tree (Fig. 2A) and selected a window size of 100 kb with --mode dfoil, as suggested by the authors²⁶. Other parameters were left at default.
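How a block jackknife turns per-block pattern counts into a significance estimate can be sketched as below. This Python example uses a plain, unweighted delete-one jackknife over blocks and is an illustration only; the procedure implemented in ANGSD may differ in detail (for example in block weighting), and the per-block counts are invented toy values.

```python
import math
from typing import Sequence, Tuple


def jackknife_d(
    abba: Sequence[float], baba: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (D, standard error, Z) from per-block ABBA/BABA counts.

    Assumption: unweighted delete-one block jackknife, and every block
    subset retains at least one informative site.
    """
    n = len(abba)
    if n < 2 or n != len(baba):
        raise ValueError("need two equally long lists with >= 2 blocks")

    num = [a - b for a, b in zip(abba, baba)]
    den = [a + b for a, b in zip(abba, baba)]
    tot_num, tot_den = sum(num), sum(den)
    d_full = tot_num / tot_den

    # D recomputed with each block left out in turn.
    loo = [(tot_num - num[i]) / (tot_den - den[i]) for i in range(n)]
    mean_loo = sum(loo) / n
    variance = (n - 1) / n * sum((x - mean_loo) ** 2 for x in loo)
    se = math.sqrt(variance)

    z = d_full / se if se > 0 else float("nan")
    return d_full, se, z


# Toy per-block counts (in the study each block would span 10 Mb).
d_hat, d_se, d_z = jackknife_d([30, 28, 33, 29], [10, 12, 9, 11])
```

A Z-score with an absolute value above about 3 is commonly read as a significant deviation of D from zero; the threshold applied in this study is not restated here.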

Hybridization inference using PhyloNet

A data set of 4,000 random (every fourth) GFs that are putatively in linkage equilibrium was created to calculate rooted ML trees with RAxML as described earlier. The trees were pruned to contain one individual of each ursine species plus the ABC brown bear to reduce the computational complexity of the ML analyses. Maximum likelihood networks in a coalescent framework, thus incorporating ILS and gene flow, were inferred using PhyloNet³²,⁶⁴, allowing 0, 1 and 2 reticulations in 50 runs and returning the five best networks.

Estimation of heterozygosity, past effective population size and divergence times

To calculate the number of heterozygous sites and their distribution in each bear genome, the genomes were fragmented into 10 Mb regions using custom Perl scripts. The number of heterozygous sites was counted using a custom Perl script and plotted as distributions using R. The pairwise sequentially Markovian coalescent (PSMC)³⁴ analysis was used to assess changes in effective population size over time. For the PSMC analysis we used default parameters and 100 bootstrap replicates, assuming a generation time of ten years for brown and polar bears and six years for the other bear species. We selected a mutation rate of 1 × 10⁻⁸ changes/site/generation for all species. These parameters were used in previous brown and polar bear analyses¹⁰ and enable comparability between the studies. A generation time of six years has been shown for the American black bear⁶⁵ and was deemed realistic for the other relatively small-bodied bears. The mutation rate is close to a pedigree-based mutation rate of 1.1 × 10⁻⁸ changes/site/generation in humans⁶⁶ that is considered to be typical for mammals. We also estimated the divergence times for all the bear species (Supplementary methods).
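How the chosen mutation rate and generation time enter the PSMC results can be shown with a short Python sketch of the usual rescaling of PSMC output. It assumes the standard PSMC conventions (scaled time in units of 2N0, relative population size λ, and θ0 reported per 100 bp bin); the function name and the toy input values are hypothetical and not taken from the study.

```python
def scale_psmc(theta0, t_k, lambda_k, mu=1e-8, generation_time=10.0, bin_size=100):
    """Convert scaled PSMC output to years and effective population size.

    theta0          : per-bin theta estimated by PSMC (assumed input)
    t_k, lambda_k   : scaled time and relative population size of one interval
    mu              : mutation rate per site per generation (1e-8 in the text)
    generation_time : years per generation (10 for brown/polar bears, 6 otherwise)
    bin_size        : bp per PSMC bin (assumption: the PSMC default of 100)
    """
    # theta0 = 4 * N0 * mu * bin_size, solved for N0.
    n0 = theta0 / (4.0 * mu * bin_size)
    # Scaled time is measured in units of 2 * N0 generations.
    years = 2.0 * n0 * t_k * generation_time
    # Population size in each interval is relative to N0.
    ne = lambda_k * n0
    return years, ne


if __name__ == "__main__":
    # Toy values: the same interval scaled with the two generation times from the text.
    print(scale_psmc(theta0=0.04, t_k=0.5, lambda_k=2.0, generation_time=10.0))
    print(scale_psmc(theta0=0.04, t_k=0.5, lambda_k=2.0, generation_time=6.0))
```

Under these conventions the generation time rescales only the time axis, whereas the mutation rate rescales both time and effective population size, which is why identical parameters are needed for comparison with earlier studies.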

Additional Information

Accession Codes: The raw reads of the genome sequences have been deposited in the European Nucleotide Archive under the BioProject accession code PRJEB9724.

How to cite this article: Kumar, V. et al. The evolutionary history of bears is characterized by gene flow across species. Sci. Rep. 7, 46487; doi: 10.1038/srep46487 (2017).

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Wagner, J. Pliocene to early Middle Pleistocene ursine bears in Europe: a taxonomic overview. J. Natl. Mus. Prague Nat. Hist. Ser. 179, 197–215 (2010).
2. Kutschera, V. E. et al. Bears in a Forest of Gene Trees: Phylogenetic Inference Is Complicated by Incomplete Lineage Sorting and Gene Flow. Mol. Biol. Evol. 31, 2004–2017 (2014).
3. Coyne, J. A. & Orr, H. A. Speciation. 37, (Sunderland, MA: Sinauer Associates, 2004).
4. Gray, A. Mammalian hybrids. A check-list with bibliography. (Commonwealth Agricultural Bureaux, 1972).
5. Mallet, J. Hybridization as an invasion of the genome. Trends Ecol. Evol. 20, 229–237 (2005).
6. Smol, J. P. Climate Change: A planet in flux. Nature 483, S12–S15 (2012).
7. Cahill, J. A. et al. Genomic evidence of geographically widespread effect of gene flow from polar bears into brown bears. Mol. Ecol. 24, 1205–1217 (2015).
8. Hailer, F. et al. Nuclear genomic sequences reveal that polar bears are an old and distinct bear lineage. Science 336, 344–347 (2012).
9. Bidon, T. et al. Brown and polar bear Y chromosomes reveal extensive male-biased gene flow within brother lineages. Mol. Biol. Evol. 31, 1353–1363 (2014).
10. Miller, W. et al. Polar and brown bear genomes reveal ancient admixture and demographic footprints of past climate change. Proc. Natl. Acad. Sci. 109, E2382–E2390 (2012).
11. Nowak, R. Walker’s Mammals of the World. (Johns Hopkins Press, 1991).
12. Galbreath, G. J., Hunt, M., Clements, T. & Waits, L. P. An apparent hybrid wild bear from Cambodia. Ursus 19, 85–86 (2008).
13. McLellan, B. & Reiner, D. A review of bear evolution. Bears Their Biol. Manag. 85–96 (1994).
14. Yu, L., Li, Y. W., Ryder, O. A. & Zhang, Y. P. Analysis of complete mitochondrial genome sequences increases phylogenetic resolution of bears (Ursidae), a mammalian family that experienced rapid speciation. BMC Evol. Biol. 7, 198 (2007).
15. Krause, J. et al. Mitochondrial genomes reveal an explosive radiation of extinct and extant bears near the Miocene-Pliocene boundary. BMC Evol. Biol. 8, 220 (2008).
16. Pagès, M. et al. Combined analysis of fourteen nuclear genes refines the Ursidae phylogeny. Mol. Phylogenet. Evol. 47, 73–83 (2008).
17. Abella, J. et al. Kretzoiarctos gen. nov., the Oldest Member of the Giant Panda Clade. PLoS ONE 7, e48985 (2012).
18. Green, R. E. et al. A Draft Sequence of the Neandertal Genome. Science 328, 710–722 (2010).
19. Carbone, L. et al. Gibbon genome and the fast karyotype evolution of small apes. Nature 513, 195–201 (2014).
20. Jónsson, H. et al. Speciation with gene flow in equids despite extensive chromosomal plasticity. Proc. Natl. Acad. Sci. USA 111, 18655–18660 (2014).
21. Li, R. et al. The sequence and de novo assembly of the giant panda genome. Nature 463, 311–317 (2010).
22. Cahill, J. A. et al. Genomic Evidence for Island Population Conversion Resolves Conflicting Theories of Polar Bear Evolution. PLoS Genet. 9, e1003345 (2013).
23. Liu, S. et al. Population genomics reveal recent speciation and rapid evolutionary adaptation in polar bears. Cell 157, 785–794 (2014).
24. Bapteste, E. et al. Networks: expanding evolutionary thinking. Trends Genet. 29, 439–441 (2013).
25. Mirarab, S. et al. ASTRAL: genome-scale coalescent-based species tree estimation. Bioinformatics 30, (2014).
26. Pease, J. B. & Hahn, M. W. Detection and Polarization of Introgression in a Five-Taxon Phylogeny. Syst. Biol. 64, 651–662 (2015).
27. Huson, D. H., Rupp, R. & Scornavacca, C. Phylogenetic Networks. (Cambridge University Press, 2010).
28. Bidon, T., Schreck, N., Hailer, F., Nilsson, M. & Janke, A. Genome-wide search identifies 1.9 megabases from the polar bear Y chromosome for evolutionary analyses. Genome Biol. Evol. 7, 2010–2022 (2015).
29. Durand, E. Y., Patterson, N., Reich, D. & Slatkin, M. Testing for ancient admixture between closely related populations. Mol. Biol. Evol. 28, 2239–2252 (2011).
30. Baryshnikov, G. & Zakharov, D. Early Pliocene bear Ursus thibetanus (Mammalia, Carnivora) from Priozernoe locality in the Dniester basin (Moldova Republic). Proc. Zool. Inst. RAS 317, 3–10 (2013).
31. Croitor, R. & Brugal, J. P. Ecological and evolutionary dynamics of the carnivore community in Europe during the last 3 million years. Quat. Int. 212, 98–108 (2010).
32. Than, C., Ruths, D. & Nakhleh, L. PhyloNet: a software package for analyzing and reconstructing reticulate evolutionary relationships. BMC Bioinformatics 9, 322 (2008).
33. Mailund, T. et al. A New Isolation with Migration Model along Complete Genomes Infers Very Different Divergence Processes among Closely Related Great Ape Species. PLoS Genet. 8, e1003125 (2012).
34. Li, H. & Durbin, R. Inference of human population history from individual whole-genome sequences. Nature 475, 493–496 (2011).
35. García-Rangel, S. Andean bear Tremarctos ornatus natural history and conservation. Mammal Rev. 42, 85–119 (2012).
36. Meijaard, E. Craniometric differences among Malayan sun bears (Ursus malayanus); evolutionary and taxonomic implications. Raffles Bull. Zool. 52, 665–672 (2004).
37. Edwards, S. V. et al. Implementing and testing the multispecies coalescent model: A valuable paradigm for phylogenomics. Mol. Phylogenet. Evol. 94, 447–462 (2016).
38. Nakhleh, L. Computational approaches to species phylogeny inference and gene tree reconciliation. Trends Ecol. Evol. 28, 719–728 (2013).
39. Lammers, F., Gallus, S., Janke, A. & Nilsson, M. A. Phylogenetic conflict in bears identified by automated discovery of transposable element insertions in low coverage genomes. arXiv preprint arXiv:123901 (2017).
40. Puckett, E. E., Etter, P. D., Johnson, E. A. & Eggert, L. S. Phylogeographic Analyses of American Black Bears (Ursus americanus) Suggest Four Glacial Refugia and Complex Patterns of Postglacial Admixture. Mol. Biol. Evol. 32, 2338–2350 (2015).
41. Davison, J. et al. Late-Quaternary biogeographic scenarios for the brown bear (Ursus arctos), a wild mammal model species. Quat. Sci. Rev. 30, 418–430 (2011).
42. Harrison, R. G. & Larson, E. L. Hybridization, Introgression, and the Nature of Species Boundaries. J. Hered. 105, 795–809 (2014).
43. Jarvis, E. D. et al. Whole-genome analyses resolve early branches in the tree of life of modern birds. Science 346, 1320–1331 (2014).
44. Nosenko, T. et al. Deep metazoan phylogeny: when different genes tell different stories. Mol. Phylogenet. Evol. 67, 223–233 (2013).
45. Hallström, B. M. & Janke, A. Mammalian Evolution May not Be Strictly Bifurcating. Mol. Biol. Evol. 27, 2804–2816 (2010).
46. Suh, A., Smeds, L. & Ellegren, H. The Dynamics of Incomplete Lineage Sorting across the Ancient Adaptive Radiation of Neoavian Birds. PLoS Biol. 13, e1002224 (2015).
47. Wu, C.-I. The genic view of the process of speciation. J. Evol. Biol. 14, 851–865 (2001).
48. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
49. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
50. Baker, M. De novo genome assembly: what every biologist should know. Nat. Methods 9, 333–337 (2012).
51. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
52. Garrison, E. & Marth, G. Haplotype-based variant detection from short-read sequencing. arXiv preprint arXiv:1207.3907v2 (2012).
53. Orlando, L. et al. Recalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse. Nature 499, 74–78 (2013).
54. Smit, A., Hubley, R. & Green, P. RepeatMasker Open-4.0. http://www.repeatmasker.org (2015).
55. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
56. Rice, P., Longden, I. & Bleasby, A. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 16, 276–277 (2000).
57. Shimodaira, H. An approximately unbiased test of phylogenetic tree selection. Syst. Biol. 51, 492–508 (2002).
58. Stamatakis, A. RAxML Version 8: A tool for Phylogenetic Analysis and Post-Analysis of Large Phylogenies. Bioinformatics 30, 1312–1313 (2014).
59. Darriba, D., Taboada, G. L., Doallo, R. & Posada, D. jModelTest 2: more models, new heuristics and parallel computing. Nat. Methods 9, 772 (2012).
60. Felsenstein, J. PHYLIP (Phylogeny Inference Package) version 3.6. Available from the author, Department of Genome Sciences, University of Washington, Seattle (2005).
61. Huson, D. H. & Bryant, D. Application of Phylogenetic Networks in Evolutionary Studies. Mol. Biol. Evol. 23, 254–267 (2006).
62. Korneliussen, T. S., Albrechtsen, A. & Nielsen, R. ANGSD: Analysis of Next Generation Sequencing Data. BMC Bioinformatics 15, 356 (2014).
63. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
64. Yu, Y., Dong, J., Liu, K. J. & Nakhleh, L. Maximum likelihood inference of reticulate evolutionary histories. Proc. Natl. Acad. Sci. 111, 16448–16453 (2014).
65. Onorato, D. P., Hellgren, E. C., van Den Bussche, R. A. & Doan-Crider, D. L. Phylogeographic Patterns within a Metapopulation of Black Bears (Ursus americanus) in the American Southwest. J. Mammal. 85, 140–147 (2004).
66. Veeramah, K. R. & Hammer, M. F. The impact of whole-genome sequencing on the reconstruction of human population history. Nat. Rev. Genet. 15, 149–162 (2014).

Acknowledgements

We are grateful to Luay Nakhleh (Rice University) for expert help with PhyloNet analyses, to Yichen Zheng for valuable comments on the manuscript, and to Jon Baldur Hlidberg (www.fauna.is) and Aidin Niamir for artwork. Blood samples were kindly provided by Carsten Ludwig (Allwetter Zoo Münster), Tim Schikora (Zoo Schwerin), Christian Wenker (Basel Zoo) and Eva Martinez Nevado (Zoo Madrid). This study was supported by Hesse’s funding program LOEWE (Landes-Offensive zur Entwicklung Wissenschaftlich-ökonomischer Exzellenz) and the Leibniz Society.

Author information

Authors and Affiliations

Senckenberg Biodiversity and Climate Research Centre, Senckenberg Gesellschaft für Naturforschung, Senckenberganlage 25, D-60325 Frankfurt am Main, Germany
Vikas Kumar, Fritjof Lammers, Tobias Bidon, Markus Pfenninger, Maria A. Nilsson & Axel Janke
Goethe University Frankfurt, Institute for Ecology, Evolution & Diversity, Biologicum, Max-von-Laue-Str. 13, D-60439 Frankfurt am Main, Germany
Vikas Kumar, Fritjof Lammers, Tobias Bidon, Markus Pfenninger & Axel Janke
AG Zoologischer Garten Cologne, Riehler Straße 173, Cologne, 50735, Germany
Lydia Kolter

Contributions

A.J. designed the research and obtained funding. A.J. and T.B. collected the data; V.K. and F.L. conducted the analyses; L.K. provided pedigrees and located samples; A.J., V.K., M.P., M.N., F.L., and T.B. interpreted the results; A.J. and V.K. wrote the paper with the help of all authors.

Corresponding author

Correspondence to Axel Janke.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Information (PDF 2563 kb)

Rights and permissions

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

About this article

Received: 23 November 2016
Accepted: 17 March 2017
Published: 19 April 2017
DOI: https://doi.org/10.1038/srep46487
