The evolutionary history of bears is shaped by gene flow across species

Bears are iconic mammals with a complex evolutionary history. Natural bear hybrids and studies of few nuclear genes indicate that gene flow among bears may be more common than expected and not limited to the closely related polar and brown bears. Here we present a genome analysis of the bear family with representatives of all living species. Phylogenomic analyses of 869 mega base pairs divided into 18,621 genome fragments yielded a well-resolved coalescent species tree despite signals for extensive gene flow across species. However, genome analyses using three different statistical methods show that gene flow is not limited to closely related species pairs. Strong ancestral gene flow between the Asiatic black bear and the ancestor to polar, brown and American black bear explains numerous uncertainties in reconstructing the bear phylogeny. Gene flow across the bear clade may be mediated by intermediate species such as the geographically wide-spread brown bears leading to massive amounts of phylogenetic conflict. Genome-scale analyses lead to a more complete understanding of complex evolutionary processes. The increasing evidence for extensive inter-specific gene flow, found also in other animal species, necessitates shifting the attention from speciation processes achieving genome-wide reproductive isolation to the selective processes that maintain species divergence in the face of gene flow.


Introduction
Ursine bears are the largest living terrestrial carnivores and have evolved during the last three million years, attaining a wide geographical distribution range (Fig. 1). Bears are a prominent case where conflicting gene trees and an ambiguous fossil record 1 make the interpretation of their evolutionary history difficult 2 . Introgressive gene flow resulting from inter-species mating is believed to be rare among mammals 3 . However, some 600 mammalian hybrids are known 4 and the importance of hybridization has started to gain attention in evolutionary biology 5 . Yet, our knowledge of the extent of post speciation gene flow is limited, because few genomes of closely related species have been sequenced.
In bears, natural mating between grizzlies (brown bears Ursus arctos), and polar bears (Ursus maritimus) results in hybrid offspring, the grolars 6,7 . Genome scale studies in brown and polar bears find that up to 8.8% of individual brown bear genomes have an origin in the polar bear 8 .
Additionally, the brown bear mitochondrial (mt) genome was captured by polar bears during an ancient hybridization event 9,10 and polar bear alleles are distributed across brown bear populations all over the world by male-biased migration and gene flow 8,11 . Polar and brown bears belong to the sub-family Ursinae, which comprises six extant, morphological and ecological distinct species 12 , but hybridization among some ursine bears is possible. A natural hybrid has been reported also between the Asiatic black bear (Ursus thibetanus) and the sun bear (Ursus malayanus) 13 . In captivity more bear hybrids are known (between sun and sloth bear, Asiatic black and brown bear, and between American black and brown bear), some of them have been fertile 4 . Despite limited population sizes for most bears and apparently distinct habitats, morphology and ecology, molecular phylogenetic studies have been unable to unequivocally reconstruct the relationship among the six ursine bear species 2,14 . Especially, the evolution of the American (Ursus americanus) and Asiatic black bear is difficult to resolve, despite 3  Evidence from the fossil record, morphology and mitochondrial phylogeny suggested a closer relationship between the Asiatic and the American black bears [14][15][16][17] . In contrast, autosomal and Y-chromosomal sequences support a different grouping with the American black bear being sister group to the brown/polar bear clade 2,11,14,18,19 . Another apparent conflict between mitogenomics, morphology and autosomal sequence data is the position of the morphologically distinct sloth bears (Ursus ursinus). Mitochondrial DNA (mtDNA) analyses and morphological studies placed sloth bears as sister-group to all other ursine bears, while nuclear gene analyses favor a position close to the sun bears 2,17,20 . A study of nuclear introns with multiple individuals for each ursine species was unable to reconstruct a well-supported species tree and suggested that incomplete lineage sorting (ILS) and/or gene flow caused the complexities in the ursine tree 2 .
However, previous molecular studies did not have access to genome data from all bear species and were thus limited to small parts of the genome.
The genomic era allows for a detailed analyses of how gene flow from hybridization affects genomes, and has revealed much more complex evolutionary histories than previously anticipated for many species, including our own [21][22][23][24] . Multiple genomic studies on polar, brown bears and the giant panda [25][26][27][28] lead to a wealth of available genomic data for some ursine bears. We investigated all living Ursinae and Tremarctinae bear species based on six newly sequenced bear genomes in addition to published genomes. Methods specifically developed to deal with complex genome data 29,30 and gene flow in particular 21,31 are applied to resolve and understand the processes that have shaped the evolution of bears.

Results
The newly sequenced individuals were morphologically typical for the respective species. Mapping Illumina reads against the polar bear genome reference 28 yielded for the Asiatic bear species an average coverage of 11X. Tables in (Supplementary Table 1 and 2) detail the sequencing and assembly data, and provide accession numbers of the included species. As a basis for subsequent evolutionary analyses, non-overlapping 100 kb Genome Fragments (GFs) were extracted from assemblies against polar bear scaffolds larger than 1 megabase (Mb). These have presumably a higher assembly quality than smaller fragments and still represent >96% of the genome (Supplementary Fig. 1). Ambiguous sites, gaps, repetitive sequences, and transposable element sequences were removed from GF alignments ( Supplementary Fig. 2). The newly sequenced individuals were morphologically typical for the respective species. Pedigrees ( Supplementary Fig.   3) and heterozygosity plots ( Supplementary Fig. 4) show that the sequenced individuals are neither hybrids nor, compared to wild specimens, severely inbred.

Network analysis depicts hidden conflict in the coalescent species tree
GFs larger than 25 kb, representing the majority of the length distribution ( Supplementary Fig. 2), contain on average 104 substitutions between the Asiatic black bear and respectively, sun and sloth bear ( Supplementary Fig. 5). Phylogenetic topology testing on real and simulated sequence data shows that GFs with this information content significantly reject alternative topologies (Supplementary Fig. 6 and Fig. 7). For subsequent coalescence, consensus, and network analyses only GFs larger than 25 kb were used and are thus based on firmly supported ML-analyses.
A coalescent species tree utilizing 18,621 GFs >25 kb (869,313,834 bp) resolved the relationships among bears with significant support for all branches ( Fig. 2A, Supplementary Fig. 8). In the coalescent-based species tree, sun and sloth bears form the sister group to the Asiatic black bear, and the American black bear groups with polar and brown bears. The spectacled bear (Tremarctos 123 ornatus) is, consistent with previous results 2, 18 , placed as sister taxon to Ursinae. The well-resolved coalescent species tree appears to be without conflict from the genomic data.
However, a network analysis 32 gained from the same 18,621 GFs identifies conflicting phylogenetic signal in the data (Fig. 2B). The square and cuboid-like structures are indicative of alternative phylogenetic signals, particularly among brown and polar bears, but also among the Asiatic bear species. The brown bear from the Admiralty, Baranof, and Chichagof (ABC) islands groups in different arrangements with the other brown and polar bears, consistent with gene flow between the two species 8,10,28 . When the threshold level for depicting conflicting branches is reduced in the network analysis, the signal becomes increasingly complex, illustrating the conflicting signals from the 18,621 ML-trees ( Supplementary Fig. 9). Still, the network analysis agrees with the species tree when the spectacled bear is viewed as outgroup relative to the major cluster of American black, brown and polar bear, and Asiatic black bear with, sun and sloth bear.
The phylogenetic conflict can be caused by incomplete lineage sorting (ILS) or gene flow, but less likely from lack of resolution due to the strong phylogenetic signal of each GF (Supplementary Fig.   6 and Fig. 7). Analyses of 8,050 protein coding sequences (10,303,323 bp) and GFs from scaffolds previously identified as X chromosomal (total 74Mb) 27 , also conform to the species tree and network ( Supplementary Fig. 10). Finally, the paternal side of the bear evolution based on Y chromosome sequences 33 cannot be analyzed for all species, but the resulting tree ( Supplementary   Fig. 11) is consistent with the inferred species tree.
The Bayesian mtDNA tree (Fig. 3, Supplementary Fig. 12) from 11,529 bp of sequence information that remained after alignment and filtering conforms to previous studies 2,17,34 , making this the hitherto largest taxonomic sampling of 38 complete bear mt genomes. However, several nodes of the mtDNA tree differ notably from the coalescent species tree ( Fig. 2A). In the mtDNA tree, the brown bears are paraphyletic, because the brown bear mt genome introgressed into the polar bear  10,35 . The extinct cave bear (Ursus spelaeus) is the sister group to polar and brown bears.
The American black bear is the sister group to the Asiatic black bear, and the sloth bear is the sister group to all ursine bears. The topological agreement of the mtDNA tree to previous mtDNA studies and expected placement of the new individuals corroborates the matrilineal lineages and that the studied individuals are representative for their species and likely not hybrids.
Finally, a consensus analysis based on ML-trees from GF ( Supplementary Fig. 13) produces a tree that is identical to the coalescent species tree, but highlights that numerous individual GF trees support alternative topologies (Supplementary Table 3). Closer inspection of the individual 18,621 GF ML topologies shows that 38.1% (7,086) support a topology where Asiatic black bear is the sister group to the American black/brown/polar bear clade. The Asiatic black bear groups in different arrangements with the two other Asiatic bears: 18.7% (3,474) of the branches support a grouping with the sun bear, and 7.5% (1,394) with the sloth bear in other topologies.

Gene flow among bears is common
Seemingly conflicting phylogenetic signals in evolutionary analyses can be explained either by incomplete lineage sorting (ILS) 36 or gene flow among species. In contrast to the largely random process of ILS, gene flow produces a bias in the phylogenetic signal, because it is a directed process between certain species. The D-statistic measures the excess of shared polymorphisms of two closely related lineages with respect to a third lineage 21 and can thus discriminate between gene flow and ILS. The test assumes that the ancestral population of the three in-group taxa was randomly mating and recently diverging 37 . These assumptions might be compromised in widespread, structured species like bears. However, speciation is rarely instantaneous, but is rather  Table 6). The D-statistic is limited to four-taxon topologies and therefore gene flow signals are difficult to interpret when they occur between distant species, as it cannot determine if it is a direct, indirect, or ancestral signal. For taking more complex phylogenetic signals and gene flow patterns into account, and to determine the direction of gene flow, we applied the recently introduced D FOIL -statistics. This method uses a symmetric five-taxon topology and has specifically been developed to detect and differentiate gene flow signal among ancestral lineages 31 .
In agreement with the phylogenetic conflict and D-statistics, the D FOIL -statistic finds strong gene flow between the ancestor of the American black bear/brown/polar bear clade, possibly the extinct Etruscan bear (Ursus etruscus), and the Asiatic black bear (Fig. 4, Table 1). The Etruscan bear was geographically overlapping with other bear species and was, like the Asiatic black bear, widely distributed 38 (Table 1). PhyloNet 41 has been specifically developed to detect hybridization events in genomic data while accounting for ILS. We applied the maximum likelihood approach implemented in PhyloNet 41,42 to detect signs of hybridization among bear species. We sampled 4,000 ML trees from putatively independent GFs with one individual representing each species. The ABC island brown bear was chosen as another representative for brown bears and positive control, because for the ABC island brown bear population hybridization with polar bears has been documented 8,10,33 . The outgroup, the spectacled bears were removed to reduce the computational complexity and, because previous analyses could not detect gene flow between tremarctine and ursine bears. The complex phylogeny requires exceptional computational time so we analyzed only networks with up to two reticulations. The resulting PhyloNet network with the highest likelihood ( Supplementary Fig. 15) shows reticulations between ABC island brown bear and polar bears, and also between the Asiatic black bear and the ancestral branch to American black, brown and polar bears. It is noteworthy, that the second reticulation has a high inheritance probability (41.8%), which agrees with the strongest gene flow signal identified by D FOIL analyses (Fig. 4, Table 1). Due to computational limits so far only two reticulations that represent the strongest hybridization signals could be identified. For three and more reticulations the network-space becomes extremely large.
Additional analysis using CoalHMM 43 supports the findings of gene flow from D-, D FOIL , and PhyloNet analyses ( Supplementary Fig. 16). It shows that a migration model fits most pair wise comparisons significantly better than ILS. A sensitivity analysis shows that this finding is robust under a broad range of parameters (Supplementary Fig. 17 and 18). Thus, gene flow among bears throughout most of their history is the major factor for generating conflicting evolutionary signals.

Estimation of divergence times and population splits
The phylogenomic based divergence time estimates (Fig. 5) are older than recent estimates based on nuclear gene data 2 , but are consistent with that on mtDNA data 17,44 (Supplementary Table 7). The amount of heterozygous sites differs among species and individuals, and is highest in the Asiatic black bear genome and, as expected 2 lowest in the polar bears and spectacled bears ( Supplementary   Fig. 4). It is noteworthy that the average numbers of heterozygous sites differ among the two sun bears, which may reflect different population histories.
Estimates for past changes in effective population size (N e ) using the pairwise sequentially Markovian coalescent (PSMC) 45 for all newly sequenced bears are shown in Fig. 6 (unscaled PSMC see Supplementary Fig. 19). While PSMC plots from low coverage genomes may vary and not be ultimately accurate, the plots inferred for the brown, polar and American black bear are very similar to previous published on higher coverage genomes ( Supplementary Fig. 20) 26 . The demographic histories of the Asian bear individuals vary widely, but do not overlap in bootstrap analyses since 100 ka ( Supplementary Fig. 21).
According to the PSMC analyses the Asiatic black bear maintained a stable and a relatively high long-term Ne since 500 ka (Fig. 6). This is consistent with its wide geographic distribution and its high degree of heterozygous sites in the genome 2 . The effective population size of the Asiatic black bear declined some 20 ka, correlating with the end of the last ice age. By contrast, the spectacled bear maintained a relatively low long-term effective population size, consistent with their lower population diversity 2,46 . The demography of two sun bear individuals is strikingly different from each other since 100 ka. As the bootstrap replicates do not overlap, the different curves support a hypothesis of separate population dynamics ( Supplementary Fig. 21). Their distinct mitochondrial lineages (Fig. 3)  Thus, it is unlikely that American black bears came into contact with the Asiatic sun and sloth bears 15,51 . Likewise, introgressive gene flow between south-east Asiatic bear species and polar bears requires an explanation, because they have been evolving in geographically and climatically distinct areas, from the time when polar bears diverged from brown bears and began parapatric speciation in the Arctic. It is therefore possible that some gene flow events occurred through an intermediate species. The brown bear has been shown to distribute polar bear alleles across its range 8 and may therefore be a plausible vector species for the exchange of genetic material between Asiatic bears and the polar, or American black bear. The brown bear is a likely extant candidate, because it has been and is geographically wide-spread 52 . Furthermore, the geographical range of brown bears overlaps with all other ursine bear species (Fig. 1), they have reportedly migrated several times across continents and islands 52,53 , and numerous brown bear hybrids with other bears in either direction are known 4 . While also the Asiatic black bear was widely distributed across Asia and had, like the brown bear 26 , a large effective population size (Fig. 6), a migration of the Asiatic black bear into North America has not been shown. Likewise, migration of the American black bear in the opposite direction, from the American to the Asian continent, is not evident from fossil data. The D FOIL and PhyloNet analyses 31,41 are powerful tools to detect ancestral gene flow, such as the prominent signal between the Asiatic black bear and the ancestor to the American black, brown and polar bears (Fig. 4, Table 1). In fact, gene flow during early ursine radiation from extinct bear species, such as the Etruscan bear or the cave bear is to be expected to leave signatures in genomes of their descendants and thus causing conflict in a bifurcating model of evolution.

Speciation as a selective rather than an isolation process
There is no question that bears are morphologically, geographically and ecologically distinct and they are unequivocally accepted as species even by different species concepts 54

Genome sequencing, mapping and creation of consensus sequences
Prior to sampling and DNA extraction and evolutionary analyses, pedigrees from zoo studbooks and appearance of the individuals confirmed that these individuals are not hybrids ( Supplementary Fig.   3). DNA extraction from opportunistically obtained tissue and blood samples was done in a pre-PCR environment on different occasions to avoid confusion by standard phenol/chloroform protocols and yielded between 1 to 6 µg DNA for each of the six bear individuals (S1 Data 1.1).

Data filtration, simulation of sequence length and topology testing
The next step was to create multi-species alignments that could be used for further phylogenetic analysis from all 13 bear individuals included in this study. In order to create a data set with reduced assembly and mapping artefacts, genome data was masked for TEs and simple repeats 22 using the RepeatMasker 69 output file of the reference genome (polar bear) available from http://gigadb.org/ 28 . Since the polar bear reference genome RepeatMasker output file did not contain the simple repeat annotation, we repeatmasked the polar bear reference genome with the option (int) to mask simple repeats. Then all the bear genomes were masked with the help of bedtools version 2.17.0 70 and custom Perl scripts. Non-overlapping, sliding window fragments of 100 kb were extracted using custom perl scripts together with the program splitter from the Emboss package 71 from scaffolds larger than 1 Mb (Supplementary Fig. 1), creating a dataset of 22,269 GFs from 13 bear individuals. In addition, heterozygous sites, and repeat elements and ambiguous sites were all marked "N" and removed using custom Perl scripts. An evaluation of the minimum sequence length of GFs needed for phylogenetic analysis was done by estimating how much sequence data is needed to reject a phylogenetic tree topology with using the approximate unbiased, AU test 72 . This is important, because only sufficiently long sequences can differentiate between alternative trees with statistical significance. The evaluation was done in two separate analyses:

Phylogenetic analysis of nuclear protein-coding genes and mitochondrial genomes
The annotation of the polar bear genome 28 was used to extract the protein coding sequences (CDS) from the genome that could be used for phylogenetic analysis. The species alignments were complemented by giant panda (Ailuropoda melanoleuca, ailMel1) sequences from Ensembl (http://www.ensembl.org/) to provide an outgroup to the analyses. In order to determine orthologous CDS between the polar bear reference genome and the giant panda Proteinortho version 5.06 77 was used. The CDS of the new bear genomes that corresponded to the polar bear CDS were aligned with MAFFT version 7.154b 78 . Gaps were removed using Gblocks version 0.91b 79 and a custom perl script removed ambiguous sites. CDS <300 bp were not used for phylogenetic analyses. The best evolutionary model, GTR+G+I 80  model of sequence evolution. In addition, a concatenated analysis of the coding sequence was also done to estimate the concatenated CDS tree. The CDSs were concatenated and the substitution model GTR+G+I was used to create an ML tree with RaxML 73 . A AU topology test was made on the CDS topology using the CONSEL version 1.20 81 and the species tree ( Fig. 2A).
In order to extract the complete mt genomes from Illumina sequence data, the reads for different bear species were mapped to their respective published complete mt genome sequences using BWA version 0.7.5a 64 . Consensus sequences were created using Samtools version: 0.1.

Gene flow analysis using D-statistics and the D FOIL -method
The program ANGSD 83 was used for admixture analysis (D-statistics) among the ursine bears using the spectacled bear-Chappari as outgroup. The reads of other bears were mapped to the consensus sequence of the spectacled bear as previously described. In addition, insertion/deletion (indel) realignment was done using GATK version 3.1-1 84  investigate, if the choice of the outgroup affected our conclusions. In addition, we analyzed the data using D FOIL -statistics 31 , to detect the signatures of introgression. For this analysis we assumed the coalescent species tree ( Fig. 2A) and selected a window size of 100 kb with --mode dfoil as suggested by the authors 31 . Other parameters were left at default.

Hybridization inference using PhyloNet
A data set of 4,000 random (every fourth) GFs, that are putatively in linkage equilibrium, was created to calculate rooted ML trees with RAxML as described earlier. The trees were pruned to contain one individual of each ursine species plus the ABC-brown bear to reduce computational complexity of the ML analyses. Maximum likelihood networks in a coalescent framework, thus incorporating ILS and gene flow, were inferred using PhyloNet 41,42 allowing 0, 1 and 2 reticulations in 50 runs and returning the five best networks.

Estimation of heterozygosity, past effective population size and divergence times
In order to calculate the amount of heterozygous sites as well as their distribution in all the bear genomes, their genomes were fragmented into 10 Mb regions using custom Perl scripts. The number of heterozygous sites was counted using a custom Perl script and plotted as distributions using R. The pairwise sequentially Markovian coalescent (PSMC) 45 analysis was done to assess past changes in effective population size over time. We used default parameters and 100 bootstrap replicates assuming a generation time for brown and polar bears of ten years, and six years for the other bear species for the PSMC analysis. We selected a mutation rate of 1×10 -8 changes/site/generation for all species. These parameters were used in previous brown and polar bear 26 and enable comparability between the studies. A generation time of six years has been shown for the American black bear 85 and was deemed realistic for the other relatively small-bodied bears.
The mutation rate is close to a pedigree-based mutation rate of 1.1×10 -8 changes/site/generation in humans 86 that is considered to be typical for mammals.        Fig. 21) and the Asiatic black bear had a constant large Ne since 500 ka similar to that of the brown bear which is consistent with the wide geographic distribution and high heterozygosity. Other bears maintained a relatively low long-term Ne, consistent with a lower population diversity and low heterozygosity ( Supplementary Fig. 4). The bear paintings were made by Jon Baldur Hlidberg (www.fauna.is).  Note - The table shows