Demographic histories and genetic diversity across pinnipeds are shaped by human exploitation, ecology and life-history

A central paradigm in conservation biology is that population bottlenecks reduce genetic diversity and population viability. In an era of biodiversity loss and climate change, understanding the determinants and consequences of bottlenecks is therefore an important challenge. However, as most studies focus on single species, the multitude of potential drivers and the consequences of bottlenecks remain elusive. Here, we combined genetic data from over 11,000 individuals of 30 pinniped species with demographic, ecological and life history data to evaluate the consequences of commercial exploitation by 18th and 19th century sealers. We show that around one third of these species exhibit strong signatures of recent population declines. Bottleneck strength is associated with breeding habitat and mating system variation, and together with global abundance explains much of the variation in genetic diversity across species. Overall, bottleneck intensity is unrelated to IUCN status, although the three most heavily bottlenecked species are endangered. Our study reveals an unforeseen interplay between human exploitation, animal biology, demographic declines and genetic diversity.

The ms "Recent demographic histories and genetic diversity across pinnipeds are shaped by anthropogenic interactions and mediated by ecology and life -history" evaluates the evidence for genetic bottlenecks in multiple pinniped populations worldwide and tes ts how genetic diversity and bottleneck signal correlate with some other factors. The authors finds some evidence of genetic bottlenecks but not to the extend expected given the historical exploitation of many pinnipeds. This is a well-written ms and I commend the reviewers for making the data and R script available. I have a few specific comments and questions which I detail below.
Regarding the genetic data analysed, the authors do not explain how the relevant published microsatellite data were identified or selected from the available information. I know of at least 1 source which is not included and that provides data for one of the studied species in different sampling areas (see citation below). The authors need to explain how they identified source s and, if relevant, why some were excluded (and whether this could affect results). This is important to be able to understand how representative the results may be. González -Suárez, M; Aurioles-Gamboa, D; Gerber LR (2010) Past exploitation of C alifornia sea lions did not lead to a genetic bottleneck in the Gulf of C alifornia C iencias Marinas 36: 199-211. Note that the data for this study are freely available at Dryad Digital Repository doi:10.5061/dryad.1454.
Related to this issue, I was unclear on how many sampling sites were represented in each dataset. This is important to understand how representative the data are and also key to better understand the STRUC TURE analyses.
For several species the posterior probabilities for the bottleneck and neutral models are very similar (eg Walrus, Baltic ringed seal, NZ sea lion, and others). Given the posterior probability of correct model classification are relatively low (error rates of 25-30%) I am not convinced an arbitrary threshold of 0.5 can be used as evidence of fitting a bottleneck vs neutral model. Wouldn't the rejection of the neutral model be a better way to determine which species show patterns consistent with a bottleneck? Using that approach one can clearly identify the four species for which bottl eneck Ne is clearly estimated as small in figure 2.
Related to this in line 197 "ABC was clearly able to distinguish between the two models, with the posterior probability of correct model classification being 75% for the bottleneck model and 71% for the neutral model" I may be misinterpreting these results, but those probabilities of correct classification do not seem particularly good, you can be wrong about 25 -30% of the times, which is not trivial. Figure 1 (and in fact all figures) is visually appe aling but I find it difficult to see the probability values with the colors. Using the actual values would be much more informative and given the colours are already in a table format, that seems very much doable.
Line 199 "for every species the preferred model showed a good fit to the data" for the Saimaa ringed seal the p-value is 0.08, perhaps works discussing a bit more uncertainty in this case?
The motivation of the comparative approach is strong, but the authors ought to cite some of the work already done with ABC , especially the studies focusing on bottleneck expansion models (e.g. Xue and Hickerson, Burbrink et al, Gehara et al.). The authors also ought to cite the 90s (?) Roman/Pal umbirelated papers examining the weaker than predicted impact that whaling had on observed genetic diversities.
To strengthen the inferential confidence, the authors should consider a model that also incorporates a crucial features that could have a major impact. Late Pleistocene history bottleneck expansions of huge magnitude associated with the LGM could have had a huge impact on the data, even if there was a subsequent one in the 1800s. I liked that the authors repeated the analysis on the largest clu sters (informed by STRUC TURE results of K > 2).
The casual mention of "at selectively neutral loci such as microsatellites is an indicator of recent bottlenecks because during a population decline the number of alleles decreases faster than heterozygosity (Nei et al. 1975)" is difficult to digest with a straight face in light of the well known strong possibility of strong linkage to parts of the genome undergoing positive selection. The authors ought to loudly acknowledge that their bottleneck expansion inference could have been confounded by this process.
Why are bottleneck histories mutually exclusive with historical population expansions? It makes sense that they could often occur in sequence (expansion followed by a bottleneck). The way it is stated it sounds like the ABC model did not allow for both to happen.
The result that the majority of species fit a no-growth model over the bottleneck model makes me think that major point about the LGM above could have confounded the results.

Response 1
Thanks for raising this point. We did indeed conducted systematic searches to identify suitable datasets and we now report on this in a new section at the beginning of Supplementary Information part 3 at p.28, which includes a new table of our literature survey results. Briefly, we conducted Web of Science searches (last updated 28th June 2018) separately for all 35 species, combining the term 'microsat*' with all known latin and common species names. This led to the identification of a total of 304 unique records (see Supplementary information 1, Supplementary Table 1, or below). For each species, we then identified one or a small number of papers reporting datasets that were deemed suitable on this basis of the balance between the number of loci and individuals. As in most cases the raw data were not publically available, we contacted the authors of what we considered to be the most appropriate dataset for each species for our analyses. This allowed us to collate a single high quality microsatellite dataset for each of 25 pinniped species.
For several species, we could not identify any suitable records, and this is what motivated us to generate new microsatellite datasets for five further species where samples could be obtained. For many more species, only a single appropriate dataset could be identified. Although for a few species multiple datasets are in principle available, we feel our approach of selecting the single most appropriate dataset is valid for the following reasons: (i) Our selected datasets are the largest available to our knowledge for each species with regard to the number of overall genotypes. We are aware of the study of California sea lions highlighted by the referee, but the dataset in question is smaller than the dataset used in our study (3550 versus 4511 genotypes respectively).
(ii) Merging microsatellite datasets for the same species from multiple studies is not possible as differences in loci, genotyping and scoring methodologies prohibit combining these datasets in a meaningful way.
(iii) Analysing multiple datasets for a small subset of species would introduce a pseudoreplication problem into most of our analysis.
(iv) We show that our analyses are not sensitive to population structure (see Supplementary Fig.6). By implication, adding a few additional datasets is unlikely to change our results as well. Supplementary Table 12: Identification of microsatellite datasets. We searched relevant papers using scientific names and common names of each species, as shown in the "Web of Science search term" column. The "Results" column shows the number of papers found with the respective search term.
Related to this issue, I was unclear on how many sampling sites were represented in each dataset. This is important to understand how representative the data are and also key to better understand the STRUCTURE analyses.

Response 2
We have added the number of sampling locations to Supplementary Table 1. The names of the sampling sites are provided together with the raw genotypes and will deposited on Dryad upon acceptance of the article. These information are also available via https://github.com/mastoffel/pinniped_bottlenecks/blob/master/data/processed/seal_data_largest_clust_and_ pop_30.xlsx. Furthermore, documented scripts are available in the data processing pipeline to easily retrieve and investigate the data further.
For several species the posterior probabilities for the bottleneck and neutral models are very similar (eg Walrus, Baltic ringed seal, NZ sea lion, and others). Given the posterior probability of correct model classification are relatively low (error rates of 25-30%) I am not convinced an arbitrary threshold of 0.5 can be used as evidence of fitting a bottleneck vs neutral model. Wouldn't the rejection of the neutral model be a better way to determine which species show patterns consistent with a bottleneck? Using that approach one can clearly identify the four species for which bottleneck Ne is clearly estimated as small in figure 2.

Response 3
It is common and accepted practice in ABC modeling to estimate parameters under the model with the highest posterior probability. We therefore feel that 0.5 is not an arbitrary threshold. Moreover, by selecting the model with the highest probability for each species, we essentially reject the model with the lower probability. We are therefore not sure what the reviewer means by their suggestion of 'rejecting the neutral model'.
The referee is correct in pointing out from Figure 2 that four species of pinniped were strongly bottlenecked. However, we know from history of sealing that many other pinniped species were severely hunted. This is reflected not only by the four most extreme examples, but also in the bottleneck effective population sizes of the seven further species shown in Figure 2. We therefore feel that the inclusion of these species is justified based on the ABC analysis and reflects what we know very well from the history of sealing. Importantly, our model selection only affects the list of species for which we estimate bottleneck effective population sizes. What is not affected is all of the downstream phylogenetic mixed models on which most of our inferences are based. These models all analyse the bottleneck model probability (pbot), which is a continuously distributed variable and does not rely on a threshold.
Finally, in response to this comment as well as the comment below (Response 4), we made major changes to our manuscript by defining a more extreme bottleneck scenario and re-running all of the downstream analyses. This resulted in significantly improved model classification, but our results and thus our conclusions did not change. The same species favoured the bottleneck scenario, even after substantial changes to the analysis, suggesting that our model selection is robust.
Related to this in line 197 "ABC was clearly able to distinguish between the two models, with the posterior probability of correct model classification being 75% for the bottleneck model and 71% for the neutral model" I may be misinterpreting these results, but those probabilities of correct classification do not seem particularly good, you can be wrong about 25-30% of the times, which is not trivial.

Response 4
The reviewer is right in saying that our model classification probabilities could be improved upon. However, our ABC analysis is unique in that we simulated genetic data based on just two scenarios which had to broadly fit to 30 different pinniped species. This required broad priors for both models, which unavoidably results in a certain amount of overlap between the models when trying to classify a randomly chosen simulation. Hence, some degree of misclassification is inevitable when comparing realistic scenarios that should capture a broad range of demographic histories.
To tackle this comment, we have now done a major re-analysis of our data (see also Response 3). First of all, we defined the two scenarios more distinctly by specifying tighter priors on the bottleneck effective population size (Nebot) under the bottleneck scenario. Nebot now ranges from one to 500 individuals as opposed to one to 800 in the previous analysis. This resulted in a substantially improved model classification, with simulations under the bottleneck model being correctly classified 85% of the time and simulations under the neutral model being correctly classified 89% of the time.
Based on the newly derived model probabilities, we re-estimated all posterior distributions and re-ran all of the downstream phylogenetic mixed models. While the models are now more distinct and thus better classifiable, all downstream inference remains unaltered.  Using the actual values would be much more informative and given the colours are already in a table format, that seems very much doable.

Response 5
Many thanks for complimenting us on the quality and visual appeal of the figures. However, we don't agree that Figure 1 would be improved by replacing the heat map element with actual numbers. We think the overall patterns are visually clear, as we used an established colour palette which is designed to maximise differences between the diverging ends of a scale. In addition, the values themselves are easily accessible in Supplementary Tables 2 and 3, which is clearly stated in the figure legend.
Line 199 "for every species the preferred model showed a good fit to the data" for the Saimaa ringed seal the pvalue is 0.08, perhaps works discussing a bit more uncertainty in this case?

Response 6
Similar to the northern elephant seal, the Saiima ringed seal shows and extremely low genetic diversity. This causes both species to be on the lower end of the spectrum of simulated diversity under the bottleneck model and hence the p-values are towards the lower end of those observed across pinnipeds. This is a consequence of having to specify broad priors to fit a large range of species. However, the p-value of the Saiima ringed seal has now increased to 0.14 based on our new analysis (see Responses 3 and 4) which restricted the Nebot to between one and 500 individuals. The simulated genetic diversity is therefore on average lower and fits better to species at that end of the spectrum such as the Saiima ringed seal and northern elephant seal.
IUCN status data are aggregated in a non-traditional way, with NT combined with threatened categories rather than with LC as non-threatened (the IUCN considers NT as non-threatened). The authors should justify their grouping. Also while I realize sample size and unbalanced groups may be a problem, MCMCglmm allows for fitting multinomial model which better reflect the status data.

Response 7
We agree with this criticism, which we have now addressed by changing the IUCN status categories according to the reviewer's suggestion. We now grouped least concern and near threatened into a 'low concern' and vulnerable and endangered into a 'high concern' group. We then re-ran the MCMCglmm analysis for these revised groupings and the results did not change appreciably. We've updated the results section in-cluding Figure 5 accordingly (lines 323-331).
We also concur that given a large enough dataset it would be desirable to analyse all four categories (i.e. least concern, near threatened, vulnerable and endangered) separately rather than grouping them into two main categories, as this might better reflect the status data. The reviewer is correct in saying that, at least in principle, this could be done using a multinomial model in MCMCglmm. However, as also recognised by the reviewer, our sample size of species is too small for this to work, especially because the near threatened category contains only species, the vulnerable category contains only four species and the endangered category contains only six species.
The authors talk about differences based on a species' ecology and life-history but only considered two aspects: whether a species breeds on land or ice, and SSD. The text could be revised to reflect that narrow scope (and ideally explain why they focus on only those two traits) or even better, the authors could consider other traits to make a stronger case on the importance of traits. There are potentially many other relevant ecological and lifehistory traits such as reproductive rates, generation time -both linked to how fast pop size may recover; social structures and breeding seasonality -linked to how easy to hunt/exploit, geographic area -regions in which exploitation may have been unlikely.

Response 8
We agree, and have therefore followed the reviewer's suggestion of incorporating additional variables into this analysis. However, we had to work within some constraints. First of all, data are not available for all of the traits that we would ideally like to analyse. In particular, data on reproductive rate are lacking for the vast majority of species, as this requires long-term individual based studies quantifying lifetime reproductive success. Second, the complexity of our models is limited due to the sample size of 30 species. In other words, our models will not converge if we add many more predictors.
To address this comment, we have incorporated two additional variables that we believe are relevant and for which data are available for all of the species. The first of these, as suggested by the reviewer, is generation time, as species with short generation times should be able to recover more quickly from bottlenecks. The second variable we chose to incorporate was the length of the breeding season, as we reasoned that species with long breeding seasons could be more prone to exploitation as sealers have a larger time window in which to hunt the animals. We think this is broadly analogous to 'breeding seasonality'. With regards to 'social structures', we're not really sure what the reviewer means by this. Detailed data on social organisation, patterns of genetic relatedness etc., are not available for the vast majority of species. However, we believe that SSD captures the most important social structures across pinnipeds-their mating and harem systems.

We incorporated these additional variables into the Bayesian phylogenetic mixed models of prophet-exc and pbot.
Our results for breeding habitat and SSD remain unchanged and neither breeding season length nor generation time further impacted bottleneck strength across pinnipeds. We re-wrote the corresponding results section (lines 281 -303) and revised Fig. 3, which is also shown below.
We would also have liked to include these two variables in our model of genetic diversity. However, the existing model already contains five predictor variables. We therefore fitted a separate model of Ar containing breeding season length and generation time as predictor variables. Neither variable was related to Ar (breed- We are grateful to the reviewer for this comment and we believe that our new analyses broadens the scope of the paper and justify our general use of the terms ecology and life history.

Fig. 3: Ecological and life-history effects on bottleneck signatures. Shown are the results of phylogenetic mixed models of prophet-exc and pbot with breeding habitat and SSD fitted as fixed effects. Panels A and B show differences between ice-and land-breeding species in prophet-exc and pbot respectively. Raw data points are shown together with standard Tukey box plots. Panel C shows the relationship between sexual size dimorphism (SSD) and prophet-exc, with individual points colour coded according to the ABC bottleneck probability (pbot) and the line representing the predicted response from the prophet-exc .model. Marginal and unique R 2 values, standardized β coefficients and structure coefficients are shown for models of prophet-exc (filled points) and
pbot (open points) in panels D-F, where they are presented as posterior medians with 95% credible intervals. Species abbreviations are given in Fig. 1 and Supplementary Table 1. Related to this point, land breeding was more likely to be associated with bottleneck signals, and this is explained by accessibility and pop density. I was left wondering if those two proposed mechanisms could not in fact be tested? For instance there is an ice-breeding seal with high Pbot, is this distinct from other ice-breeders in terms of accessibility or pop density?

Response 9
We agree that the two most accessible ice-breeding seals (the Ladoga ringed seal and Saimaa ringed sea) indeed show the strongest bottleneck signals and the lowest diversities among ice-breeding seals. However, this could also be a reflection of limited geographic ranges.
We used the terms 'accessibility' and 'density' rather loosely to refer to broad-scale differences between ice and land-breeding seals. The majority of ice breeding seals, especially polar species such as Weddell and crabeater seals, breed in remote areas that will usually be inaccessible by ship due to the presence of dense pack ice. By contrast, land-breeding seals tend to be found in high densities on breeding beaches that could be easily accessed by sealers.
It would of course be nice to try to tease apart the relative contributions of accessibility and density, but unfortunately quantitative data on both aspects are lacking, even for the most intensively studied species. We therefore used breeding habitat and SSD as proxies. These variables are not only readily quantifiable across all species but also explain a large proportion of the variance in our data.
Line 324 "We also found that heterozygosity-excess was strongly linked to sexual size dimorphism (SSD)" Does the relationship change if you exclude the Southern elephant seal?

Response 10
We repeated the analysis of SSD after excluding the Southern elephant seal and found that both R 2 and β were highly similar (Before exclusion: R 2  Are results influenced by the number of loci or individuals sampled?

Response 11
Our results are not influenced by the sample size either of individuals or loci for the following reasons: (1) The number of individuals has already been controlled for in all of our analyses. Specifically, we used a thorough standardization procedure to eliminate the effects of sample size variation on genetic diversity by randomly sub-sampling ten individuals from each dataset 1000 times with replacement and calculating the corresponding mean and 95% confidence interval for each summary statistic. These standardized genetic summary statistics were used in all analyses.
(2) The number of loci will not systematically bias the mean of any summary statistic as all of them are calculated as a mean across loci. Therefore, the expected mean will be the same independently of how many loci are used.
In response to this comment, we have clarified in both the methods and results sections how we standardised our summary statistics over individuals. Furthermore, we have included a statement in the methods section (lines 463-468) outlining why differences in the number of loci will not bias our analyses.
Line 500 "Details of all the models are given in the supplementary material" presumably this refers to supplementary table 8. The legend of that table reads "Estimated parameters for the main Bayesian phylogenetic mixed models" which makes me wonder if there are other non-main models tested.

Response 12
Apologies for the confusion. We did not construct any other models and so we have rephrased the table legend to clarify this point.
Line 505 "For all models, we report the marginal R2 as in 66" Is there a reason not to report the conditional R2 too? It may be interesting to know how much of the total variance the full model explained.

Response 13
We originally reported only marginal R 2 values because we were interested in the strength of the fixed effects after controlling for phylogenetic relatedness. By contrast, the conditional R 2 also includes variation attributable to the phylogeny. To address this comment, we have included conditional R 2 values for all of our models in Supplementary Table 10. Phylogenetic effects are small in all of our models and our conclusions remain unchanged.
Line 398 "Overall, 6% of loci were found to deviate from HWE in both tests and as these are unlikely to affect our comparative analyses, we focused subsequently on the full datasets." Why are these unlikely to affect the analyses?

Response 14
We conducted broad analyses including a large number of loci across many different species. Excluding a very small proportion of loci from these datasets is unlikely to appreciably change the means of the calculated summary statistics and consequently our downstream analyses. Figure 1 after excluding loci from each dataset that deviated significantly from Hardy-Weinberg equilibrium. The results, shown in the figure below, are essentially unchanged (all repeatabilities > 0.95). We have updated the manuscript to include a new results section containing this analysis (lines 237 -242), and a new supplementary figure and table (Supplementary Fig. 7,  Supplementary Table 8). Supplementary Fig. 7: Replicated Figure 1 based on reduced datasets containing only loci in Hardy-Weinberg equilibrium for each dataset.

Nevertheless, we have now repeated the analyses shown in
Reference 13 does not list authors.

Response 15
We have corrected the reference, thanks for spotting this.
Line 396 the reference has a unedited format

Response 16
We also corrected this reference, thanks again.

Reviewer #2 (Remarks to the Author):
Decision: accept conditional on major revisions The manuscript "Recent demographic histories and genetic diversity across pinnipeds are shaped by anthropogenic interactions and mediated by ecology and life-history" by Stoffel et al. is a very nice study that showed that land breeders had a more drastic demographic history impacted by hunting than ice-breeders. There are also some other very cool basic results relating allelic richness with global abundance and ABC bottleneck model probability. I am not confident of the results for various reasons stated below, but I think that some more work can remedy this.

Response 17
Many thanks for the positive comments and we have worked hard to incorporate all of these important points (see below).

Major Points
The abstract details some of the data yet only says "genetic data". It is crucial to provide more details in the abstract about what type of genomic sampling took place.

Response 18
We have changed the abstract to reflect the fact that our genetic data are from microsatellites.
The motivation of the comparative approach is strong, but the authors ought to cite some of the work already done with ABC, especially the studies focusing on bottleneck expansion models (e.g. Xue and Hickerson, Burbrink et al, Gehara et al.). The authors also ought to cite the 90s (?) Roman/Palumbi-related papers examining the weaker than predicted impact that whaling had on observed genetic diversities.

Response 19
We already cited over ten papers including both empirical and methodological ABC studies. Nevertheless, we have added several new references to the introduction section in order to strengthen our case for using ABC (see below). We have also cited the Roman and Palumbi paper in the introduction and the Burbrink paper in a new results section dealing with postglacial expansions (see Responses 20). However, in some of the above cases we were not sure which paper the referee was referring to. Therefore, we would be happy to incorporate further papers given more detailed guidance. To strengthen the inferential confidence, the authors should consider a model that also incorporates a crucial features that could have a major impact. Late Pleistocene history bottleneck expansions of huge magnitude associated with the LGM could have had a huge impact on the data, even if there was a subsequent one in the 1800s.

Response 20
We think it's important to strongly focus our study on recent bottlenecks for several reasons. First, our study is based on a very clear hypothesis: sealing has severely impacted the population sizes and genetic diversity of pinnipeds. This hypothesis is founded on a large body of literature reporting severe reductions in pinniped population sizes during the 18th and 19th centuries. Second, this focus on a short timeframe and well known sealing history allows us to clearly define our models around reasonable and rather tight priors. Third, the focus on recent demographic declines also allows us to compare our ABC results to a widely used and established method -heterozygosity excess, which is a different but complimentary approach to quantify recent population size changes. Fourth, Hoban et al 1 have shown that bottlenecks followed by an extended period of recovery (i.e. many hundreds of generations after the last glacial maximum, LGM) are difficult to detect using microsatellites. Therefore we do not believe our microsatellite data are well suited to modelling events further back in a species' history.
Of course it would be interesting to model post-glacial expansions. However, we cannot frame such an analysis around a clear hypothesis as virtually nothing is known about how different pinniped species were affected by changes in habitat availability during the last ice age. Furthermore, in contrast to our analysis of recent bottlenecks, it is difficult to define appropriate priors given this almost complete lack of knowledge.
Finally, our ABC analysis is unique in that we defined two models to broadly capture the demographic histories of 30 species. We believe that such a wide-reaching analysis has to be kept simple to facilitate comparative analyses. Using the bottleneck probability (pbot) from our ABC analysis, we found strong and expected associations with both breeding habitat and allelic richness. This shows that our models, although simple, capture meaningful variation in recent demographic histories across pinnipeds.
To address the reviewer's concerns, however, we have conducted a major new analysis of the data. Following the reviewers suggestion, we evaluated whether small population sizes during the LGM followed by expansion could cause similar genetic patterns across pinnipeds to recent bottlenecks caused by anthropogenic exploitation. Specifically, we simulated four scenarios, which included the two scenarios from the main analysis ('bottleneck' and 'neutral' models) plus two scenarios that were identical to the former two but which also included a smaller population size during the LGM followed by an expansion ('LGM + bottleneck' and 'LGM + neutral' models). In contrast to the main analysis in which we specified just two models, ABC could not reliably distinguish between these four models. Correct rates of model classification were 64% for the bottleneck model, 60% for the LGM + bottleneck model, 60% for the neutral model and 66% for the LGM + neutral model. This suggests that ABC cannot reliably distinguish between broadly equivalent models that do and do not include ice age effects. However, none of the 11 species that supported the bottleneck model in our main analysis were found to support the LGM + neutral model in our new analysis. This suggests that genetic patterns in our dataset caused by recent bottlenecks are different from those expected under a postglacial expansion model. Furthermore, in our new analysis, all 11 species that originally supported the bottleneck model again had the highest posterior probability for one of the two scenarios incorporating a recent bottleneck. Eight of these still supported the more simple bottleneck model. Consequently, our inference of recent bottlenecks remains unaltered regardless of whether or not these were preceded by an ice age effect.
We believe our study is strengthened by this new analysis. However, for the reasons described above, as well as our empirical findings of much greater uncertainty in model classification as well as a lack of evidence that our inference of recent bottlenecks is confounded by ice age effects, we do not believe that this analysis is preferable to our two-model ABC analysis. We therefore extensively report this analysis in the Supplementary  Information (p.22 -27) as well as in a new results paragraph of the main manuscript (lines 233 -263).
Finally, in response to this and several other comments by the reviewer, we have also expanded our discussion section (lines 350-355, lines 370-385) to further clarify and justify our approach, as well as to discuss the re-sults of our new analyses which we believe strengthen our conclusions.
I liked that the authors repeated the analysis on the largest clusters (informed by STRUCTURE results of K > 2).

Response 21
Great. We believe that this analysis, together with the new re-analysis of the data after Hardy-Weinberg filtering (see Response 14) and the new ABC analysis incorporating ice age effects (see Response 20) strengthen our conclusions.
The casual mention of "at selectively neutral loci such as microsatellites is an indicator of recent bottlenecks because during a population decline the number of alleles decreases faster than heterozygosity (Nei et al. 1975)" is difficult to digest with a straight face in light of the well known strong possibility of strong linkage to parts of the genome undergoing positive selection. The authors ought to loudly acknowledge that their bottleneck expansion inference could have been confounded by this process.

Response 22
We believe that selection is unlikely to explain our results given that we analysed averages across many loci and that selection would have to operate in the same direction across loci within species and across species to explain our comparative patterns. However, we have acknowledged this possibility in a new discussion paragraph detailing potential caveats (lines 381 -385) Why are bottleneck histories mutually exclusive with historical population expansions? It makes sense that they could often occur in sequence (expansion followed by a bottleneck). The way it is stated it sounds like the ABC model did not allow for both to happen.

Response 23
This appears to be a misunderstanding as it was not our intention to imply that recent bottlenecks are mutually exclusive with historical population expansions. Both our bottleneck and neutral models do in fact allow changes in population size from pre-to post-bottleneck. Consequently, in the case of neutral model, population expansion would be reflected as a smaller historical Ne relative to the current Ne, as the priors for both are independently drawn and fitted in the model. The bottleneck model similarly allows for long term changes in Ne while also incorporating a recent demographic reduction due to hunting. To make this clearer, we have now revised the corresponding methods and results sections (lines 491-503, lines 190 -206) to provide a better explanation of how we defined our demographic scenarios.
The result that the majority of species fit a no-growth model over the bottleneck model makes me think that major point about the LGM above could have confounded the results.

Response 24
As explained in Response 23, neither the neutral model nor the bottleneck models are nogrowth models. Furthermore, as explained in Response 20, our initial inference of recent bottlenecks in 11 species is not confounded by historically small population sizes and subsequent expansions.
The authors report a "good fit to the data" but did the authors conduct posterior predictive tests (as recommended best ABC practices to verify that the models could largely generate the observed data)? I see in supp table 4A the authors report "prediction error" being good, but it is hard to gain a intuition here. To gain the confidence of the reader, the authors need to just produce simple dot plots from the "leave one out" cross validation (i.e. plot real values vs the point estimates such as the mode estimates).

Response 25
We did indeed follow best ABC practices by carefully evaluating every step in our ABC analysis. This is described in detail in the Methods section (lines 547 -570) and the results of these evaluations are given both in the main results section of the manuscript and in the Supplementary material (Supplementary  Table 4 for the prediction errors of the model estimates and Supplementary Table 5 Supplementary Table 4).

Minor points
By "neutral model" do the authors really mean the null no-growth model? The Bottleneck model also assumes neutrality.

Response 26
Actually it was quite tricky to come up with a suitable name for the model not containing a bottleneck. We considered calling it a 'null model' but (i) this could easily be confused with a linear model containing no predictors, and (ii) this model is not a no-growth model (see Response 23). We therefore retained the name 'neutral model' but provide a clearer definition as well as a statement in the Methods to explain that we do not imply for either model a departure from the neutrality of genetic loci (lines 498-503).
Where are the STRUCTURE results? I don't see any table or figure cited. Table 6) including the number of inferred genetic clusters and sample sizes corresponding to the largest clusters analysed. This table is clearly referred to in the text of the Results section now.