Terrestrial pollen records are abundant and widely distributed, making them an excellent proxy for past vegetation dynamics. Age-depth models relate pollen samples from sediment cores to a depositional age based on the relationship between sample depth and available chronological controls. Large-scale synthesis of pollen data benefit from consistent treatment of age uncertainties. Generating new age models helps to reduce potential artifacts from legacy age models that used outdated techniques. Traditional age-depth models, often applied for comparative purposes, infer ages by fitting a curve between dated samples. Bacon, based on Bayesian theory, simulates the sediment deposition process, accounting for both variable deposition rates and temporal/spatial autocorrelation of deposition from one sample to another within the core. Bacon provides robust uncertainty estimation across cores with different depositional processes. We use Bacon to estimate pollen sample ages from 554 North American sediment cores. This dataset standardizes age-depth estimations, supporting future large spatial-temporal studies and removes a challenging, computationally-intensive step for scientists interested in questions that integrate across multiple cores.
|Technology Type(s)||Bayesian Model|
|Factor Type(s)||number of age controls • number of pollen samples • interval between age controls|
|Sample Characteristic - Environment||sediment|
|Sample Characteristic - Location||North America|
Machine-accessible metadata file describing the reported data: https://doi.org/10.6084/m9.figshare.9768113
Background & Summary
Fossil pollen can be used as a proxy for past vegetation changes, allowing us to infer or compare these changes to past environmental conditions. With pollen we can determine the mechanisms driving ecosystem changes and infer future responses to environmental change. The pollen record of the last 22,000 years, since the Last Glacial Maximum (LGM), is particularly useful because it traverses many abrupt changes in climate, such as Bølling–Allerød warming event (15,000 BP) and Younger Dryas cooling event (12,700 BP)1 with high temporal resolution2. Many impactful macroecological studies have used long-term pollen data from this period to examine vegetation responses to climate shifts3,4, the extent to which no-analog climates result in no-analog communities5,6,7, and the effectiveness of reserve prioritization methods8,9. These studies have demonstrated that plant taxa individualistically track late-Quaternary climate rather than shifting simultaneously as communities of species3,4. They have also shown that no-analog plant communities existed in the past and will be common in the near future under novel climate5,6,7. Pollen analysis will continue to provide critical insights that inform our predictive models. However, they are currently hampered by the extensive data processing necessary before big-picture questions can be addressed.
The greatest obstacle to vegetation studies that span large spatio-temporal scales is the challenge of establishing accurate, consistent pollen ages. To explore spatio-temporal changes across pollen samples, the events in one pollen record must be linked to events across records. Thus it is critical to construct a consistent and accurate temporal framework across sediment cores (e.g., Giesecke, 201410 and Blois, 201111). In sediment cores, age-depth models estimate pollen sample ages by integrating radiocarbon-dated specimens with pollen sample depths. Several different age-depth model routines have been developed (clam12, Bacon13, OxCal14, etc.) that can produce different age estimations for a single core by incorporating uncertainty and depositional processes differently15. Even when using a single age-depth model, the selection of different parameter values can produce different age estimates. Data for largescale pollen studies are most frequently acquired from community data repositories, such as the Neotoma Paleoecology Database (https://www.neotomadb.org/)16. Paleoecological databases, including Neotoma, compile pollen records and their estimated ages from authors’ publications and contributions. Thus, if one were to try to use multiple cores for analysis, the cores would likely possess a variety of low-quality age estimations, including poorly-calibrated radiocarbon dates and simple interpolations between dates. Few age estimations would include uncertainty estimations. Any analyses that use pollen records derived from multiple studies must recalculate age-depth models and parameter values for consistency and accuracy12,17,18,19.
Two main classes of age-depth models are used in the literature: classical and Bayesian. Classical age modeling includes linear interpolation, polynomial fits, or smooth splines. These estimated ages can be accurate, but estimated age uncertainties are usually underestimated18. Bayesian age-depth models, including Bacon, BChron20, and OxCal, have features that can consider the underlying uncertainty of the chronological controls, but many also have the benefit of using other prior information in model construction. For example, Bacon provides support for estimates of accumulation rates and their autocorrelation within the core. As a result, Bayesian age-depth models can increase accuracy when chronological controls are complex and/or possess high resolution sampling12. More importantly, the way Bayesian methods accomidate uncertainty removes the subjectivity from deciding which dates should be included13,21. Bacon has become an important tool for modelling age-depth relationships in large scale studies22,23,24,25.
To facilitate continental-scale analyses in North America, we created a single dataset that provides Bacon age estimations for all suitable North American pollen records from Neotoma (as of January 2018). In this dataset, we used advanced age-depth models with consistent priors while considering the full breadth of conditions for core deposition, and we provided appropriate uncertainty estimations for the sample ages. With the publication of this dataset, any researcher interested in using this pollen dataset will have access to accurate, consistent age estimations, allowing them to directly compare records across North America. We anticipate that this dataset will be used extensively by the ecological community to address critical questions about habitat change throughout the Pleistocene, Holocene, and into the present.
We used the neotoma package26 for R27 to query the Neotoma database for records that met the selection criteria. We got sedimente cores containing 23,187 pollen samples from 531 sites throughout North America (172.25°W to 48.25°W, 10.25°N to 79.75°N) (Fig. 1). The Neotoma database contains over 32,000 datasets from more than 16,000 globally distributed sites, including fossil pollen, vertebrates, diatoms, plant macrofossils, and other types of data from terrestrial paleoecological and paleoenvironmental studies16. Pollen data compiled by the Neotoma database is the most complete dataset available for North America.
We used three criteria to select pollen-containing sediment cores that could produce accurate Bacon models: (1) At least three age controls (including radiocarbon dates and other age controls from the most recent age estimation in Neotoma) are present in the core; (2) The maximum interval between two adjacent age controls is less than 3000 years; and (3) There are at least four pollen samples in the record, as it is unusual to analyze cores with few pollen samples in vegetation history analyses (Fig. 2). We occasionally included a subset of long cores that met these criteria but excluded the subdivisions with age intervals exceeding 3,000 years.
We used established age controls in Neotoma, including but not limited to radiocarbon dates, U-Pb dates, biostratigraphic events, and the age control at the top layer of the sediment cores if it exists. There are more than one chronology in some sediment cores in Neotoma. These new chronologies have previously been added to the database using more advanced age-depth models, such as the Bacon age model in this work. New chronologies have also been added using updated age controls. For example, Blois et al.11 added biostratigraphic events as age controls to pollen-containing sediment cores in eastern North America. For the sediment cores that have more than one set of age controls, we used the set that was uploaded to Neotoma most recently.
There are four common Bayesian age-depth model frameworks: BChron20, Bpeat28, OxCal14, and Bacon13. Of these four, BChron and OxCal are not informed by prior information about sediment accumulation rates, which can lead to models that rely too heavily on chronological controls that may themselves be problematic as a result of secondary processes within the sedimentary basin13. OxCal is also particularly suited to chronologies from vertebrate sites that have time-averaged assemblages of taxa rather than regularly-accumulating lake sediment cores. Bpeat assumes a linear accumulation rate within deposits, which is unlikely within natural environments29,30. Bacon uses prior information about regional accumulation rates with estimates of the rate of change of accumulation rates to estimate an accumulation rate at each sample depth using a gamma autoregressive process13. Studies13,18 have demonstrated that Bacon produces the age estimations with the most appropriate uncertainty estimation. We used the rbacon package31 in R and bulk-baconizing repository32 – a set of codes to implement Bacon code for building age-depth models for many cores simultaneously – to estimate ages for each pollen record33.
Bacon uses a gamma autoregressive semiparametric model to simulate sediment deposition processes within subsections of the core13. For a given section of the core, Bacon estimates a rate of deposition based on the position of chronological controls within the core, and the rate of deposition of adjacent sections. Users can define prior estimates for accumulation rates, the memory parameter, and can additionally add “breakpoints”, or hiatuses where they believe that these accumulation and memory parameters might have different priors. Age estimates are then generated from the cumulative sum of the deposition rates, as a function of section thickness (which can also be set). Bacon has two advantages over classical models: it does not over-inflate estimated age uncertainties18 and it identifies and avoids outlier radiocarbon dates by relying on accumulation rates13.
Bacon has been used in 1054 published works according to Google Scholar citations (Fig. 3), getting much more usage than Bpeat and BChron. OxCal is more cited, in part because of a longer history within the discipline, but also because the software is also used to calibrate individual radiocarbon dates, whether or not they are then used in chronology construction. Clam is more cited before 2016, but is less cited in the recent two years. Despite the Bacon’s more complicated implementation, it is nonetheless continuing to increase in usage because of its advantages of higher precision age estimation throughout the core and because training has been widely available.
Identifying prior parameter values
Bacon requires the researcher to identify and assess prior information, including mean sedimentation rate and section thickness. These priors can be estimated regionally29 or can vary regionally or locally based on prior knowledge. However, Bacon can be highly sensitive to selection18 of priors, so a clear framework for the selection of priors should be established. The Bacon model divides a core into equal-length sections. Section length is set by the user, although Bacon may suggest a default section thickness. For each iteration of the model and for each section, an accumulation rate is sampled randomly from a Gamma distribution. The sampled accumulation rate is based on the prior defined for the model run and adjusted using the accumulation rate estimated from the section below. This strategy effectively accounts for autocorrelation of accumulation rates within a core. Accumulation rate sampling minimizes the uncertainty of the fit through the chronological controls. The sampling from the Gamma distribution at a single sample provides a posterior estimate of accumulation for the individual section (as opposed to the prior defined by the user). Using Markov Chain Monte Carlo (MCMC) sampling, the accumulation rate for each section of the core is then estimated once per iteration, across some large number of iterations, resulting in a posterior distribution of accumulation rates for each section. Sample ages can then be estimated by combining the accumulation rates (cm/yr) with the section thicknesses (cm), providing a distribution of ages at any depth within the core.
We ran Bacon for each core using the bulk-baconizing repository32, using 16 combinations of prior parameters: four mean accumulation rates and four section thicknesses (Fig. 2). Goring et al.29 identified likely prior accumulation rates for Holocene sediment cores in the northeastern United States. According to their work, accumulation rates rarely exceed 50 yr/cm and are rarely lower than 5 yr/cm. The median accumulation rate is 10 yr/cm and the mean rate is 20 yr/cm. We proposed four mean accumulation rates in our work based on the Goring et al.29 paper: 5 yr/cm, 10 yr/cm, 20 yr/cm, and 50 yr/cm. When Bacon approximates the deposition process, sedimentation rate varies smoothly within the core using the gamma model. We also used four values as potential section thickness: 5 cm, 10 cm, 15 cm, and 20 cm. Section thicknesses smaller than 5 cm result in so many core sections that computational processing becomes unwieldy; section thicknesses larger than 20 cm prevent the model from smoothing sufficiently.
We also set a hiatus in the cores from the Northeast and upper Midwest when European settlement was indicated in the age controls, because accumulation rate increased significantly after European settlement34,35. This pattern of increasing accumulation rates is apparent in many cores across eastern North America and may be due to a number of factors, including increasing erosion, but is also due to sediment compaction and de-watering with depth and basin dynamics36. Dawson et al.35 listed 185 cores including a hiatus of European settlement from Neotoma. These cores that had identified settlement horizons that were assigned a hiatus (of length 10 yr, the minimum hiatus length in Bacon). This allowed us to apply a different accumulation rate prior to and after European settlement in the Northeast and upper Midwest.
Selecting the best prior parameter values and estimated ages
We established a set of rules by which we could consistently select the best priors and age estimation results for all analyzed pollen records (Fig. 2). Once Bacon has been run, each pollen sample possesses an estimated age along with an uncertainty estimation. These estimated ages were only calculated for subdivisions of the cores that contained sufficient age controls. We compared the results of the 16 models and excluded any results where interpolated ages were not sequential or extrapolated ages were not reasonable, i.e. younger than today (2020 AD, given the uncertainty of estimated ages). We then selected the model with the shortest distance between the estimated ages and the age controls. As the method we used here is a dot-to-dot fit and flawed, we also visually checked the 16 age-depth models for each sediment core to confirm that our selection is the best. All the age estimations that we produced list the prior mean accumulation rate and section thickness used in Bacon33.
Comparing Bayesian age estimations with previous age estimation in Neotoma
Neotoma provides an extensive amount of data, including high-quality and low-quality age estimations. Some age-depth models only used simplistic interpolations between dates. Some age estimations are radiocarbon dates without calibration. Many age estimations are simply calibrated using a look-up table in Neotoma without using a proper calibration curve. Most age-depth models did not give uncertainty estimations. For the 554 sediment cores in this work, 202 sediment cores do not have properly-calibrated ages, 162 sediment cores only have estimated ages from a simplistic interpolation between chronological controls, and 275 sediment cores do not have uncertainty estimations for sample ages. 240 out of 554 sediment cores in this work have properly-calibrated ages from age-depth models with sample age uncertainty estimation.
We plotted all the available age estimations against the Bayesian age from this work in each pollen record33. For 314 of the 554 sediment cores, this work is the first to provide a properly-calibrated age estimation with uncertainty using an established age-depth model. For the other 240 sediment cores that already have good age estimations, the dataset in this work provides an additional age estimation with reliable uncertainty using a Bayesian age-depth model. Most pollen samples are younger than 10,000 cal BP (Fig. 4c,d). Differences between Bayesian ages and previously-estimated ages generally range from 10–100 years (Fig. 4c). Differences between Bayesian age uncertainty and previously-estimated age uncertainty generally ranges from 100–1000 years (Fig. 4d). The Bayesian ages estimated in this work and previous age estimations increasingly diverge, particularly after 15,000 cal BP (Fig. 4a,b). Sample ages older than 15,000 cal BP are typically extrapolated by traditional age-depth models. The extrapolated age estimations are sensitive to parameter settings in the age-depth model. Moreover, this dataset provides a consistent temporal framework for large spatio-temporal analysis covering 554 sediment cores across North America, avoiding errors by individual age estimations at each pollen record.
The estimated ages are expressed as calendar years before present (cal BP), in which ‘present’ is 1950 AD. The uncertainty is estimated with 95% confidence, i.e. the min and max values represent 2.5% and 97.5% values in quartiles. Depth indicates the position of each pollen sample from the top of the core.
In this work, the Bacon age models do not consider possibilities of a hiatus in the core deposition other than the European settlement hiatus. Sudden changes in the deposition environment, such as flooding events or human settlement, may result in an abrupt shift in the mean accumulation rate. Accumulation rate is not necessarily correlated before and after a hiatus. Bacon provides methods for estimating ages when there is known hiatus in core deposition, but the depth where hiatus occurs and a possible shift in accumulation rate must be decided for that core specifically.
This work uses established chronological controls in Neotoma. However, the established chronological controls may be problematic. For example, core top and biostratigraphic events are common chronological controls in age-depth models, but some of those chronological controls are recorded in Neotoma without uncertainty estimation, which is required by the Bacon age-depth model. We used 2 years as uncertainty if the chronological controls from Neotoma do not have uncertainty estimation, but it is a rough estimation and may underestimate the uncertainty of sample ages. Another potential source of error is hard water effects in the radiocarbon chronological controls that were dated in early periods. Dissolved carbonate in freshwater can cause a dilution of 14C in the freshwater reservoir and result in older-than-real radiocarbon dates. Reservoir corrections have been applied to those chronological controls, but not all the corrections are recorded in Neotoma. Age-depth models based on those uncorrected radiocarbon dates will produce errors in the age estimation. Future work should look up the record and the related publications core by core manually to correct the problematic chronological controls.
We have provided the output of 554 Neotoma pollen-containing sediment cores that meet our criteria for possessing a subdivision with sufficient age controls for accurate age estimations and reliable uncertainty estimations. If the reader wishes to integrate additional pollen-containing sediment cores, new Bacon age estimations can be calculated using the same consistent criteria using the scripts provided in association with this work. Estimated ages, together with prior parameters, can be sent to the author of this paper for addition to the dataset download and/or an appropriate Neotoma data steward for database upload (https://www.neotomadb.org/contacts/investigators). The framework and scripts in this work can also be applied to other regions (such as European pollen records) and other paleo periods. Readers may need to change the prior values based on the deposition environment of sediment cores they are interested in.
The script used to compile pollen records, run Bacon, and select the best age estimations for each pollen record is available in Github repository https://github.com/yuewangpaleo/BaconAgeNeotoma. The script “BaconAgeCode.R” was run using R27. Necessary packages and repository for this code include: bulk-baconizing32, rbacon31, neotoma26, and Bchron37.
McManus, J. F. et al. Collapse and rapid resumption of Atlantic meridional circulation linked to deglacial climate changes. Nature 428, 834 (2004).
Mayewski et al. Holocene climate variability. Quaternary Research 62, 243–255 (2004).
Davis, M. B. In Forest succession (eds West, D. C., Shugart, H. H. & Botkin, D. B.) 132–153 (Springer, 1981).
Webb, T. Is vegetation in equilibrium with climate? How to interpret late-Quaternary pollen data. Vegetatio 67, 75–91 (1986).
Overpeck, J. T., Webb, R. S. & Webb, T. III. Mapping eastern North American vegetation change of the past 18 ka: No-analogs and the future. Geology 20, 1071–1074 (1992).
Williams, J. W. & Jackson, S. T. Novel climates, no‐analog communities, and ecological surprises. Frontiers in Ecology and the Environment 5, 475–482 (2007).
Williams, J. W., Jackson, S. T. & Kutzbach, J. E. Projected distributions of novel and disappearing climates by 2100 AD. Proceedings of the National Academy of Sciences 104, 5738–5742 (2007).
Salzmann, U., Haywood, A., Lunt, D., Valdes, P. & Hill, D. A new global biome reconstruction and data‐model comparison for the middle Pliocene. Global Ecology and Biogeography 17, 432–447 (2008).
Miller, P. A. et al. Exploring climatic and biotic controls on Holocene vegetation change in Fennoscandia. Journal of Ecology 96, 247–259 (2008).
Giesecke, T. et al. Towards mapping the late Quaternary vegetation change of Europe. Vegetation History and Archaeobotany 23, 75–86 (2014).
Blois, J. L., Williams, J. W. J., Grimm, E. C., Jackson, S. T. & Graham, R. W. A methodological framework for assessing and reducing temporal uncertainty in paleovegetation mapping from late-Quaternary pollen records. Quaternary Science Reviews 30, 1926–1939 (2011).
Blaauw, M. Methods and code for ‘classical’ age-modelling of radiocarbon sequences. Quaternary Geochronology 5, 512–518 (2010).
Blaauw, M. & Christen, J. A. Flexible paleoclimate age-depth models using an autoregressive gamma process. Bayesian Analysis 6, 457–474 (2011).
Ramsey, C. B. Bayesian analysis of radiocarbon dates. Radiocarbon 51, 337–360 (2009).
Blaauw, M. & Heegaard, E. In Tracking environmental change using lake sediments (Birks, H., Lotter, A., Juggins, S. & Smol, J.) 379–413 (Springer, 2012).
Williams, J. W. et al. The Neotoma Paleoecology Database, a multiproxy, international, community-curated data resource. Quaternary Research 89, 156–177 (2018).
Blaauw, M. Out of tune: the dangers of aligning proxy archives. Quaternary Science Reviews 36, 38–49 (2012).
Trachsel, M. & Telford, R. J. All age–depth models are wrong, but are getting better. The Holocene 27, 860–869 (2017).
Blaauw, M., Christen, J. A., Bennett, K. & Reimer, P. J. Double the dates and go for Bayes—Impacts of model choice, dating density and quality on chronologies. Quaternary Science Reviews 188, 58–66 (2018).
Parnell, A. C., Haslett, J., Allen, J. R., Buck, C. E. & Huntley, B. A flexible approach to assessing synchroneity of past events using Bayesian reconstructions of sedimentation history. Quaternary Science Reviews 27, 1872–1885 (2008).
Ramsey, C. B. Deposition models for chronological records. Quaternary Science Reviews 27, 42–60 (2008).
Lowe, D. J., Blaauw, M., Hogg, A. G. & Newnham, R. M. Ages of 24 widespread tephras erupted since 30,000 years ago in New Zealand, with re-evaluation of the timing and palaeoclimatic implications of the Lateglacial cool episode recorded at Kaipo bog. Quaternary Science Reviews 74, 170–194 (2013).
McKay, N. P. & Kaufman, D. S. An extended Arctic proxy temperature database for the past 2,000 years. Scientific data 1, 140026 (2014).
Herbert, T. D. et al. Late Miocene global cooling and the rise of modern ecosystems. Nature Geoscience 9, 843 (2016).
McKay, N. P., Kaufman, D. S., Routson, C. C., Erb, M. P. & Zander, P. D. The onset and rate of Holocene Neoglacial cooling in the Arctic. Geophysical Research Letters 45(12), 487–412,496 (2018).
Goring, S. et al. neotoma: A Programmatic Interface to the Neotoma Paleoecological. Database. Open Quaternary 1, 2 (2015).
R Core Team. R: A Language and Environment for Statistical Computing R Foundation for Statistical Computing Vienna, Austria, https://www.R-project.org/ (2018).
Blaauw, M. & Christen, J. A. Radiocarbon peat chronologies and environmental change. Journal of the Royal Statistical Society: Series C (Applied Statistics) 54, 805–816 (2005).
Goring, S. et al. Deposition times in the northeastern United States during the Holocene: establishing valid priors for Bayesian age models. Quaternary Science Reviews 48, 54–60 (2012).
Belyea, L. R. & Clymo, R. Feedback control of the rate of peat formation. Proceedings of the Royal Society of London B: Biological Sciences 268, 1315–1321 (2001).
Blaauw, M. et al. Age-Depth Modelling using Bayesian Statistics v. 2.3.6, https://CRAN.R-project.org/package=rbacon (2019).
Goring, S. J., Dawson, A., Stegner, M. A. & Wang, Y. Bulk baconizing Github, https://github.com/NeotomaDB/bulk-baconizing (2019).
Wang, Y., Goring, S. & McGuire, J. L. Bayesian ages for pollen records since the last glaciation in North America. figshare, https://doi.org/10.6084/m9.figshare.c.4385825 (2019).
Kujawa, E. R. et al. The effects of anthropogenic land cover change on pollen-vegetation relationships in the American Midwest. Anthropocene 15, 60–71 (2016).
Dawson, A. et al. Quantifying pollen-vegetation relationships to reconstruct ancient forests using 19th-century forest composition and pollen data. Quaternary Science Reviews 137, 156–175 (2016).
Bennett, K. D. & Buck, C. E. Interpretation of lake sediment accumulation rates. The Holocene 26, 1092–1102 (2016).
Parnell, A. Bchron: Radiocarbon Dating, Age-Depth Modelling, Relative Sea Level Rate Estimation, and Non-Parametric Phase Modelling, https://CRAN.R-project.org/package=Bchron (2016).
Reimer, P. J. et al. IntCal13 and Marine13 radiocarbon age calibration curves 0–50,000 years cal BP. Radiocarbon 55, 1869–1887 (2013).
The work is funded by NSF (grant number: DEB-1655898). SJG was funded through NSF grants 1740694 and 1550707. We thank Jadyn M. Sethna for recording Bayesian age into Neotoma. We thank M. Allison Stegner, Jordan Schutz, and Jack Williams for tutoring data-upload methods. We thank Jessica Blois for discussion about this project. And we thank the reviewers for the comments to improve the manuscript.
The authors declare no competing interests.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Wang, Y., Goring, S.J. & McGuire, J.L. Bayesian ages for pollen records since the last glaciation in North America. Sci Data 6, 176 (2019). https://doi.org/10.1038/s41597-019-0182-7
Environmental Geochemistry and Health (2022)