Mass production of unvouchered records fails to represent global biodiversity patterns

Abstract

The ever-increasing human footprint, even in very remote places on Earth, has inspired vigorous efforts to document biodiversity before organisms go extinct. The data commonly gathered come either from primary voucher specimens held in natural history collections or from direct field observations that are not traceable to tangible material in a museum or herbarium. Although both kinds of data are crucial for assessing how anthropogenic drivers affect biodiversity, they have widespread coverage gaps and biases that may make them poor representations of biodiversity patterns. Using a large global dataset of around 1.9 billion occurrence records of terrestrial plants, butterflies, amphibians, birds, reptiles and mammals, we quantify how well voucher and observation records cover expected biodiversity patterns and how they are biased. We show that the mass production of observation records does not lead to higher coverage of expected biodiversity patterns; instead, these records are disproportionately biased toward certain regions, clades, functional traits and time periods. Such coverage patterns are driven by ease of access to air and ground transportation, level of security and extent of human modification at each sampling site. Conversely, voucher records are far less frequent in occurrence data but, in the few places where they have been sampled, show relative congruence with expected biodiversity patterns for all dimensions. The differences in coverage and bias between voucher and observation records have important implications for the utility of these records in ecology, evolution and conservation research.
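To make the notion of coverage concrete, the following is a minimal sketch in Python, not the authors' method (their R scripts are archived at Zenodo). It assumes that coverage in a grid cell can be summarized as the fraction of the species expected in that cell that are documented by at least one record of a given type. The function name `coverage_by_cell`, the record layout and the toy data are all hypothetical.

```python
from collections import defaultdict


def coverage_by_cell(records, expected):
    """Fraction of expected species documented per grid cell and record type.

    records  -- iterable of (cell_id, species, record_type) tuples (hypothetical layout)
    expected -- dict mapping cell_id to the set of species expected in that cell
    Returns a dict: record_type -> {cell_id: fraction between 0 and 1}.
    """
    # Collect the distinct species seen per record type and cell;
    # repeated records of the same species do not add coverage.
    seen = defaultdict(lambda: defaultdict(set))
    for cell, species, record_type in records:
        seen[record_type][cell].add(species)

    coverage = {}
    for record_type, cells in seen.items():
        coverage[record_type] = {}
        for cell, expected_species in expected.items():
            if not expected_species:
                continue  # coverage is undefined where nothing is expected
            documented = cells.get(cell, set()) & expected_species
            coverage[record_type][cell] = len(documented) / len(expected_species)
    return coverage


# Toy data (invented for illustration only).
expected = {
    "A": {"sp1", "sp2", "sp3", "sp4"},
    "B": {"sp1", "sp5"},
}
records = [
    ("A", "sp1", "voucher"),
    ("A", "sp2", "voucher"),
    ("A", "sp1", "observation"),
    ("A", "sp1", "observation"),
    ("A", "sp1", "observation"),
    ("B", "sp5", "observation"),
    ("B", "sp9", "observation"),  # not in the expected list, so ignored
]

result = coverage_by_cell(records, expected)
for record_type in sorted(result):
    print(record_type, result[record_type])
```

In the toy data, cell A holds more observation records than voucher records, yet the vouchers document more of the expected species there. This is the sense in which record volume and coverage can diverge.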

Fig. 1: The taxonomic coverage of lineages and grid cells by observation records is more biased and less congruent with expected richness patterns.
Fig. 2: Patterns of geographic coverage of species and grid cells by voucher and observation records of individual taxa.
Fig. 3: Temporal coverage of species and grid cells by voucher and observation records.
Fig. 4: Coverage of functional traits documented by voucher and observation records.
Fig. 5: The estimates and 95% confidence intervals predicted by a spatial autoregressive error model of coverage (taxonomic, geographic and temporal) by voucher and observation records with socioeconomic predictors.

Data availability

Links to the species occurrence records downloaded from GBIF, together with the datasets, data tables, grid cell vector polygons and R code, are archived at Zenodo (https://doi.org/10.5281/zenodo.6834577).

Code availability

All scripts, code and data documentation necessary to repeat our analyses are available at Zenodo (https://doi.org/10.5281/zenodo.6834577) in the folder ‘SCRIPTS’.

Acknowledgements

We thank Stanford University and Texas A&M University-Corpus Christi for logistic support. B.H.D. was supported by the US National Science Foundation (awards 2031928 and 2113424). We are grateful to G. Nakamura, L. Ford and S. Pons for comments on earlier drafts of the paper. In addition, we are grateful to Holger Kreft for kindly sharing his data on the expected distribution of plants, which was instrumental in our analysis.

Author information

Contributions

B.H.D. conceived and designed the study, carried out the analyses and wrote the paper. B.H.D. revised the paper with help from J.R.

Corresponding author

Correspondence to Barnabas H. Daru.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Ecology & Evolution thanks James Speed and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Patterns of expected species richness of terrestrial taxa.

The expected species richness of (a) plants was derived from a co-kriging interpolation model of 1,032 regional floras worldwide, and that of (b) butterflies from a co-kriging interpolation of 543 geographic units covering the known inventory of butterflies. The expected species richness of (c) amphibians, (d) birds, (e) reptiles and (f) mammals was generated by overlaying the expert-based extent-of-occurrence range map of each species with equal-area grid cells of 100 km × 100 km. The bamako colour palette is common to all panels, with dark green indicating high coverage and yellow indicating low coverage. The maps are in the Wagner IV projection.
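The overlay step for panels (c) to (f) can be illustrated with a minimal sketch. It is not the authors' code: it assumes, purely for illustration, that each range map is simplified to a bounding box in projected kilometres, and the species names and coordinates are made up. Richness in a cell is the number of ranges that overlap it.

```python
from collections import Counter

CELL_KM = 100  # grid cell side length, as stated in the caption


def cells_overlapping(bbox, cell_km=CELL_KM):
    """Yield (col, row) indices of the grid cells that a bounding box overlaps.

    bbox = (xmin, ymin, xmax, ymax) in projected kilometres.
    Assumption: a real range map is a polygon; a box is used here for brevity.
    """
    xmin, ymin, xmax, ymax = bbox
    col_start, col_stop = int(xmin // cell_km), int(-(-xmax // cell_km))
    row_start, row_stop = int(ymin // cell_km), int(-(-ymax // cell_km))
    for col in range(col_start, col_stop):
        for row in range(row_start, row_stop):
            yield col, row


def expected_richness(ranges):
    """Count, for every grid cell, how many species ranges overlap it."""
    richness = Counter()
    for bbox in ranges.values():
        for cell in cells_overlapping(bbox):
            richness[cell] += 1
    return richness


# Hypothetical ranges (made-up values, for illustration only).
ranges = {
    "species_a": (0, 0, 250, 150),
    "species_b": (150, 50, 400, 120),
    "species_c": (90, 90, 110, 110),
}
richness = expected_richness(ranges)
```

In this toy example the cell at column 1, row 1 is overlapped by all three ranges, whereas the cell at column 3, row 0 is reached by only one.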

Extended Data Fig. 2 Spatial composition of β-diversity across grid cells by voucher and observation records.

Maps of dissimilarity between record types for (a, b) plants (n = 240,377 species), (c, d) butterflies (n = 9,809 species), (e, f) amphibians (n = 4,862 species), (g, h) birds (n = 9,380 species), (i, j) reptiles (n = 7,259 species) and (k, l) mammals (n = 4,508 species). Dissimilarity was assessed by generating pairwise distance matrices of Simpson’s β-diversity between all pairs of grid cells within major biogeographically defined areas recognized by the Biodiversity Information Standards (also known as the Taxonomic Databases Working Group (TDWG)). Values of β vary between 0 (species composition is identical between grid cells) and 1 (high dissimilarity, no shared taxa). Both voucher and observation records of most taxonomic groups showed high dissimilarity in the less frequently sampled regions of South America, Africa and Southeast Asia, and lower dissimilarity in the frequently sampled regions of Europe and North America.
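As a minimal sketch of the pairwise calculation described in this caption (not the authors' implementation), Simpson’s β-diversity between two grid cells can be written as min(b, c) / (a + min(b, c)), where a is the number of shared species and b and c are the numbers of species unique to each cell. The cell names and species lists below are hypothetical.

```python
from itertools import combinations


def simpson_beta(cell_a, cell_b):
    """Simpson's pairwise dissimilarity: min(b, c) / (a + min(b, c)).

    a = species shared by both cells; b, c = species unique to each cell.
    """
    shared = len(cell_a & cell_b)
    only_a = len(cell_a - cell_b)
    only_b = len(cell_b - cell_a)
    denom = shared + min(only_a, only_b)
    if denom == 0:
        raise ValueError("beta is undefined when a cell has no species")
    return min(only_a, only_b) / denom


def pairwise_beta(cells):
    """Return {(cell_i, cell_j): beta} for every pair of grid cells."""
    return {
        (i, j): simpson_beta(cells[i], cells[j])
        for i, j in combinations(sorted(cells), 2)
    }


# Hypothetical grid cells (made-up species lists, for illustration only).
cells = {
    "cell_1": {"sp_a", "sp_b", "sp_c"},
    "cell_2": {"sp_a", "sp_b", "sp_c"},
    "cell_3": {"sp_d", "sp_e"},
    "cell_4": {"sp_a", "sp_b", "sp_d", "sp_e"},
}
beta = pairwise_beta(cells)
```

Here cell_1 and cell_2 are identical (β = 0) and cell_1 and cell_3 share no species (β = 1). The species in cell_3 are a subset of those in cell_4, which also gives β = 0, because Simpson’s index measures turnover and ignores differences in richness.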

Extended Data Fig. 3 Relationship between sampling effort (measured as taxonomic coverage) and dissimilarity (measured as spatial composition of beta diversity) by voucher and observation records.

Indicated are the relationships between sampling effort and dissimilarity of record types for (a, b) plants, (c, d) butterflies, (e, f) amphibians, (g, h) birds, (i, j) reptiles, and (k, l) mammals. The trend line (in red) was computed by evaluating the loess smooth at equally spaced points covering the range of dissimilarity values for each sampling effort.

Extended Data Fig. 4 Patterns of geographic coverage of grid cells by voucher and observation records of plants across spatial grain (50 × 50, 100 × 100, 200 × 200, 400 × 400, 800 × 800 and 1600 km × 1600 km).

Geographic coverage of grid cells was calculated as the number of unique collection locales for each grid cell. Evenness or clustering of geographic coverage is indicated by Moran’s I (Monte Carlo test, 999 randomizations), with values of 1 indicating clustered or biased coverage and values of 0 corresponding to geographically even coverage. The bamako colour palette is common to all panels, with dark green indicating high coverage and yellow indicating low coverage. The maps are in the Wagner IV projection.
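The Moran’s I statistic and its Monte Carlo test can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the caption does not specify the spatial weights, so binary rook contiguity on a small rectangular grid is assumed, and the coverage values are made up. Note that in general Moran’s I can also be negative, which indicates a dispersed (checkerboard-like) pattern.

```python
import random


def rook_neighbours(n_rows, n_cols):
    """Binary rook-contiguity weights as {cell index: [neighbour indices]}."""
    nb = {}
    for r in range(n_rows):
        for c in range(n_cols):
            cand = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            nb[r * n_cols + c] = [
                rr * n_cols + cc
                for rr, cc in cand
                if 0 <= rr < n_rows and 0 <= cc < n_cols
            ]
    return nb


def morans_i(values, nb):
    """Moran's I = (n / S0) * sum_ij w_ij * d_i * d_j / sum_i d_i^2."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(len(js) for js in nb.values())  # sum of all binary weights
    cross = sum(dev[i] * dev[j] for i, js in nb.items() for j in js)
    ss = sum(d * d for d in dev)
    return (n / s0) * cross / ss


def moran_mc(values, nb, n_sim=999, seed=1):
    """Permutation test: shuffle values across cells and recompute I."""
    rng = random.Random(seed)
    observed = morans_i(values, nb)
    shuffled = list(values)
    n_ge = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)
        if morans_i(shuffled, nb) >= observed:
            n_ge += 1
    # Pseudo p-value: share of simulations at least as extreme as observed.
    return observed, (n_ge + 1) / (n_sim + 1)


# Hypothetical 4 x 4 grid of coverage counts (made-up values).
nb = rook_neighbours(4, 4)
clustered = [10, 10, 0, 0] * 4  # all coverage in the two left columns
checkerboard = [10 if (r + c) % 2 == 0 else 0 for r in range(4) for c in range(4)]

i_clustered, p_clustered = moran_mc(clustered, nb)
i_checker = morans_i(checkerboard, nb)
```

With these toy grids the clustered layout yields a positive Moran’s I and the checkerboard layout a negative one. The permutation test asks how often a random rearrangement of the same values is at least as clustered as the observed layout.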

Extended Data Fig. 5 Patterns of geographic coverage of grid cells by voucher and observation records of butterflies across spatial grain (50 × 50, 100 × 100, 200 × 200, 400 × 400, 800 × 800 and 1600 km × 1600 km).

Geographic coverage of grid cells was calculated as the number of unique collection locales for each grid cell. Evenness or clustering of geographic coverage is indicated by Moran’s I (Monte Carlo test, 999 randomizations), with values of 1 indicating clustered or biased coverage and values of 0 corresponding to geographically even coverage. The bamako colour palette is common to all panels, with dark green indicating high coverage and yellow indicating low coverage. The maps are in the Wagner IV projection.

Extended Data Fig. 6 Patterns of geographic coverage of grid cells by voucher and observation records of amphibians across spatial grain (50 × 50, 100 × 100, 200 × 200, 400 × 400, 800 × 800 and 1600 km × 1600 km).

Geographic coverage of grid cells was calculated as the number of unique collection locales for each grid cell. Evenness or clustering of geographic coverage is indicated by Moran’s I (Monte Carlo test, 999 randomizations), with values of 1 indicating clustered or biased coverage and values of 0 corresponding to geographically even coverage. The bamako colour palette is common to all panels, with dark green indicating high coverage and yellow indicating low coverage. The maps are in the Wagner IV projection.

Extended Data Fig. 7 Patterns of geographic coverage of grid cells by voucher and observation records of birds across spatial grain (50 × 50, 100 × 100, 200 × 200, 400 × 400, 800 × 800 and 1600 km × 1600 km).

Geographic coverage of grid cells was calculated as the number of unique collection locales for each grid cell. Evenness or clustering of geographic coverage is indicated by Moran’s I (Monte Carlo test, 999 randomizations), with values of 1 indicating clustered or biased coverage and values of 0 corresponding to geographically even coverage. The bamako colour palette is common to all panels, with dark green indicating high coverage and yellow indicating low coverage. The maps are in the Wagner IV projection.

Extended Data Fig. 8 Patterns of geographic coverage of grid cells by voucher and observation records of reptiles across spatial grain (50 × 50, 100 × 100, 200 × 200, 400 × 400, 800 × 800 and 1600 km × 1600 km).

Geographic coverage of grid cells was calculated as the number of unique collection locales for each grid cell. Evenness or clustering of geographic coverage is indicated by Moran’s I (Monte Carlo test, 999 randomizations), with values of 1 indicating clustered or biased coverage and values of 0 corresponding to geographically even coverage. The bamako colour palette is common to all panels, with dark green indicating high coverage and yellow indicating low coverage. The maps are in the Wagner IV projection.

Extended Data Fig. 9 Patterns of geographic coverage of grid cells by voucher and observation records of mammals across spatial grain (50 × 50, 100 × 100, 200 × 200, 400 × 400, 800 × 800 and 1600 km × 1600 km).

Geographic coverage of grid cells was calculated as the number of unique collection locales for each grid cell. Evenness or clustering of geographic coverage is indicated by Moran’s I (Monte Carlo test, 999 randomizations), with values of 1 indicating clustered or biased coverage and values of 0 corresponding to geographically even coverage. The bamako colour palette is common to all panels, with dark green indicating high coverage and yellow indicating low coverage. The maps are in the Wagner IV projection.

Extended Data Fig. 10 Pairwise relationships between six socioeconomic and ecological variables.

Correlations are pairwise Spearman rank correlations between the variables at a spatial grain of 100 km. All variables were log-transformed before analysis. The statistical tests were two-sided. Exact p values are indicated below the correlation coefficients.
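A minimal sketch of a Spearman rank correlation on log-transformed variables is given below. It is illustrative only: the two variable names and their values are hypothetical, and the two-sided significance test mentioned in the caption is omitted. Spearman’s coefficient is computed here as the Pearson correlation of the ranks, with tied values given their average rank.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values for two variables across five grid cells (made up).
travel_time = [30, 120, 600, 45, 2400]
airport_distance = [12, 40, 35, 20, 300]

rho_raw = spearman(travel_time, airport_distance)
rho_log = spearman(
    [math.log(v) for v in travel_time],
    [math.log(v) for v in airport_distance],
)
```

Because a log transform is monotonic and so leaves the ranks unchanged, rho_raw and rho_log are equal in this sketch. The transform affects how the variables are displayed, not the rank correlation itself.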

Supplementary information

Supplementary Information

Supplementary Tables 1–7 and Notes 1 and 2.

Reporting Summary

Peer Review File

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Daru, B.H., Rodriguez, J. Mass production of unvouchered records fails to represent global biodiversity patterns. Nat Ecol Evol 7, 816–831 (2023). https://doi.org/10.1038/s41559-023-02047-3

  • DOI: https://doi.org/10.1038/s41559-023-02047-3
