Main

Biodiversity and the many ecosystem functions and services it underpins are undergoing significant and often rapid changes worldwide1. A range of global initiatives and policy frameworks, including the Convention on Biological Diversity (CBD) and Sustainable Development Goals (SDGs), have aimed to reduce this change and to halt the loss of biodiversity, with limited progress to date2. Appropriately gauging the impact of such policies or the progress toward international biodiversity goals has a key requirement: the availability of information on the status and trends of biodiversity in a form that is easily understood, timely, scientifically rigorous, standardized, relevant, global and representative of species populations across taxa and regions over time. Such information is particularly crucial in assessments, such as those carried out by the Intergovernmental Science–Policy Platform on Biodiversity and Ecosystem Services (IPBES)3, and is needed to construct ‘indicators’, which are aggregate measures that often address specific conservation targets4,5. Underpinning such metrics are core, essential measurements known as EBVs, which capture key constituent components of biodiversity change6,7, akin and complementary to the ‘essential climate variables’ supporting climate change assessment and policy8. Facilitated by the Group on Earth Observations Biodiversity Observation Network (GEO BON, http://geobon.org) and related efforts, the biodiversity science and observation community is now engaging in an effort to conceptualize and formulate these essential biodiversity components to enable more focused, integrated, and effective biodiversity monitoring in support of assessment and policy within a unified framework. 
This study represents the formal outcome of a process undertaken from 2015 through 2018 by the founding members of the GEO BON Species Populations Working Group9, which includes the authors of this Perspective, charged with providing the formal definitions, conceptualizations and recommendations addressing species distribution and abundance EBVs.

Changes in species distribution and abundance affect all biodiversity facets10, including the loss of potentially significant traits and functions1,11 and associated ecosystem consequences12,13. Patterns of spatial distribution and changes to these patterns inform us about the commonness, rarity and potential extinction risk for species14,15,16, determine the national and regional stewardship of species and are key to ensuring effective monitoring17, protection18,19 and population connectivity20 of species. Species-conservation goals are often particularly relevant to conservation legislation, and species population information is used in tracking progress toward CBD 2020 Targets 5, 11, 12 and 19 and SDGs 14 and 15, among others. When linked to data on surrounding conditions, occurrence information may provide insight into the realized environmental niche spaces of species21, which is key to capturing future consequences of global change22,23. Finally, species distribution and abundance have a range of other applications in science and society and facilitate app-based biodiversity discovery, learning and citizen science24,25,26,27.

Many countries already support a range of survey activities, such as national atlases, monitoring programs focused on threatened or flagship species and large-scale sharing of biodiversity data28,29,30. Conservation organizations and researchers add to this effort, but usually with a focus on a particular region or set of species. Critically, however, while often successful in addressing specific jurisdictional, organizational or scientific agendas, this data collection is inevitably taxonomically, spatially, temporally and ecologically limited, biased and unrepresentative of overall biodiversity31,32,33,34, skewing the decisions that the data inform35. Sound measurement of progress toward policy targets and effective decision-making require an information base geared toward overcoming these limitations through data and metadata capture and organization26,36,37,38 as well as harmonization and integration26,39,40. Because species populations are unconstrained by national borders and measuring conservation progress requires comparable and complete information, this integration should not only be explicit about its biodiversity representation but also be truly standardized and global in nature.

Here we characterize the elements needed to capture species population information across the entire Earth system and introduce the conceptual framework for the two ‘species population EBVs’ (SP EBVs), applicable to terrestrial, freshwater and marine environments. Specifically, we provide operational definitions for the species abundance EBV (SA EBV), which addresses counts of individuals for a given location in space and time, and the species distribution EBV (SD EBV), which is conceptually similar to the SA EBV but is simplified to a binary form (presence or absence) and is usually more attainable. To address global policy and decision requirements, these EBVs need to fulfil four key criteria: (1) cover an explicit and, for a given taxonomic scope, maximally representative set of species; (2) have a near-global scope or, at minimum, address a given taxonomic scope to its full spatial extent so that national stewardship responsibilities are captured; (3) be geographically and temporally contiguous or maximally representative; and (4) offer information at spatial and temporal resolutions that are useful for decision-makers and policy creation. As raw data alone are usually unable to fulfil these criteria, model-based and covariate-supported data integration is vital. We argue that the combination of global-scale remote sensing, new modelling methodology and novel computational and informatics solutions, along with different species population data types, now enables the required characterization.

Species occurrence data

The dynamics of species distribution and abundance are effectively assessed, and data to inform them fruitfully characterized, along the axes of space, time and taxonomic diversity41,42. Along these dimensions, an array of different data types from vastly varying sources contributes information about the occurrence of species. These usually vary by their spatial and temporal extent, resolution and frequency and are of different value in characterizing species distributions and their changes26. Currently, species distributions and the SD EBV are much more readily addressed than abundances (SA EBV); to introduce the concept, we first focus on species occurrence data. In its basic form, spatiotemporal species occurrence information requires at least a binary distinction of presence (≥1 individual) and/or non-detection (0 individuals). While all data types can provide evidence of species presence, only some are informative about absence (Fig. 1). Yet, reliable absence information is important for ascertaining change, including local emigrations and extirpations (absence given prior presence) or immigrations and introductions (presence given prior absence). Effective information integration for both SP EBVs therefore first requires a synthesis of the absolute and complementary value of core data types. As we illustrate (Fig. 2), these different data types and sources combine to offer very heterogeneous occurrence evidence.
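To make the role of absence evidence concrete, the following Python sketch labels year-to-year transitions for one species in one grid cell. It is a minimal illustration, not part of any EBV specification; the encoding (1 for presence, 0 for a reliable absence, None for no usable evidence) and the transition labels are assumptions chosen for this example.

```python
from typing import List, Optional

# Evidence for one species in one grid cell per time step (assumed encoding):
# 1 = presence (at least one individual recorded), 0 = reliable absence,
# None = no usable evidence (for example, no record that year).
Evidence = Optional[int]


def classify_transitions(series: List[Evidence]) -> List[str]:
    """Label each pair of consecutive time steps in one cell's evidence series."""
    labels = []
    for before, after in zip(series, series[1:]):
        if before is None or after is None:
            # Change cannot be inferred without evidence at both time steps.
            labels.append("undetermined")
        elif before == 1 and after == 0:
            labels.append("local loss")      # absence given prior presence
        elif before == 0 and after == 1:
            labels.append("colonization")    # presence given prior absence
        else:
            labels.append("no change")
    return labels


print(classify_transitions([1, 1, 0, None, 1]))
# ['no change', 'local loss', 'undetermined', 'undetermined']
```

A series built only from presence-only data (values 1 and None) can never produce a "local loss" label, which is why reliable absence information is needed for detecting change.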

Fig. 1: A typology of heterogeneous raw occurrence data supporting the capture of species distributions in space and time.

The three major data types that we describe (incidental records, inventories over small or large areas and expert synthesis maps) differ in spatiotemporal scope, in taxonomic scope, in the quality of the presence (+) and absence information (–) that they provide and in spatiotemporal specificity, which in turn determine the key characteristics that they can inform. For further use and legend for the ‘Output’ field, see Fig. 2.

Fig. 2: Heterogeneous species occurrence data types in space and time (species–space–time–gram).

Left, Consider the occurrence of species in three countries (A–C) over a defined time period that has been informed by multiple data types (1–5). To facilitate integration, data are summarized over a contiguous grid (right) with cells of equal size that includes all data types and species of a group across their global extent. Cell sizes could vary geographically to reflect data density, and, for marine or freshwater ecosystems, cells may represent units of volume or of linear or areal space. For graphical presentation, the grid allows geographic space to be collapsed into one dimension, for example with neighbouring cells arranged in roughly sequential order along the vertical axis and time along the other axis. This species–space–time–gram enables the visual characterization of available data along two dimensions (space and time) for multiple species. Cells can be spatially aggregated at governance scales, such as regions, countries or counties, and can potentially themselves be politically defined. In our example (circled numbers on both sides of the figure), an expert synthesis map (1) characterizes the static species ranges for the 20-year period and is deemed by experts to reliably separate presence (green) and absence (orange) for the chosen space–time grid. This is complemented by two large-area inventories (2), developed before the year 2000, indicating, with little spatiotemporal specificity, species presence in parts of Country C and likely absence in all of Country A. Two sets of small-area inventories exist, including single-year visits (3) and multi-year atlas efforts (4), all providing spatiotemporally explicit evidence that may hold reliable absence information. Finally, multiple incidental records (5) provide presence data for select grid cells and years.
Assuming, for now, that small-area inventories are highly suitable for grid-level absence inference, the raw occurrence data do enable the detection of local extinctions (i) and of non-native occurrences or invasions (ii).

Incidental observations

These are single records that lack information about co-observed species, taxonomic scope and, usually, sampling protocol; examples include most museum records and many citizen-science contributions. They are therefore unable to directly inform non-detections, yet can often offer species presences for detailed locations (hence the term ‘presence-only’ data). Thanks to increasing amateur data collection24,27, advances in animal tracking43,44, the activities of aggregators like the Global Biodiversity Information Facility (GBIF) and the Ocean Biogeographic Information System (OBIS), existing standards facilitating sharing and interoperability (Darwin Core37) and easy-to-use modelling tools45, this data type and its direct use have seen strong growth. However, major taxonomic, ecological and geographic biases in data availability still exist31,46, impeding straightforward interpretation.
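A minimal sketch of how incidental records might be summarized onto a grid, assuming a simple equal-angle latitude/longitude grid (whose cells shrink in area toward the poles, unlike the equal-area grids discussed for Fig. 2) and made-up records with hypothetical species names. The key point is in what the function does not return: cells without records are unknown, not absent.

```python
import math
from typing import Iterable, Set, Tuple


def grid_cell(lat: float, lon: float, cell_deg: float) -> Tuple[int, int]:
    """Index of the equal-angle cell containing a point (row from the south, column from the west)."""
    row = math.floor((lat + 90.0) / cell_deg)
    col = math.floor((lon + 180.0) / cell_deg)
    return row, col


def presence_cells(records: Iterable[Tuple[str, float, float, int]],
                   species: str, cell_deg: float) -> Set[Tuple[int, int, int]]:
    """Collapse incidental (species, lat, lon, year) records to (row, col, year) presence cells.

    Cells missing from the result are *unknown*, not absences: incidental
    records carry no information about what was not observed.
    """
    return {grid_cell(lat, lon, cell_deg) + (year,)
            for sp, lat, lon, year in records if sp == species}


records = [  # made-up example records
    ("species_a", 51.5, -0.1, 2018),
    ("species_a", 51.6, -0.2, 2018),
    ("species_b", 48.8, 2.3, 2019),
]
print(presence_cells(records, "species_a", 1.0))  # {(141, 179, 2018)}
```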

Inventories

This data type differs from incidental observations in two key respects: it has a defined taxonomic scope and a defined spatiotemporal scope, both larger than those of a single observed individual or species (Fig. 1). Inventories may address all members of a taxon (for example, mammals or bryophytes) or a group defined in another way (for example, trees or phytoplankton above a certain size). This enables inference about non-detections, that is, members of the focal species group that were not recorded (hence the term ‘presence–absence’ data). Such non-detections can provide information about absences, but the reliability of such inference depends on survey effort, survey effectiveness and sampling protocol. For small-area inventories addressing relatively immobile or readily detected organisms, such as vegetation plots or a short survey transect, a given effort may provide highly spatiotemporally specific evidence and potentially reliable absences. For more mobile groups that are harder to detect, larger or longer-lasting survey campaigns, such as atlas efforts or sensor- or trap-based surveys, may offer more reliable absence evidence, but at the cost of spatiotemporal specificity. Over much larger extents, such as countries, counties, islands or national parks, large-area inventories like multispecies ‘checklists’ or state-level databases for select species of national interest (for example, pests or non-natives) are often based on multiple data sources and protocols and are considered ‘summary inventories’38. Depending on effort and rigor, they may provide both reliable presence and absence information, but usually with limited spatial or temporal specificity. Despite the fundamental role of inventories in ascertaining potential absences, especially in the context of growing modelling methodology47, they have seen limited mobilization and integration with other data types.
Key past causes for this include the lack of (1) data and metadata, (2) infrastructure to facilitate the capture and use of this information and (3) community-wide appreciation and incentives for data and metadata sharing. With a prototype inventory data standard (Humboldt Core38) and associated informatics tools at Map of Life and format extensions at GBIF and OBIS in place, future growth in the capture of effort and complete metadata looks promising.
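The dependence of absence reliability on survey effort can be illustrated with the basic reasoning behind occupancy models. The sketch below applies Bayes' rule under simplifying assumptions: a known prior occupancy probability `psi`, a constant per-visit detection probability `p` and `k` independent visits. Function name and parameter values are illustrative only.

```python
def prob_occupied_given_no_detection(psi: float, p: float, k: int) -> float:
    """Bayes' rule: P(site occupied | species not detected on any of k independent visits).

    psi: prior probability that the site is occupied (assumed known here)
    p:   per-visit detection probability when the species is present
    """
    missed = psi * (1.0 - p) ** k   # occupied, yet missed on every visit
    truly_absent = 1.0 - psi        # unoccupied sites always yield non-detections
    return missed / (missed + truly_absent)


for k in (1, 3, 10):
    print(k, round(prob_occupied_given_no_detection(0.5, 0.5, k), 4))
```

With `psi = 0.5` and `p = 0.5`, a single visit without detection still leaves a one-in-three chance that the species is present; three visits reduce this to one in nine.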

Expert synthesis maps

These are binary or categorical distribution maps developed by species experts. They aim to coarsely separate occupied areas from those without species occurrence and typically cover a long timeframe, often decades rather than years48,49. Similar to large-area inventories but addressing only a single species at a time, they usually summarize multiple sources and data types from multiple time points (with details and provenance usually not retained). They are based on by-region expert determinations or on interpolation to delineate occurrence boundaries, which are often hand-drawn using sources including taxonomic monographs, handbooks, large-scale field guides and conservation reports. Such expert predictions are now sometimes quantitatively supported by species distribution modelling50,51,52,53,54, with pixel suitability scores subsequently thresholded, masked or otherwise modified by the expert to exclude areas presumed to be unoccupied55. While both the data and the (human or machine-based) models underpinning such predictions are usually a ‘black box’, they can offer information vital to delineating the geographical scope for a species within which presence may be expected and outside of which absence is likely. This typically implies a broad temporal scope, and thus limited opportunity for direct change inference, as well as a lack of spatial detail: hand-drawn presence–absence boundaries may have substantial inaccuracies, often include large areas of false presence and sometimes also include false absences49,56,57. However, other data may allow validation of a reliable spatial grain at which this data type can be used for presence–absence information, for example ca. 100 km for global bird expert maps49,58, and potentially finer for model-supported expert predictions55.
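One way to explore the grain at which an expert map becomes reliable is to coarsen both the map and independent survey evidence and track how often expert presences are supported. The sketch below is a hypothetical illustration on tiny binary grids, not the validation procedure used in the cited studies; because the confirmations here are presence-only, an unsupported block may reflect undersampling rather than a false presence.

```python
from typing import List

Grid = List[List[int]]  # binary grid: 1 = presence, 0 = no presence


def coarsen(grid: Grid, f: int) -> Grid:
    """Aggregate a binary grid into f x f blocks; a block is 1 if any fine cell in it is 1."""
    rows, cols = len(grid), len(grid[0])
    return [[int(any(grid[r][c]
                     for r in range(br, min(br + f, rows))
                     for c in range(bc, min(bc + f, cols))))
             for bc in range(0, cols, f)]
            for br in range(0, rows, f)]


def supported_fraction(expert: Grid, confirmed: Grid, f: int) -> float:
    """Share of expert-map presence blocks containing at least one independently confirmed presence."""
    e, s = coarsen(expert, f), coarsen(confirmed, f)
    support = [si for er, sr in zip(e, s) for ei, si in zip(er, sr) if ei == 1]
    return sum(support) / len(support) if support else float("nan")


expert = [[1] * 4 for _ in range(4)]      # expert map: whole 4 x 4 area "occupied"
confirmed = [[0] * 4 for _ in range(4)]   # made-up survey confirmations
confirmed[0][0] = confirmed[3][3] = 1
for f in (1, 2, 4):
    print(f, supported_fraction(expert, confirmed, f))
```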

Model-based integration and prediction

The species population EBVs

As shown above and in our data ‘species–space–time–gram’ (Fig. 2), raw biodiversity data alone can usually characterize species distributions in space and time only in an idiosyncratic way and with limited sensitivity for detecting change. General reasons include sparse data and taxonomically, spatially and environmentally uneven coverage31; ecological, environmental or phenological variation in species detectability59; and highly heterogeneous spatial and temporal grains of available data. Thus, on their own, raw data fail some or all of the four key criteria for SP EBVs formulated above. Status and trend metrics and indicators that are mere aggregates of raw data are likely to be biased and suboptimal and may mislead downstream inference. This can be addressed through the use of spatiotemporally contiguous environmental and other species-level covariates in a statistical framework (Fig. 3). Observations representing different data types collected over heterogeneous spatiotemporal scales are unified in a space–time analysis grid or, in aggregate form, a species–space–time cube in which cells represent a model-based measure of presence or abundance. Individual cells may refer to space of any dimensionality, including linear or three-dimensional space in the case of aquatic habitats. Cell size would ideally represent the relevant scale of occurrence or change processes60 or, more operationally, be adaptively driven by available data, intended output and acceptable uncertainty (see below), and may thus vary by species group. Models integrating the respective data types and signals among species, locations and time then enable predictions of a cell’s suitability, presence probability or abundance of a species at particular points in space and time, together with estimates of uncertainty.
Applied contiguously over an extent encompassing the geographic ranges of species in the taxonomic scope, this enables assessment and monitoring of occupied areas and associated statistical signals of local, regional and global change (Fig. 3).

Fig. 3: The development and uses of SP EBVs.

Bottom, Heterogeneous occurrence data, remotely sensed environmental conditions and ecological species attributes facilitate global predictions of species distributions in space and time. The SP EBVs are the predicted probability of occurrence (SD EBV) or the predicted number of individuals (SA EBV) over contiguous spatial and temporal units that cover the spatial extent of each species group member. For example, when data are aggregated for a single species, the SD EBV measures changes in distribution and the SA EBV changes in population size. Centre, When data are aggregated for single cells, the SD EBV informs about community change in, for example, species richness or compositional similarity or, via ancillary data, functional or phylogenetic turnover. Top, The fundamental contribution of SP EBVs to indicators of biodiversity and ecosystem status and trends. On their own, or combined with ancillary data on species or locations as well as EBVs from different classes, the SP EBVs underpin a wide range of uses in policy, conservation, management, research and society. Inspired by earlier work42, the species–space–time–gram (cube) concept and graph were originally developed by the authors of the present work and then shared with the GEO BON community in 2016.

EBV definition

We thus define the essential biodiversity variable for species distributions (SD EBV) as the probability of occurrence over contiguous spatial and temporal units addressing the global extent of a species group consisting of one to many members. With support from models, this species–space–time cube is characterized for all members of a taxonomically or ecologically defined species group over their respective global extents, with a potentially variable cell size. The conceptually equivalent species abundance EBV (SA EBV) is the predicted count of individuals over contiguous spatial and temporal units. Together, these two EBVs, their potential spatial aggregates and their combinations with species attributes (see below) represent the species populations EBV class, as in ref. 6.
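The definition can be read as a three-dimensional array indexed by species, cell and time step. The Python sketch below, with made-up probabilities and hypothetical function names, shows two of the marginal summaries such a cube supports: expected range size (summing over cells) and expected species richness (summing over species). Both follow from linearity of expectation and need no independence assumption.

```python
from typing import List

# Hypothetical SD EBV cube: P[s][c][t] is the modelled probability that
# species s occurs in cell c during time step t (values below are made up).
Cube = List[List[List[float]]]


def expected_range_size(P: Cube, s: int, t: int) -> float:
    """Expected number of occupied cells for species s at time t (sum over cells)."""
    return sum(P[s][c][t] for c in range(len(P[s])))


def expected_richness(P: Cube, c: int, t: int) -> float:
    """Expected number of species present in cell c at time t (sum over species)."""
    return sum(P[s][c][t] for s in range(len(P)))


# 2 species x 2 cells x 2 time steps
P = [
    [[0.9, 0.2], [0.5, 0.5]],   # species 0: declining in cell 0
    [[0.1, 0.8], [1.0, 1.0]],   # species 1: expanding into cell 0
]
print(expected_range_size(P, 0, 0), expected_range_size(P, 0, 1))
print(expected_richness(P, 0, 0), expected_richness(P, 0, 1))
```

The same sums applied to an SA EBV array of predicted counts would give an expected population size per species and an expected total abundance per cell.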

Environmental data

This conceptualization of the SP EBVs is uniquely facilitated by the environmental data revolution, specifically the availability of worldwide high-resolution remote sensing products. A number of sources, such as the National Aeronautics and Space Administration (NASA)’s Landsat and moderate resolution imaging spectroradiometer (MODIS), NASA’s other upcoming missions61, the European Space Agency (ESA)’s Sentinel and other (increasingly including commercial) ventures, provide data of growing spatiotemporal and spectral detail and extent that are increasingly representative of ecological drivers. The fine spatial and temporal resolution of these data allows an environmental characterization of biodiversity records at spatiotemporal grains near those of in situ observations. This enables an increasingly scale-conscious capture of species niches and provides critical spatiotemporal sensitivity and flexibility for inferring and predicting distributions and their change62,63. These environmental measurements are also increasingly relevant to species population processes, addressing biological drivers such as land cover, topographic and habitat heterogeneity, fine-scale weather variation, plant functional traits or productivity64,65,66,67,68 and, in select cases, even detecting species directly63,69,70. Data along the depth gradient of the oceans are more limited, but the first near-global distribution modelling efforts71,72,73 and the first near-global characterizations of freshwater conditions are emerging74.
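Annotating each record with environmental conditions at its own time step, rather than with a long-term average, is what provides the temporal sensitivity described above. A minimal sketch, assuming a hypothetical yearly covariate layer (for example, a vegetation index) stored as a north-up grid with a known northern edge, western edge and cell size; real workflows would rely on geospatial libraries and proper map projections, and the layer values here are made up.

```python
import math
from typing import Dict, List, Optional

Layer = List[List[float]]  # rows run north to south, columns west to east


def annotate(lat: float, lon: float, year: int, layers: Dict[int, Layer],
             north: float, west: float, res: float) -> Optional[float]:
    """Covariate value for the cell and year of one record, or None if unavailable."""
    layer = layers.get(year)  # match the record to the layer for its own year
    if layer is None:
        return None
    row = math.floor((north - lat) / res)
    col = math.floor((lon - west) / res)
    if 0 <= row < len(layer) and 0 <= col < len(layer[0]):
        return layer[row][col]
    return None  # record falls outside the layer's extent


layers = {  # made-up 2 x 2 layers covering lat 0-10, lon 0-10 at 5-degree cells
    2018: [[0.2, 0.4], [0.6, 0.8]],
    2019: [[0.3, 0.5], [0.7, 0.9]],
}
print(annotate(7.5, 6.0, 2019, layers, north=10.0, west=0.0, res=5.0))
print(annotate(7.5, 6.0, 2017, layers, north=10.0, west=0.0, res=5.0))
```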

Models

The modelling concepts and techniques underlying our approach are broadly based on species distribution models50,51,52,54, which identify the environmental conditions associated with species occurrence and allow their mapping in space. These conditions are usually identified inductively from occurrence data, but may also be informed deductively through species attributes, such as known ecological or physiological associations with land cover, elevation or depth and climatic or water conditions75,76,77. Dispersal and biotic constraints limit actual distributions, and SP EBVs therefore require predictions of the realized niche within the existing biological and spatial template, rather than of the fundamental niche, which is more appropriate for projections into different times and places21. Expert-assessed, data-driven or phylogenetically inferred biotic species dependencies can be linked to hosts, prey, predators or other actors to inform distribution or abundance predictions78,79. These dependencies and landscape characteristics may also be gauged from species co-occurrence data themselves, for example, through a focus on assemblages and their compositional dissimilarity80 or a new generation of ‘joint’ distribution models for multiple species81,82. These methods are particularly promising because they borrow strength across species when occurrence data are limited.
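As a deliberately minimal stand-in for a species distribution model, the sketch below fits a one-covariate logistic regression to presence–absence data by gradient ascent on the log-likelihood. It assumes reliable absences (which presence-only data do not provide), uses made-up data and omits much of what a real model needs (multiple covariates, spatial structure, imperfect detection); established statistical packages would normally be used instead.

```python
import math
from typing import List, Tuple


def fit_logistic(x: List[float], y: List[int], lr: float = 0.1,
                 steps: int = 5000) -> Tuple[float, float]:
    """Fit P(presence) = 1 / (1 + exp(-(b0 + b1 * x))) by gradient ascent on the log-likelihood."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(steps):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            g0 += yi - p          # derivative of the log-likelihood w.r.t. the intercept
            g1 += (yi - p) * xi   # ... and w.r.t. the slope
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


def predict(b0: float, b1: float, x: float) -> float:
    """Predicted occurrence probability at covariate value x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


# Made-up covariate values (e.g. a standardized productivity index) and presence (1) / absence (0)
x = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
y = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(x, y)
print(round(predict(b0, b1, 0.0), 3), round(predict(b0, b1, 3.5), 3))
```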

Almost all species distribution modelling applications to date use only temporally static or collapsed input data, even when they aim to project distribution changes in time (‘space-for-time substitution’52,83,84). This temporal stationarity assumption represents a key constraint for assessing temporal change85. With sufficient data, a more appropriate, yet somewhat inefficient, approach is to run separate models for different time periods. More desirable are models that are fitted across the entire spatial and temporal scope of available data and that explicitly address spatiotemporal co-dependencies and signals of change86. This is addressed by dynamic distribution or dynamic occupancy models, which parameterize and predict variation in occurrence jointly in space and time82,87,88,89,90, as well as by assemblage dissimilarity models applied to temporal turnover, albeit without explicit consideration of species-level patterns or separation of spatial and temporal drivers91,92. The first large-scale demonstrations of dynamic occupancy approaches with spatiotemporal change assessment are now emerging93,94. We see great potential to extend and implement these methods as the backbone for addressing species occurrence in contiguous space and time.
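Dynamic occupancy models typically describe change through colonization (`gamma`) and local extinction (`epsilon`) probabilities. The sketch below projects the expected occupied proportion forward with constant rates, using the standard recursion psi[t+1] = psi[t](1 − epsilon) + (1 − psi[t]) gamma. In a fitted model both rates would vary with covariates and be estimated from data; the values here are illustrative.

```python
from typing import List


def project_occupancy(psi0: float, gamma: float, epsilon: float, steps: int) -> List[float]:
    """Expected occupancy over time: psi[t+1] = psi[t] * (1 - epsilon) + (1 - psi[t]) * gamma."""
    psi = [psi0]
    for _ in range(steps):
        prev = psi[-1]
        # occupied sites that persist + unoccupied sites that are colonized
        psi.append(prev * (1.0 - epsilon) + (1.0 - prev) * gamma)
    return psi


trajectory = project_occupancy(0.8, gamma=0.1, epsilon=0.3, steps=10)
print([round(v, 3) for v in trajectory])
```

With constant rates, occupancy approaches the equilibrium gamma / (gamma + epsilon), which is 0.25 for these values.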

Model-based data integration

The vast majority of species distribution prediction efforts to date are based on presence-only models using environmental covariates alone. This constrains the delineation of non-environmental distribution limits, such as past or current physical or ecological barriers95,96. With sufficient data, this limitation can be addressed through hierarchical spatial models97 or, alternatively, by combining presences with expert synthesis map data to restrict model predictions98. Presence-only models are further limited in that they provide only relative cell suitability, and their conversion to binary presence–absence relies on thresholds contingent on ancillary information, expert judgment and assumptions99,100. We suggest that the inclusion of inventory data will be critical here and will lead to a new generation of species distribution modelling approaches. Inventory data implicitly provide information on species absences and, through the use of an occupancy modelling framework40, enable the assessment of species- and environment-specific detection probabilities101,102 and the quantification of absolute occurrence probabilities103,104. While repeated sampling following a standardized protocol may be ideal, such data are often limited to few or unrepresentative species and regions. We therefore argue that, in many cases, alternative or even self-assessed information on the completeness of an inventory, and implicitly on the level of detection or absence information it affords, may strengthen occurrence predictions38. We highlight the need for more statistical work to address this potential and acknowledge that species with extremely limited observational data will continue to pose a challenge, especially for trend and aggregate assemblage metrics. Combined with in situ abundance data or estimates of home-range size, the same framework can address abundance105,106.
Point process models in particular unify distribution and abundance predictions and seem especially well suited as a statistical framework39,107,108, in principle enabling a smooth transition between the SD EBV and SA EBV.
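To illustrate how repeated inventory visits separate non-detection from absence, the sketch below writes out the single-season occupancy likelihood (constant occupancy `psi` and detection `p`, sites closed to change between visits) and maximizes it by a crude grid search. The detection histories are made up, and the grid search stands in for the numerical optimizers used by dedicated occupancy software.

```python
import math
from typing import List, Tuple


def log_likelihood(histories: List[List[int]], psi: float, p: float) -> float:
    """Single-season occupancy log-likelihood with constant psi and p."""
    ll = 0.0
    for h in histories:
        d, k = sum(h), len(h)
        occupied = psi * p ** d * (1.0 - p) ** (k - d)
        # An all-zero history can also come from a truly unoccupied site.
        ll += math.log(occupied + (1.0 - psi) if d == 0 else occupied)
    return ll


def fit(histories: List[List[int]], step: float = 0.01) -> Tuple[float, float]:
    """Crude grid search for the maximum-likelihood (psi, p), excluding the boundaries 0 and 1."""
    grid = [i * step for i in range(1, int(round(1 / step)))]
    return max(((a, b) for a in grid for b in grid),
               key=lambda ab: log_likelihood(histories, *ab))


# Made-up histories: 10 sites x 3 visits (1 = detected, 0 = not detected)
histories = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 0], [0, 0, 0],
             [0, 0, 1], [0, 0, 0], [1, 0, 0], [0, 0, 0], [0, 0, 0]]
psi_hat, p_hat = fit(histories)
print(round(psi_hat, 2), round(p_hat, 2))
```

Five of the ten sites have at least one detection, so the naive occupancy estimate is 0.5; the fitted model corrects this upward (to roughly 0.73 here) because some all-zero histories are attributed to missed detections.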

A key challenge for model-based integration, exacerbating the known issue of gaps and biases in spatial biodiversity data, is the heterogeneity of taxa, of sampling methods and of the spatial and temporal grain of available data. Notably, presence and absence evidence have fundamentally different spatiotemporal grain properties. The presence of a single individual for a given place and time automatically implies the presence of the species in all larger spatial or temporal units that contain it. In contrast, an observed absence at the level of a small plot or short trawl does not imply absence at the level of an encompassing coarser-grain grid cell. Equally, for mobile organisms, a reliable absence during a weeklong survey does not imply absence in that year. Hierarchical statistical models, often using Bayesian approaches, have been developed specifically to address the cross-type and cross-scale nature of occurrence data and have been used to combine inventory and incidental data109 or data from disparate spatial scales110. Such approaches enable predictions at a common spatial (and hypothetically temporal) resolution111, that is, the up- or downscaling of underlying heterogeneous data to a single spatiotemporal prediction grid for both species distribution and abundance112,113. The issue of scale is intimately connected to that of uncertainty, as occurrence at the continental or centennial scale is naturally subject to less uncertainty than that at the 100-km or annual scale. For most uses, predictions at finer scales are preferred as long as uncertainty is captured, which is increasingly facilitated by Bayesian and related modelling techniques114,115. We consider the capture, reporting, spatial visualization and cascading of uncertainty into aggregate products to be key to supporting effective data collection and sound policy and management decisions.
We note that the interconnections between the scale of process, evidence and predictions and the trade-offs between scale, uncertainty and sensitivity are key areas in need of further research.
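The asymmetry between presence and absence evidence under aggregation can be written as a small rule. The first function below, using an assumed encoding (1 presence, 0 reliable absence, None unknown), combines the evidence of fine cells or sub-periods into one coarse unit. The second converts fine-cell occurrence probabilities into the probability of occurrence somewhere in the coarse unit under an independence assumption that real spatial data rarely satisfy; it illustrates why occurrence is easier to assert for larger units. Both functions are hypothetical sketches.

```python
import math
from typing import List, Optional


def upscale(fine: List[Optional[int]]) -> Optional[int]:
    """Combine the evidence of the fine cells (or sub-periods) inside one coarse unit."""
    if any(v == 1 for v in fine):
        return 1      # one presence implies presence in every unit that contains it
    if fine and all(v == 0 for v in fine):
        return 0      # absence transfers upward only if every part is a reliable absence
    return None       # otherwise the coarse unit stays unknown


def prob_any(probs: List[float]) -> float:
    """P(species occurs in at least one fine cell), assuming independent fine cells."""
    return 1.0 - math.prod(1.0 - p for p in probs)


print(upscale([0, None, 1]), upscale([0, 0, 0]), upscale([0, None, 0]))
print(prob_any([0.5, 0.5]))
```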

Uses and applications

The envisioned essential species distribution information, or SD EBV, offers an exceptional breadth of applications in biodiversity and ecosystem monitoring and assessment (Fig. 3). Consider the idealized case of data and models providing annual occurrence probabilities and associated uncertainties for hundreds of species globally over a spatial grid with a resolution of 100 km, 10 km or even 1 km. Such an empirically driven SD EBV enables the monitoring of species distribution dynamics (contractions, expansions and redistributions) and of the sizes and levels of fragmentation of geographic ranges. For any cell location, it provides information about community richness and composition and their change (thus addressing variables in the community EBV class sensu6), including immigration and the loss of native species. When aggregated across regions or the globe, it offers compound characterizations of both species and community change in a larger-scale context and is thus able to directly inform global indicators of change, such as the suite of GEO BON-endorsed biodiversity indicators116.
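For a single cell, community change between two time steps can be summarized from the predicted species pool. A hypothetical sketch, assuming occurrence probabilities are converted to predicted presences with an illustrative 0.5 threshold (in practice the choice should reflect the intended use and the uncertainty) and using the Jaccard index as one common measure of compositional similarity; species names and probabilities are made up.

```python
from typing import Dict, Set


def present(probs: Dict[str, float], threshold: float = 0.5) -> Set[str]:
    """Species predicted present in a cell; the 0.5 cut-off is an illustrative assumption."""
    return {sp for sp, p in probs.items() if p >= threshold}


def community_change(before: Dict[str, float], after: Dict[str, float]) -> Dict[str, object]:
    """Gains, losses and Jaccard similarity between two time steps in one cell."""
    a, b = present(before), present(after)
    union = a | b
    return {
        "gained": sorted(b - a),
        "lost": sorted(a - b),
        "jaccard": len(a & b) / len(union) if union else 1.0,  # compositional similarity
    }


print(community_change({"sp1": 0.9, "sp2": 0.7, "sp3": 0.2},
                       {"sp1": 0.8, "sp2": 0.1, "sp3": 0.6, "sp4": 0.9}))
# {'gained': ['sp3', 'sp4'], 'lost': ['sp2'], 'jaccard': 0.25}
```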

Ancillary data on species and places allow for enriched characterizations. The SD EBV joined with data on traits or functional roles of species may, for example, support inference about functional biodiversity losses10,117 or the potential ecological impacts of species invasions13. Combined with species-level estimates of life history and home-range size, the SD EBV has the potential to support more temporally sensitive and accurate estimates of species extinction risk. Integrating spatial data on environmental change enables the identification of drivers of change. When combined with spatial protected-area information, the SD EBV can support monitoring of progress toward international biodiversity conservation targets or help identify new conservation opportunities, including in support of the Half-Earth Project118 or related aspirations. Extending the SD EBV to address abundance estimates for the same space–time–species cells, the SA EBV can offer even greater ecological and conservation relevance (for example, see refs. 94,119). Combined with ancillary data, such as species-typical body mass and function, it can inform biomass and biomass- or abundance-scaled functional changes, among other quantities. Where attributes have high intraspecific variation, for example, due to local adaptation, these extended uses of the SD EBV and SA EBV will benefit from in situ species-, community- or even ecosystem-level measurements that can offer vital local detail.
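Combining the SA EBV with species attributes can be as simple as a weighted sum. A minimal sketch, assuming predicted counts for one cell and species-typical body masses (both made up here, with a hypothetical function name); it ignores intraspecific variation, which in situ measurements would help capture.

```python
from typing import Dict


def cell_biomass(abundance: Dict[str, float], body_mass_kg: Dict[str, float]) -> float:
    """Biomass in one cell: predicted counts times species-typical body mass (kg)."""
    return sum(n * body_mass_kg[sp] for sp, n in abundance.items())


print(cell_biomass({"sp1": 10, "sp2": 4}, {"sp1": 0.5, "sp2": 2.0}))  # 13.0
```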

For nations, the presented framework and associated infrastructure (see below) enable a substantially improved capacity to track biodiversity change and to assess progress toward national and international commitments35 by: (1) informing local predictions with globally pooled and integrated data and a concerted expert effort for data input and validation, potentially offering more complete and rigorous status and trend estimates than an isolated national analysis or system, especially in undersampled regions; (2) capturing, through its global scope, each nation’s stewardship of species (for example, the proportion of a species’ global population that a given nation holds) and changes to it; (3) enabling indicators that, because of their global and standardized nature, inform progress toward internationally agreed biodiversity targets in a comparable way; and (4) guiding and providing infrastructure, tools and dashboards that support in-country and international assessment and reporting on species populations (facilitated by GEO BON and its partners), such as Map of Life, the BON-in-a-Box web toolkit or the GEO BON EBV portal.
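The stewardship measure in point (2), a nation's share of a species' global population, follows directly from a contiguous global abundance layer. The sketch below assumes that each grid cell has already been assigned to a single country (cells straddling borders would need area weighting) and uses made-up predicted counts and a hypothetical function name.

```python
from typing import Dict, Iterable, Tuple


def stewardship(cells: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Each country's share of one species' global predicted population.

    `cells` yields (country, predicted_count) pairs for one species and one time step.
    """
    totals: Dict[str, float] = {}
    for country, n in cells:
        totals[country] = totals.get(country, 0.0) + n
    world = sum(totals.values())
    return {country: n / world for country, n in totals.items()}


shares = stewardship([("A", 30.0), ("A", 10.0), ("B", 40.0), ("C", 20.0)])
print(shares)  # {'A': 0.4, 'B': 0.4, 'C': 0.2}
```

Recomputing the shares for each time step of the SA EBV would track changes in stewardship.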

Infrastructure

The SP EBVs require rethinking traditional approaches to producing knowledge products. At the heart of the SP EBVs is the harmonization and model-based integration of multiple types of biodiversity and environmental data from heterogeneous sources that address different scales and are stored in multiple formats. This requires workflows that connect data to models in order to produce and disseminate credible and transparent modelled products. Such workflows necessitate a network of tools and infrastructure to address four main steps (Fig. 4): (1) data generation, contribution and aggregation; (2) data integration; (3) modelling and production of SP EBVs; and (4) delivery and use of SP EBVs. In step 1, biodiversity data producers, such as taxon-, region- or data-type-specific networks, sampling campaigns (for example, through citizen science) or national/institutional interests (for example, country atlases and national marine surveys), are increasing both the sophistication of their database tools and the size of their data holdings. Unfortunately, many data still remain unavailable owing to lack of sharing or restrictive licensing, but data contributions are facilitated by a range of platforms (for a review of examples and associated standard and workflow issues, see ref. 36). For incidental records, data and metadata standards are enabling such data to effectively support EBV development. Key examples of such networks include GBIF and OBIS, which operate globally via connections to national nodes and thematic networks. Such global networks are important infrastructures to ensure availability, repeatability, standardization and archiving in support of downstream data integration and use120. However, infrastructure that can play a similar role for more complex data types is still nascent and urgently needed, for example for select inventory data that require detailed metadata to feed most effectively into models, or for data-type- or model-focused efforts.
Data integration and modelling (steps 2 and 3) are the backbone for producing SP EBVs. They require an informatics framework that is strongly informed by research and that incorporates environmental sensor data, flexible modelling methodology that integrates data types, and scalable computational statistical approaches with associated cloud-based data management. The infrastructure should form a community platform for the best-possible development of standardized, scientifically rigorous and transparent SP EBVs. This should include dashboards through which taxon-region experts can evaluate data and provide feedback that improves products, and through which products are delivered back to the experts’ respective platforms and networks. Species distributions and data traverse national boundaries. We thus consider a harmonized global infrastructure that addresses core model-based integration steps as key to achieving standardized, geographically comparable information and downstream indicator products. Otherwise, the data integration and modelling steps, and the products derived from them, are likely to be nearly impossible to standardize.
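As an illustration of step 2, the following minimal sketch bins heterogeneous occurrence records into a shared spatiotemporal framework, a sparse space–time–species cube. It assumes a regular longitude/latitude grid, fixed-length time steps and hypothetical record fields (`species`, `lon`, `lat`, `year`, `source`); real SP EBV workflows would more likely use equal-area grids, handle coordinate uncertainty and carry much richer metadata.

```python
import math
from collections import Counter, defaultdict

def cell_index(lon, lat, year, grain_deg=1.0, period_years=5):
    """Map a record to an (x, y, t) index on a regular lon/lat grid and time step."""
    n_x = round(360 / grain_deg)
    n_y = round(180 / grain_deg)
    # Clamp so that lon = 180 or lat = 90 falls in the last cell rather than off-grid.
    x = min(math.floor((lon + 180.0) / grain_deg), n_x - 1)
    y = min(math.floor((lat + 90.0) / grain_deg), n_y - 1)
    t = year // period_years  # index of the time step the record falls into
    return x, y, t

def build_cube(records, grain_deg=1.0, period_years=5):
    """Bin heterogeneous records into a sparse (species, x, y, t) cube.

    Each cube entry keeps a count per data type (source), so downstream
    models can treat incidental records and inventories differently.
    """
    cube = defaultdict(Counter)
    for rec in records:
        x, y, t = cell_index(rec["lon"], rec["lat"], rec["year"], grain_deg, period_years)
        cube[(rec["species"], x, y, t)][rec["source"]] += 1
    return cube

# Hypothetical records from two data types.
records = [
    {"species": "Sp_A", "lon": 8.5, "lat": 47.3, "year": 2016, "source": "incidental"},
    {"species": "Sp_A", "lon": 8.9, "lat": 47.1, "year": 2017, "source": "inventory"},
    {"species": "Sp_B", "lon": -70.2, "lat": -12.4, "year": 2009, "source": "incidental"},
]
cube = build_cube(records)
for key, counts in sorted(cube.items()):
    print(key, dict(counts))
```

Keeping per-source counts instead of merging them is a deliberate choice in this sketch: the modelling step can then apply different observation models to presence-only records and to inventories.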

Fig. 4: Key actors, workflows and informatics infrastructure for the production and use of essential species population information and SP EBVs.

a, Data contribution and validation. Networks composed of individuals, organizations and institutions with an array of data types engage with platforms that support the standardized capture of data and metadata and provide tools for quality control and taxonomic harmonization. GBIF and OBIS are established global examples of infrastructure in this space, with the capacity to effectively store, improve and mobilize data onward for further use. b, Data integration. In this step, workflows and dashboards unify disparate data types and sources in a common spatiotemporal framework and supply networks of data providers, experts and other users with initial reporting on raw data coverage and trends in space and time. c, Modelling. After raw data are annotated with, for example, remotely sensed environmental data, a default set of dynamic models, adapted to data type and quantity, is applied for predictions in space and (past to current) time, producing the EBV. Dashboards allow both taxon- and region-specific experts and modelling specialists to engage with, and iteratively improve, models and products. Steps b and c, and those parts of a that concern particularly model-relevant data and metadata, are enabled by infrastructure (such as Map of Life) that addresses global data integration, annotation, modelling and feedback to networks of experts. d, Use. A range of users, from policy, management, research and advocacy to the broader public, engage with summary maps or with products derived from the SP EBVs, such as indicators, all of which can be connected with ancillary biodiversity or spatial data (Fig. 4). Further development of products and their communication is performed by scientific, international, national and non-governmental groups and associated platforms, including GEO BON, the CBD and conservation organizations.

This approach places a premium on community involvement: such a platform is meant to strongly support the needs of different communities to develop and curate their own SP EBV products while enabling the best-possible transnational information products. In the absence of such a central resource, we foresee the potential for scattered and hard-to-replicate SP EBV outputs that cannot be further integrated into the most usable synthesis products. We expect that this approach will operate not via a single actor or access point, but through coordinated efforts to develop shared methods and protocols, most likely using shared, cloud-based workflows. With support from a range of funding, science and technology partners, the core elements of this infrastructure have been built or prototyped in Map of Life (http://mol.org/), which, with further development, we consider well placed to serve a key role in coordinating many components of data assembly and especially modelled outputs for the SD EBV. Finally (step 4), the visualization and programmatic delivery of basic SP EBV information and directly connected indicators should be well served through the infrastructure driving its production. However, we envision that a range of other national or global infrastructures, such as those associated with GEO BON, the CBD, conservation organizations, national agencies and others, will similarly host programmatically accessible EBV information and/or combine it with ancillary data to address community-specific needs or to build additional products.

Evidence base

Dataset information value

With billions of organisms on this planet changing in distribution and abundance at any moment, it is clearly impossible to fill the space–time–species cube at fine spatial, temporal and taxonomic resolution for all life at the planetary scale. Even with the aid of ongoing environmental data collection and models, the operational grain will be finer (and typical uncertainty lower) for species groups with many well-stratified records in space and time (and environment) and high detection levels, such as birds, and coarser for groups with more limited and/or spatially clustered data, such as ferns (Fig. 5a,b). This operational grain in turn is intimately connected to our ability to infer change: the coarser it is, the more constrained is our ability to infer species population trends, particularly for species groups with fine-scale spatiotemporal dynamics (Fig. 5c). Critically, even in the best-known species groups, viable spatiotemporal prediction grains may be achievable for only a proportion of species, usually an ecologically and spatially highly non-random subset103,121,122. Accordingly, the taxonomic or functional representativeness of the predicted space–time–species cube may be high for some well- or easily studied groups, but not for others. Such limited taxonomic or functional coverage (representativeness) has strong repercussions for the relevance and generality of detected change (Fig. 5d). Consequently, the value of a dataset on a given species group is not determined by a single attribute, but is instead driven by its minimum ‘performance’ across a range of attributes (Fig. 6). We advocate for an expanded assessment of existing and planned data collections that accounts for this multitude of components and that values each collection by its ability to inform model-based predictions relative to other data.
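The minimum-performance principle stated here, together with the rule in the Fig. 6 caption (at least medium performance in all criteria, high in as many as possible), can be expressed as a small scoring routine. The criteria names, the three-level ordinal scale and the example datasets below are hypothetical placeholders chosen for illustration; they are not the criteria or scores shown in Fig. 6.

```python
from enum import IntEnum

class Level(IntEnum):
    """Ordinal performance level for one dataset criterion (hypothetical scale)."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3

def dataset_value(scores):
    """Overall value is capped by the weakest criterion (the minimum)."""
    return min(scores.values())

def is_useful(scores):
    """Useful datasets reach at least medium performance in every criterion."""
    return dataset_value(scores) >= Level.MEDIUM

def n_high(scores):
    """Number of criteria scored high; used to rank the useful datasets."""
    return sum(1 for s in scores.values() if s == Level.HIGH)

# Hypothetical criteria and scores, for illustration only.
datasets = {
    "structured_atlas": {"spatial_coverage": Level.MEDIUM, "taxonomic_coverage": Level.HIGH,
                         "temporal_repeat": Level.HIGH, "detection_metadata": Level.HIGH},
    "incidental_bulk": {"spatial_coverage": Level.HIGH, "taxonomic_coverage": Level.HIGH,
                        "temporal_repeat": Level.MEDIUM, "detection_metadata": Level.LOW},
    "regional_survey": {"spatial_coverage": Level.MEDIUM, "taxonomic_coverage": Level.MEDIUM,
                        "temporal_repeat": Level.HIGH, "detection_metadata": Level.HIGH},
}
# Keep only useful datasets, then rank them by how many criteria score high.
ranked = sorted((name for name, s in datasets.items() if is_useful(s)),
                key=lambda name: n_high(datasets[name]), reverse=True)
print(ranked)
```

In this toy example the large incidental dataset scores high on three of four criteria yet drops out entirely because a single criterion is low, which is the behaviour the minimum rule is meant to capture.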

Fig. 5: Data and information trade-offs.

a, Prediction uncertainty is affected not just by record count but by a suite of other issues, and it varies strongly among species groups and regions. b, Species groups with lower prediction uncertainty can support a finer spatiotemporal grain for EBV production. c, The magnitude of uncertainty or prediction grain in turn directly affects the potential for change inference, with the exact effect depending on how readily such change is detectable for a species group at a given space–time grain, which will vary with the spatiotemporal scale of its distribution (for example, over the same spatiotemporal grain, small-bodied and large-bodied species groups will show different sensitivity). d, The generality and relevance of change inferences decrease quickly with any taxonomic or ecological biases in the data, for example, if some functional groups of a taxon are missing. US, United States; UK, United Kingdom; PNG, Papua New Guinea; the countries here are simply used to symbolize current perceived differences in available data, qualitatively informed by ref. 31.
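The link between data density, prediction uncertainty and supportable grain (Fig. 5a,b) can be illustrated with a deliberately simplified calculation. Assuming independent, evenly spread surveys and a binomial detection proportion, and ignoring clustering, imperfect detection and environmental covariates, merging k × k fine cells pools k² times as many surveys, so the standard error shrinks until a target precision is reached. The function names, the aggregation factors and the 0.05 threshold are illustrative assumptions, not values from this framework.

```python
import math

def binomial_se(p, n):
    """Standard error of a detection proportion p estimated from n independent surveys."""
    return math.sqrt(p * (1.0 - p) / n)

def finest_supported_grain(surveys_per_fine_cell, p, max_se, factors=(1, 2, 4, 8, 16)):
    """Smallest aggregation factor k (k x k fine cells merged into one) at which the
    pooled survey count brings the standard error down to max_se or below.
    Returns None if even the coarsest factor is not precise enough."""
    if surveys_per_fine_cell <= 0:
        raise ValueError("need at least one survey per fine cell")
    for k in factors:
        n = surveys_per_fine_cell * k * k  # surveys pooled into one coarse cell
        if binomial_se(p, n) <= max_se:
            return k
    return None

# Hypothetical contrast: a densely surveyed group versus a sparsely surveyed one.
print(finest_supported_grain(100, p=0.3, max_se=0.05))  # 1
print(finest_supported_grain(3, p=0.3, max_se=0.05))    # 8
```

Under these assumptions the sparsely surveyed group must be mapped at an eight-times coarser linear grain to reach the same precision, which in turn limits the finer-scale changes that can be inferred for it (Fig. 5c).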

Fig. 6: Dataset value for informing the SP EBVs varies with a multitude of factors.

Poor performance in a single criterion can limit the overall value of a dataset for global mapping and monitoring in this framework; useful datasets would have at least medium performance in all criteria and would score high in as many as possible. This is illustrated with an initial characterization of some example datasets that could feed into EBV production (reflecting their status in 2018).

Effective data contributions

What, then, are the most valuable data contributions from individuals, organizations and government agencies in the context of these grain and coverage limitations? It follows from the above that the value of additional data sampling and mobilization activities must be seen in the context of remote-sensing-aided models as well as other available data types and sources26,123. For charismatic, readily identified or common species in particular, citizen-science activities may now represent a relatively inexpensive, ongoing form of data collection, albeit a highly geographically biased one. Amateur-deployed visual or acoustic sensors and even environmental DNA collection also increasingly contribute data at relatively low cost124,125. In more organized and standardized forms, such as many volunteer ‘atlas’ efforts or bird counts126, these citizen-science activities have the potential to reduce spatial and taxonomic data biases and, through the capture of inventory information, to enable absence inference. For select species and regions, conservation organizations may have a particular role in organizing and funding such sampling activities. Despite the vast potential this form of data collection holds for some taxa, for many species groups taxon-region expert networks hosted by scientific societies and museums are key to providing guidance and quality control for amateur contributions, to driving forward primary data collection, and to taxonomic assignment and harmonization.

National activities

Countries have a strong self-interest in effective biodiversity monitoring to maintain and sustainably use their own biological resources and to gauge their progress toward international obligations. To this end, many have set up national monitoring activities and are looking to the global observation and science community for facilitation and guidance17,30,33. Although less so than bottom-up amateur efforts, most nationally organized data collection is still marked by heterogeneous methodology that usually fails to address detection probabilities, spatiotemporal biases and limited taxonomic coverage28,29, and scientific data-sharing principles are sometimes not followed127,128. These activities also usually do not yet consider ancillary data and model-based inference in sampling design127,129. Some countries, such as Switzerland (http://atlas.vogelwarte.ch/), are leading the way with country-wide designs aimed at delivering highly effective data, although these designs do not yet fully consider other citizen-science data collections or data from beyond the country’s borders, both of which can improve prediction and inference123. For larger and more biodiverse countries that are less populated and have fewer resources, such campaigns are out of reach, highlighting the importance of national activities that are cost-effective and impactful.

For effective national contributions to essential species population information, we put forth a few key insights. (1) National monitoring efforts should be guided by existing biodiversity and remote sensing data collection and their model-based integration. The most (cost-)effective information gains may be achieved by strategically complementing past or ongoing data collection and taking advantage of statistical inference frameworks. (2) Country-level monitoring should be minimally isolated. Effective data collection relies on international coordination and collaboration, and potentially cross-border support: species distributions span borders, and changes missed in one part of the range compromise inference for all countries that hold stewardship for a species. (3) Data should be published and open access. Only full sharing of both data and metadata, and their integration with other data in one place, unlocks their full value and the potential for identifying data gaps and the most strategic sampling opportunities. Such sharing can be incentivized by infrastructure that supports data discovery and measures data use and value in a global EBV context. (4) Support and guidance for bottom-up efforts may maximize cost-effectiveness. With strong and growing engagement of citizen scientists, some of the most effective gains for national monitoring may arise from supporting existing or nascent taxon-region-specific networks and efforts, and incentivizing them toward more stratified sampling, full data and metadata sharing, and engagement with experts for quality control and potentially greater taxonomic coverage. These may be rewarding areas of investment for governments and conservation organizations.
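Insight (1) suggests that new national sampling should complement what pooled data and models already provide. One naive way to express this, assuming a model has already produced a per-cell prediction uncertainty from all available data (including citizen-science and cross-border records), is to direct a limited survey budget to the most uncertain cells that are not already planned. The function `prioritize_cells` and the cell identifiers are hypothetical; practical designs would also weigh access, cost and environmental stratification.

```python
def prioritize_cells(uncertainty, already_planned=(), budget=3):
    """Pick up to `budget` cells with the highest modelled prediction uncertainty,
    skipping cells already planned for survey. Ties are broken by cell id so the
    result is deterministic."""
    planned = set(already_planned)
    candidates = [cell for cell in uncertainty if cell not in planned]
    candidates.sort(key=lambda cell: (-uncertainty[cell], cell))
    return candidates[:budget]

# Hypothetical per-cell uncertainties from a model fitted to pooled data.
uncertainty = {"A1": 0.10, "A2": 0.45, "B1": 0.30, "B2": 0.45, "C1": 0.05}
print(prioritize_cells(uncertainty, already_planned=("B2",), budget=2))  # ['A2', 'B1']
```

Because the uncertainty surface in this sketch comes from a model of pooled data, cells already well informed by neighbouring countries’ records would be expected to rank lower, in the spirit of insight (2).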

GEO BON, whose core mission is to guide and operationalize biodiversity monitoring, has an important role in the internationally integrated and science-driven approach that effective monitoring systems at national and global scales require. The SP EBV framework enables the harmonization of data from an array of survey designs and technologies to deliver on the broad array of needs for biodiversity information, while avoiding potentially restrictive standards. GEO BON’s use of the SP EBV framework, the associated research network and infrastructure, and its engagement of National Biodiversity Observation Networks have the potential to make national monitoring activities more effective and to increase the value of their contributions to global biodiversity monitoring and the advancement of knowledge.

Conclusions and recommendations

This conceptual and infrastructure framework for the production and use of essential information on species populations is intended to foster more effective and rigorous collection, communication and use of biodiversity data to support research, conservation, management and policy. By focusing attention on how standardized integration and modelling raise the information value of different types of data from heterogeneous sources, the framework enables effective delivery of species population information for multiple management and policy objectives. Undoubtedly, the path toward a globally implemented and operational SD EBV (and even more so an SA EBV) will be long, and uncertainties or coarse spatiotemporal grains may often impede inference. But the aspirational goal of a best-possible completion of the SP EBVs (Fig. 3), together with the formulation of the required elements and steps, would herald a new and global phase of species population information collection, synthesis and use. To attain this vision, we make the following key recommendations:

  • Enhanced data and metadata publication and sharing by countries, agencies, conservation organizations, research networks and individuals. Too many collected data relevant to species occurrence remain locked up or are restricted by licensing, a key cause of existing spatial gaps in biodiversity data. We urge all to contribute to publishing and sharing mechanisms that inform SP EBVs globally, for example, via the central aggregators GBIF and OBIS, via taxon-region-specific efforts with unrestricted data and metadata sharing, or via direct links to, or publication in, global SP EBV-producing infrastructure such as Map of Life.

  • Collection, recognition and sharing of inventory data and detection-relevant metadata. We highlight inventories and associated non-detection information as key for effective species population inference. All too often, these data are reduced to presence records, or other detection-relevant information (for example, on taxonomic scope or observer qualification) is not shared. We advocate for general infrastructure to support the capture of such data.

  • Recognition of the relative, complementary value that any primary biodiversity dataset has in the context of other biodiversity data, environmental data and models. This emerges directly from the SP EBV concept, and we suggest its use for designing monitoring efforts or for incentivizing and supporting data collection.

  • Recognition of the role of agency-based and private remote sensing. Near-global remote sensing, in particular as conducted by the United States’ NASA, the European Space Agency (ESA) and other space agencies that make their data freely available, is a key enabler of species population inference, empowering beneficiaries of biodiversity change information worldwide. This arena of impact and societal value deserves stronger recognition and support by agencies, governments and business.

  • Recognition of the role of science and research networks. The SP EBVs and their downstream uses only become unlocked through the engagement of scientists at the cutting edge of statistical methods development, big data integration and biodiversity informatics, strongly linked to relevant outcomes. By their nature, these activities are outside the scope of the single agencies or conservation organizations that traditionally dominate biodiversity indicator and target discussions. Scientific projects based at academic institutions, research networks and organizations providing robust, policy-relevant science, such as GEO BON and Future Earth, play a key role here by connecting monitoring and research with policy. This includes model development and infrastructure for the production of SP EBVs and the filling of the space–time–species cube (Fig. 3). Given the complex and rapidly evolving informatics and modelling methodology, this task is best suited for development and hosting by academia and associated research networks.

  • Funding support for taxon-region expert and amateur networks, methods development and integration infrastructure. All three components are vital for the effective compilation and use of SP EBVs. Yet, with few exceptions, science funding agencies, conservation organizations and foundations tend to be unwilling or unable to support such basic research and development activities. We encourage recognition of the societal benefits for all that a pooling of resources or dedicated engagement by some would enable.
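The value of inventories lies in the non-detections they imply, which is why the recommendation on detection-relevant metadata matters. The sketch below assumes hypothetical checklist fields (`site`, `date`, `complete`, `reported`) and a known list of species within the checklist’s declared taxonomic scope. It expands a complete inventory into detection/non-detection rows and falls back to presence-only rows when completeness is not recorded; it does not model observer qualification or detection probability.

```python
def detection_records(checklist, scope_species):
    """Expand one checklist into (site, date, species, detected) rows.

    If the checklist is flagged complete for its taxonomic scope, every species in
    scope that was not reported becomes a non-detection (detected = 0). Otherwise
    only presences can be used.
    """
    site, date, reported = checklist["site"], checklist["date"], checklist["reported"]
    if not checklist["complete"]:
        # Incomplete list: the absence of a name carries no information.
        return [(site, date, sp, 1) for sp in sorted(reported)]
    return [(site, date, sp, int(sp in reported)) for sp in sorted(scope_species | reported)]

# Hypothetical taxonomic scope and two checklists differing only in completeness metadata.
scope = {"Sp_A", "Sp_B", "Sp_C"}
complete_list = {"site": "S1", "date": "2018-05-01", "complete": True, "reported": {"Sp_A"}}
partial_list = {"site": "S2", "date": "2018-05-02", "complete": False, "reported": {"Sp_A"}}
print(detection_records(complete_list, scope))
print(detection_records(partial_list, scope))
```

The two checklists contain the same single sighting, yet only the one carrying completeness and scope metadata yields non-detections that a model can use for absence inference.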