Biodiversity data are varied, sometimes messy, and indicative of life’s grandeur. But, increasingly, they also need to be streamlined into easily interpretable indicators that can inform policy decisions. Hundreds of locally collected data types, describing many different properties of thousands of species, do not readily translate into large-scale indicators of, say, habitat loss or ecosystem function. To help make these connections between biodiversity data and conservation policy, several working groups are attempting to define essential biodiversity variables (EBVs) using a framework set up in 2013 by the Group on Earth Observations Biodiversity Observation Network (GEOBON).

EBVs are biological state variables that can be realistically measured or modelled at a global scale across ecosystem types and that can capture change at meaningful spatial and temporal scales. Six EBV classes have been defined, relating to genetic composition, species populations, species traits, community composition, ecosystem structure and ecosystem function, and each has a working group. Two recent Perspective articles in Nature Ecology & Evolution describe the interim outputs of the working groups on species populations and species traits.

To define workable EBVs, the many types of primary data currently available must be identified, and proposals must be generated to unify and standardise these data in the future. For example, species population distribution can be measured either as presence-only data or as presence–absence data. The former can range from citizen science observations through to environmental DNA data, whereas the latter may come in the form of traditional national inventories or expert syntheses informed by species-distribution modelling. Similarly, species trait data can range from field-observed measures of reproduction and morphology to remotely sensed measures of phenology and movement. The aim is not to limit the types of data that are collected, but to ensure sufficient consistency in collection and sampling methodology. Ontologies for data description and the extensive use of metadata are key to comparing and integrating different data types. One immediate advantage, even before the data are applied to EBVs, is that datasets become machine readable, allowing the enormous potential of artificial intelligence to be realised for the analysis of biodiversity data.
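
As a rough illustration of what a shared, machine-readable description might look like, the Python sketch below places presence-only and presence–absence observations in one record type and keeps the data type and source as explicit metadata. It is a minimal sketch only: the field names, source labels and example records are hypothetical and are not taken from the EBV working groups or from any existing biodiversity data standard.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class OccurrenceRecord:
    """One observation in a shared schema (all field names are hypothetical)."""
    species: str
    latitude: float
    longitude: float
    year: int
    detected: bool   # always True for presence-only sources
    data_type: str   # "presence_only" or "presence_absence"
    source: str      # e.g. "citizen_science", "environmental_dna"


def from_presence_only(species, latitude, longitude, year, source):
    # Presence-only sources can report detections but never absences.
    return OccurrenceRecord(species, latitude, longitude, year,
                            True, "presence_only", source)


def from_survey(species, latitude, longitude, year, detected, source):
    # Structured surveys report both detections and non-detections.
    return OccurrenceRecord(species, latitude, longitude, year,
                            bool(detected), "presence_absence", source)


def recorded_absences(records):
    # Absence information exists only where a survey looked and found nothing.
    return [r for r in records
            if r.data_type == "presence_absence" and not r.detected]


def to_json(records):
    # Serialise to JSON so that other tools can read the records and metadata.
    return json.dumps([asdict(r) for r in records], sort_keys=True)


# Made-up example records, for illustration only.
records = [
    from_presence_only("Lutra lutra", 52.1, -1.3, 2017, "citizen_science"),
    from_survey("Lutra lutra", 52.4, -1.1, 2017, False, "national_inventory"),
    from_survey("Lutra lutra", 52.6, -0.9, 2017, True, "national_inventory"),
]
```

Keeping the data type alongside each record means that a later analysis can treat a missing detection in a presence-only source differently from a recorded absence in a structured survey.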

Data also need to be open and accessible. Any attempt to integrate data from thousands of sources falls down as soon as there are barriers to access, because it is not feasible to spend time and effort gaining access to each individual dataset. The FAIR (findable, accessible, interoperable and reusable) data initiative adopted by the Earth science community sets a helpful standard here. The more data are stored in recognised, and preferably structured, repositories with stable identifiers and rich metadata, the more easily they can be integrated into EBVs and thence into indicators and policy.

The next step is to move from primary data, both raw and processed, to EBVs. The working groups have placed considerable emphasis on defining variables that are biologically meaningful and as close as possible to universal. This is why the species traits group, for example, recommend focusing on five trait classes (morphology, phenology, reproduction, physiology and movement), and why the species populations group have divided their EBVs into two fundamental classes (species distribution and species abundance). One key aspect of data integration, as described in particular by Jetz et al. of the species populations working group, is the use of model-based approaches. These allow multiple data types to be integrated into coherent measures with temporal, spatial and taxonomic dimensions, and they iron out the problems of datasets that are imperfect, incomplete or of differing granularity.

Although the EBV project is divided into working groups, there is plenty of overlap and synergy between them. For example, species population data feed into measures of community composition, and trait data are essential components of ecosystem function. As EBVs are developed, it is important that territorial boundaries between the working groups are minimised and allowed to evolve.

There is still a long way to go in all of the working groups, but what will fully developed EBVs allow us to achieve? One example to look to, as the biodiversity community often does, is the climate community. Essential climate variables (ECVs) document change in key parameters of the climate system such as precipitation, temperature and atmospheric composition, and they are regularly used to support the workings of the Intergovernmental Panel on Climate Change (IPCC) and national and international policy. For the biodiversity community, the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES) is the obvious parallel, but EBVs will also be essential in measuring progress towards the Aichi Targets and their replacements after the 2020 Convention on Biological Diversity (CBD) meeting, the UN Sustainable Development Goals (SDGs) and specific local or regional conservation goals.

We should probably not be surprised that the EBVs lag behind the ECVs. Standardisation of measurement types and methodologies is more straightforward for the perhaps hundreds of physical and chemical variables used in climate science than for the near-limitless number of species and biological variables that biologists deal with. But progress in the hugely ambitious EBV project is now being made, and the outputs from the working groups are to be welcomed. As well as the tangible and pragmatic aim of facilitating the translation of biodiversity data into policy, the EBV project represents the aspiration to give biodiversity science the level of universality that is sometimes envied in the physical sciences.