Inadequate information on the geographical distribution of biodiversity hampers decision-making for conservation. Major efforts are underway to fill knowledge gaps, but there are increasing concerns that publishing the locations of species is dangerous, particularly for species at risk of exploitation. While we recognize that well-informed control of location data for highly sensitive taxa is necessary to avoid risks, such as poaching or habitat disturbance by recreational visitors, we argue that ignoring the benefits of sharing biodiversity data could unnecessarily obstruct conservation efforts for species and locations with low risks of exploitation. We provide a decision tree protocol for scientists that systematically considers both the risks of exploitation and potential benefits of increased conservation activities. Our protocol helps scientists assess the impacts of publishing biodiversity data and aims to enhance conservation opportunities, promote community engagement and reduce duplication of survey efforts.
Achieving effective conservation relies on accurate knowledge of where species occur to assist with their management1,2,3. This is particularly true for rare and endangered species that are at risk of extinction. Despite this, one in six International Union for Conservation of Nature (IUCN)-listed species are considered data deficient, and conservation practitioners routinely face a paucity of primary data on the temporal and spatial distribution of biodiversity4,5,6. Resolving this issue is urgent: without adequate spatially explicit biodiversity data, good management and policy decisions that enable the protection of species and ecosystems may be unachievable7,8,9.
Primary biodiversity data are evidence that associates a species or taxon with a geographic location within a specified time interval. This may include one or more types of evidence: a sighting; a DNA sample; a verified photographic image; or traces such as scats, tracks, nests or burrows that can be attributed to a given taxon with confidence. Primary data may also provide biologically useful information such as age, sex, breeding status and population abundance. Today there are not only unprecedented online science data services for researchers, conservationists and the public (for example, wildlife atlases and scientific data repositories such as http://aekos.org.au)10, but also an increased willingness to share primary biodiversity data (for example, via citizen science programmes such as eBird)11. Furthermore, scientific journals and funding agencies increasingly request transparent archiving of research data12,13,14.
Sharing species occurrence information publicly or privately presents a challenge for scientists because it requires balancing potentially difficult and uncertain trade-offs. For example, shortly after their discovery was published, poaching for the pet trade contributed to the local extinction of Chinese cave geckos Goniurosaurus luii in Vietnam15, prompting calls not to publish primary biodiversity data16. In contrast, primary occurrence data shared by researchers in publicly available databases and within the scientific literature were critical to recent re-assessments of extinction risk for endemic birds in Bolivia and Australia17,18, which allowed for accurate assessments of extinction status of up to two-thirds of the examined species that otherwise would have been uncertain. To ensure effective conservation informed by the best available knowledge of species distributions and abundances, we must understand the benefits of sharing data and the costs of not sharing data, rather than only the risks as has been the recent focus. Here, we propose a risk management decision protocol that balances potential negative outcomes for species against the conservation benefits of publishing primary occurrence data. By following our decision tree, scientists collecting biodiversity data will be able to ensure that they do not overlook potential conservation opportunities for study species, and that conservation mistakes do not occur through inappropriate release or restriction of data.
How biodiversity data are shared
Data publication is often carefully managed by data authors and custodians to maintain confidentiality and meet jurisdictional laws and national regulations (Supplementary Table 1). Ways of managing the release of data classified as ‘sensitive’ range from publishing precise locations but changing species identifiers to a classification of ‘restricted’ or to a higher taxonomic resolution such as genus or family (if spatial locations are important to share for conservation purposes), to keeping species names accurate but changing locations to mask true spatial coordinates (for example, by buffering or masking the location), or restricting species location information completely by withholding it from public access (see Supplementary Table 1).
The most comprehensive guide on assessing sensitivities around species and required generalization rules for publishing species locations is provided by the Global Biodiversity Information Facility (GBIF)19. GBIF’s protocol is first to identify which species are at risk from harm by human activity, and second to assess the impact of this activity on the taxon. These criteria are used to determine whether a species is flagged as sensitive and are then followed by further rules determining the degree of sensitivity. A subsequent rule determines whether release of information will increase the likelihood of harmful impacts on the species. The assessment for whether data should be released considers what level of generalization or ‘denaturing’ might be required. These range from no restriction for species classed as ‘low sensitivity’, to increasing restrictions through data generalization for ‘low to medium sensitivity’ (0.001°), ‘medium to high sensitivity’ (0.01°) and ‘highly sensitive taxa’ (0.1°). All location data are withheld if a species is identified as being of high biological significance and under high threat19. However, no consideration of the benefits of publishing data is made.
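The generalization levels described above can be illustrated with a short sketch. It assumes that generalization is done by rounding coordinates to the stated grid size, which is only one possible way of denaturing a location; the class keys, the function name and the `withhold` flag are hypothetical names introduced for illustration and are not taken from the GBIF guide.

```python
from typing import Optional, Tuple

# Decimal places of a degree retained for each sensitivity class.
# Grid sizes follow the text above; the dictionary keys are hypothetical labels.
DECIMALS_BY_CLASS = {
    "low": None,         # no restriction: coordinates released as recorded
    "low_medium": 3,     # generalized to 0.001 degree
    "medium_high": 2,    # generalized to 0.01 degree
    "high": 1,           # generalized to 0.1 degree
}


def generalize(
    lat: float, lon: float, sensitivity: str, withhold: bool = False
) -> Optional[Tuple[float, float]]:
    """Return coordinates generalized for release, or None if withheld.

    Assumption: generalization is implemented as simple rounding.
    `withhold` stands for the case of high biological significance
    combined with high threat, where all location data are withheld.
    """
    if withhold:
        return None
    if sensitivity not in DECIMALS_BY_CLASS:
        raise ValueError(f"unknown sensitivity class: {sensitivity!r}")
    places = DECIMALS_BY_CLASS[sensitivity]
    if places is None:
        return (lat, lon)
    return (round(lat, places), round(lon, places))
```

For a record at (-23.69812, 133.87345), the 'highly sensitive' class would release (-23.7, 133.9), whereas the 'low to medium' class would release (-23.698, 133.873). Note that the sketch, like the protocol it illustrates, takes no account of the benefits of publishing.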
There are methods of publishing information on where species occur that do not directly release raw species locality data. Many non-governmental organization (NGO) expeditions assess and publish data on the biological value of areas to highlight the need for conservation action; for example, Conservation International’s Rapid Assessment Program shares expedition data online to promote awareness of regions with high biodiversity value and high threat20. Alternatively, species habitat suitability maps can now be published at high resolutions (down to 10 m grid-cell size). Such maps, showing locations that have a high probability of containing the species, often include (1) locations where the species occurs and this is known; (2) locations where the species occurs but this is not known; and (3) locations where the species does not occur but can colonize or be translocated if habitat quality is maintained. It is not possible to distinguish between the last two categories a priori so they are typically represented as a combined mapped area (https://mol.org/species/map/). As habitat suitability maps are derived from actual species records they are only meaningful and useful if they are produced using precise rather than denatured locations. Hence it is essential that the experts generating these maps have access to full details of the sightings.
Benefits of publishing biodiversity data
Here we define data publishing as the release of primary biodiversity data (defined earlier), or products based on these that link a taxon to a location at a given time, to public databases for use by others. In addition to direct conservation benefits, publishing biodiversity data has multiple benefits for researchers and society including research verification, public engagement, stimulation of new/collaborative research and informing non-researchers about key ecological or conservation issues21,22,23,24,25 (Table 1).
For species affected primarily by threats such as climate change and habitat loss, if greater availability of biodiversity data enabled more efficient and cost-effective management decisions, the benefits of revealing population locations may outweigh the overall risk of increasing human exploitation of locations26. For instance, habitat loss due to forestry and farming is the most frequent threat to global terrestrial biodiversity27. Rare species with poorly known distributions are especially likely to have declined from habitat loss, but new populations are often found in unexpected parts of their former ranges28,29. Any known location data are crucial to protect the remaining habitats of such species through activities such as building accurate species habitat suitability models30, which can be incorporated into conservation planning and management. Accurate species distribution models built on fine-resolution location data could result in more effective conservation measures because they can lead to investment in conservation at locations where species occur but have not been sighted and locations where species do not occur but can be colonized or translocated. Sharing data is particularly helpful for data-deficient species that often slip through the net of regulatory mechanisms due to poor information on where they are and what threatens them31. Ignoring these species in conservation plans risks failing to preserve important locations as well as diversity in ecological traits and evolutionary features of biodiversity32.
Withholding data and records can lead to perverse outcomes for species requiring management to ensure their persistence. For example, new locations for threatened species may remain undiscovered or be destroyed unknowingly in land development, or a false impression of range restriction or small population size may arise. If the objective of government conservation agencies, NGO ecologists, scientists and land managers is to minimize the risk of species extinction (Table 1), then sharing data could help indirectly, by improving information on a species’ population size or distribution and enabling a more accurate assessment of threat status, or directly, through enabling increased conservation action in known locations. Additionally, agencies that need occurrence data to manage or assess populations may waste limited resources funding redundant data collection.
Risks of publishing biodiversity data
Despite recent data sharing initiatives and regulations (Supplementary Table 1), there is evidence that different types of data collectors have varying perceptions of how sharing data could undermine their own objectives (Table 1). Moreover, there is no doubt that poaching of species highly valued for traditional medicine and recreational hunting has caused species population declines and even extinctions (for example, the Javan rhino Rhinoceros sondaicus33,34,35; Table 2). In addition to documented population declines, human access to habitats has caused individual mortality, changes in wildlife behaviour, reduced reproductive rates and habitat disturbance or loss that affect species’ ability to persist in their environment36,37. Individual mortality has a greater impact on rare than common species and can cause feedbacks that eventually lead to population declines. Much of the evidence for data publication leading to species declines is anecdotal, with few documented instances of a direct link between the publication of location data and a subsequent population decline (Table 2).
Many perceived risks of publishing biodiversity data stem from cultural, social or economic objectives rather than conservation objectives (Table 1). For example, many fishers do not share fishing location data because of concerns their data may be used against them to prosecute for violations or lead to fishing restrictions. Many resource managers view their knowledge as private intellectual property and feel that sharing it with others may put them at an economic and social disadvantage26. To achieve a goal of maximizing research output38, a research scientist might be concerned about the extra time and cost required to share reproducible data, which could instead be used to publish more papers or write more grants.
A balanced decision tree for sharing biodiversity data
From risks to opportunities to conserve species
A sole focus on the risk to a species fails to consider situations in which the benefits of sharing data outweigh the benefits of not allowing access to biodiversity data. The context of any decision about data publication should not miss opportunities to conserve species, and needs to consider public and private costs such as those from redundant surveying effort or the loss of a species. As such, we propose that scientists follow a decision tree that considers the benefits of sharing biodiversity data (Fig. 1 and Box 1), which include highlighting species and places of conservation concern (Table 1). In our decision tree, we assess these kinds of benefit against possible risks of sharing data, such as increased pressure on populations (Table 2). Importantly, our protocol considers all relevant threats to the species, and whether conservation mechanisms are either already in place or could be put in place to mitigate or avoid these. A balanced and transparent evaluation of how, not whether, to share biodiversity data requires owners to clarify the risks and likely impacts to a species from data publication, and at the same time help place this information in a decision-making framework that considers actions to reduce risks of harm to species.
Risk management for species at threat of exploitation
Following a risk assessment approach39 to sharing biodiversity data, we agree with other discussions on data sharing16 that it is first necessary to identify the risk of published locality data enabling (or increasing) access to a species based on how valuable and accessible it is to collectors, poachers, recreational visitors, or other people with interest in the species (Supplementary Fig. 1). This will enable those considering publishing spatial biodiversity data to assess the likely harm to the species or population if visitors disturb or exploit it at published localities.
Our protocol accounts for various kinds of risk to species from data publication that have been identified in existing ethical data publication guidelines (Supplementary Table 1). The main risks are increased exploitation for trade or resource use (ex situ threats), or disturbance/destruction of habitat due to human access to localities (in situ impacts). High ex situ value species are those exploited by the wildlife trade or for resources such as food or timber (see the Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES) Appendix 1 or 2 species; https://cites.org/eng). Well-known examples include the African white and black rhinoceros, all elephant species and many fish. Our decision tree also accounts for the fact that risks to some species might be mitigated by conservation measures, such as restricting access to important sites through regulations or physical barriers (for example, fencing off reserves); such actions might enable public sharing of data. For species where it is not feasible to restrict access, data publication protocols that mask certain characteristics of the data might be used to conceal the species’ identity or location from the public (Supplementary Table 1), although this might restrict the ability of conservation planners and managers to use the data. We suggest in our decision tree that building a high-resolution habitat model with the data would be a sensible way to publish the data while ensuring the exact locations of individuals were masked (Fig. 1). The full data could be stored securely, with access granted on request after an assessment of motivations and protected by a data sharing agreement.
For example, a government conservation agency could collate all threatened species occurrence data from researchers licensed to conduct studies on a species, and build a high-resolution map that informs and engages the public about the habitat requirements and distribution of the species. The site-specific occurrence data would not be published but would remain available to researchers under licence. For some species, the risk is so high that both measures (masking locations and restricting access) should be enacted while ensuring that monitoring is undertaken to track changes in the species and their threats; this equates to the strictest protocol in our decision tree (Fig. 1). One example is fisheries spawning locations, which are almost impossible to restrict access to when in international waters, but have high value and a history of over-harvesting (example 2 in Table 3).
For many high-risk species, releasing public data on occurrence might increase the risk of species decline if locations were previously unknown. If no conservation measures are in place to avoid these declines, our tree suggests that data be desensitized to mask the identities of species but not locations (Fig. 1). This would mean the public can still learn that a precise location has conservation value but would not have specific information on which threatened species occur there, reducing the incentive to visit the location. In some cases, however, sharing location data is unlikely to increase the risk of decline, as population information is already in the public domain, or there is poor access to populations. For these populations, we recommend publishing data without restrictions (Fig. 1), as additional information could benefit species by improving the ability to track changes in a population or discover new populations in other locations based on improved knowledge about habitat preferences.
Some species have higher value in situ than ex situ, with lower or non-existent market value. Species with high in situ value often have high ecotourism value (for example, whale sharks, rare birds), and may be directly impacted by human disturbance and pathogen exposure associated with human movement into and out of their habitat (for example, disruption of bird behaviour through electronic bird song playback). Without appropriate conservation measures such as infrastructure or sensitive guidelines for researchers, threatened or rare species with high in situ value are vulnerable to perturbation by human visitors, and we recommend restricting data in a way that prevents disturbance, for instance through publishing a habitat map instead of raw locations (Fig. 1).
Many species may not be directly impacted by exploitation for trade or tourism but are still at risk of indirect in situ impacts if shared data increase visitation to their localities. For example, the surrounding environment could be negatively affected by vegetation disturbance, soil compaction, or introduction of invasive species, or the species might be impacted through associations with other species. These types of disturbance seem minor compared with direct exploitation but can result in a decline in the condition of the surrounding habitat, which may harm the health and alter the behaviour of the species. For example, a fungal species may not be vulnerable to threats of increased disturbance or wildlife trade, but the tree species with which it shares a symbiotic relationship could suffer high exploitation rates resulting from harvesting for its timber. In this case, the vulnerability of the tree population in addition to the fungus should be considered when assessing how to publish data on the occurrence of the fungus (Fig. 1).
Maximizing data availability to help conservation
Data sharing among researchers, government agencies, NGOs and citizen science groups will improve our knowledge of population trends and ecology, and our ability to protect species from anthropogenic impacts. Many species such as threatened orchids are vulnerable to in situ human recreational activities through habitat degradation, irrespective of whether collection activities are restricted40,41. If conservation measures are not in place, increased visitation resulting from the release of new locations for highly valued recreational species could cause local population declines, and we recommend restricting data to mask either species identities or localities (Fig. 1 and example 4 of Malaysian Rafflesia in Table 3), depending on whether data have value for mitigating threats. In many cases, however, conservation measures have been implemented to avoid or mitigate declines (for example, the creation of an exclusion zone to eliminate the chance of visitors disturbing the site; see example 1 of the Australian night parrot Pezoporus occidentalis in Table 3). In these cases we recommend making occurrence data public to improve conservation, ecological learning and community engagement (Fig. 1).
When a species’ or population’s primary threats are neither in situ nor ex situ direct exploitation or disturbance (see example 3 of the Vangunu giant rat Uromys vika in Table 3), we recommend making data public, due to either little known risk of increased visitation to the site, or little chance that visitation would affect population viability (Fig. 1). Even when a species has in situ value, the risks of increased visitation might be outweighed by the benefits of publishing data. One example is the Critically Endangered West Indian Ocean coelacanth Latimeria chalumnae, an ancient fish thought to have been extinct for 60 million years. In 2000, divers observed coelacanths off South Africa’s coast, then tagged several individuals42. These rare, slow-growing fish are potentially valuable to collectors, but their deep cave habitats are difficult to access and fisheries bycatch poses a much greater threat to survival than poaching43. The location data have been made publicly available, triggering widespread interest among scientists, managers and the public. This publicity helped create new marine protected areas, fisheries management measures and a multinational research programme that has generated more than US$6 million in direct government funding, benefitting many additional species in southern Africa (A. Paterson, South African Institute for Aquatic Biodiversity, personal communication).
In some cases, it may be impossible to decide whether a species has value in the wildlife trade or is vulnerable to visitation disturbance. Until protocols can be updated with new data, we recommend a precautionary approach that restricts data publication, such as that taken in the case of the newly rediscovered, endangered night parrot in Australia (Table 3).
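The recommendations set out in the preceding paragraphs can be summarized in a short sketch. It is a simplified reading of the prose only, not a transcription of Fig. 1: the function name, argument names, outcome labels and, in particular, the order in which the questions are asked are assumptions made for illustration.

```python
from enum import Enum
from typing import Optional


class Release(Enum):
    PUBLISH_UNRESTRICTED = "publish precise data without restrictions"
    MASK_IDENTITY = "publish precise locations but mask species identity"
    HABITAT_MAP = "publish a habitat map; share raw locations only under agreement"
    STRICTEST = "mask locations, restrict access and monitor the species"
    PRECAUTIONARY = "restrict publication until value and vulnerability are known"


def recommend(
    value: Optional[str],
    publication_increases_risk: bool = True,
    extreme_risk: bool = False,
    measures_in_place: bool = False,
    locations_needed_for_mitigation: bool = False,
) -> Release:
    """Suggest how to publish occurrence data for a species or population.

    value: "ex_situ" (trade or resource value), "in_situ" (visitation
    value), "none", or None when the value cannot be judged.
    All argument names are hypothetical; the ordering of the checks is
    an assumption and may differ from the published decision tree.
    """
    if value is None:
        # Value or vulnerability cannot be assessed: precautionary approach.
        return Release.PRECAUTIONARY
    if value not in ("ex_situ", "in_situ", "none"):
        raise ValueError(f"unknown value category: {value!r}")
    if value == "none":
        # Primary threats are not direct exploitation or disturbance.
        return Release.PUBLISH_UNRESTRICTED
    if not publication_increases_risk:
        # Locations already public, or populations hard to access.
        return Release.PUBLISH_UNRESTRICTED
    if extreme_risk:
        return Release.STRICTEST
    if measures_in_place:
        # For example, regulated access or an exclusion zone.
        return Release.PUBLISH_UNRESTRICTED
    if locations_needed_for_mitigation:
        return Release.MASK_IDENTITY
    return Release.HABITAT_MAP
```

Under these assumptions, `recommend("in_situ", publication_increases_risk=False)` returns `Release.PUBLISH_UNRESTRICTED`, which corresponds to the reasoning given above for populations that are difficult to access, and `recommend(None)` returns `Release.PRECAUTIONARY`.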
Flexibility to adapt to different contexts and changing information
A major challenge of data publication is the evolving nature of restricted data. Lists of ‘sensitive’ species are useful for some data publication protocols (see Supplementary Table 1), but these lists need regular updating to account for changes in conservation status, knowledge and threats, and need to be adopted on a national or global scale. The IUCN Red List is partially revised each year, but local and regional information on threats to species is often poorly mapped and ad hoc44. Genuine status changes may be rapid and can apply to previously unrestricted species of Least Concern. For example, five of the six most prominent and economically valuable formerly common ash tree species in North America entered the IUCN Red List in 2017 as Critically Endangered, due to huge mortality from an invasive insect, driven by warming climate45. Current data restrictions could also be lifted if new conservation actions are implemented, such as the habitat protections for night parrots (see above). Decisions to share data should therefore be updated iteratively and quickly.
As the problem of data sharing is complex, our proposed decision tree is not a one-size-fits-all solution, and we hope that additional inputs by scientists and other stakeholders will enhance its structure and application to diverse decision contexts. Designation of species-specific data sharing rules will need to be adapted to existing pressures found on national or subnational scales. Users of the decision protocol should also ensure that criteria used to assess existing conservation and policy mechanisms to protect species can detect situations where policy mechanisms or legislation exist but are not implemented. We developed our decision tree based on the objective of maximizing persistence of a species, but the protocol could be adapted to account for additional objectives, such as maximizing public engagement in conservation, or conserving whole ecosystems.
Ensuring data re-use and application to conservation decisions
An essential step to promote data sharing and enhance data re-use is to ensure that users know which data exist and are available. Metadata represent the set of instructions or documentation that describe the content, context, quality, structure and reusability of a data set. In addition to publishing biodiversity data, making public the background metadata is critical, and could be accompanied by a sample of the database to enable potential users to assess if those data are fit for purpose46. We present our protocol anticipating that repositories holding biodiversity data will have cybersecurity data administrators managing the security of holdings. Data policies should state repository security so that data submitters can decide whether the repository is trustworthy. As species locality data are found in multiple repositories, we recommend that the appropriate mode of sharing biodiversity data should be a species or population attribute rather than an attribute of a given set of data points specified by data authors. This places greater responsibility on researchers to determine how to share data and the decision tree we have proposed should help this.
Although the information needed to walk through our decision tree could sometimes be time consuming and difficult for individual researchers to obtain, all of it will be available to those evaluating species for CITES or for IUCN Red Listing. Hence it would make sense for the application of our decision tree to be integrated into these evaluation processes, as well as into national and subnational assessments of species threat status, and to be updated regularly.
Combating illegal species exploitation
Human exploitation of species for trade, resources, or nature-based recreation continues even in locations with few or no scientific studies. Increased use of social media means that the opportunity to manage sensitive information is declining even if we want to restrict it47. The wide range and varied impacts of threats to species mean that researchers and practitioners have an imperative to understand not only where species occur, but also the spread and intensity of both local and off-site threats to species. Despite government agreements such as CITES, illegal resource take (for example, unreported fishing) and wildlife trade continue, with black market prices ranging from US$2 for a sea turtle in Mexico48, to US$31,000 for an Australian black cockatoo49, or US$400,000 for a gorilla50. It is important to articulate whether these kinds of threat, driven by ex situ markets, are likely to increase when new localities or ecological information on a population are published. In this way, data can be responsibly and appropriately restricted if threats to a species would increase after publishing new localities, or shared without restriction if new data would not affect species persistence (see Fig. 1).
Sharing of species information is without doubt critical in building biodiversity knowledge and managing the global extinction crisis. So far, almost all data publication decisions made by governments, societies or individuals have focused on the costs of sharing; benefits are never explicitly quantified, making it impossible to extrapolate data restriction decisions to other species, locations or contexts. Our decision protocol for publishing spatial biodiversity data aims to overcome this inefficiency and enables scientists to better decide how (and when) to publish data responsibly in repositories. The challenge is to share data in a way that avoids perverse outcomes for biodiversity when it is used. In many cases, sharing data will have greater conservation (and educational) benefits than restricting it from use by those wishing to use it to increase community engagement or to promote conservation actions. Above all else, being explicit about what those benefits might be, and weighing them against the likely risks of making data public, will ensure that species are not put in greater danger from new data being released into the public domain.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
A.I.T.T. was supported by an Australian Research Council Discovery Early Career Researcher Award DE170100599. E.B., G.E., N.P.L. and L.R. were supported by the Australian Government National Environmental Science Programme’s Threatened Species Recovery Hub. N.P.L. was partially funded by Bush Heritage Australia. N.B. was supported by an Australian Research Council DECRA DE150101552. TERN (A.K.S.) is supported by the Australian National Collaborative Research Infrastructure Strategy. R. Alcorn (eBird), T. Laity (Australian Government Department of the Environment and Energy), S. Murphy and A. Kutt (Bush Heritage Australia) provided feedback on early drafts. J. Miller and R. Fuller contributed to early discussions.