Abstract
Understanding the spatiotemporal distribution of species is fundamental for ecology, evolution and conservation. However, this and other aspects of biodiversity knowledge suffer from shortcomings and biases. Quantifying and mapping biodiversity knowledge shortfalls is therefore crucial to ascertain the current quality and completeness of biodiversity data, prioritize sites for (re)sampling, or plan conservation interventions. Here, we compile a comprehensive dataset of Orthoptera occurrences, and use it to create a global ‘ignorance map’ based on taxonomic, survey and temporal completeness, and on survey and temporal evenness of the inventories. We hypothesize that knowledge of Orthopteran biodiversity is relatively poorer in tropical regions compared to temperate regions, and in the south compared to the northern hemisphere. Due to biocultural factors, we expect regions in the tropics and the Global South to have lower levels of completeness and evenness in time and space for all the studied aspects of biodiversity information. Our findings show gaps in the knowledge of orthopteran distributions, which are characterized by low survey and temporal evenness in tropical regions, but also in many temperate regions (e.g., most of the countries in temperate Asia). The combination of multiple dimensions of biodiversity knowledge (taxonomic, spatial and temporal) reveals that biogeographic interpretations based on only one component can lead to an illusion of completeness. We believe that the novel framework used in our study can guide future research towards building more accessible maps of biogeographical ignorance for entire groups.
Introduction
As the world faces a global extinction crisis, good-quality data on the distribution and attributes of biodiversity is increasingly needed. In this context, quantifying how well we know biodiversity becomes imperative, for it allows scientists and policy makers to identify where biodiversity information is most accurate and complete1,2,3,4, thereby providing a baseline for defining priorities for new biodiversity research and conservation5,6. In the past two decades or so, an enormous volume of digitally-accessible primary biodiversity data opened new opportunities to quantify coverage and completeness of biodiversity information7. Freely-available biodiversity information networks compile huge amounts of primary biodiversity records from natural history collections, research articles and technical reports, among other sources, in the form of online database platforms8,9. However, these digital biodiversity data are often incomplete and both spatially and temporally biased10,11,12.
Addressing these limitations is of paramount importance to draw robust descriptions of biodiversity patterns from the available data, and ultimately ensure well-informed conservation decision-making. This kind of challenge can be addressed through the creation of ‘ignorance maps’ that characterize the distribution of areas where knowledge is limited13. Applying ignorance maps to biodiversity11,14 involves mapping out the various components of data limitations at broad temporal and spatial scales6,15,16. Here, three key data attributes allow us to map the accuracy, coverage and completeness of biodiversity data: the species name, the location of the records, and the date they were recorded.
First, the most fundamental component of digitally-accessible biodiversity data is related to taxonomy. Records may have species names inaccurately (i.e., incorrect identifications, use of synonyms, etc.) or incompletely assigned (i.e., identified to genus or family level)3,12,17. The challenge of assessing taxonomic accuracy is exacerbated by the absence or lack of standardization in the information about who identified species’ records12, which can lead to discarding much valuable data from online and citizen science databases. Moreover, certain records refer to species that are yet to be described3,18. Tropical regions bear the brunt of this challenge due to their high taxonomic uncertainty and megadiversity19,20. The identification of sites requiring further taxonomic effort represents a potential strategy to improve the coverage and completeness of taxonomic information. However, digitally-accessible biodiversity data are also biased towards certain taxa, which are either better sampled21,22 or more abundant in the communities23, or have undergone more extensive digitization efforts12. These taxonomic biases imply that data coverage assessments should be conducted specifically for each group that is sampled or studied altogether.
Second, survey coverage and completeness can range from inadequate to overly intensive, generating gaps and biases in perceived species distributions, and in the diversity of species documented for local and regional inventories across space and time24,25. Here, we use species accumulation curves to assess survey coverage, assuming that inventory completeness can be measured as the rate at which new species are added to the inventory as database records increase in each spatial unit16. Notably, well-sampled regions often align with larger and more connected natural areas with higher accessibility26,27,28. Thus, quantifying survey effort across geographic space through measures of survey completeness24,29 allows highlighting well- and poorly-sampled sites at large spatial scales8,30,31.
Third, the temporal pattern of biodiversity surveys is a critical aspect of biodiversity knowledge12,32,33,34. The quality of biodiversity knowledge decays over time due to the dynamism of natural systems, taxonomic rearrangements, and loss of information33, a decay that is exacerbated by human-driven disturbances. Further, certain sites may exhibit skewed records for specific years11. While some regions, such as Europe, North America, and Australia, boast multi-year sampling data for some taxa thanks to their long-term research tradition, others, such as Africa, hold limited temporal completeness and uneven temporal coverage9,12,26. Such disparities highlight the potential importance of considering the number of sampled years (temporal richness) when assessing data quality. However, although some studies describing such quality have measured temporal coverage11,12, or temporal decay of information16,32, many others disregard the temporal component35.
Fourth, the study of survey evenness across different regions and temporal periods can help identify sites and regions holding data appropriate for biodiversity and macroecological studies, or with enough temporal resolution so as to assess temporal shifts in their communities. Here, we use the Hill Numbers Diversity framework36 to assess both survey evenness and temporal evenness in each spatial unit, based on the relative numbers of records per species, and of years per species, respectively. The assumption behind these metrics is that sites with low evenness (i.e., an unbalanced representation of species or years) may generate biased estimates in macroecological or evolutionary models. Those areas would require further surveys or, at least, be used with caution in investigations, as they likely lack data of enough quality for assessing diversity–environment relationships, describing phylogeographic patterns, or estimating past global change impacts on biodiversity.
Orthoptera, which includes grasshoppers, crickets, katydids and relatives, has a rich taxonomic legacy, with approximately 29,500 valid species, making it the sixth most species-rich insect order37. Orthopterans play several key roles in ecosystems38,39,40,41, and hold high cultural significance in various countries42,43,44,45, as they include both food species and agricultural pests45. Indeed, orthopterans are relatively well-known taxonomically compared to other invertebrate groups, mainly owing to their agricultural importance46. In addition, their taxonomy is regularly updated at the Orthoptera Species File Online45,46,47. Further, information on their distribution can be obtained from multiple digitally-accessible data repositories, which make thousands of primary records available48,49,50 (see https://orthoptera.speciesfile.org/). However, research focus has been biased towards grasshopper pest species and, in some cases, pests of specific grasses or cultivars46,51,52, so knowledge on the distribution of most species and higher taxa of this group may still be limited42, particularly in megadiverse regions.
Here we aim to construct a global ignorance map for Orthoptera by combining analyses of different biodiversity shortfalls. Based on the known geographical biases in biodiversity inventories3,12, we hypothesize that the digitally accessible data for Orthoptera are less complete for tropical compared to temperate regions, and for the southern compared to the northern hemisphere. More specifically, we predict that northern temperate regions will have higher levels of taxonomic, survey and temporal completeness, while the tropics and the Global South will show lower evenness in both time and space. To test these hypotheses, we developed metrics of taxonomic, spatial, and temporal data quality. Finally, we combined all metrics into a single global ignorance map to provide a unique biogeographic panorama of global biodiversity knowledge of orthopterans.
Results
We compiled data from 23 digital repositories storing primary information on Orthoptera (Arthropoda, Insecta), totaling 4,911,359 occurrence records (Fig. 1a; Table 1). The data filtering process led to the exclusion of 2,513,619 records, resulting in a final dataset of 2,397,740 species occurrences (48.82% of the initial number of records), pertaining to 16,179 valid species (Supplementary Fig. 1). In addition, we were able to recover 408,377 unique records above the species level (i.e., genus, tribe, family, or order) that included information on the geographical coordinates and year of collection (Supplementary Fig. 2). These records were subsequently used to calculate taxonomic completeness (TaxC). The calculation of data quality indices based on this information revealed significant data gaps and biases in many regions, which were never before measured for Orthoptera (Fig. 2a–e).
Strikingly, it was not possible to calculate data quality indices for most of the extent of tropical regions, due to limited data availability in most of South America, Africa, Asia, and northern Oceania. Where data are accessible in these regions, taxonomic completeness indicates that more than 50% of the records still lack taxonomic refinement to the species level. The highest TaxC values were observed in the Northern Hemisphere. In particular, central North America, most European countries, and small scattered areas in eastern Asia, South Korea and Japan exhibit significant proportions of records taxonomically refined at the species level. In the Southern Hemisphere, TaxC values are generally higher than in the tropics (i.e., TaxC ≥ 0.5), although they still fall short of the levels observed in the Northern Hemisphere, with the exception of some taxonomically consistent inventories scattered throughout South America, Africa, and Oceania (Fig. 2a).
The tropical regions also show low survey completeness (SC), indicating a general need for additional surveys except perhaps in the north of Australia, which holds well-sampled inventories. In general, the Southern Hemisphere beyond the tropics is massively under-sampled, except for some relatively well-sampled sites in South Australia and New Zealand (Fig. 2b). Although SC was high for most regions of the Northern Hemisphere, parts of North America, regions bordering Europe, Asia, and Africa, as well as Japan, are still insufficiently sampled. Temporal completeness (TC) is low across the entire globe. Only a few patches scattered throughout North America and some areas in central Europe show temporal completeness values higher than 0.25 (Fig. 2c). In contrast, we did not find a clear spatial pattern of survey and temporal evenness (SE and TE, respectively) at a global scale. While some grid cells exhibit sampling biases towards few species or few time periods, others located nearby present long-term series of surveys with similar abundances among species, independently of the region (Fig. 2d, e).
The overall ignorance map shows a general lack of information across most of the tropics (Fig. 3), due to the low taxonomic, sampling and temporal completeness. However, a few small isolated patches in southern Africa and northern Oceania present relatively low ignorance (0.5 ≥ IM ≥ 0.25). Ignorance values in the Northern Hemisphere are low in several regions of North America and Central Europe (IM ≤ 0.25). However, the coastal zones of the United States of America, Southern and Eastern Europe, and most regions of India and China present high ignorance values (Fig. 3).
Discussion
Our analyses provide evidence that digitally-accessible knowledge on the global distribution of Orthopteran biodiversity is limited in tropical regions in particular, and in the Global South in general. Conversely, the temperate regions of the Global North present areas of relatively high quality and completeness of biodiversity knowledge, especially in North America and Central Europe, as well as in some extra-tropical areas of the southern hemisphere. Although these results confirm our preliminary hypothesis about the distribution of ignorance, there are also many deficient inventories in these data-rich regions, and knowledge on Orthoptera is still very limited in many temperate areas, particularly in Asia. Importantly, the comparison of our overall ignorance map with those of the indices describing different dimensions of knowledge suggests that interpretations based solely on one of these components could create an illusion of good knowledge coverage for certain regions.
The lack of taxonomic completeness in the tropics contributes to the generation of a taxonomic latitudinal bias (as proposed by Freeman and Pennell19). This bias may be partly caused by a significant ‘species debt’ in tropical regions due to the lack of standardized criteria for recognizing species compared to temperate regions53. A significant portion of such debt is due to records that have already been cataloged but have not been refined or taxonomically revised17. This phenomenon is likely highly pronounced in small-sized and abundant animal taxa such as insects54, and is also exacerbated by the shortage of taxonomists55,56. The high species richness of tropical regions makes taxonomic refinement difficult, as the discovery, description and identification of species requires more time and a higher degree of expertise compared to temperate regions, which typically harbor fewer species and have a longer history of naturalist studies and taxonomic work19,53. Nevertheless, the high taxonomic completeness observed in many temperate regions should be interpreted with caution when such values are accompanied by low survey completeness, as in, e.g., Southern Europe, the East and West Coasts of North America, or the southwest edge of Australia, because the lack of surveys may be reflecting lower levels of the taxonomic effort that should be an integral part of standardized inventories.
The low availability of digitally accessible data emerges as a fundamental limitation for our knowledge of the geographical distribution of Orthoptera, especially in tropical regions and most of Asia. The pattern of low survey completeness observed here reflects a general scarcity of species records, which may be the result of either lack of inventories or lack of data mobilization, or both, arising from limited resources or other impediments for data digitization9,12. Indeed, such limited geographical coverage stands out when considering inventories with low taxonomic completeness. Both taxonomic completeness and survey completeness are affected by the challenges to Orthoptera taxonomy. Besides relying on up-to-date taxonomic keys to identify rare individuals from new samples, it is often also crucial to compare them with type specimens or good descriptions57, which may not be possible, especially for species that were described long ago, or that lack information on male genitalia (a crucial characteristic for the diagnosis of many orthopterans58,59). This is further complicated because many type specimens or descriptions are not accessible, comprehensive and up-to-date taxonomic keys are lacking60, and economic investment is low in many high-biodiversity regions9,61. The exceptions are pest species and some groups that have historically received more attention46. This may have added an additional geographical bias to the effort devoted to orthopteran inventories, as pest species tend to predominate in grassland environments46, which are more abundant at temperate latitudes. Thus, although the historical gaps in the study of Orthoptera in the tropics42 are likely caused by the generalized lack of resources for biodiversity research, they have also been exacerbated by the greater attention devoted by agricultural sciences to the orthopterans of temperate regions.
Strikingly, the high taxonomic and survey completeness in countries of the northern hemisphere is not always accompanied by high values of survey and temporal evenness, contradicting our hypothesis for these regions. This can be partly explained by the higher global availability of data on Orthoptera for certain regions, as seen in birds, mammals and amphibians9, in plants12, and also in multi-taxa analyses26. This tends to significantly increase the number of unique records for some local species and years already sampled, causing a significant imbalance in data evenness. Also, species records are subject to biases related to accessibility, as locations placed near to access routes (i.e., roads and rivers), research centers, conservation units, and large connected forest fragments are more frequently sampled9,27,28,62,63. Further, the low taxonomic completeness that is widespread in tropical regions can distort the observed patterns of survey and temporal evenness, as many recently described species from these regions may be present in digitally-accessible data only from their type specimens, while their presence in older records is still pending from their revision using an updated taxonomy. All these factors together can result in the absence of smooth geographical patterns of data equability observed in our large-scale maps.
Our measure of temporal completeness reveals that most digitally accessible species records were collected from a limited number of years (i.e., low richness of years), for both tropical and temperate regions. Such lack of years with records hinders our ability to comprehend the decline of Orthopteran diversity over time64, because the temporal coverage of sampling records is discontinuous and unevenly distributed along different locations. A wider temporal spread of records was only observed in some scattered areas throughout North America and Europe, similar to the historical distribution of sampling effort reported for other taxa9,61, thus reflecting that the higher financial and scientific support of the United States of America and several European countries65,66 extends to the field of Orthopterology. Furthermore, such capacity has been accompanied by the effects of scientific colonialism on the historical and current collection of data about Orthoptera, where taxonomic legacy has a clear impact on the global distribution of knowledge. Only recently have both biodiversity research in general, and depositions of type material in Museums in particular, started to be conducted mainly at local institutions in countries throughout the tropics65.
The observed pattern of temporal completeness may also reflect to some extent the deficiency in data digitization, especially in the tropics and parts of temperate Asia. Here, a digitalization effort specifically tailored to retrieve data prior to the year 2000, originating from agricultural surveys, fieldwork or environmental impact reports, would complement the currently available knowledge, thus providing a more comprehensive historical coverage and temporal evenness considering the chronological gaps associated with the distribution data and the low temporal completeness of information for many locations25,35,67.
Furthermore, the mapped values of different aspects of ignorance highlight that even temperate regions of the northern hemisphere, where we expected to have a good quality of biodiversity knowledge (i.e., high taxonomic, sampling, and temporal completeness, and high survey and temporal evenness), suffer from limited taxonomic refinement, inventory integrity, limited diversity of years sampled, and an imbalance in the number of records among known species and years. This suggests that high ignorance values, whether in the tropics or in temperate regions, may result from a lack of digital access to a large portion of biological collections worldwide65,68,69. This represents a significant impediment to the advancement of scientific knowledge and should be rectified, at least in the collections of developed countries. Here, technological advances such as cameras, cell phones, and tablets can allow the recording of primary data by people who live and interact directly with biodiversity, such as those living in protected land, field biologists, and nature photographers70. However, despite such potential, many orthopterans can only be accurately identified after examining the male genitalia, which is rarely possible from in situ images uploaded to citizen science platforms. As a result, there is still a significant lack of globally-available biogeographic knowledge on Orthoptera in databases such as EcoRegistros or iNaturalist (see Table 1).
Regardless of other alternative initiatives for gathering new data, there is clearly an urgent need to improve biodiversity data curation and to provide financial support to establish an open and collaborative data mobilization infrastructure in biological collections68,69,71, through either local policies in each country, or global agreements such as the Convention on Biological Diversity (https://www.cbd.int/convention/). This is especially important considering that accidents in collections can result in immeasurable losses72,73, which could be partially mitigated through digitized information. In addition, the already-existing platforms aimed at integrating and making literature data available need to pursue optimization initiatives. A good example is that of the Orthoptera Species File (http://orthoptera.archive.speciesfile.org/), which migrated thousands of data records to the Darwin Core standard (https://orthoptera.speciesfile.org/), the most comprehensive taxonomic standard for biodiversity data69,74.
Nevertheless, the solution to the above-mentioned problems goes beyond relying solely on bioinformatics75. Training and educating new taxonomists should also be a priority56,76, along with the development and updating of user-friendly identification keys for other scientists, and the general public. Extensive engagement of taxonomists in the development of online data availability initiatives7, and collaboration with citizen science platforms is also necessary, both by validating and curating species identifications77,78, and by applying these platforms in teaching and training activities79. All these initiatives will ultimately engage some sectors of the general public into higher-quality citizen biodiversity science80,81. Such measures will significantly contribute to the construction, security, and availability of open knowledge over time, gradually filling the gaps highlighted here, subsidizing studies in taxonomy, biogeography, and conservation3, as well as to conservation decision-making based on Orthopteran data by researchers in the future.
Lastly, it is also worth noting that assessments of ignorance for specific groups or at smaller study scales may show local differences from the general pattern observed at a global scale. Our analyses highlight the shortfalls of knowledge on the macroecological patterns that arise from the large-scale mechanisms governing nature82. But these ignorance maps can also be useful to address these data gaps within specific groups and/or at smaller regions.
In conclusion, we believe that this study offers an innovative and simplified framework that can guide future research on Orthoptera, and can also be replicated for other taxa, making the construction of ignorance maps more accessible. Nevertheless, more research on the factors that cause gaps and biases in digitally accessible Orthoptera data is needed. With this work, we hope we have provided a baseline for exploring future challenges as data become more extensive and available. Without robust and reliable data, our quest to uncover the ecological and evolutionary mechanisms driving biogeographical patterns and to develop effective approaches to conserve biodiversity in a changing environment will remain fundamentally compromised.
Methods
A global dataset of Orthoptera occurrence
We compiled data from 23 digital repositories storing primary information on the distribution of Orthoptera (Table 1). Additional information on the extraction process used for each repository can be accessed in the Supplementary Note 1. The primary records were divided into two distinct subsets: (1) occurrences encompassing species/subspecies-level data; and (2) occurrences extending beyond the species taxonomic level, up to the order level (Fig. 1b).
We also built a list of currently accepted species, alongside their corresponding taxonomic arrangements (synonyms) according to the classification proposed by the Orthoptera Species File catalog (OSF)48, as available on the Catalogue of Life platform (https://doi.org/10.48580/d4sw-388) in August 2021. Using the species list, we organized the taxonomic nomenclatural information for each species and subspecies across all repositories, excluding fossil species, and synonymized subspecies with species. The final species list includes 28,309 valid extant species (Supplementary Dataset).
We applied filters to identify and remove records that lacked the necessary information for subsequent analyses: (i) taxonomic adequacy filter—we used the valid species list (described above) to exclude occurrences that were not in accordance with the proposed taxonomic classification. For records without species names, we did not use taxonomic adequacy (see Data Quality analysis section); (ii) spatial filter—we removed occurrences without geographical coordinates, and also excluded records with only descriptive locations (e.g., municipality, state, country) or dubious coordinates; (iii) temporal filter—we kept only the occurrences that include the year of collection; and (iv) land boundaries filter—only georeferenced occurrences for the terrestrial part of the planet, including islands, were computed. To do this, we used the Natural Earth platform, at a 1:10 m resolution (https://www.naturalearthdata.com/). Records that did not cover these areas were excluded (Fig. 1b).
Then, all occurrences were divided into two groups: species/subspecies occurrences and records with identification at a higher taxonomic level, where we applied two additional filters: (v) ambiguities filter—duplicate records were removed for species/subspecies and records above the species level, separately; and (vi) fossil filter—excluding all fossil species, which were not considered in our analyses.
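The filtering steps (i) to (vi) can be illustrated with a minimal Python sketch. It is not the authors' pipeline (their analyses were run in R), and the `Record` fields, the `apply_filters` function and the `on_land` callback are hypothetical names introduced only for this illustration; in practice the land test would be a point-in-polygon query against the Natural Earth land layer.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Set, Tuple


@dataclass(frozen=True)  # frozen makes records hashable, so a set removes duplicates
class Record:
    taxon: str                  # name as given in the repository
    rank: str                   # "species", "genus", "tribe", "family" or "order"
    lat: Optional[float]
    lon: Optional[float]
    year: Optional[int]
    is_fossil: bool = False


def apply_filters(
    records: Iterable[Record],
    valid_species: Set[str],
    on_land: Callable[[float, float], bool],
) -> Tuple[Set[Record], Set[Record]]:
    """Return (species-level records, higher-level records) after filtering."""
    species_level: Set[Record] = set()
    higher_level: Set[Record] = set()
    for r in records:
        # (ii) spatial filter: coordinates must exist and be plausible
        if r.lat is None or r.lon is None:
            continue
        if not (-90 <= r.lat <= 90 and -180 <= r.lon <= 180):
            continue
        # (iii) temporal filter: year of collection is required
        if r.year is None:
            continue
        # (iv) land boundaries filter
        if not on_land(r.lat, r.lon):
            continue
        # (vi) fossil filter
        if r.is_fossil:
            continue
        if r.rank == "species":
            # (i) taxonomic adequacy applies to species-level names only
            if r.taxon in valid_species:
                species_level.add(r)   # (v) set membership drops duplicates
        else:
            higher_level.add(r)        # (v) duplicates dropped separately
    return species_level, higher_level
```

Keeping the two subsets separate mirrors the text: species-level records feed all indices, whereas records above the species level are only needed for taxonomic completeness.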
Indices of data quality
We calculated five indices of data quality: taxonomic completeness (TaxC); survey completeness (SC), based on species accumulation curves11,16,29; temporal completeness (TC); and survey and temporal evenness (SE and TE, respectively), both of them calculated using a Hill-Diversity method slightly adapted by us36,83. All indices were computed based on the information for terrestrial grid cells at 1° resolution and then plotted into global maps (Fig. 1c, d).
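As an illustration of how occurrences can be assigned to 1° grid cells, the following Python sketch maps a coordinate pair to a cell index. The indexing convention (row 0 starting at 90°S, column 0 at 180°W) and the function name `grid_cell` are assumptions made for this example, not details taken from the study.

```python
import math


def grid_cell(lat: float, lon: float, resolution: float = 1.0) -> tuple:
    """Return the (row, col) index of the grid cell containing a point.

    Assumed convention: row 0 starts at 90°S and column 0 at 180°W.
    """
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("coordinates out of range")
    n_rows = int(round(180 / resolution))
    n_cols = int(round(360 / resolution))
    # Points lying exactly on the upper edges are assigned to the last cell
    row = min(int(math.floor((lat + 90) / resolution)), n_rows - 1)
    col = min(int(math.floor((lon + 180) / resolution)), n_cols - 1)
    return row, col
```

Records sharing the same `(row, col)` pair would then be pooled before computing each index for that cell.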
Taxonomic completeness
For each cell (pi), we calculated taxonomic completeness as
$$\mathrm{TaxC}_{p_i} = \frac{s}{t}$$
where s is the number of primary occurrences identified at the species taxonomic level and t is the total number of records.
The total number of records (t) corresponds to the sum of all species records (s) and all identifications available only above the species level (i.e., genus, tribe, family, or order) (Supplementary Fig. 3). TaxC values range from 0 to 1. TaxC = 0 means that none of the occurrences are identified at the species level, thus indicating a large Linnean shortfall, whereas values close to 1 indicate that most records are identified to the species taxonomic level, thus corresponding to a well-sampled cell from a taxonomic point of view. Thus, the TaxC index allows us to assess, at least in part, the geographical distribution of the extent of the Linnean shortfall, based on the data quality in relation to taxonomic refinement of occurrences (Fig. 1c).
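A minimal Python sketch of this ratio is given below. The function name and the choice to return `None` for cells without any record are assumptions of the example.

```python
from typing import Optional


def taxonomic_completeness(n_species_level: int, n_higher_level: int) -> Optional[float]:
    """TaxC = s / t, with t = s + records identified only above species level."""
    total = n_species_level + n_higher_level
    if total == 0:
        return None  # assumption: the index is undefined for empty cells
    return n_species_level / total
```

For example, a cell with 30 species-level records and 10 records identified only to genus or above would obtain TaxC = 30 / 40 = 0.75.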
Survey completeness
We calculated survey completeness (SC) for each cell based on species accumulation curves, where the final slope angle of each curve represents the rate of accumulation of new species in the inventory29. The order of entrance of records in the curves was smoothed by running 100 random permutations using the function specaccum from the R package vegan84, and the final slope was calculated as the average difference in the number of species observed between the final and next-to-last records. This final slope was subtracted from 1, thus obtaining completeness values ranging from 0 to 1. SC was calculated in the R environment using the function Accum_curve provided in Tessarolo et al. 16. Values of SC close to 1 indicate an almost complete inventory (Fig. 1c).
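The logic of this calculation can be sketched in Python as follows. This is a simplified stand-in for the procedure described above, not a port of `specaccum` or `Accum_curve`: it treats each record as one sampling unit, shuffles the record order, and averages the number of species added by the last record. The function name, the fixed seed and the input format (one species name per record) are assumptions of the example.

```python
import random
from typing import Sequence


def survey_completeness(species_per_record: Sequence[str],
                        n_perm: int = 100, seed: int = 0) -> float:
    """SC = 1 - mean final slope of randomized species accumulation curves."""
    n = len(species_per_record)
    if n < 2:
        raise ValueError("at least two records are needed to compute a slope")
    rng = random.Random(seed)          # fixed seed for reproducibility
    records = list(species_per_record)
    gained = 0
    for _ in range(n_perm):
        rng.shuffle(records)
        # Species added between the next-to-last and the final record:
        # 1 if the last record brings a species not seen before, else 0
        if records[-1] not in set(records[:-1]):
            gained += 1
    final_slope = gained / n_perm
    return 1.0 - final_slope
```

Under this simplification the expected final slope equals the proportion of records belonging to species recorded only once, so cells dominated by singletons obtain low SC values.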
Temporal completeness
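We measured the temporal completeness (TC) at each cell (pi) as the number of sampled years divided by the highest range of sampled years found in our database (Supplementary Fig. 4), that is,
$$\mathrm{TC}_{p_i} = \frac{y_{p_i}}{Y_{\max}}$$
where y_{p_i} is the number of years with records in cell pi and Y_max is the highest range of sampled years found in the database.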
This index ranges from values very close to 0 to 1, with values near 1 indicating maximum temporal coverage of the period covered by the surveys (Fig. 1c).
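A minimal Python sketch of this index follows. The denominator is passed in as a parameter (`max_year_range`, a hypothetical name) because its exact value comes from the full database and is not restated here.

```python
from typing import Iterable


def temporal_completeness(record_years: Iterable[int], max_year_range: int) -> float:
    """TC = number of distinct sampled years / highest range of sampled years."""
    if max_year_range <= 0:
        raise ValueError("max_year_range must be positive")
    sampled_years = set(record_years)  # repeated years count only once
    return len(sampled_years) / max_year_range
```

For instance, a cell with records from 2000, 2005 and 2010 and a hypothetical database-wide maximum of 6 years would obtain TC = 3 / 6 = 0.5.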
Survey evenness
We measured survey evenness (SE) in each cell pi using the Hill Diversity model, following36.
This index is less sensitive to variations in sample size than classical evenness indices such as Shannon or Simpson36,83. Here we used the parameter q = 4 to assign greater sensitivity to equity for the Hill diversity index. To scale the index between 0 and 1, we divided the Hill index values by the number of species within each grid cell. Values of SE = 0 indicate a maximum bias of occurrence abundance for some species, whereas SE = 1 indicates no local bias in the distribution of the numbers of records per species. Thus, SE values allow us to detect possible imbalances in the observed species abundances, which can be interpreted as a measure of bias in community data. It is important to note that we did not extrapolate our interpretations of SE to infer whether the community is more uniform or not, but rather to evaluate the potential bias of species occurrences within our inventory (Fig. 1c).
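To make the scaling step concrete, the sketch below computes the standard Hill number of order q, D = (Σ p_i^q)^(1/(1−q)), from the number of records per species and divides it by the number of species. The authors describe their method as slightly adapted, so this standard formulation is an assumption and may differ in detail from their implementation; the function name is hypothetical.

```python
import math
from typing import Sequence


def hill_evenness(counts: Sequence[int], q: float = 4.0) -> float:
    """Hill number of order q divided by the number of categories.

    `counts` holds the number of records per species (or per year).
    """
    counts = [c for c in counts if c > 0]
    if not counts:
        raise ValueError("at least one positive count is required")
    total = sum(counts)
    props = [c / total for c in counts]
    if q == 1:
        # Limit case: exponential of Shannon entropy
        hill = math.exp(-sum(p * math.log(p) for p in props))
    else:
        hill = sum(p ** q for p in props) ** (1.0 / (1.0 - q))
    return hill / len(counts)
```

With four species holding 10 records each, the Hill number equals 4 and the scaled value is 1; with counts of 97, 1, 1 and 1, the value falls to roughly 0.26, flagging a strong bias towards one species. Note that under this formulation the Hill number is never below 1, so the scaled index approaches 0 only as the number of species grows.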
Temporal evenness
To calculate temporal evenness, we used the same Hill diversity model mentioned above, with parameter q = 4. However, we considered the years as “species” and the number of samples per year as “abundance”. This allowed us to measure the bias in the number of samples over the years, considering the entire known temporal range of each cell. Values of TE close to 0 indicate maximum temporal bias, with occurrence samples concentrated in very few years, whereas values of TE = 1 indicate no local bias in the distribution of samples per year, meaning that each year holds the same number of samples (Fig. 1c).
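The substitution of years for species can be sketched in Python as follows, again assuming the standard Hill number of order q ≠ 1 rather than the authors' adapted version; `temporal_evenness` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable


def temporal_evenness(record_years: Iterable[int], q: float = 4.0) -> float:
    """Scaled Hill number with years as 'species' and records per year as 'abundance'."""
    counts = Counter(record_years)     # number of records per year
    if not counts:
        raise ValueError("at least one record is required")
    total = sum(counts.values())
    # Standard Hill number for q != 1
    hill = sum((c / total) ** q for c in counts.values()) ** (1.0 / (1.0 - q))
    return hill / len(counts)          # scale by the number of sampled years
```

A cell with one record in each of four years yields a value of 1, whereas a cell with 97 records in one year and single records in three other years yields about 0.26.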
Finally, we highlight that a thorough analysis of Wallacean and Linnean noise is beyond the scope of this study. Here, regardless of the source of the records and the authors who gathered and/or identified the specimens, we assume that the taxonomic identifications, geographic localities, and dates are correct after our filtering processes (see filters in Methods). Thus, the quality and coverage of the distribution data may not be free from errors, and we recommend that they be handled and used with caution. Additionally, we recommend reading Ronquillo et al. 67 and Castro-Souza et al. 85 for a more systematic view of these issues.
Assessing the sensitivity of data quality indices to low numbers of records
Small sample sizes may lead to overestimates of taxonomic completeness and add noise to species accumulation curves by generating artificial completeness values7,8,16. They can also yield values of temporal coverage and of survey and temporal evenness that are not representative, due to the low proportions of records. Therefore, it is necessary to establish a minimum threshold of number of records per grid cell to perform all index calculations. As the thresholds used in data quality analysis can be subjective, potentially introducing noise into the ignorance map16, we conducted tests with different thresholds to evaluate how the cutoff settings of the minimum number of occurrences affect the results of the indices. For calculating TaxC we used no threshold, 5, 10, 15, 25, 35, 45, 55, 65, and 75 total occurrences as threshold values. In SC analyses we used 10, 15, 25, 35, 45, 55, 65, and 75 species occurrences as thresholds. Lower values were not tested due to the long processing time and the fact that very small values can lead to misleading estimates11,16. For TC, SE, and TE analyses we also used no threshold, 5, 10, 15, 25, 35, 45, 55, 65, and 75 species occurrences. Additionally, for SE and TE, we used q = 4 as we were interested in placing greater weight on evenness while also considering species or year richness in our model. Finally, with the maps constructed using different thresholds for each index, we calculated density curves. This allowed us to observe how each index behaved under the different threshold configurations. Based on these results (density curves shown in Fig. 4, maps not shown), we selected a common threshold of 45 occurrences for all index calculations; grid cells with fewer occurrences than this were thus discarded as holding too little information. All analyses mentioned above were performed in R86.
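The threshold test can be illustrated with a small Python sketch that filters cells by a minimum number of records and summarizes the index values that remain. It reports simple counts and means rather than the density curves used in the study, and the function name, the use of 0 to stand for "no threshold" and the toy values are assumptions made for illustration.

```python
from statistics import mean

# Threshold values listed in the text; 0 stands for "no threshold" in this sketch.
THRESHOLDS = [0, 5, 10, 15, 25, 35, 45, 55, 65, 75]


def threshold_sweep(cells, thresholds=THRESHOLDS):
    """Summarize an index under each minimum-records threshold.

    cells is a list of (n_records, index_value) pairs, one per grid cell.
    """
    summary = {}
    for t in thresholds:
        # Cells with fewer records than the threshold are discarded.
        kept = [value for n_records, value in cells if n_records >= t]
        summary[t] = {
            "n_cells": len(kept),
            "mean": mean(kept) if kept else None,
        }
    return summary


# Hypothetical cells: sparsely sampled cells show inflated index values.
cells = [(3, 1.0), (8, 0.9), (40, 0.6), (50, 0.5), (120, 0.4)]
summary = threshold_sweep(cells)
```

Comparing the summaries across thresholds shows how strongly sparsely sampled cells drive an index, which is the behaviour the density curves are meant to reveal.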
Building a composite ignorance map
Different methods have investigated various aspects of the ignorance about the distribution of biodiversity, as conceptualized by Rocchini et al. 14 and Ladle and Hortal15. These authors build on the idea, initially proposed by Boggs13, of creating a global atlas of ignorance that allows the identification of lesser-known areas for a specific subject of investigation (e.g., political, social, or biological). While Ruete87 proposed to assess the spatial availability of primary biodiversity records through ignorance scores, Stropp et al. 11 focused on mapping the spatial and temporal biases of information, and Meyer et al. 12 provided a framework for analyzing gaps, uncertainties, and biases in taxonomic, spatial, and temporal information. More recently, Tessarolo et al. 16 incorporated explicit measures of coverage for these dimensions, to provide measures of the data-driven uncertainty in the projections of species distribution models. Here we propose a different, simplified and easy-to-use framework for the rapid quantification of the overall taxonomic, spatial and temporal ignorance for any biological group, using the order Orthoptera as an example that may be applied to other groups.
Our composite map depicts areas of biodiversity ignorance as the result of the ensemble of the five indices described above (i.e., TaxC + SC + TC + SE + TE) (Supplementary Figs. 5–9), rescaled to range from 0 to 1, and finally subtracted from 1 (Fig. 1d, e). Here, values close to 1 indicate maximum ignorance (i.e., extremely poor data quality), and values close to 0 indicate minimum ignorance (i.e., good data quality). By mapping these values, the ignorance map allows us to identify issues of data quality, coverage and completeness, as it provides a quantitative indication of the composite deficiencies in taxonomic, survey and temporal completeness, plus biases in the species abundance and years sampled (Fig. 2).
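The composite score can be sketched in Python as follows. The sketch assumes that "rescaled to range from 0 to 1" means min-max rescaling of the summed indices across cells; the function name and the toy index values are hypothetical.

```python
def ignorance_scores(indices_by_cell):
    """Sum the five indices per cell, min-max rescale to [0, 1], subtract from 1.

    indices_by_cell maps a cell id to a tuple (TaxC, SC, TC, SE, TE).
    """
    sums = {cell: sum(values) for cell, values in indices_by_cell.items()}
    lowest, highest = min(sums.values()), max(sums.values())
    if highest == lowest:
        raise ValueError("all cells have the same summed index; cannot rescale")
    # Assumption: rescaling is a min-max transform across all cells.
    return {
        cell: 1.0 - (total - lowest) / (highest - lowest)
        for cell, total in sums.items()
    }


# Hypothetical index values (TaxC, SC, TC, SE, TE) for three cells.
cells = {
    "well": (0.9, 0.8, 0.7, 0.9, 0.8),
    "mid": (0.5, 0.4, 0.3, 0.5, 0.4),
    "poor": (0.1, 0.1, 0.0, 0.2, 0.1),
}
scores = ignorance_scores(cells)
```

With this transform the best-documented cell receives an ignorance of 0 and the worst-documented cell an ignorance of 1, so the scores are relative to the set of cells being compared.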
Data availability
The datasets used during the study are available from the corresponding author on reasonable request. The most updated versions of the original data can also be downloaded from the URLs available in Table 1.
Code availability
The underlying code for this study is not publicly available but may be made available to qualified researchers on reasonable request from the corresponding author.
References
Singh, J. S. The biodiversity crisis: a multifaceted review. Curr. Sci. 82, 638–647 (2002).
Dirzo, R. et al. Defaunation in the Anthropocene. Science 345, 401–406 (2014).
Hortal, J. et al. Seven shortfalls that beset large-scale knowledge of biodiversity. Annu. Rev. Ecol. Evol. Syst. 46, 523–549 (2015).
Venter, O. et al. Sixteen years of change in the global terrestrial human footprint and implications for biodiversity conservation. Nat. Commun. 7, 12558 (2016).
Cardoso, P., Erwin, T. L., Borges, P. A. V. & New, T. R. The seven impediments in invertebrate conservation and how to overcome them. Biol. Conserv. 144, 2647–2655 (2011).
Hortal, J., Ladle, R. J., Stropp, J. & Tessarolo, G. Accounting for biogeographical ignorance within biodiversity modelling. Res. Outreach 129, (2022).
Soberón, J. & Peterson, A. T. Biodiversity informatics: Managing and applying primary biodiversity data. Philos. Trans. R. Soc. B: Biol. Sci. 359, 689–698 (2004).
Sousa-Baena, M. S., Garcia, L. C. & Peterson, A. T. Completeness of digital accessible knowledge of the plants of Brazil and priorities for survey and inventory. Divers. Distrib. 20, 369–381 (2014).
Meyer, C., Kreft, H., Guralnick, R. & Jetz, W. Global priorities for an effective information basis of biodiversity distributions. Nat. Commun. 6, 8221 (2015).
Hortal, J., Lobo, J. M. & Jiménez-Valverde, A. Limitations of biodiversity databases: case study on seed-plant diversity in Tenerife, Canary Islands. Conserv. Biol. 21, 853–863 (2007).
Stropp, J. et al. Mapping ignorance: 300 years of collecting flowering plants in Africa. Glob. Ecol. Biogeogr. 25, 1085–1096 (2016).
Meyer, C., Weigelt, P. & Kreft, H. Multidimensional biases, gaps and uncertainties in global plant occurrence information. Ecol. Lett. 19, 992–1006 (2016).
Boggs, S. W. An atlas of ignorance: a needed stimulus to honest thinking and hard work. Proc. Am. Philos. Soc. 93, 253–258 (1949).
Rocchini, D. et al. Accounting for uncertainty when mapping species distributions: the need for maps of ignorance. Prog. Phys. Geogr. 35, 211–226 (2011).
Ladle, R. J. & Hortal, J. Mapping species distributions: living with uncertainty. Front Biogeogr. 5, 8–9 (2013).
Tessarolo, G., Ladle, R. J., Lobo, J. M., Rangel, T. F. & Hortal, J. Using maps of biogeographical ignorance to reveal the uncertainty in distributional data hidden in species distribution models. Ecography 44, 1743–1755 (2021).
Lessa, T., Stropp, J., Hortal, J. & Ladle, R. J. How taxonomic change influences forecasts of the Linnean shortfall (and what we can do about it)? J. Biogeogr. 00, 1–9 (2024).
Bebber, D. P. et al. Herbaria are a major frontier for species discovery. Proc. Natl Acad. Sci. USA 107, 22169–22171 (2010).
Freeman, B. G. & Pennell, M. W. The latitudinal taxonomy gradient. Trends Ecol. Evol. 36, 778–786 (2021).
Stropp, J., Ladle, R. J., Emilio, T., Lessa, T. & Hortal, J. Taxonomic uncertainty and the challenge of estimating global species richness. J. Biogeogr. 49, 1654–1656 (2022).
Troudet, J., Grandcolas, P., Blin, A., Vignes-Lebbe, R. & Legendre, F. Taxonomic bias in biodiversity data and societal preferences. Sci. Rep. 7, 9132 (2017).
de Siracusa, P. C., Gadelha, L. M. R. & Ziviani, A. New perspectives on analysing data from biological collections based on social network analytics. Sci. Rep. 10, 3358 (2020).
Molles, M. C. & Sher, A. Species Abundance and Diversity. in Ecology: Concepts and Applications 343–361 (McGraw-Hill Education, 2019).
Lobo, J. M. et al. KnowBR: An application to map the geographical variation of survey effort and identify well-surveyed areas from biodiversity databases. Ecol. Indic. 91, 241–248 (2018).
Ronquillo, C. et al. Assessing spatial and temporal biases and gaps in the publicly available distributional information of Iberian mosses. Biodivers. Data J. 8, e53474 (2020).
Hughes, A. C. et al. Sampling biases shape our view of the natural world. Ecography 44, 1259–1269 (2021).
Sobral-Souza, T. et al. Knowledge gaps hamper understanding the relationship between fragmentation and biodiversity loss: The case of Atlantic Forest fruit-feeding butterflies. PeerJ e11673 (2021).
Zizka, A., Antonelli, A. & Silvestro, D. sampbias, a method for quantifying geographic sampling biases in species distribution data. Ecography 44, 25–32 (2021).
Hortal, J. & Lobo, J. M. An ED-based protocol for optimal sampling of biodiversity. Biodivers. Conserv. 14, 2913–2947 (2005).
Moreno, C. E. & Halffter, G. Assessing the completeness of bat biodiversity inventories using species accumulation curves. J. Appl. Ecol. 37, 149–158 (2000).
Soberón, J., Jiménez, R., Golubov, J. & Koleff, P. Assessing completeness of biodiversity databases at different spatial scales. Ecography 30, 152–160 (2007).
Escribano, N., Ariño, A. H. & Galicia, D. Biodiversity data obsolescence and land uses changes. PeerJ 4, e2743 (2016).
Tessarolo, G., Ladle, R., Rangel, T. & Hortal, J. Temporal degradation of data limits biodiversity research. Ecol. Evol. 7, 6863–6870 (2017).
Stropp, J. et al. The ghosts of forests past and future: deforestation and botanical sampling in the Brazilian Amazon. Ecography 43, 979–989 (2020).
Boakes, E. H. et al. Distorted views of biodiversity: Spatial and temporal bias in species occurrence data. PLoS Biol. 8, e1000385 (2010).
Roswell, M., Dushoff, J. & Winfree, R. A conceptual guide to measuring species diversity. Oikos 130, 321–338 (2021).
Bánki, O. et al. Catalogue of Life. Catalogue of Life Checklist (Version 2023-05-15) (2023).
Branson, D. H., Joern, A. & Sword, G. A. Sustainable management of insect herbivores in grassland ecosystems: New perspectives in grasshopper control. BioScience 56, 743–755 (2006).
Lavoie, K. H., Helf, K. L. & Poulson, T. L. The biology and ecology of North American cave crickets. J. Cave Karst Studies 6, 114–134 (2007).
Santana, F. D., Baccaro, F. B. & Costa, F. R. C. Busy nights: high seed dispersal by crickets in a neotropical forest. Am. Naturalist 188, 126–133 (2016).
Tan, M. K. et al. Overlooked flower-visiting Orthoptera in Southeast Asia. J. Orthoptera Res. 26, 143–153 (2017).
Song, H. Grasshopper systematics: past, present and future. J. Orthoptera Res. 19, 57–68 (2010).
Bidau, C. J. Patterns in Orthoptera biodiversity. I. Adaptations in ecological and evolutionary contexts. J. Insect Biodivers. 2, 1–39 (2014).
Bidau, C. J. Patterns in Orthoptera biodiversity. II. The cultural dimension. J. Insect Biodivers. 2, 1–15 (2014).
Song, H. Biodiversity of Orthoptera. in Insect Biodiversity: Science and Society (eds. Foottit, R. G. & Adler, P. H.) vol. 2 245–279 (John Wiley & Sons Ltd., Chichester, 2018).
Green, S. V. The taxonomic impediment in orthopteran research and conservation. J. Insect Conserv. 2, 151–159 (1998).
Cigliano, M. M. & Eades, D. New technologies challenge the future of taxonomy in Orthoptera. J. Orthoptera Res. 19, 15–18 (2010).
Cigliano, M. M., Braun, H., Eades, D. C. & Otte, D. Orthoptera Species File. (Version 5.0/5.0) http://Orthoptera.SpeciesFile.org (2022).
The Global Biodiversity Information Facility. What is GBIF? https://www.gbif.org/what-is-gbif (2022).
iNaturalist. https://www.inaturalist.org (2022).
Samways, M. J. & Lockwood, J. A. Orthoptera conservation: pests and paradoxes. J. Insect Conserv. 3, 143–149 (1998).
Nouh, G. M. & Adly, D. Evaluation of the virulence of entomopathogenic nematodes as a biological control agents against Gryllotalpa gryllotalpa (Gryllotalpidae). J. Appl. Entomol. 145, 1050–1056 (2021).
Diniz-Filho, J. A. F. et al. Macroecological links between the Linnean, Wallacean, and Darwinian shortfalls. Front Biogeogr. 15, e59566 (2023).
Diniz-Filho, J. A. F., de Marco, P. & Hawkins, B. A. Defying the curse of ignorance: perspectives in insect macroecology and conservation biogeography. Insect Conserv. Divers 3, 172–179 (2010).
Gaston, K. J. & May, R. M. Taxonomy of taxonomists. Nature 356, 281–282 (1992).
Engel, M. S. et al. The taxonomic impediment: a shortage of taxonomists, not the lack of technical approaches. Zool. J. Linn. Soc. 193, 381–387 (2021).
Braby, M. F., Hsu, Y.-F. & Lamas, G. How to describe a new species in zoology and avoid mistakes. Zool. J. Linn. Soc. 1–16 https://doi.org/10.1093/zoolinnean/zlae043 (2024).
Randell, R. L. On the presence of concealed genitalic structures in Female Caelifera (Insecta; Orthoptera). Trans. Am. Entomol. Soc. 88, 247–260 (1962).
Alexander, R. D. & Otte, D. The Evolution of Genitalia and Mating Behavior in Crickets (Gryllidae) and Other Orthoptera. (Ann Arbor, University of Michigan, Miscellaneous Publications, No. 133, 1967).
Balakrishnan, R. Species concepts, Species Boundaries and Species Identification: A View from the Tropics. Syst. Biol. 54, 689–693 (2005).
Rodrigues, A. S. L. et al. A global assessment of amphibian taxonomic effort and expertise. Bioscience 60, 798–806 (2010).
Vale, M. M. & Jenkins, C. N. Across‐taxa incongruence in patterns of collecting bias. J. Biogeogr. 39, 1744–1748 (2012).
Amano, T. & Sutherland, W. J. Four barriers to the global understanding of biodiversity conservation: Wealth, language, geographical location and security. Proc. R. Soc. B: Biol. Sci. 280, 20122649 (2013).
Lewinsohn, T. M., Agostini, K., Lucci Freitas, A. V. & Melo, A. S. Insect decline in Brazil: An appraisal of current evidence. Biol. Lett. 18, 20220219 (2022).
Bakker, F. T. et al. The global museum: natural history collections and the future of evolutionary science and public education. PeerJ. 8, e8225 (2020).
Raja, N. B. et al. Colonial history and global economics distort our understanding of deep-time biodiversity. Nat. Ecol. Evol. 6, 145–154 (2022).
Ronquillo, C., Stropp, J., Medina, N. G. & Hortal, J. Exploring the impact of data curation criteria on the observed geographical distribution of mosses. Ecol. Evol. 13, e10786 (2023).
Krishtalka, L. & Humphrey, P. S. Can natural history museums capture the future? Bioscience 50, 611–617 (2000).
Nelson, G. & Ellis, S. The history and impact of digitization and digital data mobilization on biodiversity research. Philos. Trans. R. Soc. B: Biol. Sci. 374, 20170391 (2018).
Jarić, I. et al. iEcology: Harnessing Large Online Resources to Generate Ecological Insights. Trends Ecol. Evol. 35, 630–639 (2020).
Johnson, K. R. & Owens, I. F. P. A global approach for natural history museum collections. Science 379, 1192–1194 (2023).
Phillips, T. Sao Paulo fire destroys one of the largest collections of dead snakes. The Guardian, https://www.theguardian.com/world/2010/may/16/firedestroys-snake-collection (2010).
Phillips, T. Brazil National Museum: as much as 90% of collection destroyed in fire. The Guardian, https://www.theguardian.com/world/2018/sep/04/brazilnational-museum-fire-collection-destroyed-not-insured (2018).
Wieczorek, J. et al. Darwin Core: an evolving community-developed biodiversity data standard. PLoS ONE 7, e29715 (2012).
Ferro, M. L. & Flick, A. J. ‘Collection Bias’ and the importance of natural history collections in species habitat modeling: a case study using thoracophorus costalis erichson (Coleoptera: Staphylinidae: Osoriinae), with a critique of GBIF.org. Coleopterists Bull. 69, 415–425 (2015).
Kemp, C. The endangered dead. Nature 518, 292–294 (2015).
Turnhout, E., Lawrence, A. & Turnhout, S. Citizen science networks in natural history and the collective validation of biodiversity data. Conserv. Biol. 30, 532–539 (2016).
Bartoccioni, F. Big data in biogeography: From museum collection to citizen science. Biogeographia 32, 1–3 (2017).
Chozas, S. et al. Rescuing Botany: using citizen-science and mobile apps in the classroom and beyond. npj Biodiversity 2, (2023).
Fontaine, C., Fontaine, B. & Prévot, A. C. Do amateurs and citizen science fill the gaps left by scientists? Curr Opin Insect Sci 46, 83–87 (2021).
Bowler, D. E. et al. Decision-making of citizen scientists when recording species observations. Sci. Rep. 12, 11069 (2022).
Brown, J. H. Macroecology: progress and prospect. Oikos 87, 3–14 (1999).
Silva, F. R., Gonçalves-Souza, T., Paterno, G. B., Provete, D. B. & Vancine, M. H. Análises Ecológicas No R (Nupeea, Recife, 2022).
Oksanen, J. et al. vegan: Community Ecology Package. R package version 2.6-2 (2022).
Castro-Souza, R. A. et al. O (Des)conhecimento da Biodiversidade: uma Sistematização sobre Lacunas, Limitações, Vieses, Déficits e Ruídos. Oecologia Australis 28, 159–177 (2024).
R Core Team. R: A language and environment for statistical computing. https://www.R-project.org/ (2022).
Ruete, A. Displaying bias in sampling effort of data accessed from biodiversity databases using ignorance maps. Biodivers. Data J. 3, e5361 (2015).
Acknowledgements
We are grateful to the team at the Laboratory of Macroecology and Biodiversity Conservation (MacrEco) of the Federal University of Mato Grosso (UFMT) for the valuable discussions, as well as to the National Institute of Science & Technology (INCT) in Ecology, Evolution, and Biodiversity Conservation (EECBio), based at the Federal University of Goiás (UFG), for the precious discussions that contributed to the development of this study. RACS extends its gratitude primarily to the Coordination for the Improvement of Higher Education Personnel (CAPES) and to the Brazilian Institute of Development and Sustainability (IABS), in partnership with the National Center for Cave Research and Conservation (CECAV), for the doctoral and sandwich scholarships granted, respectively. JH was supported by project NICED, grant PID2022-140985NB-C21 funded by MCIN/AEI/ 10.13039/501100011033 / FEDER, EU. TSS expresses thanks to FAPEMAT (project FAPEMAT-PRO.000274/2023). We also wish to thank the developers of the 23 platforms that provided digitally-accessible data (see Methods).
Author information
Contributions
R.A.C.S., G.T., N.S. and T.S.S. conceived the study with input from all authors; investigation, all authors; data curation, R.A.C.S., N.S. and T.S.S.; formal analysis, R.A.C.S., G.T. and T.S.S.; visualization, R.A.C.S and J.H.; funding acquisition, J.A.D.F. and T.S.S.; writing—original draft, R.A.C.S., G.T., N.S. and T.S.S.; writing—review & editing, J.S., J.A.D.F., R.L., J.H. All authors have read, discussed the results and approved the final manuscript.
Ethics declarations
Competing interests
J.H. is Editor-in-Chief and J.A.D.F. is Associate Editor of npj Biodiversity. All other authors declare having no competing interests as defined by Nature Portfolio, or other interests that might be perceived to influence the results and/or discussion reported in this paper.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Castro-Souza, R.A., Tessarolo, G., Stropp, J. et al. Mapping ignorance to uncover shortfalls in the knowledge on global Orthoptera distribution. npj biodivers 3, 22 (2024). https://doi.org/10.1038/s44185-024-00059-1
DOI: https://doi.org/10.1038/s44185-024-00059-1