Introduction

As the world faces a global extinction crisis, good-quality data on the distribution and attributes of biodiversity is increasingly needed. In this context, quantifying how well we know biodiversity becomes imperative, for it allows scientists and policy makers to identify where biodiversity information is most accurate and complete1,2,3,4, thereby providing a baseline for defining priorities for new biodiversity research and conservation5,6. In the past two decades or so, an enormous volume of digitally-accessible primary biodiversity data opened new opportunities to quantify coverage and completeness of biodiversity information7. Freely-available biodiversity information networks compile huge amounts of primary biodiversity records from natural history collections, research articles and technical reports, among other sources, in the form of online database platforms8,9. However, these digital biodiversity data are often incomplete and both spatially and temporally biased10,11,12.

Addressing these limitations is of paramount importance to draw robust descriptions of biodiversity patterns from the available data, and ultimately ensure well-informed conservation decision-making. This kind of challenge can be addressed through the creation of ‘ignorance maps’ that characterize the distribution of areas where knowledge is limited13. The application of ignorance maps to biodiversity11,14 can be addressed by mapping out the various components of data limitations at broad temporal and spatial scales6,15,16. Here, three key data attributes allow us to map the accuracy, coverage and completeness of biodiversity data: the species name, the location of the records, and the date they were recorded.

First, the most fundamental component of digitally-accessible biodiversity data is related to taxonomy. Records may have species names inaccurately (e.g., incorrect identifications, use of synonyms, etc.) or incompletely assigned (i.e., identified to genus or family level)3,12,17. The challenge of assessing taxonomic accuracy is exacerbated by the absence or lack of standardization in the information about who identified species’ records12, which can lead to discarding much valuable data from online and citizen science databases. Moreover, certain records refer to species that are yet to be described3,18. Tropical regions bear the brunt of this challenge due to their high taxonomic uncertainty and megadiversity19,20. The identification of sites requiring further taxonomic effort represents a potential strategy to improve the coverage and completeness of taxonomic information. However, digitally-accessible biodiversity data are also biased towards certain taxa, which are either better sampled21,22 or more abundant in the communities23, or have undergone more extensive digitization efforts12. These taxonomic biases imply that data coverage assessments should be conducted specifically for each group that is sampled or studied altogether.

Second, survey coverage and completeness can range from inadequate to overly intensive, generating gaps and biases in perceived species distributions, and in the diversity of species documented for local and regional inventories across space and time24,25. Here, we use species accumulation curves to assess survey coverage, assuming that inventory completeness can be measured as the rate at which new species are added to the inventory as database records accumulate in each spatial unit16. Notably, well-sampled regions often align with larger and more connected natural areas with higher accessibility26,27,28. Thus, quantifying survey effort across geographic space through measures of survey completeness24,29 allows highlighting well- and poorly-sampled sites at large spatial scales8,30,31.

Third, the temporal pattern of biodiversity surveys is a critical aspect of biodiversity knowledge12,32,33,34. The quality of biodiversity knowledge decays over time due to the dynamism of natural systems, taxonomic rearrangements, and loss of information33, a decay that is exacerbated by human-driven disturbances. Further, certain sites may exhibit skewed records for specific years11. While some regions, such as Europe, North America, and Australia, boast multi-year sampling data for some taxa thanks to their long-term research tradition, others, such as Africa, hold limited temporal completeness and uneven temporal coverage9,12,26. Such disparities highlight the potential importance of considering the number of sampled years (temporal richness) when assessing data quality. However, although some studies describing such quality have measured temporal coverage11,12, or temporal decay of information16,32, many others disregard the temporal component35.

Fourth, the study of survey evenness across different regions and temporal periods can help identify sites and regions holding data appropriate for biodiversity and macroecological studies, or with enough temporal resolution so as to assess temporal shifts in their communities. Here, we use the Hill Numbers Diversity framework36 to assess both survey evenness and temporal evenness in each spatial unit, based on the relative numbers of records per species, and of years per species, respectively. The assumption behind these metrics is that sites with low evenness (i.e., an unbalanced representation of species or years) may generate biased estimates in macroecological or evolutionary models. Those areas would require further surveys or, at least, be used with caution in investigations, as they likely lack data of enough quality for assessing diversity–environment relationships, describing phylogeographic patterns, or estimating past global change impacts on biodiversity.

Orthoptera, which includes grasshoppers, crickets, katydids and relatives, has a rich taxonomic legacy, with approximately 29,500 valid species, making it the sixth most species-rich insect order37. Orthopterans play several key roles in ecosystems38,39,40,41, and hold high cultural significance in various countries42,43,44,45, as they include both food species and agricultural pests45. Indeed, orthopterans are relatively well-known taxonomically compared to other invertebrate groups, mainly owing to their agricultural importance46. In addition, their taxonomy is regularly updated at the Orthoptera Species File Online45,46,47. Further, information on their distribution can be obtained from multiple digitally-accessible data repositories, which make thousands of primary records available48,49,50 (see https://orthoptera.speciesfile.org/). However, research focus has been biased towards grasshopper pest species and, in some cases, pests of specific grasses or cultivars46,51,52, so knowledge on the distribution of most species and higher taxa of this group may still be limited42, particularly in megadiverse regions.

Here we aim to construct a global ignorance map for Orthoptera by combining analyses of different biodiversity shortfalls. Based on the known geographical biases in biodiversity inventories3,12, we hypothesize that the digitally accessible data for Orthoptera is less complete for tropical compared to temperate regions, and for the southern compared to the northern hemisphere. More specifically, we predict that northern temperate regions will have higher levels of taxonomic, survey and temporal completeness, while the tropics and the Global South will show higher evenness in both time and space. To test these hypotheses, we developed metrics of taxonomic, spatial, and temporal data quality. Finally, we combined all metrics into a single global ignorance map to provide a unique biogeographic panorama of global biodiversity knowledge of orthopterans.

Results

We compiled data from 23 digital repositories storing primary information on Orthoptera (Arthropoda, Insecta), totaling 4,911,359 occurrence records (Fig. 1a; Table 1). The data filtering process led to the exclusion of 2,513,619 records, resulting in a final dataset of 2,397,740 species occurrences (48.82% of the initial number of records), pertaining to 16,179 valid species (Supplementary Fig. 1). In addition, we were able to recover 408,377 unique records above the species level (i.e., genus, tribe, family, or order) that included information on the geographical coordinates and year of collection (Supplementary Fig. 2). These records were subsequently used to calculate taxonomic completeness (TaxC). The calculation of data quality indices based on this information revealed significant data gaps and biases in many regions, which were never before measured for Orthoptera (Fig. 2a–e).

Fig. 1: Outline of all methodological and analytical steps required to construct the ignorance map for the order Orthoptera (Arthropoda, Insecta) according to digitally-accessible data.

a We compiled all digitally accessible occurrence records for Orthoptera (see Table 1). b We applied several filters to the records, which were divided into two groups according to their taxonomic resolution: (1) species and subspecies level, and (2) above species up to order level. c Within each 100 km-width grid cell, we calculated five different data quality indices. d The indices were subsequently mapped. e The five metrics were combined by summing them in the form of an ensemble, which was then rescaled to range from 0 to 1, and finally subtracted from 1, in order to create a single final ignorance map.
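The combination step in panel e can be illustrated with a minimal Python sketch (the study itself reports analyses in R). The function name `ignorance_map`, the cell labels and the index values are hypothetical, and min-max rescaling is an assumption, since the caption only states that the ensemble was rescaled to range from 0 to 1.

```python
from typing import Dict, List


def ignorance_map(cells: Dict[str, List[float]]) -> Dict[str, float]:
    """Combine per-cell index values (TaxC, SC, TC, SE, TE) into ignorance scores."""
    # Step 1: sum the five indices of each cell into a single ensemble value.
    ensemble = {cell: sum(values) for cell, values in cells.items()}
    lowest, highest = min(ensemble.values()), max(ensemble.values())
    span = highest - lowest
    scores = {}
    for cell, total in ensemble.items():
        # Step 2: rescale to [0, 1] (min-max rescaling is an assumption).
        scaled = (total - lowest) / span if span > 0 else 0.0
        # Step 3: subtract from 1 so that high values mean high ignorance.
        scores[cell] = 1.0 - scaled
    return scores


# Hypothetical cells with made-up index values (TaxC, SC, TC, SE, TE).
example = {
    "cell_A": [0.9, 0.8, 0.3, 0.6, 0.5],
    "cell_B": [0.4, 0.2, 0.05, 0.5, 0.4],
    "cell_C": [0.7, 0.6, 0.1, 0.55, 0.45],
}
scores = ignorance_map(example)
```

In this toy example the cell with the highest summed indices receives the lowest ignorance score and the cell with the lowest sum receives the highest.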

Table 1 Digital repositories accessed to build the global dataset of Orthoptera
Fig. 2: Maps of the five indices (a–e) used in this study to assess the quality of digitally-accessible data for the global distribution of the order Orthoptera (Arthropoda, Insecta).

a Taxonomic completeness, values close to 1 indicate that most records are identified to the species taxonomic level. b Survey completeness, values close to 1 indicate an almost complete inventory. c Temporal completeness, values close to 1 indicate maximum temporal coverage. d Survey evenness, values close to 1 indicate no local bias in the distribution of the numbers of records per species. e Temporal evenness, values close to 1 indicate no local bias in the distribution of samples per year.

Strikingly, it was not possible to calculate data quality indices for most of the extent of tropical regions, due to limited data availability in most of South America, Africa, Asia, and northern Oceania. In these regions, where data is accessible, taxonomic completeness indicates that more than 50% of the records still require taxonomic refinement. The highest TaxC values were observed in the Northern Hemisphere. In particular, central North America, most European countries, and small scattered areas in eastern Asia, South Korea and Japan exhibit significant proportions of records taxonomically refined at the species level. In the Southern Hemisphere, TaxC values are generally higher than in the tropics (i.e., TaxC ≥ 0.5), although they still fall short of the levels observed in the Northern Hemisphere, with the exception of some taxonomically consistent inventories scattered throughout South America, Africa, and Oceania (Fig. 2a).

The tropical regions also show low survey completeness (SC), indicating a general need for additional surveys except perhaps in the north of Australia, which holds well-sampled inventories. In general, all the Southern Hemisphere beyond the tropics is massively under-sampled, except for some relatively well-sampled sites in South Australia and New Zealand (Fig. 2b). Although SC was high for most regions of the Northern Hemisphere, parts of North America, regions bordering Europe, Asia, and Africa, as well as Japan, are still insufficiently sampled. Temporal completeness (TC) is low across the entire globe. Only a few patches scattered throughout North America and some areas in central Europe show temporal completeness values higher than 0.25 (Fig. 2c). In contrast, we did not find a clear spatial pattern of survey and temporal evenness (SE and TE, respectively) at a global scale. While some grid cells exhibit sampling biases towards few species or few time periods, others placed nearby present long-term series of surveys with similar abundances among species, independently of the region (Fig. 2d, e).

The overall ignorance map shows a general lack of information across most of the tropics (Fig. 3), due to the low taxonomic, sampling and temporal completeness. However, a few small isolated patches in southern Africa and northern Oceania present relatively low ignorance (0.5 ≥ IM ≥ 0.25). Ignorance values in the northern Hemisphere are low in several regions of North America and Central Europe (IM ≤ 0.25). However, the coastal zones of the United States of America, Southern and Eastern Europe, and most regions of India and China present high ignorance values (Fig. 3).

Fig. 3: Map of ignorance on the global distribution of the order Orthoptera (Arthropoda, Insecta) based on digitally-accessible data.

Lower ignorance values indicate higher levels of knowledge, pinpointing areas with good data quality that only require some refinements to provide high-quality inventories.

Discussion

Our analyses provide evidence that digitally-accessible knowledge on the global distribution of orthopteran biodiversity is limited in tropical regions in particular, and in the Global South in general. Conversely, the temperate regions of the Global North present areas of relatively high quality and completeness of biodiversity knowledge, especially in North America and Central Europe, as well as in some extra-tropical areas of the southern hemisphere. Although these results confirm our preliminary hypothesis about the distribution of ignorance, there are also many deficient inventories in these data-rich regions, and knowledge on Orthoptera is still very limited in many temperate areas, particularly in Asia. Importantly, the comparison of our overall ignorance map with those of the indices describing different dimensions of knowledge suggests that interpretations based solely on one of these components could create an illusion of good knowledge coverage for certain regions.

The lack of taxonomic completeness in the tropics contributes to the generation of a taxonomic latitudinal bias (as proposed by Freeman and Pennell19). This bias may be partly caused by a significant ‘species debt’ in tropical regions due to the lack of standardized criteria for recognizing species compared to temperate regions53. A significant portion of such debt is due to records that have already been cataloged but have not been refined or taxonomically revised17. This phenomenon is likely highly pronounced in small-sized and abundant animal taxa such as insects54, and is also exacerbated by the shortage of taxonomists55,56. The high species richness of tropical regions makes taxonomic refinement difficult, as the discovery, description and identification of species requires more time and a higher degree of expertise compared to temperate regions, which typically harbor fewer species and have a longer history of naturalist studies and taxonomic work19,53. Nevertheless, the high taxonomic completeness observed in many temperate regions should be interpreted with caution when such values are accompanied by low survey completeness, as in, e.g., Southern Europe, the East and West Coasts of North America, or the southwest edge of Australia, because the lack of surveys may be reflecting lower levels of the taxonomic effort that should be an integral part of standardized inventories.

The low availability of digitally accessible data arises as a fundamental issue for our knowledge of the geographical distribution of Orthoptera, especially in tropical regions and most of Asia. The pattern of low survey completeness observed here reflects a general scarcity of species records, which may be the result of either lack of inventories or lack of data mobilization, or both, arising from limited resources or other impediments for data digitization9,12. Indeed, such limited geographical coverage stands out when considering inventories with low taxonomic completeness. Both taxonomic completeness and survey completeness are affected by the challenges to Orthoptera taxonomy. Besides relying on up-to-date taxonomic keys to identify rare individuals from new samples, it is often also crucial to compare with type specimens or good descriptions57, which may not be possible, especially for species that were described long ago or that lack information on male genitalia (a crucial characteristic for the diagnosis of many orthopterans58,59). This is further complicated because many type specimens or descriptions are not accessible, comprehensive and up-to-date taxonomic keys are lacking60, and economic investment is low in many high-biodiversity regions9,61. The exceptions are pest species and some groups that have historically received more attention46. This may have added an additional geographical bias to the effort devoted to orthopteran inventories, as pest species tend to predominate in grassland environments46, which are more abundant at temperate latitudes. Thus, although the historical gaps in the study of Orthoptera in the tropics42 are likely caused by the generalized lack of resources for biodiversity research, they have also been exacerbated by the greater attention devoted by agricultural sciences to the orthopterans of temperate regions.

Strikingly, the high taxonomic and survey completeness in countries of the northern hemisphere is not always accompanied by high values of survey and temporal evenness, contradicting our hypothesis for these regions. This can be partly explained by the higher global availability of data on Orthoptera for certain regions, as seen in birds, mammals and amphibians9, in plants12, and also in multi-taxa analyses26. This tends to significantly increase the number of unique records for some local species and years already sampled, causing a significant imbalance in data evenness. Also, species records are subject to biases related to accessibility, as locations placed near to access routes (i.e., roads and rivers), research centers, conservation units, and large connected forest fragments are more frequently sampled9,27,28,62,63. Further, the low taxonomic completeness that is widespread in tropical regions can distort the observed patterns of survey and temporal evenness, as many recently described species from these regions may be present in digitally-accessible data only from their type specimens, while their presence in older records is still pending from their revision using an updated taxonomy. All these factors together can result in the absence of smooth geographical patterns of data equability observed in our large-scale maps.

Our measure of temporal completeness reveals that most digitally accessible species records were collected from a limited number of years (i.e., low richness of years), for both tropical and temperate regions. Such lack of years with records hinders our ability to comprehend the decline of Orthopteran diversity over time64, because the temporal coverage of sampling records is discontinuous and unevenly distributed along different locations. A wider temporal spread of records was only observed in some scattered areas throughout North America and Europe, similar to the historical distribution of sampling effort reported for other taxa9,61, thus reflecting that the higher financial and scientific support of the United States of America and several European countries65,66 extends to the field of Orthopterology. Furthermore, such capacity has been accompanied by the effects of scientific colonialism on the historical and current collection of data about Orthoptera, where taxonomic legacy has a clear impact on the global distribution of knowledge. Only recently have both biodiversity research in general, and depositions of type material in Museums in particular, started to be conducted mainly at local institutions in countries throughout the tropics65.

The observed pattern of temporal completeness may also reflect to some extent the deficiency in data digitization, especially in the tropics and parts of temperate Asia. Here, a digitization effort specifically tailored to retrieve data prior to the year 2000, originating from agricultural surveys, fieldwork or environmental impact reports, would complement the currently available knowledge. This would provide more comprehensive historical coverage and temporal evenness, given the chronological gaps in the distribution data and the low temporal completeness of information for many locations25,35,67.

Furthermore, the mapped values of different aspects of ignorance highlight that even temperate regions of the northern hemisphere, where we expected to have a good quality of biodiversity knowledge (i.e., high taxonomic, sampling, and temporal completeness, and high survey and temporal evenness), suffer from limited taxonomic refinement and inventory integrity, a limited diversity of years sampled, and an imbalance in the number of records among known species and years. This suggests that high ignorance values, whether in the tropics or in temperate regions, may result from a lack of digital access to a large portion of biological collections worldwide65,68,69. This represents a significant impediment to the advancement of scientific knowledge and should be rectified, at least in the collections of developed countries. Here, technological advances such as cameras, cell phones, and tablets can allow the recording of primary data by people who live and interact directly with biodiversity, such as those living in protected land, field biologists, and nature photographers70. However, despite such potential, many orthopterans can only be accurately identified after examining the male genitalia, which is rarely possible from in situ images uploaded to citizen science platforms. There is also still a significant lack of globally-available biogeographic knowledge coming from Orthoptera in databases such as EcoRegistros or iNaturalist (see Table 1).

Regardless of other alternative initiatives for gathering new data, there is clearly an urgent need to improve biodiversity data curation and to provide financial support to establish an open and collaborative data mobilization infrastructure in biological collections68,69,71, through either local policies in each country, or global agreements such as the Convention on Biological Diversity (https://www.cbd.int/convention/). This is especially important considering that accidents in collections can result in immeasurable losses72,73 that could be partially mitigated through digitized information. In addition, the already-existing platforms aimed at integrating and making literature data available need to pursue optimization initiatives. A good example is that of the Orthoptera Species File (http://orthoptera.archive.speciesfile.org/), which migrated thousands of data records to the Darwin Core standard (https://orthoptera.speciesfile.org/), the most comprehensive taxonomic standard for biodiversity data69,74.

Nevertheless, the solution to the above-mentioned problems goes beyond relying solely on bioinformatics75. Training and educating new taxonomists should also be a priority56,76, along with the development and updating of user-friendly identification keys for other scientists, and the general public. Extensive engagement of taxonomists in the development of online data availability initiatives7, and collaboration with citizen science platforms is also necessary, both by validating and curating species identifications77,78, and by applying these platforms in teaching and training activities79. All these initiatives will ultimately engage some sectors of the general public into higher-quality citizen biodiversity science80,81. Such measures will significantly contribute to the construction, security, and availability of open knowledge over time, gradually filling the gaps highlighted here, subsidizing studies in taxonomy, biogeography, and conservation3, as well as to conservation decision-making based on Orthopteran data by researchers in the future.

Lastly, it is also worth noting that assessments of ignorance for specific groups or at smaller study scales may show local differences from the general pattern observed at a global scale. Our analyses highlight the shortfalls of knowledge on the macroecological patterns that arise from the large-scale mechanisms governing nature82. But these ignorance maps can be also useful to address these data gaps within specific groups and/or at smaller regions.

In conclusion, we believe that this study offers an innovative and simplified framework that can guide future research on Orthoptera, and can also be replicated for other taxa, making the construction of ignorance maps more accessible. Nevertheless, more research on the factors that cause gaps and biases in digitally accessible Orthoptera data is needed. With this work, we hope we have provided a baseline for exploring future challenges as data become more extensive and available. Without robust and reliable data, our quest to uncover the ecological and evolutionary mechanisms driving biogeographical patterns and to develop effective approaches to conserve biodiversity in a changing environment will remain fundamentally compromised.

Methods

A global dataset of Orthoptera occurrence

We compiled data from 23 digital repositories storing primary information on the distribution of Orthoptera (Table 1). Additional information on the extraction process used for each repository can be accessed in the Supplementary Note 1. The primary records were divided into two distinct subsets: (1) occurrences encompassing species/subspecies-level data; and (2) occurrences extending beyond the species taxonomic level, up to the order level (Fig. 1b).

We also built a list of currently accepted species, alongside their corresponding taxonomic arrangements (synonyms) according to the classification proposed by the Orthoptera Species File catalog (OSF)48, as available on the Catalogue of Life platform (https://doi.org/10.48580/d4sw-388) in August 2021. Using the species list, we organized the taxonomic nomenclatural information for each species and subspecies across all repositories, excluding fossil species, and synonymized subspecies with species. The final species list includes 28,309 valid extant species (Supplementary Dataset).

We applied filters to identify and remove records that lacked the necessary information for subsequent analyses: (i) taxonomic adequacy filter: we used the valid species list (described above) to exclude occurrences that were not in accordance with the proposed taxonomic classification. For records without species names, we did not use taxonomic adequacy (see Data Quality analysis section); (ii) spatial filter: we removed occurrences without geographical coordinates, and also excluded records with only descriptive locations (e.g., municipality, state, country) or dubious coordinates; (iii) temporal filter: we kept only the occurrences that include the year of collection; and (iv) land boundaries filter: only georeferenced occurrences for the terrestrial part of the planet, including islands, were computed. To do this, we used the land boundaries provided by the Natural Earth platform at 1:10m scale (https://www.naturalearthdata.com/). Records that fell outside these areas were excluded (Fig. 1b).

Then, all occurrences were divided into two groups: species/subspecies occurrences and records with identification at a higher taxonomic level, where we applied two additional filters: (v) ambiguities filter—duplicate records were removed for species/subspecies and records above the species level, separately; and (vi) fossil filter—excluding all fossil species, which were not considered in our analyses.
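The filtering sequence for species-level records can be sketched as follows, in Python rather than the R used in the study. This is a minimal illustration under stated assumptions: the record fields (`name`, `lat`, `lon`, `year`, `fossil`), the species names and the `is_on_land` callback are hypothetical, and treating a duplicate as a repeated combination of accepted name, coordinates and year is an assumption, since the text does not define it.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


def filter_species_records(
    records: Iterable[Record],
    valid_names: Dict[str, str],
    is_on_land: Callable[[float, float], bool],
) -> List[Record]:
    """Apply filters (i)-(vi) to species-level records and return cleaned copies."""
    kept: List[Record] = []
    seen = set()
    for rec in records:
        # (i) taxonomic adequacy: map synonyms to accepted names, drop unknown names.
        accepted = valid_names.get(str(rec.get("name", "")))
        if accepted is None:
            continue
        # (ii) spatial filter: require numeric coordinates within plausible bounds.
        lat, lon = rec.get("lat"), rec.get("lon")
        if not isinstance(lat, (int, float)) or not isinstance(lon, (int, float)):
            continue
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            continue
        # (iii) temporal filter: require a year of collection.
        year = rec.get("year")
        if not isinstance(year, int):
            continue
        # (iv) land boundaries filter, delegated to a caller-supplied test.
        if not is_on_land(lat, lon):
            continue
        # (vi) fossil filter.
        if rec.get("fossil", False):
            continue
        # (v) duplicates: one record per name/coordinates/year (assumed definition).
        key = (accepted, lat, lon, year)
        if key in seen:
            continue
        seen.add(key)
        kept.append({"name": accepted, "lat": lat, "lon": lon, "year": year})
    return kept


# Hypothetical names: the second entry is a synonym of the first.
valid = {"Examplea prima": "Examplea prima", "Examplea antiqua": "Examplea prima"}
raw = [
    {"name": "Examplea prima", "lat": 10.5, "lon": 20.5, "year": 1999},
    {"name": "Examplea antiqua", "lat": 10.5, "lon": 20.5, "year": 1999},  # duplicate
    {"name": "Examplea prima", "lat": None, "lon": 20.5, "year": 1999},    # no latitude
    {"name": "Unknownus sp", "lat": 10.5, "lon": 20.5, "year": 1999},      # not in list
    {"name": "Examplea prima", "lat": 11.0, "lon": 21.0, "year": None},    # no year
    {"name": "Examplea prima", "lat": 0.0, "lon": -30.0, "year": 2005},    # "sea" below
]
# Toy land test for the example only; a real test would use land polygons.
cleaned = filter_species_records(raw, valid, lambda lat, lon: lon > 0)
```

In practice the land test would be a point-in-polygon check against the land boundary layer; it is left as a callback here to keep the sketch free of GIS dependencies.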

Indices of data quality

We calculated five indices of data quality: taxonomic completeness (TaxC); survey completeness (SC), based on species accumulation curves11,16,29; temporal completeness (TC); and survey and temporal evenness (SE and TE, respectively), both of them calculated using a Hill-Diversity method slightly adapted by us36,83. All indices were computed based on the information for terrestrial grid cells at 1° resolution and then plotted into global maps (Fig. 1c, d).
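Assigning records to 1° grid cells could look like the Python sketch below. The indexing scheme (rows and columns counted from latitude -90 and longitude -180) and the function name `cell_id` are illustrative assumptions, and the land mask applied in the study is not reproduced.

```python
import math
from collections import defaultdict
from typing import Tuple


def cell_id(lat: float, lon: float, size_deg: float = 1.0) -> Tuple[int, int]:
    """Return (row, col) of the grid cell containing a point."""
    n_rows = int(180 / size_deg)
    n_cols = int(360 / size_deg)
    # Points exactly on the northern or eastern edge are clamped to the last cell.
    row = min(int(math.floor((lat + 90.0) / size_deg)), n_rows - 1)
    col = min(int(math.floor((lon + 180.0) / size_deg)), n_cols - 1)
    return row, col


# Made-up coordinates: the first two points fall in the same 1-degree cell.
points = [(48.2, 16.4), (48.9, 16.1), (-33.9, 151.2)]
cells = defaultdict(list)
for lat, lon in points:
    cells[cell_id(lat, lon)].append((lat, lon))
```

Every index described in the following subsections would then be computed from the records grouped under each cell key.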

Taxonomic completeness

For each cell (pi), we calculated taxonomic completeness (TaxC) as the number of primary occurrences identified at the species taxonomic level (s) divided by the total number of records (t):

$${\rm{TaxC}}=[{\rm{s}}({\rm{pi}})]/[{\rm{t}}({\rm{pi}})]$$
(1)

The total number of records (t) corresponds to the sum of all species records (s) and all identifications available only above the species level (i.e., genus, tribe, family, or order) (Supplementary Fig. 3). TaxC values range from 0 to 1: TaxC = 0 means that none of the occurrences are identified at the species level, thus indicating a large Linnean shortfall, whereas values close to 1 indicate that most records are identified to the species taxonomic level, thus corresponding to a well-sampled cell from a taxonomic point of view. Thus, the TaxC index allows us to assess, at least in part, the geographical distribution of the extent of the Linnean shortfall, based on the data quality in relation to taxonomic refinement of occurrences (Fig. 1c).
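As a worked illustration of Eq. (1), the Python sketch below computes TaxC for one cell from its two record counts. The function name and the counts are made up for the example.

```python
def taxonomic_completeness(n_species_level: int, n_above_species: int) -> float:
    """TaxC = s / t, where t = s + records identified only above species level."""
    total = n_species_level + n_above_species
    if total == 0:
        raise ValueError("cell has no records")
    return n_species_level / total


# Made-up cell: 80 records identified to species, 20 only to genus or above.
taxc = taxonomic_completeness(80, 20)  # 80 / 100
```

A cell with no species-level records yields 0, and a cell where every record is identified to species yields 1.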

Survey completeness

We calculated survey completeness (SC) for each cell based on species accumulation curves, where the final slope of each curve represents the rate of accumulation of new species in the inventory29. The order of entrance of records in the curves was smoothed by running 100 random permutations using the function specaccum from the R package vegan84, and the final slope was calculated as the average difference between the number of species observed at the final and next-to-last records. This final slope was subtracted from 1, thus obtaining completeness values ranging from 0 to 1. SC was computed in the R environment using the function Accum_curve provided in Tessarolo et al. 16. Values of SC close to 1 indicate an almost complete inventory (Fig. 1c).
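The logic of the final-slope calculation can be sketched in plain Python. This is not the specaccum or Accum_curve code used in the study: it is a simplified stand-in that shuffles the records of one cell, measures how many species the last record adds, and averages that over the permutations. Function names and the seed are illustrative.

```python
import random
from typing import Sequence


def final_slope(records: Sequence[str], n_perm: int = 100, seed: int = 0) -> float:
    """Mean number of species added by the last record over random orderings."""
    if len(records) < 2:
        raise ValueError("need at least two records")
    rng = random.Random(seed)
    pool = list(records)
    added = 0
    for _ in range(n_perm):
        rng.shuffle(pool)
        # The last record adds a species only if that species is absent
        # from all earlier records in this ordering.
        added += len(set(pool)) - len(set(pool[:-1]))
    return added / n_perm


def survey_completeness(records: Sequence[str], n_perm: int = 100, seed: int = 0) -> float:
    """SC = 1 - final slope of the permuted accumulation curve."""
    return 1.0 - final_slope(records, n_perm, seed)


# Each entry is the species name of one record in a hypothetical cell.
sc_complete = survey_completeness(["a", "a", "b", "b", "c", "c"])
sc_empty = survey_completeness(["a", "b", "c", "d"])
```

When every species has at least two records the last record never adds a species and SC is 1; when every record is a different species the last record always adds one and SC is 0.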

Temporal completeness

We measured the temporal completeness (TC) at each cell (pi) based on the number of sampled years divided by the highest range of sampled years found in our database (Supplementary Fig. 4), that is,

$${\rm{TC}}=[{\rm{S}}({\rm{pi}})]/[{\rm{S}}({\rm{pi}})\max ]$$
(2)

This index ranges from values very close to 0 to 1, with values near 1 indicating maximum temporal coverage of the period covered by the surveys (Fig. 1c).
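A minimal Python sketch of Eq. (2) follows. It reads the denominator as the largest number of sampled years found in any cell, which is one interpretation of the description above; the cell labels and years are made up.

```python
from typing import Dict, Iterable


def temporal_completeness(years_by_cell: Dict[str, Iterable[int]]) -> Dict[str, float]:
    """TC per cell: distinct sampled years divided by the maximum across cells."""
    n_years = {cell: len(set(years)) for cell, years in years_by_cell.items()}
    max_years = max(n_years.values())
    if max_years == 0:
        raise ValueError("no cell has dated records")
    return {cell: n / max_years for cell, n in n_years.items()}


# Collection years of the records in two hypothetical cells.
tc = temporal_completeness({
    "cell_A": [1990, 1991, 1991, 2000, 2010],  # 4 distinct years
    "cell_B": [2005, 2005],                    # 1 distinct year
})
```

Here the best-covered cell scores 1 by construction, and every other cell is expressed relative to it.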

Survey evenness

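We measured survey evenness (SE) in each cell using the Hill diversity model36, where p_i is the proportion of records in the cell that belong to species i, S is the number of species recorded in the cell, and q is the order of diversity:

$${\rm{SE}}={\left(\mathop{\sum }\limits_{{\rm{i}}=1}^{{\rm{S}}}{p}_{{\rm{i}}}^{{\rm{q}}}\right)}^{1/(1-{\rm{q}})}$$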
(3)

This index is less sensitive to variations in sample size than classical evenness indices such as Shannon or Simpson36,83. Here we used the parameter q = 4 to assign greater sensitivity to equity for the Hill diversity index. To scale the index between 0 and 1, we divided the Hill index values by the number of species within each grid cell. Values of SE = 0 indicate a maximum bias of occurrence abundance for some species, whereas SE = 1 indicates no local bias in the distribution of the numbers of records per species. Thus, SE values allow us to detect possible imbalances in the observed species abundances, which can be interpreted as a measure of bias in community data. It is important to note that we did not extrapolate our interpretations of SE to infer whether the community is more uniform or not, but rather to evaluate the potential bias of species occurrences within our inventory (Fig. 1c).
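The scaled index can be sketched in Python as below: a Hill number of order q = 4 computed from the records per species, divided by the number of species. The function names and record lists are illustrative, and the sketch does not reproduce the adapted method used in the study.

```python
from collections import Counter
from typing import Iterable, Sequence


def hill_number(counts: Sequence[int], q: float = 4.0) -> float:
    """Hill diversity of order q (q != 1) from per-category counts."""
    if q == 1:
        raise ValueError("q = 1 is a limiting case not handled in this sketch")
    total = sum(counts)
    if total == 0:
        raise ValueError("no records")
    return sum((c / total) ** q for c in counts if c > 0) ** (1.0 / (1.0 - q))


def survey_evenness(species_per_record: Iterable[str], q: float = 4.0) -> float:
    """Hill number of records per species, divided by the number of species."""
    counts = list(Counter(species_per_record).values())
    return hill_number(counts, q) / len(counts)


# Balanced inventory: three species with two records each.
se_even = survey_evenness(["a", "a", "b", "b", "c", "c"])
# Skewed inventory: one species holds 97 of 100 records.
se_skewed = survey_evenness(["a"] * 97 + ["b", "c", "d"])
```

With perfectly balanced records the Hill number equals the number of species and the scaled value is 1. Under this scaling the value cannot fall below 1/S, so values close to 0 only arise in cells with many species.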

Temporal evenness

To calculate temporal evenness, we used the same Hill diversity model mentioned above, with parameter q = 4. However, we considered the years as “species” and the number of samples per year as “abundance”. This allowed us to measure the bias in the number of samples over the years, considering the entire known temporal range of each cell. Values of TE close to 0 indicate maximum temporal bias, with occurrence samples concentrated in very few years, whereas values of TE = 1 indicate no local bias in the distribution of samples per year, meaning that each year holds the same number of samples (Fig. 1c).
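The same calculation with years in the role of species can be sketched in Python. Dividing by the number of sampled years (rather than by every year in the cell's temporal range) is an assumption of this sketch, and the year lists are made up.

```python
from collections import Counter
from typing import Iterable


def temporal_evenness(record_years: Iterable[int], q: float = 4.0) -> float:
    """Hill number (order q != 1) of records per year, divided by sampled years."""
    counts = Counter(record_years)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no dated records")
    hill = sum((n / total) ** q for n in counts.values()) ** (1.0 / (1.0 - q))
    return hill / len(counts)


# Five records in each of four years.
te_even = temporal_evenness([2001, 2002, 2003, 2004] * 5)
# Seventeen of twenty records concentrated in a single year.
te_skewed = temporal_evenness([2001] * 17 + [2002, 2003, 2004])
```

Note that a cell sampled in a single year scores 1 under this scaling, so TE is best read alongside temporal completeness.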

Finally, we highlight that a thorough analysis of Wallacean and Linnean noise is beyond the scope of this study. Here, regardless of the source of the records and the authors who gathered and/or identified the specimens, we assume that the taxonomic identifications, geographic localities, and dates are correct after our filtering processes (see filters in methods). Thus, the quality and coverage of the distribution data may not be free from errors, and we recommend that they be handled and used with caution. Additionally, we recommend reading Ronquillo et al. 67 and Castro-Souza et al. 85 for a more systematic view of these issues.

Assessing the sensitivity of data quality indices to low numbers of records

Small sample sizes may lead to overestimates of taxonomic completeness, add noise to species accumulation curves by generating artificial completeness values7,8,16, and yield unrepresentative values of temporal coverage and of survey and temporal evenness because of the low numbers of records. Therefore, it is necessary to establish a minimum number of records per grid cell for all index calculations. As the thresholds used in data quality analysis can be subjective, potentially introducing noise into the ignorance map16, we tested different thresholds to evaluate how the minimum number of occurrences affects the index values. For TaxC, we used no threshold, 5, 10, 15, 25, 35, 45, 55, 65, and 75 total occurrences as threshold values. For SC, we used 10, 15, 25, 35, 45, 55, 65, and 75 species occurrences as thresholds. Lower values were not tested because of the long processing time and because very small values can lead to misleading estimates11,16. For TC, SE, and TE, we also used no threshold, 5, 10, 15, 25, 35, 45, 55, 65, and 75 species occurrences. Additionally, for SE and TE, we used q = 4 because we were interested in placing greater weight on evenness while also considering species or year richness in our model. Finally, we calculated density curves from the maps constructed with the different thresholds for each index, which allowed us to observe how each index behaved under the different threshold configurations. Based on these results (density curves shown in Fig. 4, maps not shown), we selected a common threshold of 45 occurrences for all index calculations; grid cells with fewer than 45 occurrences were thus discarded as holding too little information. All analyses mentioned above were performed in R86.
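The threshold test described above can be illustrated with a short Python sketch. It is not the R workflow used in this study: the cell data, the function names, and the use of a simple relative-frequency histogram as a stand-in for a density curve are all assumptions made for illustration, and survey evenness is used as the example index.

```python
from collections import Counter

# Threshold values listed in the text; 0 stands for "no threshold".
THRESHOLDS = [0, 5, 10, 15, 25, 35, 45, 55, 65, 75]


def scaled_evenness(counts, q=4.0):
    """Hill number of order q divided by the number of categories."""
    total = sum(counts)
    hill = sum((c / total) ** q for c in counts) ** (1.0 / (1.0 - q))
    return hill / len(counts)


def index_by_threshold(cells, thresholds=THRESHOLDS):
    """Map each threshold to the index values of the cells that reach it."""
    values = {}
    for t in thresholds:
        values[t] = [
            scaled_evenness(list(Counter(records).values()))
            for records in cells.values()
            if len(records) >= max(t, 1)  # empty cells are always skipped
        ]
    return values


def relative_histogram(values, bins=10):
    """Relative frequencies over equal-width bins on [0, 1] (crude density proxy)."""
    freq = [0] * bins
    for v in values:
        freq[min(int(v * bins), bins - 1)] += 1
    return [f / len(values) for f in freq] if values else freq


# Hypothetical grid cells: one species name per occurrence record.
cells = {
    "cell_1": ["sp_a"] * 3,
    "cell_2": ["sp_a"] * 30 + ["sp_b"] * 20,
    "cell_3": ["sp_a"] * 50 + ["sp_b"] * 25 + ["sp_c"] * 5,
}
by_threshold = index_by_threshold(cells)
curves = {t: relative_histogram(v) for t, v in by_threshold.items()}
```

In this toy example, the cell with only three records of a single species obtains a perfect evenness score when no threshold is applied, which is the kind of artificial value that a minimum number of records is meant to exclude.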

Fig. 4: Density curves of the values of the data quality indices generated by different threshold configurations for taxonomic, survey and temporal completeness, and survey and temporal evenness.

a density curves for taxonomic completeness. b density curves for survey completeness. c density curves for temporal completeness. d density curves for survey evenness. e density curves for temporal evenness.

Building a composite ignorance map

Different methods have investigated various aspects of the ignorance about the distribution of biodiversity, as conceptualized by Rocchini et al. 14 and Ladle and Hortal15. These authors build on the idea, initially proposed by Boggs13, of creating a global ignorance atlas that allows lesser-known areas to be identified for a specific subject of investigation (e.g., political, social, or biological). While Ruete87 proposed to assess the spatial availability of primary biodiversity records through ignorance scores, Stropp et al. 11 focused on mapping the spatial and temporal biases of information, and Meyer et al. 12 provided a framework for analyzing gaps, uncertainties, and biases in taxonomic, spatial, and temporal information. More recently, Tessarolo et al. 16 incorporated explicit measures of coverage into these dimensions to provide measures of the data-driven uncertainty in the projections of species distribution models. Here we propose a different, simplified and easy-to-use framework for the rapid quantification of the overall taxonomic, spatial and temporal ignorance for any biological group, using the order Orthoptera as an example.

Our composite map depicts areas of biodiversity ignorance as the ensemble of the five indices described above (i.e., TaxC + SC + TC + SE + TE) (Supplementary Figs. 5–9), rescaled to range from 0 to 1 and finally subtracted from 1 (Fig. 1d, e). Here, values close to 1 indicate maximum ignorance (i.e., extremely poor data quality), and values close to 0 indicate minimum ignorance (i.e., good data quality). By mapping these values, the ignorance map allows us to identify issues of data quality, coverage and completeness, as it provides a quantitative indication of the composite deficiencies in taxonomic, survey and temporal completeness, plus biases in the species abundance and years sampled (Fig. 2).
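The combination of the five indices can be sketched as follows in Python. This is an illustration and not the R code used in this study: the function name and the cell values are hypothetical, and the sketch assumes that the rescaling to the 0 to 1 range is a min-max rescaling across grid cells (dividing the sum by five would be another possible reading).

```python
def ignorance_scores(indices_by_cell):
    """Return cell id -> ignorance score in [0, 1].

    indices_by_cell maps a cell id to its (TaxC, SC, TC, SE, TE) values.
    """
    sums = {cell: sum(values) for cell, values in indices_by_cell.items()}
    lowest, highest = min(sums.values()), max(sums.values())
    if highest == lowest:
        raise ValueError("min-max rescaling needs at least two distinct sums")
    # Assumption: min-max rescaling of the summed indices across cells,
    # then subtraction from 1 so that high values mean high ignorance.
    return {
        cell: 1.0 - (total - lowest) / (highest - lowest)
        for cell, total in sums.items()
    }


# Hypothetical index values (TaxC, SC, TC, SE, TE) for three grid cells.
example = {
    "well_known": (0.9, 0.8, 0.9, 0.7, 0.8),
    "intermediate": (0.5, 0.5, 0.5, 0.5, 0.5),
    "poorly_known": (0.1, 0.2, 0.1, 0.3, 0.2),
}
scores = ignorance_scores(example)
```

Under this assumption the scores are relative to the set of cells analysed: the best-documented cell always receives 0 and the worst-documented cell always receives 1.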