Mapping knowledge gaps in marine diversity reveals a latitudinal gradient of missing species richness

A reliable description of any spatial pattern in species richness requires accurate knowledge about species geographical distribution. However, sampling bias may generate artefactual absences within species range and compromise our capacity to describe biodiversity patterns. Here, we analysed the spatial distribution of 35,000 marine species (varying from copepods to sharks) to identify missing occurrences (gaps) across their latitudinal range. We find a latitudinal gradient of species absence peaking near the equator, a pattern observed in both shallow and deep waters. The tropical gap in species distribution seems a consequence of reduced sampling effort at low latitudes. Overall, our results suggest that spatial gaps in species distribution are the main cause of the bimodal pattern of marine diversity. Therefore, only increasing sampling effort at low latitudes will reveal if the absence of species in the tropics, and the consequent dip in species richness, are artefacts of sampling bias or a natural phenomenon.


Supplementary Note 1 | Latitudinal range of marine species
To evaluate the range contiguity assumption for marine species at the scale of our analysis (5° latitudinal bands), we quantified spatial gaps in the latitudinal distribution of relatively well-known species using range maps provided by the International Union for Conservation of Nature (IUCN). The IUCN range maps are based not only on point-occurrence records, but also on expert knowledge of species biology and habitat preference. We measured latitudinal gaps in the geographical distribution of all Scombridae, Elasmobranchii and coral species available in the IUCN repository (http://www.iucnredlist.org). Unfortunately, these are the only marine groups available in the IUCN repository among the taxonomic groups in our study.
We also evaluated the range cohesion of marine species using estimates from species distribution models, which are less sensitive to geographical gaps in sampling effort and have been considered a good alternative for estimating complete species ranges 1. All the modeled range maps were retrieved from AquaMaps (https://www.aquamaps.org), which provides standardized range maps for marine species based on environmental tolerances with respect to depth, salinity, temperature and primary production. Most species with range maps available in the IUCN repository also had estimated ranges in AquaMaps, except for corals (Scombridae: 98%, Elasmobranchii: 71%, Corals: 0.8%).
Our results show that range contiguity is the most common distribution pattern in nature (Supplementary Fig. 1). Only 10.7% of species have disjunct latitudinal ranges according to IUCN's range maps, and this estimate drops to only 2.7% according to species distribution models. Conversely, according to OBIS occurrence records, 57.3% of species have range discontinuities at 5° latitudinal resolution. It is also noteworthy that the equatorial dip in species diversity tends to decrease, or even disappear, when using range maps (Supplementary Fig. 1). Therefore, our results suggest that the geographical distribution of most marine species tends to be contiguous across space.
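The gap count at 5° resolution described above can be illustrated with a minimal sketch. It assumes that a gap is any unoccupied 5° band lying between the southernmost and northernmost occupied bands of a species; the function names and the example latitudes are hypothetical and are not taken from the study.

```python
import math

BAND_WIDTH = 5  # degrees of latitude, matching the 5° bands used in the analysis


def band_index(lat, width=BAND_WIDTH):
    """Return the index of the latitudinal band containing `lat` (0 = 90°S)."""
    # Clamp 90°N into the last band so the pole does not get a band of its own.
    return min(math.floor((lat + 90) / width), int(180 / width) - 1)


def latitudinal_gaps(latitudes, width=BAND_WIDTH):
    """Return the southern edges of unoccupied bands inside a species' range."""
    occupied = {band_index(lat, width) for lat in latitudes}
    if not occupied:
        return []
    lo, hi = min(occupied), max(occupied)
    return [b * width - 90 for b in range(lo, hi + 1) if b not in occupied]


# Hypothetical species: recorded near 22°S and at 18°N, with nothing in between.
records = [-22.0, -21.5, 18.0]
print(latitudinal_gaps(records))  # → [-20, -15, -10, -5, 0, 5, 10]
```

A species with an empty result has a contiguous range at this resolution; any non-empty result counts it among the species with range discontinuities.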

Supplementary Note 2 | Bathymetric distribution of marine species records
Records from relatively shallow waters (0-2,000 m) represent 89.23% of all records analysed in our study. Considering only the records with depth information and within our range of analysis (0-6,500 m), euphotic and bathyal strata represent 99% of all records (euphotic: 87.97%; bathyal: 11.10%), meaning that less than 1% of the information came from the deep sea, the largest habitat on Earth. The maximum sampling effort per latitudinal band for any taxon, for example, decreased by two orders of magnitude toward deeper strata (euphotic: 59,288; bathyal: 14,775; abyssal: 313). Consequently, the estimates of inventory completeness were, on average, lower in the deep sea than in shallow waters, especially in the southern hemisphere (see Fig. 3f in the main text). The total number of species and the number of species with gaps in their latitudinal range also decreased 4.78-fold and 4.74-fold, respectively, from the euphotic to the deep-sea (abyssal) stratum (total species and species with gaps, respectively: euphotic: 26,538 and 8,262; bathyal: 15,828 and 4,735; abyssal: 5,551 and 1,744).

Supplementary Note 3 | Standardizing sample size and completeness
Because tropical marine biota is undersampled, a direct comparison of species richness across latitudinal bands is not appropriate. However, two strategies may be employed to overcome this problem. First, it is possible to standardize the sample size by randomly subsampling the records across latitudinal bands to the same level as observed in the tropics. We repeated this procedure 1,000 times and calculated the average latitudinal variation in species richness and completeness estimate expected in a scenario of homogeneous sampling effort across the globe. Note that we did not standardize the number of records, but the number of sampling events. Therefore, the number of records per sampling event was not altered. Polar latitudes were not subsampled because their sampling intensity is already below the tropical threshold. Second, it is also possible to standardize the sampling coverage (interpolation) 2. Using the iNEXT R package 3, we reduced the sampling effort across latitudes to the lowest completeness estimate observed in the tropics (between -15° and 15°). Additionally, we extrapolated tropical species richness. It is noteworthy that such extrapolation should be extended only to a doubling of the reference sample size. Beyond that level the extrapolation could create unreliable estimates. For this reason, we did not extrapolate the species richness for all latitudes until estimated total completeness. Instead, we used the extrapolation to achieve a higher standardized sampling coverage 2.
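The first strategy, subsampling sampling events rather than individual records, can be sketched as follows. This is a minimal illustration under stated assumptions: each sampling event is represented as a set of species names, events are drawn without replacement, and bands at or below the threshold are returned unchanged. The function name, its parameters and the fixed seed are hypothetical.

```python
import random
from statistics import mean


def subsampled_richness(events, threshold, n_iter=1000, seed=0):
    """Mean species richness after repeatedly drawing `threshold` sampling events.

    `events` is a list of sets; each set holds the species recorded in one
    sampling event, so the records within an event always stay together.
    """
    if not events:
        return 0.0
    if len(events) <= threshold:
        # Bands already at or below the threshold are not subsampled.
        return float(len(set().union(*events)))
    rng = random.Random(seed)
    richness = []
    for _ in range(n_iter):
        drawn = rng.sample(events, threshold)  # draw events without replacement
        richness.append(len(set().union(*drawn)))
    return mean(richness)


# Hypothetical band with three sampling events, reduced to a threshold of two.
band = [{"sp_a"}, {"sp_a", "sp_b"}, {"sp_c"}]
print(subsampled_richness(band, threshold=2))
```

Drawing whole events keeps the number of records per event untouched, which is the property the text emphasises.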
Our results show that standardization of sampling effort has a great impact on the latitudinal pattern in marine species richness. For most groups, when sampling effort is equalized, the number of species at the species-rich mid-latitudes drops to a value similar to or lower than that recorded near the equator (Supplementary Fig. 2). Interestingly, if this reduced sampling effort were real, many species would not have been recorded at mid-latitudes. The reduction in sampling effort and species richness also altered the completeness estimate. However, the changes were more modest at higher latitudes, especially for sample coverage. This result suggests that while many tropical species are rare (few records), species at high latitudes are equally abundant and well represented, keeping the completeness estimate relatively high even under reduced sampling effort. The latitudinal difference in completeness estimate despite standardized sample size also reinforces that a homogeneous sampling effort could produce a heterogeneous latitudinal gradient in inventory completeness. Likewise, heterogeneous sampling effort could also produce homogeneous inventory completeness. Therefore, species-rich areas require more sampling effort to reach high completeness, especially if most species are rare or spatially restricted.
The results of coverage-based standardization also showed that species richness tends to be higher near the equator, for both interpolation and extrapolation scenarios (Supplementary Fig. 17-18). Indeed, for some groups, the extrapolated species richness was even higher in the tropical dip than around it (see for example Ophiuroidea, Amphipoda, and Porifera in Supplementary Fig. 17). Interestingly, the extrapolation also revealed that doubling the sample size and increasing the completeness estimate had a greater impact on tropical species richness, but barely affected the diversity at higher latitudes. Our results suggest that species richness should increase rapidly with additional sampling effort in the tropics, but substantial effort would be necessary to reach the same level of completeness as currently exists at higher latitudes.

Supplementary Note 4 | OBIS representativeness and inventory completeness estimate
To evaluate whether data retrieved from OBIS are representative of current knowledge of global patterns of marine biodiversity, we compiled ophiuroid data from nine additional data repositories worldwide. All retrieved data were submitted to the same quality control procedures described in the main text. OBIS records represented 57.81% of all valid records in the full dataset (Supplementary Table 1). We then calculated independently for each dataset the number of observed species (richness observed), number of absent species (spatial gaps), number of sampling events, and estimated inventory completeness for each 5° latitudinal band.
Following Stropp et al. 4, we employed three methods to estimate inventory completeness.
First, we applied the method of Sousa-Baena et al. 5, in which Ci denotes the inventory completeness for the latitudinal band i. The completeness estimate cannot be calculated when the parameter b is not found.
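For orientation only, and as an assumption rather than a restatement of the authors' equation, the completeness ratio attributed to Sousa-Baena et al. is often written with a Chao-type expected richness. In the notation below, S_obs is the observed richness in band i, and a and b are taken to be the numbers of species recorded exactly once and exactly twice; these symbol definitions are assumed here.

```latex
C_i = \frac{S_{\mathrm{obs},i}}{S_{\mathrm{exp},i}},
\qquad
S_{\mathrm{exp},i} = S_{\mathrm{obs},i} + \frac{a_i^{2}}{2\, b_i}
```

Under this assumed form the ratio is undefined whenever b_i = 0, which would be consistent with the remark that the estimate cannot be calculated when b is not found.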
The second method to estimate inventory completeness uses sample coverage 2. For each latitudinal band we quantified the total number of records (n) and the number of species observed in only one (f1) or two (f2) sampling events. Sample coverage can be calculated by:
$$K_i = 1 - \frac{f_1}{n}\left[\frac{(n-1)\,f_1}{(n-1)\,f_1 + 2 f_2}\right]$$
where Ki is the estimated inventory completeness of latitudinal band i. Both C and K range from zero to one, with one indicating a complete inventory.
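The coverage calculation can be sketched in a few lines. This is a minimal illustration that assumes the widely used Chao and Jost form of the sample coverage estimator, expressed in n, f1 and f2; the function name and the handling of the degenerate single-record case are hypothetical choices, not taken from the study or from iNEXT.

```python
def sample_coverage(n, f1, f2):
    """Sample coverage from the record count n and the counts of species
    seen in exactly one (f1) or exactly two (f2) sampling events."""
    if n <= 0:
        raise ValueError("n must be positive")
    if f1 == 0:
        return 1.0  # no species seen only once: the formula gives full coverage
    denom = (n - 1) * f1 + 2 * f2
    if denom == 0:
        # Only reachable with a single record; returning 0 is an assumed convention.
        return 0.0
    return 1.0 - (f1 / n) * ((n - 1) * f1 / denom)


# Hypothetical band: 100 records, 10 species seen once, 5 species seen twice.
print(round(sample_coverage(100, 10, 5), 3))  # → 0.901
```

Values close to one indicate that further sampling is unlikely to add many new species to the band.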
Finally, we also estimated inventory completeness by the curvilinearity of smoothed species accumulation curves (SACs) 6,7. Smoothed SACs were calculated by the 'exact' method (function 'specaccum' in the R package vegan 8). The average slope of the last 10% of the SAC obtained for each latitudinal band reflects the degree of curvilinearity, and was used to estimate the inventory completeness (ri). A flat slope indicates saturation in the sampling (high inventory completeness), but produces an ri value close to zero. We therefore calculated 1 - ri to convert the values to a scale from zero to one, in which one indicates high inventory completeness 4.
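The slope-based estimate can be sketched in Python rather than R. The sketch assumes that the 'exact' curve corresponds to the standard sample-based rarefaction expectation, computed analytically from the number of sampling units in which each species occurs, and that the average slope is taken over the final 10% of sampling units. Function names and example data are hypothetical, and the sketch does not handle slopes above one species per unit, which would give negative values.

```python
from math import comb


def exact_sac(incidence_counts, n_units):
    """Expected richness for 1..n_units sampling units, without permutations.

    `incidence_counts[i]` is the number of sampling units containing species i.
    """
    curve = []
    for m in range(1, n_units + 1):
        total = comb(n_units, m)
        # A species is missed when all m units come from the units lacking it.
        expected = sum(1 - comb(n_units - k, m) / total for k in incidence_counts)
        curve.append(expected)
    return curve


def completeness_from_slope(curve, fraction=0.1):
    """One minus the mean slope over the last `fraction` of the curve."""
    steps = max(1, round(len(curve) * fraction))
    if len(curve) <= steps:
        raise ValueError("curve too short for the requested fraction")
    slope = (curve[-1] - curve[-1 - steps]) / steps
    return 1 - slope


# Hypothetical band: 20 sampling units, three common species and two rare ones.
curve = exact_sac([20, 15, 12, 1, 1], n_units=20)
print(completeness_from_slope(curve))
```

A curve that has levelled off yields a value near one, while a curve still rising by about one species per sampling unit yields a value near zero.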
We used Pearson's product-moment correlation to evaluate the congruence of latitudinal gradients in species richness, spatial gaps and sampling effort estimated using only OBIS and the full dataset (OBIS + additional datasets). Our results revealed a strong correspondence between the two datasets (Supplementary Table 2), suggesting that additional datasets are not substantially different from OBIS (Supplementary Fig. 1). Estimates of inventory completeness are also not affected by the dataset, suggesting that OBIS can be used in isolation to estimate the latitudinal pattern in inventory completeness (Supplementary Table 2).
Estimates using the Sousa-Baena et al. 5 method were less strongly correlated between the two datasets than the other two completeness estimates. In addition, the monotonic relationship between completeness estimate and number of records seems more unstable under the Sousa-Baena et al. 5 method (Supplementary Fig. 22), suggesting that this method is more susceptible to artefactual values of C in latitudinal bands with a small number of sampling records 4,5. Because many latitudinal bands have few sampling records of some taxonomic groups, we abandoned the Sousa-Baena et al. 5 method in all further analyses.

Supplementary Note 5 | Data cleaning procedures
Over half of our initial dataset retrieved from OBIS (and additional repositories for Ophiuroidea) was eliminated during the data cleaning process (65.82%; Supplementary Table 3). One third (33.81%) of excluded records were duplicates, and a quarter (26.51%) lacked species-level identification. Although exclusion of duplicates has no effect on the presence of latitudinal gaps in species distribution, the effect of excluding records with missing species identification requires further investigation. In fact, a predominance of such records in tropical waters could indicate that tropical species absence is caused by a lack of taxonomic expertise rather than by low sampling effort. To investigate this possibility, we studied the latitudinal distribution of records lacking species-level identification. The data used in this analysis were retrieved on 30 May 2018, and are not identical to the dataset used in our main analyses, which was retrieved months earlier.
Comparison between the two datasets shows an increase of 7.1% in the total number of records, but an equivalent proportion of records missing species-level identification (29.97%). Most records with unidentified species are located in well-sampled mid-latitudes (Supplementary Fig. 20). While equatorial data (between -5° and 5°) comprise only 2.49% of all records without identification, 29.4% of records lacking species-level identification are located between 50° and 60°. Thus, exclusion of records lacking species-level identification during the data cleaning process does not seem to cause spatial gaps in species' ranges.
Another potential cause of spatial gaps at low latitudes could be the removal of records with both coordinates equal to zero. Such records were eliminated because the 0-0 location is probably auto-filled by computers when coordinate fields are left blank 9. Because there are not many records at the 0-0 location (Supplementary Table 4), the removal of these records should have minimal impact on the latitudinal pattern of species absence. However, the 0-0 location is an actual marine location in the Atlantic Ocean (Gulf of Guinea) and some of the excluded records at this location could potentially represent accurate records. Thus, we identified all species with records at the 0-0 location, and then quantified the proportion of species with range gaps. We found that less than 1% of the species with spatial gaps had any excluded 0-0 location record (Supplementary Table 4). In addition, most of the records for benthic taxa indicate an unrealistically shallow depth. While