Measuring protected-area effectiveness using vertebrate distributions from leech iDNA

Ji, Yinqiu; Baker, Christopher C. M.; Popescu, Viorel D.; Wang, Jiaxin; Wu, Chunying; Wang, Zhengyang; Li, Yuanheng; Wang, Lin; Hua, Chaolang; Yang, Zhongxing; Yang, Chunyan; Xu, Charles C. Y.; Diana, Alex; Wen, Qingzhong; Pierce, Naomi E.; Yu, Douglas W.

doi:10.1038/s41467-022-28778-8

Article
Open access
Published: 23 March 2022

Nature Communications volume 13, Article number: 1555 (2022)

Abstract

Protected areas are key to meeting biodiversity conservation goals, but direct measures of effectiveness have proven difficult to obtain. We address this challenge by using environmental DNA from leech-ingested bloodmeals to estimate spatially-resolved vertebrate occupancies across the 677 km² Ailaoshan reserve in Yunnan, China. From 30,468 leeches collected by 163 park rangers across 172 patrol areas, we identify 86 vertebrate species, including amphibians, mammals, birds and squamates. Multi-species occupancy modelling shows that species richness increases with elevation and distance to reserve edge. Most large mammals (e.g. sambar, black bear, serow, tufted deer) follow this pattern; the exceptions are the three domestic mammal species (cows, sheep, goats) and muntjak deer, which are more common at lower elevations. Vertebrate occupancies are a direct measure of conservation outcomes that can help guide protected-area management and improve the contributions that protected areas make towards global biodiversity goals. Here, we show the feasibility of using invertebrate-derived DNA to estimate spatially-resolved vertebrate occupancies across entire protected areas.

Introduction

In 2010, the signatories to the Convention on Biological Diversity (CBD) agreed to the twenty Aichi Biodiversity Targets for 2011–2020¹. Aichi Target 11 concerns the safeguarding of biodiversity, and sets the goal of placing 17% of terrestrial and inland water habitats into a system of protected areas (e.g. national parks and other reserves) that is ecologically representative, well-connected, equitably managed, and effective. The world has nearly achieved the areal goal, with 15% of global land area protected under national jurisdiction^2,3,4. Contributing to this total, China, a CBD signatory, has placed 15% (1.43 million km²) of its own land area into a reserve system^5,6.

China’s reserve system demonstrates considerable institutional capacity for achieving Aichi Target 11. In western China, for example, the reserves cover most ecoregions, biodiversity priority areas, and natural vegetation types⁷. Landsat imagery shows that the reserves successfully prevent deforestation⁸. But in southern and eastern China, the reserves are not so ecologically representative⁹, many reserves are isolated⁷, there is little information on the impact of reserves on local human populations and, most importantly, we know little about whether the reserves are effective at protecting their biodiversity.

Measuring the effectiveness of protected areas is challenging. Worldwide, it has proven so difficult to assess directly whether protected areas are achieving positive biodiversity outcomes that a recent review deemed their efficacy ‘unknown’⁴. Indirect measures, such as evaluations of staffing and budget adequacy (‘input evaluation’⁴), or evaluations of biodiversity threats like pollution and human pressures (‘threat-reduction evaluation’⁴), are often used as proxies for conservation outcomes, especially where high-throughput technologies such as remote sensing can be employed^2,4,10,11. However, indirect measures assume that management inputs and/or the reduction of known threats successfully result in positive biodiversity outcomes⁴, are unable to detect whether conservation outcomes differ across taxa, and cannot identify new threats.

In this study, we ask whether we can use environmental DNA (eDNA) to quantify vertebrate biodiversity on a scale large enough for use as a direct measure of protected-area conservation outcomes. We focus on vertebrates (mammals, birds, amphibians and squamates) because one of the most important threats to vertebrate populations in China is overexploitation¹²; this threat is undetectable using remote-sensing methods and is thus especially difficult to measure. Ideally, biodiversity assessments should achieve high spatial and taxonomic resolution. They should allow frequent updates over large areas so that changes in wildlife populations can be detected quickly, allowing causes to be inferred and potentially mitigated. Assessments should be able to be validated rigorously by independent stakeholders and neutral third parties such as courts, and the assessments should be direct – i.e. be based on species detections rather than proxies – both of which are necessary for dispute resolution and for directing and incentivizing effective management. Finally, biodiversity measures should be efficient and simple to understand for decision-makers and the public, contributing to political sustainability and legitimacy^13,14,15.

Advances in technologies such as camera traps and bioacoustic recorders allow broad biodiversity monitoring on relatively large scales. Nevertheless, the costs of buying, deploying and monitoring such equipment still impose some limit on the spatial resolution or extent of monitoring that is feasible. For example, Beaudrot et al.¹⁶ recently reported on multi-year camera-trap surveys of 511 populations of terrestrial mammals and birds in fifteen tropical-forest protected areas. But while their camera-trap sets covered between 140 and 320 km² in each protected area, this represented only 1–2% of the largest parks in their dataset, reflecting the difficulty and expense of setting up and maintaining a camera-trap network to cover large, difficult-to-access areas, exacerbated by theft and vandalism in some settings^17,18. Furthermore, both camera traps and acoustic recorders may systematically miss portions of vertebrate biodiversity. For example, amphibians, squamates, and many birds are not readily captured on camera traps; likewise many mammals, amphibians, and squamates may be missed via bioacoustic monitoring.

eDNA has the potential to complement camera traps and bioacoustic recorders¹⁹, while avoiding some issues of deployment logistics, loss of field equipment, and taxonomic biases. In this study, we focus on iDNA, which is a subset of eDNA²⁰, as an emerging sample type for broad taxonomic and spatial biodiversity monitoring. iDNA is vertebrate DNA collected by invertebrate ‘samplers,’ including haematophagous parasites (leeches, mosquitoes, biting flies, ticks) and dung visitors (flies, dung beetles)^21,22,23. iDNA methods are rapidly improving, with research focused on documenting the ranges of vertebrate species and their diseases that can be efficiently detected via iDNA^24,25,26,27,28,29, comparisons with camera trapping and other survey methods^30,31,32, and pipeline development^33,34.

We report on the use of iDNA to estimate spatially resolved vertebrate occupancies on the scale of an entire protected area: the 677 km² Ailaoshan reserve in Yunnan province, China (Fig. 1). After the reserve’s establishment in 1981, a 1984–85 survey generated a species list of 86 mammal, 323 bird, 39 (non-avian) reptile and 26 amphibian species/subspecies³⁵. Investigators have since carried out one-off targeted surveys^36,37,38 and individual-species studies^39,40,41,42,43. A recent camera-trap study by the Yunnan Forestry Service⁴⁴ detected 10 mammal species and 10 bird species, but was not comprehensive enough to serve as a general vertebrate biodiversity assessment, surveying just 2 of 172 patrol areas in the reserve. Thus, an updated synoptic survey of vertebrate biodiversity remains lacking and, consequently, the current statuses and population trends of vertebrates in the park are largely unknown.

**Fig. 1: Study site location and layout.**

Our study tests the feasibility of employing iDNA surveys within a real protected-area management setting. We had several reasons to explore leech-derived iDNA as a promising broad-scale monitoring technology. First, personnel collecting leeches require little specialised training. The Ailaoshan reserve is divided into 172 patrol areas, each visited monthly by park rangers from neighbouring villages. We contracted these rangers to collect terrestrial, haematophagous leeches during their rainy-season patrols. We were thus able to sample across the reserve in three months at relatively low cost. Second, leech sampling provides an efficient way to correct for imperfect detection, which may include false negatives (i.e. failure to detect species that are present at a site) and false positives (i.e. detecting or appearing to detect a species’ DNA when that species is absent). With leeches, false negatives can arise when, for example, a species was not fed upon by leeches at a site; leeches containing that species’ DNA were not captured from that site; or the species’ DNA was not successfully amplified and associated with the correct taxon. Sources of false positives may include leech movement between sites; sample contamination in the field or lab; and errors in sequencing or bioinformatic processing.

Statistical models can be used to account for imperfect detection. In this project, we analyzed our DNA sequencing results using hierarchical site-occupancy models^45,46, which distinguish between the detection of a species’ DNA at a site, and the true presence or absence of the species, which is not directly observed. The goal of site-occupancy modelling is to infer where each species is truly present, by separately estimating the probability that a species is present at a site, and the probability that a species is detected if it is present^45,47. Separating these probabilities relies on a replicated sampling design, with replicates taken in sufficiently close spatial and/or temporal proximity that the underlying distribution of species presences or absences may be treated as fixed. We achieved replicate samples per patrol area in just one patrol by issuing each ranger with multiple, small plastic bags, each containing small tubes with preservative, inducing subsets of leeches to be stored in separate bags²³, which we processed separately.
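
The separation of occupancy from detection described above can be illustrated with a minimal single-species sketch. This is not the hierarchical multispecies model used in the study; it assumes constant occupancy probability `psi`, constant per-replicate detection probability `p`, independent replicates, and no false positives. The function names are hypothetical.

```python
def site_likelihood(detections, psi, p):
    """Likelihood of one site's detection history under a basic occupancy model.

    detections: list of 0/1 outcomes, one per replicate sample from the site.
    psi: probability that the species occupies the site.
    p: probability of detecting the species in one replicate, given occupancy.
    """
    k = len(detections)
    d = sum(detections)
    # Site occupied: each replicate is an independent Bernoulli(p) trial.
    occupied = psi * (p ** d) * ((1 - p) ** (k - d))
    # Site unoccupied: only an all-zero history is possible
    # (assumption: no false positives).
    unoccupied = (1 - psi) if d == 0 else 0.0
    return occupied + unoccupied


def prob_occupied_given_no_detections(k, psi, p):
    """Probability that a site is occupied even though all k replicates were negative."""
    missed = psi * (1 - p) ** k
    return missed / (missed + (1 - psi))
```

With few replicates and low `p`, `prob_occupied_given_no_detections` stays close to `psi`, which is why an all-negative site cannot simply be scored as an absence. Replication is what lets the two probabilities be estimated separately.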

A third advantage of leech-derived iDNA is the potential to yield inferences about a broad range of taxa, as leeches are known to feed on small and large mammals, birds, squamates, and amphibians, including arboreal species. This provides a taxonomic breadth that is not typically captured via methods such as camera traps or bioacoustic surveys^27,28,48. DNA sequences can also potentially distinguish some visually cryptic species³⁰ (although iDNA methods can also suffer from a lack of species-level resolution). Finally, leeches can yield PCR-amplifiable DNA for at least four months after their last blood meal⁴⁹, improving the efficiency of leech iDNA by increasing the proportion of collected leeches that can yield information on their previous bloodmeal. On the other hand, leech iDNA persistence could also decrease the spatio-temporal resolution of vertebrate detections, since a long period between leech capture and the previous feed affords more opportunity for leeches or vertebrate hosts to have moved between sampling areas²³.

In this study, we use metabarcoding⁵⁰ to detect vertebrate species in the blood meals of wild leeches sampled from the Ailaoshan reserve in Yunnan Province, China. We use occupancy modelling to estimate the spatial distributions of the vertebrates throughout the reserve, and identify environmental factors correlated with those distributions. We find that leech-derived iDNA data can identify informative occupancy patterns for a wide range of vertebrates, including species that are less likely to be detected with camera traps and bioacoustic surveys. We conclude that iDNA may be a useful tool for quantifying vertebrate biodiversity, providing a direct measure of protected-area effectiveness and helping achieve conservation outcomes by informing improvements to management strategies.

Results

Sampling and metabarcoding

The Ailaoshan reserve runs northwest-to-southeast for around 125 km along a ridgeline (approx. 24.9° N 100.8° E to 24.0° N 101.5° E), averaging just 6 km wide along its length, with elevation between 422 and 3157 m, and annual precipitation between 1000 and 1860 mm depending on altitude⁵¹ (Fig. 1 and Supplementary Fig. 1a, b). Vegetation is subtropical, evergreen broadleaf forest, and the reserve is flanked by agricultural land on lower-elevation slopes in all directions. There are 261 villages within 5 km of the reserve⁵², with an estimated human population of >20,000.

A total of 30,468 leeches were collected during the rainy season, from July to September 2016, by 163 rangers across 172 ranger patrol areas. These constituted 893 replicate samples after collected leeches were partially pooled in the field or laboratory as described in the Methods section.

We extracted DNA from each replicate sample and PCR-amplified two mitochondrial markers: one from the 16S rRNA gene (MT-RNR2), and one from the 12S rRNA gene (MT-RNR1). We refer to these two markers as LSU and SSU, respectively, denoting the ribosomal large subunit and small subunit that these genes code for. (We do this to avoid confusion with the widely-used bacterial 16S gene, which is homologous to our 12S marker, rather than our 16S.) After bioinformatic processing of our sequence data, we estimated multispecies site-occupancy models for the LSU and SSU datasets using parameter-expanded data augmentation^46,53 to accommodate imperfect detection and identify ecological patterns in our datasets.

Vertebrate species

We identified 86 vertebrate species across the LSU and SSU datasets, in addition to humans. The LSU dataset included 59 species, and the SSU dataset contained 72 species. Although the LSU primers target mammals, both the LSU and SSU primers amplified amphibians, birds, mammals, and squamates, with the general-vertebrate SSU primers amplifying more bird species (Fig. 2a). Forty-five species were common to both datasets, including those identified by their distribution across replicate samples (Supplementary Fig. 2), leaving 14 species unique to LSU and 27 species unique to SSU. We could assign taxonomic names to species level for 58 of our 86 species (45 LSU, 50 SSU). Tables 1 and 2 list the top 20 species in each dataset by estimated occupancy.
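
The species counts reported above can be checked with simple set arithmetic. The short sketch below uses only the totals given in the text; the function name and dictionary keys are illustrative.

```python
def combine_marker_counts(n_lsu, n_ssu, n_shared):
    """Split two overlapping species lists into marker-specific and shared counts."""
    lsu_only = n_lsu - n_shared
    ssu_only = n_ssu - n_shared
    return {
        "lsu_only": lsu_only,
        "ssu_only": ssu_only,
        "shared": n_shared,
        # Inclusion-exclusion: total = LSU + SSU - shared
        "total": n_lsu + n_ssu - n_shared,
    }


# Totals as reported: 59 LSU species, 72 SSU species, 45 in common.
counts = combine_marker_counts(59, 72, 45)
```

Here `counts` holds 14 LSU-only species, 27 SSU-only species and a combined total of 86, matching the figures in the text.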

**Fig. 2: Species richness, occupancy and detection.**

Table 1 Top species by estimated occupancy in the LSU dataset.

Table 2 Top species by estimated occupancy in the SSU dataset.

With the supercommunity size of M = 200 that we used for our final occupancy models, estimated total species richness in Ailaoshan was 119 species in the LSU dataset and 113 species in the SSU dataset (Fig. 2b). Setting M = 150 produced similar results, while M = 100 clearly constrained the species richness estimates.

Domesticated species featured heavily in our data (Supplementary Data 1), consistent with observed grazing of these species in the reserve (DWY, pers. obs.). Domestic cattle (Bos taurus) were the most frequently detected taxon in both datasets, being detected in almost half of all patrol areas; domestic goats (Capra hircus) were also common, being detected in just under a third of patrol areas, and domestic sheep (Ovis aries) were detected in ca. 6% of patrol areas. The O. aries detections were concentrated in the reserve’s southeastern section (Xinping county), located near to Shiping town and the main breeding area of the dark-haired Shiping Qin sheep breed.

Several wild taxa detected in our survey are listed as Threatened or Near Threatened by the IUCN (Table 3). Among mammals, four species have IUCN Vulnerable status: Asiatic black bear (Ursus thibetanus), mainland serow (Capricornis milneedwardsii), sambar (Rusa unicolor), and stump-tailed macaque (Macaca arctoides). Among amphibians, the Yunnan spiny frog (Nanorana yunnanensis) and Chapa bug-eyed frog (Theloderma bicolor) are listed as Endangered, while the piebald spiny frog (Nanorana maculosa), Yunnan Asian frog (Nanorana unculuanus) and Jingdong toothed toad (Oreolalax jingdongensis) have Vulnerable status. Some of these taxa, especially the amphibians, were widespread in Ailaoshan (Table 3 and Supplementary Data 1), highlighting the value of this reserve for protecting these species.

Table 3 Threatened and near-threatened species.

Leech iDNA appeared more successful at detecting Ailaoshan’s mammals and amphibians than its birds and squamates, based on our comparison with species lists from the Kunming Institute of Zoology (Supplementary Data 2). Among mammals, 34 of the 127 species in Ailaoshan were detected, with nearly half the detections in the larger-bodied orders: Artiodactyla (8 of 11 species), Carnivora (7 of 18), and non-human primates (1 of 4). Of the smaller-bodied orders, we detected 14 of 41 Rodentia species (including two porcupine species, Atherurus macrourus and Hystrix brachyura), 2 of 24 Eulipotyphla species (shrews and allies), and no bats (0 of 25), rabbits (0 of 1), pangolins (0 of 1) or treeshrews (0 of 1). We also detected two unnamed species assigned to Rodentia. Among amphibians, 12 of the 25 frog species (order Anura) known from Ailaoshan were detected, and so were both of the salamander species (family Salamandridae). We detected 13 more anuran species that could not be assigned to species, including two assigned to the genus Kurixalus, which has not been reported from Ailaoshan but which has a distribution that overlaps Yunnan (Supplementary Data 3). Among squamates, we detected only 3 unnamed species, compared to 39 species known from Ailaoshan. One of our species was assigned only to Squamata, and the others to families Scincidae and Viperidae respectively. Finally, among birds, 12 of the 462 bird species known from Ailaoshan were detected, plus 10 more species that were assigned to genus or higher. Interestingly, of the 12 species identified to species level, five are in the ground-feeding and terrestrial Phasianidae (pheasants and allies), out of 14 species known from Ailaoshan, and the other seven are known to be part-time ground and understorey feeders. Given that our LSU and SSU primers both had high amplification success B_c for mammals and birds (see Laboratory Processing in the Methods section), we tentatively attribute the difference in detection rates to the leeches – which were predominantly collected by rangers at ground level – having been more likely to have parasitised frogs than non-ground-feeding birds.

The most common taxa had occupancy estimates of around 0.6 in the LSU dataset and 0.8 in the SSU dataset (Tables 1 and 2). Most taxa, however, were observed infrequently (median number of detections: 2 and 3 patrol areas in the LSU and SSU datasets, respectively). This was reflected in low occupancy and detection estimates for many taxa (Fig. 2c) (median fraction of sites occupied: 0.33 and 0.24 in LSU and SSU, respectively; median detection probability per 100 leeches: 0.02 and 0.08 in LSU and SSU, respectively).

Species richness

Per patrol area, estimated median species richness was 32 in the LSU dataset and 27 in the SSU dataset, compared to observed median species richnesses of 3 and 4 species per patrol area respectively (Supplementary Fig. 3a, b). Per replicate, observed median species richness was 1 and 2 in the LSU and SSU datasets respectively, from a median of 3 and 4 replicates per patrol area in each dataset.

The substantial gap between observed and estimated species richness per patrol area in both datasets highlights the extent to which imperfect detection of vertebrate species may bias biodiversity estimates. Although estimated detection varied widely among species, most species had very low detection probabilities, especially in replicates containing few leeches (Fig. 3c–f). These results underscore the importance of correcting for false negatives when using iDNA to conduct biodiversity surveys.
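
A rough calculation shows why low per-replicate detection produces such a gap. The sketch below assumes independent replicates that share the same detection probability. It also treats the reported median detection probabilities (0.02 and 0.08 per 100 leeches) as per-replicate values. That is a simplification for illustration only, since real replicates vary in leech number. The function names are hypothetical.

```python
import math


def cumulative_detection(p, k):
    """Probability of at least one detection in k independent replicates."""
    return 1 - (1 - p) ** k


def replicates_needed(p, target):
    """Smallest number of replicates giving cumulative detection >= target.

    Requires 0 < p < 1 and 0 < target < 1.
    """
    return math.ceil(math.log(1 - target) / math.log(1 - p))


# Median-like values from the text, used purely as an illustration.
lsu = cumulative_detection(0.02, 3)   # about 0.06
ssu = cumulative_detection(0.08, 4)   # about 0.28
```

Under these assumptions, a typical species that is present in a patrol area would go undetected there far more often than not, so raw detections substantially understate occupancy.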

**Fig. 3: Species richness by patrol area.**

Almost half of all patrol areas had no associated species observations, either because they were not sampled, or because samples were inadequately labelled (Fig. 3a, b; though note that this map does not display samples without location information, which were still used as data in our model). Our occupancy models impute missing data and therefore provided species-richness estimates for all patrol areas, both with and without observed values (Fig. 3c, d). Both datasets indicated that species richness is highest in the southern third of the Ailaoshan reserve.

At the community level, species were more likely to occur at higher elevation and (to a lesser extent) further from the reserve edge. This can be seen in two ways. Firstly, estimated species richness in the reserve increased with elevation (both datasets) and with distance to reserve edge (LSU dataset) (Fig. 3e, f). Secondly, community mean occupancy (Eqs. (11) and (12)) increased with elevation in both datasets, holding distance to reserve edge constant in the LSU dataset (Fig. 4a, e). On the other hand, community mean occupancy showed limited increase with distance to reserve edge in the LSU dataset, with elevation held constant (Fig. 4c).

**Fig. 4: Occupancy estimates versus environmental covariates.**

There was good agreement on species richness between the LSU and SSU datasets. Observed species richness in the two datasets was positively correlated at the grain of individual replicates (Supplementary Fig. 4a) and of patrol areas (Supplementary Fig. 4c). Unsurprisingly, estimated species richness was also tightly and positively correlated between the two datasets (Supplementary Fig. 4e). Sampling effort increased species detections: replicates with more leeches tended to contain more species (Supplementary Fig. 4b), as did patrol areas with more replicates (Supplementary Fig. 4d). However, as expected, estimated species richness did not increase with sampling effort, because our model compensates for variation in leech quantity and replicate number (Supplementary Fig. 4f).

At the species level, the effects of elevation (both datasets) and distance to reserve edge (LSU only) varied in both direction and strength (Fig. 4b, d, f). Among mammals over 10 kg, domestic cow (B. taurus), domestic sheep (O. aries), domestic goat (C. hircus) and muntjak (Muntiacus vaginalis) showed decreasing occupancy probability with elevation (Supplementary Figs. 5 and 7). Lower elevation sites in turn tend to be closer to the reserve edge; however, as for community mean occupancy, the independent effect of distance to reserve edge was small (Supplementary Fig. 6). In contrast, species such as tufted deer (Elaphodus cephalophus), sambar (R. unicolor), serow (C. milneedwardsii), Asiatic black bear (U. thibetanus) and wild boar (Sus scrofa) showed increasing occupancy probability with elevation and were thus more likely to occur in higher-elevation forest toward the centre of the reserve (Supplementary Figs. 5 and 7).

Most species of mammal below 10 kg were also estimated to have greater occupancy in more central, higher-elevation forest, including the Asian red-cheeked squirrel (Dremomys rufigenis) and the shrew gymnure (Neotetracus sinensis) (Supplementary Figs. 5 and 7). Birds likewise tended to have higher occupancy in higher elevation sites. On the other hand, a few small-mammal species such as the Himalayan field rat (Rattus nitidus) fared better in reserve-edge, lower-elevation forest. Amphibians showed a mix of responses, with some species such as the Tonkin toad (Bufo pageoti; IUCN Near Threatened) and the Jingdong toothed toad (O. jingdongensis; IUCN Vulnerable) more common in less accessible areas at higher elevations, but others such as the fire-bellied toad (Bombina maxima) more common in reserve-edge, lower-elevation forest.

Community composition

In both datasets, hierarchical clustering separated patrol areas into three groups, corresponding to low-, intermediate- and high-elevation sites (Fig. 5a, b and Supplementary Fig. 8). These groups of sites were highly congruent across the two datasets (Cramer’s V = 0.79, 95% confidence interval 0.73–0.85). The higher-elevation areas tend to be located in the interior of the reserve, especially in the south, and contain larger amounts of relatively inaccessible forest compared to lower-elevation areas (Supplementary Fig. 1a, i; mean ± s.d. distance to reserve edge 1540 m ± 850 m for top quartile of sites by elevation, compared to 830 m ± 390 m for the bottom quartile).
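
Cramér's V, the congruence statistic quoted above, is computed from a contingency table that cross-tabulates the cluster assigned to each site in one dataset against its cluster in the other. The sketch below is a plain, uncorrected version that assumes every row and column of the table has at least one site. It does not reproduce the confidence interval reported in the text, and the example tables are invented.

```python
import math


def cramers_v(table):
    """Uncorrected Cramér's V for an r x c contingency table (list of rows of counts)."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))


# Hypothetical cross-tabulation of three site clusters from two datasets.
# Rows: cluster in dataset A; columns: cluster in dataset B.
perfect = [[10, 0, 0], [0, 10, 0], [0, 0, 10]]
unrelated = [[5, 5], [5, 5]]
```

A value of 1 (as for `perfect`) means the two clusterings agree exactly up to relabelling, and 0 (as for `unrelated`) means the cluster memberships are statistically independent.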

Communities in low-elevation patrol areas were strongly characterised by the presence of domestic cow (B. taurus), domestic goat (C. hircus), muntjak (M. vaginalis) and fire-bellied toad (B. maxima) (Fig. 6). These species were present in the majority of low-elevation sites, but in fewer than half of the high-elevation sites. In contrast, the Tonkin toad (B. pageoti) and Jingdong toothed toad (O. jingdongensis) showed the reverse pattern: i.e. they were absent from most of the low-elevation sites, but present in most of the high-elevation patrol areas. Indeed, many amphibians and birds occupied a larger fraction of high-elevation sites than of low-elevation sites (Supplementary Figs. 9 and 10). Nonetheless, some species, such as the Yunnan Asian frog (N. unculuanus), showed similar site occupancy across low-, intermediate- and high-elevation sites (Fig. 6).

**Fig. 6: Occupancy for selected species by site cluster.**

Comparing the variation in composition among sites across the two datasets revealed significant co-inertia (RV coefficient⁵⁴ 0.77, p ≤ 0.001), indicating that there was substantial shared signal in the two datasets. The Jaccard distances from the two datasets were also highly correlated (Pearson correlation r = 0.94, p = 0.001).
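
The Jaccard comparison above can be sketched as follows: compute the pairwise Jaccard distance between sites within each dataset, then correlate the two sets of distances. This is an illustrative stdlib-only version. The site names and species lists are invented. The permutation procedure normally used to obtain a p-value for correlated distance matrices is omitted.

```python
import math
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two collections of species names."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention: two empty sites are treated as identical
    return 1 - len(a & b) / len(union)


def pairwise_distances(sites):
    """Jaccard distances for every pair of sites, in a fixed (sorted) order."""
    names = sorted(sites)
    return [jaccard_distance(sites[x], sites[y]) for x, y in combinations(names, 2)]


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical presence lists for three sites under each marker.
lsu_sites = {"s1": ["cow", "goat"], "s2": ["cow", "toad"], "s3": ["serow", "toad"]}
ssu_sites = {"s1": ["cow", "goat", "frog"], "s2": ["cow", "toad"], "s3": ["serow"]}

r = pearson(pairwise_distances(lsu_sites), pairwise_distances(ssu_sites))
```

A high `r` indicates that pairs of sites that differ in composition under one marker also differ under the other, which is the sense in which the two datasets share signal.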

Discussion

Here we demonstrate that metabarcoding of leech-derived iDNA permits large-scale, spatially-resolved estimation of vertebrate biodiversity. Our study is both the most granular and the broadest-scale biodiversity survey using iDNA to date. Leech surveys were conducted by untrained forest rangers for only 2–3 months and captured distribution information on mammals and amphibians, and to a lesser extent birds and squamates, across a topographically challenging, 677 km² nature reserve (Fig. 1). Our results show that the Ailaoshan reserve provides protected space for vertebrate species of high conservation value, mostly in its core area. The results also highlight the vulnerability of the reserve to degradation arising from human activity (e.g. farming, livestock, and poaching) (Figs. 3 and 5). The study provides an iDNA vertebrate biodiversity baseline for Ailaoshan, and future iDNA surveys can test for changes in occupancy as a proxy for effectiveness¹⁶. More generally, our study functions as a progress report on the use of iDNA monitoring in real-world management settings, and highlights areas for improvement going forward.

Vertebrate biodiversity in Ailaoshan

Our iDNA survey recovered 86 species of mammals, amphibians, birds and squamates, plus humans. Many were common wildlife species, or domesticated taxa such as cattle. The dataset also included many less common taxa that would not have been detected without targeted, taxon-specific traditional surveys, including 15 species recognised by the IUCN as Near Threatened or Threatened (Table 3).

Occupancy modelling indicated that vertebrate species richness was greatest in the higher-elevation interior of Ailaoshan. Our result likely reflects greater anthropogenic disturbance (e.g. hunting, disease transmitted from domestic animals to wildlife, and habitat alteration) in the lower, more-accessible parts of the park, causing local extinctions of many wildlife species at lower elevations. Alternatively, more mobile species may have shifted their home ranges from their previously-preferred lower-elevation areas to less suitable habitat to escape human encroachment¹⁹.

Elevation and distance to reserve edge were important predictors of vertebrate community richness and composition (Figs. 3e, f and 5a, b). Examining the distribution of individual taxa revealed that many species, especially birds and small mammals, had higher occupancy at higher elevation and in the reserve core area. These species include several that the IUCN lists as Near Threatened or Threatened: stump-tailed macaque (Macaca arctoides), tufted deer (E. cephalophus), sambar (R. unicolor), serow (C. milneedwardsii) and Asiatic black bear (U. thibetanus). Some or all of these species are sensitive to habitat alteration along the reserve edge, to poaching, or to competition with domestic animals (e.g. most ungulates), and some may be prone to human-wildlife conflict (e.g. Asiatic black bear) in peripheral areas of the reserve, which are used heavily by livestock. In contrast, a few wild species, like the northern red muntjak (M. vaginalis), appear to have increased occupancy in reserve-edge areas.

Using iDNA for biodiversity monitoring

Two key benefits of leech-iDNA surveys are (a) the ability to survey a wider range of vertebrate taxa and body sizes than is possible with other methods and (b) the feasibility of engaging large numbers of minimally-trained personnel for sampling and data collection. This results in time and cost savings, and makes regular broad-scale surveys more feasible. However, these benefits are partly offset by a greater laboratory workload (which could be mitigated by automation); challenges over the design of sampling incentives (see below); iDNA-specific sampling errors and biases; and the workload associated with bioinformatic processing and statistical modelling. We required 12 person-months to count the leeches, extract DNA, and run PCRs, and Novogene required one month to construct libraries and carry out sequencing. The consumables cost of DNA extraction, PCR, and sequencing was around RMB 210,000 (USD 30,000), with an additional RMB 80,000 (USD 12,000) for primers sufficient to run several surveys of this size.

Design of sampling incentives

Sampling with the assistance of forest rangers proved to be a feasible way to collect large numbers of leeches across the entire reserve. Rangers were hired locally from villages neighbouring the park. They did not report to a central location; instead, forestry officials brought boxes of hip packs to groups of rangers at locations around the park in June–July 2016, issued instructions verbally, and retrieved the packs after surveys ended in September. Provisioning the packs with tubes distributed over multiple self-sealing bags naturally enforced replicate sampling with minimal explanation²³. This made it feasible for replicates from each patrol area to be collected at a single time point, removing the possibility that occupancy might change between temporal replicates³⁰. However, for logistical reasons, collections from different patrol areas took place over a period of three months.

Collection of metadata, however, was less successful, as many samples had information on the collecting ranger but not the patrol area. In future sampling, metadata submission could be made a condition of payment, and a subset of senior rangers should be trained on metadata collection. A longer-term possibility is to outfit rangers with a GPS-enabled app on their cell phones for collecting coordinates of collection sites. On the other hand, our occupancy modelling framework deals well with moderate amounts of missing data, and we are wary of creating incentives to fabricate information. For instance, we decided against paying on a per-leech or per-tube basis, because this might incentivize rangers to collect outside the reserve. We found that a fixed payment, plus a small bonus for at least one leech collected, worked well, and we have since used this structure in other rounds of leech sampling. We expect to need to increase future payments.

Error and bias in iDNA sampling

There are several potential sources of error in our study. One is the time between a leech’s last feed and our sampling, which could be up to a few months⁴⁹. While the retention of blood meal DNA facilitates detection of animals, it also means that detected DNA does not necessarily reflect occupancy at the time of leech surveys. Animal hosts may leave the patrol area between the feeding event and our sampling, and even leeches may disperse widely if carried on hosts such as birds that can travel long distances⁵⁵, potentially blurring the spatio-temporal resolution of occupancy results. Our data show that the leeches we collected mostly feed on hosts that probably remain within one patrol area or, at most, move between adjacent areas (e.g. frogs), so our broad conclusions about the overall distributions of wild and domesticated species in Ailaoshan (Figs. 3 and 5) are unlikely to be seriously affected by this bias. Further, the collection of all replicate samples from a location within the three-month window limits the potential for leech or host movements to violate the site-occupancy model assumption that species occupancy remains constant across replicates (i.e. the ‘population closure’ assumption^23,56). Nonetheless, the lag time restricts the suitability of leech iDNA for detecting very rapid change, e.g. occurring on the order of a few months²³.

A second source of error could be systematic differences across patrol areas in leech communities, coupled with differing diet preferences among leech species. For instance, if leech species differ with elevation (which we did not include as a detection covariate), and high-elevation leech species tend to feed more on frogs and less on cattle, this would give the appearance of change in these species’ occupancy with elevation. The large number of leeches in our sample made it infeasible to identify them individually, but the geographic location of our field site and the uniform morphology of the leeches are consistent with all the leeches being in the genus Haemadipsa²⁸, the taxonomy of which is poorly resolved. Haemadipsa are known to feed on a wide range of vertebrate species^27,28, probably because they are opportunistic, sit-and-wait parasites, and studies suggest at most limited evidence for dietary differences^24,28,30. Given this, we opted for a protocol that pooled leeches rather than attempting to take individual leech identity and diet into account, and we do not think that differences in leech diet are likely to account for any of the major results in our study.

A third possible source of error is the choice of PCR primers and genetic markers, which may prevent some taxa from being detected even when their DNA is present, e.g. due to non-amplification at the PCR stage. We addressed this problem in part by using data from two marker genes. More than half of the species were detected by both markers, and high correlation in species richness and co-inertia of community composition between the datasets suggested that broad ecological inferences would not have been strongly affected had either marker been chosen by itself (Figs. 3 and 5). On the other hand, the primers clearly differed in their ability to amplify DNA from certain species. For example, we detected the stump-tailed macaque (M. arctoides) in the LSU dataset in three different patrol areas, with 2700, 170,066, and 245,477 reads. In contrast, there was no obvious SSU equivalent, with no OTUs (other than humans) assigned to the order Primates in the SSU dataset. Using additional primers would likely detect further taxa⁵⁷, albeit with diminishing return on the additional sequencing costs. In the future, the use of nucleic-acid baits and/or metagenomic sequencing⁵⁸, or the new CARMEN method that multiplexes CRISPR-Cas13 detection⁵⁹, may replace PCR. Either approach could allow, for example, the use of the cytochrome c oxidase I (COI) barcode sequence, for which databases are more extensive⁶⁰, while also allowing other genetic markers to be used for taxonomic groups that are not well distinguished by COI.

Finally, leech iDNA will naturally exclude taxa that are not well represented in leech blood meals. Studies have reported lower iDNA detection rates for many species compared to camera trapping, though iDNA appears to be better at detecting smaller-bodied species of mammal^19,31,32,49,61 and, in our study, amphibians. With sufficiently large samples, taxa that are present infrequently may still be detected, and their low detection rates accounted for using site-occupancy modelling. Taxa that are never detected can still be modelled statistically (e.g. using data augmentation^46,53), but they obviously cannot contribute data towards the model. When leech sampling is the rate-limiting step, such as in researcher-led studies, Abrams et al.³⁰ recommend using leech-iDNA to supplement camera-trap data. For instance, Tilker et al.¹⁹ recently ran a camera-trap survey at 139 stations (17,393 trap-nights) over five protected areas in Vietnam and Laos, spanning 900 km², and supplemented the camera data with iDNA from 2043 leeches from 93 of the stations. The camera-trap data were limited to 23 terrestrial mammal species, with squirrels and large rodents being the smallest organisms detected, and generally produced more species detections. However, leech iDNA provided the sole detections of marbled cat (Pardofelis marmorata), and doubled the detections of Owston’s civet (Chrotogale owstoni) and Asiatic black bear (U. thibetanus). On the other hand, broad ecological patterns may still be identified without necessarily detecting every species present in an area. For example, Gogarten et al. found that camera trapping and fly-derived iDNA detected largely non-overlapping communities (only 6% to 43% of species were found by both methods in any given location)⁶¹, but both methods tended to classify habitats similarly.

Multi-species site-occupancy modelling

Site occupancy modelling identified correlates of detection and occupancy at the level of the community as well as individual species. Most taxa were detected infrequently, and individually, they provided little insight into detection and occupancy rates, as it is difficult to distinguish low detection rates (i.e. crypsis) from low occupancy (i.e. rarity). However, by integrating these infrequent detections into community models of occupancy and detection, and sharing information across species and patrol areas, the entire dataset was able to produce a broad picture of vertebrate diversity across Ailaoshan. This modelling approach dealt well with missing data, demonstrating the usefulness of occupancy models in a Bayesian framework for dealing with the imperfect datasets that are to be expected with surveys across broad areas and relying on limited resources. On the other hand, the data-augmented models represented a substantial computational burden with our large dataset, with high memory requirements, long run times and much experimentation required to fit the models successfully.

While in this study we focused our modelling attention on correcting for false negatives, false positives are also possible, e.g. due to lab contamination or taxonomic misassignment. While false negatives are likely to be a more serious problem than false positives in our dataset, false positives may nonetheless cause serious bias in the estimation of biodiversity⁶². Hierarchical models may, in principle, also be used to correct for false positives, but in practice they have proven challenging to estimate without additional information about the false-positive detection process⁶³. Recent advances in modelling false positives show promise (e.g. Griffin et al.⁶⁴), but these approaches are not yet available for multi-species metabarcoding datasets.

As iDNA surveys are increasingly used for large-scale studies, an important study design consideration will be the degree to which leeches are pooled. Pooling reduces the cost and complexity of the collecting task, since putting leeches into individual tubes requires a larger collecting kit. (Leeches regurgitate into the preservative fluid, such that leeches collected into the same tube cannot be treated as independent replicates; separate tubes for individual leeches would be needed.) Pooling also reduces lab costs and workload. On the other hand, occupancy models such as the one employed here work best when provided with data from unpooled samples. Potentially valuable information about leech host preferences is also lost when samples are pooled: for example, if collected individually, leeches could be DNA-barcoded, and this information used as a detection covariate in occupancy modelling. Development of automated, high-throughput laboratory protocols (e.g. Ackerman et al.⁵⁹) would help make individual sequencing of leeches more practical in large sample sets such as ours (i.e. >30,000 individuals). At the collection stage, a compromise could be to issue collectors with smaller collecting tubes than we used (e.g. 2 mL), in order to lower leech numbers per replicate but not necessarily to the level of individual leeches.

iDNA: a promising biodiversity monitoring tool

As we prepare to replace the Aichi Biodiversity Targets with a new post-2020 framework, there has been a call to focus on directly evaluating conservation outcomes using biodiversity measures such as occupancy, abundance, and population trends^4,65,66. However, many protected areas are under-resourced and under-staffed², and biodiversity monitoring may be difficult to prioritise⁴. In this study, we show the feasibility of using iDNA metabarcoding as a cost-effective way to estimate spatially-resolved vertebrate occupancies across entire protected areas and with broad taxonomic coverage. Our work thus demonstrates the potential for iDNA to facilitate direct measurements of biodiversity conservation outcomes.

In addition to yielding occupancy estimates, our work can also guide future monitoring to identify underlying sources of environmental change, anthropogenic influences, and overall wildlife community dynamics. We recommend using our results to guide the design of targeted scat-collection, camera-trap, and bioacoustic monitoring surveys of Ailaoshan, both to independently test our results with species that are amenable to being recorded with these other methods (e.g. mammals, ground-dwelling birds), and to improve the accuracy of occupancy and detection estimates³⁰. These monitoring methods could also be used to estimate population sizes and population trends for some species using an occupancy modelling framework^67,68,69. We further propose that iDNA may be used to survey other dimensions of biodiversity, such as zoonotic disease. Recent work has demonstrated the exciting possibility of using leech-derived bloodmeals, sampled from the wild, to screen for both viruses and their vertebrate hosts^29,70. The 2020 SARS-CoV-2 pandemic has underscored the urgency of better understanding zoonotic disease in wildlife reservoirs – a need that is likely to become even more pressing as global climate and land use changes continue⁷¹.

Methods

This section provides an overview of methods. The Supplementary Information provides additional detailed descriptions of the leech collections, laboratory processing, bioinformatics pipeline, and site-occupancy modelling. Code for our bioinformatics pipeline is available at Ji⁷² and Yu⁷³. Code for our site-occupancy modelling and analysis is available at Baker et al.⁷⁴.

Leech collections

Samples were collected during the rainy season, from July to September 2016, by park rangers from the Ailaoshan Forestry Bureau. The nature reserve is divided into 172 non-overlapping patrol areas defined by the Yunnan Forestry Survey and Planning Institute. These areas range in size from 0.5 to 12.5 km² (mean 3.9 ± sd 2.5 km²), in part reflecting accessibility (smaller areas tend to be more rugged). These patrol areas pre-existed our study, and are used in the administration of the reserve. The reserve is divided into six parts, which are managed by six cities or autonomous counties (NanHua, ChuXiong, JingDong, ZhenYuan, ShuangBai, XinPing) which assign patrol areas to the villages within their jurisdiction based on proximity. The villages establish working groups to carry out work within the patrol areas. Thus, individual park rangers might change every year, but the patrol areas and the villages responsible for them are fixed.

Each ranger was supplied with several small bags containing tubes filled with RNAlater preservative. Rangers were asked to place any leeches they could collect opportunistically during their patrols (e.g. from the ground or clothing) into the tubes, in exchange for a one-off payment of RMB 300 (~USD 45) for participation, plus RMB 100 if they caught one or more leeches. Multiple leeches could be placed into each tube, but the small tube sizes generally required the rangers to use multiple tubes for their collections.

A total of 30,468 leeches were collected in 3 months by 163 rangers across all 172 patrol areas. When a bag of tubes contained <100 total leeches, we reduced our DNA-extraction workload by pooling leeches from all tubes in the same plastic bag and treating them as one replicate. However, when a bag contained ≥100 total leeches, we selectively pooled some of the tubes in that bag to create five approximately equally sized replicates from the bag, to avoid any replicates containing an excessive number of leeches. Eighty-one per cent of bags contained <100 leeches, and 78% of patrol areas consisted only of bags below the threshold. Each patrol area typically returned multiple replicates, in the form of multiple bags below the threshold and/or multiple tubes from the bags above the threshold. After this pooling, the mean number of leeches per replicate was 34 (range 1–98), for a total of 893 replicates across the entire collection.
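The pooling rule described above can be sketched in a few lines of Python. This is an illustration only, not the procedure used in the laboratory: the threshold of 100 leeches and the five replicates per large bag come from the text, whereas the greedy balancing rule, the function name `pool_bag` and the toy tube counts are assumptions.

```python
from typing import List

THRESHOLD = 100   # bags with fewer leeches than this become a single replicate
N_SPLIT = 5       # replicates created from a bag at or above the threshold

def pool_bag(tube_counts: List[int]) -> List[int]:
    """Return the number of leeches in each replicate made from one bag."""
    total = sum(tube_counts)
    if total < THRESHOLD:
        return [total]  # pool every tube in the bag into one replicate
    # Assumed balancing rule: take the fullest tubes first and add each
    # one to whichever replicate currently holds the fewest leeches.
    replicates = [0] * min(N_SPLIT, len(tube_counts))
    for count in sorted(tube_counts, reverse=True):
        smallest = replicates.index(min(replicates))
        replicates[smallest] += count
    return replicates

print(pool_bag([12, 30, 8]))                        # [50]
print(pool_bag([40, 35, 30, 25, 20, 15, 10, 5]))    # [40, 35, 35, 35, 35]
```

Any rule that yields roughly equal replicate sizes would serve the same purpose; the greedy rule is used here only because it is short.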

Environmental characteristics

We used ArcGIS Desktop 9.3 (Esri, Redlands, CA) and R v3.4.0⁷⁵ to calculate characteristics of each patrol area. We created 30 m raster layers for elevation, topographic position index (i.e. difference between each pixel and its surrounding pixels⁷⁶), distance to nearest road, and distance to nearest stream. We then calculated the median of the raster values for each patrol area for use as predictors in our statistical modelling (Table 4 and Supplementary Fig. 1). We also calculated distance to the Ailaoshan reserve edge as the distance of each patrol-area centroid to the nearest nature-reserve edge.
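The per-patrol-area summary can be illustrated with a minimal Python sketch. It is not the ArcGIS and R workflow used for the study: the toy grids, the helper name `zonal_median` and the use of `None` for pixels outside the reserve are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median

def zonal_median(zone_ids, values):
    """Median raster value per zone, given two grids of identical shape."""
    by_zone = defaultdict(list)
    for zone_row, value_row in zip(zone_ids, values):
        for zone, value in zip(zone_row, value_row):
            if zone is not None:          # None marks pixels outside the reserve
                by_zone[zone].append(value)
    return {zone: median(vals) for zone, vals in by_zone.items()}

# Hypothetical 3 x 3 grids: patrol-area identifiers and elevation in metres.
patrol_area = [[1, 1, 2],
               [1, 2, 2],
               [None, 2, 2]]
elevation = [[2000, 2100, 2400],
             [2050, 2500, 2450],
             [1800, 2600, 2550]]

print(zonal_median(patrol_area, elevation))   # {1: 2050, 2: 2500}
```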

Table 4 Summary of environmental covariates.

Laboratory processing

We extracted DNA from each replicate and then PCR-amplified two mitochondrial markers: one from the 16S rRNA gene (MT-RNR2; primers: 16Smam1 5′-CGGTTGGGGTGACCTCGGA-3′ and 16Smam2 5′-GCTGTTATCCCTAGGGTAACT-3′⁷⁷), and the other from the 12S rRNA gene (MT-RNR1; primers: 5′-ACTGGGATTAGATACCCC-3′ and 5′-YRGAACAGGCTCCTCTAG-3′ modified from Riaz et al.⁷⁸). We refer to these two markers as LSU (16S, 82–150 bp) and SSU (12S, 81–117 bp), respectively, referring to the ribosomal large subunit and small subunit that these genes code for. A third primer pair targeting the standard cytochrome c oxidase I marker⁷⁹ was tested but not adopted, as it co-amplified leech DNA and consequently returned few vertebrate reads.

The LSU primers are designed to target mammals, and the SSU primers to amplify all vertebrates. We ran ecoPCR v0.5⁸⁰ with three allowed mismatches on the Tetrapoda in the MIDORI database⁸¹ to estimate expected amplification success, B_c, for our primers. B_c is the proportion of species in the reference database that can be amplified in silico. The 16Smam primers returned high B_c values for Mammalia (99.3%), as expected, and also for Aves (96.2%), a moderate value for Amphibia (79%), and a low value for species grouped under “Reptilia” in the MIDORI database (=Crocodylia + Sphenodontia + Squamata + Testudines) (39.9%). The 12S primers returned high B_c values (>98%) for Mammalia, Amphibia, and Aves, and a moderate B_c value (79.8%) for “Reptilia”. We therefore expected most or all Ailaoshan mammals, birds, and amphibians to be amplifiable by one or both primers, and a lower success rate for snakes and lizards.
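The idea behind B_c can be sketched with a simplified in silico PCR in Python. This is not ecoPCR and does not reproduce its options: it only checks that a forward-primer site and a downstream reverse-primer site each match within a mismatch allowance, treating IUPAC ambiguity codes in the primers as wildcards. The primer sequences are the SSU primers quoted above; the function names and the two-species toy reference are assumptions.

```python
# Bases permitted by each IUPAC code that may appear in a primer.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

FWD_12S = "ACTGGGATTAGATACCCC"   # SSU forward primer quoted in the text
REV_12S = "YRGAACAGGCTCCTCTAG"   # SSU reverse primer quoted in the text

def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]

def mismatches(primer, window):
    return sum(base not in IUPAC[p] for p, base in zip(primer, window))

def sites(primer, seq, max_mm):
    n = len(primer)
    return [i for i in range(len(seq) - n + 1)
            if mismatches(primer, seq[i:i + n]) <= max_mm]

def amplifiable(seq, fwd, rev, max_mm=3):
    """True if a forward site is followed by a reverse-primer site."""
    fwd_sites = sites(fwd, seq, max_mm)
    rev_sites = sites(revcomp(rev), seq, max_mm)
    return any(r >= f + len(fwd) for f in fwd_sites for r in rev_sites)

def coverage(reference, fwd, rev, max_mm=3):
    """Proportion of reference species amplified in silico (B_c)."""
    hits = sum(amplifiable(s, fwd, rev, max_mm) for s in reference.values())
    return hits / len(reference)

# Hypothetical reference database with one sequence per species.
good = FWD_12S + "ACGT" * 20 + revcomp("CAGAACAGGCTCCTCTAG")
reference = {"species_with_sites": good, "species_without_sites": "A" * 80}
print(coverage(reference, FWD_12S, REV_12S))   # 0.5
```

In a real reference database each species may have several sequences, so a species would be counted as amplifiable if any one of its sequences is.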

Primers were ordered with sample-identifying tag sequences, and we used a twin-tagging strategy to identify and remove ‘tag jumping’ errors⁸² using the DAMe protocol⁸³. From our 893 replicate tubes, we successfully PCR-amplified in triplicate 661 samples using our LSU primers and 745 samples using our SSU primers. Successful PCR amplifications were sent to Novogene (Beijing, China) for PCR-free library construction and 150 bp paired-end sequencing on an Illumina HiSeq X Ten.

Negative controls were included for each set of PCRs, and the PCR set was repeated, or ultimately abandoned, if agarose gels revealed contamination in the negative controls. We also sequenced the negative controls, because gels do not always detect very low levels of contamination. Sequences assigned to human, cow, dog, goat, pig, chicken and some wild species appeared in our sequenced negative controls, but with low PCR replication and at low read number. We used these negative controls to set DAMe filtering stringency in our bioinformatics pipeline (see next section and Supplementary Information) for all samples to levels that removed these contaminants: -y 2 for both markers (minimum number of PCRs out of 3 in which a unique read must be present), and -t 9 for LSU and -t 20 for SSU (minimum number of copies per PCR at which a unique read must appear). We also amplified and sequenced a set of positive controls containing DNA from two rodent species, Myodes glareolus and Apodemus flavicollis, along with negative controls that we verified to be contamination-free using agarose gel electrophoresis. M. glareolus and A. flavicollis have European and Western Asian distributions, and we did not detect either species in our leech samples.
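The two filtering thresholds can be read as a simple rule, sketched below in Python. The sketch follows the verbal description of `-y` and `-t` given above, requiring at least `-t` copies in at least `-y` of the three PCRs; it is not DAMe's own code, and the function name and the read counts are hypothetical.

```python
def passes_filter(copies_per_pcr, min_pcrs, min_copies):
    """Keep a unique read that reaches min_copies in at least min_pcrs PCRs."""
    supporting = sum(1 for copies in copies_per_pcr if copies >= min_copies)
    return supporting >= min_pcrs

# Thresholds reported in the text: -y 2 for both markers, -t 9 (LSU), -t 20 (SSU).
LSU = {"min_pcrs": 2, "min_copies": 9}
SSU = {"min_pcrs": 2, "min_copies": 20}

# Hypothetical copy numbers of three unique reads across the three PCRs.
reads = {
    "read_a": [15, 12, 0],
    "read_b": [30, 4, 2],
    "read_c": [25, 22, 40],
}

kept_lsu = [name for name, c in reads.items() if passes_filter(c, **LSU)]
kept_ssu = [name for name, c in reads.items() if passes_filter(c, **SSU)]
print(kept_lsu)   # ['read_a', 'read_c']
print(kept_ssu)   # ['read_c']
```

A contaminant that appears at high copy number in a single PCR, like `read_b`, is removed under either setting, which is the behaviour the negative controls were used to calibrate.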

Bioinformatics pipeline

Three features of our bioinformatics pipeline were key: (i) the DAMe protocol⁸³, which uses twin-tagging and three independent PCR replicates to identify and remove tag-jumped and erroneous reads; (ii) the use of two independent markers, which provides an independent check on taxonomic assignments (Supplementary Fig. 2); and (iii) the PROTAX statistical ‘wrapper’ for taxonomic assignment^84,85, which reduces overconfidence in taxonomic assignment when reference databases are incomplete, as they always are. In this case, around half of the known Ailaoshan taxa were present in the reference databases (Supplementary Data 2). Mammals and amphibians were relatively well represented: 73% of mammals and 83% of amphibians were in the LSU database, and 70% and 67%, respectively, in the SSU database. Birds and squamates were less well captured, with 42% of birds and 53% of squamates present in the LSU database, and 35% and 34%, respectively, in the SSU database. For OTUs that do not have reference sequences, PROTAX assigns them to higher ranks and flags them as ‘unknowns,’ allowing us to assign those OTUs to morphospecies and potentially supply taxonomy based on other information such as correlations between the datasets, as described below.

After DAMe filtering, we removed residual chimeras using VSEARCH v2.9.0⁸⁶, clustered sequences into preliminary operational taxonomic units (‘pre-OTUs’) using Swarm v2.0⁸⁷, and then used the R package LULU v0.1.0⁸⁸ to merge pre-OTUs with high sequence similarity and similar distributions across samples. We then used PROTAX to assign taxonomy to representative sequences from the merged pre-OTUs^33,84,85, in which we benefited from recent additions to the mitochondrial reference database for Southeast Asian mammals⁸⁹. The full pipeline is described in detail in the Supplementary Information (Assigning taxonomy to preliminary operational taxonomic units and following sections). We shared taxonomic information between the LSU and SSU datasets by making use of correlations between the datasets. To do this, we calculated pairwise correlations of LSU and SSU pre-OTUs across the 619 replicates for which both markers had been amplified and visualised the correlations as a network (Supplementary Fig. 2). If an LSU and an SSU pre-OTU occurred in (mostly) the same subset of replicates and were assigned the same higher-level taxonomies, the two pre-OTUs were deemed likely to have been amplified from the same set of leeches feeding on the same species. We manually inspected the network diagram and assigned such correlated pre-OTU pairs the same taxonomy.
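The cross-marker matching step can be illustrated with a short Python sketch that shortlists LSU and SSU pre-OTU pairs by correlating their presence/absence across shared replicates. The study inspected a correlation network by hand, so the automatic cutoff `MIN_R`, the choice of Pearson correlation on presence/absence data, the pre-OTU names and the toy data are all assumptions.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0          # a constant vector carries no co-occurrence signal
    return sxy / sqrt(sxx * syy)

MIN_R = 0.7   # hypothetical cutoff for shortlisting a pair

def candidate_pairs(lsu, ssu, lsu_class, ssu_class, min_r=MIN_R):
    """Pairs with the same higher-level taxonomy and correlated occurrence."""
    pairs = []
    for a, x in lsu.items():
        for b, y in ssu.items():
            if lsu_class[a] == ssu_class[b] and pearson(x, y) >= min_r:
                pairs.append((a, b))
    return pairs

# Hypothetical presence/absence of pre-OTUs in six shared replicates.
lsu = {"LSU_otu1": [1, 1, 0, 0, 1, 0], "LSU_otu2": [0, 0, 1, 1, 0, 0]}
ssu = {"SSU_otu1": [1, 1, 0, 0, 1, 0], "SSU_otu2": [0, 0, 1, 0, 0, 1]}
lsu_class = {"LSU_otu1": "Mammalia", "LSU_otu2": "Amphibia"}
ssu_class = {"SSU_otu1": "Mammalia", "SSU_otu2": "Amphibia"}

print(candidate_pairs(lsu, ssu, lsu_class, ssu_class))
# [('LSU_otu1', 'SSU_otu1')]
```

A shortlist of this kind would only narrow the set of pairs to examine; the final assignment in the study rested on manual inspection.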

We eliminated any pre-OTUs to which we were unable to assign a taxonomy; these pre-OTUs only accounted for 0.9% and 0.2% of reads in the LSU and SSU datasets respectively, and most likely represent sequencing errors rather than novel taxa. Within the LSU and SSU datasets, we merged pre-OTUs that had been assigned the same taxonomies, thus generating a final set of operational taxonomic units (OTUs) for each dataset. Finally, we removed the OTU identified as Homo sapiens from both datasets prior to analysis. Although it would be informative to map the distribution of humans across the reserve, we expect that most of the DNA came from the rangers themselves, not from other humans using the reserve.

Our final OTUs are intended to be interpreted as species-level groups, even though some cannot yet be assigned taxonomic names to species level (most likely due to incomplete reference databases). Thus, for example, the two frog OTUs Kurixalus sp1 and Kurixalus sp2 in the LSU dataset should be interpreted as two distinct Kurixalus species. Likewise, the frog OTU Megophryidae sp3 in the LSU and SSU datasets should be interpreted as a single species within Megophryidae. We therefore refer to our final OTUs as species throughout this study.

After excluding humans, the final LSU and SSU datasets comprised 18,502,593 and 84,951,011 reads respectively. These reads represented a total of 59 species across 653 replicates and 126 patrol areas in the LSU dataset, and 72 species across 740 replicates and 127 patrol areas in the SSU dataset. To assess the degree to which our iDNA approach was able to capture the breadth of vertebrate biodiversity in the park, we compared the list of species that we detected against unpublished, working species lists maintained by researchers at the Kunming Institute of Zoology.

We also attached additional metadata to our species list: we attached International Union for Conservation of Nature (IUCN) data for individual species by using the R package rredlist v0.6.0⁹⁰ to search for scientific names assigned by PROTAX. For this purpose, we treated Capricornis milneedwardsii as synonymous with Capricornis sumatraensis, in line with recent research and the latest IUCN assessment^91,92. For mammals, we used the PanTHERIA database⁹³ to obtain data on adult body mass for each species; where species-level information was not available, we used the median adult body mass from the database for the lowest taxonomic group possible.

Site-occupancy modelling

We estimated separate multispecies site-occupancy models for the LSU and SSU datasets using parameter-expanded data augmentation^46,53. These models assume that the n_LSU = 59 and n_SSU = 72 species observed in each dataset are, respectively, subsets of larger communities of size N_LSU and N_SSU species that are present in the vicinity of Ailaoshan and vulnerable to capture (e.g. fed on by leeches and amplified by the LSU and SSU primers). Although N_LSU and N_SSU are unknown, these communities can be modelled by embedding them in a larger ‘supercommunity’ of fixed size M. We set M = 200 for our final model. Values from M = 150 up to M = 474 (the latter being the total species richness for mammals, birds, non-avian reptiles and amphibians in the 1984-85 survey of Ailaoshan³⁵) produced similar estimates for N_LSU and N_SSU.
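The data-augmentation step amounts to padding the detection data with all-zero detection histories until the supercommunity has M species. The Python sketch below shows only this bookkeeping; the nested-list layout (species, then patrol area, then replicate), the function name `augment` and the toy data are assumptions, and the model itself is not fitted here.

```python
def augment(y, m):
    """Pad detection histories y[species][area][replicate] to m species."""
    n_observed = len(y)
    if m < n_observed:
        raise ValueError("M must be at least the number of observed species")
    # An undetected species has a zero for every replicate in every patrol area.
    template = [[0] * len(area) for area in y[0]]
    padding = [[area[:] for area in template] for _ in range(m - n_observed)]
    return y + padding

# Hypothetical data: 2 observed species, 2 patrol areas with 3 and 2 replicates.
y_observed = [
    [[1, 0, 1], [0, 0]],
    [[0, 0, 0], [1, 0]],
]
y_augmented = augment(y_observed, m=5)
print(len(y_augmented))      # 5
print(y_augmented[4])        # [[0, 0, 0], [0, 0]]
```

In the model, the membership indicator for each of the added species is then estimated along with the other parameters, which is what allows the total community size to be inferred.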

For each species in the supercommunity, our models explicitly capture (i) a ‘community process’ governing whether the species is in the Ailaoshan community or not; (ii) an ‘ecological process’ governing the presence or absence of the species in each patrol area, given that it is in the community; and (iii) an ‘observation process’ governing whether we detect the species’ DNA in each of our replicate samples, given that it is present in the patrol area. The community, ecological and observation processes for individual species are linked by imposing community-level parameters and priors as described below.

For the community process, each species i was assumed to be either a member of the Ailaoshan community or not. We denote this unobserved state with w_i, which was assumed to be a Bernoulli random variable governed by the community membership parameter $\Omega_{g_i}$, i.e. the probability that species i was in the Ailaoshan community:

$$w_i \sim \mathrm{Bernoulli}(\Omega_{g_i}).$$

(1)

For the community process, we separated the species into two natural groupings – homeothermic mammals and birds, and poikilothermic amphibians and squamates – and allowed them to have different probabilities of being in the Ailaoshan community. This is denoted by the subscript on the $\Omega_{g_i}$ parameter, in which g_i represents which of these two groupings species i belongs to. This approach reflected our expectation that these groupings would differ systematically in their community probabilities, and we employed the same grouping for parameters governing the ecological and detection processes (see below for further discussion).

For the ecological process, each species i was assumed to be either present or absent in each patrol area j, and we used z_ij to denote this unobserved ecological state. We assumed the z_ij to be constant across all replicates taken from patrol area j, consistent with the samples being taken at essentially the same point in time. Any species present were assumed to be members of the Ailaoshan community (i.e. w_i = 1), so we modelled z_ij as a Bernoulli random variable governed by both w_i and an occupancy parameter ψ_ij, i.e. the probability that a species i in the community was present in patrol area j:

$$z_{ij} \mid w_i \sim \mathrm{Bernoulli}(w_i \psi_{ij}).$$

(2)

We modelled occupancy ψ_ij as a function of elevation and distance from the reserve edge in the LSU dataset

$$\mathrm{logit}(\psi_{ij}) = \beta_{0i} + \beta_{1i}\,\mathrm{elevation}_j + \beta_{2i}\,\mathrm{reserve}_j$$

(3)

and as a function of elevation in the SSU dataset

$$\mathrm{logit}(\psi_{ij}) = \beta_{0i} + \beta_{1i}\,\mathrm{elevation}_j$$

(4)

where elevation_j is the median elevation for patrol area j, and reserve_j is the distance from the centroid of patrol area j to the nature reserve edge. We chose these specifications by running a ‘full’ model for each dataset with all five environmental covariates, and retaining only those covariates for which the 95% Bayesian confidence interval on the slope coefficient excluded zero.
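As a worked illustration of Eqs. (3) and (4), the Python sketch below converts a linear predictor on the logit scale into an occupancy probability. The coefficient values are hypothetical, and the covariates are assumed to be standardised (mean 0, standard deviation 1), which the text above does not state.

```python
from math import exp

def inv_logit(x):
    return 1.0 / (1.0 + exp(-x))

def occupancy(beta0, beta1, elevation, beta2=0.0, reserve=0.0):
    """Occupancy probability; leave beta2 at 0 for the elevation-only form."""
    return inv_logit(beta0 + beta1 * elevation + beta2 * reserve)

# Hypothetical species with positive elevation and distance-to-edge slopes.
low_edge = occupancy(-1.0, 0.8, elevation=-1.0, beta2=0.5, reserve=-1.0)
high_core = occupancy(-1.0, 0.8, elevation=1.0, beta2=0.5, reserve=1.0)
print(round(low_edge, 3), round(high_core, 3))
```

With positive slopes, occupancy rises from the low-elevation edge to the high-elevation interior; a species with negative slopes would show the reverse pattern.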

We modelled observation as a Bernoulli process assuming imperfect detection but no false positives:

$$y_{ijk} \mid z_{ij} \sim \mathrm{Bernoulli}(z_{ij}\, p_{ijk}),$$

(5)

where y_ijk is the observed data, i.e. detection or non-detection of species i’s DNA in replicate k from patrol area j.
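The three nested Bernoulli processes in Eqs. (1), (2) and (5) can be made concrete by simulating data for a single species. The Python sketch below is a forward simulation only, not the estimation procedure; the function name and all probabilities are hypothetical.

```python
import random

def simulate_species(omega, psi, p, rng):
    """Simulate w, z and y for one species.

    omega: probability of community membership
    psi:   occupancy probability for each patrol area
    p:     detection probability for each replicate within each patrol area
    """
    w = int(rng.random() < omega)                              # Eq. (1)
    z = [int(rng.random() < w * psi_j) for psi_j in psi]       # Eq. (2)
    y = [[int(rng.random() < z_j * p_jk) for p_jk in p_j]      # Eq. (5)
         for z_j, p_j in zip(z, p)]
    return w, z, y

rng = random.Random(1)
w, z, y = simulate_species(
    omega=0.6,
    psi=[0.7, 0.2, 0.5],
    p=[[0.3, 0.3], [0.1], [0.4, 0.2, 0.2]],
    rng=rng,
)
print(w, z, y)
```

The structure guarantees that a species outside the community occupies no patrol area, and that a species absent from a patrol area is never detected there, which is the no-false-positives assumption stated above.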

We allowed the conditional detection probability p_ijk to vary as a function of the conditional detection probability for species i per 100 leeches, r_i, and the number of leeches in the replicate, leeches_jk:

$$p_{ijk} = 1 - (1 - r_i)^{\mathrm{leeches}_{jk}/100}$$

(6)

$$\mathrm{logit}(r_i) = \gamma_{0i}$$

(7)

We allowed r_i, and its logit-scale equivalent γ_0i, to vary among species to capture e.g. variation in leech feeding preferences among taxa. We used leeches_jk/100 rather than leeches_jk to avoid computational problems arising from rounding.
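Eq. (6) can be checked numerically with a few lines of Python. The per-100-leech detection probability of 0.2 is a hypothetical value; the replicate size of 34 is the mean number of leeches per replicate reported above.

```python
def detection_probability(r_i, n_leeches):
    """Per-replicate detection probability from Eq. (6)."""
    return 1 - (1 - r_i) ** (n_leeches / 100)

r = 0.2   # hypothetical detection probability per 100 leeches
for n in (34, 100, 200):
    print(n, round(detection_probability(r, n), 3))
# 34 0.073
# 100 0.2
# 200 0.36
```

A replicate of exactly 100 leeches returns r_i itself, and larger replicates approach but never reach certainty of detection, so the exponent acts as a measure of sampling effort.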

Note that the detection probability p_ijk is conditional on species i being present in patrol area j, and not on species i’s DNA being present in replicate k from that site. p_ijk therefore subsumes multiple sources of imperfect detection, including those that result in species i’s DNA being absent from the replicate (e.g. the leeches in replicate k did not feed on species i, or they did so long ago and the DNA has since been digested), as well as those that result in apparent non-detection of species i DNA when it is present (e.g. failure to PCR amplify sufficiently, PCR or sequencing errors, or problems arising during bioinformatic processing). The multiple PCRs that we performed for each replicate (see Laboratory processing above, and Supplementary Information) could in principle have been used to decompose p_ijk into (i) a per-replicate probability that species i’s DNA is present in the replicate when the species is present at the site, and (ii) a per-PCR probability that species i’s DNA is detected when it is present in the replicate, by adding another hierarchical level to our model^94,95,96,97. However, we instead chose to combine the results from the multiple PCRs using DAMe⁸³ prior to modelling, since DAMe is specifically designed to detect and remove errors arising in PCR and sequencing, and offers filtering options specialised to this task that we found useful.

Finally, whereas Eqs. (1) through (7) define a site-occupancy model for species i alone, we united these species-specific models with a community model for both ecological and detection processes:

$$\beta_{1i} \sim \mathrm{N}(\mu_{\beta_1}, \sigma_{\beta_1})$$

(8)

$$\beta_{2i} \sim \mathrm{N}(\mu_{\beta_2}, \sigma_{\beta_2}) \quad (\mathrm{for\;the\;LSU\;model\;only})$$

(9)

$$(\beta_{0i}, \gamma_{0i}) \sim \mathrm{MVN}\left([\mu_{\beta_0 g_i}, \mu_{\gamma_0 g_i}], \left[\begin{array}{cc} \sigma_{\beta_0 g_i}^{2} & \rho\,\sigma_{\beta_0 g_i}\sigma_{\gamma_0 g_i} \\ \rho\,\sigma_{\beta_0 g_i}\sigma_{\gamma_0 g_i} & \sigma_{\gamma_0 g_i}^{2} \end{array}\right]\right)$$

(10)

where N() and MVN() denote normal and multivariate normal distributions. These distributions were characterised by community hyperparameters μ_• and σ_•, with separate distributions for each parameter as denoted by the first subscript. We used a multivariate normal prior for (β_0i, γ_0i) to allow non-zero covariance between species’ occupancy and detection probabilities, as we might expect if, for example, variation in abundance affects both probabilities⁴⁶.

These community models allow rare species effectively to borrow information from more common ones, producing a better overall ensemble of parameter estimates, though at the cost of shrinkage on the individual parameters^46,98,99. As for the community process described above, we separated the species into two groups – homeothermic mammals and birds, and poikilothermic amphibians and squamates – and allowed them to have different community distributions. This is denoted by the subscripts on the μ_• and σ_• community hyperparameters for the occupancy and detection intercepts, in which g_i represents which of these two groupings species i belongs to. This approach reflected our expectation that these groupings would differ systematically in occupancy probabilities (e.g. due to different habitat preferences) and in detection probabilities (e.g. due to different encounter rates with leeches, or leech feeding preferences). Alternative groupings could also be justified on biological grounds: for example, separating mammals and birds on the basis that many of the mammals are terrestrial while many of the birds are arboreal; or grouping birds and squamates together to better reflect phylogeny. Such alternative groupings did not perform well in our datasets, as most birds and squamates were observed too infrequently to provide much information on these groups by themselves, but this aspect of the model would be worth revisiting in future work.

We estimated our models in a Bayesian framework using JAGS v4.3.0¹⁰⁰. We ran five chains of 100,000 iterations each, including a burn-in of 50,000. We retained all iterations after burn-in (i.e. without thinning) for the posterior sample, except where we needed to save the z matrix for beta diversity and cluster occupancy calculations (see Statistics below); memory limitations prevented us from retaining all posterior samples for the z matrix, so we thinned tenfold to make these calculations feasible. The Supplementary Information provides details of the prior distributions used for the model parameters. From the model results we calculated posterior means and quantiles for all model parameters of interest, as well as estimated species richness for each patrol area and the number of sites occupied by each species.
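The sampling scheme above implies a simple sample-size calculation, and posterior means and quantiles are plain summaries of the retained draws. The Python sketch below assumes that the 50,000 burn-in iterations are discarded from each of the five chains. The helper names `thin` and `posterior_summary` are hypothetical, and the quantile rule (linear interpolation between order statistics) is an assumption rather than the authors' documented choice.

```python
import math

CHAINS = 5
ITERATIONS = 100_000
BURN_IN = 50_000

# Draws kept per parameter if burn-in is dropped from every chain (assumed).
retained = CHAINS * (ITERATIONS - BURN_IN)
# Draws kept for the z matrix after tenfold thinning.
retained_z = retained // 10


def thin(samples, step):
    """Keep every `step`-th draw, starting with the first."""
    return samples[::step]


def posterior_summary(samples, lower=0.025, upper=0.975):
    """Posterior mean and an equal-tailed interval from a list of draws."""
    ordered = sorted(samples)
    n = len(ordered)

    def quantile(q):
        # Linear interpolation between neighbouring order statistics.
        pos = q * (n - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        return ordered[lo] * (1.0 - frac) + ordered[hi] * frac

    return {
        "mean": sum(ordered) / n,
        "lower": quantile(lower),
        "upper": quantile(upper),
    }


print(retained, retained_z)
```

Under these assumptions the unthinned posterior sample holds 250,000 draws per parameter, and the thinned z sample holds 25,000.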

Statistics

Species richness

For each dataset, we obtained estimates of overall species richness for Ailaoshan directly from the model, by summing the w_i. To assess our choice of M, we compared these overall species richness estimates for M = 100, 150 and 200.

After examining occupancy and detection estimates for each species, we used histograms to visualise the distribution of estimated species richness per patrol area (obtained for each patrol area j by summing the z_ij). We calculated median estimated species richness across the patrol areas for comparison with median observed species richness per patrol area and per replicate. We drew choropleths to visualise the spatial distribution of both observed and estimated species richness across the nature reserve.
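The per-patrol-area richness calculation can be sketched as follows in Python. This is an illustration under stated assumptions, not the authors' R code: each posterior draw of z is represented as a nested list indexed `[species][patrol_area]` with 0/1 entries, and the function names are hypothetical.

```python
import statistics


def richness_per_site(z):
    """Species richness in each patrol area for one posterior draw of z."""
    n_sites = len(z[0])
    return [sum(row[j] for row in z) for j in range(n_sites)]


def posterior_mean_richness(z_draws):
    """Average the per-site richness over all posterior draws."""
    n_draws = len(z_draws)
    n_sites = len(z_draws[0][0])
    totals = [0.0] * n_sites
    for z in z_draws:
        for j, richness in enumerate(richness_per_site(z)):
            totals[j] += richness
    return [total / n_draws for total in totals]


# Toy example: 3 species, 2 patrol areas, 2 posterior draws (made-up values).
z_draws = [
    [[1, 0], [1, 1], [0, 0]],
    [[1, 1], [1, 1], [0, 0]],
]
estimated = posterior_mean_richness(z_draws)
print(estimated)                     # [2.0, 1.5]
print(statistics.median(estimated))  # 1.75
```

Because richness is computed within each draw before averaging, the same loop could also return quantiles of richness per patrol area rather than only the mean.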

We examined community mean occupancy and detection probabilities (see e.g. Section 11.7.2 in Kéry and Royle¹⁰¹) to help understand the effects of the site and sample covariates. For each species group g = 1, 2 (representing mammals/birds and amphibians/squamates, respectively), we calculated the posterior mean and 95% Bayesian confidence interval for community mean occupancy and detection as functions of the covariates:

$$\psi_{g}(\mathrm{elevation})=\mathrm{logit}^{-1}(\mu_{\beta_{0}g}+\mu_{\beta_{1}}\,\mathrm{elevation})$$

(11)

$$\psi_{g}(\mathrm{reserve})=\mathrm{logit}^{-1}(\mu_{\beta_{0}g}+\mu_{\beta_{2}}\,\mathrm{reserve})\quad (\text{for the LSU model only})$$

(12)

$$p_{g}(\mathrm{leeches})=1-\left(1-\mathrm{logit}^{-1}(\mu_{\gamma_{0}g})\right)^{\mathrm{leeches}/100}$$

(13)

This approach effectively holds distance from reserve edge at zero in ψ_g(elevation), and elevation at zero in ψ_g(reserve), corresponding to the mean values for these covariates in our data, since predictors were normalised prior to modelling. To visualise variation among species in occupancy and detection response to covariates, we repeated these calculations using each species’ estimates for β₀, β₁, β₂ and γ₀ in place of the community hyperparameters to obtain the posterior means for each species.
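Equations (11) to (13) can be evaluated directly once values for the community hyperparameters are available. The Python sketch below shows the arithmetic for a single posterior draw; the function names and the numerical inputs are illustrative assumptions, and in practice the calculation would be repeated over all posterior draws before taking means and intervals.

```python
import math


def inv_logit(x):
    """Inverse logit (logistic) function."""
    return 1.0 / (1.0 + math.exp(-x))


def community_occupancy(mu_beta0_g, mu_beta_slope, covariate):
    """Eqs (11) and (12): occupancy for group g at a normalised covariate value.

    The other covariate is implicitly held at zero, i.e. at its mean.
    """
    return inv_logit(mu_beta0_g + mu_beta_slope * covariate)


def community_detection(mu_gamma0_g, leeches):
    """Eq. (13): detection probability for a replicate with `leeches` leeches.

    inv_logit(mu_gamma0_g) is the detection probability per 100 leeches.
    """
    per_100 = inv_logit(mu_gamma0_g)
    return 1.0 - (1.0 - per_100) ** (leeches / 100.0)


# Made-up hyperparameter values for one posterior draw.
print(community_occupancy(0.0, 1.0, 0.0))  # 0.5
print(community_detection(0.0, 100))       # 0.5
print(community_detection(0.0, 200))       # 0.75
```

The exponent leeches/100 means that a replicate of 200 leeches is treated as two independent chances of detection at the per-100-leech probability.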

We compared three measures of species richness between the two datasets to assess the extent to which they agreed on variation in richness within Ailaoshan: first, the observed species richness in each replicate; second, the observed species richness in each patrol area; and third, the estimated species richness in each patrol area (i.e. the posterior mean number of species, calculated from z_ij). For each of these measures, we computed the Pearson correlation between the datasets and tested the correlation coefficient against zero with a t-test. We also used Poisson GLMs to examine the relationship between each of these species richness measures and sampling effort: we regressed observed species richness per replicate against the log-transformed number of leeches per replicate, and we regressed both the observed and estimated species richness per patrol area against the log-transformed number of replicates per patrol area, testing the significance of the slope coefficients with t-tests.
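The correlation test described above rests on a standard identity: for n paired observations, the statistic t = r·sqrt((n − 2)/(1 − r²)) is compared with a t distribution on n − 2 degrees of freedom. The Python sketch below computes r and t for made-up richness values. It stops short of the p-value, which needs a t distribution function not available in the Python standard library, and it does not attempt the Poisson GLMs. The function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_t_statistic(r, n):
    """t statistic for H0: correlation = 0, with n - 2 degrees of freedom.

    Undefined when |r| equals 1 exactly (division by zero).
    """
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Made-up richness values for five patrol areas in two datasets.
richness_a = [1, 2, 3, 4, 5]
richness_b = [2, 4, 5, 4, 5]
r = pearson_r(richness_a, richness_b)
t = correlation_t_statistic(r, len(richness_a))
print(round(r, 4), round(t, 4))  # 0.7746 2.1213
```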

Community composition

We explored variation in vertebrate community composition among patrol areas using posterior mean Jaccard similarities calculated from the estimated occupancy states z_ij (see Dorazio⁵³ and Kéry and Royle¹⁰¹ for other examples of this approach). We visualised the pairwise Jaccard distances (i.e. distance = (1 − similarity)) using non-metric multidimensional scaling ordinations, overlaying environmental covariates using the vegan::ordisurf function. We clustered patrol areas based on the Jaccard distances using Ward’s criterion (R function hclust(., method = "ward.D2")). We used this clustering to split the patrol areas into three groups, which turned out to correspond to low-, intermediate-, and high-elevation sites. We used Cramér’s V to quantify the extent to which these clusters matched across the two datasets. We visualised the spatial variation in community composition within the reserve by drawing maps of Ailaoshan with patrol areas coloured by these three clusters. To help understand how vertebrate communities varied among the clusters, we used the posterior sample of the occupancy states z_ij to calculate posterior means and 95% Bayesian confidence intervals for the occupancy (i.e. fraction of patrol areas occupied) of each species in the low-, intermediate- and high-elevation site clusters.
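Two of the quantities in this paragraph are easy to state in code: the posterior mean Jaccard distance between a pair of patrol areas, and Cramér's V for the agreement between two cluster assignments. The Python sketch below is illustrative rather than a reproduction of the authors' R code. It assumes z draws stored as `[species][patrol_area]` lists of 0/1 values, treats two empty communities as having similarity 1 (an assumption, since the ratio is otherwise undefined), and uses the uncorrected form of Cramér's V. All names are hypothetical.

```python
import math


def jaccard_similarity(z, a, b):
    """Jaccard similarity between patrol areas a and b for one draw of z."""
    shared = sum(1 for row in z if row[a] and row[b])
    either = sum(1 for row in z if row[a] or row[b])
    if either == 0:
        return 1.0  # ASSUMPTION: two empty communities count as identical
    return shared / either


def posterior_mean_jaccard_distance(z_draws, a, b):
    """Distance = 1 - similarity, with similarity averaged over draws."""
    sims = [jaccard_similarity(z, a, b) for z in z_draws]
    return 1.0 - sum(sims) / len(sims)


def cramers_v(labels_a, labels_b):
    """Uncorrected Cramér's V; needs at least two categories in each labelling."""
    n = len(labels_a)
    cats_a = sorted(set(labels_a))
    cats_b = sorted(set(labels_b))
    counts = {(a, b): 0 for a in cats_a for b in cats_b}
    for a, b in zip(labels_a, labels_b):
        counts[(a, b)] += 1
    row = {a: sum(counts[(a, b)] for b in cats_b) for a in cats_a}
    col = {b: sum(counts[(a, b)] for a in cats_a) for b in cats_b}
    chi2 = 0.0
    for a in cats_a:
        for b in cats_b:
            expected = row[a] * col[b] / n
            chi2 += (counts[(a, b)] - expected) ** 2 / expected
    k = min(len(cats_a), len(cats_b)) - 1
    return math.sqrt(chi2 / (n * k))


# Toy data: 3 species, 2 patrol areas, 2 posterior draws (made-up values).
z_draws = [
    [[1, 0], [1, 1], [0, 0]],
    [[1, 1], [1, 1], [0, 0]],
]
print(posterior_mean_jaccard_distance(z_draws, 0, 1))  # 0.25
```

A value of V near 1 indicates that the three clusters from one dataset map almost one-to-one onto the clusters from the other, while a value near 0 indicates no association.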

To assess the extent to which the two datasets identified common patterns of variation in community composition across the patrol areas, we performed a co-inertia analysis on the matrices of predicted species in each patrol area in each dataset using ade4::coinertia in R. We used the RV coefficient⁵⁴ to quantify co-inertia, testing its significance with the permutation test in ade4::RV.rtest with 999 permutations. We also tested for correlation between the posterior mean Jaccard distances from the two datasets using a Mantel test with 999 permutations.
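The logic of a Mantel permutation test can be sketched in a few lines of Python. The source does not say which implementation the authors used, so this is a generic version under stated assumptions: Pearson correlation between the upper triangles of two distance matrices, joint permutation of the rows and columns of the second matrix, and a one-sided p-value computed as (count + 1)/(permutations + 1). The function names and the toy matrices are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def upper_triangle(d, order):
    """Upper-triangle entries of d after relabelling sites by `order`."""
    n = len(order)
    return [d[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel_test(d1, d2, permutations=999, seed=0):
    """Return (observed correlation, one-sided permutation p-value)."""
    rng = random.Random(seed)
    identity = list(range(len(d1)))
    x = upper_triangle(d1, identity)
    observed = pearson_r(x, upper_triangle(d2, identity))
    at_least = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute site labels of the second matrix
        if pearson_r(x, upper_triangle(d2, order)) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (permutations + 1)


# Toy distance matrices: five sites on a line, second matrix is a rescaling.
points = [0.0, 1.0, 3.0, 6.0, 10.0]
d1 = [[abs(p - q) for q in points] for p in points]
d2 = [[2.0 * value for value in row] for row in d1]
observed, p_value = mantel_test(d1, d2)
print(round(observed, 6))
```

Permuting site labels rather than individual distances preserves the dependence among entries that share a site, which is why a Mantel test is used here instead of an ordinary correlation test.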

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

The Illumina HiSeq/MiSeq read data generated in this study have been deposited in the NCBI Sequence Read Archive under BioProject accession number PRJNA624712. Processed data in the form of OTU- and metadata tables are provided as Supplementary Data 6, and are also included in the GitHub repository containing our occupancy modelling code (https://github.com/bakerccm/leeches-public/releases/tag/v1.1; https://doi.org/10.5281/zenodo.5914708). The MIDORI databases that we used are available from http://www.reference-midori.info. The mitogenomes from Mohd Salleh et al. 2017 (GigaScience 6(8): gix053) are available from GenBank under the accession numbers provided in Tables 1 and 2 of that publication (https://academic.oup.com/gigascience/article/6/8/gix053/3958782). The PanTHERIA database is available from https://doi.org/10.6084/m9.figshare.c.3301274.v1. Working species lists from Kunming Institute of Zoology researchers are provided in Supplementary Data 2 and 3.

Code availability

Our pipeline for processing the Illumina read data is available at https://github.com/jiyinqiu/ailaoshan_leeches_method_code⁷². Bioinformatic scripts for processing the output of this pipeline, including taxonomic reference datasets, are available at https://github.com/dougwyu/screenforbio-mbc-ailaoshan/releases/tag/1.3⁷³. The code for our analysis, including site occupancy modelling, is available at https://github.com/bakerccm/leeches-public/releases/tag/v1.1 (https://doi.org/10.5281/zenodo.5914708)⁷⁴.

References

Convention on Biological Diversity. Aichi Biodiversity Targets (2010).
Coad, L. et al. Widespread shortfalls in protected area resourcing undermine efforts to conserve biodiversity. Front. Ecol. Environ. 17, 259–264 (2019).
Watson, J. E. M. et al. Bolder science needed now for protected areas. Conserv. Biol. 30, 243–248 (2016).
Maxwell, S. L. et al. Area-based conservation in the twenty-first century. Nature 586, 217–227 (2020).
Xu, W. H. et al. Strengthening protected areas for biodiversity and ecosystem services in China. Proc. Natl Acad. Sci. USA 114, 1601–1606 (2017).
Bryan, B. A. et al. China’s response to a national land-system sustainability emergency. Nature 559, 193–204 (2018).
Wu, R. et al. Strengthening China’s national biodiversity strategy to attain an ecological civilization. Conserv. Lett. 68, e12660 (2019).
Ren, G. et al. Effectiveness of China’s National Forest Protection Program and nature reserves. Conserv. Biol. 29, 1368–1377 (2015).
Wu, R. et al. Effectiveness of China’s nature reserves in representing ecological diversity. Front. Ecol. Evol. 9, 383–389 (2011).
Geldmann, J., Manica, A., Burgess, N. D., Coad, L. & Balmford, A. A global-level assessment of the effectiveness of protected areas at resisting anthropogenic pressures. Proc. Natl Acad. Sci. USA 116, 23209–23215 (2019).
Laurance, W. F. et al. Averting biodiversity collapse in tropical forest protected areas. Nature 489, 290–294 (2012).
Li, Y. & Wilcove, D. S. Threats to vertebrate species in China and the United States. BioScience 55, 147–153 (2005).
Ferraro, P. J., Uchida, T. & Conrad, J. M. Price premiums for eco-friendly commodities: are ‘green’ markets the best way to protect endangered ecosystems? Environ. Resour. Econ. 32, 419–438 (2005).
Zabel, A. & Roe, B. Optimal design of pro-conservation incentives. Ecol. Econ. 69, 126–134 (2009).
Dietz, T., Ostrom, E. & Stern, P. C. The struggle to govern the commons. Science 302, 1907–1912 (2003).
Beaudrot, L. et al. Standardized assessment of biodiversity trends in tropical forest protected areas: the end is not in sight. PLoS Biol. 14, e1002357 (2016).
Meek, P. D. et al. Camera trap theft and vandalism: occurrence, cost, prevention and implications for wildlife research and management. Remote Sens. Ecol. Conserv. 5, 160–168 (2019).
Glover-Kapfer, P., Soto-Navarro, C. A. & Wearn, O. R. Camera-trapping version 3.0: current constraints and future priorities for development. Remote Sens. Ecol. Conserv. 5, 209–223 (2018).
Tilker, A. et al. Identifying conservation priorities in a defaunated tropical biodiversity hotspot. Divers. Distrib. 10, 100331–100315 (2020).
Bohmann, K. et al. Environmental DNA for wildlife biology and biodiversity monitoring. Trends Ecol. Evol. 29, 358–367 (2014).
Bohmann, K., Schnell, I. B. & Gilbert, M. T. P. When bugs reveal biodiversity. Mol. Ecol. 22, 909–911 (2013).
Calvignac-Spencer, S., Leendertz, F. H., Gilbert, M. T. P. & Schubert, G. An invertebrate stomach’s view on vertebrate ecology. BioEssays 35, 1004–1013 (2013).
Schnell, I. B. et al. iDNA from terrestrial haematophagous leeches as a wildlife surveying and monitoring tool – prospects, pitfalls and avenues to be developed. Front. Zool. 12, 302 (2015).
Drinkwater, R. et al. Using metabarcoding to compare the suitability of two blood-feeding leech species for sampling mammalian diversity in North Borneo. Mol. Ecol. Resour. 19, 105–117 (2019).
Gogarten, J. F. et al. Tropical rainforest flies carrying pathogens form stable associations with social nonhuman primates. Mol. Ecol. 28, 4242–4258 (2019).
Kocher, A. et al. iDNA screening: disease vectors as vertebrate samplers. Mol. Ecol. 26, 6478–6486 (2017).
Schnell, I. B. et al. Debugging diversity - a pan-continental exploration of the potential of terrestrial blood-feeding leeches as a vertebrate monitoring tool. Mol. Ecol. Resour. 18, 1282–1298 (2018).
Tessler, M. et al. Bloodlines: mammals, leeches, and conservation in southern Asia. Syst. Biodivers. 16, 488–496 (2018).
Alfano, N. et al. Non-invasive surveys of mammalian viruses using environmental DNA. bioRxiv https://doi.org/10.1101/2020.03.26.009993 (2020).
Abrams, J. F. et al. Shifting up a gear with iDNA: from mammal detection events to standardized surveys. J. Appl. Ecol. 18, 511–512 (2019).
Rodgers, T. W. et al. Carrion fly-derived DNA metabarcoding is an effective tool for mammal surveys: Evidence from a known tropical mammal community. Mol. Ecol. Resour. 17, e133–e145 (2017).
Weiskopf, S. R. et al. Using terrestrial haematophagous leeches to enhance tropical biodiversity monitoring programmes in Bangladesh. J. Appl. Ecol. 55, 2071–2081 (2018).
Axtner, J. et al. An efficient and robust laboratory workflow and tetrapod database for larger scale environmental DNA studies. GigaScience 8, giz029 (2019).
Gillett, C. P. D. T., Johnson, A. J., Barr, I., & Hulcr, J. Metagenomic sequencing of dung beetle intestinal contents directly detects and identifies mammalian fauna. bioRxiv https://doi.org/10.1101/074849 (2016).
Investigation Group of Ailaoshan Nature Reserve. Comprehensive survey of Ailaoshan Nature Reserve (Yunnan Ethnic Press, 1988).
Wu, D. L. & Luo, C. C. Effect of human activity on community structure of small mammals in Ailao Mountain. Zool. Res. 14, 35–41 (1993).
Wang, Z. J., Carpenter, C. & Young, S. S. Bird distribution and conservation in the Ailao Mountains, Yunnan, China. Biol. Conserv. 92, 45–57 (2000).
Li, H., Zhang, X., Rao, D. & Zhang, H. Research on the reptiles diversity in the east of Xinping Ailaoshan Nature Reserve. Hubei Agric. Sci. 51, 3557–3559 (2012).
Luo, W. S., Zhao, S. Y., Luo, Z. Q. & Wang, Q. Population and distribution of Nomascus concolor in Jingdong jurisdiction of Ailaoshan National Nature Reserve. Sichuan J. Zool. 26, 600–603 (2007).
Li, H., Zhu, H., Wang, L. & Liu, J. Biological characteristics and protection of Tylototriton shanjing at Mount Ailao in Xinping. J. Chongqing Coll. Educ. 23, 16–18 (2010).
Li, H. The distribution and perniciousness of Rhabdophis subminiatus at Ailaoshan in Xinping County. Hubei Agric. Sci. 50, 800–801 (2011).
Li, G., Yang, X., Zhang, H. & Li, W. Population and distribution of western black crested gibbon (Nomascus concolor) at Ailao Mountain, Xinping, Yunnan. Zool. Res. 32, 675–683 (2011).
Kong, D. et al. Status and distribution changes of the endangered green peafowl (Pavo muticus) in China over the past three decades (1990s–2017). Avian Res. 9, 427 (2018).
He, X. L., Luo, K., Lu, Z. Y. & Lin, L. X. Preliminary camera-trapping survey on wild mammals and birds in Ailaoshan National Nature Reserve, Yunnan Province, China. Acta Theriol. Sin. 38, 318–322 (2018).
MacKenzie, D. I. et al. Estimating site occupancy rates when detection probabilities are less than one. Ecology 83, 2248–2255 (2002).
Dorazio, R. M., Royle, J. A., Soderstrom, B. & Glimskar, A. Estimating species richness and accumulation by modeling species occurrence and detectability. Ecology 87, 842–854 (2006).
Tyre, A. J. et al. Improving precision and reducing bias in biological surveys: estimating false-negative error rates. Ecol. Appl. 13, 1790–1801 (2003).
Lellouch, L., Pavoine, S., Jiguet, F., Glotin, H. & Sueur, J. Monitoring temporal change of bird communities with dissimilarity acoustic indices. Methods Ecol. Evol. 5, 495–505 (2014).
Schnell, I. B. et al. Screening mammal biodiversity using DNA from leeches. Curr. Biol. 22, R262–R263 (2012).
Ji, Y. Q. et al. Reliable, verifiable and efficient monitoring of biodiversity via metabarcoding. Ecol. Lett. 16, 1245–1257 (2013).
Zhang, K. Y., Zhang, Y. P., Liu, Y. H. & Li, Y. R. Vertical distribution characteristics of rainfall in the Ailao mountain. Sci. Geogr. Sin. 14, 144–150 (1994).
Zhang, Z. Q. Status quo of the biodiversity of Ailaoshan Nature Reserve and countermeasures for protection and management. For. Inventory Plan. 32, 68–70 (2007).
Dorazio, R. M., Gotelli, N. J. & Ellison, A. M. Modern Methods Of Estimating Biodiversity From Presence-absence Surveys, p. 277–302. (InTech, 2011).
Escoufier, Y. Le traitement des variables vectorielles. Biometrics 29, 751–760 (1973).
Davies, R. W., Linton, L. R. & Wrona, F. J. Passive dispersal of four species of freshwater leeches (Hirudinoidea) by ducks. Freshw. Invertebr. Biol. 1, 40–44 (1982).
Rota, C. T., Fletcher Jr, R. J., Dorazio, R. M. & Betts, M. G. Occupancy estimation and the closure assumption. J. Appl. Ecol. 46, 1173–1181 (2009).
Fahmy, M. et al. Multilocus metabarcoding of terrestrial leech bloodmeal iDNA increases species richness uncovered in surveys of vertebrate host biodiversity. J. Parasitol. 106, 843–853 (2020).
Liu, S. et al. Mitochondrial capture enriches mito-DNA 100 fold, enabling PCR-free mitogenomics biodiversity analysis. Mol. Ecol. Resour. 16, 470–479 (2016).
Ackerman, C. M. et al. Massively multiplexed nucleic acid detection with Cas13. Nature 582, 277–282 (2020).
Hebert, P. D. N., Hollingsworth, P. M. & Hajibabaei, M. From writing to reading the encyclopedia of life. Philos. Trans. R. Soc. B Biol. Sci. 371, 20150321 (2016).
Gogarten, J. F. et al. Fly-derived DNA and camera traps are complementary tools for assessing mammalian biodiversity. Environ. DNA 2, 63–76 (2019).
Royle, J. A. & Link, W. A. Generalized site occupancy models allowing for false positive and false negative errors. Ecology 87, 835–841 (2006).
Miller, D. A. et al. Improving occupancy estimation when two types of observational error occur: non-detection and species misidentification. Ecology 92, 1422–1428 (2011).
Griffin, J. E., Matechou, E., Buxton, A. S., Bormpoudakis, D. & Griffiths, R. A. Modelling environmental DNA data; Bayesian variable selection accounting for false positive and false negative errors. J. R. Stat. Soc. Ser. C (Appl. Stat.) 69, 377–392 (2020).
Visconti, P. et al. Protected area targets post-2020. Science 364, 239–241 (2019).
Adams, V. M., Visconti, P., Graham, V. & Possingham, H. P. Indicators keep progress honest: A call to track both the quantity and quality of protected areas. One Earth 4, 901–906 (2021).
Royle, J. A. & Nichols, J. D. Estimating abundance from repeated presence–absence data or point counts. Ecology 84, 777–790 (2003).
Royle, J. A. N-mixture models for estimating population size from spatially replicated counts. Biometrics 60, 108–115 (2004).
Wood, C. M. et al. Detecting small changes in populations at landscape scales: a bioacoustic site-occupancy framework. Ecol. Indic. 98, 492–507 (2019).
Kampmann, M.-L. et al. Leeches as a source of mammalian viral DNA and RNA - a study in medicinal leeches. Eur. J. Wildl. Res. 63, 36 (2017).
Gibb, R. et al. Zoonotic host diversity increases in human-dominated ecosystems. Nature 584, 398–402 (2020).
Ji, Y. Ecec_ailaishan_leeches_bioinfo_pipeline (2020).
Yu, D. Ailaoshan version with unweighted and weighted PROTAX and MIDORI 1.2 (2020).
Baker, C. C. M. et al. Measuring protected-area vertebrate biodiversity using leech iDNA. GitHub repository. [https://github.com/bakerccm/leeches-public/releases/tag/v1.1] [https://doi.org/10.5281/zenodo.5914708] (2022).
R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, Vienna, Austria, 2019).
Guisan, A., Weiss, S. B. & Weiss, A. D. GLM versus CCA spatial modeling of plant species distribution. Plant Ecol. 143, 107–122 (1999).
Taylor, P. G. Reproducibility of ancient DNA sequences from extinct Pleistocene fauna. Mol. Biol. Evol. 13, 283–285 (1996).
Riaz, T. et al. ecoPrimers: inference of new DNA barcode markers from whole genome sequence analysis. Nucleic Acids Res. 39, e145–e145 (2011).
Leray, M. et al. A new versatile primer set targeting a short fragment of the mitochondrial COI region for metabarcoding metazoan diversity: application for characterizing coral reef fish gut contents. Front. Zool. 10, 34 (2013).
Ficetola, G. F. et al. An in silico approach for the evaluation of DNA barcodes. BMC Genom. 11, 434 (2010).
Machida, R. J., Leray, M., Ho, S.-L. & Knowlton, N. Metazoan mitochondrial gene sequence reference datasets for taxonomic assignment of environmental samples. Sci. Data 4, 170027 (2017).
Schnell, I. B., Bohmann, K. & Gilbert, M. T. P. Tag jumps illuminated – reducing sequence-to-sample misidentifications in metabarcoding studies. Mol. Ecol. Resour. 15, 1289–1303 (2015).
Zepeda-Mendoza, M. L., Bohmann, K., Carmona Baez, A. & Gilbert, M. T. DAMe: a toolkit for the initial processing of datasets with PCR replicates of double-tagged amplicons for DNA metabarcoding analyses. BMC Res. Notes 9, 255 (2016).
Somervuo, P., Koskela, S., Pennanen, J., Nilsson, R. H. & Ovaskainen, O. Unbiased probabilistic taxonomic classification for DNA barcoding. Bioinformatics 32, 2920–2927 (2016).
Somervuo, P. et al. Quantifying uncertainty of taxonomic placement in DNA barcoding and metabarcoding. Methods Ecol. Evol. 8, 398–407 (2017).
Rognes, T., Flouri, T., Nichols, B., Quince, C. & Mahé, F. VSEARCH: a versatile open source tool for metagenomics. PeerJ 4, e2584 (2016).
Mahe, F., Rognes, T., Quince, C., de Vargas, C. & Dunthorn, M. Swarm v2: highly-scalable and high-resolution amplicon clustering. PeerJ 3, e1420 (2015).
Frøslev, T. G. et al. Algorithm for post-clustering curation of DNA amplicon data yields reliable biodiversity estimates. Nat. Commun. 8, 1188 (2017).
Mohd Salleh, F. et al. An expanded mammal mitogenome dataset from Southeast Asia. GigaScience 6, 1–8 (2017).
Chamberlain, S. rredlist: ‘IUCN’ red list client. R package version 0.6.0 (2018).
Mori, E., Nerva, L. & Lovari, S. Reclassification of the serows and gorals: the end of a neverending story? Mammal Rev. 49, 256–262 (2019).
Phan, T. D., Nijhawan, S., Li, S. & Xiao, L. Capricornis sumatraensis. The IUCN Red List of Threatened Species 2020, e.T162916735A162916910 (2020).
Jones, K. E. et al. PanTHERIA: a species-level database of life history, ecology, and geography of extant and recently extinct mammals. Ecology 90, 2648–2648 (2009).
Nichols, J. D. et al. Multi-scale occupancy estimation and modelling using multiple detection methods. J. Appl. Ecol. 45, 1321–1329 (2008).
Schmidt, B. R., Kéry, M., Ursenbacher, S., Hyman, O. J. & Collins, J. P. Site occupancy models in the analysis of environmental DNA presence/absence surveys: a case study of an emerging amphibian pathogen. Methods Ecol. Evol. 4, 646–653 (2013).
Hunter, M. E. et al. Environmental DNA (eDNA) sampling improves occurrence and detection estimates of invasive Burmese pythons. PLoS ONE 10, e0121655 (2015).
Dorazio, R. M. & Erickson, R. A. eDNAoccupancy: An R package for multiscale occupancy modelling of environmental DNA data. Mol. Ecol. Resour. 18, 368–380 (2018).
Rubin, D. B. Bayesianly justifiable and relevant frequency calculations for the applied statistician. Ann. Stat. 12, 1151–1172 (1984).
Link, W. A. & Sauer, J. R. Extremes in ecology: avoiding the misleading effects of sampling variation in summary analyses. Ecology 77, 1633–1640 (1996).
Plummer, M. JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. Version 4.3.0. (2017).
Kéry, M. & Royle, J. A. Applied Hierarchical Modeling in Ecology, Vol. 1. (Elsevier, 2016).


Acknowledgements

We thank Jiang Xuelong, Yang Xiaojun, Che Jing, Li Xueyou, Chen Hongman and Wu Fei for Ailaoshan species lists; Michael Tessler and Mark Siddall for information on leech species distributions. C.C.M.B., Y.L., Z.W., D.W.Y. and N.E.P. were supported by the Harvard Global Institute. C.H. and Q.W. were supported by Research and Application Demonstration on Key Technology of Primary Forest Resources Investigation and Monitoring in Yunnan Province (2013CA004). Y.J., J.X.W., L.W., C.W., C.Y., C.C.Y.X. and D.W.Y. were supported by the National Natural Science Foundation of China (41661144002, 31670536, 31400470, 31500305, 31872963); the Key Research Program of Frontier Sciences, Chinese Academy of Sciences (QYZDY-SSW-SMC024); the Bureau of International Cooperation (GJHZ1754); the Strategic Priority Research Program, Chinese Academy of Sciences (XDA20050202, XDB31000000); the Ministry of Science and Technology of China (2012FY110800); and the Biodiversity Investigation, Observation and Assessment Program (2019-2023), Ministry of Ecology and Environment of China (8-2-3-4-11). D.W.Y. was also supported by a Leverhulme Trust Research Fellowship. V.D.P. was supported by the Ohio University Department of Biological Sciences and the Sustainability Studies Theme, and a grant from the Romanian National Authority for Scientific Research, CNCS–UEFISCDI (http://uefiscdi.gov.ro) project PN-III-P1-1.1-TE-2019-0835. The computations in this paper were run on the FASRC Cannon cluster supported by the FAS Division of Science Research Computing Group at Harvard University. The publication of this paper was supported by a grant from the Wetmore Colles fund to N.E.P.

Author information

Christopher C. M. Baker
Present address: US Army ERDC Cold Regions Research and Engineering Laboratory, 72 Lyme Road, Hanover, NH, 03755, USA
These authors contributed equally: Yinqiu Ji, Christopher C. M. Baker.

Authors and Affiliations

State Key Laboratory of Genetic Resources and Evolution and Yunnan Key Laboratory of Biodiversity and Ecological Security of Gaoligong Mountain, Kunming Institute of Zoology, 650223, Kunming, Yunnan, China
Yinqiu Ji, Jiaxin Wang, Chunying Wu, Yuanheng Li, Chunyan Yang & Douglas W. Yu
Museum of Comparative Zoology and Department of Organismic & Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA, 02138, USA
Christopher C. M. Baker, Zhengyang Wang, Yuanheng Li & Naomi E. Pierce
Department of Biological Sciences and Sustainability Studies Theme, Ohio University, 107 Irvine Hall, Athens, OH, 45701, USA
Viorel D. Popescu
Center for Environmental Studies (CCMESI), University of Bucharest, 1 N. Balcescu Blvd., Bucharest, Romania
Viorel D. Popescu
Center for Integrative Conservation, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, 666303, Mengla, China
Lin Wang
Center of Conservation Biology, Core Botanical Gardens, Chinese Academy of Sciences, 666303, Mengla, China
Lin Wang
Yunnan Forestry Survey and Planning Institute, 289 Renmin E Rd, 650028, Kunming, Yunnan, China
Chaolang Hua, Zhongxing Yang & Qingzhong Wen
Redpath Museum and Department of Biology, McGill University, 859 Sherbrooke Street West, Montreal, PQ, H3A2K6, Canada
Charles C. Y. Xu
School of Mathematics, Statistics and Actuarial Science, University of Kent, Sibson Building, Canterbury, Kent, CT27FS, UK
Alex Diana
Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, 650201, Kunming, Yunnan, China
Douglas W. Yu
School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich, Norfolk, NR47TJ, UK
Douglas W. Yu

Authors

Yinqiu Ji
Christopher C. M. Baker
Viorel D. Popescu
Jiaxin Wang
Chunying Wu
Zhengyang Wang
Yuanheng Li
Lin Wang
Chaolang Hua
Zhongxing Yang
Chunyan Yang
Charles C. Y. Xu
Alex Diana
Qingzhong Wen
Naomi E. Pierce
Douglas W. Yu

Contributions

N.E.P. and D.W.Y. initiated the project and supervised the research. Q.W. proposed using the Ailaoshan reserve as the test site, and secured permission and local funding for the fieldwork. C.W., Z.W., Y.L., C.H., Z.Y. and C.Y. supervised sample collection. Y.J. designed the laboratory protocols. Y.J. and J.W. performed the laboratory work. D.W.Y. designed and performed the bioinformatic analyses. C.C.Y.X. contributed code for taxonomic assignment. L.W. contributed GIS analysis. C.C.M.B. conducted all statistical analyses, with advice on modelling provided by V.D.P. and A.D.; C.C.M.B. wrote the manuscript with the input of all authors but especially Y.J., D.W.Y., V.D.P. and N.E.P. All authors approved the final version of the manuscript.

Corresponding authors

Correspondence to Christopher C. M. Baker, Naomi E. Pierce or Douglas W. Yu.

Ethics declarations

Competing interests

D.W.Y. is a co-founder of NatureMetrics (www.naturemetrics.co.uk), which provides commercial metabarcoding services. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Kristine Bohmann, Robert Dorazio, Gentile Francesco Ficetola, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Description of Additional Supplementary Files

Supplementary Dataset 1

Supplementary Dataset 2

Supplementary Dataset 3

Supplementary Dataset 4

Supplementary Dataset 5

Supplementary Dataset 6

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Ji, Y., Baker, C.C.M., Popescu, V.D. et al. Measuring protected-area effectiveness using vertebrate distributions from leech iDNA. Nat Commun 13, 1555 (2022). https://doi.org/10.1038/s41467-022-28778-8

Received: 13 February 2020
Accepted: 31 January 2022
Published: 23 March 2022
DOI: https://doi.org/10.1038/s41467-022-28778-8
