The critical role of natural history museums in advancing eDNA for biodiversity studies: a case study with Amazonian fishes

Ichthyological surveys have traditionally been conducted using whole-specimen, capture-based sampling with varied but conventional fishing gear. Recently, environmental DNA (eDNA) metabarcoding has emerged as a complementary, and possibly alternative, approach to whole-specimen methodologies. In the tropics, where much of the diversity remains undescribed, vast reaches remain unexplored, and anthropogenic activities are constant threats, there have been few attempts to use eDNA for ichthyological inventories. We tested the discriminatory power of eDNA using MiFish primers with existing public reference libraries and compared this with capture-based methods in two distinct ecosystems in the megadiverse Amazon basin. In our study, eDNA provided an accurate snapshot of the fishes at higher taxonomic levels and corroborated its effectiveness to detect specialized fish assemblages. Some flaws in fish metabarcoding studies are routine issues addressed in natural history museums. Thus, by expanding their archives and adopting a series of initiatives linking collection-based research, training and outreach, natural history museums can enable the effective use of eDNA to survey Earth’s hotspots of biodiversity before taxa go extinct. Our project surveying poorly explored rivers and using DNA vouchered archives to build metabarcoding libraries for Neotropical fishes can serve as a model of this protocol.

www.nature.com/scientificreports/ libraries; station 3 = 1 library), and the number of raw reads for each library ranged from 135,818 to 213,952 with an average of 173,015 reads (Table S2). The final reference database with 1,671,871 fish reads (99.7% of the denoised reads) yielded 222 species assigned to 104 genera, 41 families, and 9 orders of fishes (Fig. 2 and Tables S3, S4). For 58 (26%) of the 222 species detected by eDNA, a matching sequence with identity of > 98.5% was found in the reference library database (Table 1). For the most species-rich orders detected by eDNA, Siluriformes were represented by 36.3% of the species; Characiformes 27.1%; Cichliformes 11.7%; and Gymnotiformes 10.8%. Of the 58 matched species, six (10.3%) were identified as "sp." in reference libraries. Only 17 species (7.6%) were also identified in the CBS (Table 1).
Species composition among eDNA samples: distinguishing between river versus stream-dwelling communities. Six eDNA samples were collected in the river (stations 1 and 3) and five in the stream (station 2), and a clear split is seen between these two fish communities (Fig. 3). The number of species detected per sample ranged from 33 to 87 (for details see Supplementary information), with an abrupt differentiation in species composition between the stream samples (samples 1-5) and the river samples (samples 6-11), as detected by the Pearson correlation coefficients (Fig. 3A). That is, stream- and river-dwelling communities are distinct in species composition along the habitat axis. Pearson coefficients vary from 0.5 to 1.0 in the stream versus 0.0 to −0.5 in the river. Thus, species composition is more similar within each community, except for a clear distinction between the river assemblages of the Javari (samples 6-10) and Quixito (sample 11) Rivers.
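The sample-to-sample comparison described above can be illustrated with a small sketch. It assumes that each sample is encoded as a presence/absence vector over the detected MOTUs; the source does not state the exact input used for the coefficients in Fig. 3A, and the three vectors below are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric vectors."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("vectors must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


# Hypothetical presence (1) / absence (0) of six MOTUs in three samples.
stream_a = [1, 1, 1, 0, 0, 0]
stream_b = [1, 1, 0, 0, 0, 0]
river_a = [0, 0, 0, 1, 1, 1]

within_stream = pearson(stream_a, stream_b)   # positive: shared MOTUs
stream_vs_river = pearson(stream_a, river_a)  # negative: complementary MOTUs
```

In this toy case the two stream samples correlate positively and the stream and river samples correlate negatively, which is the kind of pattern the heat map in Fig. 3A summarizes.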
To assess whether the differences in species composition between stream and river communities observed in the Pearson correlation coefficients were significant, we calculated Jaccard dissimilarity indices and analyzed them through NMDS. The original positions of the 222 detected species in river, stream, and in both habitats were represented in a three-dimensional NMDS space (Fig. 3B). The Stress = 0.0524 of the NMDS plot indicated that its first three axes provided an appropriate three-dimensional representation of the habitats according to their species composition 36 , and NMDS significantly distinguished between the river and stream communities (ANOSIM R = 0.4327; p < 0.0001; Fig. 3B). Based on the species frequency detected per order we determined the composition of the stream and river habitats (Fig. 3B,C). Of note is the difference in the species composition between the five samples from Javari River (Samples 1 to 5) and the single sample (Sample 6) collected in the Quixito River (Fig. 3A,D). The interrelationships between habitat and species diversity and composition per order are represented in the chord diagram in Fig. 3E.
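As a minimal sketch of the dissimilarity measure named above, the Jaccard dissimilarity between two samples is one minus the ratio of shared species to all species found in either sample. The species labels below are hypothetical placeholders, and the NMDS and ANOSIM steps are not reproduced here.

```python
def jaccard_dissimilarity(sample_a, sample_b):
    """Jaccard dissimilarity: 1 - |A intersect B| / |A union B|."""
    a, b = set(sample_a), set(sample_b)
    union = a | b
    if not union:
        raise ValueError("at least one sample must contain a species")
    return 1 - len(a & b) / len(union)


# Hypothetical species lists for one stream and one river sample.
stream_sample = {"sp_a", "sp_b", "sp_c"}
river_sample = {"sp_b", "sp_c", "sp_d"}

d = jaccard_dissimilarity(stream_sample, river_sample)  # 2 shared of 4 total
```

A value of 0 means identical species lists and a value of 1 means no shared species; a matrix of such pairwise values is the usual input to an ordination such as NMDS.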
Comparing capture-based sampling (CBS) and molecular-based sampling (MBS)-eDNA metabarcoding species richness. Javari River (station 1). CBS captured a total of 145 species, 101 genera, 32 families and nine orders in the main Javari River. Conversely, MBS found 107 species, 28 genera, 20 families, and seven orders (Fig. 4A,B; Tables S5, S6). Thirteen species were detected by both CBS and MBS (Table 1). The rarefaction sampling curve illustrating the accumulation of unique species with the number of individuals collected by CBS does not reach an asymptote (Fig. 4C), indicating that several species remain to be detected. This [...]
[Figure caption: Illustrative cladogram with reads and frequencies for each of 222 molecular operational taxonomic units (MOTUs) and reference sequences included in nine orders and 41 families detected by 11 eDNA samples in the Javari River basin. Color highlighting MOTUs names corresponds to each of the nine orders. In the left side, species richness, key color, and general bauplan silhouettes for each order. At the center, spherical view of species diversity detected by eDNA.]
[Table fragment. Recoverable rows: Clupeiformes, Engraulidae, Anchoviella sp. 1 (18,117; 4; INPA-ICT 055766; S1); Characiformes, Anostomidae, Abramites hypselonotus (1304; 1); Characiformes, Chilodontidae, Caenotropus labyrinthicus (14; 1; INPA-ICT 055404; S2). The accompanying nucleotide sequences are fragmented in the source.]
[...] Tables S7, S8). Six species were detected by both methodologies (Table 1). The rarefaction curve for CBS extrapolates to slightly over 60 species the diversity in the stream (Fig. 4F). Conversely, MBS Chao II bias-corrected estimator calculated 145 species in the stream (95% confidence interval: 134-172).
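For readers unfamiliar with the estimator mentioned above, the following sketch shows the commonly used bias-corrected form of Chao2 for incidence data, S = S_obs + ((m − 1)/m) · Q1(Q1 − 1) / (2(Q2 + 1)), where m is the number of samples, Q1 the number of species found in exactly one sample, and Q2 the number found in exactly two. The software the authors used is not stated in this section, so this is an assumption about the formula rather than a reproduction of their analysis, and the input data are hypothetical.

```python
from collections import Counter


def chao2_bias_corrected(samples):
    """Bias-corrected Chao2 richness estimate from a list of species sets."""
    m = len(samples)
    if m == 0:
        raise ValueError("at least one sample is required")
    incidence = Counter()
    for sample in samples:
        incidence.update(set(sample))  # count each species once per sample
    s_obs = len(incidence)
    q1 = sum(1 for count in incidence.values() if count == 1)  # uniques
    q2 = sum(1 for count in incidence.values() if count == 2)  # duplicates
    return s_obs + ((m - 1) / m) * q1 * (q1 - 1) / (2 * (q2 + 1))


# Hypothetical incidence data: three samples, five observed species.
toy_samples = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "e"}]
estimate = chao2_bias_corrected(toy_samples)
```

Here three species occur in a single sample and one occurs in two, so the estimate exceeds the five species observed by one. Many singletons relative to doubletons push the estimate well above the observed richness.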

Discussion
Can eDNA provide an accurate snapshot of the Amazonian megadiverse freshwater ichthyofauna considering current public reference libraries for 12S rRNA? The Javari River basin contains a considerable fraction of Amazonian fish diversity, ca. 15% of species, 37% of genera, 60% of families, and 83% of orders. It is, therefore, an excellent testing ground for eDNA metabarcoding effectiveness for the Amazonian fish fauna. Based on the current public reference libraries, i.e., GenBank and MiFish DB, MBS provided an accurate snapshot of the Amazonian megadiverse freshwater ichthyofauna at the Javari River basin when we consider higher taxonomic levels, i.e., order. The detection of 222 species in 11 samples from three stations confirms that eDNA is highly sensitive. However, the low number (28%) of matching sequences with identity of > 98.5% in the public reference libraries suggests severe gaps in the library for Amazonian fishes. It corroborates a recent global gap analysis of reference databases 24 , which revealed that 13% of the over 33,000 known teleostean fish species are sequenced for 12S, representing 38% of genera, 80% of families and 98.5% of orders. For freshwater fishes, among all continents, South America and Africa had by far the lowest coverage. Not surprisingly, we found the lowest eDNA identification match at the species level.
Conversely, studies that built reference libraries for highly diverse fish communities considerably improved the species-level match ratio between capture-based and eDNA approaches. Ref. 30, for example, identified 65% of 203 species of Guianese fishes. Likewise, ref. 37 detected and correctly assigned all 67 species with previously designed 12S primers and a reference library in the São Francisco River, Brazil. In contrast, ref. 32 assigned only 4 of 84 MOTUs to species, demonstrating problems of taxonomic resolution in the target gene and in the general threshold used for species assignment.
DNA barcoding and eDNA metabarcoding both rely on short, variable, standardized DNA regions, which can be amplified by PCR, sequenced, and analyzed to identify taxa. The eDNA approach for vertebrates does not efficiently employ the COI gene because interspecific genetic variation prevents the use of universal primers 38 and can result in non-specific amplifications (39; but see 40). Instead, rRNA genes used in DNA metabarcoding, such as 12S rRNA (e.g., 3), have acceptable resolution at the species level and an elevated copy number per cell due to the number of mitochondria per cell. Similarly, rRNA genes are preferable over single-copy nuclear DNA, which is less likely to be detected in the environment. Yet, the low substitution rate of rRNA genes will compromise the identification of rapidly evolved and complex fish assemblages such as those in the Neotropical region. Thus, it is likely that, in the near future, DNA barcoding and eDNA metabarcoding methods will converge to use large portions of the mitochondrial genome. Regardless of the fragment or the threshold used to delimit species (e.g., 32), it is essential that studies involving eDNA for assessing fish diversity move towards building robust mitochondrial DNA reference libraries based on vouchered specimens.
In addition to the two aforementioned factors, other specific issues such as low sampling effort in the stream (i.e., restricted to a 50-m stretch), difficulty to sample cryptobiotic habits, hidden species diversity, and lack of [...]
[Table fragment: -ICT 055269; S1; the accompanying nucleotide sequence is fragmented in the source.]
eDNA species detection across heterogeneous aquatic environments. Amazonian aquatic environments are characterized by specialized fish communities segregated across a variety of habitats, such as streams, rivers, and their microhabitats 14,41 . In streams, diverse microhabitats are home to leaf-dwelling, sand-dwelling, and pool-dwelling fish communities 42-44 . Similarly, rivers have specialized fish groups living in high-energy or deep water (> 5 m) environments. It is critical that fishes inhabiting all aquatic environments are sampled in biodiversity inventories. Perhaps unsurprisingly, it is incredibly difficult to sample and therefore assess some microhabitats by CBS. For example, some species are buried deep in the roots of plants in the riparian zone (e.g., 45 ), leaf litter, or in the sand of streams that are extremely difficult to collect with traditional sampling gear. These life history strategies naturally obscure the number of fish species living in these areas due to microhabitat partitioning and undescribed diversity. We corroborate the potential effectiveness of MBS to detect specialized fish assemblages across heterogeneous aquatic environments. More intensive sampling efforts might be required to detect low-occurrence taxa as well as to appropriately sample microhabitats, e.g., filtering a higher amount of water or collecting water from temporary pools and the river bottom.
River. In the Javari River (station 1), seven orders were detected by CBS and MBS (Characiformes, Cichliformes, Clupeiformes, Gymnotiformes, Perciformes, Pleuronectiformes, Siluriformes). In addition, CBS detected Beloniformes and Myliobatiformes, and MBS found Osteoglossiformes. The absence of Beloniformes and Myliobatiformes in the MBS could be due to the poor reference library for comparisons. In contrast, Osteoglossiformes (Arapaima gigas) is well known to occur in the region and specimens were found in the local market. Thus, the absence of Arapaima in the CBS was circumstantial. The difference in species composition detected between the two methodologies is possibly due to sampling bias in MBS. Water samples for MBS were only collected at the river surface, detecting mostly free DNA of fish assemblages occurring at midwater and near the surface, where species-diverse Characiformes are the dominant assemblage. Despite that, as aforementioned, MBS was also able to take a snapshot of the benthic fish fauna by detecting many catfish species typically restricted to river channels (e.g., 46,47 ). For example, MBS detected river-dwelling fishes living near the surface, as well as some deep-water (> 5 m depth) inhabitants, e.g., Brachyplatystoma spp.-goliath catfishes; Pleuronectiformes-flatfishes; and a large number of unidentified species of electric fishes (sequence identities within the range of 80-98.5%) belonging to the families Apteronotidae (10 species) and Sternopygidae (15 species)-common, but often underestimated components of rivers (e.g., 48-50 ).
In addition, the sole sample collected in the mouth of the Quixito River (station 3) was substantially different from the five samples collected in the Javari River, reflecting the different milieu where the samples were collected. The Javari samples were dominated by Characiformes whereas Gymnotiformes dominated in the Quixito River sample. In the Javari River, samples were collected in fast-flowing water along the edge between a shallow peat bog and the main channel. The Quixito River sample was collected at the mouth of the river, characterized by a small, slow-flowing channel.
Stream. Typically, Characiformes, Siluriformes, Gymnotiformes, Cichliformes, Cyprinodontiformes, Beloniformes, and Synbranchiformes are the dominant orders in Amazonian streams (e.g., 51 ). At station 2, both approaches detected species belonging to Characiformes, Siluriformes, Gymnotiformes, and Cichliformes. In addition, CBS found Beloniformes and Cyprinodontiformes whereas MBS detected Synbranchiformes for a total of seven orders. The absence of Beloniformes in the MBS may be due to the poor reference library for comparisons, and the absence of Synbranchiformes in the CBS here could be due to the difficulty in collecting cryptobiotic species. We were able to detect, at a fine scale, specialized species assemblages restricted to microhabitats. For example, we captured members of the leaf-dwelling (e.g., Apistogramma spp.-dwarf cichlids) and sand-dwelling (e.g., Gymnorhamphichthys spp.-sand knifefishes) fish communities. It remains to be determined whether eDNA failed to detect fishes that are residents in the temporary pools (e.g., killifishes-Rivulidae) because of the limitation of its radius of action, or due to the poor reference library for Neotropical fishes. Species diversity in Amazonian Terra firme streams ranges from ca. 30-170 species 52 with Characiformes and Siluriformes being the most species-rich orders (e.g., 53 ). Quantification of fish richness in these streams depends upon the sampling methodology employed and the substrate composition of the stream (for reviews see 54-56 ). For example, in streams rich in litter banks, Gymnotiformes species diversity can surpass that of Siluriformes (e.g., 57 ). At station 2, according to CBS, Characiformes and Siluriformes were the dominant orders. In contrast, according to MBS, Characiformes followed by Gymnotiformes were the more species-diverse groups.
The extremely high number of species detected by MBS in the sampled stream, more than twice that of CBS, primarily in the two dominant orders, Characiformes and Gymnotiformes, is likely related to five different issues. First, MBS was collected near the confluence between the river and stream, which may have resulted in the detection of occasional, wandering river fishes. Second, the CBS was conducted with a standardized sampling effort in a restricted (50-m) stretch of the stream (e.g., 55 ), not including its headwaters and areas near its mouth. Third, Characiformes undoubtedly contain hidden species diversity. This is corroborated by the historical difficulty in identification of small tetra species, wherein one named species may represent several undescribed species, such as in Astyanax (e.g., 58,59 ). Fourth, diversity is also underestimated for the Gymnotiformes (e.g., 60 ), for which difficulties in capturing species with cryptobiotic habits possibly play a critical role in the underestimation of their diversity by CBS methods (underestimates may reach three times the local species richness and up to 10 times the specimen abundance; JZ, unpublished data). Fifth, overestimation may be due to intraspecific variation recovered as distinct amplicon sequence variants (ASVs). Any thresholds for taxonomic assignments (e.g., 98.5% as used here) are arbitrary and may lead to both over- and underestimation of species diversity, considering that there are several morphologically cryptic species with large molecular divergence and morphologically distinct species with small genetic divergence. The dereplicated reads were denoised by removing all putatively chimeric and erroneous detections to keep only true sequences, yet overestimations in the number of species may still be the result of ASVs (e.g., 32 ). For example, our recognition of the nine species of Leporinus (Leporinus apollo, Leporinus sp. 2-10; Table S3) could be a consequence of ASVs overestimation.
In contrast, only six species of Leporinus were detected by CBS across all 46 localities throughout the Javari River basin (Fig. 1; Table S1).
These examples indicate that further studies are needed to find appropriate thresholds for taxonomic assignments in Amazonian fishes.
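To make the effect of an identity threshold concrete, the sketch below compares a query with a reference of equal length, position by position, and applies the cutoffs quoted in the text (> 98.5% for a species-level match; 80-98.5% reported for unidentified "sp." taxa). It is a simplified illustration and not the authors' pipeline: real assignments rely on alignment-based searches, the treatment of identities below 80% as unassigned is an assumption made here, and the sequences are synthetic.

```python
def percent_identity(query, reference):
    """Percent identity between two pre-aligned sequences of equal length."""
    if not query or len(query) != len(reference):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(q == r for q, r in zip(query.upper(), reference.upper()))
    return 100.0 * matches / len(query)


def classify(identity, species_threshold=98.5, lower_bound=80.0):
    """Map an identity value to an assignment level (lower_bound is assumed)."""
    if identity > species_threshold:
        return "species"
    if identity >= lower_bound:
        return "unidentified (sp.)"
    return "unassigned"


def mutate(sequence, positions):
    """Return a copy of the sequence with a substitution at each position."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    bases = list(sequence)
    for i in positions:
        bases[i] = swap[bases[i]]
    return "".join(bases)


# Synthetic 172-bp reference, matching the approximate MiFish amplicon length.
reference = "ACGT" * 43
two_mismatches = percent_identity(mutate(reference, [10, 90]), reference)
three_mismatches = percent_identity(mutate(reference, [10, 90, 150]), reference)
```

With an amplicon of ca. 172 bp, two mismatches give about 98.8% identity and still pass the 98.5% cutoff, whereas three mismatches give about 98.3% and do not. A single substitution can therefore decide whether a MOTU is named to species, which is one reason the choice of threshold matters for closely related taxa.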
The role of natural history museums in the advance of eDNA studies. The biodiversity crisis is one of the grand challenges of the twenty-first century 61,62 with the next two decades critical for the conservation of freshwater environments. Freshwater ecosystems worldwide hold ca. 30% of vertebrate diversity, including ca. 50% of all fish species diversity, and are one of the most vulnerable environments on Earth 35,62-65 .
Combining specimens, DNA sampling and taxonomic identification is required to obtain a comprehensive assessment of biodiversity. Yet, DNA samples are available for fewer than 10% of the specimens deposited in most fish repositories. Since most fish specimens deposited in museums and other repositories were collected before the development of PCR, a vast majority were fixed in formalin, a standard method of fixation for over a century. Despite the advances in the techniques of DNA extraction from formalin-fixed materials, the success of these techniques is still limited, especially for specimens stored for long periods in unbuffered solutions 66-68 . Thus, well-identified vouchered DNA tissue samples are critical for the identification of unknown DNA in environmental samples. These DNA tissues may be stored as dried, frozen, or alcohol-fixed samples or as cryopreserved living samples that have broad potential applications (e.g., 69 ). However, scientific collections in regions holding most of the fish diversity, such as the Neotropics, often lack the ideal infrastructure to hold long-term genetic resources (e.g., ultrafreezers, liquid nitrogen storage, cryo-facilities). Nevertheless, the Global Genome Biodiversity Network (GGBN) has targeted and sometimes funded Neotropical institutions to build biorepository capacity and to make their collections globally discoverable.
These limitations are particularly worrisome given the stark reality of anthropogenic destruction, climate change and the great extent of predicted unknown diversity that remains to be described in the Amazon rainforest 70,71 . These factors make this area and Earth's other hotspots of biodiversity priority targets for complete species inventories in the next decade before suffering irreversible damage (e.g., 72 ). Another advantage of eDNA is the long-term biodiversity monitoring in preserved areas/conservation units (e.g., 73 ). The use of eDNA is a highly valuable and cost-effective way to monitor biodiversity, especially in areas with low anthropogenic threats 74 . This would allow a better prioritization of scarce resources for research and/or conservation actions.
In the face of these challenges, natural history museums should play a primary role in the development of eDNA as a tool of biodiversity inventories as well as to track changes in biodiversity hotspots by: (1) expeditions to jointly secure DNA samples, vouchers, and eDNA in Earth's hotspots of biodiversity; (2) adapting their biorepositories to archive eDNA samples, which as a consequence, would provide samples for analysis not only with current but also with as-yet unseen technologies; (3) creating reference libraries for the mitochondrial genome; (4) backing up DNA samples with species-level accuracy on the identification of vouchered specimens; (5) expanding and improving their tissue biobanks. It is crucial that these modifications for eDNA storage also occur in museums throughout the tropics, but especially in Neotropical and Afrotropical countries, which host most of the freshwater fish diversity yet lack the resources to build and maintain these tissue collections in perpetuity 75-77 . These efforts would maximize the information extracted from eDNA metabarcoding and DNA samples, facilitate the design of sets of universal primers for broader biodiversity inventories, monitor hotspots of biodiversity, and support taxon-specific surveys; (6) improving public platforms to close gaps in sampling information and making access to DNA sequences possible; (7) training students and researchers to use CBS, MBS, morphology and molecular-based taxonomy to survey and identify biodiversity. By combining eDNA with tissues associated with museum-curated voucher specimens, we can continue to fill gaps in our knowledge of biodiversity, thus making eDNA a more reliable tool for biodiversity inventories. The high frequency of our lowest taxonomic identifications ending with "sp.", species undetermined, when assessing species diversity using a new technology highlights the need for highly trained taxonomic specialists.
Finally, (8) using eDNA research as a gateway to inspire and engage society in natural history and the race against time to survey and protect Earth's hotspots of biodiversity through education and citizen science programs. Considering the simplicity of implementing MBS in certain aquatic environments, such as rivers (see "Methods" section), scientific communities at natural history museums can launch regional/ global outreach and human resource training initiatives involving citizen scientists, K-12 students, and professional scientists. Likewise, it would create niches for large-scale natural history museums to work with regional-scale scientific institutions worldwide, such as in the training of human resources (e.g., technicians to curate genetic resources) and promoting horizontal transfer of technology in South America and Africa (e.g., eDNA methodology). In sum, activities involving eDNA have the potential to fulfill the priorities of natural history museums in the twenty-first century: research, collections, training, and outreach. One successful initiative is the DNA barcoding and metabarcoding libraries for Amazonian fishes supported by Smithsonian's Global Genome Initiative (GGI), DNA Barcode Alliance, and São Paulo Research Foundation (FAPESP). The current project is the first of many scientific expeditions planned over the next three years to survey fishes in poorly explored areas of the Amazon basin supported by these three initiatives. DNA and eDNA samples and vouchers are being used to develop a robust, well-documented, mitochondrial DNA reference database. This eDNA database is validated by morphological (phenotypic) vouchers. Additional eDNA samples have been collected and deposited in the Smithsonian Institution's National Museum of Natural History Biorepository. We aim to make available an online platform of DNA sequences of all orders and families, most of the genera, and a significant number of species of Amazonian fishes. 
Likewise, GGI is also supporting an initiative for African freshwater fishes. These actions together with the ongoing development of eDNA technology and bioinformatics will enable the use of eDNA metabarcoding in fish inventories and the more effective monitoring of hotspots of biodiversity worldwide.

Methods
Study area. The Javari River basin encompasses an area of 109,202 km² with a 1,180-km main white-water river channel (sensu Sioli, 1967; i.e., a pH-neutral, low-transparency, alluvial sediment-laden tributary of the Amazon River forming the border between Brazil, Peru and Colombia for ca. 800 km). The first formal records for the Javari River basin were obtained during the Thayer Expedition to Brazil, in 1865. Most of the region remained largely unexplored until our survey conducted along the Javari River basin during the low water season in July-August of 2017.
Specimens sampling and identification. All samples were collected in accordance with Brazilian regulations under Jansen Zuanon's permanent permit (SISBIO # 10199-3). Capture-based specimens were sampled at 46 localities along the Javari River basin (Fig. 1) during the low water season in July-August, 2017, using gill nets, cast nets, hand nets, and trawl nets in rivers, rapids, beaches, streams, and lakes (Table S1). All fish specimens collected were identified to species level and deposited at the Instituto Nacional de Pesquisas da Amazônia [...]
Water sampling sites and on-site filtration. Along with the capture-based specimens, eleven water samples were collected from the water surface at three stations to represent the Javari fish fauna: Station 1, Fig. 1; JAV2017081606 (5 samples)-Javari River, below Limoeiro (− 4.176, − 70.779); Station 2, Fig. 1; JAV2017082108 (5 samples)-Terra firme clearwater stream (locally called "igarapés"), i.e., acid, highly-transparent, shallow (depth < 2 m), and non-flooding stream at Palmari community (− 4.293, − 70.291); and Station 3, QUI2017082906 (1 sample)-Quixito River (− 4.428, − 70.260). We used low-tech bucket-sampling to collect freshwater, using a 10-L polypropylene bucket fastened to a 5-m rope (nylon rope, 6 mm in diameter) to collect 5 L of water. Before the water sampling, we wore disposable gloves on both hands and assembled two sets of on-site filtration kits consisting of a Sterivex filter cartridge (pore size 0.45 µm; Merck Millipore, MA, USA) and a 50-mL disposable syringe. Then we thoroughly decontaminated the bucket with a foam-style 10% bleach solution and brought the equipment to the sampling point. We fastened one end of the 5-m rope to the bucket and collected surface freshwater by tossing and retrieving it. We repeated the collection of fresh water three times to minimize sampling biases at each station.
We performed on-site filtration using a filtration kit described above (filter cartridge + syringe) to obtain duplicate samples. With each collection of fresh water, we removed the filter cartridge from the syringe, drew approximately 50 ml freshwater into the syringe by pulling the plunger, reattached the filter cartridge to the syringe, and pushed the plunger to filter the water. We repeated this step twice in each toss of the bucket sampling so that the final filtration volume reached 100 ml. When the filter was clogged before reaching 100-ml filtration, we recorded the total volume of water filtered (70-100 ml from three stations).
After on-site filtration, we sealed the outlet port of the filter cartridge with Parafilm (Bemis NA, Wisconsin, USA), added 2 ml of RNAlater (Thermo Fisher Scientific, DE, USA) into the cartridge through its inlet port using a disposable capillary pipette (Kinglate, USA) to prevent eDNA degradation, and then sealed the inlet port either with Parafilm or a cap for preservation. Filtered cartridges filled with RNAlater were kept in − 20 °C freezers until shipment to MM's lab at the Natural History Museum and Institute, Chiba, Japan. Samples were shipped at room temperature, under an export permit for biological material, using an overseas courier service.

DNA extraction.
All DNA experiments were conducted in MM's lab and all experimental protocols were approved by the Natural History Museum and Institute. We sterilized the workspace and all equipment before DNA extraction. We used filtered pipette tips and conducted all eDNA extractions and manipulations in a dedicated room that is physically separated from pre- and post-PCR rooms to safeguard against cross-contamination from PCR products.
We extracted eDNA from the filter cartridges using a DNeasy Blood & Tissue kit (Qiagen, Hilden, Germany) following the methods developed and visualized by ref. 78, with slight modifications.
We connected an inlet port of each filter cartridge with a 2.0-ml collection tube and tightly sealed the connection between the cartridge and collection tube with Parafilm. We inserted the combined unit into a 15-ml conical tube and centrifuged the capped conical tube at 6000×g for 1 min to remove freshwater and RNAlater. After centrifugation we discarded the collection tube and used an aspirator (QIAvac 24 Plus, Qiagen, Hilden, Germany) to completely remove liquid remaining in the cartridge.
We subjected the filter cartridge to lysis using proteinase K. Before the lysis, we mixed PBS (220 μl), proteinase K (20 μl) and buffer AL (200 μl), and gently pipetted the mixed solution into the cartridge from an inlet port of the filter cartridge. We again sealed the inlet port and then placed the cartridge in a 56 °C preheated incubator for 20 min while stirring the cartridge using a rotator (Mini Rotator ACR-100, AS ONE, Tokyo, Japan) with a rate of 10 rpm. After the incubation, we removed the film from the inlet port and connected the port with a 2-ml tube (DNA LowBind tube, SARSTEDT, Tokyo, Japan) for DNA collection. We placed the combined unit in a 50-ml conical tube and centrifuged the capped tube at 6000×g for 1 min to collect the DNA extract.
We purified the collected DNA extract (ca. 900 μl) using the DNeasy Blood & Tissue kit following the manufacturer's protocol with a final elution volume of 200 µl. We completed DNA extraction in one round and used one more premix for the extraction blank (EB) to monitor contamination. All DNA extracts were frozen at − 20 °C until paired-end library preparation.
DNA aliquots for all 11 samples were deposited at the Museu de Zoologia da Universidade de São Paulo (MZUSP) under the accession numbers MZUSP 125856 to 125866.
Paired-end library preparation and sequencing. We sterilized the workspace and equipment in the pre-PCR area before library preparation. We used filtered pipette tips and performed pre- and post-PCR manipulations in two different, dedicated rooms to safeguard against cross-contamination.
We employed a two-step PCR for paired-end library preparation on the MiSeq platform (Illumina, CA, USA) and generally followed the methods developed by 3. The MiFish primer pairs amplify a hypervariable region of the mitochondrial 12S rRNA gene (ca. 172 bp; hereafter called "MiFish sequence") and append primer-binding sites (5′ ends of the sequences before six Ns) for sequencing at both ends of the amplicon. We used the six random bases (Ns) in the middle of those primers to enhance cluster separation on the flow cells during initial base call calibrations on the MiSeq platform.
We carried out the 1st PCR with 35 cycles in a 12-µl reaction volume containing 6.0-µl 2 × KAPA HiFi HotStart ReadyMix (KAPA Biosystems, MA, USA), 2.8 µl of a mixture of the four MiFish primers in an equal volume (U/E forward and reverse primers; 5 µM), 1.2-µl sterile distilled H2O and 2.0-µl eDNA template (a mixture of the duplicated eDNA extracts in an equal volume). To minimize PCR dropouts during the 1st PCR, we performed 8 replications for the same eDNA template using a strip of 8 tubes (0.2 ml). The thermal cycle profile after an initial 3 min denaturation at 95 °C was as follows: denaturation at 98 °C for 20 s, annealing at 65 °C for 15 s and extension at 72 °C for 15 s with the final extension at the same temperature for 5 min. We also made a 1st PCR blank (1B) during this process in addition to EB. Note that we did not perform 8 replications and used a single tube for each of the two blanks (EB, 1B) to minimize the cost of the experiments.
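The 1st-PCR recipe can be checked and scaled with simple arithmetic. The following is a minimal sketch, not part of the published protocol: the per-reaction volumes are taken from the text, whereas the function name `master_mix` and the 10% pipetting overage are assumptions made for illustration.

```python
# Per-reaction volumes (in microlitres) for the 1st PCR, as listed in the text.
FIRST_PCR_UL = {
    "2x KAPA HiFi HotStart ReadyMix": 6.0,
    "MiFish primer mix (5 uM)": 2.8,
    "sterile distilled H2O": 1.2,
    "eDNA template": 2.0,
}


def master_mix(recipe, n_reactions, overage=0.10, template_key="eDNA template"):
    """Scale every non-template component to n_reactions plus an overage.

    The overage fraction is an assumed allowance for pipetting loss; the
    template is excluded because it is added to each tube separately.
    """
    factor = n_reactions * (1.0 + overage)
    return {
        name: round(volume * factor, 2)
        for name, volume in recipe.items()
        if name != template_key
    }


# Total volume of a single reaction (should equal the stated 12 uL).
total_ul = round(sum(FIRST_PCR_UL.values()), 2)

# Master mix for the 8 replicate reactions of one eDNA template.
mix_for_8 = master_mix(FIRST_PCR_UL, n_reactions=8)

print(f"single reaction: {total_ul} uL")
for name, volume in mix_for_8.items():
    print(f"{name}: {volume} uL")
```

Summing the components confirms that the four listed volumes add up to the stated 12-µl reaction.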
After completion of the 1st PCR, we pooled an equal volume of the PCR products from the 8 replications in a single 1.5-ml tube and purified the pooled products using a GeneRead Size Selection kit (Qiagen, Hilden, Germany) following the manufacturer's protocol for the GeneRead DNA Library Prep I Kit. This protocol repeats the column purification twice to completely remove adapter dimers and monomers. Subsequently we quantified the purified target products (ca. 172 bp) using TapeStation 2200 (Agilent Technologies, Tokyo, Japan), diluted them to 0.1 ng/µl using Milli-Q water and used the diluted products as templates for the second-round PCR (2nd PCR). For the two blanks (EB, 1B), we purified the 1st PCR products in the same manner but did not quantify them; instead, we diluted them by the average dilution ratio used for the positive samples and used the diluted products as templates for the 2nd PCR. For the 2nd PCR, we used the following two primers to append dual-index sequences (8 nucleotides indicated by Xs) and flowcell-binding sites for the MiSeq platform (5′ ends of the sequences before eight Xs): 2nd-PCR-forward (5′-AAT GAT ACG GCG ACC ACC GAG ATC TAC ACX XXX XXX XAC ACT CTT TCC CTA CAC GAC GCT CTT CCG ATC T-3′); and 2nd-PCR-reverse (5′-CAA GCA GAA GAC GGC ATA CGA GAT XXX XXX XXG TGA CTG GAG TTC AGA CGT GTG CTC TTC CGA TCT-3′).
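To make the role of the eight Xs concrete, the sketch below substitutes an 8-nucleotide index into each 2nd-PCR primer template. The primer templates are copied from the text; the helper name `indexed_primer` and the two index sequences are hypothetical and serve only as an illustration.

```python
# 2nd-PCR primer templates from the text; the eight Xs mark the index position.
FORWARD_TEMPLATE = (
    "AAT GAT ACG GCG ACC ACC GAG ATC TAC ACX XXX XXX XAC ACT CTT "
    "TCC CTA CAC GAC GCT CTT CCG ATC T"
)
REVERSE_TEMPLATE = (
    "CAA GCA GAA GAC GGC ATA CGA GAT XXX XXX XXG TGA CTG GAG TTC "
    "AGA CGT GTG CTC TTC CGA TCT"
)


def indexed_primer(template, index):
    """Replace the run of eight Xs in a primer template with an 8-nt index."""
    seq = template.replace(" ", "")
    placeholder = "X" * 8
    if seq.count(placeholder) != 1 or seq.count("X") != 8:
        raise ValueError("template must contain exactly one run of eight Xs")
    index = index.upper()
    if len(index) != 8 or set(index) - set("ACGT"):
        raise ValueError("index must be 8 nucleotides drawn from A, C, G, T")
    return seq.replace(placeholder, index)


# Hypothetical index sequences, chosen only for illustration.
forward = indexed_primer(FORWARD_TEMPLATE, "ACGTACGT")
reverse = indexed_primer(REVERSE_TEMPLATE, "TGCATGCA")

print(forward)
print(reverse)
```

Because each library receives a unique pair of forward and reverse indices, reads can be assigned back to their library after pooled sequencing.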
We carried out the 2nd PCR with 10 cycles in a 15-µl reaction volume containing 7.5-µl 2 × KAPA HiFi HotStart ReadyMix, 0.9-µl each primer (5 µM), 3.9-µl sterile distilled H2O and 1.9-µl template (0.1 ng/µl with the exceptions of the three blanks). The thermal cycle profile after an initial 3 min denaturation at 95 °C was as follows: denaturation at 98 °C for 20 s, annealing and extension combined at 72 °C (shuttle PCR) for 15 s with the final extension at the same temperature for 5 min. We also made a 2nd PCR blank (2B) during this process in addition to EB and 1B.
To monitor for contamination during the DNA extraction, 1st and 2nd PCRs of the 11 samples, we made a total of 3 blanks (EB, 1B, 2B) and subjected them to the above library preparation procedure.
We pooled each individual library in an equal volume into a 1.5-ml tube. Then we electrophoresed the pooled dual-indexed libraries using a 2% E-Gel Size Select agarose gel (Invitrogen, CA, USA) and excised the target amplicons (ca. 370 bp) by retrieving them from the recovery wells using a micropipette. We measured the concentration of the size-selected libraries using a Qubit dsDNA HS assay kit and a Qubit fluorometer (Life Technologies, CA, USA), diluted them to 12.0 pM with HT1 buffer (Illumina, CA, USA) and sequenced them on the MiSeq platform using a MiSeq v2 Reagent Kit for 2 × 150 bp PE (Illumina, CA, USA) following the manufacturer's protocol. We subjected the pooled dual-indexed libraries to a MiSeq run with a PhiX Control library (v3) spike-in (expected at 5%).
Data preprocessing and taxonomic assignment. We performed data preprocessing and analysis of MiSeq raw reads using USEARCH v10.0.240 79 according to the following steps: (1) Forward (R1) and reverse (R2) reads were merged by aligning the two reads using the fastq_mergepairs command. During this process, low-quality tail reads with a cut-off threshold set at a quality (Phred) score of 2, reads that were too short (< 100 bp) after tail trimming and paired reads with too many differences (> 5 positions) in the aligned region (ca. 65 bp) were discarded; (2) primer sequences were removed from the merged reads using the fastx_truncate command; (3) the reads without primer sequences underwent quality filtering using the fastq_filter command to remove low-quality reads with an expected error rate of > 1% and reads shorter than 120 bp; (4) the preprocessed reads were dereplicated using the fastx_uniques command and all singletons, doubletons, and tripletons were removed from the subsequent analysis following the recommendation by the author of the program 79 ; (5) the dereplicated reads were denoised using the unoise3 command to generate amplicon sequence variants (ASVs), removing all putatively chimeric and erroneous sequences 80 ; (6) finally, ASVs were subjected to taxonomic assignments to species names (Molecular Operational Taxonomic Units; MOTUs) using the usearch_global command with a sequence similarity of > 98.5% with the reference sequences and a query coverage of ≥ 90% (two nucleotide differences allowed). ASVs with a sequence similarity of 80-98.5% were tentatively assigned "U98.5" labels before the corresponding species name with the highest identity (e.g., U98.5_Synbranchus marmoratus), and they were subjected to clustering at the level of 0.985 using the cluster_smallmem command. Nonetheless, using these criteria, any ASVs/MOTUs assigned to species based on < 98.5% sequence similarity represent independent species regardless of the species names.
For convenience of biological comparisons and discussion, we call those MOTUs "species".
An incomplete reference database necessitates this clustering step, which enables detection of multiple MOTUs under an identical species name. We annotated such multiple MOTUs with "gotu1, 2, 3…" and tabulated all the outputs (MOTUs plus U98.5_MOTUs) with read abundances. We excluded ASVs with sequence identities of < 80% (saved as "no_hit") from the above taxonomic assignments and downstream analyses, because all of them were found to be non-fish organisms. As the reference database for taxonomic assignment, we used MiFish DB ver. 36, which contained 7973 species distributed across 464 families and 2675 genera. In addition, we downloaded all the fish whole mitochondrial genome and 12S rRNA gene sequences from GenBank as of 15 December 2020.
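The labelling rules described above can be summarized in a short sketch. This is an illustration of the decision logic only, not the USEARCH implementation: the function names, the treatment of values falling exactly on a threshold, the handling of hits with low query coverage, and the exact format of the "gotu" suffix are all assumptions, and the identity values in the example are invented.

```python
from collections import Counter


def label_asv(best_hit_species, identity, coverage,
              species_cutoff=98.5, floor=80.0, min_coverage=90.0):
    """Return a label for one ASV from its best reference hit (values in %).

    Assumptions: boundary values (e.g. identity == 98.5) count as passing,
    and hits below the coverage threshold are treated like "no_hit".
    """
    if identity < floor or coverage < min_coverage:
        return "no_hit"
    if identity >= species_cutoff:
        return best_hit_species
    return "U98.5_" + best_hit_species


def annotate_gotus(cluster_labels):
    """Append gotu1, gotu2, ... when several clusters share the same label."""
    totals = Counter(cluster_labels)
    seen = Counter()
    annotated = []
    for label in cluster_labels:
        if totals[label] > 1:
            seen[label] += 1
            annotated.append(f"{label}_gotu{seen[label]}")
        else:
            annotated.append(label)
    return annotated


# Invented example hits: (best-hit species, % identity, % query coverage).
hits = [
    ("Synbranchus marmoratus", 99.4, 100.0),
    ("Synbranchus marmoratus", 93.0, 100.0),
    ("Synbranchus marmoratus", 90.1, 100.0),
    ("Synbranchus marmoratus", 75.0, 100.0),
]
labels = [label_asv(*hit) for hit in hits]
kept = [label for label in labels if label != "no_hit"]
print(annotate_gotus(kept))
```

In this toy case the two divergent ASVs keep the same best-hit name but are reported as separate MOTUs, which is how an incomplete reference database can still reveal several distinct lineages under one species name.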
We refined the above automatic taxonomic assignments with reference to a family-level phylogeny based on MiFish sequences from both MOTUs and the reference database. For each family, we assembled representative sequences (most abundant reads) from MOTUs (including U98.5) and added all reference sequences from that family and an outgroup (a sequence from a closely related family) in FASTA format. We subjected the FASTA file to multiple alignment using MAFFT 81 with a default set of parameters. We constructed a neighbor-joining (NJ) tree with the aligned sequences in MEGA7 82 using pairwise deletion of gaps and the Kimura two-parameter distances 83 with the among-site rate variations modeled with gamma distributions (shape parameter = 1). We assessed statistical support for internal branches of the NJ tree using the bootstrap resampling technique (100 resamplings). In addition, aligned sequences were submitted to Bayesian Inference (BI) analyses run for 10 million generations, sampling every 1000 generations, to determine posterior probabilities for each MOTU and reference sequence. Substitution models were selected with jModelTest2 84 . BI analyses were run in MrBayes v3.2.7 85 . Some of the BI analyses were conducted on the CIPRES Science Gateway v3.3 86 . Trees were analyzed and rendered in iTOL v5.7 87 .
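As an illustration of the distance underlying the NJ tree, the sketch below computes the Kimura two-parameter distance between two aligned sequences with pairwise deletion of gaps, optionally with gamma-distributed rates. It follows the standard textbook formulas and is not a reproduction of MEGA7's code; the function name and the toy sequences are assumptions.

```python
import math

NUCLEOTIDES = set("ACGT")
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2, gamma_shape=None):
    """Kimura two-parameter distance between two aligned sequences.

    Sites where either sequence has a gap or ambiguous base are skipped
    (pairwise deletion). If gamma_shape is given, the gamma-corrected
    form of the distance is returned.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in NUCLEOTIDES or b not in NUCLEOTIDES:
            continue
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites      # proportion of transitional differences
    q = transversions / sites    # proportion of transversional differences
    w1 = 1.0 - 2.0 * p - q
    w2 = 1.0 - 2.0 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for this model")
    if gamma_shape is None:
        return -0.5 * math.log(w1 * math.sqrt(w2))
    a = gamma_shape
    return (a / 2.0) * (w1 ** (-1.0 / a) + 0.5 * w2 ** (-1.0 / a) - 1.5)


# Toy alignment: ten comparable sites, one transition (A <-> G).
d_plain = k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")
d_gamma = k2p_distance("AAAAAAAAAA", "GAAAAAAAAA", gamma_shape=1.0)
print(round(d_plain, 4), round(d_gamma, 4))
```

With a shape parameter of 1, as used here, the gamma correction inflates the distance relative to the uniform-rate estimate, and the effect grows with divergence.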
The MiSeq paired-end sequencing ( [...] of ≥ 30.0 (Q30; error rate = 0.1% or base call accuracy = 99.9%). This run was highly successful considering that the manufacturer's guidelines (Illumina Publication no. 770-2011-001 as of 27 May 2014) are > 80% bases ≥ Q30 at 2 × 150 bp. Of the 5,274,381 reads, a total of 1,903,160 reads were assigned to the 11 libraries, and the number of raw reads for each library ranged from 135,818 to 213,952 with an average of 173,015 reads (Table S8). After merging the two overlapping paired-end FASTQ files (1,826,828 reads [96.0%]), the primer-trimmed sequences were subjected to quality filtering to remove low-quality reads (1,802,098 reads [94.7%]). The remaining reads were dereplicated for subsequent analysis, and singletons to tripletons were removed from the unique sequences as recommended by the author of the program 79 . Then, reads were denoised to remove putatively erroneous and chimeric sequences, and the remaining 1,677,402 reads (88.1% of the raw reads) were subjected to taxon assignments. Of these, 1,671,871 reads (99.7% of the denoised reads) were putatively considered as sequences for fishes, and BLAST searches indicated that non-fish sequences (5531 reads [0.3%]) mostly consisted of mammals (i.e., cows, pigs, and humans) and a few unknown sequences. The three negative controls (i.e., EB, 1B, and 2B) were subjected to the same analysis pipeline and yielded only 103 denoised reads in total (only 0.006% of the total raw reads), which were not taken into consideration in the subsequent analyses as their subtraction from the corresponding species did not affect the presence/absence data matrix of sequences assignable to fishes. Contamination from non-Amazonian fishes at Miya's lab was detected and removed (Table S9).
Community structure: molecular-based sampling (MBS). Evaluation of species richness for eDNA included all 11 samples from the river and stream localities. Specifically, for rivers: five samples from station 1 (JAV2017081606) and one sample from station 3 (QUI2017082906); for the stream: five samples from station 2 (JAV2017082108). Species richness was compared between CBS and MBS using the fish assemblages captured and detected in stations 1 and 2 only. Although a single sample is not appropriate for drawing conclusions about the assemblage composition of a site, in the case of station 3, the single sample shows a trend that is biologically consistent with the community patterns of fishes found along the Amazon basin, e.g., 47, 48. Therefore, we used the single sample from the Quixito River to compare the fish community composition in the three localities.
Species abundance per order was evaluated by heatmaps produced in ggplot2 89 . Similarity among all 11 samples, among the three stations, and between stream and river assemblages was calculated using the Pearson correlation coefficient. Then, we calculated Jaccard's dissimilarities and ordinated the coefficient values using non-metric multidimensional scaling (NMDS) in the VEGAN package version 2.4-4 90 to visualize how replicated eDNA data discriminate site and habitat (stream vs. river) patterns and to determine the sampling effort needed to identify community changes among sites. A 3D graph was produced with the CAR 91 and RGL version 0.103.5 92 packages. Differences in species composition between sites and habitat types were statistically tested by permutational analysis of similarities (ANOSIM), which tests whether similarity between groups differs significantly from similarity within groups using the ranks of the similarity values 36 . A chord diagram showing the inter-relationship between species composition and habitat (river versus stream) was produced using the Circlize package 93 . Fish silhouettes were produced in fishualize v. 0.2.1 94 with the addition of a species of Gymnotiformes.
Species richness. For the water samples from station 1 and station 2, the numbers of taxa detected by CBS and MBS were represented by Venn diagrams. Rarefaction species accumulation curves for capture-based sampling were calculated for stations 1 and 2 95 using the iNEXT package in R 96 for the Hill number of order q = 0 (species richness) with 1000 bootstraps. Dissimilarity in species composition among samples in stations 1 and 2 was assessed by calculating pairwise Jaccard's distances with the function vegdist. The bias-corrected Chao II estimator 97 was applied to estimate the species richness detected by MBS, as suggested by 98 , and was calculated with the SpadeR package in R 99 .
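The two incidence-based quantities used here can be sketched in a few lines. The code below is illustrative and does not reproduce the vegdist or SpadeR implementations: it uses the presence/absence form of Jaccard's dissimilarity and the commonly cited bias-corrected form of the Chao II estimator, and the replicate data are invented.

```python
from collections import Counter


def jaccard_dissimilarity(sample_a, sample_b):
    """Presence/absence Jaccard dissimilarity: 1 - |A and B| / |A or B|."""
    a, b = set(sample_a), set(sample_b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def chao2_bias_corrected(samples):
    """Bias-corrected Chao II richness estimate from incidence data.

    S_obs + ((m - 1) / m) * Q1 * (Q1 - 1) / (2 * (Q2 + 1)), where m is the
    number of samples and Q1, Q2 are the numbers of species found in
    exactly one and exactly two samples.
    """
    m = len(samples)
    if m == 0:
        raise ValueError("at least one sample is required")
    incidence = Counter()
    for sample in samples:
        incidence.update(set(sample))
    s_obs = len(incidence)
    q1 = sum(1 for count in incidence.values() if count == 1)
    q2 = sum(1 for count in incidence.values() if count == 2)
    return s_obs + ((m - 1) / m) * q1 * (q1 - 1) / (2.0 * (q2 + 1))


# Invented replicate eDNA samples (species detected per water sample).
replicates = [
    {"sp1", "sp2", "sp3"},
    {"sp1", "sp2", "sp4"},
    {"sp1", "sp5"},
]
d01 = jaccard_dissimilarity(replicates[0], replicates[1])
d02 = jaccard_dissimilarity(replicates[0], replicates[2])
estimate = chao2_bias_corrected(replicates)
print(d01, d02, estimate)
```

In the toy data five species are observed, three of them in a single replicate only, so the estimator adds one undetected species to the observed richness.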
Species accumulation curves for molecular-based sampling were built using the function specaccum in the VEGAN package v2.5.4 90 . Graphs were plotted using ggplot2.
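A sample-based accumulation curve gives the expected number of species found in k samples. The sketch below obtains it by brute force, averaging richness over every subset of k samples, which is feasible only for a handful of samples. The text does not state which specaccum method or options were used, so this is an assumption-laden illustration with invented data, not the authors' procedure.

```python
from itertools import combinations


def accumulation_curve(samples):
    """Mean cumulative richness for k = 1..n samples over all k-subsets."""
    sets = [set(sample) for sample in samples]
    curve = []
    for k in range(1, len(sets) + 1):
        richness = [
            len(set().union(*combo)) for combo in combinations(sets, k)
        ]
        curve.append(sum(richness) / len(richness))
    return curve


# Invented replicate eDNA samples (species detected per water sample).
replicates = [
    {"sp1", "sp2", "sp3"},
    {"sp1", "sp2", "sp4"},
    {"sp1", "sp5"},
]
curve = accumulation_curve(replicates)
print([round(value, 2) for value in curve])
```

A curve that is still rising at the last sample, as in this toy case, suggests that additional water samples would detect further species.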