Environmental DNA survey captures patterns of fish and invertebrate diversity across a tropical seascape

Accurate, rapid, and comprehensive biodiversity assessments are critical for investigating ecological processes and supporting conservation efforts. Environmental DNA (eDNA) surveys show promise as a way to effectively characterize fine-scale patterns of community composition. We tested whether a single PCR survey of eDNA in seawater using a broad metazoan primer could identify differences in community composition between five adjacent habitats at 19 sites across a tropical Caribbean bay in Panama. We paired this effort with visual fish surveys to compare methods for a conspicuous taxonomic group. eDNA revealed a tremendous diversity of animals (8,586 operational taxonomic units), including many small taxa that would be undetected in traditional in situ surveys. Fish comprised only 0.07% of the taxa detected by a broad COI primer, yet included 43 species not observed in the visual survey. eDNA revealed significant differences in fish and invertebrate community composition across adjacent habitats and areas of the bay driven in part by taxa known to be habitat-specialists or tolerant to wave action. Our results demonstrate the ability of broad eDNA surveys to identify biodiversity patterns in the ocean.


Materials and Methods
Study Sites. The field component for this study was conducted in Almirante Bay of the Bocas del Toro archipelago ( Fig. 1) on the Caribbean coast of Panama from May to July 2017. A total of 19 study sites that span a wide gradient of environmental conditions were sampled; they were separated by distances ranging from 20 m to 6,000 m and included areas exposed to, and sheltered from, open ocean swell (Fig. 1, Table S1). Six of these areas were previously established Smithsonian Marine Global Earth Observatory (MarineGEO) monitoring sites that each contained adjacent mangrove, seagrass, coral reef and sandy bottom habitats. Eight additional sites contained only mangrove and seagrass habitats, and five were boat docks (Fig. 1). Additional site information can be found in Supplementary Table S1. Surveys in mangrove habitat were conducted in and around the submerged portion of Rhizophora mangle stands (1-2 m depth). Surveys in seagrass habitat were conducted in meadows dominated by Thalassia testudinum (2-4 m depth). Surveys in coral habitat were conducted in reef areas that were dominated by Porites spp. finger corals and Agaricia tenuifolia (2-5 m). eDnA Surveys. The eDNA workflow involved filtering and extracting eDNA from a water sample and amplifying specific regions of DNA using a broad COI metazoan primer set during PCR. The resulting amplicons were then sequenced on a massively parallel sequencing platform and compared to a reference database of partially or fully characterized genomes of organisms to identify the taxa present in the environmental sample 26 .
Sample collection. Three replicate one-liter seawater samples were collected from each of four habitat types (mangrove, seagrass, coral, sand) at the six MarineGEO monitoring sites (n = 72) and from mangrove and seagrass habitats at eight additional sites (n = 48). At the five dock sites, a 1 L water sample was taken from each of the two sides of each dock; there was one dock at three of the sites and 2 docks at two of the sites (n = 14). In total, 134 water samples were collected (see Supplementary Table S1).
Water samples were collected by opening a translucent bleach-sterilized wide-mouth polypropylene sampling bottle directly above (10-30 cm) the surface of the habitat without disturbing benthic sediments. Samples were filtered immediately on the boat after collection when possible, but occasionally frozen at −20 °C until filtering. There was no difference in the amount of DNA extracted (t-test, P = 0.55) or in PCR yield (P = 0.14) from freshly filtered versus frozen samples (Supplementary Figs S1 and S2). Water samples were vacuum-filtered onto sterile 47 mm diameter 0.45 µm cellulose nitrate filters (GE Life Sciences, Pittsburgh, PA and ThermoScientific, Rochester, USA) with a peristaltic pump head following the "Environmental DNA Sampling Protocol #2" prepared by the U.S. Geological Survey 48 . One 1 L field blank was also collected per sampling event (n = 19 total) as a negative control (a 1 L bottle of deionized water was left open during filtration of the triplicate eDNA samples in the field and then filtered as described above). Filters were stored in sterile 15 mL falcon tubes at −20 °C, then transferred to a −80 °C freezer before extraction.
DNA Extraction. Half of each cellulose nitrate filter was cut up into smaller pieces for DNA extraction using sterile snips and the other half was archived at −80 °C. DNA was extracted using the DNeasy PowerSoil Kit (Qiagen, Carlsbad, CA) with the following modifications to the manufacturer's protocol. First, we increased the total volume of Powerbead solution (950 µL total) and C1 solution (80 µL total) to fully submerge filter pieces. Second, we incubated samples in a water bath at 65 °C for 10 minutes directly following bead beating to increase the yield. Third, we eluted the DNA in 75 µL of Solution C6 to increase the final concentration. The concentration of each DNA extract was quantified using the Qubit dsDNA HS Assay (Invitrogen, Ca, USA), and the approximate molecular weight of a selected number of DNA extracts was visualized on a 1.5% agarose gel by electrophoresis.
Amplification and Sequencing. A 313 bp fragment of the variable region of the mitochondrial COI gene 49 was amplified for each sample with a set of PCR primers designed to target metazoans (mlCOIintF and jgHCO2198, full primer sequences can be found in Supplementary Table S2). To combine samples into a sequencing run on the Illumina MiSeq platform, each sample was assigned a unique combination of indices following the adapter ligation method 50 . The first index was added to the target fragments during PCR amplification using tailed-PCR primers (i.e., PCR primers to which unique 6-base-pair indices were added at the 5′ end) and the second index was added to the product of amplification in the form of standard single-indexed Illumina TruSeq Y-adapters. PCR amplifications were conducted three times for each sample. Each 20 µL PCR reaction contained 2 µL of Clontech Advantage 2 PCR buffer (10×), 1 µL of each primer (10 µM), 1.4 µL of dNTP mix (10 mM), 0.4 µL of 50X Advantage 2 Polymerase Mix (50×), 13.2 µL PCR-grade water, and 1 µL of DNA extract. The following cycling conditions were used: 5 minutes at 95 °C (1×); 1 min at 95 °C, 30 s at 48 °C, and 45 s at 72 °C (38×); 5 min at 72 °C (1×). PCR negative controls were conducted alongside samples to test for the presence of contaminants. PCR replicates were pooled, purified using magnetic KAPA Pure Beads (bead:sample ratio of 1.6:1) and quantified using Qubit dsDNA HS Assay. Equimolar amounts of PCR products amplified with distinct tailed PCR primers were pooled before ligation of single-indexed adapters (see Supplementary Table S3 for the multiplexing protocol) following the protocol of the Illumina TruSeq PCR-Free LT kit. All libraries were quantified using a Qubit fluorometer, equimolar amounts pooled into a single tube, and the final product validated using a KAPA qPCR library quantification kit (KAPA Biosystems, Wilmington, Massachusetts, USA). The library was sequenced on an Illumina MiSeq with an Illumina MiSeq v2 500-cycle kit. Sampling sites. Samples were collected in MarineGEO sites (circles) from mangrove, seagrass, coral, and sand habitats, whereas additional sites (triangles) were only sampled in mangrove and seagrass habitats, or at boat docks. Full site names can be found in Supplementary Table S1. Shape data is from GADM (https:// gadm.org/).
We also attempted to amplify and sequence the water samples using the fish-specific 12S MiFish primers for additional validation, but we had limited success with PCR amplification using these primers. For this reason, we do not report further on the fish-specific amplicon sequencing part of this study, but additional details for the MiFish amplicon method used can be found in the Supplementary Information document.
Analysis of sequence data. Reads were demultiplexed and Illumina adapters were trimmed using Flexbar version 3.0.3 51 . The demultiplexed and adapter trimmed FASTQ files can be accessed on the NCBI SRA under the BioProject accession number PRJNA558350. Negative control samples that failed DNA extraction, PCR amplification, or initial quality control checks after demultiplexing were not uploaded. Adapter-trimmed reads were then further trimmed to ensure that all primer regions were removed, filtered for quality, merged, checked for chimeric sequences, and used to infer exact amplicon sequence variants (ASVs) with DADA2 version 1.9.0 52,53 (DADA2 parameters: maxN = 0, maxEE = c(2,2), truncQ = 10, trimLeft = 26, with pseudo-pooling). ASVs were clustered into operational taxonomic units (OTUs) at a 97% identity threshold using VSEARCH 54 . OTUs were further processed with the LULU algorithm, which curates OTUs based on sequence identity and co-occurrence patterns, in order to reduce taxonomic redundancy and improve the accuracy of richness estimates (LULU parameters: minimum_ratio_type = "min", minimum_ratio = 1, minimum_match = 84, minimum_relative_cooccurence = 0.95) 55 . OTUs were assigned taxonomic information using the Bayesian Least Common Ancestor (BLCA) Taxonomic Classification software 56 against Midori-Unique v20180221, a curated database of metazoan mitochondrial gene sequences (available at www.reference-midori.info) 57 . BLCA taxonomic rank assignments with less than 50% confidence were discarded. OTUs that remained unidentified with BLCA were compared to the whole NCBI NT database (retrieved May 2018) using BLAST searches (word size = 7; max e-value = 5e-13) and assigned the taxonomic information of the lowest common ancestor of the top 100 hits.

Visual fish surveys.
To quantify the richness and composition of fish communities in coastal habitats across the network of 19 sites, two scuba divers conducted visual fish surveys following the Reef Life Survey (RLS) protocol 32 . These divers counted, identified to the highest taxonomic resolution possible, and assigned to binned size-classes all fish greater than 2.5 cm in length observed along a 50 ×10 m belt transect positioned parallel to shore. One transect was conducted in each of the same habitats and sites as the eDNA survey ( Fig. 1, Table S1) for a total of 43 transects. Due to the inability to swim through the mangrove habitat, two 50 m transect lines were deployed along the edge of the mangrove and a diver looked 5 m to one side of each of the primary prop roots and under overhanging roots for fish. These mangrove transects were then pooled together into a single transect. Dock sites were sampled similarly to the mangroves, with divers swimming along the edge of the structure looking inward. When fish were in large schools, counts were estimated to the nearest power of ten. Unidentifiable individuals were photographed for later identification using reef fish identification books 58 and online databases (e.g., FishBase 59 , Smithsonian Identification Guide -Shorefish of the Greater Caribbean 60 ). The same two divers conducted all visual surveys after a period of training to ensure consistency.
Analysis of diversity patterns. Community analyses were conducted in R version 3.5.1 61 using the phyloseq 62 , vegan 63 , and DESeq2 64 packages. Machine learning analysis was conducted using the h2o package 65 . Visualizations were done with the Plotly 66 and ggplot2 packages 67 .
Raw counts were transformed to relative abundances before calculating a dissimilarity matrix based on the Bray-Curtis metric. Bray-Curtis takes into account differences in abundance of reads between samples; a value of 0 indicates that samples are exactly identical in terms of OTU composition and relative abundance of reads whereas a value of 1 indicates that samples do not have any OTUs in common. The variance in community composition was calculated using a PERmutational Multivariate ANalysis Of VAriance (PERMANOVA) 68 computed with 10000 free permutations. The following terms were used as factors: region within the bay, habitat, geographic site, diver (for the visual survey), and the interaction terms between bay region and habitat, and site and habitat.
We calculated the accuracy with which a machine learning algorithm was able to assign a sample to its source habitat based on OTU composition to understand how consistently distinct the assemblage was for a given habitat in Bocas del Toro. We selected a random forest classifier over other approaches because of the high-dimensionality of our data, the relative ease of tuning a distributed random forest algorithm, and the ability of random forest algorithms to calculate feature importance. LULU-curated metazoan OTUs were filtered to only include taxa that occurred at more than 5e-5 relative abundance (approximately five reads on average) in at least five samples. OTUs were then used as predictors in a random forest classifier with source habitat (coral reef, mangrove, seagrass, sand, dock) as the response variable with 5-fold cross-validation using the h2o machine learning platform 65 . Cross-validation is a statistical term for training a model on the majority of the data while withholding some data to be used to test the model. Since the withheld data were not used to train the model, the model has never seen them before, thereby providing a better estimate of how accurate the model would be on new data. Hyperparameters were chosen by grid search optimizing for logarithmic loss using 'h2o' . The model parameters with the smallest logarithmic loss value during cross-validation were selected and used to build a random forest from which feature importance information was extracted.
To examine differences in specific pairwise habitat and exposure comparisons (exposed vs. sheltered, mangrove vs. seagrass habitat, and coral vs. sand habitat), differential abundance analysis of visual survey detections Scientific RepoRtS | (2020) 10:6729 | https://doi.org/10.1038/s41598-020-63565-9 www.nature.com/scientificreports www.nature.com/scientificreports/ and OTUs of fish and metazoans between habitats was conducted using Wald t-tests in the DESeq2 package 64 . Coral and sand sites were only contrasted with each other because each only had six sites, while mangrove and seagrass sites had 12 sites each. Variance stabilizing transformation was used on the count data to standardize downstream visualization using heatmaps.

Results
Metazoan community composition. PCR amplification of the mitochondrial COI gene with a general primer set was successful for all 134 water samples whereas all negative field, extraction and PCR controls remained blank. A total of 14,376,785 raw paired end reads were obtained and 12,708,398 (88.4%) remained after filtering, chimera removal, and processing.
A larger portion of the variation in the metazoan eDNA data was explained by sampling site (R 2 = 29.1%) and the interaction term between habitat and sampling site (R 2 = 17.2%). PERMANOVA analysis indicated significant differences between geographic areas (exposure), habitats, sites and all interactions between factors ( Table 2). Despite targeting a much broader range of taxa (including fish and numerous invertebrates), the whole metazoan eDNA sequencing found similar patterns of community dissimilarity across the sampling region and between habitats (PERMANOVA, Table 2) as the visual survey and the subset of the eDNA dataset assigned to fish.
A distributed random forest algorithm correctly predicted the source habitat of the samples an average of 83% and 68% of the time in total and in cross-validation, respectively. Mangrove samples were the most likely to be correctly classified during cross-validation (81% correct) while coral reef samples were the least likely (44%) (Fig. 3). While many of the taxa that were important predictors in the random forest classification algorithm were unidentified taxa, a number of identified taxa were species known to occupy mangrove roots, such as the sponge Haliclona manglaris (100% ID), the flat tree oyster Isognomon alatus (100% ID), and the frond oyster Dendostrea frons (100% ID). (Supplementary File S1).
Of the OTUs identified to species level, only 154 (40%) had been previously reported in the Smithsonian Tropical Research Institute Bocas del Toro species database or OBIS at the time of analysis (see Supplementary File S2 for exact lists). Interestingly, four of the species detected in the eDNA, including two terrestrial species, are listed as invasive in the Caribbean by the Global Invasive Species Database: the bryozoan Bugula neritina, the crab Charybdis hellerii (100% ID), the ant Monomorium pharaonis (100% ID), and the common house mouse Mus musculus (100% ID). DNA from these invasive species was observed in more than 3 water samples, and all were at very low relative abundances (Bugula neritina (100% ID): 46 reads, Charybdis hellerii: 7 reads, Monomorium pharaonis: 186 reads, Mus musculus: 21 reads). We speculate that the terrestrial invasive species DNA is likely from local sewage sources and/or runoff rather than laboratory contamination, since these are species known to occur in the area and were not found in any negative or positive controls.

Fish community composition.
In the eDNA survey, reads assigned to bony fishes (Osteichthyes) made up a median of only 0.07% of the total reads per sample with no sample containing more than 5.2% fish reads. We detected 79 fish species (86 OTUs) which accounted for a total of 26,724 paired reads (Table 1). 93% (80/86 OTUs) had a species-level identification, with two OTUs identified as the same species. 66.6% (4/6) of OTUs only had a family-level identification, and 33% (2/6) of OTUs only had a class-level identification. Six out of 134 eDNA sample replicates did not contain hits to fish, but every habitat within each site had hits to fish when triplicate samples were combined. The highest mean read abundance of bony fishes among habitats was recovered from the dock sites (per sample mean ± SE: 506 ± 160), followed by mangrove habitats (per sample mean ± SE: 436 ± 160) ( Table 1).
Mangrove habitats harbored the highest average read diversity for fish based on Shannon diversity indices, followed by seagrass, dock, sand, and finally coral reef habitats (Table 1). Samples from the Salt Creek mangrove MarineGEO site, located outside the bay, contained among the highest number of hits to fish species (10,229 reads), followed by the Ferry dock site located inside the bay (1,510 reads). The fish species with the highest number of reads were the clupeid big-eye anchovy (Anchoa lamprotaenia, 99.68% ID, 11,312 reads), striped parrotfish (Scarus iseri, 100% ID; 1,700 reads), and clupeid hardhead silverside (Atherinomorus stipes, 100% ID; 1,671 reads).
For the visual fish survey, over 7,000,000 individuals (5,523 kg) and 97 identifiable species from 42 families were observed in transects covering a total area of 21,500 m 2 (Table 1). At least one fish individual was observed from each habitat in every site. Fishes that were ubiquitous across all sites and habitats belonged to species that www.nature.com/scientificreports www.nature.com/scientificreports/ are typically thought of as reef-associated, were already included in the Bocas species database, and were previously observed in surveys of the area [e.g., stoplight parrotfish (Sparisoma viride), threespot damselfish (Stegastes planifrons), bluehead wrasse (Thalassoma bifasciatum)]. Of the non-schooling fish species, the masked goby    www.nature.com/scientificreports www.nature.com/scientificreports/ (Coryphopterus personatus) was the most abundant (and third most abundant overall). For a full list of fish species, see Supplementary Table S4.
Out of a total of 922 diver observation events (i.e., a species or morphospecies recorded in some abundance on a given transect by a given diver), 838 (91%) were species-level identifications, 20 (2%) were genus-level identifications and 64 were family-level identifications (7%). The family-level observations accounted for 99% of the observed individuals across all study sites. Clupeids (small mid-water "baitfish") commonly found in mangrove and dock habitats could not be identified to species in most cases because they are small, move quickly, and occur in very large mixed species schools (e.g., up to 1,000,000 estimated in a single transect). Clupeids and the second most abundant group, belonids, accounted for 75% of diver observation events that were not identified at the species level. For fishes other than clupeids and belonids, 98% of diver observation events achieved species-level identifications.
Comparing fish eDNA and visual survey methods. In both the visual surveys and eDNA surveys, we found significant differences in the fish communities observed between different regions of the bay, and between habitats, using the Bray-Curtis index based PERMANOVA (full results in Table 2). There was no clearly visible structure when the community data from the eDNA or visual surveys were ordinated using a Bray-Curtis PCoA (Fig. 5). For both surveys, the majority of the explained variation, however, was attributed to the factor "site" which encompasses multiple habitats within a given location, (RLS: R 2 = 25.5%, eDNA: R 2 = 16.8%) and the interaction term between habitat and site (RLS: R 2 = 25.6%, eDNA: R 2 = 20.7%).
Visual surveys and eDNA were similar in that each method identified differences between habitats and exposure in the abundance of fish taxa, but they differed in which taxa contributed to this difference ( Fig. 6 and Supplementary Fig. S4). For example, in visual surveys, nine taxa including schoolmaster snapper (Lutjanus apodus), www.nature.com/scientificreports www.nature.com/scientificreports/ latin grunt (Haemulon steindachneri), and yellowfin mojarra (Gerres cinereus) were more abundant in mangrove sites, whereas slippery dick (Halichoeres bivittatus) was more abundant in seagrass sites (Fig. 6). On the other hand, the same comparison between mangrove and seagrass sites for the eDNA fish dataset only showed big-eye anchovy (Anchoa lamprotaenia, 99.68% ID) to be more abundant in mangrove sites (Fig. 6). In general, the visual surveys detected more differentially abundant taxa in the habitat and wave exposure comparisons than the eDNA surveys.
The visual survey detected 97 fish species, 36 of which overlapped with the eDNA survey and 19 of which were absent in the eDNA (Supplementary Table S4). Of the 79 fish species detected by eDNA, 31 had not been previously recorded in the Bocas or OBIS databases. Of the fish species identified with eDNA, none were false positives (except those mis-assigned, see Discussion). In total, eDNA metabarcoding with COI and visual surveys detected 140 fish species, 60 of which had not been previously reported in the Bocas database (which had 219 fish species when we initiated our study) and nine of which had not been reported in OBIS. Our total is slightly greater than the number of detected fish species from the most recent and comprehensive record from this region (2005), which reported 128 fish species (85 of which we also report here) from 12 coral reef sites varying in habitat complexity and exposure in Bocas del Toro after 288 visual surveys on comparable transects 40 . With surveys conducted along 43 transects deployed across five habitat types and using two sampling approaches (i.e., visual and eDNA), we found 65 species that were not reported in 2005 40 and 25 species that were not recorded in the Bocas database. eDNA detected small-sized species (<2.5 cm) and pelagic species that scuba divers did not look for.

Discussion
The broad-range COI primer set revealed community patterns across a heterogeneous tropical coastal system that were generally consistent with visual surveys of conspicuous fish and the known distribution of invertebrates. Both the eDNA and visual surveys show distinct variation in taxon abundances between habitats and regions in the bay (Figs. 4,6, Table 1). Interestingly, the similar patterns of fish community composition uncovered by the two methods were driven by different pools of fish species. This suggests that eDNA can unveil distribution patterns of fish taxa (as well as invertebrates) previously missed or underreported by visual methods, consistent with other eDNA fish studies 69 . Our results also indicate that habitat and wave exposure drive the distribution of diversity across various taxonomic groups.
Patterns of differential sequence abundance observed between habitats and regions of the bay for metazoan taxa can be verified and explained by their ecology. For example, the massive brain coral Pseudodiploria strigosa was consistently detected by eDNA at exposed sites, as would be expected because of their wave resistant morphology 70 . This result is also consistent with previous descriptive studies of Bocas del Toro's reefs, where this species was found more often at sites on the exposed eastern side of Bastimentos Island than on sheltered sites around Colón Island [71][72][73] .
The proportion of the variance explained by site (which encompass multiple habitats) in the PERMANOVA (visual: R 2 = 0.238, eDNA: R 2 = 0.270, Table 2) may be indicative of a shared species pool and/or connectivity of habitats in the ecosystem (i.e., the mangrove-seagrass-coral complex) through organism dispersal, migration or movement of seawater that carries eDNA. The variance explained by habitat alone was statistically significant, but relatively small compared to site and site x habitat, indicating differences between habitats at some sites but not others. The physical distance between water samples may have been a larger contributor to the community similarity than the habitat of collection. Alternatively, this could be due to daily differences in weather patterns if wind-driven water movement transported eDNA across habitats at some sites (i.e., as collections were made at all habitats of a site in the same day).
The random forest classification was able to detect subtle variation that was informative for classifying the source habitat of the eDNA samples, even though only approximately 7% of the variance in Bray-Curtis distances was explained by habitat. Most of the taxa that were important predictors of habitat classification were metazoans not identified to species. The taxa that could be identified to species tended to be taxa that are known to be www.nature.com/scientificreports www.nature.com/scientificreports/ specialized on particular habitats. For instance, a number of common mangrove root epibionts were important to the algorithm's classification accuracy, including the sponge Haliclona manglaris and various bivalves such as the flat tree oyster Isognomon alatus and the mangrove oyster Crassostrea rhizophorae (100% ID). Because mangroves had the most taxa, they likely contained more unique species associated with it, which improved the accuracy of the classifier. This shows that eDNA is able to pick up multiple species indicative of different habitats separated by only a few meters.
The high taxa detection of our study demonstrates the utility of eDNA as a biomonitoring tool for highly diverse tropical seascapes. In combination with a traditional visual survey, eDNA allowed for a more comprehensive survey of the biodiversity contained in the Bocas del Toro lagoon. Overall, over 8,500 metazoan OTUs were detected in the eDNA, which is comparable to previous estimates of marine metazoan diversity for the entire Caribbean region (10,676 animal species reported in a 2010 study of georeferenced species records and taxonomic lists 13 ) and is consistent with other eDNA results from the Caribbean 30 . We detected over 200 metazoan OTUs with species-level identifications that had not been previously recorded in the Bocas or OBIS databases. This is particularly notable given that the majority (95%) of metazoan OTUs detected in our study were not identifiable to species with the databases accessed at time of analysis. Even though COI is one of the most commonly sequenced genes for animals, with over 4.5 million sequences in the Barcode of Life Data system 74,75 , we speculate that many small or cryptic taxa that are not well taxonomically described are not adequately represented in GenBank, a point highlighted by the Open Tree of Life project 76 . In addition, the broad-range COI primers also detected numerous non-metazoan sequences. These sequences represented 53% of the reads, which is around 2-3 times greater in relative abundance than in previous studies sequencing bulk benthic samples utilizing the same COI primers 77,78 but comparable to recent estimates for water samples 79 . This is likely attributable to the lower concentrations of metazoan eDNA in seawater and the broad taxon compatibility of the COI primer set.
Successful taxonomic identification in eDNA-based surveys rely on the taxonomic coverage of reference databases. Yet, as highly diverse tropical systems remain poorly sampled, the majority of OTUs remained unidentified to the species level. Moreover, some of these groups had COI entries for their Caribbean species deposited into GenBank very recently and, thus, were not represented in the latest version of the MIDORI database, resulting in misidentifications as closely-related species from other regions. Two ray OTUs were reported as the Eastern Pacific rays Aetobatus ocellatus (97.44% ID) and Rhinoptera steindachneri (99.36% ID), but are likely the closely-related Western Atlantic species Aetobatus narinari (100% ID) and Rhinoptera brasiliensis (100% ID), respectively. This reinforces the need for accurate and regularly updated reference databases in metabarcoding studies. In addition, OTUs in groups known to have slow rates of COI evolution 80 also had further complicated taxonomic identifications. For example, one octocoral OTU detected as significantly more abundant in exposed sites was assigned as Muricea fruticosa (99.36% ID), an eastern Pacific gorgonian. This OTU is more likely to be one of the many Caribbean gorgonian species-also recently added to GenBank-with identical or nearly identical COI sequences in the PCR target region (e.g., Pseudoplexaura porosa (100% ID) and Eunicea flexuosa (100% ID)). Studies looking to target specific taxa (e.g., Cnidaria or Porifera) will benefit from using more taxon-specific primers.
Other aspects of the data analysis can affect estimates of alpha and beta diversity. A wide-variety of softwares exists for each analytical step (i.e., quality filtering, clustering, assignment) but there is still no consensus on the best approach. Many of the tools that are available for analyzing metabarcoding data are optimized for ribosomal genes because they have been disproportionately targeted in earlier studies 81 . The development of OTU clustering and taxonomic assignment algorithms that take into account the protein coding properties of the COI gene will undoubtedly improve the accuracy of results.
The co-amplification of numerous non-metazoan taxa lowered the sampling depth of our target organisms (metazoan) and likely prevented the consistent detection of rare taxa. The taxonomic selectivity of primers, rather than environmental factors such as organismal DNA shedding rates, has been argued to be the main driver behind differences in detection between eDNA and visual approaches 24 . Using more taxon-specific primers could have improved the detection consistency for organisms of interest. In this study, PCR yield using MiFish primers that specifically target fish was surprisingly low across samples. However, optimizing PCR conditions may have increased amplification success and yield 82 . Regardless of these challenges, we were still able to capture a wide array of target taxonomic groups across the animal tree of life simultaneously, including benthic species that would otherwise each be costly in the form of taxonomic expertise, specialized techniques, and effort to observe through traditional visual means.
Our study supports the feasibility and utility of using molecular-based approaches for quantifying tropical marine biodiversity with relatively low sampling effort. We found the DNA signature of many invertebrate species and, more surprisingly, of numerous fishes that were not observed during visual surveys or never documented in the area. As we hypothesized, eDNA effectively identified patterns of diversity in fish and invertebrate communities in this heterogeneous seascape despite a high level of variability in the dataset likely due to water circulation. We were able to identify environmental features (i.e, exposure) that drive the distribution of animal communities, and we identified sets of habitat specialized species that are consistent with our visual observations. Future studies should continue to compare detections of broad eDNA surveys with taxa-specific primers and concomitant visual-based surveys, when feasible, to understand the discrepancies in detection of each dataset, as they each contain their own unique biases and opportunities. Utilizing environmental DNA sequencing techniques in biodiversity assessments will be crucial as people continue to balance marine resource use and conservation in tropical coastal ecosystems.