eDNA metabarcoding of small plankton samples to detect fish larvae and their preys from Atlantic and Pacific waters

Zooplankton community inventories are the basis of fisheries management for containing fish larvae and their preys; however, the visual identification of early-stage larvae (the “missing biomass”) is difficult and laborious. Here, eDNA metabarcoding was employed to detect zooplankton species of interest for fisheries from open and coastal waters. High-Throughput sequencing (HTS) from environmental samples using small water volumes has been proposed to detect species of interest whose DNA is the most abundant. We analyzed 6-L water samples taken from subtropical and tropical waters using Cytochrome oxidase I (COI) gene as metabarcode. In the open ocean, several commercial fish larvae and invertebrate species important in fish diet were found from metabarcodes and confirmed from individual barcoding. Comparing Atlantic, Mediterranean, Red Sea, and Pacific samples we found a lower taxonomic depth of OTU assignments in samples from tropical waters than in those from temperate ones, suggesting large gaps in reference databases for those areas; thus a higher effort of zooplankton barcoding in tropical oceans is highly recommended. This and similar simplified sampling protocols could be applied in early detection of species important for fisheries.

www.nature.com/scientificreports/ tap connecting with the membrane pump, with the water take at 6 m of depth, was running constantly while underway. Each sampling day, ca. 6 L of water were concentrated through 0.2 μm pore filters, coded and stored in absolute ethanol for further DNA extraction and analysis. To confirm the species found from metabarcoding with an independent method, individual zooplankton samples were taken overboard with a manual plankton net of 20 μm mesh, 200 L concentrated in three replicates of 50 mL. Samples were filtered through a 0.2 μm membrane. Zooplankton individuals were inspected under the microscope, taxonomically identified with the use of zooplankton guides, a few were transferred to ethanol for subsequent DNA barcoding, the rest were fixed in 10% formaldehyde. Along the Mediterranean coast at Prévost lagoon, 6 L of surface water were collected in three sterile bottles of 2 L (three subsamples) and filtered through 0.2 μm polyethersulfone filters. Samples from Rangiroa (French Polynesia) were two 3 L water subsamples were collected in separate sterile bottles and filtered immediately in situ with a syringe system with a Swinnex portafilters from Merck Millipore and a membrane filter with 0.2 µm pore size and 25 mm diameter, and filters were stored in ethanol until eDNA extraction. Samples from the Red Sea (Gulf of Aqaba) were taken from the shore using also a syringe system and one membrane per liter of water. The  In all cases, we used sterile equipment (gloves, syringe, bucket) that was carefully cleaned before and after sampling with bleach diluted at 10% (5% sodium hypochlorite) and triple-rinsed with clean sterile water. HTS procedure. DNA was extracted in all cases from filters using PowerWater DNA Isolation Kit (Qiagen) following manufacturer's instructions. A previous step to pellet the ethanol content and include it in the extraction process was included to avoid starting material loss. All extractions were performed in a per-PCR laboratory under a flow laminar hood equipped with UV light. Negative controls were employed during extraction. For the Polarstern samples, a fragment from the COI gene was amplified with polymerase chain reaction (PCR) from the extracted eDNA using the universal primers mlCOIintF 43 and jgHCO2198 44 that were modified with a PGM sequencing adaptor, the barcodes (one per sample) needed to differentiate the reads belonging to each water sample, and a "GAT" spacer. Amplification was carried out in a total volume of 20 μL including Green GoTaq Buffer 1×, 2.5 mM MgCl2, 0.25 mM dNTPs, 20 pmol of each primer, 4 μL of template DNA, 200 ng μL −1 of bovine serum albumin, and 0.65 U of DNA Taq polymerase (Promega). PCR conditions in the Veriti Thermal Cycler (Applied Biosystems, Foster City, California) were 95 °C for 1 min, followed by 35 cycles of 95 °C for 15 s, 46 °C for 15 s, 72 °C for 10 s, and a final extension of 72 °C for 3 min. Extraction (N = 3) and field (N = 1) negative controls were included in the PCRs. The amplification success was visually assessed on 2% agarose gel. PCR amplicons were purified from agarose gel using the Montage DNA Gel Extraction Kit (Millipore); quantified using the Qubit BR dsDNA Kit (Thermo Fisher Scientific); and double-checked in a Bioanalyzer 2100 (Agilent Technologies) to confirm the fragment size, the absence of by-products, and to do a more precise quantification 45 . For the rest of the samples, the same metabarcoding region was amplified, but using an Illumina approach as described in Ardura et al. 31 . Summarizing, the primers were modified to include IlluminaTM overhang adaptors and sample-specific indices following the dual-PCR Illumina protocol (https:// suppo rt. illum ina. com/), where the conditions for the amplicon PCR were changed to the ones described by Leray et al. 43 . In addition, bovine serum albumin (BSA) was added to the PCR reactions to increase PCR yields from low purity templates and to avoid, as much as possible, the effect of inhibitors presents in the water. After library construction, MiSeq Illumina platform was employed to run the sequencing step using paired-end sequencing (2 × 301).
PCR replicates were not developed, but sampling replicates were taken when possible (Table 1), in order to recover as much of the biodiversity as possible.
Individual barcoding. DNA was extracted using silica gel columns [QIAmp DNA Mini Kit, DNeasy Blood and Tissue kit (Qiagen)]. Aliquots were frozen at -20 °C for long-time preservation, even for years.
The mitochondrial COI gene was amplified using the universal primers designed by Leray et al. 43 as in the metabarcoding analysis. An extra 400-600 bp fragment within the nuclear small subunit ribosomal DNA (18S rRNA gene, 18S thereafter) was amplified by PCR with Uni18SF and Uni18SR primers and protocol described by Zhan et al. 45 .
PCR products were visualized in 2% agarose gel dyed with SimplySafe (EURx Ltd). Purification and sequencing were performed in Sequencing Unit of the University of Oviedo (Spain). Sequences were BLASTed against GenBank database within NCBI (https:// blast. ncbi. nlm. nih. gov) using best match for species assignment, > 97% identity for COI and > 80% for 18S 46 . When no genetic identification was possible with the COI gene, the reference for 18S rRNA gene was considered. Taxonomic information was checked in World Register of Marine Species (WORMS; http:// www. marin espec ies. org/).

Bioinformatics analysis of metabarcoding data.
Adaptors from sequencing platforms were trimmed within the platform software and fastq sequences files were used to filter per quality. Qiime2 2020.2 47 was used to trim the primers and to filter by length (amplicon size 200-400 bp reads were retained) using Cutadapt 1.15 48 . Denoising option was employed where sequences are dereplicated and denoised using the unoise3 algorithm from Usearch 11.0.667_i86 49 . For taxonomic classification, filtered sequences were compared against a public COI reference database (NCBI, accessed on 16/06/2020) and stored locally. The database was downloaded using the esearch query "COI NOT Bacteria NOT environmental NOT viruses NOT unclassified" and constructed with the respective taxonomic information using the script "Entrez_qiime.py" by Baker 50 . Finally, "qiime feature-classifier" command was employed to assign the taxonomy, using a 97% as identity percentage and an e-value of 10 -50 .
Resulting Operational Taxonomic Units (OTUs) from taxonomic assignation were employed in further steps.
Data analysis. First, the OTU tables were inspected manually and non-marine species (e.g. human, insects, etc.) were eliminated. Analyses were done only on species that, from their biological cycle, could be present in plankton in the sampling area at the sampling time. The analyses described further were made only on zooplankton. www.nature.com/scientificreports/ Diversity was estimated with the Brillouin D-index 51 , which is preferred over the Shannon index when the species differ in their capture rates. This is the case of PCR-based metabarcoding, because, as explained above, not all the taxa are detected with the same probability or taxonomic depth.
Tridimensional, non-metric multidimensional scaling (NMDS) was employed to visualize differences between samples. Clustering analysis of the samples was made employing UPGMA and 1000 bootstraps. These multidimensional analyses were based on Gower distances calculated from presence (1) or absence (0) of species DNA.
Mann-Whitney test was employed to compare groups of samples (e.g. temperate/subtropical versus tropical samples) for the taxonomic precision of the assignment (i.e. percentage of OTUs assigned to species). Monte Carlo (9999 permutations) was also employed to assess errors. Statistics was done with the free software PAST 52 .

Results
Overview of HTS data and taxonomic assignment depth HTS results are summarized in Supplementary Table 2, including the GenBank accession numbers of sequences. The list of OTUs found in all the samples and their taxonomic assignments is in Supplementary Table 3.
In the samples of 1.5 L obtained from PS 102 Polarstern after quality filtering, only the sample taken near the coast yielded sequences taxonomically assigned to marine invertebrates, all found in European waters: four Polychaetae (Gyptis propinqua, Lumbrineris latreilli, Paradoneis ilvana, Polycirrus sp.) and one Malacostraca (Pisidia longicornis), with a total of 457 reads. In the other seven DNA samples, marine animals were not detected. Only a few reads of Homo sapiens and insects were found, that can be explained from contamination. Thus, small water volumes, such as 1.5 L, do not seem useful to capture zooplankton diversity in open waters from metabarcoding using COI as a metabarcode.
In the samples of 6 L, the number of quality reads was not different between sampling sites, with sequences of COI metabarcode taxonomically assigned to zooplankton taxa oscillating between 5343 (West Africa #1) and 22,649 (West Africa #4). Samples obtained from coastal areas did not detect more zooplankton sequences than samples from the open seas (Table 1) and ranged from 6294 in Rangiroa (Polynesia) to 14,794 in the Mediterranean Prévost lagoon. The proportion of sequences assigned to animal species that are planktonic, at least in a life stage (percentage of zooplankton in all the assigned reads), was quite different among samples, but the proportion of zooplankton OTUs was quite similar, between 20 and 40%, except in the sample West Africa #2 with about 10% zooplankton OTUs (Fig. 2). The rest of assigned species were principally large animals in the West African open seas, and phytoplankton in the three coastal samples (Supplementary Table 3). The depth of taxonomic assignments was greater in the four samples from temperate-subtropical waters with more than 60% OTUs assigned to species (North Sea, Red Sea, Mediterranean lagoon, and West Africa #1 that is in Canary Islands) than in the four samples from tropical waters. In three of the tropical samples, more than one half of OTUs were assigned at genus, family, or even order (Fig. 2, Supplementary Table 3). The difference between the two groups of samples was significant (Mann-Whitney U of 0, z = 2.165 with p = 0.03; Monte Carlo permutation test with p = 0.028).    Table 3). The number of zooplankton species found from each sampling zone was small (see Table 2 and Relative richness in Fig. 2): only one in West Africa #2, three in West Africa #3 and #4, and four in West Africa #1. Six of these species are abundant and typical of zooplankton in the studied areas ( Table 2). All the zooplankton species found from metabarcoding could be identified by eye, or at least individuals morphologically similar to them, including spat of bivalve similar to Magallana gigas, and fish eggs similar to anchovies. Although individual barcodes could not be obtained from all of species due to poor DNA extraction, likely due to the fixation of specimens, Euphasia and Delibus were confirmed from independent COI barcodes (GenBank accession numbers: MW217272 and MW217273, respectively). Other organisms identified by sight from those areas and confirmed with COI and/or 18S rRNA, were the Appendicularia Oikopleura sp., larvae of the echinoderm Styracaster paucispinus, and the polychaetae Vanadis formosa (GenBank accession numbers: MW217563, MW217564 and MW217565, respectively). These species were undetected with metabarcoding, which was expected given the enormous difference of water volume that had to be filtered to obtain individuals for visual examination (200 L) and to do metabarcoding (6 L).

Zooplankton identified and its importance for fisheries. Open
Five of these species identified from 6 L water in open ocean were important for fisheries (Table 2). Paracalanus nanus and Euphausia are part of hake juveniles´ diet, and Evadne spinifera and Clausocalanus furcatus are important in the diet of several fishes. Engraulis encrasicolus, the Atlantic anchovy, is an important fish resource itself, and also a prey of hake tuna and other fish commercially important for African countries.  Table 3): 26 OTUs in the Red Sea, 16 in the North Sea and the Mediterranean lagoon, and 10 in Rangiroa. Diverse higher taxonomic groups were found: six Phyla from the North Sea and Polynesian samples, five from the Mediterranean lagoon, and four from the Red Sea (Table 3). Some of the species found from metabarcoding could be confirmed from individual barcodes (GenBank accession numbers KT988324 Mytilus galloprovincialis, MK295019 Paracalanus parvus, MK295022 Undinula vulgaris, MK295025 Acartia clausii).
As in open water samples, several species were commercially important fish whose eggs and early larvae are part of zooplankton, such as Merlangius merlangus and Merluccius merluccius in the North Sea or Chelon labrosus and Dicentrarchus labrax in the Mediterranean (Table 3). Some copepods important for fish diet, like Clausocalanus, Paracalanus and others, were also found.
Taxonomic profiles. Despite minimal water volumes, considerable taxonomic diversity was captured in coastal samples (Fig. 3), where many different classes of zooplankton were found. In open-sea samples, only DNA of abundant species was detected, as seen above, and consequently diversity was much lower there (see Brillouin taxonomic diversity in Table 1). Indeed, the samples were different to each other, as expected from their distant origins. According to the species found from metabarcodes, the NMDS had an acceptable stress of 0.11, r 2 of axis 1 of 0.451 and of axis 2 of 0.314 (see Shepard plot in Fig. 4A). Although the four open-sea samples were located close to each other, with WAfrica#2 and WAfrica#3 closer than the other two (Fig. 4B), they all had different zooplankton composition (Table 3).
A UPGMA tree showed clusters that were generally consistent with the geographical locations of the samples. West Africa #2 and #3 samples clustered together, then West Africa #1 and, in a different branch with 78% bootstrap, West Africa #4 appeared separately but in the same clade (Fig. 5). Then, the Red Sea sample joined the cluster with low bootstrap. Another cluster contained the two European coastal samples (NorthSea-C, off Bremerhaven; Med-C, Mediterranean lagoon), connected with the Red Sea sample also with low bootstrap, which is expected because the Mediterranean and Red seas are connected through the Suez Canal. The coastal sample from the tropical Polynesian was in an independent branch clearly separated (100% bootstrap).

Discussion
The results obtained in this study demonstrate the potential of eDNA metabarcoding to unravel the "missing biomass" 1 important for fisheries. Even with minimal volumes of six liters of water it is possible to find metabarcode COI of abundant fish species and their preys from the open seas. DNA metabarcoding is a methodology that enables accurate identification of plankton community, in general, and fish species in particular, avoiding the sampling and identification of all fish, fish larvae and eggs present in plankton samples. However, the generation  www.nature.com/scientificreports/ of a reliable identification requires a comprehensive and well-validated reference library 27 . In some regions, DNA is employed for species identification of ichthyoplankton, and there are species-specific markers useful for this purpose in several fish groups of interest such as megrims and hakes 53,54 , cod 55 , horse-mackerel 56 and others. However, other ichthyoplankton communities are lesser known. Although BOLD system is a promising database for the DNA barcode global research community, and it contains sequences from specimens whose identifications are being refined; the taxonomic coverage and resolution of the COI barcode library on BOLD, and hence the accuracy of identification queries, is still improving 57 . Furthermore, some inconsistencies were found in previous studies developed by our team, with two or more reference sequences from different species that were identical in BOLD, even from different families 11 . On the other hand, although the taxonomic reliability of GenBank has often been questioned 58 , the actual overall data quality is much better than often assumed 59 , still being the most complete database in number of markers and species 27 . Therefore, although Barcode of Life projects are considered a promising tool to species identification, due to the decrease in analysis efforts and costs 60 , DNA-based methodologies are not commonly used and ichthyoplankton still depend on the taxonomic expertise of local pelagic specialists in many regions. Several species found in our African metabarcodes are important for fisheries: the anchovy Engraulis encrasicolus is an important fish resource in West Africa, while Evadne spinifera, Clausocalanus furcatus and the krill Euphausia spp. are important in fish diet, the latter being preferred by hake juveniles in African waters 61 . The exception was the Pacific oyster Magallana gigas, that is not native to the region. Cited in CABI (2020) not far from the sampling area of West African #3, its larvae are planktonic and can be dispersed by currents 62 .

Malacostraca
The number of species captured with the same 6 L water volume was clearly larger in samples taken near the coast, especially in mesotrophic zones, than in those from open waters. In one of the African open sea samples, only one zooplankton species was found (sample West Africa #2, Clausocalanus furcatus). This is consistent with the zooplankton density reported from West African waters, that is generally greater near the coast, for example in the mouth of River Gambia (5250 individuals/m 3 ), than offshore, frequently with less than 100 individuals/ m 339 . Proportionally, in 6 L it would be 0.6, that is, less than one individual. Thus, the low number of species found from open waters in this study is not surprising. In addition, near the coast it is easy to find DNA from sessile species with a short planktonic phase that not enter offshore waters. Short planktonic stages are typical of many mollusks 63 and reef fish 64 . However, despite the detection of fewer species in the open ocean, the levels of diversity in 6 L water samples were enough to reveal differences between locations, reflecting at least partial differences in species abundance among sites (see clustering analysis in Fig. 5).
Although expectations are that diversity is greater near the Equator and decreases with latitude [65][66][67] , some insights about trans-equatorial community variation show something different. In ichthyoplankton, the latitudinal pattern of diversity does not exhibit the expected temperate-tropical cline, reflecting instead a decline in low-oxygen zones 9 . Sample WestAfrica#2, where only DNA of Clausocalanus furcatus was found, marked an oligotrophic zone of minimum oxygen. Indeed, this confirms the importance of environmental conditions for shaping plankton communities.
Besides zooplankton, COI metabarcode analysis served to detect a variety of species that included many phytoplankton in coastal sites (Supplementary Table 3). Although typically considered a good barcode for animals 28,29,68 , COI has sufficient phylogenetic definition at species level in red algae and phytoplankton. This has been also found in other studies on locations near the coast, including the Baltic Sea 15 , Bay of Biscay ports 16 , species attached to beached litter 69 , and in ballast water 35,70 .
In contrast with coast samples, most non-zooplankton OTUs detected from West African open waters were large vertebrates that are not part of plankton in any life stage (Supplementary Table 3). The samples were taken from RV Polarstern and it is well known that ships attract many large predators that feed on food leftovers and benefit from other species that surround ships 71,72 . Metabarcoding detects only DNA molecules, and expectedly large individuals shed more DNA than small species do 73 . This would explain the dominance of sequences from sharks and cetaceans in the samples obtained from open waters in this study. www.nature.com/scientificreports/ Finally, the significant differences found in the taxonomic depth of the assignments between samples can be explained from different geographical coverage of current databases. Reference databases are growing in several regions where zooplankton metabarcoding analysis is conducted 10 , but most of the current studies remark on the necessity to have well-curated databases with a wider taxonomical and geographical coverage 26 . More barcodes would imply a better approximation to the actual number of species in the marine zooplankton assemblage. If these methods, including protocols based on small water volumes, are adequately developed, and independently validated for specific applications, they could be used for several purposes. One could be locating species of interest for fisheries, like preys of commercial fish species or planktonic larvae of such species as we found in West African samples. An inventory of the ichthyoplankton community is essential for understanding how the trophic chain and by extension the whole ecosystem function, as well as for timely prediction of changes due to ichthyoplankton alterations. Which is linked to another possible application, related to the rapid monitoring of the most abundant species detecting large changes in the community.