Multi-marker DNA metabarcoding detects suites of environmental gradients from an urban harbour

There is increasing need for biodiversity monitoring, especially in places where potential anthropogenic disturbance may significantly impact ecosystem health. We employed a combination of traditional morphological and bulk macroinvertebrate metabarcoding analyses to benthic samples collected from Toronto Harbour (Ontario, Canada) to compare taxonomic and functional diversity of macroinvertebrates and their responses to environmental gradients. At the species rank, sites assessed using COI metabarcoding showed more variation than sites assessed using morphological methods. Depending on the assessment method, we detected gradients in magnesium (morphological taxa), ammonia (morphological taxa, COI sequence variants), pH (18S sequence variants) as well as gradients in contaminants such as metals (COI & 18S sequence variants) and organochlorines (COI sequence variants). Observed responses to contaminants such as aromatic hydrocarbons and metals align with known patchy distributions in harbour sediments. We determined that the morphological approach may limit the detection of macroinvertebrate responses to lake environmental conditions due to the effort needed to obtain fine level taxonomic assignments necessary to investigate responses. DNA metabarcoding, however, need not be limited to macroinvertebrates, can be automated, and taxonomic assignments are associated with a certain level of accuracy from sequence variants to named taxonomic groups. The capacity to detect change using a scalable approach such as metabarcoding is critical for addressing challenges associated with biodiversity monitoring and ecological investigations.


Results
A map showing sampling stations in Toronto Harbour and associated gradients in water physical-chemical and sediment contaminants is shown in Fig. 1 (Table S1). From these samples, a total of 18,790,832 × 2 paired-end sequence reads were generated for 75 samples and 16 controls. After read pairing, trimming, and denoising (sequence error, chimera, and pseudogene removal for COI) we generated a total of 9329 exact sequence variants (ESVs) (Table S2).
Overall, we found that taxonomic assignment resolution was finer using COI metabarcoding compared with other methods ( Figure S1a). Sampling effort assessed using rarefaction found that sequencing depth was appropriate for the metabarcoding samples, and all curves reached saturation, however for samples identified by morphology, curves continue to rise indicating that further sampling would have identified additional families ( Figure S1b).
The variance between the macroinvertebrate species detected using COI and morphological methods was similar for many stations, however, COI metabarcoding detected divergent communities at several stations (Fig. 3a). The first two axes of the PCA plot explains 42.8% of the variance in these communities. These differences were correlated with the relative read abundance of Limnodrilus hoffmeisteri, Tubifex tubifex, and Potamothrix vejdovskyi. When we compared community composition using Bray Curtis dissimilarities, most locations appeared similar to each other when using morphological taxa (stress = 0.09, R 2 = 0.98) and ammonia (r = 0.31, p = 0.035) and magnesium (r = 0.35, p = 0.026) correlate with the ordination of these communities (Fig. 3b). When we compared communities using COI sequence variants, locations showed more separation with Toronto Island and Bathurst Quay locations clustering separately from the rest (stress = 0.14, R 2 = 0.86) and ammonia (r = 0.51, p = 0.002), metals (r = 0.38, p = 0.005), and organochlorines (r = 0.38, p = 0.005) correlate with the ordination of these communities (Fig. 3c). When we compared communities using 18S sequence variants, most locations appeared similar, but Bathurst Quay and Keating Channel locations clustered separately (stress = 0.06, R2 = 0.99) and pH (r = 0.43, p = 0.004) and metals (r = 0.40, p = 0.005) correlate with the ordination of these communities (Fig. 3d).
We also compared the top 5 most abundant (non-macroinvertebrate) eukaryote families using 18S metabarcoding with the top 5 most abundant families detected using both morphological methods and COI metabarcoding (Fig. 4a). Although we compare the relative abundance of individuals sampled using morphological methods with the relative read abundance of sequence variants, it's important to acknowledge that read abundances reflect the specificity of our primers, specimen size, as well as abundance in samples. In terms of detections, COI metabarcoding detects some but not all the macroinvertebrate families detected using morphology. The median relative abundance of Naididae, the most abundant group using either method, was higher using morphological methods (90.6%) compared to COI metabarcoding (31.8%) (Wilcox test, p-value = 3.6e−10). The median relative abundance of Chironomidae was higher using COI (9.2%) compared to morphological methods (4.6%) (Wilcox test, p-value = 0.04). Overall, macroinvertebrate communities at the family rank were positively correlated (Pearson, 70.1%, p-value ~ 0, 95% confidence interval 61-78%).
Overall, multi-marker metabarcoding recovers a greater diversity of taxa than morphological methods even when only considering macroinvertebrate taxa in the phyla Arthropoda, Annelida, Mollusca, Cnidaria, and Platyhelminthes for a fair comparison with morphological methods (Table S3). Overall, we detected 47 and 88 unique macroinvertebrate species (using morphological methods versus COI metabarcoding), 77 and 79 genera, and 30 and 61 families in total across all sampled stations. When considering other (non-macroinvertebrate) eukaryotes detected using 18S metabarcoding, we detected a further 271 genera and 111 families from 36 eukaryote phyla with the most diverse phyla detected from the Ciliophora (105 taxa), Cercozoa (64), Ascomycota (Fungi, 61), Basidiomycota (Fungi, 52 taxa), and the Chytridiomycota (42 taxa).
Macroinvertebrate functional metrics using metabarcoding and morphological methods. Functional  www.nature.com/scientificreports/ allowed us to assess whether ecological function varies independently of taxonomic shifts. At each station, the proportion of macroinvertebrate reads (COI metabarcoding) or individuals (morphological) assigned to a family with a particular functional feeding guild(s) was assessed (Fig. 4b). The relative abundance of FFGs between methods were significantly different for parasites (Wilcox test, = 0. 3.8e−05), predators (p.adj = 0. 1.8e−05), and 'other' (p.adj = 0.014). The median relative abundance of parasites was ~ 0% using morphological methods and 0.03% using COI metabarcoding. The median relative abundance of predators was 97% using morphological methods and 11% using COI metabarcoding. The mean relative abundance of 'other' was 2% using morphological methods and 0.2% using COI metabarcoding. Functional communities detected using both methods were not found to be correlated (Pearson, 0.08, p-value = 0.31, 95% confidence interval − 0.07 to 0.22), though this may be due to our difficulty assigning function to macroinvertebrate families detected using COI metabarcoding methods. The three most abundant predator families using COI metabarcoding was Naididae (422 sequence variants, 284,851 reads), Chironomidae (81 sequence variants, 96,261 reads), and Bosminidae (34 sequence variants, 23,495 reads); and using morphological methods was Naididae (30 taxa, 30,163 individuals), Chironomidae (41 taxa, 322 individuals), Dreissenidae (3 taxa, 2241 individuals). We used hierarchical partitioning to identify environmental variables that explain the variance in diversity and functional metrics (Table S5, Table S6). Sediment contaminants explain more variation in diversity metrics and water physical-chemical features tend to explain more variation in functional metrics (Fig. 5). The most influential environmental parameters are metals, aromatic hydrocarbons, and ammonia ( Fig. 1). Metals make significant independent contributions explaining 62% of the variation in macroinvertebrate species richness (COI), 55% of the variation in the relative abundance of Thoracosphaeraceae (18S), and 41% of the variation in eukaryote genus richness (18S). Aromatic hydrocarbons make significant independent contributions explaining 63% of the variation in Candonidae (COI), 57% of the variation in Cyprididae (COI), and 45% of the variation in Limnesiidae (morphology), 45% of the variation in Limnesiidae (morphology), and 41% of the variation in Naididae (morphology). Ammonia makes significant independent contributions explaining 56% of the variation in Bosminidae (COI), 56% of the variation in Naididae (COI), 50% of the variation in Candonidae (COI), 49% of the variation in Thoracosphaeraceae (18S), and 47% of the variation in Hydridae (morphology). Temperature makes significant independent contributions explaining 62% of the variation in macroinvertebrate richness (COI) and 50% of the variation in Haptoria (18S). Ammonia makes significant independent contributions explaining 57% of the variation in parasites (morphology), 30% of the variation in shredders (morphology), and 29% of the variation in collector-gatherers (morphology). Organochlorines also make significant independent contributions explaining 59% of the variation in parasites (COI) and 46% of the variation in collector-filterers (COI).

Discussion
The process of monitoring benthic macroinvertebrate communities has been the focus of freshwater health assessments across the world. Morphological-based approaches, which were once the only method of determining richness, are rapidly being complemented with DNA-based methodologies such as metabarcoding, greatly increasing the taxonomic information obtained from benthic samples 21,21,[32][33][34] . In this study, we have demonstrated the benefits of employing eDNA metabarcoding to lake-based macroinvertebrate biodiversity assessments.
Previously, morphological-based biomonitoring in Toronto Harbour had focused primarily on overall richness as the determining factor of lake health with limited investigation into the composition of taxa 15,16,35 . While Alpha and gamma diversity detected is higher using DNA metabarcoding compared with morphological methods. In (a) alpha diversity is compared across methods showing genus richness for (nonmacroinvertebrate) eukaryotes sampled using 18S metabarcoding and species richness for macroinvertebrate sampling using COI and morphological sampling. In (b) gamma diversity is compared showing the total number of 18S eukaryote genera and total number of species sampled using COI and morphological methods in Toronto Harbour. www.nature.com/scientificreports/ this can be informative as to the overall quantity of macroinvertebrate taxa present in the lake, it misses details on the proportions of bioindicator groups in relation to other taxa. Moreover, species diversity often decreases with degradation of aquatic environments, with more tolerant species dominating species compositions and composition of macroinvertebrates 36 . Therefore, the concept of higher total richness translating to improvements in water quality and lake health may be misleading, as measures of community composition, structure and diversity are also required to understand changes in environmental gradients 36 . COI metabarcoding also provides finer level taxonomic assignments compared with morphological methods as has been shown in a previous study for fishes 37 .
In addition, the confidence levels for morphological macroinvertebrate identification are scarcely reported and classifications can vary between taxonomists 18,38 , meaning likelihood of misidentified or missed taxa is not taken into consideration 21 . Both DNA metabarcoding (i.e., finer resolution taxonomic assignments, ability to detect microscopic taxa) and morphological methods (i.e., absolute abundance counts) provide complementary information that can contribute and support macroinvertebrate community information for further ecological analyses. In addition to the typical benthic macroinvertebrates identified using morphology, DNA metabarcoding enabled accurate detection of additional phyla, using more inclusive markers as used in this study. In this Communities are compared using (a) principal components analysis (PCA) at the species rank for COI and morphological sampling, (b) non-metric multidimensional scaling (NMDS) using morphological taxa identified to a species rank when possible, (c) NMDS using macroinvertebrate COI sequence variants, and (d) NMDS using eukaryote (non-macroinvertebrate) 18S sequence variants. For PCA, COI samples are shown in grey circles and morphological samples in white squares and the most strongly correlated species vectors were added. For NMDS plots, centroids for each location are plotted for each sampling method using Bray Curtis dissimilarities, and environmental vectors were added if correlations were at least 0.30 with a p-value < 0.05. www.nature.com/scientificreports/ study, the detection of unique taxa via 18S sequencing enabled us to clearly distinguish stations, highlighting that non-macroinvertebrate groups can also provide insights towards understanding of spatial community dynamics in lake systems 39 .

Scientific
The 18S marker enabled us to detect fungal groups expected to be found from aquatic environments such as the Chytridiomycota, Blastocladiomycota, and Cryptomycota (Rozellomycota). We even detected members of the Archaeorhizomycota (12 sequence variants, 117 reads), a group of fungi with a global distribution largely known only from marker gene studies of soil with only two species in culture 40,41 . The application of this marker also resulted in detection of several fungal taxa that we would expect to find from benthic samples such as known parasites of arthropods/insects: Cordyceps, Coelomomyces, and Labulomycetales.
Compared to DNA methods, morphological methods detected negligible amounts of aquatic bioindicator taxa, specifically Ephemeroptera (HBI = 0-2; an indicator of excellent water quality) and Trichoptera (HBI = 0-4; www.nature.com/scientificreports/ very good). COI metabarcoding detected Ephemeroptera and Trichoptera more often, but the relative abundance of these taxa were relatively low. Environmentally tolerant taxa (i.e. those with HBI score of 6-10), were detected more so with DNA metabarcoding methods compared to morphology. For example, species/genera including, Limnodrilus hoffmeisteri (HBI = 10; very poor), and Tubifex tubifex (HBI = 10; very poor), and Hydra (HBI = 5; poor) were detected exclusively by DNA methods, whereas Potamothrix vejdovskyi (HBI = 8; poor) was detected exclusively with morphological methods. Naididae as a family are highly tolerant to organic pollution, scoring between 6 and 10 on the HBI index 42 . More families, genera and species from extreme ends of the tolerance index were detected using DNA metabarcoding, highlighting the ability of this method to detect presence of these important taxa.   www.nature.com/scientificreports/ Beyond taxonomic metrics, mapping functional diversity across lake environments can improve our understanding of lake ecosystem integrity 19,[43][44][45] . For example, the presence of diverse macroinvertebrate predators detected in Toronto Harbour may indicate that there are sufficient resources to support a stable multi-level food web 43 . Overall, morphological methods performed very well with regards to their ability to identify a diverse array of functional groups, but it was challenging to use automated methods to annotate function to many of the less abundant but diverse macroinvertebrate families detected using COI metabarcoding. Despite metabarcoding methods facilitating genus-and species-level taxonomic assignments, databases of functional metrics for macroinvertebrates at this taxonomic level are still incomplete 46 . The influence of different environmental variables (e.g. Metals and aromatic hydrocarbons in sediment as well as ammonia in lake water) on the relative abundance of functional feeding groups has implications on the trophic system of the lake as a whole. An increase or decrease in sediment and water pollutant loads will likely influence a shift in the taxa exhibiting different FFG guilds. Assigning FFG at more conservative taxonomic ranks (i.e., family) enables easy integration of both metabarcoding and morphological taxonomic lists whilst maximizing functional assignments.
By using a multi-marker approach, we can begin to close knowledge gaps regarding how water quality and loading of elements in sediment affect taxa at various trophic levels 47,48 . In this study, metals were found to be one of the more influential sediment contaminants explaining variation in overall macroinvertebrate richness using COI metabarcoding as well as variation in (non-macroinvertebrate) eukaryote richness using 18S metabarcoding. In Toronto Harbour, metals such as cadmium, chromium, copper, lead, mercury, and zinc were found to have levels that exceeded the Canadian Federal Probable Effect Level. In this study, we have samples from 5 stations from inner harbour centre that represent a clear gradient of increasing metal concentrations, decreasing macroinvertebrate and eukaryote richness detected using COI and 18S metabarcoding. Ciliates, despite their ubiquity in freshwater systems, the key position they play in trophic webs (feed on bacteria, algae, other protists while also being consumed by other meiofauna) and known sensitivity to a range of water quality types (low to highly polluted), are often a neglected in water quality assessments 49 .

Conclusion
DNA methods can readily be applied for both lake community biomonitoring and targeted monitoring for invasive, threatened, and/or exploited lake species, integrated with periodic morphological assessments to supplement routine DNA monitoring with abundance measures. Overall, for taxonomically comprehensive assessments of lake communities, it is imperative to apply robust high-throughput methods, such as DNA metabarcoding, to increase the resolution of biodiversity data and to understand species-specific responses to environmental gradients in the face of various perturbations and climate warming effects.  (Table S1; Figure S1).

Methods
In October 2018, water, sediment and benthos samples were collected from Toronto's Inner Harbour over a 3-day period, from a subset of 25 stations 17 . At each station, environmental data was collected, followed by the collection of surface and overlying water, sediment and benthos samples in order. Samples were collected following the Canadian Aquatic Biomonitoring Network (CABIN) Open Water sampling protocols for collection of benthic samples 51,52 .
Water samples were analyzed for major ions, nutrients, temperature, conductivity, pH and dissolved oxygen (see supporting information for technical details). Sediment samples were analyzed for trace metals, PCB aroclors, total PCBs and PCB congeners, chlorinated pesticides, chlorobenzenes, technical toxaphene (insecticide) and toxaphene congeners/parlars. See supporting information for technical details on water, sediment and benthos sampling and collection for both DNA and morphological identification.

Morphology-based taxonomic identification of benthos.
Invertebrates in the benthic community samples were sorted, identified to the family level, and counted by EcoAnalysts, Moscow, ID.
DNA sample homogenization and DNA extraction of benthos samples. Samples were transferred from whirl-packs to 50 mL conical tubes, using molecular biology grade water to rinse whirl packs to ensure the entire sample was transferred. Samples were centrifuged to collect sediment at the bottom, and excess water was removed. Approximately 0.3 g was then directly subsampled into 2 mL bead tubes included within the Qiagen PowerSoil kit. Samples were extracted according to manufacturer's protocol eluting with 50uL Buffer C6. Each batch (~ 95 samples) included one negative extraction control where no tissue was included. The remaining mass was stored in the Falcon tubes at − 20 °C as a voucher.
Library preparation and high-throughput sequencing. Two fragments within the standard COI DNA barcode region and one fragment within the 18S (eukaryote) region were amplified with the following previously optimized and validated primer sets: ( www.nature.com/scientificreports/ of 72 °C for 5 min. Amplification was visually confirmed through a 1.5% agarose gel electrophoresis. Amplicons were purified with a MinElute PCR purification kit (Qiagen), quantified with fluorometry using a QuantIT PicoGreen dsDNA assay kit (Invitrogen), and normalized to the same concentration prior to dual indexing with a Nextera Index Kit (Illumina). Indexed samples were pooled and purified through magnetic bead purification. The library was then quantified with the QuantIT PicoGreen dsDNA assay kit, and average fragment length was determined on an Agilent Bioanalyzer 2100 with a DNA 7500 chip. The library was then diluted to 4 nM based on the concentration and average fragment length and sequenced on an Illumina MiSeq using a v3 chemistry kit (2 × 300 cycles). A 10% PhiX control spike-in was used to ensure sequence diversity.
Bioinformatic processing. Illumina MiSeq paired-end reads were processed using the MetaWorks-1.0.0 pipeline available from https:// github. com/ terri mport er/ MetaW orks 58 . MetaWorks is an automated Snakemake 59 bioinformatic pipeline that runs in a conda 60 environment. Details of the sequence processing steps of META-WORKS is described in the Supplementary Material. Further analysis of leave-one-sequence-out testing with the RDP Classifier conducted with our COI and 18S reference sets allowed us to assess expected taxonomic assignment accuracy according to metabarcode length and taxonomic rank [61][62][63] . Using a leave one sequence out approach during classifier validation, we determined that for a ~ 200 bp COI metabarcode, taxonomic assignments are ~ 90% correct at the species rank, 95% + correct at the genus-family ranks using the appropriate bootstrap support cutoffs, and 99% + correct at more inclusive ranks (e.g. order-kingdom) assuming the query taxa are present in the reference database 61 . Using a ~ 200 bp 18S metabarcode, taxonomic assignments are ~ 80% + correct at the genus-order ranks using the appropriate bootstrap support cutoffs and about 95% + correct at more inclusive ranks (class-domain) assuming the query taxa are present in the reference database. Resolution of taxonomic assignments were compared by recoding taxonomic assignments to: species = 1, genus = 2, family = 3, etc. Sampling effort in metabarcoding and morphological samples were assessed using rarefaction. Relative abundance were compared by calculating the proportion or percentage of reads per taxon per station for metabarcoding data and by calculating the number of individuals per taxon per station for morphological methods. Prior to NMDS or PCA matrices were converted to proportions or standardized using a Hellinger transformation.

Statistical analyses. All statistical analyses were conducted in
For a fair comparison of macroinvertebrate metrics using COI metabarcoding and morphological methods, we limited COI results to taxa in the phyla normally detected using morphological methods: Arthropoda, Annelida, Mollusca, Cnidaria, and Platyhelminthes. We compared several macroinvertebrate diversity metrics for morphology-based and metabarcoding methods such as richness at the species level for macroinvertebrate COI and morphological samples, richness at the genus level for (non-macroinvertebrate) eukaryote 18S samples and read abundance of the top 5 most abundant families per sampling method.
We used principal components analysis (PCA) 'rda' function in the vegan package to assess the variance between macroinvertebrate COI metabarcode and morphological samples 66 . Only vectors for the most strongly correlated species were plotted for clarity. Also using vegan, we used non-metric multi-dimensional scaling (NMDS) analyses to visualize beta diversity based on Bray Curtis dissimilarities using the 'metaMDS' function using 3 dimensions and fitted correlated environmental variables using the 'envfit' function using 999 permutations if correlations were greater than 0.30 and the p-value < 0.05. Goodness of fit calculations and Shephard's curve were calculated using the vegan 'goodness' and 'stressplot' functions.
Water physical-chemical features were measured from samples collected at the bottom of the harbour and contaminants were measured from sediment samples. For each set of measurements for organic contaminants/ metals, values for each individual chemical/element was summed across each major contaminant group 17 . We tested each predictor for normality using a Shapiro-Wilk Test using the 'shapiro.test' function in R. Skewness was checked using the 'skewness' function from the moments package 67 . Each individual predictor variable was then transformed (square root, log10, 1/x) as needed to better meet normality assumptions in hierarchical partitioning using a Gaussian model. Predictors were standardized using z-scores and centered prior to analysis. We checked for collinearity among water and sediment variables using Pearson correlation coefficients with a cutoff of 0.70, using the 'rcorr' function from the Hmisc package ( Figure S2) 68 .
For each family, we added primary functional feeding guild (FFG) annotations based on the EPA Freshwater Biological Traits Database 69 . This system recognizes 7 feeding modes: collector-filterer (CF), collector-gatherer (CG), herbivore (scraper) (HB), parasite (PA), predator (piercer, engulfer) (PR), shredder (SH), and Other. Remaining missing family annotations were added using information compiled from the Taxa and Autecology Database for Freshwater Organisms available from freshwaterecology.info 70 . For this database, preference was given to feeding type annotations by Moog 71 . Any further missing annotations were then added by checking feeding habit annotations 72 . Since individuals were identified to the family level, species in a family can exhibit more than one feeding type and the presence of multiple feeding types per family were recorded when this was the case. We multiplied a family x station matrix containing read/individual counts by each feeding mode in a family x FFG matrix containing 1's or 0's indicating the presence of the feeding mode in a family. For each feeding mode, total read counts per station were recorded and combined into a single FFG x station matrix. Out of 61 families detected using metabarcoding, 23 (38%) were assigned feeding types. Out of 30 families detected using morphology, 24 (80%) were assigned feeding types. Further analyses using FFGs represent the proportion of all reads/individuals that could be both taxonomically assigned to family and functionally assigned a feeding mode. www.nature.com/scientificreports/ We used hierarchical partitioning to regress each macroinvertebrate metric, individually, on the environmental predictors to identify which water physical-chemical features or sediment contaminants explain the most variance independently of the others. This was done using the 'hier.part' package in R 73 .
Nutrient/major ion predictors measured from water collected at the bottom of the harbour were analyzed separately from sediment contaminants. Macroinvertebrate diversity metrics included macroinvertebrate species richness based on COI and morphological methods, genus richness based on (non-macroinvertebrate) eukaryote 18S metabarcoding, as well as the relative abundance of the top 5 families based on COI, morphological, and 18S. Functional metrics included the FFGs calculated for macroinvertebrate families using COI and morphological methods. All metrics were standardized using a Hellinger transformation using the ' decostand' function in vegan. Hierarchical partitioning significance was assessed using a randomization test, 1000 replicates, and calculating a goodness of fit measure based on log-likelihood.