Ecosystem degradation is one of the leading causes of biodiversity decline in aquatic realms1. Freshwater degradation can be observed as physical changes to habitat morphology, hydrological alterations and changes to biogeochemistry of water and sediment2,3. Lakes in particular are more susceptible to the consequences of shoreline developments and loading of nutrients, including phosphorous and nitrogen, than other freshwater habitats4,5. These lake stressors often cause a reduction in littoral habitats for submerged macrophytes, changes to sediment composition, and increased levels of eutrophication4,5,6,7, resulting in loss of lake biodiversity2. Restoring degraded lake ecosystems typically involves changing the ecological fate of the system towards an ecologically-sound status, where function is retained1. To achieve this, stressors need to be first identified and then managed via intervention and ecosystem management techniques to enable recovery and restoration1,5,8.

Toronto Harbour is located on the north shore of Lake Ontario, directly south of the City of Toronto and receives water from one of the most highly urbanized and industrialized areas in the Great Lakes. Contaminants from stormwater runoff, spills, and chemical input to sewers from industries and residences have contributed to severely degraded water and environmental health9,10,11,12. In 1994, eight beneficial uses in Toronto Harbour were identified as impaired (beneficial use impairments; BUIs12,13. In 1987, Toronto Harbour was designated as an Area of Concern (AOC)13 and since then, there has been an increase in biological monitoring in the harbour9,14,15.

As of 2016, results of Remedial Action Plan (RAP) activities resulted in re-classification of two of the eight identified BUIs (‘Not Impaired’), however “Loss of Fish and Wildlife Habitat’ remains impaired13,16. Over the last 30 years, water quality, sediment quality and the quantity and condition of terrestrial and aquatic habitats have improved considerably, resulting in the preparation to delist Toronto Harbour as an AOC13. In Toronto Harbour, the RAP investigates trends in benthic macroinvertebrate diversity by equating a higher diversity to a healthier system16, with diversity metrics derived solely from morphological analyses17. The challenges of morphological approaches compared to a DNA-based identification approach for monitoring macroinvertebrates have been highlighted previously (e.g. low taxonomic resolution and high processing costs)18,19,20,21,22,23.

DNA extracted from bulk benthos samples has been transformative for routine benthic macroinvertebrate biomonitoring24 and for determining gradients in freshwater environmental conditions18,25,26,27,28. When taxonomic lists and corresponding the Hilsenhoff Biotic Index (HBI) tolerance values are paired with ecological information, such as functional feeding guilds (FFG), metabarcoding analysis can provide in-depth understanding of ecological function and responses of macroinvertebrates to different environmental conditions19,28,29,30,31.

In this study, we hypothesized that the addition of eDNA metabarcoding will provide a more taxonomically comprehensive biodiversity measure for focal bioindicator macroinvertebrates, which will be reflected in finer-scale environmental assessment. By comparing diversity and functional macroinvertebrate metrics using eDNA metabarcoding and morphological assessment methods we assess the responses of these metrics across environmental gradients. Through this investigation, we aim to determine the most effective approach for long-term monitoring of benthic communities in Toronto Harbour and similar systems.


A map showing sampling stations in Toronto Harbour and associated gradients in water physical–chemical and sediment contaminants is shown in Fig. 1 (Table S1). From these samples, a total of 18,790,832 × 2 paired-end sequence reads were generated for 75 samples and 16 controls. After read pairing, trimming, and denoising (sequence error, chimera, and pseudogene removal for COI) we generated a total of 9329 exact sequence variants (ESVs) (Table S2).

Figure 1
figure 1

Toronto Harbour and associated abiotic and biotic gradients. Combined sewage outflows (black triangles) and the Don River (blue triangle) flows into the Toronto Harbour (Ontario, Canada). Sampling stations are shown in the legend. Inset map (top left) shows map area (black box) within respect to the Great Lakes region. Scale bar is in kilometers. Three of the most influential environmental variables are shown along with diversity metrics with the largest responses. Abbreviations: water physical–chemical features (W), sediment contaminants (S), macroinvertebrate (mi), eukaryote (euk). Map tiles by Stamen Design, under CC BY 3.0. Data by OpenStreetMap, under ODbL.

Overall, we found that taxonomic assignment resolution was finer using COI metabarcoding compared with other methods (Figure S1a). Sampling effort assessed using rarefaction found that sequencing depth was appropriate for the metabarcoding samples, and all curves reached saturation, however for samples identified by morphology, curves continue to rise indicating that further sampling would have identified additional families (Figure S1b).

Macroinvertebrate diversity metrics measured using morphology or metabarcoding

Biodiversity was compared between morphological and metabarcoding approaches (Fig. 2a). Macroinvertebrate species richness was higher using COI metabarcoding (median 15 species per station) compared with morphological methods (9 species per station) (Wilcox test, p-value > 0.00027) (Fig. 2a). 18S metabarcoding detected an even higher diversity of genera (82 genera per station).

Figure 2
figure 2

Alpha and gamma diversity detected is higher using DNA metabarcoding compared with morphological methods. In (a) alpha diversity is compared across methods showing genus richness for (non-macroinvertebrate) eukaryotes sampled using 18S metabarcoding and species richness for macroinvertebrate sampling using COI and morphological sampling. In (b) gamma diversity is compared showing the total number of 18S eukaryote genera and total number of species sampled using COI and morphological methods in Toronto Harbour.

The variance between the macroinvertebrate species detected using COI and morphological methods was similar for many stations, however, COI metabarcoding detected divergent communities at several stations (Fig. 3a). The first two axes of the PCA plot explains 42.8% of the variance in these communities. These differences were correlated with the relative read abundance of Limnodrilus hoffmeisteri, Tubifex tubifex, and Potamothrix vejdovskyi. When we compared community composition using Bray Curtis dissimilarities, most locations appeared similar to each other when using morphological taxa (stress = 0.09, R2 = 0.98) and ammonia (r = 0.31, p = 0.035) and magnesium (r = 0.35, p = 0.026) correlate with the ordination of these communities (Fig. 3b). When we compared communities using COI sequence variants, locations showed more separation with Toronto Island and Bathurst Quay locations clustering separately from the rest (stress = 0.14, R2 = 0.86) and ammonia (r = 0.51, p = 0.002), metals (r = 0.38, p = 0.005), and organochlorines (r = 0.38, p = 0.005) correlate with the ordination of these communities (Fig. 3c). When we compared communities using 18S sequence variants, most locations appeared similar, but Bathurst Quay and Keating Channel locations clustered separately (stress = 0.06, R2 = 0.99) and pH (r = 0.43, p = 0.004) and metals (r = 0.40, p = 0.005) correlate with the ordination of these communities (Fig. 3d).

Figure 3
figure 3

DNA metabarcoding helps distinguish between sampling locations in Toronto Harbour. Communities are compared using (a) principal components analysis (PCA) at the species rank for COI and morphological sampling, (b) non-metric multidimensional scaling (NMDS) using morphological taxa identified to a species rank when possible, (c) NMDS using macroinvertebrate COI sequence variants, and (d) NMDS using eukaryote (non-macroinvertebrate) 18S sequence variants. For PCA, COI samples are shown in grey circles and morphological samples in white squares and the most strongly correlated species vectors were added. For NMDS plots, centroids for each location are plotted for each sampling method using Bray Curtis dissimilarities, and environmental vectors were added if correlations were at least 0.30 with a p-value < 0.05.

We also compared the top 5 most abundant (non-macroinvertebrate) eukaryote families using 18S metabarcoding with the top 5 most abundant families detected using both morphological methods and COI metabarcoding (Fig. 4a). Although we compare the relative abundance of individuals sampled using morphological methods with the relative read abundance of sequence variants, it's important to acknowledge that read abundances reflect the specificity of our primers, specimen size, as well as abundance in samples. In terms of detections, COI metabarcoding detects some but not all the macroinvertebrate families detected using morphology. The median relative abundance of Naididae, the most abundant group using either method, was higher using morphological methods (90.6%) compared to COI metabarcoding (31.8%) (Wilcox test, p-value = 3.6e−10). The median relative abundance of Chironomidae was higher using COI (9.2%) compared to morphological methods (4.6%) (Wilcox test, p-value = 0.04). Overall, macroinvertebrate communities at the family rank were positively correlated (Pearson, 70.1%, p-value ~ 0, 95% confidence interval 61–78%).

Figure 4
figure 4

Comparison of major taxonomic and functional groups detected across sampling methods. We compare (a) the 5 most abundant families of eukaryotes (non-vertebrate 18S) and macroinvertebrates (COI and morphology) and (b) macroinvertebrate functional feeding guilds. Relative read abundance is presented for DNA-derived data and relative abundance of individuals is presented for morphology-derived data. Abbreviations: predators (PR), collector-gatherer (CG), herbivore (HB), shredders (SH), collector-filterers (CF), parasites (PA).

Overall, multi-marker metabarcoding recovers a greater diversity of taxa than morphological methods even when only considering macroinvertebrate taxa in the phyla Arthropoda, Annelida, Mollusca, Cnidaria, and Platyhelminthes for a fair comparison with morphological methods (Table S3). Overall, we detected 47 and 88 unique macroinvertebrate species (using morphological methods versus COI metabarcoding), 77 and 79 genera, and 30 and 61 families in total across all sampled stations. When considering other (non-macroinvertebrate) eukaryotes detected using 18S metabarcoding, we detected a further 271 genera and 111 families from 36 eukaryote phyla with the most diverse phyla detected from the Ciliophora (105 taxa), Cercozoa (64), Ascomycota (Fungi, 61), Basidiomycota (Fungi, 52 taxa), and the Chytridiomycota (42 taxa).

Macroinvertebrate functional metrics using metabarcoding and morphological methods

Functional feeding groups, a classification system based on how macroinvertebrates acquire food, allowed us to assess whether ecological function varies independently of taxonomic shifts. At each station, the proportion of macroinvertebrate reads (COI metabarcoding) or individuals (morphological) assigned to a family with a particular functional feeding guild(s) was assessed (Fig. 4b). The relative abundance of FFGs between methods were significantly different for parasites (Wilcox test, = 0. 3.8e−05), predators (p.adj = 0. 1.8e−05), and ‘other’ (p.adj = 0.014). The median relative abundance of parasites was ~ 0% using morphological methods and 0.03% using COI metabarcoding. The median relative abundance of predators was 97% using morphological methods and 11% using COI metabarcoding. The mean relative abundance of ‘other’ was 2% using morphological methods and 0.2% using COI metabarcoding. Functional communities detected using both methods were not found to be correlated (Pearson, 0.08, p-value = 0.31, 95% confidence interval − 0.07 to 0.22), though this may be due to our difficulty assigning function to macroinvertebrate families detected using COI metabarcoding methods. The three most abundant predator families using COI metabarcoding was Naididae (422 sequence variants, 284,851 reads), Chironomidae (81 sequence variants, 96,261 reads), and Bosminidae (34 sequence variants, 23,495 reads); and using morphological methods was Naididae (30 taxa, 30,163 individuals), Chironomidae (41 taxa, 322 individuals), Dreissenidae (3 taxa, 2241 individuals).

We used hierarchical partitioning to identify environmental variables that explain the variance in diversity and functional metrics (Table S5, Table S6). Sediment contaminants explain more variation in diversity metrics and water physical–chemical features tend to explain more variation in functional metrics (Fig. 5). The most influential environmental parameters are metals, aromatic hydrocarbons, and ammonia (Fig. 1). Metals make significant independent contributions explaining 62% of the variation in macroinvertebrate species richness (COI), 55% of the variation in the relative abundance of Thoracosphaeraceae (18S), and 41% of the variation in eukaryote genus richness (18S). Aromatic hydrocarbons make significant independent contributions explaining 63% of the variation in Candonidae (COI), 57% of the variation in Cyprididae (COI), and 45% of the variation in Limnesiidae (morphology), 45% of the variation in Limnesiidae (morphology), and 41% of the variation in Naididae (morphology). Ammonia makes significant independent contributions explaining 56% of the variation in Bosminidae (COI), 56% of the variation in Naididae (COI), 50% of the variation in Candonidae (COI), 49% of the variation in Thoracosphaeraceae (18S), and 47% of the variation in Hydridae (morphology). Temperature makes significant independent contributions explaining 62% of the variation in macroinvertebrate richness (COI) and 50% of the variation in Haptoria (18S). Ammonia makes significant independent contributions explaining 57% of the variation in parasites (morphology), 30% of the variation in shredders (morphology), and 29% of the variation in collector-gatherers (morphology).  Organochlorines also make significant independent contributions explaining 59% of the variation in parasites (COI) and 46% of the variation in collector-filterers (COI).

Figure 5
figure 5

Sediment contaminants explain more variation in diversity metrics and water physical chemical features explain more variation in functional metrics. Predictors that make significant (p-value < 0.05) independent contributions explaining variance in diversity metric response variables (left panel) and functional metric response variables (right panel). Abbreviations: DNA metabarcoding(dna), macroinvertebrate (mi), eukaryote (euk), conventional morphology (morph), shredders (SH), predators (PR), parasites (PA), herbivores (HB), collector-gatherers (CG), collector-filterers (CF), temperature (Temp), ammonia (NH3_N), total nitrogen (TN), sulphate (SO4), aromatic hydrocarbons (Aro), metals (Metal), organochlorines (Orgchl), insecticides (Insect).


The process of monitoring benthic macroinvertebrate communities has been the focus of freshwater health assessments across the world. Morphological-based approaches, which were once the only method of determining richness, are rapidly being complemented with DNA-based methodologies such as metabarcoding, greatly increasing the taxonomic information obtained from benthic samples21,21,32,33,34. In this study, we have demonstrated the benefits of employing eDNA metabarcoding to lake-based macroinvertebrate biodiversity assessments.

Previously, morphological-based biomonitoring in Toronto Harbour had focused primarily on overall richness as the determining factor of lake health with limited investigation into the composition of taxa15,16,35. While this can be informative as to the overall quantity of macroinvertebrate taxa present in the lake, it misses details on the proportions of bioindicator groups in relation to other taxa. Moreover, species diversity often decreases with degradation of aquatic environments, with more tolerant species dominating species compositions and composition of macroinvertebrates36. Therefore, the concept of higher total richness translating to improvements in water quality and lake health may be misleading, as measures of community composition, structure and diversity are also required to understand changes in environmental gradients36. COI metabarcoding also provides finer level taxonomic assignments compared with morphological methods as has been shown in a previous study for fishes37.

In addition, the confidence levels for morphological macroinvertebrate identification are scarcely reported and classifications can vary between taxonomists18,38, meaning likelihood of misidentified or missed taxa is not taken into consideration21. Both DNA metabarcoding (i.e., finer resolution taxonomic assignments, ability to detect microscopic taxa) and morphological methods (i.e., absolute abundance counts) provide complementary information that can contribute and support macroinvertebrate community information for further ecological analyses. In addition to the typical benthic macroinvertebrates identified using morphology, DNA metabarcoding enabled accurate detection of additional phyla, using more inclusive markers as used in this study. In this study, the detection of unique taxa via 18S sequencing enabled us to clearly distinguish stations, highlighting that non-macroinvertebrate groups can also provide insights towards understanding of spatial community dynamics in lake systems39.

The 18S marker enabled us to detect fungal groups expected to be found from aquatic environments such as the Chytridiomycota, Blastocladiomycota, and Cryptomycota (Rozellomycota). We even detected members of the Archaeorhizomycota (12 sequence variants, 117 reads), a group of fungi with a global distribution largely known only from marker gene studies of soil with only two species in culture40,41. The application of this marker also resulted in detection of several fungal taxa that we would expect to find from benthic samples such as known parasites of arthropods/insects: Cordyceps, Coelomomyces, and Labulomycetales.

Compared to DNA methods, morphological methods detected negligible amounts of aquatic bioindicator taxa, specifically Ephemeroptera (HBI = 0–2; an indicator of excellent water quality) and Trichoptera (HBI = 0–4; very good). COI metabarcoding detected Ephemeroptera and Trichoptera more often, but the relative abundance of these taxa were relatively low. Environmentally tolerant taxa (i.e. those with HBI score of 6–10), were detected more so with DNA metabarcoding methods compared to morphology. For example, species/genera including, Limnodrilus hoffmeisteri (HBI = 10; very poor), and Tubifex tubifex (HBI = 10; very poor), and Hydra (HBI = 5; poor) were detected exclusively by DNA methods, whereas Potamothrix vejdovskyi (HBI = 8; poor) was detected exclusively with morphological methods. Naididae as a family are highly tolerant to organic pollution, scoring between 6 and 10 on the HBI index42. More families, genera and species from extreme ends of the tolerance index were detected using DNA metabarcoding, highlighting the ability of this method to detect presence of these important taxa.

Beyond taxonomic metrics, mapping functional diversity across lake environments can improve our understanding of lake ecosystem integrity19,43,44,45. For example, the presence of diverse macroinvertebrate predators detected in Toronto Harbour may indicate that there are sufficient resources to support a stable multi-level food web43. Overall, morphological methods performed very well with regards to their ability to identify a diverse array of functional groups, but it was challenging to use automated methods to annotate function to many of the less abundant but diverse macroinvertebrate families detected using COI metabarcoding. Despite metabarcoding methods facilitating genus- and species-level taxonomic assignments, databases of functional metrics for macroinvertebrates at this taxonomic level are still incomplete46. The influence of different environmental variables (e.g. Metals and aromatic hydrocarbons in sediment as well as ammonia in lake water) on the relative abundance of functional feeding groups has implications on the trophic system of the lake as a whole. An increase or decrease in sediment and water pollutant loads will likely influence a shift in the taxa exhibiting different FFG guilds. Assigning FFG at more conservative taxonomic ranks (i.e., family) enables easy integration of both metabarcoding and morphological taxonomic lists whilst maximizing functional assignments.

By using a multi-marker approach, we can begin to close knowledge gaps regarding how water quality and loading of elements in sediment affect taxa at various trophic levels47,48. In this study, metals were found to be one of the more influential sediment contaminants explaining variation in overall macroinvertebrate richness using COI metabarcoding as well as variation in (non-macroinvertebrate) eukaryote richness using 18S metabarcoding. In Toronto Harbour, metals such as cadmium, chromium, copper, lead, mercury, and zinc were found to have levels that exceeded the Canadian Federal Probable Effect Level. In this study, we have samples from 5 stations from inner harbour centre that represent a clear gradient of increasing metal concentrations, decreasing macroinvertebrate and eukaryote richness detected using COI and 18S metabarcoding. Ciliates, despite their ubiquity in freshwater systems, the key position they play in trophic webs (feed on bacteria, algae, other protists while also being consumed by other meiofauna) and known sensitivity to a range of water quality types (low to highly polluted), are often a neglected in water quality assessments49.


DNA methods can readily be applied for both lake community biomonitoring and targeted monitoring for invasive, threatened, and/or exploited lake species, integrated with periodic morphological assessments to supplement routine DNA monitoring with abundance measures. Overall, for taxonomically comprehensive assessments of lake communities, it is imperative to apply robust high-throughput methods, such as DNA metabarcoding, to increase the resolution of biodiversity data and to understand species-specific responses to environmental gradients in the face of various perturbations and climate warming effects.


Study stations and sampling design

Sample stations in Toronto Harbour have been previously established by the Ministry of the Environment and Climate Change Resources in 1971, to track improvements in sediment contaminant concentrations and benthic invertebrate species composition and density50 (Table S1; Figure S1).

In October 2018, water, sediment and benthos samples were collected from Toronto’s Inner Harbour over a 3-day period, from a subset of 25 stations17. At each station, environmental data was collected, followed by the collection of surface and overlying water, sediment and benthos samples in order. Samples were collected following the Canadian Aquatic Biomonitoring Network (CABIN) Open Water sampling protocols for collection of benthic samples51,52.

Water samples were analyzed for major ions, nutrients, temperature, conductivity, pH and dissolved oxygen (see supporting information for technical details). Sediment samples were analyzed for trace metals, PCB aroclors, total PCBs and PCB congeners, chlorinated pesticides, chlorobenzenes, technical toxaphene (insecticide) and toxaphene congeners/parlars. See supporting information for technical details on water, sediment and benthos sampling and collection for both DNA and morphological identification.

Morphology-based taxonomic identification of benthos

Invertebrates in the benthic community samples were sorted, identified to the family level, and counted by EcoAnalysts, Moscow, ID.

DNA sample homogenization and DNA extraction of benthos samples

Samples were transferred from whirl-packs to 50 mL conical tubes, using molecular biology grade water to rinse whirl packs to ensure the entire sample was transferred. Samples were centrifuged to collect sediment at the bottom, and excess water was removed. Approximately 0.3 g was then directly subsampled into 2 mL bead tubes included within the Qiagen PowerSoil kit. Samples were extracted according to manufacturer’s protocol eluting with 50uL Buffer C6. Each batch (~ 95 samples) included one negative extraction control where no tissue was included. The remaining mass was stored in the Falcon tubes at − 20 °C as a voucher.

Library preparation and high-throughput sequencing

Two fragments within the standard COI DNA barcode region and one fragment within the 18S (eukaryote) region were amplified with the following previously optimized and validated primer sets: (LCO1490/230_R [~ 230 bp] called F230R, mICOIintF/jgHCO2198 [~ 313 bp] called ml-jg and Uni18SF/ Uni18SR [~ 600 bp] called Uni18S53,54,55,56,57, using a two-step PCR amplification regime. The first PCR used COI and 18S specific primers and the second PCR involved Illumina-tailed primers. Uni18S PCR cycling conditions are as follows: 95 °C for 3 min, followed by 35 cycles of 94 °C for 30 s, 52 °C for 30 s, 72 °C for 90 s, and a final extension of 72 °C for 8 min. COI PCR cycling conditions for both fragments were: 95 °C for 5 min, followed by 35 cycles of 94 °C for 40 s, 46 °C for 1 min, 72 °C for 30 s, and a final extension of 72 °C for 5 min. Amplification was visually confirmed through a 1.5% agarose gel electrophoresis. Amplicons were purified with a MinElute PCR purification kit (Qiagen), quantified with fluorometry using a QuantIT PicoGreen dsDNA assay kit (Invitrogen), and normalized to the same concentration prior to dual indexing with a Nextera Index Kit (Illumina). Indexed samples were pooled and purified through magnetic bead purification. The library was then quantified with the QuantIT PicoGreen dsDNA assay kit, and average fragment length was determined on an Agilent Bioanalyzer 2100 with a DNA 7500 chip. The library was then diluted to 4 nM based on the concentration and average fragment length and sequenced on an Illumina MiSeq using a v3 chemistry kit (2 × 300 cycles). A 10% PhiX control spike-in was used to ensure sequence diversity.

Bioinformatic processing

Illumina MiSeq paired-end reads were processed using the MetaWorks-1.0.0 pipeline available from MetaWorks is an automated Snakemake59 bioinformatic pipeline that runs in a conda60 environment. Details of the sequence processing steps of METAWORKS is described in the Supplementary Material. Further analysis of leave-one-sequence-out testing with the RDP Classifier conducted with our COI and 18S reference sets allowed us to assess expected taxonomic assignment accuracy according to metabarcode length and taxonomic rank61,62,63. Using a leave one sequence out approach during classifier validation, we determined that for a ~ 200 bp COI metabarcode, taxonomic assignments are ~ 90% correct at the species rank, 95% + correct at the genus-family ranks using the appropriate bootstrap support cutoffs, and 99% + correct at more inclusive ranks (e.g. order-kingdom) assuming the query taxa are present in the reference database61. Using a ~ 200 bp 18S metabarcode, taxonomic assignments are ~ 80% + correct at the genus-order ranks using the appropriate bootstrap support cutoffs and about 95% + correct at more inclusive ranks (class-domain) assuming the query taxa are present in the reference database.

Statistical analyses

All statistical analyses were conducted in RStudio v 1.1.456 running R v 3.5.1 64 and plots were created with ggplot2 65. Sequence variants recovered from field blanks comprised 2.7% of unique COI macroinvertebrate sequence variants and 5% of unique 18S eukaryote sequence variants and were removed from all subsequent analyses.

Resolution of taxonomic assignments were compared by recoding taxonomic assignments to: species = 1, genus = 2, family = 3, etc. Sampling effort in metabarcoding and morphological samples were assessed using rarefaction. Relative abundance were compared by calculating the proportion or percentage of reads per taxon per station for metabarcoding data and by calculating the number of individuals per taxon per station for morphological methods. Prior to NMDS or PCA matrices were converted to proportions or standardized using a Hellinger transformation.

For a fair comparison of macroinvertebrate metrics using COI metabarcoding and morphological methods, we limited COI results to taxa in the phyla normally detected using morphological methods: Arthropoda, Annelida, Mollusca, Cnidaria, and Platyhelminthes. We compared several macroinvertebrate diversity metrics for morphology-based and metabarcoding methods such as richness at the species level for macroinvertebrate COI and morphological samples, richness at the genus level for (non-macroinvertebrate) eukaryote 18S samples and read abundance of the top 5 most abundant families per sampling method.

We used principal components analysis (PCA) ’rda’ function in the vegan package to assess the variance between macroinvertebrate COI metabarcode and morphological samples66. Only vectors for the most strongly correlated species were plotted for clarity. Also using vegan, we used non-metric multi-dimensional scaling (NMDS) analyses to visualize beta diversity based on Bray Curtis dissimilarities using the ‘metaMDS’ function using 3 dimensions and fitted correlated environmental variables using the ‘envfit’ function using 999 permutations if correlations were greater than 0.30 and the p-value < 0.05. Goodness of fit calculations and Shephard’s curve were calculated using the vegan ‘goodness’ and ‘stressplot’ functions.

Water physical–chemical features were measured from samples collected at the bottom of the harbour and contaminants were measured from sediment samples. For each set of measurements for organic contaminants/metals, values for each individual chemical/element was summed across each major contaminant group17. We tested each predictor for normality using a Shapiro–Wilk Test using the ‘shapiro.test’ function in R. Skewness was checked using the ‘skewness’ function from the moments package67. Each individual predictor variable was then transformed (square root, log10, 1/x) as needed to better meet normality assumptions in hierarchical partitioning using a Gaussian model. Predictors were standardized using z-scores and centered prior to analysis. We checked for collinearity among water and sediment variables using Pearson correlation coefficients with a cutoff of 0.70, using the ‘rcorr’ function from the Hmisc package (Figure S2)68.

For each family, we added primary functional feeding guild (FFG) annotations based on the EPA Freshwater Biological Traits Database69. This system recognizes 7 feeding modes: collector-filterer (CF), collector-gatherer (CG), herbivore (scraper) (HB), parasite (PA), predator (piercer, engulfer) (PR), shredder (SH), and Other. Remaining missing family annotations were added using information compiled from the Taxa and Autecology Database for Freshwater Organisms available from freshwaterecology.info70. For this database, preference was given to feeding type annotations by Moog71. Any further missing annotations were then added by checking feeding habit annotations72. Since individuals were identified to the family level, species in a family can exhibit more than one feeding type and the presence of multiple feeding types per family were recorded when this was the case. We multiplied a family x station matrix containing read/individual counts by each feeding mode in a family x FFG matrix containing 1’s or 0’s indicating the presence of the feeding mode in a family. For each feeding mode, total read counts per station were recorded and combined into a single FFG x station matrix. Out of 61 families detected using metabarcoding, 23 (38%) were assigned feeding types. Out of 30 families detected using morphology, 24 (80%) were assigned feeding types. Further analyses using FFGs represent the proportion of all reads/individuals that could be both taxonomically assigned to family and functionally assigned a feeding mode.

We used hierarchical partitioning to regress each macroinvertebrate metric, individually, on the environmental predictors to identify which water physical–chemical features or sediment contaminants explain the most variance independently of the others. This was done using the ‘hier.part’ package in R73.

Nutrient/major ion predictors measured from water collected at the bottom of the harbour were analyzed separately from sediment contaminants. Macroinvertebrate diversity metrics included macroinvertebrate species richness based on COI and morphological methods, genus richness based on (non-macroinvertebrate) eukaryote 18S metabarcoding, as well as the relative abundance of the top 5 families based on COI, morphological, and 18S. Functional metrics included the FFGs calculated for macroinvertebrate families using COI and morphological methods. All metrics were standardized using a Hellinger transformation using the ’decostand’ function in vegan. Hierarchical partitioning significance was assessed using a randomization test, 1000 replicates, and calculating a goodness of fit measure based on log-likelihood.