Zooplankton diversity monitoring strategy for the urban coastal region using metabarcoding analysis

Marine ecosystems in urban coastal areas are exposed to many risks due to human activity. Thus, long-term and continuous monitoring of zooplankton diversity is necessary. High-throughput DNA metabarcoding has gained recognition as an efficient and highly sensitive approach to accurately describing the species diversity of marine zooplankton assemblages. In this study, we collected 30 zooplankton samples at about 2-week intervals for 1 year. Zooplankton diversity showing a typical four season pattern. Of the “total” and “common” zooplankton, we assigned 267 and 64 taxa. The cluster structure and seasonal diversity pattern were rough when only the “common” zooplankton was used. Our study examined how to maximize the benefits of metabarcoding for monitoring zooplankton diversity in urban coastal areas. The results suggest that to take full advantage of metabarcoding when monitoring a zooplankton community, it is necessary to carefully investigate potential ecosystem threats (non-indigenous species) through sufficient curation rather than disregarding low-abundance operational taxonomic units.


Results
Environmental conditions and summary of DNA data and taxonomic assessment. Water temperature and salinity were measured at the same station and on the same dates as most zooplankton samplings from February 2019 to March 2020 at 2-week intervals. The average water temperature was 16.6 ℃, ranging from 11.4 to 27.6 ℃ (see Fig. S2). It gradually increased from February 2019, peaked on August 14, 2019, and then decreased. The average water temperatures in February and March of 2020 were slightly lower than those in 2019. During the same sampling period, the salinity (practical salinity unit, psu) ranged from 30.2 to 34.5 psu, with an average of 33.0 psu. Contrary to water temperature, salinity generally decreased and increased again from February 2019 to September 2019 (Table S1; Fig. S2). These seasonal changes in water temperature and salinity are consistent with previous studies in the Busan Bay and the Southern coast of Korea [22][23][24][25] .
A total of 9,163,971 amplicons were sequenced from the 30 samples and 8,490,439 reads (92.7%) remained after the stringent quality filtering of chimeras ( Table 1). The number of OTUs (at the 98% similarity level) was Seasonal trends of α-diversity and taxonomic composition of zooplankton based on metabarcoding data. The average Chao1 index of "total" zooplankton was 42, ranging from 21.00 ± 0.16 (YD50) to 77.00 ± 30.34 (YD44). The overall trends of the Chao1 index was low in April and high in September ( Fig. 2a; Table S4). The average Shannon diversity index of "total" zooplankton was 1.66, which varied from 0.43 (YD62) to 2.53 (YD38). Similar to the Chao1 index, the overall trends of the Shannon diversity was low in April and high in September to October, showing a seasonal pattern. However, diversity declined slightly in early 2020 compared with a similar period in 2019 ( Fig. 2b; Table S4). The observed taxa and Shannon diversity index of "common" zooplankton had similar distribution patterns to those of "total" zooplankton throughout the sampling period. However, the R 2 -value of the polynomial regression analysis for the observed taxa of "common" zooplankton was higher than that for the Chao1 index of "total" zooplankton ( Fig. 2c, d; Table S4). www.nature.com/scientificreports/ With the environmental data measured on the zooplankton sampling day (accepting ± one date gap), the correlation between the α-diversity index (Chao1 and Shannon) and environmental factors was confirmed. The α-diversity indices were positively correlated with water temperature and negatively correlated with salinity ( Fig.  S5). The α-diversity showed a slightly higher correlation with water temperature than salinity because salinity was more conserved throughout the year (Fig. S5). Seasonal pattern was analyzed by dividing into three main mesozooplankton groups 4,24 ; copepods (average of 47%), meroplankton (43%), and non-copepod holoplankton (10%) (Fig. S6). The three groups showed highly seasonal dynamic patterns; the dominant group was copepods in January (average of 94.5%) and Meroplankton in September (88.89%).
The same NMDS analysis was performed on the "common" zooplankton [log 10 (x + 1) transformation, Bray-Curtis] (Fig. 6c, d). Similar to the above, it was also largely divided into four main groups according to the season, but there were differences in some samples. The samples clustered in G2 and G3 were the same as those in the "total" zooplankton analysis. However, sample YD46 was included along with YD44 in G1 instead of G4, and G1 was divided into two subgroups (G1-1 and G1-2). Furthermore, there were three single-cluster samples (YD42, YD50, and YD54). Cluster analysis using only the "common" zooplankton did not well differentiate the temporal and seasonal differences in the zooplankton community compared with the "total" zooplankton analysis (ANOSIM significance = 0.001, R = 0.8785).
We then conducted a SIMPER analysis to determine each taxon's average percentage contribution to each of the four seasonal groups [standardized log 10 (x + 1)-transformed data]. The top five highest contributing species to seasonal differences (p-value < 0.05) are indicated in Table 2. It was found that 15 species, except for the duplicates in the list, greatly contributed to the cluster structure variations according to time and season. Therefore, we confirmed their appearance to observe which seasonal group they represented. P. leuckartii (Branchiopoda), E. nordmanni (Branchiopoda), and Eudactylopus yokjidoensis (Copepoda) showed higher abundance in spring (G1) than in other seasons. P. polyphemoides (Branchiopoda), B. trigonus (Cirripedia), Membranipora villosa (Bryozoa), and C. challengeri (Cirripedia) were appeared in summer (G2) compared to other seasonal groups. A large number of A. omorii (Copepoda) represented throughout all seasons but their abundance decreased in late summer-autumn (G3). Amphibalanus amphitrite (Cirripedia), B. intermedius (Malacostraca), Liriope tetraphylla (Cnidaria), S. doederleini (Echinodermata), and Paracalanus gracilis (Copepoda) were the representatives of late  Table 2). All top 5 taxa which contributed significantly to distinguishing each seasonal group were included in the "common" taxa. The result of the full SIMPER analysis is attached (Supplementary Analysis S1).
Searching for candidates of potential invasive species. In order to evaluate the reliability of the overall metabarcoding classification, the results of our analysis were compared with the national list of marine species (NLMS) 26 . Of the 267 species identified in our data (including sequence-read depth < 10), ~ 75.7% (202/267) was confirmed to be the correct taxonomic name (species level: 192; genus level: 9; family level: 1) by NLMS and ~ 24.3% (65/267) of the remaining species were not found (Table S8). One of them was a freshwater taxon (Cyclops vicinus) 27,28 and 39 species in our data were only the same genus name in NLMS (Table S8). Moreover, 26 taxa did not have genus names as well as species names. C. vicinus is suggestive of debris that probably inflowed from the Nakdong River 29 . We confirmed whether the COI sequences for the 38 species that only existed in the NLMS with the same genus name had their congeneric species registered in NCBI. Twenty taxa of the 38 species have COI sequences for their congeneric species in the NCBI database. The remaining 18 taxa do not have COI sequences registered for any congeneric species in NCBI. Therefore, in the 18 cases, there   (Table S8). It would be worth exploring the taxonomic identification in future studies.

Discussion
In the study area (Yeongdo-gu, Busan), zooplankton diversity was highest in autumn (October) and lowest in spring (April) ( Fig. 2; Table S4). This seasonal pattern was similar to previous observations of the zooplankton community in the Busan Bay and the southern coast of Korea 24,25 . In addition, the seasonal pattern is thought to have a relatively higher correlation with water temperature than salinity 30 . Copepod species dominated the zooplankton composition, followed by cirripedian larvae and branchiopods (Figs. 3 and S4). In previous studies, copepods were most dominant in the zooplankton communities in the coastal regions, followed by branchiopods or Cirripedia larvae, depending on the season or environment 24,25,31 . It was confirmed that 30 temporal samples were roughly divided by season into four groups (Fig. 6). In addition, the 14 species that contributed significantly to each seasonal group as a result of SIMPER analysis were Note that species names are followed by WoRMS (http:// www. marin espec ies. org). On the southern coast of Korea, P. leuckartii is most abundant in April and reported to be negatively correlated with water temperature and salinity 31,32 . Likewise, in our study, P. leuckartii was most abundant in spring (G1) and not detected in summer (G2) when the water temperature was high ( Fig. 5; Table S7). E. nordmanni, which is known to appear briefly in the spring when the water temperature is between 10 and 17 °C 31,33 , was analyzed as a representative of spring (G1; February to May) in our study ( Fig. 5; Table S7). E. yokjidoensis, a new species reported in 2018, was collected from the southern coast of Korea in April 2016 34 . It showed high abundance in spring (G1), indicating that this new species may exist in our study area, but very little was found in other seasons in our samples. P. polyphemoides (Cladoceran) appears throughout the year in Chinhae Bay, Korea although its abundance is especially high when the water temperature is 18 °C 33 . It was also reported in the Mediterranean Sea at 18-19 °C 35 . Similarly, it was found in summer (G2; June to July) within a temperature range of 17.4 to 21.8 °C in our study, representing this season. M. villosa (Colonial marine bryozoan) was reported in Busan during the summer (June) and was mainly distributed in coastal ports of Korea in summer and autumn (August to November) 36 . In our data, it appeared only in summer (G2).
L. tetraphylla and B. intermedius, representatives of late summer-autumn (G3; August to November) in our study, have not already been accurately modeled for their annual distribution on the southern coast of Korea. L. tetraphylla was only detected in the coastal region of Busan in late September 37 and B. intermedius was confirmed only in the southern Yellow Sea of Korea during October 38 .
P. gracilis and C. furcatus were dominant in late summer-autumn (G3) and in the winter group (G4), supporting previous studies [39][40][41] . A. omorii was dominant throughout all seasons but with relatively low abundance in G3. A. omorii is not reported to appear in the summer when the water temperature is high (average 24 °C) 24 . The high-water temperatures in August may explain this species' disappearance in September and its low abundance in late summer-autumn (G3) (Figs. 5 and S2). In addition, this species frequently appears in the eutrophic inner bay 24,42 . Therefore, it indirectly shows that our study area, the entrance to Busan Port, may have undergone some degree of eutrophication.
Cirripedia larvae B. trigonus and C. challengeri were most abundant in summer (G2) and A. amphitrite in late summer-autumn (G3), respectively. It has been reported that C. challengeri appeared most intensively in August-October near Oryuk Islets off Busan 43 , the outer area of our study area. However, because it is difficult to classify Cirripedia larvae down to the species level, no seasonal changes in the distribution of the other two Cirripedia species have been reported. According to a previous study, Cirripedia larvae are relatively abundant Figure 6. Results of the clustering analysis. Cluster dendrogram of (a) "total" zooplankton and (b) "common" zooplankton. NMDS plot of (c) "total" zooplankton and (d) "common" zooplankton. The colors of cluster dendrograms and the NMDS plot indicate the seasonally divided groups. Black indicates a single cluster with the minimum dissimilarity cutoff ("total" species: 0.68, "common" species: 0.56). Figures were produced using R (v4.0.3, https:// www.R-proje ct. org). www.nature.com/scientificreports/ in summer and autumn than in other seasons 31 . Finally, S. doederleini is mainly distributed in the Caribbean 28,44 , and no record was found in Korea. Our metagenomic analysis results revealed that the seasonal zooplankton community could be largely divided into four groups corresponding to the four seasons. The distribution pattern of species representing each seasonal group has shown to be largely consistent with past research 24,25 . It was also possible to estimate Cirripedia larvae species, which was not identified in previous studies. Given these results, the study of marine zooplankton community and diversity by metabarcoding is efficient and enhances understanding of the dynamics of the zooplankton community throughout the year. Moreover, should the metabarcoding sequence data and the analyzed results be stored and remain available for future analyses, it will allow easy detection of changes in species composition and any introduction of invasive species into the Busan Port ecosystem by simply uploading their COI sequences to the database.
Most of the species not recorded in NLMS (average read counts per sample: 23.74) showed relatively lower abundance than the identified species (119.85). For this reason, they may have been relatively rare in the coastal region of Korea and difficult to find. In addition, these species may pose a potential threat to marine ecosystems as invasive species, introduced by ship movements or climate change 45 . Therefore, in future monitoring of zooplankton in the region, it is necessary to investigate these species' presence or absence carefully. Adding the presence or absence of barcode sequences (e.g., COI, 18S rRNA, ITIS) and database registration information to the NLMS in the future can greatly contribute to the improvement of the accuracy of future studies using metabarcoding. With only one year of observation, although it is difficult to state these are early invasive species in the Busan Port ecosystem, if we monitor them for a long time using metabarcoding, it should reveal their appearance trends. Hence, it may be possible to judge whether their abundance is increasing or just a short-term influx.
All 64 taxa identified in the "common" zooplankton were included in the "total" zooplankton taxa and accounted for 24.0% (64/267) of "total" zooplankton species. The small number of taxa in the "common" zooplankton accounted for about 97.8% (2,485,517/2,542,332) of the final read count for "total" zooplankton. Some zooplankton taxa occupy most of the abundance in the study area, and a large number of the other taxa show a very low frequency of appearance. Even if only "common" zooplankton was used, similar to the use of "total" zooplankton, the change in the temporal community structure was divided into the four seasonal groups. Moreover, in the SIMPER analysis, the top 5 species that showed significant differences among the seasons were included in the "common" zooplankton ( Fig. 6; Table 2). Nonetheless, if we include the species that occupied a small proportion in the analysis, it provides a better distinction of seasonal changes in community composition (Fig. 6). The Table 2. The top five highest average contributing taxa to each of the four seasonal groups. Significant p-values (< 0.05 and < 0.01) are marked with asterisks (* and **).

Taxa
Contribution (± SD)% Average of G1 Average of G2 "Spring" (G1) vs. "Summer" (G2) www.nature.com/scientificreports/ inclusion of rare species also helps detect early invaders or NIS introduced by climate change or human activities to predict and prepare for their impact on the ecosystem. Therefore, a species with low abundance should be reflected in the ecological analysis after sufficient data curation. Continuous and extensive zooplankton ecological monitoring studies involving metabarcoding methods have several advantages. First, this method can be more efficient than traditional methods 46 . As exemplified by E. yokjidoensis in our study, if the researchers only register the COI (or any other marker sequence) for a new species that has just been reported, it is possible to quickly screen and predict where the new species is distributed within the stored metagenomic library data. Although morphological methods can produce similar results by re-analyzing previously-stored samples, this work requires relatively more experts' labor than the metagenomic methods. Second, it can be possible to classify zooplankton larvae difficult to identify morphologically, like Cirripedia larva. Lastly, species that are difficult to detect by any morphology-based methods due to very small populations, such as early invaders and NIS candidates, can be detected with high sensitivity by metagenomics 47 . Nonetheless, even with metagenomic methods, it is hard to distinguish whether a species is a real member of the study area or a fragment of a dead specimen flowed from a river such as C. vicinus, a freshwater species identified in our data. Therefore, complementing metagenomic analysis with traditional morphology will enable understanding marine ecosystems more specifically and clearly than either approach alone, especially in extensive and continuous ecosystem monitoring.

Conclusions
Our study investigated how to maximize the advantages of metabarcoding for monitoring of zooplankton community structure and diversity in urban coastal regions like the entrance of Busan Port, Korea. In this study, the zooplankton community showed a typical four-season pattern and the species representing each seasonal group were generally consistent with previous studies. Even after the rare species were removed, "common" zooplankton enabled us to confirm the approximate pattern of change in zooplankton diversity. However, using all the OTUs, "total" zooplankton yielded a relatively more pronounced seasonal change in the zooplankton community structure, and potential candidates for early invasive species in the port ecosystem were identified. Although our observations were conducted over a relatively short period at one sampling station, it suggests that regular monitoring of urban coastal areas by metabarcoding could be useful for understanding this ecosystem and detecting potential hazards. Furthermore, it is expected that the accumulation of monitoring data using metabarcoding will enable predicting and responding quickly to changes in zooplankton diversity.

Material and methods
Sampling sites and collection. The Table 1). A plankton net with 200-μm mesh and 60-cm opening diameter was towed horizontally for a distance of 100 m (total filter volume, 20.6 m 3 ) across the water surface (< 1 m depth) for zooplankton sampling. Temperature and salinity were measured at 0.5 m and 1.0 m depth using a conductivity meter (YSI 30, OH, USA) and the average of these two values was used for further analysis. DNA isolation and amplification by PCR. The collected zooplankton sample was transferred to the laboratory, where DNA extraction was undertaken immediately from 10 mL of the approximate 200-mL sample. The rest of the sample was stored in alcohol for later use. The TIANamp Marine Animals DNA Kit (Tiangen Biotech, China) was used to isolate DNA from the zooplankton sample.
The gene for eukaryotic mitochondria cytochrome c oxidase I (COI) was amplified by using the degenerate primer set mlCOIintF (5′-GGW ACW GGW TGA ACW GTW TAY CCY CC-3′) and jgHCO2198 (5′-TAIACYTCIG-GRTGICCR AAR AAYCA-3′) 48 . Amplification reactions were performed in 0.2-mL PCR tubes in a 30-μL mixture containing 1.8 μL of 1 × 10 -5 μM of primers, 2 μL of DNA template, 11.8 μL of ultrapure water, 15 μL of 2X DNA-free Taq Master Mix (CellSafe, Korea). Samples were amplified for 40 cycles using a MaxyGene Gradient Thermal Cycler (Axygen, CA, USA) under the following conditions: initial denaturation at 95 °C for 5 min (1 cycle), denaturation at 95 °C for 30 s, annealing at 46 °C for 30 s, and extension at 72 ℃ for 60 s. The final extension was performed at 72 ℃ for 5 min. A negative control (without DNA template) was performed for the PCR step to detect potential contamination. The PCR product was purified using the Universal DNA Purification Kit (Tiangen Biotech, China). Quantity and quality analyses of the PCR amplified fragments and purified product were estimated using capillary electrophoresis and an ND-1000 spectrophotometer (Nanodrop, Power Lab, Korea). To ensure a homogeneous number of sequencing reads from each sample, 100 ng of each amplicon DNA was taken and diluted to 4 nM with determination of the size of the DNA with Agilent Technologies 2100 Bioanalyzer (DNA 1000 Chip, USA). All the diluted samples were pooled and used in end-repair and ligation of adaptors followed by sequencing in the MiSeq platform according to the manufacturer's protocol. Next-Generation Sequencing library constructed using the Nextera XT Index Kit and the TruSeq Nano DNA Sample Prep Kit as the main capture kit and sequenced using the MiSeq platform were performed at Theragen facilities (Theragen Biotech, Korea).
Quality control and merge. To filter low-quality reads, cutadapt (ver. 2.8) 49 was used to remove amplicon sequences and to discard any unknown nucleotide "N" and reads that had no primer sequences or < 200 bp in length. To maximize the read depth for each sample, low-quality score cutoff values were set differentially in forward-end reads (q = 30) and reverse-end reads (q = 20). Reads without a mate (singletons) were discarded using the pairfq script (ver. 0.17.0; https:// github. com/ sesta ton/ Pairfq). Merging of paired-end reads was con- www.nature.com/scientificreports/ ducted by pear (Paired-End reAd mergeR, ver. 0.9.2) 50 with the following parameters 51 : v = 30, t = 50, n = 250, m = 350, and q = 20.
OTU clustering. The resulting FASTA files were clustered using the vsearch tool (ver. 2.7.0) 52 . Next, sequences were dereplicated, sorted (-derep_fulllength), and those with < 2 clusters (singleton) were removed. The outputs that passed the previous steps were pre-clustered at a similarity threshold of 99% (-clus-ter_size). After pre-clustering, chimeras were de novo detected and removed using the UCHIME algorithm (-uchime_denovo) 53 . Lastly, final OTUs were clustered at a similarity threshold of 98% (-cluster_size) from pre-clustered OTUs, and all sequences were assigned to OTUs. A flowchart of the steps involved in metagenomics analysis is given in Fig. S1.
Taxonomic identification. Taxonomy was assigned to the OTU (2) accept only aligned lengths ≧200 bp and bit-score ≧100; (3) select one with the longest alignment length if there are many OTUs aligned with the same query sequence. The remaining OTUs were then processed for final taxonomic identification. The taxonomic assignment and hierarchical classifications from NCBI accession numbers were done using the 'taxonomizr' package 55 in R software. Next, the OTU tables were modified to the species or genus level (in case that the NCBI database did not contain species level information) and were used for further analyses. After BLAST, all OTUs corresponding to the shown taxa were identified using a custom Perl script (eDNA_shell_ fast_taxonomy.pl). The Perl script is available upon request from the authors. In addition, 'Bacteria' , 'Fungi' , 'Fish' , 'Insect' (from inland), 'Mammalian' , 'Phytoplankton' , and 'Environmental' or 'Unclassified' OTUs were removed from the OTU count table because we focused on marine zooplankton diversity. Finally, the synonymized taxa were combined into one species by referring to the World Register of Marine Species (WoRMS, http:// www. marin espec ies. org) 28 .

Biodiversity analysis.
In zooplankton studies using metagenomics, it is common that low-abundance reads are discarded, and further analysis is performed to remove contamination or PCR artifacts or compare morphological analysis data with the metagenomically dominant species 3,56,57 . Instead, in this study, we applied two methods to determine the difference between the data with the low-abundance OTUs removed ("common" zooplankton) and the data with all OTUs ("total" zooplankton). To confirm the difference between the two methods, the OTU removal standard used in the high contamination risk sampling method was followed. Thus, for "common" zooplankton, OTUs that contributed > 0.5% of sequences in at least one sample were retained 3 . Conversely, "total" zooplankton was used for analysis without removal of low-coverage OTUs.
All samples were rarefied at the lowest sequencing depth to reduce biases resulting from differences in sequencing depth using vegan 58 in R software. As there were many un-assigned OTUs in taxonomic assessment, rarefaction was performed at the OTU level to secure as much read depth as possible for "total" zooplankton. For "common" zooplankton, rarefaction was performed at the remaining species level after taxonomic assessment. Using phyloseq 59 in R software, species richness (observed or Chao1 index) and Shannon diversity index were estimated. Linear regression models were used to examine the relationship between environmental variables and biodiversity with R software 60 .
NMDS was employed to cluster samples according to seasonally different community compositions using the phyloseq and vegan packages in R. The original species-level OTU data were transformed to log 10 (abundance + 1) before NMDS. Then, NMDS for transformed data was conducted using the Bray-Curtis distance method (100 permutations). Analysis of similarity (ANOSIM) was applied to test each seasonal group's significant effects on community composition (999 permutations). Similarity percentages (SIMPER) analysis was conducted on Bray-Curtis similarities from log 10 (abundance + 1) transformed (100 permutations) data using the vegan package implemented in the R programming language 58,60 . All figures (except for Figs. 1, S1, and S2) were initially created using R 60 .
By comparing the species shown in our analysis results with NLMS published by the National Marine Biodiversity Institute of Korea 26 , we confirmed the species listed as candidates for NIS. First, all the species identified by metabarcoding were checked if any records had appeared in the coastal region of Korea. For the species in our results that did not exist in NLMS, the congeneric species in NLMS were checked if they had taxonomy information and COI sequence registered in the NCBI. Finally, the taxa without both species and genus name in NLMS plus COI sequence for all other congeneric species within NLMS were estimated as potential early invader species or NIS.

Data availability
All sequencing data are archived in the NCBI Sequence Read Archive (SRA) database under BioProject number PRJNA739266. www.nature.com/scientificreports/