Comparative analysis of freshwater phytoplankton communities in two lakes of Burabay National Park using morphological and molecular approaches

We analyzed phytoplankton assemblages’ variations in oligo-mesotrophic Shchuchie and Burabay lakes using traditional morphological and next-generation sequencing (NGS) approaches. The total phytoplankton biodiversity and abundance estimated by both microscopy and NGS were significantly higher in Lake Burabay than in Lake Shchuchie. NGS of 16S and 18S rRNA amplicons adequately identify phytoplankton taxa only on the genera level, while species composition obtained by microscopic examination was significantly larger. The limitations of NGS analysis could be related to insufficient coverage of freshwater lakes phytoplankton by existing databases, short algal sequences available from current instrumentation, and high homology of chloroplast genes in eukaryotic cells. However, utilization of NGS, together with microscopy allowed us to perform a complete taxonomic characterization of phytoplankton lake communities including picocyanobacteria, often overlooked by traditional microscopy. We demonstrate the high potential of an integrated morphological and molecular approach in understanding the processes of organization in aquatic ecosystem assemblages.


Scientific Reports
| (2021) 11:16130 | https://doi.org/10.1038/s41598-021-95223-z www.nature.com/scientificreports/ Natural lake systems represent essential reservoirs for domestic water supply, fish production, and recreational activities. At the same time, however, lakes are among the most vulnerable ecological systems and, therefore, should be continuously monitored [1][2][3][4] . Growing drainage shortage and diffuse source pollution of natural water reservoirs have severely impacted the water resources of Kazakhstan-the largest nation in Central Asia and one of the most water-scarce countries on the Eurasian continent 5 . Anthropogenic pressure and weather fluctuations resulting in alterations of physico-chemical conditions, nutrient input, eutrophication, and increase of grazing pressure, among other issues, cause substantial changes in the structure and functioning of lake ecosystems 4,6-8 . The observed temperature has risen twice as fast in Central Asian countries since the 1970s in comparison with the average global level 9 . Phytoplankton communities respond rapidly to shifting environmental conditions 10 via changes in cell abundance, morphology, and biomass [11][12][13] . Phytoplankton development results from interactions between internal processes and external environmental biotic and abiotic factors 14 , including between others temperature, pH and salinity [15][16][17] . However, biological monitoring methods can provide more insight into the effect of changes in the abiotic chemical and physical parameters of the organisms 18 . The presence of certain taxonomic phytoplankton groups may be used as indicators of chemical and/or physical conditions of the surrounding environment, or water quality 19,20 . Traditional approaches assessing phytoplankton diversity, distribution, and abundance of phytoplankton taxa, based on morphological characteristics obtained by light microscopy [21][22][23][24] have a number of limitations: (1) labor intensity that limits the size of the quantified sample to hundred(s) of cellular events and a relatively low number of samples to be processed; (2) accurate diagnostics of taxa and their abundances are hampered by undifferentiated morphologies, unidentified early-life algal stages and numerous cryptic species 25,26 ; and (3) incomplete description of the changes in biodiversity based on a limited number of morphologically identified taxa. During the last two decades, cytometric methods (flow cytometry (FCM) and imaging flow cytometry (IFC)) have been recognized as a powerful tool to study seasonal and spatial trends of phytoplankton 27,28 . It is noteworthy, however, that conventional cytometry may not be allowed to isolate and characterize all plankton species and colonial forms identified by traditional microscopy due to size limitations of flow cells typically within a 150 μm limit 29 .
Molecular monitoring tools represent a promising alternative to morphological methods. Moreover, after the unveiling of the key role of picoplankton in aquatic food webs and primary production 30 , a demand arose for new precision and sensitive techniques for the detection and characterization of these microorganisms. Many of them are too small to be identified by light microscopy. Therefore, molecular methods became one of the main tools for defining the composition of picoplankton assemblages [31][32][33][34][35] as well as for discriminating from harmful planktonic microalgae [36][37][38] . These tools came into use through biodiversity studies of algae expanding our discovery of new microalgae that were not detected by microscopy 39,40 . New genome data has provided many novel insights into the evolutionary history of photoautotrophic microorganisms 41 . Although next-generation sequencing allows for faster analysis 42 , the identification of unique and fragile nano-and picophytoplankton, the discovery of hidden diverse new microorganisms 43,44 , and minimizes the role of subjective evaluation, comparative studies with traditional methods are still needed 40,42 . Recently, the progress in next-generation sequencing (NGS) facilitated the extensive sequence-based characterization of diverse plankton communities 34,43,45,46 . However, despite the fast development of high-throughput sequencing (HTS), it is not yet clear to what extent the results of a traditional light microscopy-based taxonomic approach can be consistent in comparison with the results of NGS analysis 42 . Comparative methodological studies are also required to evaluate different algal taxa, such as diatoms and cyanobacteria 40,44,47-51 . In the present study, we compared the effectiveness of optical methods and NGS in assessing two Kazakhstani lakes located in the Burabay National park (Lake Burabay and Lake Shchuchie).

Methods
Sampling sites description. Burabay National Park is located in the northern part of Kazakhstan (the Akmola region) and includes fourteen lakes, of which Burabay and Shchuchie are among the largest (Fig. 1). This region is characterized by a continental climate, with warm summers (the average temperature in July is + 18.0 to + 20.5 °C), cold winters (the average temperature in January is − 16.0 to − 19.0 °C), and an average annual rainfall of 250 to 350 mm 52 . The beds of lakes Burabay and Shchuchie are of tectonic origin, filled with freshwater where bicarbonate, sulfate, and calcium ions prevail (Suppl . Table S1). Lake Burabay (Auliekol' , Borovoe) is an oligo-mesotrophic lake that may be defined as a continuous cold polymictic lake (Osgood Index = 1.05) 53,54 . It is a shallow lake (mean depth = 3.4 m, mean Secchi depth = 2.8 m) (Suppl. Tables S1, S2), with the bottom largely (up to 40-70%) covered by submerged macrophytes, both angiosperms (particularly Potamogeton species and Ceratophyllum demersum) and stoneworts (Chara species). As the main recreational area in the region that includes beach activities, swimming, boating, and fishing, it also has several hotels in its surrounding area. Thus, the ecosystem of Lake Burabay is under a strong recreational load. Lake Shchuchie (Shortan, Shortandy, Shortankol') is an oligotrophic closed lake. Although its maximum depth reaches 22.7 m 55 (Suppl . Table S2), the lake has no continuous stratification of a water column (possibly due to subaquatic springs) and may be classified as a cold polymictic lake (Osgood Index = 2.9) 53 . The ecological state of Lake Shchuchje is intensively influenced by the City of Shchuchinsk on the north shore. Water from the lake is used for various purposes-drinking, domestic use, and industrial use 56,57 . Field sampling and data collection. Surface water sampling was carried out monthly from June to September in 2015 from Lakes Burabay and Shchuchie (Fig. 1). Sampling locations were chosen based on morphometric characteristics of the lakes and heterogeneity of the degree of anthropogenic load. Samples for quantitative microscopy analyses of the phytoplankton community were collected from the surface water horizon (0.5 m depth) at each location using 0. 5 www.nature.com/scientificreports/ of formaldehyde and glacial acetic acid was used for longer preservation of the sample 58 . Collected quantitative samples were concentrated by a settling method 59 . Water samples for qualitative analyses were collected in duplicates using an Apstein plankton net 60 . The two sample replicates consisted of live unfixed material that was analyzed directly on arrival to the laboratory, and samples fixed with glutaraldehyde at the final concentration of 2% for identification of fragile phytoflagellates 61 . Water samples for cytometry analyses were collected and analyzed alive or fixed with 0.5% glutaraldehyde at final concentration until the analysis. Samples for molecular analysis (1.5-2 L) were filtered through 0.2 µm pore size polyethersulfone filters (EMD-Millipore, USA), placed into 5 mL bead tubes provided in the PowerWater DNA isolation kit (MO BIO Inc., USA), and stored at − 20 °C until DNA extraction. Physico-chemical parameters such as water temperature, pH, dissolved oxygen (DO), conductivity, total dissolved solids (TDS) were measured at each sampling point using a YSI Pro Plus multimeter (Xylem Inc., UK) simultaneously with phytoplankton samples. Water transparency (Secchi depth) was dimensioned using a black and white Secchi disk (diameter 0.20 m). Water samples were filtered through 0.22 µm polycarbonate filters using a vacuum pump, and the filtrate was used for subsequent ion chromatography (IC) analysis. Measurements of concentrations of nitrates (NO 3 -), nitrites (NO 2 -), ammonium (NH 4 + ), phosphates (PO 4 3-), fluorides (F -), chlorides (Cl -), bromides (Br -), sulfates (SO 4 2-), lithium (Li + ), calcium (Ca 2+ ), sodium (Na + ), potassium (K + ), magnesium (Mg 2+ ) were performed using Compact IC FLEX 930 with titrator Titrando 905 Metrohm (Mettler-Toledo, Switzerland) (Suppl . Table S1).  69 . The quality scores associated with each base call for each read were used to determine the portion of each Illumina read that was of acceptable quality. Two paired-end reads were joined on the overlapping ends using the fastq-join tool from the ea-utils package vs.1.1.2-537 70 . A minimum of 6 bases with up to 8% of mismatches was allowed between each end.
The alignment of sequences was done using the mapping software BWA 71 against Greengenes (gg_ otus_4feb2011 (downloaded in March 2021)) and SILVA (SSURef_NR99_115_tax_silva_DNA.fasta) databases. The sequences were assigned to operational taxonomic units (OTUs) at 97% similarity. The samtools vs.1.2 (http:// www. htslib. org/) were used to compute the number of reads mapped onto each OTU.
Data processing and statistical analysis. Mann-Whitney rank sum non-parametric test was used to determine significant differences among species distribution analyzed by NGS and microscopy in Lakes Burabay and Shchuchie. The direct comparison of microscopic data with NGS (species & genera distribution) was made from both lakes using Spearman correlation analysis (SigmaPlot, SyStat Software, USA). Linear regression and curve estimation were also performed with this software package.
The total phytoplankton abundance in Lake Burabay is 6 to 22 times higher than in Lake Shchuchie due to the continual development of small-celled colonial picocyanobacteria (CPCy) typical for shallow nutrientreach waters 72 and mainly represented by non-gas-vacuolated Aphanocapsa, Anathece, and Cyanodictyon species (Suppl. Figs. S1, S2). Despite its absolute numerical dominance, the relative biomass of this group is not very significant due to their smaller cell sizes and individual cell biovolumes. Potentially bloom-forming heterocystous cyanoprokaryote Dolichospermum (= Anabaena) flos-aquae, along with Dolichospermum mucosum, common for eutrophic lakes with low nitrogen content 72 were occasionally found at the end of the summer of 2015. The mass development of potentially toxic diazotroph D.flos-aque, however, was recorded using FlowCam imaging cytometer in a mid-summer period, when phytoplankton biomass was considerably higher (Suppl. Fig. S3). The highest total abundance and biomass of phytoplankton were observed in August (Fig. 3C,D).

FACS-based sorting.
Monthly collected water samples were also analyzed using light scattering and fluorescence via FACSAria flow cytometer (BD Biosciences, USA). A characteristic flow cytometric "signature" was www.nature.com/scientificreports/ observed for each lake (Suppl. Fig. S4). In the June samples, from Lake Burabay, it was possible to discriminate three autofluorescent assemblages: one assemblage consisted of small cyanobacteria cells and picoplankton, the second assemblage contained a mixture of Dinobryon monads, small cyanobacteria cells, and a few Cyclotella/Pantocsekiella spp. cells, and the third assemblage was mostly dominated by Dinobryon monads (Suppl. Fig. S4A,B). Later, in the July samples, only two assemblages were observed, each consisting of various cyanobacteria cells mostly dominated by Cyanodictyon spp., Aphanocapsa spp., and Anathece spp. Overall, the sorting results are in agreement with microscopy and FlowCam-based observations, where the majority of phytoplankton cells at the corresponding stations belonged to cyanobacteria. In contrast to Lake Burabay, the samples from Lake Shchuchie analyzed using flow cytometry were more diverse. The major microalgal assemblages from Lake Shchuchie that were discriminated and sorted by the flow cytometer consisted of Dinobryon whole cells or its monads, cryptomonads, pico-sized flagellates or picoplankton, and Cyclotella spp. (Suppl. Fig. S4C,D). Compared to the microscopy analysis, rare phytoplankton groups were missing from the flow cytometry (FCM) analysis with FACSAria instrument, possibly due to their low abundance and/or large size and odd shapes (e.g., long filaments and structures are limited by flow cell size).

SSU rRNA phytoplankton diversity based on NGS. During next-generation sequencing (NGS) from
the MiSeq run of the 16S rRNA and 18S rRNA libraries, raw sequences were acquired from Lake Burabay and Lake Shchuchie sub-samples (at the same 2015 sampling season dates). After applying quality control and clustering procedures, assembled operational taxonomic units (OTUs) were aligned with the Genbank sequence database using a cut-off of 97% sequence identity. To filter only phytoplankton OTUs, a search for keywords Phylum-level patterns were present, with seasonal variations being largely in agreement with our microscopic and cytometric observations. Subtle patterns identified by microscopy, however, were absent from NGS analysis. Phytoplankton biodiversity detected by NGS and microscopy were compared at genera and species levels. A morphological approach enabled us to detect species from nine phyla in Lake Shchuchie and Lake Burabay: Bacillariophyta, Ochrophyta (including Chrysophyceae, Synurophyceae, Xanthophyceae, and Raphidophyceae), Miozoa, Haptophyta, Cryptophyta, Chlorophyta, Charophyta, Euglenozoa, and Cyanobacteria. In comparison, the NGS method revealed the same number of phyla, with additional Dictyochophyceae and Eustigmatophyceae classes within the Ochrophyta phylum.
At genera level, almost 17% of all eukaryotic phytoplankton units represent marine phytoplankton genera for both lakes (Fig. 4). For further comparisons between microscopy data and NGS data, only taxonomic units representing freshwater phytoplankton were used. After excluding marine species from our analysis, it was found that the number of genera identified by NGS is less than was identified by microscopy-72 and 70 genera determined by NGS vs. 107 and 88 genera revealed by microscopy for Lake Burabay and Lake Shchuchie, respectively (Figs. 4, 5). A similar situation arose with species-level data when marine and unidentified species were excluded from the analysis (Fig. 5).
The similarity between eukaryotic phytoplankton of the two lakes measured on the basis of data obtained by NGS is higher at the genera level, whereas similarity based on microscopy analysis is higher at the species level. The microscopy survey indicated 66 shared genera and 76 shared species between communities (Jaccard similarity is 0.31 and 0.51, respectively), while NGS detected 42 shared genera and 25 shared species (Jaccard similarity is 0.40 and 0.28, respectively) and thereby found more unique species for each lake.
Prokaryotic diversity in phytoplankton was higher on genera level based on 16S rRNA analysis than was found through microscopy (Fig. 6). A considerable part of all identified OTUs represents strains so that they cannot be directly compared with the species identified by microscopy. Furthermore, more than half of all identified strains correspond to unicellular picocyanobacteria (PCy) (Suppl. Fig. S5). Interestingly, the majority of these strains are freshwater. Overall, NGS was found to be more sensitive to the detection of planktonic

Discussion
The recent decline in morphological taxonomic studies 73 makes a quantitative approach invaluable, coupled with traditional analysis by an expert taxonomist. Morphological optical methods are likely to miss the rare (because the size of the sample is limited) or unclassifiable species that contribute to lake diversity and may overestimate the biodiversity and richness of phytoplankton, helping identify different phenotypes and transitional forms as separate species. The extent of taxonomic coverage by reference database is also important 74 . The NGS approach implemented in this paper identified the lack of sequenced freshwater taxa in currently available databases (SILVA, Greengenes etc.). The incompleteness of reference databases is a challenging issue that hampers the identification and assignment of sequences. This results in a large number of OTUs that cannot be taxonomically classified to the species ranks or even stays unclassified depending on the taxonomic group or region of study 75 . In contrast to a morphological approach, the number of genera from Chlorophyta and Bacillariophyta (which are the most diverse eukaryotic groups in phytoplankton in Lakes Shchuchie and Burabay) identified by NGS is 25-50% less (Fig. 4), suggesting that microscopy is a more efficient method for their detection 76 .
On the other hand, NGS allowed us to find more genera and species of dinoflagellates, cryptomonads, and other phytoflagellates (i.a. haptophytes, dictyochophyceans), which were not identified via microscopy. Nevertheless, the number of shared units on species level is more than two times less than on genera level (Fig. 5), which may be related to a lack of information on freshwater phytoplankton in the reference database (SILVA). It was previously shown by Bazin et al. 76 that approaches based on 18S rRNA gene clone libraries using universal primers are biased toward heterotrophic organisms, and a microscopy approach is necessary to reveal the real diversity among phototrophic taxa.
Thus, the taxonomic resolution of the NGS approach (using 16S and 18S rRNA primers) we employed was not able to reliably provide species-level identification. NGS analysis, however, identified phylum-like patterns presented in July-September. It seems that NGS-based taxonomy can be used at the genus level for freshwater phytoplankton communities and may hamper the detection of subtle ecological effects. Further studies employing traditional and NGS approaches in parallel are required to increase the quantity and quality of algal databases, and expand possibilities for the functional analysis of phytoplankton assemblages. Ideally, species should be defined using an integrative approach, including morphology, genetics, behavior, ecology, and geography [77][78][79] . Currently, most species descriptions rely on phenotype features. However, traditional phenotypically based taxonomy is challenged by molecular findings that provides greater taxonomic resolution than morphology [80][81][82] .
Notably, the NGS approach is favorable in detecting of picoplankton species/strains, which cannot be found and/or identified by routine microscopy. This is particularly true for picocyanobacteria, especially for unicellular PCy strains due to their small cell sizes and phenotypical plasticity that make them almost indistinguishable www.nature.com/scientificreports/ morphologically 83 . Nevertheless, the NGS approach gives us less cumulative information about species richness and phytoplankton abundance in comparison to microscopy for the present. Though FCM is another approach to be considered for picocyanobacterial analysis, there are technological limitations that may not allow FCM to isolate all species described by traditional microscopic analysis 84 . We used FCM as an ancillary instrument supporting selective microscopic findings.

Conclusions
In summary, if we compare optical methods and DNA-based methods, DNA-based analysis may help to analyze samples at different taxonomic levels and discriminate overlooked cryptic and rare species. The advantages of optical methods are relatively low cost of equipment and a direct description of phytoplankton that cannot be replaced by DNA-technologies. It makes light microscopy still a primary method in the study of phytoplankton. However, an integrative approach of both DNA-based and morphological methods has rarely been used, but as demonstrated here may provide deeper insights into the structure of phytoplankton communities, in particular, picophytoplankton. Due to the growing use of new generation-sequencing methods, a larger amount of genomic data can be expected from the phytoplankton research though our knowledge of the phytoplankton metabolome continued to be incomplete. Combined evaluation, results from traditional and modern techniques and monitoring will be the foremost practice in future phytogeographic research.