The invasion of a new ecosystem by microorganisms usually becomes noticeable only after crucial ecosystem services have been jeopardized [1]. However, the spread of toxin-producing species into new areas demands vigilant and proactive surveillance. The invasive Raphidiopsis raciborskii is a toxin-producing, nitrogen-fixing, bloom-forming filamentous cyanobacterium [2]. Strains of R. raciborskii differ in their ability to produce the cyanotoxins cylindrospermopsin and saxitoxin, which are known to affect cattle, wild animals and humans, as well as many ecosystem services such as drinking and recreational water resources [3,4,5]. R. raciborskii, a species of tropical origin, is currently expanding its range across Europe’s freshwater ecosystems [6,7,8]. Because early detection of an invasive species is a prerequisite for efficient management actions, identifying areas at risk of invasion is a high priority [2]. Predictive models are useful for this purpose because they assess which habitats are suitable for colonization. Predictions from species distribution models (SDMs) based on bioclimatic factors can complement patchy species distributions derived from sporadic sampling and occasional reports of presence/absence of a target species [9]. However, SDM results should mainly be considered early warnings that underpin monitoring efforts, rather than proof of presence or invasion [10]. SDMs are statistical procedures that link occurrence records of a species to environmental variables to estimate spatial distribution patterns using a correlative approach [11, 12]; successful colonization, however, also requires the dispersal and establishment of the invasive species in the new ecosystem [13, 14]. Most such modeling efforts therefore face the same question of whether their predictions are empirically supported, which poses a further challenge for the reliable early detection of invasive species in areas at high risk of invasion.

In a previous study [15], we built SDMs from published observations of R. raciborskii and environmental predictors obtained from climatic models to visualize and predict potential new habitats for R. raciborskii in Europe. Although this species has not been reported in Sweden, our SDM predictions revealed potential areas for range expansion in the southern and central regions of Sweden [15]. Here, we integrate field-based surveys in Sweden and in-silico screening of environmental DNA from lakes across Europe to validate the SDM predictions and highlight the challenges in supplying such empirical proof.

To provide empirical proof for the potential expansion of R. raciborskii to Sweden, we selected 11 eutrophic shallow lakes, sampled in late summer, with high (>0.5) and low (<0.5) predicted probability of presence (Table 1) and performed microscopic and molecular surveys (see Supplementary Information S.1 for detailed sampling methodology). DNA was extracted from water and sediment samples, and the rpoC1 gene [16] was amplified with the R. raciborskii-specific primer pairs cyl2/cyl4 [16] and cyl4F/cyl4R [17] (Supplementary Information S.1.3). The specific primers were tested on a European strain of R. raciborskii (NIVA-CYA 399, Norwegian Culture Collection of Algae) isolated from Lake Balaton (Hungary); this strain also served as the positive control. The products of the species-specific polymerase chain reactions were separated by electrophoresis on a 1.5% agarose gel and visualized under UV illumination. None of the samples from these 11 lakes yielded amplification of the target region, suggesting the absence of R. raciborskii. However, because molecular methods can suffer from a limited detection range, we evaluated the sensitivity and detection limit of the method. To this end, 50, 100, and 500 filaments of the reference culture of R. raciborskii (NIVA-CYA 399) were picked under an inverted light microscope and processed with the same procedure used for the field samples. While the cyl4F/cyl4R primer pair returned a band for all three reactions, the cyl2/cyl4 primer pair returned a band only for the reactions with 100 and 500 filaments (Supplementary Fig. S1). This highlights a partial limitation of this molecular method in detecting the invasive species, especially during the early stages of invasion, when population densities are likely low. Other molecular methods, such as duplex digital PCR (dPCR), are reported to improve the detection limit [18]; however, the resources such methods require might not be as widely accessible as those for PCR.
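The outcome of the sensitivity test described above can be summarized as the lowest filament count at which each primer pair returned a band. The following is a minimal Python sketch for illustration only and was not part of the original analysis; the band results are transcribed from the text, and the function name `lowest_detected` is hypothetical.

```python
# Band results from the sensitivity test described in the text
# (True = band visible on the gel). Filament counts tested: 50, 100, 500.
band_results = {
    "cyl2/cyl4": {50: False, 100: True, 500: True},
    "cyl4F/cyl4R": {50: True, 100: True, 500: True},
}


def lowest_detected(results):
    """Return the smallest tested filament count that gave a band, or None."""
    positives = [count for count, band in results.items() if band]
    return min(positives) if positives else None


for primer_pair, results in band_results.items():
    print(primer_pair, lowest_detected(results))
```

Because 50 filaments was the smallest amount tested, the value obtained for cyl4F/cyl4R is an upper bound on its detection limit rather than the limit itself.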
The negative results of the molecular analyses were corroborated by the lack of microscopic identification of R. raciborskii in the samples (Supplementary Information, S.1.2). To complement the field study, we also performed in-silico screening of environmental DNA using publicly available lake metagenomes. A total of 153 metagenomic datasets from 50 lakes across Europe were selected from publicly available datasets stored at the National Center for Biotechnology Information (https://www.ncbi.nlm.nih.gov/) (Table 1). The 16S rRNA reads were extracted from these metagenomes using the SSU-align tool [19], and their taxonomy was assigned using BLAST [20] against Silva SSU 138.1 [21] (Supplementary Information, S.1.4). The SDM-predicted probability of occurrence at these sites ranged from 0.055 to 0.846 (Table 1), with a median of 0.276, indicating that metagenome availability and selection were slightly biased towards sites that may not favor R. raciborskii settlement or survival. Only 5 of the 153 screened metagenomes contained reads matching the R. raciborskii 16S rRNA sequence (Table 1 and Fig. 1). The lakes from which these 5 metagenomes originate are situated in areas with a high probability of occurrence in three cases (0.537 to 0.825) and a lower probability in two cases (0.059 and 0.319). The low number of reads matching the R. raciborskii 16S rRNA sequence makes it difficult to define a threshold for interpreting the SDM predictions. In addition, low abundances in the early stages of invasion limit in-silico methods in general, and particularly so for R. raciborskii, since cyanobacteria are usually underrepresented in metagenomic datasets. The timing and frequency of sampling also affect the efficiency of early detection methods, as seen in the case of Rimov reservoir (predicted probability of 0.319), where only one of 38 metagenomes had a positive match (Table 1).
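The counts reported above can be turned into detection rates and compared against the predicted probabilities with a few lines of code. The following is a minimal Python sketch for illustration only, not the analysis pipeline used in the study; the function names `detection_rate` and `risk_class` are hypothetical, and applying the 0.5 cut-off used for selecting the field lakes to the metagenome sites is an assumption.

```python
def detection_rate(n_positive, n_screened):
    """Fraction of screened metagenomes with at least one matching read."""
    if n_screened <= 0:
        raise ValueError("n_screened must be positive")
    return n_positive / n_screened


def risk_class(probability, threshold=0.5):
    """Label a site 'high' or 'low' by predicted probability.

    Assumption: the 0.5 cut-off used for the field lakes is reused here;
    a value of exactly 0.5 is not covered by the text and is labeled 'low'.
    """
    return "high" if probability > threshold else "low"


# Counts reported in the text.
overall = detection_rate(5, 153)  # all screened metagenomes
rimov = detection_rate(1, 38)     # Rimov reservoir metagenomes

print(f"overall: {overall:.1%}, Rimov: {rimov:.1%}")

# Predicted probabilities quoted in the text for lakes with positive
# metagenomes (the third high-probability value appears only in Table 1
# and is therefore omitted here).
for p in (0.059, 0.319, 0.537, 0.825):
    print(p, risk_class(p))
```

Both detection rates are low, which illustrates why a handful of positive metagenomes is not enough to calibrate a probability threshold for the SDM predictions.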

Table 1 Detected presence (+) and absence (−) of the invasive cyanobacterium Raphidiopsis raciborskii in European lakes using field-based or in-silico screening methods.
Fig. 1: Detection of Raphidiopsis raciborskii in screened samples and metagenomes.

Detected presence (+) and absence (−) of the invasive cyanobacterium Raphidiopsis raciborskii in freshwater lakes and reservoirs across Europe based on field (only Sweden) and in-silico screening of environmental DNA using publicly available metagenomic datasets.

While SDMs are valuable tools for predicting potential invasion sites and guiding management efforts, many uncertainties remain. One of the most important limitations when constructing the SDMs was the general lack of relevant environmental variables for predicting the range expansion of the invasive species. Reports of presence are not usually accompanied by detailed environmental metadata, such as temperature and nutrients, that are known to be important for phytoplankton [9], and knowledge of interactions with native species in invaded areas is largely lacking. This suggests that frequent monitoring and open access to additional biotic and abiotic data connected to the presence of the target species in already invaded areas are necessary for developing more accurate models with higher grid resolution to predict the likelihood of invasion into new aquatic environments.