Can information from citizen science data be used to predict biodiversity in stormwater ponds?

Citizen science data (CSD) have the potential to be a powerful scientific approach to assess, monitor and predict biodiversity. Here, we ask whether CSD could be used to predict biodiversity of recently constructed man-made habitats. Biodiversity data on adult dragonfly abundance from all kinds of aquatic habitats collected by citizen scientists (volunteers) were retrieved from the Swedish Species Observation System and were compared with dragonfly abundance in man-made stormwater ponds. The abundance data of dragonflies in the stormwater ponds were collected with a scientific, standardized design. Our results showed that the citizen science datasets differed significantly from datasets collected scientifically in stormwater ponds. Hence, we could not predict biodiversity in stormwater ponds from the data collected by citizen scientists. Using CSD from past versus recent years or from small versus large areas surrounding the stormwater ponds did not change the outcome of our tests. However, we found that biodiversity patterns obtained with CSD were similar to those from stormwater ponds when we restricted our analyses to rare species. We also found a higher beta diversity for the CSD compared to the stormwater dataset. Our results suggest that if CSD are to be used for estimating or predicting biodiversity, we need to develop methods that take into account or correct for the under-reporting of common species in CSD.

predict biodiversity in the ponds. Here, we investigate whether CSD from records of aquatic insects from the Swedish Species Observation System (www.artportalen.se) can be used to predict the biodiversity of insects in recently constructed stormwater ponds. We used adult dragonflies (Odonata: Zygoptera and Anisoptera) as our focal group of insect biodiversity. Dragonflies are intermediate consumers both in the aquatic and the terrestrial life stage, i.e., they are predators of smaller invertebrate prey and are preyed upon by larger invertebrate and vertebrate predators 16 . Because they have an intermediate position in the food web, they should represent overall patterns in aquatic biodiversity relatively well 16 . In addition, Odonata species richness is positively correlated with species richness of many, but not all, invertebrate taxa and vegetation abundance 17,18 .
To examine whether biodiversity patterns in stormwater ponds could be predicted from CSD, we used three approaches. First, we examined the similarity between biodiversity datasets (i.e., between CSD and those obtained in urban stormwater ponds) from past to recent years. Because Odonata have a short generation time (usually 1-2 years) and because they are good dispersers 16 , we expected that community patterns generated with data from recent years would be better predictors of community structure in stormwater ponds. Second, we examined how the inclusion of CSD covering different areas around the stormwater ponds predicted the Odonata diversity in the ponds. We expected that citizen science datasets covering an area with a larger diameter around our study area would be a better predictor since a large area covers more habitats. We also considered an alternative hypothesis, where predictability decreased with area, due to the distance decay of similarity in ecological communities, see e.g. Nekola & White 19 . Third, we examined whether including common or rare species affected the predictions. We expected that community patterns based on data from rare species would be more similar to the patterns based on our dataset (stormwater ponds) because these species are more actively sought by citizen scientists than common species 3,20,21 .

Methods
Stormwater pond data. Biodiversity data of dragonflies in 18 stormwater ponds were obtained in the city of Uppsala, Sweden in 2018. These man-made ponds were recently constructed (i.e. between 2004 and 2014). The city of Uppsala has 150 000 inhabitants and covers an area of 26 km², and all stormwater ponds that are filled with water all year round were used for this study ( Fig. 1; for more details on methods and pond description, see Johansson et al. 22 ). Dragonfly abundance was recorded every second week over a 10-week period by two trained researchers (P.C. and J.W.), who walked one lap slowly around the ponds from 29 May until 5 August 2018. This period covers the emergence period of all species found at this latitude 23 . Most species were identified visually. However, for some species, identifications were made after capture with a butterfly net. Walking speed was adjusted to the vegetation of the pond and to the abundance of dragonflies, such that the speed was slower at ponds with a lot of vegetation and a high abundance of dragonflies. Total numbers of adults (including mating pairs and ovipositing individuals) were counted and used for the subsequent analyses. No counts were done on cloudy, windy (>30 km/h) or rainy days and, therefore, the biweekly counts were shifted by 1-2 days on two of the sampling occasions. For the analyses, the week with the highest number of individuals was used for each species. These ponds and a modified data set were used in a recent study by Johansson et al. 22 and therefore the pond description and some of the methods overlap slightly with the information given in that study 22 .

Citizen science data (CSD). We used the Swedish Species Observation System to extract records of dragonfly abundance based on CSD. These observations were collected by citizens in a non-standardized way, meaning that data were gathered opportunistically, without standardized methods or control of sampling effort 24 .
These observations included all water bodies in the study area, and thus CSD habitats might represent freshwater systems with a wider range of habitat characteristics. Unfortunately, the CSD did not include enough stormwater ponds to allow us to control for confounding factors (e.g., type of water body). However, we included only species that were recorded in our stormwater pond survey. Hence, species that did not occur in the stormwater ponds were excluded from the CSD set. When the number of individuals was available in the database, we used this number as our estimate, and when the record of a species only mentioned "observed", we gave this record an abundance of 1. In total, 248 localities were surveyed by citizens over 8 years. Furthermore, two species, Calopteryx virgo and C. splendens, are predominantly lotic specialists and typically do not occur in standing waters, such as stormwater ponds. They were therefore removed from the analyses. These species occurred in low numbers at the stormwater ponds (a total of 5 and 2 individuals for C. virgo and C. splendens, respectively).
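The record handling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the record layout `(locality, species, count_field)`, the function names, and the example records are hypothetical. Only the rules themselves come from the text: a record without a count gets an abundance of 1, species absent from the pond survey are dropped, and the two lotic Calopteryx species are removed.

```python
# Hypothetical record layout: (locality, species, count_field), where
# count_field is a digit string when a count was reported and a free-text
# note such as "observed" otherwise.

LOTIC_SPECIALISTS = {"Calopteryx virgo", "Calopteryx splendens"}


def record_abundance(count_field):
    """Return the reported count, or 1 when only presence was noted."""
    text = str(count_field).strip()
    return int(text) if text.isdigit() else 1


def clean_csd(records, pond_species):
    """Keep records of species found in the pond survey, minus lotic specialists."""
    keep = set(pond_species) - LOTIC_SPECIALISTS
    return [
        (locality, species, record_abundance(count_field))
        for locality, species, count_field in records
        if species in keep
    ]


# Made-up example records (not taken from the database).
example_records = [
    ("site A", "Libellula quadrimaculata", "3"),
    ("site A", "Sympetrum vulgatum", "observed"),
    ("site B", "Calopteryx virgo", "2"),
    ("site B", "Hypothetical species", "4"),
]
example_pond_species = {
    "Libellula quadrimaculata",
    "Sympetrum vulgatum",
    "Calopteryx virgo",
}
cleaned = clean_csd(example_records, example_pond_species)
print(cleaned)
```

In this toy input, the Calopteryx record and the species that never occurred in the ponds are dropped, and the "observed" record is kept with an abundance of 1.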
To evaluate whether CSD could be used to predict pond biodiversity, we extracted data covering different diameters around the center of the stormwater ponds in the city (Fig. 2). The center of the study area (59°50′51.3″N; 17°39′3.4″E) was estimated approximately as the centroid of the coordinates of the stormwater ponds. Data records from the database were extracted on a yearly basis from 2010 until 2017 to examine whether more recent CSD performed better in predicting biodiversity in the stormwater ponds. In addition, we examined whether citizen data records covering a larger area around the city center improved the prediction of stormwater pond biodiversity. We did this by extracting and comparing data covering a diameter of 10, 20 or 30 km around the center of the stormwater ponds in the city (Fig. 2). This analysis was performed for the year 2017.

We used canonical analysis of principal coordinates (CAP 25 ) to investigate the differences in community structure between the stormwater pond dataset (sampled by us in 2018) and the citizen datasets in terms of dragonfly species abundance and composition. CAP is a constrained ordination method and, therefore, it uses an a priori hypothesis to produce an ordination plot. This hypothesis can then be tested using a generalized discriminant analysis based on distances 26 . Our a priori hypothesis was represented by a categorical (explanatory) variable with levels representing the type of data (stormwater in 2018 and citizen datasets). CAP was based on the Bray-Curtis dissimilarity index on dragonfly abundance data. The resulting dissimilarity matrix was also used in a distance-based test for homogeneity of multivariate dispersions (PERMDISP; see Anderson 27 ; Anderson et al. 28 ). This was done to test whether beta diversity values differed according to the type of data.
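For readers unfamiliar with the Bray-Curtis index mentioned above, the following Python sketch shows how it is computed for two sites from their species abundance vectors: the sum of absolute abundance differences divided by the total abundance at both sites. The function names and the example abundances are hypothetical, and the sketch is not the authors' implementation, which relied on the vegan package.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance vectors.

    0 means identical abundances; 1 means no shared species.
    """
    if len(x) != len(y):
        raise ValueError("abundance vectors must list the same species")
    total = sum(x) + sum(y)
    if total == 0:
        raise ValueError("dissimilarity is undefined for two empty sites")
    return sum(abs(a - b) for a, b in zip(x, y)) / total


def dissimilarity_matrix(sites):
    """Square matrix of pairwise Bray-Curtis dissimilarities among sites."""
    n = len(sites)
    return [[bray_curtis(sites[i], sites[j]) for j in range(n)] for i in range(n)]


# Made-up abundances of three species at two sites.
site_a = [10, 5, 0]
site_b = [2, 5, 3]
# |10-2| + |5-5| + |0-3| = 11, total abundance = 25, so 11 / 25 = 0.44
print(bray_curtis(site_a, site_b))
```

A matrix built this way for all sites is the kind of input on which distance-based ordinations and dispersion tests operate.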
Since our main interest was focused on comparing the data obtained in the 2018 stormwater ponds with the data obtained by citizens, we carried out a set of planned comparisons. First, we repeated CAP to compare the 2018 dataset to each year separately (from 2010 to 2017) considering sites (surveyed by citizens) that were within a diameter of 10 km around the center of the stormwater ponds in the city. Second, we used CAP to compare the 2018 dataset with sites within diameters of 10 km, 20 km and 30 km around the center of the stormwater ponds. This analysis was restricted to the year 2017. Third, to evaluate the likely effect of a biased search for rarer species in the CSD, we also divided the dataset into two parts. The first part included the 16 most abundant species in our stormwater dataset (see Fig. S1), whereas the second part included the 11 rarer species. This splitting of the dataset was based on the location of an inflection point exhibited in a Whittaker plot (rank-abundance curve) and on an attempt to balance the number of rare and common species in the analyses. Thereafter, we ran independent analyses using these datasets. All analyses were carried out using functions (vegdist, capscale and betadisper) available in the vegan package 29 . Significance tests were based on 999 permutations.
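The logic of a permutation-based significance test with 999 permutations can be illustrated with the Python sketch below. Note the assumptions: the statistic here is a PERMANOVA-style pseudo-F computed directly from a distance matrix, which is a simpler stand-in and not the CAP statistic obtained with vegan. The function names, the toy data, and the fixed random seed are also hypothetical. The P value is the proportion of statistics, permuted plus observed, that are at least as large as the observed one.

```python
import random
from itertools import combinations


def pseudo_f(dist, labels):
    """Pseudo-F ratio of among-group to within-group variation in distances."""
    n = len(labels)
    groups = sorted(set(labels))
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in groups:
        members = [i for i, lab in enumerate(labels) if lab == g]
        ss_within += sum(
            dist[i][j] ** 2 for i, j in combinations(members, 2)
        ) / len(members)
    ss_among = ss_total - ss_within
    a = len(groups)
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permutation_p_value(dist, labels, n_perm=999, seed=0):
    """Return (observed F, P), shuffling group labels n_perm times."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    # The observed arrangement counts as one permutation.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: ten sites placed on a line, two clearly separated groups.
positions = [0, 1, 2, 3, 4, 10, 11, 12, 13, 14]
toy_labels = ["pond"] * 5 + ["csd"] * 5
toy_dist = [[abs(p - q) for q in positions] for p in positions]
f_obs, p_value = permutation_p_value(toy_dist, toy_labels)
print(f_obs, p_value)
```

With 999 permutations the smallest attainable P value is 1/1000 = 0.001, which is why results from such tests are often reported as P = 0.001.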

Results
Twenty-nine species of Odonata were found in the 18 studied stormwater ponds (Table 1), but only 27 were included in the main analyses since the two lotic Calopteryx species were excluded. The total number of species represents 61% of the Odonata species recorded in the province of Uppland in Sweden. The average species richness in the stormwater ponds was 10, ranging from 3 to 19. The most common species were Libellula [...] (Table 1). We found a strong negative relationship between the difference in occupancy given by the datasets (average of the species frequencies of occurrence over the years in the CSD minus the frequency of occurrence in the stormwater ponds) and the mean abundance in the stormwater ponds ( Fig. 3; r = −0.88; P < 0.0001 with all species and r = −0.87; P < 0.0001 after removing C. virgo and C. splendens). This result indicates a bias against abundant species in the CSD.
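The occupancy comparison behind this correlation can be sketched in Python as follows. The function names and the four-species toy dataset are hypothetical and do not reproduce the values in Fig. 3. The sketch only shows how a frequency of occurrence, an occupancy difference (CSD minus ponds), and a Pearson correlation with mean abundance would be computed.

```python
from math import sqrt


def occupancy(n_sites_with_species, n_sites):
    """Frequency of occurrence: fraction of sites where a species was recorded."""
    return n_sites_with_species / n_sites


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# A species present in 17 of 18 ponds has an occupancy of about 94.4%.
print(round(occupancy(17, 18) * 100, 1))

# Made-up values for four species: mean abundance in the ponds, and
# occupancy in the CSD minus occupancy in the ponds.
toy_mean_abundance = [1, 5, 20, 40]
toy_occupancy_difference = [0.05, -0.10, -0.50, -0.85]
toy_r = pearson_r(toy_mean_abundance, toy_occupancy_difference)
print(toy_r)
```

A negative correlation in such a comparison means that the more abundant a species is in the ponds, the more its occupancy in the CSD falls short of its occupancy in the ponds.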
We found a significant difference between the citizen dataset (considering different years, 2010-2017) and the stormwater pond dataset from 2018 ( Fig. 4; F = 1.97; P < 0.001), and the community structure from the more recent years of the CSD-set was not more similar to that of the stormwater ponds (Fig. 4). In addition, variation in community structure, i.e. beta diversity, was much higher in the citizen science datasets than in the 2018 stormwater pond dataset ( Fig. 5; F = 6.77; P < 0.001). We also found that the differences among the datasets persisted regardless of the diameter (10, 20 and 30 km) used to extract the CSD (F = 2.61; P = 0.001). Running these analyses after including data on lotic species did not change the results qualitatively (results not shown).
After splitting the dataset according to species abundance (see Fig. S1), we found that the differences between citizen science datasets and the 2018 dataset remained only when common species were considered in the analyses ( Table 2; Fig. 6). In contrast, we did not find any significant difference in community structure (between CSD and 2018 dataset) when rare species were considered (Table 2; Fig. 6). The same pattern was recorded when the comparisons were based on the years 2018 and 2017 for different diameters: the differences among the datasets were significant when the analyses were done using all species (F = 2.61; P = 0.001) or common species (F = 2.75; P = 0.001), but not when the analysis was based on rare species data (F = 0.84; P = 0.77; Fig. S2).

Discussion
Colonization of new man-made habitats is an important process that may counteract biodiversity loss 30 . Stormwater ponds are one category of such new man-made habitats and, therefore, it is important to examine whether knowledge of the biodiversity in the area surrounding the ponds based on citizen science data (CSD) can be used to predict the biodiversity in these ponds. Our results suggest that the diversity of rare species in these new man-made habitats could be predicted from the CSD. However, the results also suggest that CSD cannot be used to portray the overall biodiversity of dragonflies in stormwater ponds. There could be several reasons underlying this result.
First, opportunistic CSD tend to be biased towards over-reporting rare species and under-reporting more common species 20,21 . We found support for this bias because ordination scores obtained with our dataset and with CSD overlapped mainly when rare species were used in the analyses. Similarly, Snäll et al. 20 found that common bird species were not regularly reported in the Swedish Species Observation System, and we suggest that the same holds for dragonflies. For example, L. quadrimaculata and S. vulgatum were found in 94.4% of the ponds in 2018, but in the CSD set (radius 10 km) these two species were reported, on average, with much lower frequencies (8.8 % and 21.6 %, respectively). In contrast, species that were less abundant in our stormwater pond dataset were reported with a similar frequency in the CSD-set. Currently, methods are being developed to correct for these biases 31 , and they should be used in future studies when predicting colonization of new man-made habitats. However, it is worth noting that citizen scientists may be especially interested in finding rare species with conservation interest 32 . Thus, we also found some support for this view, which suggests that even biased data may be beneficial for conservation purposes 24 .
Second, the stormwater ponds may not be comparable to the more natural habitats sampled in the CSD. For example, since the stormwater ponds are more recent habitats, they might be at early successional stages preferred by only certain dragonfly species. In contrast, CSD habitats might represent freshwater systems with a wider range of habitat characteristics or at later successional stages, which are preferred by a different set of dragonfly species. However, studies have found that aquatic insect communities in newly created ponds may reach a community structure similar to that in natural ponds within one or two years 33-35 . Thus, since the youngest stormwater ponds were over four years old, we suggest that differences in successional stages were not the reason why we could not predict the biodiversity of stormwater ponds based on the CSD. In addition, studies using systematic protocols that were designed to compare the biodiversity of aquatic insects (including dragonflies) in stormwater ponds and more natural ponds have found similar results 11 .
Third, the CSD-set was collected from all kinds of aquatic habitats, while the stormwater ponds probably represent a narrower range of aquatic habitats. If we had restricted our CSD-set to stormwater ponds, we might have found that CSD could predict the community composition in the stormwater ponds. Unfortunately, the CSD did not have enough replicates of stormwater ponds for such a comparison. Hence, we emphasize that our goal was to ask whether CSD collected from aquatic habitats could predict the community of Odonata in stormwater ponds.
We expected that the use of CSD collected from a larger diameter around our study area would make a better prediction, since a large area covers more habitats compared with more limited areas 36,37 . Conversely, one could expect a decrease in predictability with area, due to the distance decay of similarity in ecological communities 19,38 . However, we did not find support for either of these expectations because, regardless of the diameter on which we based our comparison, there was a significant difference in species composition between our stormwater pond data and the CSD. Thus, we did not observe that Odonata communities from aquatic habitats surveyed by volunteers, which were geographically closer to our study area, were more similar to the stormwater pond data. This finding suggests that the under-reporting of common species and over-reporting of rare species overrides the effect of area increase in our dataset.
We also found that variation in community structure (i.e., beta diversity) was significantly higher in the CSD compared to our stormwater pond data. We suggest that the main reason for this pattern is that stormwater ponds are more similar to each other than the water bodies from the CSD-set. The water bodies in the CSD included all kinds of freshwaters, from lentic to lotic and from temporary to large permanent lakes. Differences in biodiversity among different landscape or habitat types have also been found in previous studies on freshwater invertebrates. For example, urban and rural ponds may support different invertebrate communities (e.g. 39 ), forest lakes and bog lakes typically harbor distinct invertebrate communities (e.g. 40 ), and lotic and lentic ecosystems generally show contrasting invertebrate communities in terms of alpha and beta diversity (e.g. 41 ). In this sense, our CSD-set should show overall higher levels of biodiversity than our stormwater pond dataset. An alternative, but not mutually exclusive, explanation for the difference in beta diversity could be the under-representation of common species in the CSD, which thereby could inflate the estimates of beta-diversity in the CSD.
In summary, we were unable to predict patterns in dragonfly biodiversity in stormwater ponds based on data collected by citizen scientists. We suggest that the main reason for this result is that common species are under-reported and rare species are over-reported by citizen scientists. Similar problems with the under-reporting of common species have been found in studies estimating annual variation in birds, species richness in beetles, and spatio-temporal variation in beetle abundance 9,19,20,42 . There is thus a need for predictive models that take into account or correct for the under-reporting of common species in CSD, and such models should provide better predictions of population trends and colonization of man-made habitats by species.