Background & Summary

In water bodies, phytoplankton communities are key elements of ecosystems, regulating biogeochemical cycles1 and representing one of the most important photoautotrophic primary producer2,3. Because of their rapid responses to water quality and anthropogenic pressures, phytoplankton communities are commonly used as trophic and ecological indicators4,5 in Europe and in the United States6. Therefore, they are mainly studied using a classical taxonomic-based approach, i.e. abundance and biovolume of species. Taxonomic lists are used as a whole, with the sum of individual biovolume employed for total biomass metric assessment7,8, and at the species level for selecting indicator species. This selection is usually based upon the most common species of the phytoplankton community, for which information and ecological profiles can be established9. Therefore, the assessment is only based on a part of the community. Rare species are not considered in such cases although they can be very informative10 (e.g. new invasive species with very specific response to a trophic or a polluted state). Data can be aggregated at higher taxonomic level (family or phylum) to take into account all available taxa7,11. However, different taxa of the same phylum can indicate a very different ecological status while species from different phylum can reflect very similar ecological conditions3. Moreover, the ecological significance can be neutralized at the phylum level by species indicative of opposite conditions12.

During the last decades, approaches based on functional groups were created to overcome these drawbacks13,14,15 and allow evaluating other aspects of ecosystem quality, functioning and biotic interactions16,17. As outlined in Salmaso et al.18 many studies use functional classifications and/or functional groups with the objective to better investigate links between phytoplankton communities and the main environmental factors. Some classifications mentioned by Salmaso et al.18 are based on one characteristic (size classes) or several (morphologically based classifications) while others use functional properties (functional or morpho-functional groups). Indices based on functional groups were also published for lakes19 and for rivers20 to be used for the implementation of Water Framework Directive, WFD21. Nevertheless, as explained by Padisák et al.22 functional groups were not built using all encountered taxa. Ecologists have to adjust their groups manually or through an R code for the Reynolds Functional Groups23. These additions, may not be adequate and lead to misinterpretations. According to Wentzky et al.24, the use of functional-traits at the species level (and not at higher taxonomical level) can limit these issues without influencing any assumption about taxon functional group.

Thus, data collections of phytoplankton traits at the species level are required for a large number of taxa. Accordingly, different datasets were created considering the taxa size. Olenina et al.25 reported size-classes (PEG biovolume, http://ices.dk/data/vocabularies/Documents/Forms/AllItems.aspx) but mainly for marine waters. Kremer et al.26 established a dataset for freshwater and estuaries for a large number of taxa (>1200 species), which had the advantage that it considered cell size and natural unit size (colonial and filamentous growth forms). At the French level, Druart and Rimet created a trait dataset including only biovolumes27 and published, more recently, a phytoplankton trait dataset28 mainly based on Alpine lake surveys and including around 10 compiled traits.

To our knowledge, specific traits such as lorica, protuberance presence (and kind of protuberance), potential toxicity associated with potential toxin production by cyanobacteria, heterocyte and vacuole presence have never been merged together in one trait database. More generally, to our knowledge, no database covers a large number of ecological and morphological traits for phytoplankton communities at the French or European level. To be able to use many morpho-functional traits for water bodies in ecological analyses in France, the construction of such a dataset was initiated. This was based upon the survey database established for the WFD application on French water bodies since 2005. This dataset will be particularly helpful for studying the relationships between functional traits and environmental variables, as well as to develop new trait-based indicators for water quality assessment.

We aim in this paper to (1) compile morpho-functional traits of phytoplankton taxa in French water bodies (metropolitan territory) (2) offer information of the corresponding phylogeny (3) provide a dataset helpful for analyzing functional and morpho-functional phytoplankton assemblages from local to national scale (4) deliver a dataset allowing consistent data comparisons between European countries.

Methods

Site description

This work was based on the French national database consisting of phytoplankton communities surveyed in 384 lakes between 2005 and 2016 and used for applying the Water Framework Directive. As each lake has been sampled several times this database contains 2,987 phytoplankton samples. Sampled lakes cover the whole French territory (Fig. 1) and encompass all natural and artificial types of lakes encountered in France: from lowlands to alpine location, from small to large lakes (0.09 to 577.12 km2), from shallow to deep lakes (mean depth from 0.30 to 153 meters) and also from oligotrophic to hypereutrophic environmental conditions.

Fig. 1
figure 1

Location of the 384 lakes covered by our study; each black dot represents a lake sampled between 2005 and 2016.

Data acquisition and compilation

In order to create our dataset we firstly defined the taxa list recorded in the national database. The extracted list was firstly examined for taxonomic consistency. Taxa only determined at a high taxonomic level (order, class, phylum) were rejected from the list. Thus, genus and below this level (species, variety, form, etc.) were conserved to obtain an accurate taxonomical list for morpho-functional traits (e.g. biovolume, cell form). A taxa name in the national database can be different over time because of taxonomic changes and determination by different operators. To harmonize taxa denomination and to obtain one up-to-date taxa list, the original names used in the lake database were corrected following the most recent research reported on the AlgaeBase website (https://www.algaebase.org/ - last consultation on 31st July 2020) and in scientific papers. This will allow the traits table to be used for future studies. Thus, a list of 1,318 taxa, associated with abundance and biovolume data, was obtained from the database. In order to remove idiosyncratic species and any potential determination errors, we only kept taxa which occurred in more than 15 samples. A final taxa list of 636 taxa was obtained.

Secondly, we selected a list of 53 morpho-functional traits that we considered to be the most relevant for the description/analyses of functional phytoplankton communities (see Online-only Table 1 for detail of these traits). We compiled available trait data from published sources: we used general books (John et al.29; Reynolds3; Wehr et al.30) for all taxa and specific references listed in Table 1 for more precise information of some traits or taxa. The traits not available in the literature (biovolume, size class, carbon contents, etc.) were determined/calculated. All available synonyms of the taxa recorded in the literature were taken into account in order to consider morpho-functional traits taken from as a many taxonomic references as possible. Taxonomic information was essentially obtained from the AlgaeBase website.

Table 1 Specific books and references used for some traits and taxa for the different phyla.

From these publications we mainly obtain morphological characteristics such as cell form, presence of ornamentation on cell and number of chloroplasts. If the information was divergent between two references, the most recent is accepted.

To facilitate the use of the dataset in statistical analyses, taxa code (6 letters in upper case), used in the free computer software Phytobs31 (commonly used in France for freshwater phytoplankton counting), was added for each taxon. These codes were constructed with the first 3 letters for genus and the last 3 for species and infra specific denomination. In order to have a global clustering of phytoplankton taxa (sometimes useful for result presentations), more or less independent of phylogeny updates and thus stable in time, we included a phytoplankton group name. We classified phytoplankton taxa in 9 groups based on common clustering of algae and cyanobacteria such as chlorophytes (green-algae), diatoms, xanthophytes (brown-algae), chrysophytes (gold-algae).

The trait life form was added to the dataset and is defined as the common living form of the taxon in its natural environment (cell, colony or filament). Because some taxa could exhibit a second life form in nature, this information was also indicated in the dataset.

We determined the cell and individual form traits for each taxon with the use of 14 simple geometric forms: sphere, cylinder, rotational ellipsoid, flattened ellipsoid, cymbelloid, staurastrum and ceratium form, oval cylinder, double cone, parallelepiped, prism on parallelogram base and on triangulare base, cone with half sphere and tetrahedron as defined in the computer tool Phytobs31 and illustrated in Fig. 2. Choices were made with the help of book taxa illustrations and according to Hillebrand et al.32, Olenina et al.25, Druart and Rimet27, Hutorowicz33 and Padisak and Adrian34.

Fig. 2
figure 2

Illustration from Phytobs31 of the 14 cell and individual forms used for the biovolume calculations. (a)sphere, (b) cymbelloid, (c) cylinder, (d) oval cylinder, (e) rotational ellipsoid, (f) flattened ellipsoid, (g) ceratium and (h) staurastrum form, (i) double cone, (j) parallelepiped, (k) prism on parallelogram base and on (l) triangulare base, (m) cone with half sphere and (n) tetrahedron.

The minimum and maximum cell dimensions were taken from the literature and were used for the definition of the size class (see below).

We determined taxa biovolume following the European standard NF EN 1669535 with the use of the software Phytobs31 for the calculation (this counting software provides also a biovolume calculation tool). Thus, from real samples observed during the period of the study, 30 individuals of each taxon were measured, as recommended by the European standard, to calculate a mean cell biovolume. For taxa not observed in enough quantity, common minimum and maximum dimensions were taken from the literature in order to determine mean dimensions and calculate a corresponding mean cell biovolume.

We also specified the mean carbon content per cell in our dataset. We used the formulas described in Menden-Deuer and Lessard36 and reproduced in European standard NF EN 1669535 because they consider the decrease in specific carbon content with cell size. Thus, the general formula used for phytoplankton is Carbon [pg C] = 0.216 × Biovolume0.939 and the specific formula for diatom taxa is Carbon [pg C] = 0.288 × Biovolume0.811.

Three size class systems were also subsequently defined in the dataset. The first two use the Sieburth et al.37 class limits and are based on the cell dimensions. The third one uses the biovolume thresholds defined by Ignatiades38. For Sieburth et al. class limits, two variables were defined: one using the minimal individual length and the second using the maximal individual length. In each case, taxa were classified within the following classes: picophytoplankton (0.2–2 µm), nanophytoplankton (2–20 µm), microphytoplankton (20–200 µm), mesophytoplankton (0.2 mm-20 mm). The Ignatiades classification was calculated on individual biovolume: cell biovolume for unicellular taxa and individual biovolume for colonial and filamentous taxa. The classes defined by Ignatiades were: nanoplankton (10–1,000 µm3), microplankton (1,000–1,000,000 µm3), macroplankton (>1,000,000 µm3) plus a specific class, picoplankton (<10 µm3), used for the very small taxa.

Different binary traits (1-presence; 0-absence) were added: motility, presence of flagella, aerotopes, contractile vacuoles, mucilage, akinetes, heterocytes, cysts, plasts, sheath, tractus, siliceous skeleton, lorica, external plates, scales, ornamentations, protuberances, chlorophyll-b and c, xanthophyll, phycobilin and potential toxin production39,40,41,42,43 (with toxin family specified: microcystin, anatoxin-a, anatoxin-a(S), cylindrospermopsin, saxitoxin, Beta-Methylamino-L-Alanin). These characteristics were mainly extracted from the literature. The precise number of plastids and flagella were also added. Some distinctions were defined for the protuberance type, their size and their number onto the individuals. The protuberances were split into seven different types including granule, needle or bristle and the size was divided in two size classes (large and small).

The main nutrition mode was indicated for each taxon following the common definitions: autotroph, mixotroph and heterotroph44,45.

The reproduction mode was specified more precisely when mentioned in the literature or just indicated as sexual or asexual when not.

Some taxa are known only from freshwater and others also from the marine environment. Thus, this information was indicated as a Water_Type in the trait dataset. The trophic status of the water where the taxon lives was defined under “Water_Trophy” as from oligotrophy to hypereutrophy. In the case of a taxon with a large trophic range, the main trophic status was indicated.

In some literature and mainly in Reynolds3, the tolerance to abiotic parameters are defined. We also indicated this trait as a tolerance trait in the dataset (literal form).

We finally reported in the dataset the Reynolds functional groups, based on the original description of Reynolds3 and developed further (with more taxa) by in Borics et al.20 and Padisak et al.19,22.

Data Records

The phytoplankton morpho-functional traits dataset46 is in the form of a large table with taxa in rows and their corresponding morpho-functional traits in columns. This trait table is contained in a formatted file with semicolon-separated values (*.csv) and stored in a public repository available on https://data.inrae.fr/dataverse/eabx. with the following DOI: https://doi.org/10.15454/GJGIAH. The name of the file was defined as FRENCH_PHYTOPLANKTON_TRAITS.csv. This trait table is designed to be updated with new data especially from French lakes (new surveys and new taxa occurrences) and/or new morpho-functional traits relevant for the study of phytoplankton communities.

In the table some variables are numerical and continuous such as biovolumes or cell dimensions, other are categorical such as life forms and other boolean as toxin production. Consequently, the table is a mixed format with numeric and text. The different variables are listed with their attributes in Online-only Table 1 to facilitate understanding and reuse of the data. Missing information was labelled ‘#NA’.

In the dataset, the chlorophytes group is the most represented with 36.5% of the total (Fig. 3) and cryptophytes is the least (2.2%). Other groups like cyanobacteria, diatoms and chrysophytes represent between 10 to 20% of the dataset taxa. The individual life form of each taxon listed is mainly represented by the cell form (56.8%), followed by the colonial form (32.9%) and, finally, the filamentous form of around 10% (Fig. 4).

Fig. 3
figure 3

Proportion of the dataset taxa in the different phytoplankton groups.

Fig. 4
figure 4

Proportion of the dataset in the different life forms. These represent the usual individual living form of the taxa (cell, colony or filament).

Technical Validation

The technical validation of the data stored in the dataset was undertaken following the scheme outlined in Fig. 5. All the information collected from the literature was double-checked and all sources are provided in the methods section above. All measures (biovolume, carbon content, size classes attribution, etc.) were made by the same individual in order to reduce error.

Fig. 5
figure 5

Scheme for the technical validation of the dataset. Extraction of the taxa list from the national phytoplankton database, simplification with the synonymy stored in Phytobs software. Implementation of the traits by 3 different operators to obtain a beta version of the dataset, then double-check step and implementation of the calculated traits to obtain a first version of the dataset. Quality control of the data by the referent scientist through the use of graphs and cross-checks to obtain the final version of the dataset.

After that, all data included in the traits table were controlled by the same person (C.L.T.), an expert on French phytoplankton. The dataset thus created was explored through preliminary analyses with box-plots and histograms in order to check distribution and to highlight erroneous data (outliers, aberrant text, missing data, etc.). Many cross-checks were undertaken in order to highlight missing and inconsistent data (C.L.T.):

- motility set to “1” when aerotope, contractile vacuole or flagellum are present;

- number of flagella indicated with presence of flagellum;

- absence of flagellum for desmids, diatoms and cyanobacteria;

- compulsory flagellum for cryptophytes;

- number of plastids indicated with the presence of plastids;

- absence of plastids for cyanobacteria taxa;

- presence of akinete and heterocyte only for cyanobacteria taxa;

- compulsory cyst for chrysophytes and dinophytes taxa;

- siliceous skeleton only for diatoms;

- presence of external plates only for dinophytes;

- presence of lorica for chrysophytes and few chlorophytes and euglenophytes;

- protuberance type other than “None” and protuberance size not null with presence of protuberance;

- presence of chlorophyll-b and xanthophyll and also absence of chlorophyll-c and phycobilin for chlorophytes (equivalent cross-controls for the other phytoplankton groups were done);

- presence of toxin only for some cyanobacteria;

- the toxin family indicated with presence of toxin.

Usage Notes

The phytoplankton trait dataset is freely available and stored in https://data.inrae.fr/dataverse/eabx public repository and should be appropriately referenced by citing the present paper. A survey taxa list should be first checked for synonymy before being analyzed with the morpho-functional dataset. This will avoid the mismatch of taxa names. We strongly encourage users to use the synonymy database present in the Phytobs software.

Note that many traits were obtained from literature indicating general characteristics of species or genera and do not represent individual responses to environmental conditions. Some of them can be affected by different environmental conditions. For example, traits based on dimensions (min, max length and width), biovolume, carbon content and in some cases size-classes can be directly influenced by environmental conditions36,47,48,49 as well as the presence of heterocyte and akinete for some cyanobacteria taxa50.