Background & Summary

Trait-based approaches have become increasingly popular in community ecology1,2, including phytoplankton communities3,4,5,6, over the last few decades. Phytoplankton are a diverse group of microscopic organisms, accounting for approx. 40% of global primary productivity and are key contributors to the biogeochemical processes7,8. They provide an ideal model system for testing trait-based approaches, due to their relative simplicity and well-defined traits9,10. Phytoplankton morpho-functional traits affect the fitness and competitive success of individual cells, with cascading implications at the population, species and community levels11. Individual trait-based approaches provide the framework for linking individual responses to natural and anthropogenic pressures to community organization and ecosystem functioning12,13. Individual trait-based approaches have been applied in plant communities14,15 and more recently to plankton ones16,17,18. Here, we present an integrated individual-level trait-based phytoplankton dataset that combines six fully harmonized datasets related to transitional water ecosystems from the Northern Atlantic, South-Western Atlantic, South-Western Pacific, Indo Pacific Ocean and the Mediterranean Sea, sampled within the “Phytoplankton Bio-Imaging” project. All data are from a specific transitional water type, i.e., lagoon ecosystems, characterized by being micro-tidal, shallow and nutrient-rich depositional ecosystems19, determining morphometric traits adaptation of phytoplankton guilds when compared to the marine ones20. The lagoon ecosystems in each biogeographic area have been selected as relatively pristine ecosystems with low anthropogenic pressure; therefore, the integrated dataset as well as each individual biogeographic dataset can provide control/reference data for studying phytoplankton morphometric trait responses to anthropogenic pressure. The datasets have been harmonized and are openly accessible through the data portal of the Italian National distributed node (LifeWatch Italy) of the European e-Science Research Infrastructure LifeWatch ERIC. The integrated dataset presented here contributes to enhance the findable, accessible, and interoperable information21 on transitional water phytoplankton morpho-functional traits and complements the existing data resources on marine22,23 and freshwater24,25,26 phytoplankton.

Methods

Sampling and data collection

Phytoplankton samples were collected in a single sampling event that took place between July 2010 and November 2012 in 24 transitional water ecosystems distributed across five biogeographical regions: Northern Atlantic Ocean (NAO-Scotland), South-Western Atlantic Ocean (SWAO-Brazil), South-Western Pacific Ocean (SWPO-Australia), Indo Pacific Ocean (IPO-Maldives) and Mediterranean Sea (MED- Greece and Turkey) (Fig. 1). Sampling was carried out according to a hierarchical sampling design: for each ecoregion, three ecosystems were selected and within each of these, a maximum of three habitat categories were chosen and three experimental stations per habitat type were sampled with three replicates each, for a total count of 116 sites and approximately 350 water samples. Habitat types were classified on the basis of sediment granulometry and type of bottom vegetation27 according to the EUNIS habitat type hierarchical classification, version 201228.

Fig. 1
figure 1

Distribution map of the five biogeographical areas included in the dataset: South-Western Atlantic Ocean (SWAO) in brown, Northern Atlantic Ocean (NAO) in green, Mediterranean Sea (MED) in light blue, Indo Pacific Ocean (IPO) in purple and South-Western Pacific Ocean (SWPO) in pink. The red dots identify the phytoplankton sampling stations in each biogeographical area.

Phytoplankton samples were collected with horizontal tows from the subsurface (0.5 m) using a net mesh (6 µm) and fixed with Lugol’s solution (15 mL/L). This sampling technique is not a 100% quantitatively; however, the sampling procedures were standardised following same protocol in every sampling campaign. During the net sampling phase, the net was towed from the boat for a standard length of approximately 1.5–2 m, repeated three times back and forth, with each haul consisting of a linear measure of approximately 10 m. Phytoplankton taxonomic identification, cell abundances estimations and morphometric measurements were performed using an inverted microscope (Nikon T300E, Nikon Eclipse Ti) connected to a video-interactive image analysis system (L.U.C.I.A Version 4.8, Laboratory Imaging), following the Utermöhl method29 at 400x magnification. For each sample, a minimum of 400 cells were counted, measured and identified to the lowest taxonomic level possible, using specific manuals, monographs and phytoplankton Atlas30,31,32,33,34,35,36,37. The taxonomic validation was performed using the World Register of Marine Species (WoRMS)38 and Algae Base39.Where identification to species level was not possible the “Cf.” qualifier was used to indicate a specimen relevant to the species claimed and the numbered “sp.” was used to denote an organism relevant to the identified genus.

After taxonomic identification, cell volumes (expressed in μm3) were estimated according to the species/taxa specific shape association and using the geometric equations for simple and complex shapes recorded in the webservice “Atlas of Shapes” https://www.phytovre.lifewatchitaly.eu/vre/shapes-groups/. The geometric shape was attributed to the shape of the individual cell, even for coenobial, colonial and filamentous species where cells were not observable. The cell and shape views (e.g., lateral, frontal, etc.) with all the corresponding measured linear dimensions were reported in the datasets using alphabetical codes (e.g. length indicated by “a”, “l”, etc.; width indicated by alphabetical code “b”, “d”, etc.), together with information on the presence of internal and external cell structures (Table 1). Cell volumes were also reported in the datasets as “volume equivalent to sphere” and “volume equivalent to cylinder” and calculated using the Nikon image analysis system, based on cell contours and the application of a rod model using minimum and maximum Feret distances as linear dimensions40. Phytoplankton cellular carbon content (pg C) was obtained indirectly by converting cell biovolume to carbon using empirical or theoretically derived equations in accordance with Menden-Deuer and Lessard, 200041.

Table 1 Description of the dataset, according the Phytoplankton Data Template.

Data Records

The integrated dataset generated and analyzed for this study includes six datasets42,43,44,45,46,47 published in the LifeWatch Italy data portal https://dataportal.lifewatchitaly.eu/data with their respective DOIs (Table 2). The data were collected and harmonized according to the Phytoplankton Data Template https://www.phytovre.lifewatchitaly.eu/phyto-data-template/ which is based on the Darwin Core standards48 and the Phytoplankton Traits Thesaurus49. The datasets are formatted as column-oriented tables with data reported in semicolon_separated values format (.csv). The associated metadata are described using the Ecological Metadata Language50 (EML 2.2.0) standard in extensible markup language (.xml) format to ensure data understanding and long-term control. Each phytoplankton record is represented by an identifier (catalogNumber) associated with ancillary information (e.g. sampling locations, temporal and spatial information), phytoplankton taxonomic classification and morphological trait data (Table 1). Data variables are numeric and categorical and are expressed in text and numeric formats.

Table 2 List of datasets and respective DOIs.

In total 127311 phytoplankton cells, belonging to 306 taxa were counted, measured and taxonomically classified. Summarized information from each dataset is presented in Table 3. The highest abundance was recorded in South-Western Atlantic Ocean area (SWAO), while the lowest number of records was reported in the South-Western Pacific Ocean (SWPO) region. In terms of taxa richness and shape occurrence a rather similar trend occurred in all biogeographical areas, with Northern Atlantic Ocean (NAO) biogeographical area showing the highest diversity in terms of taxa composition and the Indo Pacific Ocean area (IPO) showing the highest diversity in terms of shape occurrence. The distribution of phytoplankton composition by phyla in each biogeographical area showed a noticeable predominance of Ochrophyta in all regions (Fig. 2) mainly represented by the genera Chaetoceros, Pseudo-nitzschia, Ceratoneis, Cyclotella, Thalassionema and Navicula, followed by the Myzozoa represented by 66 different taxa. Other phyla such us Chlorophyta, Cyanobacteria, Cryptophyta, Haptophyta, Euglenozoa, Charophyta, Bacillariophyta and others accounted for less than 10000 individuals in each phyla, for a total of 87 taxa. A total count of 35 different morphological shapes of phytoplankton were described in the integrated dataset. Prism on elliptic base was the most abundant shape in terms of data records in all the biogeographical areas examined (38073 records) except for the South-Western Atlantic Ocean (SWAO) region where cylinder, parallelepiped and prolate spheroid + 2 cylinders were the most dominant shapes (Fig. 3). Complex shapes such as cylinder + 2 cones and cone + half sphere were mainly found in Northern Atlantic Ocean (NAO) and Mediterranean Sea (MED) regions with more than 2000 phytoplankton individuals, while more than 1500 organisms with cubic and gomphonemoid shapes were recorded in Northern Atlantic Ocean (NAO) area. The category “others” which included a mix of 20 different simple and complex shapes, rarely contributed to the overall morphological distribution of phytoplankton, with less than 1000 individuals.

Table 3 Summary of total abundance, taxa richness and shape occurrence from five biogeographical areas.
Fig. 2
figure 2

Barplot showing the distribution of phytoplankton data records per phyla in each biogeographical area: South-Western Atlantic Ocean (SWAO) Northern Atlantic Ocean (NAO), Mediterranean Sea (MED), Indo Pacific Ocean (IPO) and South-Western Pacific Ocean (SWPO).

Fig. 3
figure 3

Barplot showing the number of phytoplankton data records and the number of shapes per biogeographical area: South-Western Atlantic Ocean (SWAO) in brown, Northern Atlantic Ocean (NAO) in green, Mediterranean Sea (MED) in light blue, Indo Pacific Ocean (IPO) in purple and South-Western Pacific Ocean (SWPO) in pink. The shape category “Others” refers to less representative complex shapes described according to the Atlas of Shapes: half ellipsoid + cone on elliptic base, prism on triangular base 1, cone, prism on elliptic base + 4 cones, ellipsoid + cone, 2 half ellipsoids, prism on elliptic base + parallelepiped, 2 half ellipsoids + prism on elliptic base, ellipsoid + 2 cones + cylinder, cymbelloid, pyramid on rectangular base, cylinder + 3 cones, half sphere, parallelepiped + 2 cylinders, half ellipsoid + 3 cones, parallelepiped + 6 half cylinders, truncated cone, truncated cone + truncated cone, prism on elliptic base + 2 parallelepipeds and sickle-shaped prism.

Technical Validation

Data curation and technical validation steps were carried out to ensure the accuracy of the data and metadata. During data collection, a standardized sampling protocol already used in previous studies51 and the same sampling design were followed throughout the entire sampling campaign to avoid bias and ensure replicability of the data and information. Secondly, all the samples were collected, identified and measured by a team of qualified researchers and taxonomists who ensured data quality by checking and validating the taxonomic and morphological classification of the phytoplankton and detecting format and nomenclature errors or missing and inconsistent data. Thirdly, taxonomical and morphological information contained in all datasets were checked and technically validated through the use of WoRMS and Algaebase repository and the web services “Atlas of Shapes” and “Trait computation” provided in the Virtual Research Environment “Phyto VRE” of LifeWatch Italy (Fig. 4). After all the curation and validation steps, the data were stored and preserved in the LifeWatch Italy data portal, making them findable, accessible, interoperable and reusable. Finally, the integrated dataset includes information that have already been published in peer-reviewed scientific journals52,53,54,55,56,57,58.

Fig. 4
figure 4

Schematic illustration of the data validation process. Taxonomical and morphological information were checked and technically validated through the use of WoRMS and Algaebase repository and the web services of the Virtual Research Environment “PhytoVRE”.