Background & Summary

Because of its location in a densely populated region with intensive economic activities, the North Sea has been seriously affected by anthropogenic activities, both historical and contemporary1,2. Especially in the last 50 years it has been heavily impacted by pollution (e.g. heavy metals), eutrophication (with shifts in nutrient ratios), climate change, fisheries (with concomitant changes in food webs) and other disturbances (e.g. the construction of offshore windfarms)37.

Due to its role as the main primary producer in the ocean, phytoplankton influences almost all higher trophic levels, from copepod herbivores to zooplankton carnivores, pelagic fish, seabirds and marine mammals8. Phytoplankton is sensitive to anthropogenic pressures and both its production and composition can change as a result of eutrophication and temperature changes, but also as a result of top-down effects of changes in higher trophic levels (e.g. through fisheries, shifts in zooplankton composition)4,5,9. Long-term data on phytoplankton community structure offer a unique opportunity to study the impact of various anthropogenic pressures on phytoplankton, and how phytoplankton may respond to future changes.

While in most North Sea countries such as the Netherlands, France, Germany and the United Kingdom long-term monitoring phytoplankton programs have been running for several decades1019, in Belgium no such structured long-term monitoring effort exists. Nevertheless, an impressive amount of phytoplankton community structure studies have been conducted during the last decades, including the 1970’s for which data are often lacking in neighbouring countries12,13,1719. These historical and recent cell count data were until now scattered in technical reports or in digital form on laboratory computers only, while some data were directly incorporated in the database of the Belgian Marine Data Centre (BMDC). As a result, these quantitative phytoplankton community structure datasets, which took a lot of resources and expert knowledge to acquire, were never published or disseminated as a whole to the wider scientific community. In addition, because the data were collected by different researchers over a long time period, the data were never quality controlled or standardized in a uniform way.

This work is part of the 4DEMON project (www.4demon.be/), which has the aim to safeguard and centralize these valuable historical data for the future and make them available to the scientific community. To this end, we identified relevant data sources based on literature research, contacted researchers, digitised data values, assembled metadata, conducted quality control on the data and integrated the data in an extensive database for the BPNS, the Belgian Phytoplankton Database (BPD) (Fig. 1) (Data Citation 1). The BPD is available through the Integrated Marine Information System (IMIS) hosted at the VLIZ (Flanders Marine Institute).

Figure 1
figure 1

Representation of the workflow starting from the data source identification up to the final dissemination of the Belgian Phytoplankton Database (BPD).

Methods

Data Inventory

Possible data sources were identified based on literature research, web-based search engines and queries in online databases such as IMIS (Integrated Marine Information System - www.vliz.be/en/imis), IMERS (Integrated Marine Environmental Readings & Samples - www.vliz.be/vmdcdata/imers) and IDOD (Integrated and Dynamical Oceanographic Data management - http://www.bmdc.be). In addition, universities and researchers were contacted by mail, phone or personally. All data sources are inventoried in the Data Inventory and Tracking System (DITS - http://dits.bmdc.be) managed by BMDC. The majority was made available through IMIS via www.vliz.be/en/imis?module=ref&SpCol=809&show=search.

Data compilation

After the identification of relevant data sources, all non-digitally available data sources (namely books, technical reports, Bachelor theses, Master theses, PhD theses and project reports) were scanned. The data values were manually transferred to a standard format in MS Excel. Data were also downloaded or extracted from databases. We directly accessed data already available in digital format on laboratory computers or received them from researchers. All compiled data were integrated in a MS Access database.

Quality Control & standardization

Metadata

A significant amount of metadata was not easily accessible via the data sources themselves. An intensive effort was made to recover all relevant metadata e.g. station information or methodological approaches from associated sources such as final project reports.

Taxonomy

During the last decades there were many extensive nomenclatural and other taxonomic revisions of phytoplankton taxa based on progress related to advances in microscopy and molecular-phylogenetic analyses. For this reason, species names needed to be referenced prior to inclusion in the database. This was done using the taxon match option available in the World Register of Marine Species (www.marinespecies.org), a universally recognized and authoritative open-access reference system for marine species managed by VLIZ and edited by more than 240 taxonomic editors world-wide. Every species name has a unique identifier known as the AphiaID20. This identifier enables to link the species name to an internationally accepted standardized name and associated taxonomic information, but also redirects to the most accurate information on the species taxonomy, like accepted names and synonyms.

The taxon match was conducted in November 2017. Due to spelling mistakes present in the original reports or resulting from errors during digitisation (caused by illegible or low quality handwriting in the original paper reports), many names were not recognized automatically by the World Register of Marine Species (WoRMS), but were matched manually. Finally, less than 1% of the records could be matched neither automatically nor manually. In these cases the ‘AphiaID matched’ field stayed empty, but the records were not discarded.

A thorough clean-up of the species names and manual matching yielded a total of 99% of the taxon names being referenced with an AphiaID. Some species which were not listed in the WoRMS database were, after approval of the dedicated editor, added to this register.

Geographic reference

For most of the stations geographical coordinates were available or could be deduced from the synthesis reports or publications which made use of the data. Stations with unknown coordinates, but located on a map, were georeferenced in QGIS or based on standards within Marine Regions (www.marineregions.org). Stations from the same project which had slightly different names in various paper sources were compared and a unique station name was assigned. Finally, 93.8% of the records were assigned to stations with geographical coordinates.

Analyses and sampling methodologies

Phytoplankton was sampled using Niskin bottles, but also using other unspecified types of recipients (e.g. bottles, buckets) or a Van Dorn-sampler (Table 1). The samples were preserved with Lugol’s solution, formaline or natrium acetate. They were cooled or stored at room temperature and protected from the light. Cells were consistently counted with the Utermöhl method21 using the inverted microscope as optical instrument; for the large dinoflagellate Noctiluca scintillans sometimes a stereoscopic microscope was used. Sampling techniques, preservation steps and analytical methods are described in detail in the metadata of the BPD.

Table 1 Sampling methodologies, preservation and analytical methods reported in the Belgian Phytoplankton Database (BPD).

Additions and changes

The data were screened for random digitisation errors (mistakes made during transfer from handwritten documents to digital format). Duplicate values, resulting from data sources reporting on the same data, were removed. All zero values were removed. All units were converted to the common unit cells per litre. Phaeocystis cells associated in colonies are in the unit ‘106 coc L-1‘ (= colonial cells per litre). For 249 common phytoplankton species (168 Bacillariophyceae, 76 Dinophyceae, 4 Prymnesiophyceae and 1 Cryptophyceae) biovolume calculations based on literature values and online sources are given as additional information.

Data Records

Historic Projects

The BPD is a compilation of data assembled from different research projects conducted since 1968. At the end of the 1960s and in the 1970s the University of Leuven joined cruises to Iceland to investigate the pelagic environment. In 1970 the Belgian government financed an integrated research project called Projet Mer/Projekt Zee (PMPZ) to assess the quality of the marine environment of the BPNS. This project was followed by national research programs called Concerted Research Actions (CRA) from 1977 until 1981. From 1990 onwards subsequent projects such as AMORE (Advanced Modelling and Research on Eutrophication) focused on Phaeocystis blooms in the English Channel and the Southern Bight of the North Sea with a strong focus on the BPNS2224. In the years 2000, phytoplankton analyses were collected and processed in the framework of Bachelor and Master theses at Ghent University and in the framework of the EU Water Framework Directive (WFD) in order to study spatiotemporal dynamics in phytoplankton community structure in the BPNS2529 (Table 2).

Table 2 Overview of data sources integrated in the Belgian Phytoplankton Database (BPD).

Record types

Data recovery resulted in a high number of biotic values and associated abiotic parameters. In total 86,746 phytoplankton records are stored in the BPD. Quantitative units are phytoplankton densities in cells per litre (95.1% of the records) and abundance classes in cells per litre (4.4% of the records). Abundance classes reflect a range of cell densities per litre e.g. density between 1,000 and 9,999 cells per litre. 17,342 records are tagged with a living/dead (12,114/5,228) notation. ‘Dead’ refers to dead cells, e.g. diatom frustules without a cell content.

Metadata

Each individual data record is linked to its associated metadata such as information about the data source, the sampling event, the sampling and analysis method, the project, the physical dataset origin and the phytoplankton taxonomy.

Spatial & temporal coverage

The database includes almost 300 individual phytoplankton sampling stations in the BPNS and adjoining areas (French, Dutch and British waters) (Fig. 2) of which 137 sampling stations are situated within the BPNS resulting in a total of 51.6% of all records deriving from samples taken in the BPNS.

Figure 2: Locations of the sampling stations.
figure 2

Sampling stations are marked as black dots. The boundary of the Belgian part of the North Sea (BPNS) is indicated as a black line.

Data are available for the years 1968 to 1978 and 1994 to 2010 (Fig. 3a). In total, 3,178 sampling events took place throughout these periods of which 1,782 took place in the BPNS. The dataset has a good seasonal coverage (Fig. 3b and Fig. 3c). Between 1968 and 1978 2,269 sampling events took place which is on average 206 events per year. These events resulted in 56,286 records, an average of 5,117 records per year and a record to event ratio of 25. Between 1994 and 2010, 909 sampling events resulted in 30,460 records, which is on average 60 sampling events and 2,031 records per year.

Figure 3: Amount of the phytoplankton sampling events.
figure 3

The two periods 1968-1978 (red) and 1994-2010 (blue) are distinguished. a: Amount of sampling events per year from 1968 to 2010. b: Amount of sampling events per season, c: Amount of sampling events per month. Winter=December-February, spring=March-May, summer=June-August, autumn=September-November.

Taxonomic coverage

The dataset contains 681 unique AphiaIDs of which 93% were at least identified to the genus level. The remaining 7% were identified to a higher taxonomic level or could not be matched (1%). Bacillariophyceae (diatoms) and Dinophyceae (dinoflagellates) are the two most counted phytoplankton groups representing 86.4% and 6.3% respectively of all records (Table 3).

Table 3 Taxonomic phytoplankton classes present in the Belgian Phytoplankton Database (BPD).

Associated environmental data

The associated environmental abiotic data (15,199 environmental records) measured during the same campaigns and projects in the BPNS are included, containing i.a. concentrations of nutrients, chlorophyll a, temperature, salinity and pH. Similar to the phytoplankton data, this is a compilation originating from various laboratories and changes in methods have occurred over the years. The data have been quality checked and referenced in time and geographically. Duplicates, outliers and zero-values have been removed. The environmental data can be either directly linked to the phytoplankton data via a common sampleID (1,230 samples, 10,332 environmental records) or via a combination of sampling date and station (726 samples, 4,867 environmental records). The latter do not share a common sampleID with the phytoplankton data e.g. because the exact sampling time during the day or the sampling depth may differ.

Documentation and dataset dissemination

The BPD is accessible through IMIS and can be downloaded from the Marine Data Archive (MDA) (see Data Citation 1). Note that the first version of the data file (Phytoplankton_BPNS1968-2010.xlsx) does still contain zero count values, but does not yet include phytoplankton biovolume estimates or abiotic data.

Technical Validation

The BPD contains high quality phytoplankton count data of several decades and its associated abiotic data. As the BPD is a compilation of different research projects users should be aware of the following before usage. During the last decades, different protocols were used. For example, the sample collection method, the storage of the samples and the preservation methods can differ. In addition, cell counts have been performed by several researchers. In addition to variable levels of taxonomic expertise and difference in species concepts, it is a well-known fact that taxonomic skills can improve even during the careers of single taxonomists (as a result of growing expertise, but also better analytical tools and identification guides). This personal component in microscopic taxonomic determination can never be excluded completely30. All data present in the BPD however were obtained in well-equipped Belgian laboratories known for their high research standards and having extensive expertise in the field of phytoplankton identification and/or taxonomy. Therefore, the phytoplankton identifications and counts are considered to be generally solid.

Throughout the dataset diatom and dinoflagellate records are dominant (Table 3). Variation in taxon richness (number of accepted AphiaIDs) per sample in the BPD is shown in Fig. 4a. The peaks at 1 and around 60 can be attributed to specific projects. For example, the AMORE, AMOREII_VUB-ECOL and IPMS-PHAEO projects only focused on a few specific groups (such as the Bacillariophyceae (as a whole), Phaeocystis and Noctiluca), which explains the large number of samples for which only a single group is reported. The peak around 60 is mainly due to the AMORE II and III projects. In these projects, many taxa per sample get the same (low) density of 100 cells L−1. As zero values are absent for these samples, we suspect that these entries concern some indication of the fact that these species were under the detection limit. As we cannot be sure what these values mean, we have decided to leave them as they are in the dataset.

Figure 4: Summary statistics of the phytoplankton records.
figure 4

(a) Frequency distribution of the number of accepted AphiaIDs in the BPD, (b) Histogram of the logarithmically transformed biovolumes (log10 μm3 sample−1) per sample of the most common phytoplankton species.

In addition, missing metadata (e.g. coordinates of sampling location) can limit the usability of some data records. Despite these caveats, the BPD is the only Belgian phytoplankton database which contains data going back almost five decades. It is a comprehensive and thoroughly quality checked integrated data series which includes reliable data on phytoplankton in the BPNS for marine researchers and other interest groups (Fig. 4b).

Usage Notes

The BPD can be used to study spatio-temporal changes in phytoplankton community structure in the BPNS and adjoining areas in the period 1968-2010. Inter-annual as well as seasonal patterns can be studied (Fig. 5). Data can be analysed at the species level, but also aggregated data like e.g. total diatom or total dinoflagellate abundances can be analysed and community indices like diatom to dinoflagellate abundance ratios can be calculated. Furthermore, multivariate community analysis with e.g. ordination methods or general additive mixed modelling is an interesting field of study.

Figure 5: Heatmaps with colour key and histogram of the monthly logarithmic mean abundances of a selection of phytoplankton species reported in the Belgian Phytoplankton Database (BPD).
figure 5

Colours indicate the monthly mean logarithmic density (log10 cells L−1) of the species. a: Records dating from 1968 to 1978, b: Records dating from 1994 to 2010.

Additional information

How to cite this article: Nohe, A. et al. Marine phytoplankton community composition data from the Belgian part of the North Sea, 1968-2010. Sci. Data 5:180126 doi: 10.1038/sdata.2018.126 (2018).

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.