Introduction

Ocean photosynthesis is dominated by phytoplankton, a functional group of single-cell organisms including prokaryotes and eukaryotes. Photosynthetic eukaryotes, although less abundant in numbers than cyanobacteria, can dominate carbon production and biomass in oceanic and coastal waters (Li, 1995; Worden et al., 2004; Rii et al., 2016). These unicellular organisms are taxonomically diverse with representatives in all of the super-groups of the eukaryotic tree of life (Not et al., 2012).

Four taxonomic groups have emerged as important actors within the eukaryotic phytoplankton community: diatoms, dinoflagellates, haptophytes and Chlorophyta (Not et al., 2012). The first three groups are mostly found in the micro-plankton (20–200 μm) and nano-plankton (2–20 μm) size ranges, whereas Chlorophyta dominate the smaller size fractions (pico-plankton, 0.2–2 μm). Abundant groups from the micro- and nano-plankton, such as diatoms and dinoflagellates, have been broadly investigated, in contrast to members of the green lineage. The importance and diversity of Chlorophyta have not been extensively studied, in part because many are of small size and lack distinctive morphological features. Several early diverging clades of mainly marine unicellular algae, gathered under the term prasinophytes, are located at the base of Chlorophyta evolution (Lemieux et al., 2014). The paraphyletic origin of prasinophytes is reflected in the wide range of cell shapes (Leliaert et al., 2012; Tragin et al., 2015) and photosynthetic pigments (Latasa et al., 2004) found among them. Within prasinophytes, nine distinct lineages (known as clades I–IX) have been distinguished on the basis of 18S ribosomal RNA (rRNA) gene phylogeny (Guillou et al., 2004; Viprey et al., 2008). These lineages correspond to established taxonomic levels such as Class (for example, Mamiellophyceae) or Order (for example, Prasinococcales), or simply group together undescribed strains (clade VII) or even only environmental sequences (clade IX). The class Mamiellophyceae (Marin and Melkonian, 2010), corresponding to clade II (Guillou et al., 2004), contains three important genera of marine pico-phytoplankton: Micromonas, Bathycoccus and Ostreococcus. They are typical of coastal waters (Not et al., 2004; Romari and Vaulot, 2004; Collado-Fabri et al., 2011), where they can form sporadic blooms (O’Kelly et al., 2003), but have also been observed in open oceanic waters (Foulon et al., 2008; Monier et al., 2012; Treusch et al., 2012; Vaulot et al., 2012). The genus Micromonas can be dominant in Arctic ecosystems (Lovejoy et al., 2007; Balzano et al., 2012). Members of these three genera are easily isolated into culture, and genome sequences from representative strains are available (Derelle et al., 2006; Worden et al., 2009; Moreau et al., 2012), fostering their use as biological models for marine photosynthetic eukaryotes.

Another group of prasinophytes, clade VII (Guillou et al., 2004), has been suggested to be important in oceanic waters (Moon-van der Staay et al., 2000; Viprey et al., 2008; Shi et al., 2009). Prasinophytes clade VII are naked coccoid cells ranging in size from 2 to 5 μm with no specific morphological feature, and they remain without formal description despite having been in culture since 1965 (Potter et al., 1997). Previous phylogenetic analysis of the 18S rRNA gene divided this group into three well-supported lineages, A, B and C (Guillou et al., 2004), the latter formed by Picocystis salinarum only, a small species found in saline lakes (Lewin et al., 2000; Roesler et al., 2002; Krienitz et al., 2012). Strains from lineages A and B harbor a very similar set of photosynthetic pigments, in contrast with those from lineage C (Lewin et al., 2000; Lopes dos Santos et al., 2016).

The first molecular data of prasinophytes clade VII were recovered from the environment by Moon-van der Staay et al. (2000). Since then, clade VII has been found in moderately oligotrophic areas of the Pacific Ocean and Mediterranean Sea (Viprey et al., 2008; Shi et al., 2009; Rii et al., 2016) as well as in coastal waters (Romari and Vaulot, 2004; Wu et al., 2015). Recently, Rii et al. (2016) confirmed through 18S rRNA clone libraries that prasinophytes clade VII are the dominant green algae in the mildly oligotrophic waters of the South East Pacific Ocean. Rates of 14C-primary productivity estimated from radiolabeled cells sorted by flow cytometry revealed that the productivity of photosynthetic picoeukaryotes (<3 μm; 44%) was nearly equivalent to that of Prochlorococcus plus Synechococcus (56%), suggesting a key role for clade VII in these waters (Rii et al., 2016).

In this study, we analyze the genetic diversity of clade VII based on a large number of strains hosted by the Roscoff Culture Collection (RCC) and the National Institute for Environmental Studies (NIES) collection, and determine the oceanic distribution of this group using V9 rRNA metabarcoding data sets obtained in the frame of the Tara Oceans expedition and the Ocean Sampling Day (OSD) consortium. We demonstrate that prasinophytes clade VII is a highly diversified group and one of the key components of phytoplankton in open ocean waters.

Materials and methods

Origin of the strains and culture conditions

Prasinophytes clade VII strains listed in Table 1 were used for genetic diversity analysis. These strains were selected from the RCC (http://www.roscoff-culture-collection.org) and from the NIES collection (National Institute for Environmental Studies, http://mcc.nies.go.jp). Strains were grown at 22 °C in K seawater medium (Keller et al., 1987) under an average irradiance of 100 μmol photons m−2 s−1 and a 12:12-h light:dark regime.

Table 1 Origins and culture conditions of prasinophytes clade VII strains used in this study

Genomic DNA extraction and PCR amplification

Cells were harvested in exponential growth phase and concentrated by centrifugation. Total nucleic acids were extracted using the Nucleospin Plant II kit (Macherey-Nagel, Düren, Germany). The nearly full-length nuclear 18S rRNA gene and a partial plastid 16S rRNA gene were PCR amplified using the eukaryotic primers Euk63f and Euk1818r (Lepère et al., 2011), and PLA491f (Fuller et al., 2006a, 2006b) and OXY1313r (West et al., 2001), respectively (Supplementary Table 1). Sequences were deposited in GenBank under accession numbers KU843559–KU843599.

Environmental sequences

A reference data set of nearly complete nuclear 18S rRNA gene sequences and partial plastid 16S rRNA gene sequences was compiled using sequences available for prasinophytes clade VII strains and the PhytoRef database for plastid 16S rRNA (Decelle et al., 2015b). This data set was used to identify similar environmental sequences in GenBank (Release 208.0; June 2015) using BLAST (blastn algorithm, e-value e−10). Only sequences longer than 500 and 400 bp were retrieved for nuclear 18S and plastid 16S rRNA, respectively (Supplementary Table 2). The nuclear 18S and plastid 16S rRNA environmental sequences were checked for chimeras using uchime as implemented in Mothur v.1.36.0 (Schloss et al., 2009) and assigned manually based on a reference alignment as described below.
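The length criterion applied to the retrieved sequences can be illustrated with a short sketch. It is not the authors' pipeline: the marker labels, the function names and the tuple layout are assumptions made for illustration, and only the two cutoffs (500 bp for 18S, 400 bp for 16S) come from the text.

```python
# Minimum lengths taken from the text; the marker labels are illustrative.
MIN_LENGTH_BP = {"18S": 500, "16S": 400}


def passes_length_filter(marker: str, sequence: str) -> bool:
    """Return True when a sequence is strictly longer than the cutoff for its marker."""
    # Gap characters are removed so that aligned sequences are measured by residues only.
    residues = sequence.replace("-", "")
    return len(residues) > MIN_LENGTH_BP[marker]


def filter_hits(hits):
    """Keep (marker, sequence_id, sequence) tuples that pass the length filter."""
    return [hit for hit in hits if passes_length_filter(hit[0], hit[2])]
```

In this sketch "longer than" is read as a strict inequality, so a 500 bp 18S hit would be discarded.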

Phylogenetic analysis and sequence assignments

All culture and environmental sequences were aligned with MAFFT using the E-INS-i algorithm (Katoh et al., 2002). Only 18S rRNA sequences longer than 1500 bp and environmental plastid 16S rRNA sequences that completely covered the fragment obtained with the primers PLA491f/OXY1313r (approximately 850 bp) were considered in order to build a reference alignment for the phylogenetic analysis. The remaining shorter sequences were placed in the reference alignment using the online version of MAFFT with the --addfragments option (Katoh and Frith, 2012) and manually assigned.

Phylogenetic reconstructions were performed with three different methods: maximum likelihood (ML), distance (neighbor joining; NJ) and Bayesian analyses. The K2+G+I model was selected for the 18S rRNA and 16S rRNA sequence data based on the substitution model selected through the Akaike information criterion and the Bayesian information criterion options implemented in MEGA 6.06 (Tamura et al., 2013). NJ and ML analyses were performed using MEGA 6.06 and PhyML 3.0 (Guindon et al., 2010) with SPR (Subtree Pruning and Regrafting) tree topology search operations and approximate likelihood ratio test with Shimodaira–Hasegawa-like procedure, respectively. Markov chain Monte Carlo iterations were conducted for 1 000 000 generations, sampling every 100 generations, with a burn-in length of 100 000, using MrBayes 3.2.2 (Ronquist and Huelsenbeck, 2003). MAFFT and MrBayes programs were run within Geneious 7.1.7 (Kearse et al., 2012). For the definition of environmental clades, we adapted the criteria of Guillou et al. (2008): a clade must contain at least two environmental or strain sequences obtained from different locations and/or samples at a given location, and must have strong phylogenetic support for at least one of the gene markers used and specific V9 signatures in the nuclear 18S gene (see details below). Clade nodes were considered as well supported when bootstrap, SH-like support values and Bayesian posterior probabilities were higher than 70, 0.7 and 0.95, respectively. The same criteria were used to represent the sequences on the phylogenetic trees.
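The node-support rule can be made concrete with a small sketch. It assumes, as an interpretation of the text rather than a stated fact, that the bootstrap value comes from the NJ tree, the SH-like value from the ML tree and the posterior probability from the Bayesian tree; the function names are hypothetical.

```python
from typing import List, Optional

# Thresholds quoted in the text (values must be strictly higher to count as support).
BOOTSTRAP_MIN = 70.0   # NJ bootstrap, in percent (assumed mapping)
SH_LIKE_MIN = 0.7      # ML SH-like support (assumed mapping)
POSTERIOR_MIN = 0.95   # Bayesian posterior probability


def supporting_methods(
    nj_bootstrap: Optional[float],
    ml_sh_like: Optional[float],
    bayes_posterior: Optional[float],
) -> List[str]:
    """List the methods whose support value exceeds its threshold (None = no value)."""
    checks = [
        ("NJ", nj_bootstrap, BOOTSTRAP_MIN),
        ("ML", ml_sh_like, SH_LIKE_MIN),
        ("Bayes", bayes_posterior, POSTERIOR_MIN),
    ]
    return [name for name, value, cutoff in checks if value is not None and value > cutoff]


def is_well_supported(nj_bootstrap, ml_sh_like, bayes_posterior) -> bool:
    """Treat a node as well supported only when all three methods exceed their cutoffs."""
    return len(supporting_methods(nj_bootstrap, ml_sh_like, bayes_posterior)) == 3
```

For instance, a node with an NJ bootstrap of 59 and a posterior probability of 0.56, combined with a hypothetical SH-like value of 0.9, would be reported as supported by ML only.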

18S rRNA metabarcodes

V9 rRNA metabarcoding data sets obtained in the frame of the Tara Oceans expedition and the OSD consortium were used to analyze the global environmental distribution of prasinophytes clade VII. The Tara Oceans samples were collected in oceanic waters over the course of a 2.5 year expedition (Pesant et al., 2015) and OSD samples were collected during a simultaneous sampling campaign performed on 21 June 2014 at 191 different sites mostly in coastal areas (Kopf et al., 2015).

For Tara Oceans, we used the publicly available eukaryotic metabarcoding data set, consisting of >530 million quality-checked eukaryotic V9 rRNA sequences from 293 samples collected at 47 stations (mostly concentrated in the Southern hemisphere), two depths (surface and deep chlorophyll maximum (DCM)) and four size fractions (0.8–5, 5–20, 20–180 and 180–2000 μm). V9 rRNA metabarcodes (129 bp on average) were sequenced on an Illumina HiSeq (Illumina, San Diego, CA, USA) at a typical depth of 2 million sequences per sample (Supplementary Table 3). Metabarcode sequences from each sample set were clustered into OTUs, differing by one base at most, using the SWARM software (Mahé et al., 2014). Detailed information on sampling, primers and sequence data processing is provided in Pesant et al. (2015), Amaral-Zettler et al. (2009) and de Vargas et al. (2015). We only considered OTUs represented by at least 10 sequences. OTUs were assigned to a taxonomic group using a BLAST-like strategy (see details at http://taraoceans.sb-roscoff.fr/EukDiv/#content2) against an annotated eukaryotic V9 database derived from the PR2 database (Guillou et al., 2013), which provides eight levels of taxonomic hierarchy (Kingdom, Super-division, Division, Class, Order, Family, Genus and Species) (for detailed information see de Vargas et al., 2015). Only OTUs that presented >80% similarity to at least one of the reference sequences were kept. Fifty-seven OTUs were classified as prasinophytes-and-relatives (Supplementary Table 4). Sequences representative of each OTU were aligned to reference sequences (see above) and assigned to a specific clade of prasinophytes clade VII (Supplementary Table 4 and Supplementary Figure 2).
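The two OTU retention rules stated above (at least 10 sequences, and more than 80% similarity to a reference) can be sketched as follows. The `Otu` record and its field names are hypothetical and do not reflect the format of the Tara Oceans files; only the two cutoffs come from the text.

```python
from dataclasses import dataclass
from typing import Iterable, List

MIN_SEQUENCES = 10      # "at least 10 sequences"
MIN_SIMILARITY = 80.0   # ">80% similarity" to the closest reference, in percent


@dataclass
class Otu:
    otu_id: str
    n_sequences: int        # number of metabarcodes clustered in the OTU
    best_similarity: float  # percent similarity to the closest reference sequence


def keep_otu(otu: Otu) -> bool:
    """Apply both retention rules: abundance (inclusive) and similarity (strict)."""
    return otu.n_sequences >= MIN_SEQUENCES and otu.best_similarity > MIN_SIMILARITY


def filter_otus(otus: Iterable[Otu]) -> List[Otu]:
    """Return the OTUs that satisfy both rules, preserving input order."""
    return [otu for otu in otus if keep_otu(otu)]
```

Note the asymmetry: the abundance cutoff is inclusive, whereas the similarity cutoff is strict, following the wording of the text.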

Only OSD V9 rRNA sequences were considered in this study for comparison purposes. The metabarcode data set, sequenced on an Illumina MiSeq by Life Watch (Italy), consisted of 11 million quality-checked eukaryotic V9 rRNA sequences obtained from 31 stations (surface) after filtration through 0.8 μm. Most of the OSD stations used in this study were concentrated in the Northern hemisphere. Detailed information on sampling and sequence data preprocessing is provided at: https://github.com/MicroB3-IS/osd-analysis/wiki/Guide-to-OSD-2014-data#data-deposited-in-public-archives-and-available-and-web-sites. Metabarcode sequences from each sample set were clustered into OTUs at 99% identity, using the nearest neighbor method implemented in Mothur (Schloss et al., 2009). OTUs represented by at least 10 sequences were assigned to a taxonomic group using V9 regions extracted from the PR2 database (Guillou et al., 2013) using the ‘classify’ Mothur command with the wang method (Wang et al., 2007).

We compared the contribution of prasinophytes clade VII and of Mamiellophyceae metabarcodes to the total number of ‘photosynthetic’ metabarcodes. Groups considered as photosynthetic included Chlorophyta, Rhodophyta, Cryptophyta, Haptophyta and Ochrophyta (that is, photosynthetic Stramenopiles such as diatoms) but not dinoflagellates, for which the differentiation between photosynthetic and heterotrophic species is highly complex.
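As an illustration of how such a contribution could be computed, the sketch below sums metabarcode counts over the groups treated as photosynthetic and expresses a target group as a percentage of that total. The dictionary layout, the group label used for dinoflagellates and the example counts are assumptions, not data from this study.

```python
from typing import Dict

# Groups counted as photosynthetic in the text; dinoflagellates are deliberately absent.
PHOTOSYNTHETIC_GROUPS = {
    "Chlorophyta", "Rhodophyta", "Cryptophyta", "Haptophyta", "Ochrophyta",
}


def photosynthetic_total(counts_by_group: Dict[str, int]) -> int:
    """Sum metabarcode counts over photosynthetic groups only."""
    return sum(n for group, n in counts_by_group.items() if group in PHOTOSYNTHETIC_GROUPS)


def contribution_percent(target_count: int, counts_by_group: Dict[str, int]) -> float:
    """Percentage of photosynthetic metabarcodes assigned to the target group."""
    total = photosynthetic_total(counts_by_group)
    if total == 0:
        return 0.0  # no photosynthetic metabarcodes in this sample
    return 100.0 * target_count / total


# Made-up counts for one sample; "Dinophyta" is ignored when computing the total.
example = {"Chlorophyta": 200, "Haptophyta": 500, "Ochrophyta": 300, "Dinophyta": 4000}
clade_vii_percent = contribution_percent(80, example)  # 80 of the 200 Chlorophyta reads
```

Because the target group (for example clade VII) is a subset of Chlorophyta, its count is passed separately rather than looked up in the per-group totals.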

Analyses and plots were performed with the R software using the following libraries: ggplot2, mapproj, maps, mapdata, reshape2, scales, gridExtra (R Development Core Team, 2013).

The Tara Oceans and OSD data sets are accessible at http://doi.pangaea.de/10.1594/PANGAEA.843017 and 843022, and at https://owncloud.mpi-bremen.de/index.php/s/RDB4Jo0PAayg3qx?path=/2014/silva-ngs/18s/lifewatch/v9, respectively.

Results

Genetic diversity

The phylogenetic tree based on nearly full-length nuclear 18S rRNA sequences (>1500 bp) obtained from strains and retrieved from GenBank (Figure 1 and Supplementary Table 2) confirmed the existence of the three major lineages, A, B and C, defined by Guillou et al. (2004). However, each of these lineages can be further divided into finer clades. Lineage A is subdivided into seven clades, A1–A7, well supported by all three methods used (NJ, ML and Bayes) except for clades A2, A3 and A4. Clades A2 and A4 had no support from NJ (bootstrap 61% and 23%, respectively), and clade A3 was supported neither by NJ (bootstrap 59%) nor by Bayesian analysis (0.56). Clades A2 and A3 had identical sequences in the V4 region of the 18S rRNA gene, and A4 was only distinguished by a unique position around 580 bp. Lineage B contains three clades, B1–B3, the latter composed only of environmental sequences. The three clades are well supported by all three methods used. Lineage C did not contain any subdivision based on the 18S rRNA gene and is only composed of Picocystis salinarum, which is the only described species within prasinophytes clade VII so far. The average sequence identity within a clade calculated with the nuclear 18S rRNA data set ranged from 98.19% to 99.96%. Clades B1 and A6 showed the lowest average sequence identity, 98.19% and 99.12%, respectively (Supplementary Table 5).
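The text does not state how the average within-clade identity was calculated. One common definition, given here only as an assumed sketch, is the mean percent identity over all pairs of aligned sequences in a clade, ignoring alignment columns where either sequence has a gap. The function names are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def pairwise_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two aligned sequences, skipping gapped columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # column not shared by both sequences
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns in common")
    return 100.0 * matches / compared


def mean_clade_identity(aligned: Sequence[str]) -> float:
    """Average pairwise identity over all sequence pairs of one clade."""
    pairs = list(combinations(aligned, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)
```

Other conventions (counting gaps as mismatches, or normalizing by the shorter sequence) would give slightly different values, so figures obtained this way need not match those in Supplementary Table 5.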

Figure 1

ML tree inferred from nuclear 18S rRNA sequences belonging to prasinophytes clade VII. Nodes supported by NJ, ML and Bayesian methods are shown by a solid dot (▪). Nodes supported by ML and Bayes only are indicated by a gray dot. An empty dot represents a node supported by only one method, either ML or Bayes.

Almost all clades within lineages A and B, including A2, A3 and A4, were well supported by all three methods in the phylogenetic analysis of plastid 16S rRNA sequences longer than 850 bp (Figure 2). The only exceptions were B3, which was represented only by environmental nuclear 18S rRNA sequences and for which no equivalent was found with the plastid 16S rRNA gene; A7, represented by a single strain (RCC3374); and A6 (RCC4434), for which we have no 16S rRNA sequence (Figure 2). The branching of RCC3374 and RCC3368 (for which we have no 18S sequence, see Table 2) was not supported by any of the methods used. Also, we observed that three plastid 16S rRNA clones previously assigned to environmental clade 16S-VIII by Lepère et al. (2009) corresponded to clades VII B1 and B2, as hypothesized by Shi et al. (2011). Plastid 16S rRNA phylogeny indicates that the Picocystis lineage is not a sister lineage of the two other lineages (A and B) of prasinophytes clade VII (Figure 2).

Figure 2

ML tree inferred from plastid 16S rRNA sequences belonging to prasinophytes clade VII. Nodes supported by NJ, ML and Bayesian methods are shown by a solid dot (▪).

Table 2 Prasinophyte clade VII and Mamiellophyceae sequences as a function of size fraction and depth in the Tara Oceans and OSD metabarcode data sets

Strains RCC996, RCC3376 and RCC2339 as well as environmental sequence DSGM-79 (AB275079) did not belong to any clade in the nuclear 18S rRNA analysis. Strains RCC996 and RCC3376, as well as clone DSGM-79, belong to lineage A, whereas RCC2339 belongs to lineage B. These strains have unique signatures in both nuclear 18S rRNA and plastid 16S rRNA. Partial nuclear environmental sequences retrieved from GenBank as well as V9 metabarcodes were similar to these unique sequences (see below), demonstrating that these sequences are not chimeras. However, as no other long sequences are available in public databases, we did not create clades for these sequences. It is likely that these unique branches will be attached to clades as more sequences and strains become available.

Prasinophytes clade VII in public databases

A total of 267 nuclear 18S and 109 plastid 16S rRNA clade VII sequences longer than 500 and 400 bp, respectively, were retrieved from GenBank (Figure 1 and Supplementary Table 2). These sequences were mostly from environmental clone libraries although 12 were from cultures, 11 of which were assigned to lineage C. The other strain sequence recovered from GenBank is labeled ‘Prasinophyceae sp. CCMP1998’ (KF615770), which had been wrongly attributed to the genus Pycnococcus (Duanmu et al., 2014) but clearly belongs to prasinophyte clade VII.

Sequences from lineages A (43%) and B (48%) were all obtained from oceanic waters, including the Atlantic, Indian and Pacific Oceans and the China, Arabian, Red, Caribbean and Mediterranean Seas, in contrast to sequences from lineage C (8%), which were not of oceanic origin. Geographical information about sampling origin was available for 85% of the sequences assigned to A and B. They were found between ~60° N and ~34° S with no sequence in cold temperate or polar regions (Supplementary Table 2).

Among the nuclear 18S sequences, clades A4 and B1 were the most abundant (Supplementary Figure 3A). About 7% of the sequences were only assigned at the lineage level because the aligned region did not have signatures strong enough to identify them at the clade level. For example, most of the unassigned 18S rRNA sequences corresponded to the V4 region, for which clades A2 and A3 are identical. Also, most plastid 16S rRNA sequences from GenBank were obtained using bacterial 16S rRNA primers: although these sequences were long, their overlap with the fragment generated by our primers (PLA491f/OXY1313r) was short. About 2% of 18S rRNA sequences were assigned to the solitary branches RCC996, RCC2339 and clone DSGM-79 (Supplementary Figure 4).

Oceanic distribution

In order to assess the oceanic distribution of prasinophytes clade VII, we searched for 18S rRNA signatures of this group in V9 rRNA metabarcoding data sets obtained in the frame of the Tara Oceans expedition (de Vargas et al., 2015) and OSD (Kopf et al., 2015). The Tara Oceans stations were distributed over the major oceans with a bias towards the Southern hemisphere, whereas OSD stations were mostly coastal and in the Northern hemisphere (Supplementary Figure 1). OSD sampling was synchronized on the northern summer solstice, while the Tara Oceans expedition extended over all four seasons. The sampling strategy of Tara Oceans included four different size fractions (0.8–5, 5–20, 20–180 and 180–2000 μm), in contrast to only one size fraction for OSD (0.8–200 μm). Also, two water column depths corresponding to subsurface and the DCM were sampled in Tara Oceans, whereas OSD samples were only recovered from the surface. In this work, we only considered the 31 OSD and 47 Tara Oceans stations for which V9 18S rRNA data sets are publicly available.

The contribution of Chlorophyta to Tara Oceans and OSD metabarcodes affiliated to photosynthetic groups (excluding dinoflagellates) was 12.8% and 19.6%, respectively (Table 2). For the Tara Oceans metabarcodes, Chlorophyta contribution was highest, on average, for the smallest size fraction and higher at the DCM compared with the surface (Table 2).

In coastal waters, Mamiellophyceae was the dominant Chlorophyta group. In the OSD data set, they contributed on average 10.7% to the total photosynthetic sequences (49.7% of the Chlorophyta) against 0.2% (1.4% of Chlorophyta) for prasinophytes clade VII (Figure 3a and Table 2). Mamiellophyceae were dominant at most coastal stations from OSD but also at some Tara Oceans stations, including high-latitude stations in both the northern and southern hemispheres, and off South Africa in the Benguela current (Figure 4).

Figure 3

(a) Average contribution of the different groups of Chlorophyta in Tara Oceans (average of the three size fractions 0.8–5, 5–20 and 20–180 μm) and in OSD (0.8–200 μm fraction). (b) Average contribution of the different groups of Chlorophyta in the Tara Oceans data set as a function of size fraction in surface (left) and at the DCM (right).

Figure 4

Map of the contribution of Chlorophyta to photosynthetic sequences (size of circles) and dominant Chlorophyta group (color of circles) for Tara Oceans (0.8–5 μm) and OSD samples in surface samples. Circles with black border correspond to OSD stations.

In contrast, clade VII appeared as the dominant group in oceanic waters. In the Tara data set, their average contribution to the total photosynthetic sequences was about 8%, while that of Mamiellophyceae was only 1.8% (Figure 3a and Table 2). The overall contribution of prasinophytes clade VII to total Tara Oceans photosynthetic sequences varied between 5% and 11% on average for the different depths and size fractions but could reach up to 80% in specific samples (Table 2). The dominance of clade VII within Chlorophyta held for all size classes and for both surface and DCM samples (Figure 3b). At most Tara oceanic stations (Figure 4), the 0.8–5 μm fraction was dominated by prasinophytes clade VII, but other Chlorophyta groups could prevail at specific locations, for example, at several Mediterranean Sea stations where Pyramimonadales were the major group. The highest contribution of clade VII was in the pico-eukaryote fraction (0.8–5 μm) and, unexpectedly, in the mesoplankton fraction (180–2000 μm). In fact, at the surface, for these two size fractions, the overall contribution of prasinophytes clade VII to photosynthetic sequences was very similar, with an average of 10% and a maximum over 50% (Table 2). Clade VII contribution to the total photosynthetic pico-eukaryote size fraction was highest in equatorial and subtropical waters (Figures 5a and b, bottom graphs).

Figure 5

Top graph: contribution of the different clades of prasinophytes clade VII to the smallest size fraction (0.8–5 μm) at each Tara station. Bottom graph: relative contribution of clade VII sequences to the total photosynthetic sequences. (a) Surface. (b) DCM.

Fifty-seven Tara Oceans and eight OSD OTUs were classified as prasinophytes clade VII (Supplementary Table 4). Tara Oceans OTUs represented a total of over 2 million sequences and clade VII was present at all stations. In contrast, OSD had just over 1000 sequences of clade VII, which was only present at 16 out of 32 stations. The eight most abundant clade VII Tara Oceans OTUs regrouped >99% of the sequences. By careful alignment of the sequence representative of each OTU with reference sequences from cultures or the environment, we were able to assign the major OTUs to the clades defined above (Supplementary Figure 2 and Supplementary Table 4). Interestingly, no sequence from either Tara Oceans or OSD could be assigned to the C clade.

Overall, the most abundant clades in the Tara data set were, in decreasing order, B1, A4, A6 and A3 (Table 3 and Supplementary Figure 3A). Remarkably, these clades were also the most abundant in OSD (A4, A6, B1 and A3) and in GenBank 18S (A4, B1, A3 and A6) (Supplementary Figure 3A). Two of these clades (B1 and A6) have only a single representative in culture, whereas A4 and A3 have strains from the Pacific, Atlantic and Indian Oceans (Table 1). The relative contribution of the different clades did not seem to vary much with the size fraction or with the depth level (DCM vs surface) in the Tara Oceans samples (Supplementary Figure 3B).

Table 3 Contribution of the different subclades of prasinophytes

The clade composition varied widely between the different regions sampled (Figure 5). Mediterranean Sea surface waters were dominated by clade A6, which was also present at depth in the Eastern, more oligotrophic basin (stations 22–30, Figure 5). Despite the low contribution of clade VII to the total photosynthetic pico-eukaryote fraction in the Mediterranean Sea, A6 notably contributed about 10% of the total photosynthetic sequences at the deepest DCM recorded (Figure 6a).

Figure 6

Abundance of the prasinophytes clade VII clades as a function of depth (a) and temperature (b). The radius of the circles is proportional to the contribution of each clade to the total number of photosynthetic sequences.

The North Indian and tropical Pacific Oceans were dominated by clade B1. This clade was always present when clade VII significantly contributed to the total photosynthetic pico-eukaryote fraction, either at the surface in the Pacific Ocean or, in a more pronounced way, at the DCM in the Indian Ocean (Figure 5). In the Pacific Ocean, B1 was complemented by A1 both at the surface and at depth, A1 being relatively more important at the DCM, which fits with the fact that all A1 strains have been isolated from deep euphotic waters (Figure 5 and Table 1). Deep waters seem to be the preferential niche for A1, where its contribution to the total photosynthetic sequences was higher than at the surface (Figure 6a). Clade A4 also had a large contribution to total photosynthetic sequences at the bottom of the euphotic zone (Figure 6a), although all A4 strains have been isolated from surface waters (Table 1). In the Indian Ocean, B2, A4 and A3 seemed to complement B1, and in the Atlantic Ocean B1, A3 and A4 were the most important clades (Figure 5).

Prasinophytes clade VII seem to have a preference for warm waters, being mostly found between 20 °C and 30 °C (Figure 6b). Among the different clades, A1 seems to have a narrower temperature range and A4 two distinct temperature niches around 20 °C and 27 °C, where it had its largest contribution to the total photosynthetic sequences (Figure 6b). In contrast, B1 seems to have a wide temperature range (Figure 6b).

Discussion

Our phylogenetic analyses based on nuclear 18S rRNA sequences recovered the three main lineages (A, B and C) of prasinophytes clade VII as proposed by Guillou et al. (2004). A and B are clearly marine lineages, with sequences and strains recovered only from marine environments. On the other hand, lineage C (encompassing the only formally described species, Picocystis salinarum) has been found in inland saline lakes (Hollibaugh et al., 2001; Roesler et al., 2002; Krienitz et al., 2012) or ponds (Lewin et al., 2000), where it was originally isolated. Thus, Picocystis seems to be a prasinophyte typical of inland saline waters rather than marine waters, with a very restricted ecological range of habitats, as evidenced by the absence of similar sequences in the Tara and OSD data sets. Also, all the environmental sequences that we recovered and assigned to Picocystis were from hypersaline lakes in the USA, China and Kenya, with one exception. The environmental sequence Ola1.E10.invm13r (AB691200) was recovered from Terrebonne Bay, a complex collection of small bayous of the Mississippi River Delta for which an increase of salinity has been reported (Steyer et al., 2008).

In contrast to the 18S analysis, plastid 16S rRNA phylogeny suggests that Picocystis salinarum forms a lineage separate from prasinophytes clade VII A and B. Several lines of evidence have accumulated supporting this hypothesis, including phylogenetic analyses using the complete nuclear (Marin, 2012) and plastid-encoded rRNA operons (Marin and Melkonian, 2010; Marin, 2012) and chloroplast genomes (Lemieux et al., 2014). The chloroplast genomes of Picocystis salinarum and one strain of clade VII (RCC15 or CCMP1205, clade A2) differ in size (clade VII harbors the smallest chloroplast genome yet reported for a photosynthetic green alga), structure (the clade VII chloroplast contains 100 genes unequally distributed between the two DNA strands) and gene content (Lemieux et al., 2014). Very recently, pigment composition analysis confirmed that the genus Picocystis harbors monadoxanthin and diatoxanthin as accessory pigments (Lopes dos Santos et al., 2016). These pigments are usually found in cryptophytes or diatoms (Takaichi, 2011), algae that belong to the Red lineage, but have also been reported in Coccomyxa, a green alga belonging to the Chlorophyceae (Crespo et al., 2009). Therefore, the Picocystis lineage should be considered as an independent lineage of prasinophytes, as already suggested by Lemieux et al. (2014).

Nuclear and plastid SSU rRNA genes failed to resolve confidently the relationships between the different clades (low bootstrap and different tree topologies). For ancient groups of eukaryotes, such as prasinophytes, multi-gene analysis or secondary structure of the internal transcribed spacer 2 from several taxa are usually required to resolve relationships at high taxonomic levels.

A small number of oceanic studies focusing on photosynthetic picoeukaryotes have already pointed to prasinophytes clade VII as an important oceanic group, especially in the mesotrophic and moderately oligotrophic waters of the Pacific Ocean (Moon-van der Staay et al., 2000; Viprey et al., 2008; Shi et al., 2009, 2011; Rii et al., 2016). Our analysis showed a clear dominance of clade VII among Chlorophyta in oceanic waters, where this group contributed on average 54% of the total Chlorophyta sequences and 8% of the total photosynthetic sequences of the oceanic Tara data set. Mamiellophyceae, usually regarded as important actors of phytoplankton communities in coastal waters (Romari and Vaulot, 2004; Collado-Fabri et al., 2011), was the dominant Chlorophyta group in the coastal OSD data set, where it contributed 10.7% of the total photosynthetic sequences (49.7% of the Chlorophyta) against 0.2% for clade VII. One could argue that clade VII dominance among 18S metabarcodes is a result of a large number of copies of the rRNA operon in their genomes. Although no genome sequence is available for any member of this group, prasinophytes clade VII cells have a size between 2 and 5 μm, in the same range as Mamiellophyceae, and therefore their 18S rRNA gene copy number is probably similar (Zhu et al., 2005).

An unexpected result was the high contribution of prasinophytes clade VII to the mesoplankton fraction (180–2000 μm). In some stations, they were the only green algal group present in this fraction (for example, in the Red Sea or the Indian Ocean). Many studies have reported that small cells can be incorporated into large aggregates, allowing them to sink through the water column at a rate faster than their cell size would allow (Jackson, 2001; Richardson and Jackson, 2007). Sequences from Micromonas, Bathycoccus and Ostreococcus have been recovered in traps deployed at 200 and 500 m in the eastern subtropical North Atlantic (Amacher et al., 2009). Among the clade VII sequences from GenBank, a fraction was from deep sea waters, sediments, traps or marine snow (Supplementary Table 2). Alternatively, symbiotic associations could exist between some members of prasinophytes clade VII and larger plankton organisms. Symbioses are poorly characterized across open ocean plankton communities, and recent work has shown the importance and diversity of interactions among plankton members, especially the Rhizaria group (Decelle et al., 2015a), for which a wide diversity of symbionts has been described, including the green alga Pedinomonas symbiotica (Cachon and Caram, 1979). One GenBank clade VII sequence, AB180203, has indeed been associated with a symbiont of the radiolarian Spongaster tetras. However, all the strains present in the RCC or NIES collections have been isolated as free-living cells.

Tara Oceans, OSD and GenBank data sets pointed to clades A1, A4, A6 and B1 as the most abundant among the 10 clades of prasinophytes clade VII. The ecological importance of clades and ecotypes has been extensively addressed for the marine cyanobacterial genera Prochlorococcus and Synechococcus (Scanlan et al., 2009). The success of these cyanobacteria in the modern oceans has been attributed to their wide genetic diversity. For the three important marine genera of pico-phytoplankton, Micromonas, Bathycoccus and Ostreococcus, clades or ecotypes have also been described and ecological hypotheses about their distribution have been made. For example, Ostreococcus clade A seems to be typical of surface waters, whereas clade B appears to be associated with deeper layers of the euphotic zone (Rodríguez et al., 2005; Six et al., 2005). However, the physiological and environmental parameters influencing clade distribution are more complex than irradiance alone, as shown for Ostreococcus (Simmons et al., 2016).

Clade B1 was always associated with the highest contribution of clade VII to the photosynthetic sequences, especially in the Pacific Ocean. Despite being the most represented clade, B1 was the last one for which a strain was isolated. This strain (NIES-3669), obtained in 2015 from western North Pacific waters off the south coast of Hokkaido, Japan, was difficult to isolate and is more fastidious to grow than other prasinophytes clade VII strains (MHN, personal observation). In contrast, three strains have been isolated from clade B2, whose occurrence is much lower in environmental sequences, and seven strains have been obtained for each of clades A3 and A5, which are always sporadic. It is noteworthy that other abundant groups of marine microbes have also defied cultivation efforts. For example, the most abundant marine heterotrophic bacterium, Pelagibacter, was only isolated (Rappé et al., 2002) 12 years after being discovered by Giovannoni et al. (1990). The heterotrophic flagellate group MAST-4 (Massana et al., 2004) is widespread in surface marine waters (Massana et al., 2006; Rodríguez-Martínez et al., 2009) but remains uncultured to date.

Interestingly, clade B1 ranked only third in the OSD data set, which suggests that it could be a more oceanic clade. Two Indian Ocean Tara stations, 41 and 42, with a significant contribution of clade VII to the total photosynthetic sequences, showed dominance of B1 at the DCM rather than at the surface, in contrast to the other stations where this clade is dominant. This could indicate an ecotype (or another clade) within the B1 clade adapted to deeper water. B1 showed the lowest average sequence identity among the prasinophytes clade VII clades (Supplementary Table 5) and the largest number of Tara OTUs (Supplementary Table 4), which could indicate the presence of different ecotypes within this clade.
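For readers unfamiliar with the measure, average sequence identity within a clade can be understood as the mean of the percent identities over all pairs of aligned sequences in that clade. The sketch below illustrates one simple way to compute it, assuming that sequences are already aligned to the same length and that columns containing a gap in either sequence are ignored; the function names and the toy sequences are hypothetical, and the values in Supplementary Table 5 may have been obtained with different software or gap treatment.

```python
from itertools import combinations


def pairwise_identity(seq_a, seq_b):
    """Percent identity between two aligned sequences, skipping columns with a gap in either."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for base_a, base_b in zip(seq_a, seq_b):
        if base_a == "-" or base_b == "-":
            continue  # gap column: not counted as match or mismatch
        compared += 1
        if base_a == base_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def mean_within_clade_identity(aligned_seqs):
    """Mean percent identity over all pairs of aligned sequences from one clade."""
    pairs = list(combinations(aligned_seqs, 2))
    if not pairs:
        raise ValueError("at least two sequences are required")
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


# Toy aligned sequences, for illustration only (not real 18S rRNA data).
toy_clade = ["ACGT", "ACGA", "AC-T"]
toy_identity = mean_within_clade_identity(toy_clade)
```

A lower mean value for one clade than for the others, as reported for B1, indicates that its members are more divergent from each other, which is consistent with the presence of several ecotypes.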

Clade A1 was not as abundant as B1 but contributed substantially to the total photosynthetic sequences at the DCM and seemed to complement B1 at the Pacific stations. All A1 strains have been isolated from deep samples (Table 1) and possess a 500-bp intron around position 1000 of the 18S rRNA gene.

Prasinophytes clade A4 seems to be a more coastal clade, as it was more abundant in OSD and at some Tara stations closer to shore, such as station 32 in the Red Sea and stations 78 and 84 in the Atlantic and Southern Ocean (Figure 5 and Supplementary Figure 4). This clade is the one for which the largest number of strains has been isolated, across wide oceanic and latitudinal ranges (Table 1). Clade A4 strains were isolated from temperate latitudes, as far north as ~49° N and as far south as ~33° S, but also from tropical latitudes. The Tara Oceans metadata analysis suggests two temperature niches. The analysis of the accessory pigment composition of two temperate A4 strains (RCC1124 and RCC1871, Table 1) showed that, in contrast with the other strains from lineages A and B, they do not contain loroxanthin (Lopes dos Santos et al., 2016). It will be interesting to analyze the pigment composition of tropical A4 strains to determine whether the absence of loroxanthin is associated with the lower temperature niche.

Among Tara stations where B1 was not abundant, we did not observe a prevailing clade, except in the Mediterranean Sea, where A6 seemed to be dominant at all stations other than 4 and 7, which presented an assemblage of clades (Figure 5, Supplementary Figure 4). These latter stations are close to the Strait of Gibraltar, a well-known region of water mass exchange between the Atlantic Ocean and the Mediterranean Sea marked by salinity and temperature gradients (Gascard and Richez, 1985; Béranger et al., 2005). The Mediterranean Sea presents a range of trophic conditions and is known to harbor endemic phytoplankton species (for example, Gómez, 2006). However, no strain of prasinophytes clade VII has been isolated from the Mediterranean Sea, and the only strain representing A6 was isolated in 2013 from the Atlantic Ocean. Also, only one environmental sequence among the nine available from the Mediterranean Sea was assigned to A6. The dominance of A6 in the Mediterranean Sea could be seasonal because all the stations in this region were sampled during late fall–early winter.

The distribution of the other clades, A2, A3, A5, A7 and B2, is too sporadic to draw major conclusions. Sensitive and quantitative techniques to assess their abundance and distribution would help to understand the role of these low-abundance clades.

Conclusion

Amplicon-based analyses such as metabarcoding are influenced by several factors, including DNA extraction, amplicon size, primer preference, target gene copy number and sequence analysis methods such as OTU generation (Zinger et al., 2012). Most of these biases affect the interpretation of quantitative measures such as richness estimates. Yet metabarcoding is a powerful way to determine overall trends of dominant groups (de Vargas et al., 2015). Here we analyzed the distribution of prasinophytes clade VII in two large marine 18S rRNA data sets from different environments: the coastal OSD and the oceanic Tara Oceans. Despite the biases and the discrepancies in the number of sequences between these two data sets, we could establish that prasinophytes clade VII is the dominant group of green algae in oceanic waters and probably an important primary producer, as recently demonstrated (Rii et al., 2016). Our phylogenetic analyses point out, however, wide heterogeneity within lineages A and B, and the precise habitat of each clade is still unclear, although the Tara data helped us to formulate some hypotheses such as the occurrence of A4 in temperate waters or the dominance of A6 in the Mediterranean Sea. The power of defining clades for abundant groups such as prasinophytes clade VII lies in being able to refine their distribution and its correlation with environmental factors in order to understand their ecology. Each clade described here has a precise signature in the V9 region (Supplementary Figure 2) that allowed us to map its distribution through different oceanic regions. These signatures could be further used to design fluorescence in situ hybridization or quantitative PCR probes in order to assess more precisely the abundance of the most prevalent clades and their contribution to the total photosynthetic community in marine waters.