Introduction

Molecular surveys have been instrumental in decoding the wide diversity of microorganisms that play crucial ecosystem roles (Falkowski et al., 2008). They have unveiled novel lineages within the three domains of life, bacteria (Achtman and Wagner, 2008), archaea (Robertson et al., 2005) and eukaryotes (Epstein and López-García, 2008), thus opening new avenues for evolutionary and ecological exploration. Within eukaryotes, novel lineages at the highest taxonomic rank have been found, such as the Picozoa (Not et al., 2007; Seenivasan et al., 2013) and the rappemonads (Kim et al., 2011), and more probably await discovery. This new diversity was first detected as environmental small subunit rDNA sequences forming monophyletic clades (named ribogroups) of uncertain affiliation. Additional tools were then used to characterize these clades, such as FISH (fluorescence in situ hybridization) (Not et al., 2007), single-cell genomics (Yoon et al., 2011) or isolation in pure culture (Seenivasan et al., 2013). Besides the striking identification of novel high-rank diversity, molecular surveys have also identified novelty at all phylogenetic levels, from novel groups within supergroups to additional diversity within defined taxa (Massana and Pedrós-Alió, 2008).

In molecular surveys of marine planktonic picoeukaryotes, a substantial number of the 18S rDNA sequences formed novel lineages within the supergroups alveolates and stramenopiles, named MALV (MArine ALveolates) (Guillou et al., 2008) and MAST (MArine STramenopiles) (Massana et al., 2004). On average, they accounted for 32% and 13% of sequences, respectively, in picoeukaryote surveys (Massana and Pedrós-Alió, 2008). MAST groups were not monophyletic but formed independent lineages among basal heterotrophic stramenopiles. Twelve MAST groups were first defined (Massana et al., 2004), and seven more were proposed later (Zuendorf et al., 2006; Orsi et al., 2011). A parallel classification was also proposed soon after the first study (Richards and Bass, 2005). With only one exception (Cavalier-Smith and Scoble, 2013), this diversity remains uncultured, so determining the cell physiology and ecological attributes of these groups is one of the main challenges for future ecological studies. Some groups have been investigated in detail, mainly by FISH with group-specific oligonucleotide probes, and MAST cells turned out to be small (2–5 μm) heterotrophic flagellates that are widely distributed and active bacterial grazers (Massana et al., 2006; Lin et al., 2012; Piwosz et al., 2013). However, cell size, distribution and activity varied among the investigated groups (Massana et al., 2009; Piwosz and Pernthaler, 2010), suggesting a tight link between phylogenetic and functional diversity.

The main aim of this work was to reevaluate the phylogeny, diversity and ecology of MAST ribogroups using three independent surveys of 18S rDNA sequences derived from traditional and new molecular tools. First, we performed an exhaustive search of the GenBank database for stramenopile sequences derived both from cultures and from natural samples (via eukaryotic clone libraries). Second, we analyzed a 454 pyrosequencing data set (pyrotags) from European coastal sites, including oxygenated seawater samples (fractionated into pico-, nano- and microplankton), sediments and anoxic plankton. This pyrotag data set has already been used to study MAST diversity, distribution and ecology (Logares et al., 2012). Third, we used a collection of single amplified genomes (SAGs) from individual protist cells (Heywood et al., 2011; Martinez-Garcia et al., 2012), which provide a cell-by-cell view of community structure. Our specific objectives were (1) to reevaluate the phylogenetic robustness of described MAST ribogroups, (2) to detect new ribogroups within stramenopiles, (3) to characterize the genetic diversity of each MAST ribogroup and define specific subclades and (4) to infer ecological preferences for the identified ribogroups and subclades from the samples in which their sequences were found. Such a robust phylogenetic framework is needed for future research on the ecology of specific groups, both with specific probes and with high-throughput sequencing data sets.

Materials and methods

Stramenopile sequences derived from GenBank

A reference data set of 370 near-complete 18S rDNA sequences was compiled to represent all stramenopile lineages. This data set was used to identify similar sequences in the NCBI GenBank database (Release 183.0; April 2011) with a local BLAST search (Altschul et al., 1990) run with relaxed parameters (-evalue 0.0001 -max_target_seqs 10000 -perc_identity 90). This yielded 6373 sequences between 700 and 2000 bp long with a bit score above 1000. Non-stramenopile sequences were detected using KeyDNATools (Guillou et al., 2013) and phylogeny; only a few were kept as outgroups. The data set retrieved from GenBank was aligned with mothur (Schloss et al., 2009) against the SILVA 108 template alignment (Quast et al., 2013), and chimeras (350 sequences) were identified with ChimeraSlayer (Haas et al., 2011). Published protist molecular surveys were checked to confirm that all stramenopile diversity was represented in this data set, and a few divergent lineages were added (such as MH-IX from Richards and Bass, 2005). Finally, sequences shorter than 1100 bp were removed, resulting in a final GenBank data set of 4000 entries.
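The length and bit-score filters described above can be sketched in Python. This is a minimal illustration, not the pipeline actually used: it assumes the local BLAST+ search wrote tabular output with a custom column set such as `-outfmt "6 qseqid sseqid slen bitscore"` (the column choice is an assumption), and the accession numbers in the toy input are hypothetical.

```python
# Minimal sketch of the GenBank hit filtering; assumes BLAST+ tabular output
# written with a custom format like: -outfmt "6 qseqid sseqid slen bitscore"
# (column choice and order are assumptions of this example).

MIN_LEN, MAX_LEN = 700, 2000  # accepted subject length window (bp)
MIN_BITSCORE = 1000           # hits must score above this value
FINAL_MIN_LEN = 1100          # final cut-off applied after curation


def filter_hits(lines):
    """Return {subject_id: length} for subjects passing the length and bit-score filters."""
    kept = {}
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        _qseqid, sseqid, slen, bitscore = line.rstrip("\n").split("\t")[:4]
        slen, bitscore = int(slen), float(bitscore)
        if MIN_LEN <= slen <= MAX_LEN and bitscore > MIN_BITSCORE:
            # several reference queries may hit the same subject; keep it once
            kept[sseqid] = slen
    return kept


def final_length_filter(lengths):
    """Drop sequences shorter than FINAL_MIN_LEN."""
    return {sid: n for sid, n in lengths.items() if n >= FINAL_MIN_LEN}


# Toy tabular hits with hypothetical accession numbers.
hits = [
    "ref1\tAB000001\t1750\t2950.0",
    "ref2\tAB000001\t1750\t2800.0",  # same subject found by a second query
    "ref1\tAB000002\t650\t1100.0",   # too short
    "ref3\tAB000003\t1050\t1500.0",  # passes initial filters, fails final cut-off
    "ref3\tAB000004\t1400\t900.0",   # bit score too low
]
kept = filter_hits(hits)           # {'AB000001': 1750, 'AB000003': 1050}
final = final_length_filter(kept)  # {'AB000001': 1750}
```

Keying the kept hits by subject ID deduplicates GenBank entries retrieved by several of the reference queries, which is expected when each of many queries may return up to 10000 targets.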

Stramenopile sequences derived from SAGs

During the Tara-Oceans expedition (Karsenti et al., 2011), surface and deep chlorophyll maximum samples were collected at several sites in the Mediterranean Sea and the Indian Ocean, cryopreserved with glycine betaine and shipped to the Bigelow Laboratory for Ocean Sciences. Two single-cell sorts were performed by flow cytometry for each sample, the first targeting pigmented cells and the second targeting DNA-stained unpigmented cells larger than bacteria (Heywood et al., 2011). Single plastidic or aplastidic cells were deposited into 384-well plates and subjected to MDA (multiple displacement amplification), and the amplified genomic DNA was screened by PCR with the eukaryotic 18S rDNA primers 528f and EukB (Heywood et al., 2011). Sequences (1200 bp) were obtained from 903 SAGs and identified by BLAST (complete analysis in Sieracki et al., in preparation). Here, we considered the 568 SAGs affiliated with stramenopiles.

Stramenopile sequences derived from BioMarKs pyrotags

During the BioMarKs project (http://www.biomarks.eu/), samples were collected at six European coastal sites, at two water column depths (surface and deep chlorophyll maximum) and in three size fractions (picoplankton (0.8–3 μm), nanoplankton (3–20 μm) and microplankton (20–2000 μm)), as well as from sediments (Logares et al., 2012). DNA and RNA were extracted from the samples, the latter was reverse transcribed to cDNA, and protist diversity was assessed by pyrosequencing PCR products obtained with V4 18S rDNA primers (Stoeck et al., 2010). The resulting sequences are referred to as pyrotags. The data set was curated by removing short or low-quality reads and chimeras (Logares et al., 2012). High-quality pyrotags (1.5 million) were clustered at 97% similarity with USEARCH (Edgar, 2010) to obtain the operational taxonomic unit (OTU) table (pyrotag counts per OTU in each sample). OTUs were compared with reference data sets (including the MAST data set generated here) and assigned to a taxonomic group when their e-value against a reference sequence was below 10⁻¹⁰⁰. This assignment was validated by phylogeny. Metazoan pyrotags were removed from the OTU table before calculating the contributions of different groups.
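The assignment and normalization rules described above (best-hit e-value below 10⁻¹⁰⁰, then removal of metazoan pyrotags before computing group contributions) can be sketched as follows. It is a simplified illustration under stated assumptions: each OTU's best reference hit is taken as already computed, the phylogenetic validation step is skipped, and the OTU identifiers, group labels and counts are hypothetical.

```python
from collections import defaultdict

EVALUE_CUTOFF = 1e-100  # assignment requires an e-value below this


def assign_otus(best_hits, cutoff=EVALUE_CUTOFF):
    """best_hits: {otu_id: (group, evalue)} for each OTU's best reference match.
    Returns {otu_id: group}, or 'unassigned' when the hit is not strong enough."""
    return {otu: (group if evalue < cutoff else "unassigned")
            for otu, (group, evalue) in best_hits.items()}


def group_percentages(otu_counts, assignments, exclude=("Metazoa",)):
    """otu_counts: {otu_id: pyrotag count in one sample}. Excluded groups are
    dropped first; returns the % of the remaining pyrotags in each group."""
    totals = defaultdict(int)
    for otu, n in otu_counts.items():
        group = assignments.get(otu, "unassigned")
        if group in exclude:
            continue
        totals[group] += n
    grand = sum(totals.values())
    return {g: 100.0 * n / grand for g, n in totals.items()} if grand else {}


# Hypothetical best hits and counts for one sample.
best_hits = {
    "OTU1": ("group_A", 1e-150),
    "OTU2": ("Metazoa", 1e-180),
    "OTU3": ("group_C", 1e-80),   # too weak: left unassigned
    "OTU4": ("group_B", 0.0),
}
assignments = assign_otus(best_hits)
shares = group_percentages({"OTU1": 30, "OTU2": 50, "OTU3": 10, "OTU4": 60}, assignments)
# shares -> {'group_A': 30.0, 'unassigned': 10.0, 'group_B': 60.0}
```

Removing metazoan pyrotags before normalizing means that group contributions are expressed relative to the protist signal only, so samples with many animal reads are not artificially diluted.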

Phylogenetic analysis

Maximum likelihood (ML) trees were inferred from the aligned GenBank data set with RAxML (v7.2.8; Stamatakis, 2006) using the GTR+G evolutionary model under the rapid hill-climbing mode. Several alternative trees were run to select the tree with the best likelihood, and bootstrap support was calculated from pseudoreplicates. Bootstrap values were mapped onto the best tree with RAxML, and the final trees were edited with Archaeopteryx (Han and Zmasek, 2009). A series of preliminary trees (100 replicates for topology and bootstrap) was used to curate the data set by inspecting with BLAST each sequence that formed a long branch. Sometimes (50 cases), similar sequences shorter than 1100 bp were identified and added to the data set to break up the long branches. In other cases, long-branch sequences were removed because they were chimeras (30 cases), contained introns that disturbed the alignment (20 cases) or remained unique (closest sequence below 92% similarity; 20 cases). The curated data set was merged with the 568 stramenopile SAG sequences and clustered with USEARCH at 98% similarity, keeping the longest sequence to represent each OTU. The final set of 1226 sequences, including 20 alveolate and rhizarian outgroups, was aligned de novo with MAFFT v6.903b (Katoh et al., 2002) using the E-INS-i algorithm, slightly trimmed at both ends (2720 final positions) and used to construct an ML tree with 1000 replicates for topology and bootstrap. Phylogenies were run on the Mare Nostrum supercomputer cluster (http://www.bsc.es/marenostrum-support-services).
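The two similarity rules in this curation step (clustering at 98% while keeping the longest sequence as each OTU representative, and flagging sequences whose closest neighbour is below 92% similarity) can be illustrated with a small greedy centroid clustering. This is a simplified stand-in, not USEARCH itself: it assumes pre-aligned sequences and computes identity over gap-free columns, whereas USEARCH uses its own alignment-based identity and search heuristics. Sequence identifiers are hypothetical.

```python
# Simplified greedy centroid clustering (a stand-in for USEARCH, whose
# identity definition and search heuristics differ). Assumes the input
# sequences are already aligned to a common length.

GAPS = "-."


def identity(a, b):
    """Fraction of identical bases over aligned columns where neither sequence has a gap."""
    compared = same = 0
    for x, y in zip(a, b):
        if x in GAPS or y in GAPS:
            continue
        compared += 1
        same += x == y
    return same / compared if compared else 0.0


def ungapped_len(seq):
    return sum(c not in GAPS for c in seq)


def greedy_cluster(aligned, threshold=0.98):
    """Longest sequences first; each joins the first centroid it matches at
    >= threshold, otherwise it founds a new OTU, so each centroid is a
    longest member of its OTU. Returns {centroid_id: [member ids]}."""
    order = sorted(aligned, key=lambda k: ungapped_len(aligned[k]), reverse=True)
    clusters = {}
    for sid in order:
        for cid in clusters:
            if identity(aligned[sid], aligned[cid]) >= threshold:
                clusters[cid].append(sid)
                break
        else:
            clusters[sid] = [sid]
    return clusters


def isolated(aligned, min_identity=0.92):
    """Ids whose closest other sequence is below min_identity (candidate unique long branches)."""
    flagged = []
    for sid, seq in aligned.items():
        best = max((identity(seq, other) for oid, other in aligned.items() if oid != sid),
                   default=0.0)
        if best < min_identity:
            flagged.append(sid)
    return flagged


# Toy alignment with hypothetical identifiers.
aligned = {
    "seqA": "A" * 50,
    "seqB": "A" * 49 + "-",       # shorter, identical where aligned
    "seqC": "C" * 10 + "A" * 40,  # 80% identical to seqA
}
clusters = greedy_cluster(aligned)  # {'seqA': ['seqA', 'seqB'], 'seqC': ['seqC']}
flagged = isolated(aligned)         # ['seqC']
```

Sorting by ungapped length before clustering is what makes the representative of each OTU the longest available sequence, which is desirable for the downstream tree because it maximizes the number of informative positions.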

Final curated data set and intra-group analyses

Once the ribogroups were well defined, a final check of NCBI GenBank release 193.0 (December 2012) was carried out to obtain the final list of 1762 MAST and 36 MOCH (Marine OCHrophyta) sequences (generally longer than 800 bp). Basic properties of the sites where sequences originated, such as habitat (marine or freshwater), substrate (plankton or sediment), oxygen content and marine region, were taken from the GenBank entries. Phylogenetic trees revealed divergent sequences within groups; these were often slightly edited by removing variable positions at their ends, which were likely due to sequencing errors or short chimeric fragments (506 sequences edited). The final MAST data set was then analyzed exhaustively, group by group. First, sequences were aligned with MAFFT using the E-INS-i algorithm, and ML trees were inferred with 100 replicates for topology and bootstrap. Second, MAST groups with substantial genetic diversity were subdivided into subclades supported by high bootstrap values. Third, USEARCH was used to cluster the sequences of each group into OTUs at different similarity thresholds. Fourth, pyrotags were added to the previous alignments with MAFFT and ML trees were inferred to assign the pyrotags to subclades. The final data sets of MAST and MOCH sequences are shown in Supplementary Tables S1 and S2.
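When a group is clustered at a series of thresholds, the highest threshold at which it still forms a single OTU can be computed directly under a simplifying assumption. If clustering is greedy and centroid-based with the longest sequence as the first centroid (in the spirit of length-sorted USEARCH clustering), the group collapses into one OTU at threshold t exactly when every member matches that longest sequence at t or more. The sketch below additionally assumes pre-aligned sequences, identity over gap-free columns and whole-percent threshold steps; none of these settings are documented in the text, and the identifiers are hypothetical.

```python
GAPS = "-."


def matches(a, b):
    """(identical, compared) counts over aligned columns without gaps."""
    same = compared = 0
    for x, y in zip(a, b):
        if x in GAPS or y in GAPS:
            continue
        compared += 1
        same += x == y
    return same, compared


def collapse_threshold(aligned):
    """Highest whole-percent similarity at which greedy, longest-first centroid
    clustering puts every sequence of the group into a single OTU."""
    def ungapped_len(s):
        return sum(c not in GAPS for c in s)

    centroid = max(aligned, key=lambda k: ungapped_len(aligned[k]))
    worst = 100
    for sid, seq in aligned.items():
        if sid == centroid:
            continue
        same, compared = matches(seq, aligned[centroid])
        # integer arithmetic avoids floating-point rounding at the boundary
        pct = (same * 100) // compared if compared else 0
        worst = min(worst, pct)
    return worst


# Hypothetical group: members 97% and 90% identical to the longest sequence.
group = {
    "s1": "A" * 100,
    "s2": "A" * 97 + "CCC",
    "s3": "A" * 90 + "C" * 10,
}
# collapse_threshold(group) -> 90
```

Using the floor of each member's percent identity is what makes the result the highest integer threshold that all members still pass, since a member passes threshold t exactly when t is at most that floor.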

Results

High-rank phylogeny of stramenopiles

A phylogenetic tree with curated 18S rDNA sequences from GenBank and from the Tara-Oceans SAG collection was built with the aim of recovering all existing lineages within stramenopiles while minimizing artifacts such as chimeras or spurious novel lineages (Figure 1). All groups were recovered with very high bootstrap values (except Bicosoecida, which had high support in other trees). MAST ribogroups, which form distinct lineages within basal heterotrophic stramenopiles, were reevaluated using this tree. Some previously defined ribogroups did not appear in the final tree because they affiliated with formal taxa or were based on chimeric sequences, and therefore their names should be abandoned. This applies to one of the 12 original groups (MAST-5), to most MAST groups described later (except MAST-16) and to many of the mystery heterokont (MH) clades (Table 1). Three MH clades were included in the MAST groups in agreement with the original authors (T Richards, personal communication): MH-IV (excluding ME1-24, which was MAST-6) was renamed MAST-20, MH-IX was renamed MAST-21 (not shown in the tree because of its long branch and ambiguous placement) and MH-XII was renamed MAST-22. A second aim of this phylogeny was to identify new ribogroups. Surprisingly, only three additional MAST groups were found (MAST-23 to -25). In addition, five new ribogroups emerged within Ochrophyta, the large division that includes all chlorophyll-containing stramenopiles, and were named MOCH for Marine Ochrophyta (Figure 1). The number of sequences from independent molecular surveys populating the final set of MAST and MOCH ribogroups is shown in Table 2.

Figure 1

High-rank phylogenetic tree of stramenopiles based on the 18S rDNA gene. The tree includes formal taxa described in Adl et al. (2012) and ribogroups within basal stramenopiles (MAST) and Ochrophyta (MOCH). Only MAST-21 is not shown. Numbers in brackets before group names indicate the number of sequences (distinct OTU98) within each group. Groups with 100% bootstrap support after 1000 pseudoreplicates are shown as thickened lines; bootstrap values above 70% for the remaining nodes are also shown. Some large clades are marked, with dashed lines for clades that require validation. The scale bar indicates 0.1 substitutions per position.

Table 1 List of described MAST or MH ribogroups that are not supported phylogenetically and should be abandoned
Table 2 Number of sequences of MAST and MOCH ribogroups in GenBank (longer than 800 bp), in the SAG collection (derived from aplastidic or plastidic sorts), and in the BioMarKs data set

While the groups shown in Figure 1 had high bootstrap support, the relative order among them was less clear. One of the well-supported large clades was Ochrophyta (93% bootstrap), and within it some relationships were recurrent: Diatomea and Bolidomonas (96%); Phaeophyceae, Chrysomerophyceae, Xanthophyceae and Phaeothamniophyceae (80%); and Dictyochophyceae, Pelagophyceae, MOCH-1 and -2 (100%). The placement of the remaining groups shifted among trees. Non-ochrophyte stramenopiles were separated into three phylogenetic regions, following Cavalier-Smith and Scoble (2013). First, the Pseudofungi formed a set of separate lineages basal to Ochrophyta that included Pirsonia, Peronosporomycetes, Hyphochytriales, Developayella and several MASTs. They were never monophyletic, and their relative order shifted among trees. Second, the Sagenista formed a clade grouping Labyrinthulomycetes with many MAST groups. Although this large clade did not receive bootstrap support above 70% in the tree shown here, it was generally supported in other trees. Third, the Opalozoa formed another large clade uniting the remaining groups (74% bootstrap). Blastocystis, Placidida, MAST-3 and -12 were always related, whereas Bicosoecida grouped with other MAST groups. In the tree shown here, MAST-25 formed the most basal branch (although in other trees it affiliated with Sagenista or Opalozoa).

Newly described ribogroups

Only three new MAST ribogroups were defined here, each formed by a limited number of sequences (Table 2). MAST-23 was retrieved at two sites with deep anoxic waters, the Cariaco Basin (Atlantic) and Saanich Inlet (Pacific), but was not detected in the pyrotag survey of European coastal waters. MAST-24 included GenBank sequences from anoxic waters (Cariaco Basin and Black Sea) together with pyrotags from anoxic waters. MAST-25 included GenBank sequences from surface Pacific picoplankton as well as SAGs from the Indian Ocean and coastal pyrotags. Whereas MAST-23 and MAST-24 appear to be anaerobic protists, MAST-25 is likely an aerobic heterotrophic picoplankter.

Five MOCH groups were described here at different positions within Ochrophyta (Figure 1). Each group included GenBank sequences from different marine regions, and all were well represented by BioMarKs pyrotags and, except MOCH-5, by SAG sequences (Table 2). The BioMarKs data set indicated that MOCH cells were part of the pico- and nanoplankton (Supplementary Table S3) and pointed to MOCH-2 as particularly abundant (0.8% of pyrotags in both size fractions). MOCH-1 and -2 contained SAGs sorted as plastidic cells, and MOCH-5 included two cultures of phototrophs (now lost), so these three groups most likely represent new algal lineages. Conversely, MOCH-3 and MOCH-4 probably represent new heterotrophic lineages, since their SAGs were sorted as aplastidic and some of their sequences derived from deep, dark samples.

Ecological analysis of MAST ribogroups

The ecological settings of each of the 18 MAST ribogroups were investigated by analyzing three independent data sets: the properties of the samples from which GenBank sequences originated (Figure 2), the presence and sorting criteria of SAGs prepared from planktonic marine cells (Table 2) and the contribution of pyrotags in a European coastal survey (Table 3). Several clear patterns emerged. First, the exhaustive GenBank screening did not reveal any ribogroup typical of freshwater systems; MAST groups were composed almost exclusively of marine sequences, with notable exceptions in MAST-2 and -12 (Figure 2a). Second, there were no particular geographic restrictions for any of the groups, which were consistently retrieved from samples in all oceans (Figure 2b). Third, four groups, MAST-1, -3, -4 and -7, dominated the data sets, contributing 72% of GenBank sequences, 88% of SAGs and 80% of pyrotags (Table 2). Fourth, most groups derived from planktonic samples and were not detected in sediments, except MAST-1, -9 and -12 (Figure 2c); the BioMarKs survey showed that MAST-6 and -20 were also important in sediments. Fifth, some groups were exclusive to anoxic habitats: MAST-16, -20, -21, -22, -23 and -24. These contributed little to the BioMarKs data set (which had a lower sequencing effort in anoxic habitats) and included no SAGs (which derive from oxic waters). MAST-9 and -12 appeared in both oxic and anoxic systems, whereas the rest were exclusive to oxic samples (Figure 2d). Virtually all these oxic ribogroups included SAG representatives and were well represented in the BioMarKs data set, which also allowed inferring the cell size of these lineages: some were clearly picoplanktonic (MAST-4, -7, -8, -10, -11 and -25), whereas others were also nanoplanktonic (MAST-1, -2 and -3). Finally, the contribution of ribogroups was always larger than that of known taxa of basal heterotrophic stramenopiles (Table 3). MAST ribogroups accounted for a notable share of the picoplankton signal (5.6% of pyrotags on average) and contributed less to the nanoplankton (2.1%) and microplankton (0.2%).
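The GenBank-based ecological profiles (percentage of each group's sequences per salinity, substrate, oxygen or region class, shown only for groups with more than 20 sequences) amount to a grouped tally. The sketch below assumes each curated GenBank entry has been reduced to a record holding its ribogroup and the extracted sample attributes; the field names, group labels and counts are hypothetical toy values, not data from this study.

```python
from collections import Counter, defaultdict


def attribute_profiles(records, attribute, min_seqs=21):
    """records: iterable of dicts with a 'group' key plus sample attributes
    (e.g. 'salinity', 'substrate', 'oxygen', 'region'). For each group with at
    least min_seqs sequences (i.e. more than 20), return the percentage of its
    sequences in each class of the chosen attribute."""
    by_group = defaultdict(Counter)
    for rec in records:
        by_group[rec["group"]][rec.get(attribute, "unknown")] += 1
    profiles = {}
    for group, counts in by_group.items():
        n = sum(counts.values())
        if n < min_seqs:
            continue  # too few sequences for a reliable profile
        profiles[group] = {cls: 100.0 * k / n for cls, k in counts.items()}
    return profiles


# Toy records (hypothetical groups and counts).
records = (
    [{"group": "group_A", "oxygen": "oxic"}] * 15
    + [{"group": "group_A", "oxygen": "anoxic"}] * 10
    + [{"group": "group_B", "oxygen": "anoxic"}] * 5
)
profiles = attribute_profiles(records, "oxygen")
# profiles -> {'group_A': {'oxic': 60.0, 'anoxic': 40.0}}; group_B is skipped
```

Entries lacking an attribute are counted as "unknown" rather than dropped, so the percentages for each group still sum to 100.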

Figure 2

Ecological preferences of 11 MAST ribogroups inferred from sample attributes extracted from GenBank entries. Only ribogroups with more than 20 sequences are shown. Each panel shows the distribution of sequences within each group according to salinity (a), geographic origin (b), substrate (c) and oxygen (d).

Table 3 Relative abundance (% of pyrotags) of MAST ribogroups, basal taxa within stramenopiles, and unassigned stramenopiles in the different sample types of the BioMarKs data set

Phylogenetic structure of MAST ribogroups

The diversity contained in each of the 18 ribogroups was not equivalent, as shown by the number of OTUs detected at 97% similarity (OTU97) and by the similarity threshold at which the group collapses into a single OTU (Table 2). Some groups, such as MAST-2, -10, -20 and -21, were quite homogeneous and contained only one OTU97. Among these, MAST-2 was particularly interesting, since its low genetic diversity was structured into several clades, one of which included only sequences from freshwater bodies (Supplementary Figure S1). These freshwater sequences were highly similar to each other and differed by only 1% from marine MAST-2 sequences. The remaining groups exhibited substantial genetic structure. In some cases (MAST-6, -11, -16, -23, -24 and -25), this structure could not be characterized because of the limited number of GenBank sequences (fewer than 8; Table 2). For the rest, we built phylogenetic trees to define specific subclades and detect their putative ecological distinctiveness. The eight ribogroups analyzed in this way (Figure 3) were subdivided into 4–12 subclades, all of which except one (MAST-4B) were supported by high bootstrap values (75–100%). Most subclades included SAG representatives (30 of 46) and pyrotags (41 of 46), often derived from oxic picoplankton. Subclades generally collapsed into a single OTU at around 96% similarity, although some were more diverse (collapsing at 90%) and others less diverse (99%) (Table 4).

Figure 3

Specific phylogenetic trees of eight MAST ribogroups (those with a large genetic structure and >25 GenBank sequences). Each group is divided into phylogenetic subclades that generally have the highest bootstrap support after 100 pseudoreplicates (red dots indicate 100%). White dots show 100% bootstrap values for internal nodes. The scale bar indicates 0.05 substitutions per position and applies to all trees. Orange dots mark subclades containing SAGs from planktonic protists. Horizontal bars show, for each subclade, the distribution of pyrotags among three planktonic size fractions, anoxic waters (pico- and nanoplankton together) and sediments.

Table 4 Number of sequences in the three data sets, similarity value that forms a single OTU and the accession number of a representative sequence for subclades shown in Figure 3

MAST-1 was subdivided into four subclades. Two of them, MAST-1A and -1C, appeared in both the pico- and nanoplankton, and one MAST-1C lineage (not shown) concentrated GenBank sequences and pyrotags from sediments and anoxic waters. MAST-3 was the most diverse group (42 OTU97, collapsing at 83%) and was subdivided into 12 subclades. Most were picoplanktonic, except some that were also present in the nanoplankton (MAST-3D and -3H) and MAST-3J, which was also important in anoxic waters and sediments. Curiously, MAST-3J contained the cultured Incisomonas marina. The other described MAST species, from the genus Solenicola, affiliated within MAST-3I. These two subclades with described species, together with MAST-3H, were the only subclades without SAGs. The ribogroups MAST-4, -7 and -8 each displayed five to six subclades with little ecological distinctiveness, since almost all included SAGs and pyrotags from the oxic picoplankton. MAST-4F was one exception: it contained no pyrotags and included some GenBank sequences from anoxic waters (3 of 6). Within MAST-8, the exceptions were MAST-8A (no SAGs), MAST-8D (also present in the nanoplankton) and MAST-8F (no SAGs or pyrotags). Again, one of the two GenBank sequences from MAST-8F derived from anoxic plankton.

The remaining three groups were clearly distinct in their ecological preferences (Figure 3), with a strong signal in anoxic plankton and sediments (as noted in Figure 2). Within MAST-9, two clades (MAST-9A and -9C) appeared to be more important in sediments (often anoxic), whereas MAST-9B and -9D appeared more important in anoxic plankton. Only MAST-9A and -9D had a substantial contribution of pyrotags from oxic picoplankton and SAGs. MAST-12 showed the highest ecological differentiation among subclades: two subclades appeared as oxic picoplankton (MAST-12B and -12D, the latter with one SAG), one was important in sediments (MAST-12E), one seemed very versatile, with one SAG and sequences from plankton, sediments and anoxic sites (MAST-12A), and one was typical of freshwater and marine plankton and oxic non-marine sediments and contained no pyrotags or SAGs (MAST-12C). Finally, MAST-22 was divided into four subclades, all retrieved from anoxic plankton; MAST-22B also included sequences from sediments.

Discussion

Novel diversity within stramenopiles

Identifying novel high-rank diversity was one of the primary goals of environmental molecular surveys (Dawson and Pace, 2002; Epstein and López-García, 2008), and indeed some high-rank groups were detected and characterized (Not et al., 2007). Nevertheless, putative novel groups were often based on chimeric sequences or were divergent lineages within well-known taxa (Berney et al., 2004), highlighting the importance of robust phylogenies and careful sequence curation. Here, we did not search for novel diversity at the highest level but within one of the most important eukaryotic supergroups, the stramenopiles, which include the ribogroups collectively named MASTs (Massana et al., 2004). By compiling 18S rDNA sequences from different sources, we aimed to update the MAST groups. This effort was prone to similar artifacts, as many sequences that initially appeared as novel lineages were chimeras or formed long branches within described taxa (generally Labyrinthulomycetes or Bicosoecida). Accordingly, several of the defined MAST ribogroups did not withstand our phylogenetic evaluation and were discarded (listed in Table 1). Nevertheless, many of the described MAST groups (Massana et al., 2004; Richards and Bass, 2005; Orsi et al., 2011) remained valid in our phylogenies and were further supported by sequences from single planktonic cells (that is, SAGs) and pyrotags (Table 2).

Surprisingly, we identified very little novel diversity within basal heterotrophic stramenopiles: only three additional MAST clades (Table 2). These are rare in environmental surveys (0.7% of GenBank sequences, 1.7% of SAGs and 0.8% of pyrotags), which could explain why they had not been described before; most likely they are minor components of commonly sampled environments. In addition, our extensive phylogenetic survey unveiled novel diversity within Ochrophyta. These novel groups were sometimes noted in the original publications but remained unnamed (Le Gall et al., 2008; Bachy et al., 2011; Stock et al., 2012). Here, we defined five novel ochrophyte ribogroups, one of which (MOCH-2) seemed to be a quantitatively important member of oxic pico- and nanoplankton. Based on the provenance of SAGs and GenBank sequences, some MOCH groups appeared to be novel algal lineages, whereas others were most likely heterotrophs.

One may wonder how much room is left for discovering novel diversity within basal heterotrophic stramenopiles. In our phylogenies, we removed single sequences that formed independent lineages; some of these may represent low-abundance novel taxa, but their validation requires the support of related sequences. Within the BioMarKs data set, pyrotags assigned to stramenopiles without a clear group affiliation (incertae sedis in Table 3) were always a small fraction; for instance, they accounted for only 0.4% of pyrotags in the picoplankton. These unassigned stramenopiles were dominated by a single lineage related to GenBank entry HQ867108, but the phylogenetic placement of these partial sequences could not be resolved. Unassigned pyrotags were also scarce in the other samples, suggesting that new MAST groups, if they exist, would be minor members of natural assemblages. Nevertheless, particularly isolated systems such as anoxic deep-sea basins or hydrothermal vents are generally less connected and remain underinvestigated (Dawson and Pace, 2002; López-García et al., 2003; Wylezich and Jürgens, 2011), so they could still harbor a significant amount of undescribed diversity.

High-rank phylogeny of stramenopiles

The tree shown in Figure 1 captured the essence of stramenopile diversity sampled by 18S rDNA gene sequencing. Our extensive approach maximized the genetic diversity detected while excluding chimeras and artifacts through manual sequence curation. Stramenopiles appeared to consist of a set of formally described taxa, following the recent protist classification (Adl et al., 2012), together with a number of phylogenetically dispersed ribogroups. Overall, 18 MAST and 5 MOCH ribogroups were defined. Interestingly, since sequences were obtained at random from GenBank and SAGs and then clustered at 98% similarity, the number of sequences per group in the tree provides a rough estimate of the diversity contained in each group. Thus, of the 1206 OTU98 displayed in this tree, Diatomea was the most diverse taxon (392 OTU98), followed by Chrysophyceae (233), Labyrinthulomycetes (123) and Bicosoecida (71).

The main division of stramenopiles is between Ochrophyta, which unites all photosynthetic taxa, and the basal heterotrophic taxa (Cavalier-Smith and Chao, 2006). This dichotomy between photosynthetic ochrophytes and basal heterotrophs was evident in the SAG collection: most SAGs from MOCH clades were sorted as plastidic cells (8 of 13), whereas virtually all SAGs from MAST lineages were sorted as aplastidic cells (353 of 362). This strong dichotomy supports giving different names to the ribogroups in these two parts of the tree (MAST versus MOCH). There has been substantial effort to organize the stramenopile tree into smaller categories using phylogeny (Cavalier-Smith and Chao, 2006; Cavalier-Smith and Scoble, 2013). Thus, Ochrophyta were subdivided into two large clades, the Khakista (Diatomea plus Bolidomonas) and the Phaeista (the remaining taxa), a division supported by our tree (Figure 1). In contrast, the higher-rank taxa proposed for basal stramenopiles were generally unstable: Pseudofungi did not form a monophyletic clade, Sagenista showed little bootstrap support, and only Opalozoa appeared robust in Figure 1. Validating these large clades would require multigene phylogenies, which may yield topologies slightly different from those based on the 18S rDNA gene alone (Riisberg et al., 2009).

The main barriers to MAST colonization

The analysis of GenBank sequences showed that most ribogroups contained sequences from all over the world (Figure 2b), suggesting the absence of marked geographic barriers to the dispersal of these minute protists in the marine environment. In some cases a particular group was absent from a given ocean, such as MAST-4 from polar systems, but this appeared to reflect an environmental (temperature) barrier (Rodríguez-Martínez et al., 2013). Our results support the view that small microeukaryotes are capable of global dispersal and that their establishment is determined by environmental conditions.

One striking finding was the absence of any typical freshwater ribogroup, even though our database mining retrieved freshwater sequences and stramenopiles are considered important members of freshwater microbial assemblages (Boenigk and Arndt, 2002). Sampling effort in marine systems is certainly larger, but there are still a number of protist surveys of freshwater systems, and in these surveys stramenopile sequences most often affiliate with Chrysophyceae and Bicosoecida (Richards et al., 2005). Therefore, MASTs (initially defined as Marine Stramenopiles) remain typically marine groups. Our analysis, however, identified two ribogroups with a number of sequences from freshwaters (Figure 2a), forming a separate subclade within each group. This points to two marine–freshwater transitions, a relatively infrequent phenomenon in the microbial world (Logares et al., 2009). Within MAST-2, almost identical sequences were retrieved from lake plankton in Europe, Asia and America (Supplementary Figure S1), suggesting a recent freshwater adaptation followed by global colonization. In contrast, subclade MAST-12C contained more diverse sequences (clustering at 95%) from a wider range of habitats (lake plankton, soils and marine plankton).

Besides the marine–freshwater boundary, there are other strong barriers to protist colonization, perhaps the most important being oxygen availability, which strongly conditions cell metabolism. In our survey, seven MAST groups were typical of anoxic systems, and these affiliated with Pseudofungi, Sagenista or Opalozoa. Anoxic sequences were also detected in three other MAST groups, always restricted to specific subclades: MAST-1C, MAST-3J and MAST-12A. The eight remaining ribogroups were retrieved almost exclusively from oxic sites. These data further support the view of oxygen as a fundamental driver of protist community structure, creating a marked phylogenetic dichotomy between oxic and anoxic assemblages (Dawson and Pace, 2002; Wylezich and Jürgens, 2011).

Small heterotrophic flagellates in planktonic microbial food webs

A diverse assemblage of MAST cells from 11 separate evolutionary lineages (MAST-1 to -4, -7 to -12 and -25) thrives in the surface ocean, arguably one of the largest biomes on Earth. These lineages include SAGs sorted as aplastidic planktonic protists, confirming their relevance in marine ecosystems and their heterotrophic nature. The groups studied in more detail, that is, MAST-1, -2, -3 and -4 (Massana et al., 2006, 2009; Lin et al., 2012; Piwosz et al., 2013), have been shown to be bacterial grazers, and the other groups are likely grazers as well. Thus, MAST cells form a heterogeneous collection of small bacterial grazers that probably exhibit physiological differentiation and adaptation. Counting OTUs alone, which gives a conservative estimate of the diversity contained in these uncultured protists, yields 98 OTU97. Each of these phylogenetic units deserves careful inspection, which can be based on FISH probes or targeted sequencing, as recently done for MAST-4 to evaluate its genetic structure (Rodríguez-Martínez et al., 2012) and biogeography (Rodríguez-Martínez et al., 2013). Culturing attempts should also continue (Cavalier-Smith and Scoble, 2013; del Campo et al., 2013), and the potential of genomes obtained from single cells should be exploited through phylogenomics and genome reconstruction (Yoon et al., 2011). The ecological attributes and putative specialization of the different clades are intriguing aspects still to be analyzed. The key open question is what drives and maintains the diversity of these phylogenetically diverse but apparently functionally redundant small bacterial grazers.