Introduction

Picoeukaryotes are eukaryotes smaller than 2–3 μm in diameter. They occur in aquatic environments worldwide and are considered fundamental components of marine ecosystems (Sherr and Sherr, 2000). They contribute significantly to biomass and primary production and are particularly important in nutrient-rich coastal waters, where they are most abundant (Li, 1994; Worden and Not, 2008). During the last decade, studies based on cloning and sequencing of the 18S rRNA gene have revealed an unexpected diversity among picoeukaryotes in the oceans (Díez et al., 2001; López-García et al., 2001; Moon-van der Staay et al., 2001). Unfortunately, this approach suffers from potential cloning biases that may mask the real diversity of the community under study (Forns et al., 1997). In addition, community diversity may be greatly underestimated because of the limited throughput of the method (Bent and Forney, 2008).

The newly emerged 454 sequencing-by-synthesis technology is cloning independent and massively parallel (Margulies et al., 2005). The diversity of bacteria (Sogin et al., 2006) and archaea (Galand et al., 2009) in open oceans has been investigated with high-throughput amplicon sequencing targeting the hypervariable V6 region of the 16S rRNA gene. More recently, the technology has also been applied to marine protists by targeting the hypervariable V9 region of the 18S rRNA gene (Amaral-Zettler et al., 2009). However, to the best of our knowledge, no similar studies have been carried out on marine picoeukaryotes. In this study, we explored the feasibility of using 454 pyrosequencing to investigate the diversity and community composition of picoeukaryote assemblages in subtropical coastal waters of the western Pacific. We also compared the picoeukaryote assemblages from two sites that differ in hydrography and trophic status.

Materials and methods

Sample collection and DNA extraction

Surface seawater samples (2.5 l) were collected from two bays in northeastern Hong Kong in April 2007. Tolo Harbor (TH) is a landlocked bay with a long history of eutrophication (Wear et al., 1984; Chau, 2007). Located outside TH, Mirs Bay (MB) is relatively unpolluted and more exposed to ocean currents from the South China Sea (Hong Kong Environmental Protection Department, 2003). Water samples were filtered through a 200 μm mesh sieve immediately to remove most of the mesozooplankton and large particles. Water temperatures, salinities and dissolved oxygen levels were measured on board using a Hydrolab sensor (Austin, TX, USA). In the laboratory, chlorophyll a concentrations were determined using a Turner Designs 10-AU fluorometer (Sunnyvale, CA, USA) as described by Wong and Wong (2003). At the time of sample collection, chlorophyll a concentration was 9.1 μg l−1 in TH and 1.9 μg l−1 in MB (Table 1). On the basis of these values, TH and MB were considered to be eutrophic and oligomesotrophic, respectively (Molvaer et al., 1997).

Table 1 Sampling conditions and sequence characteristics of the two sites

Two liters of water were prefiltered through 3 μm pore size Nuclepore membranes (Whatman, Piscataway, NJ, USA) and the microbial biomass was then collected onto a GF/F filter (Whatman). A gentle vacuum (<20 cm Hg) created by a hand pump was used to facilitate filtration. The filter was then immersed in DNA lysis buffer (0.75 M sucrose, 40 mM EDTA, 50 mM Tris-HCl (pH 8)), immediately frozen in liquid nitrogen and stored at −80 °C until DNA extraction. DNA was extracted following the cetyltrimethylammonium bromide (CTAB) extraction procedure (Doyle and Doyle, 1990).

PCR and pyrosequencing

PCR was performed using 454 sequencing adaptor-linked primers flanking the hypervariable V4 region of the 18S rRNA gene: A-528F (5′-gcctccctcgcgccatcag-GCGGTAATTCCAGCTCCAA-3′) and B-706R (5′-gccttgccagcccgctcag-AATCCRAGAATTTCACCTCT-3′) (adaptor sequences shown in lowercase) (Elwood et al., 1985). PCR mixtures (50 μl) were prepared in duplicate and each contained 2 μl of DNA template, 5 μl of 10 × PCR buffer (50 mM KCl, 10 mM Tris-HCl and 1.5 mM MgCl2), 200 μM of dNTP, 0.2 μM of each primer and 2.5 U Taq polymerase (Promega, Madison, WI, USA). The PCR thermal regime consisted of an initial denaturation of 3 min at 94 °C, followed by 30 cycles of 30 s at 94 °C, 30 s at 60 °C and 1 min at 72 °C, and a final extension of 5 min at 72 °C. PCR products were pooled and purified with the Qiaquick gel purification kit according to the manufacturer's instructions (Qiagen, Hilden, North Rhine-Westphalia, Germany). DNA concentration and quality were determined with a NanoDrop 1000 spectrophotometer (Wilmington, DE, USA).

Pyrosequencing of PCR products was performed on a Genome Sequencer FLX system at 454 Life Sciences (Branford, CT, USA). Sequences and quality scores from our pyrosequencing run were submitted to the NCBI short read archive (accession number SRA009090). Raw sequence reads were filtered before subsequent analyses to minimize the effects of random sequencing errors (Huse et al., 2007). Briefly, we eliminated sequence reads that (1) did not perfectly match the proximal PCR primer, (2) were too short (<244 bp) or too long (>302 bp), as determined by the sequence length distribution plots or (3) contained one or more ambiguous base(s) (N(s)).
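The three filtering criteria can be illustrated with a minimal Python sketch. It is not the pipeline used in this study. It assumes that each read has already been stripped of the 454 adaptor and key so that it starts at the PCR primer, that the proximal primer is the 528F sequence, and that read length is measured with the primer included; the function name passes_filter is hypothetical.

```python
# Illustrative sketch only; the constants are taken from the text, the rest is assumed.
FORWARD_PRIMER = "GCGGTAATTCCAGCTCCAA"  # 528F target sequence (assumed to be the proximal primer)
MIN_LEN, MAX_LEN = 244, 302             # length window in bp (assumed to include the primer)


def passes_filter(read: str, primer: str = FORWARD_PRIMER) -> bool:
    """Return True if a read survives all three filtering criteria."""
    read = read.upper()
    if not read.startswith(primer):          # (1) perfect match to the proximal primer
        return False
    if not MIN_LEN <= len(read) <= MAX_LEN:  # (2) neither too short nor too long
        return False
    if "N" in read:                          # (3) no ambiguous bases
        return False
    return True
```

Applying passes_filter to every raw read and keeping only those that return True would yield the set of high-quality tags passed on to later analyses.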

Assignment of phylotype OTUs

Phylotype operational taxonomic units (OTUs) were assigned according to the best BLASTN hit (Altschul et al., 1990) against the NCBI nucleotide sequence database NT (as of August 2008) with the following parameters: E value = 10^−5, minimum query coverage = 95% and minimum identity = 95%. Taxonomic group assignment of reads resembling GenBank sequences from environmental clones was obtained using the web-based software package KeyDNAtools as described by Guillou et al. (2008). Nontarget reads belonging to viruses, bacteria or metazoa were removed from further analyses.
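The acceptance rule for a best hit can be sketched as follows. This is an illustration under stated assumptions rather than the code used in the study: the Hit record and the function best_accepted_hit are hypothetical names, the best hit is taken to be the one with the highest bit score, and coverage and identity are expressed as percentages.

```python
from typing import NamedTuple, Optional, Sequence


class Hit(NamedTuple):
    """One BLASTN hit for a query read (hypothetical record layout)."""
    subject_id: str
    evalue: float
    query_coverage: float  # percent of the query covered by the alignment
    identity: float        # percent identity of the alignment
    bit_score: float


def best_accepted_hit(
    hits: Sequence[Hit],
    max_evalue: float = 1e-5,
    min_coverage: float = 95.0,
    min_identity: float = 95.0,
) -> Optional[Hit]:
    """Return the top hit if it meets all thresholds, otherwise None."""
    if not hits:
        return None
    best = max(hits, key=lambda h: h.bit_score)  # assumption: best = highest bit score
    if (
        best.evalue <= max_evalue
        and best.query_coverage >= min_coverage
        and best.identity >= min_identity
    ):
        return best
    return None  # read is left unassigned and dropped from phylotype analyses
```

Only the top hit is tested, so a read whose best match falls below a threshold is discarded even if a lower-ranked hit would pass. This mirrors the description in the text, where reads are judged on their top match.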

Assignment of similarity-based OTUs and species richness estimators

Sequence reads from each sample were clustered into similarity-based OTUs using cd-hit-est (Li and Godzik, 2006) with the minimum sequence identity set to 90, 95, 98 or 100% (corresponding to cluster distances of 0.10, 0.05, 0.02 and 0, respectively). Parametric rarefaction curves were calculated in steps of 1000 specimens using Analytic Rarefaction (v 1.3, http://www.uga.edu/strata/software/anRareReadme.html). The nonparametric species richness estimators ACE and classic Chao1 were calculated using EstimateS (v 8, http://viceroy.eeb.uconn.edu/estimates).
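For readers unfamiliar with these estimators, the sketch below shows how the classic Chao1 estimate and an analytic rarefaction curve can be computed from a list of OTU sizes (number of reads per cluster). It is a minimal illustration, not a reimplementation of EstimateS or Analytic Rarefaction. The function names are hypothetical, the fallback used when no doubletons are present is the commonly used bias-corrected form, and the rarefaction formula is the standard hypergeometric expectation.

```python
import math
from typing import Sequence


def chao1(cluster_sizes: Sequence[int]) -> float:
    """Classic Chao1: S_obs + F1^2 / (2 * F2), where F1 and F2 are the
    numbers of singleton and doubleton OTUs."""
    s_obs = len(cluster_sizes)
    f1 = sum(1 for size in cluster_sizes if size == 1)
    f2 = sum(1 for size in cluster_sizes if size == 2)
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2  # assumed fallback when there are no doubletons


def _log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k), stable for large n."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def rarefied_richness(cluster_sizes: Sequence[int], n: int) -> float:
    """Expected number of OTUs in a random subsample of n reads:
    sum over OTUs of 1 - C(N - N_i, n) / C(N, n)."""
    total = sum(cluster_sizes)
    if not 0 <= n <= total:
        raise ValueError("subsample size must lie between 0 and the total read count")
    log_denominator = _log_comb(total, n)
    expected = 0.0
    for size in cluster_sizes:
        if total - size < n:
            expected += 1.0  # too few other reads: this OTU is always sampled
        else:
            expected += 1.0 - math.exp(_log_comb(total - size, n) - log_denominator)
    return expected
```

Evaluating rarefied_richness at n = 1000, 2000, and so on up to the total read count gives the points of a rarefaction curve, and dividing the observed OTU count by chao1 gives the fraction of estimated richness that was detected.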

Results

Characteristics of the pyrosequencing run

A total of 188 303 sequence reads with an average length of about 260 bp were generated in a single run of 454 pyrosequencing from the two water samples (Table 1). The filtering process removed about 12% of the raw sequence reads, leaving 87 789 and 76 817 high-quality target tags from the MB and TH samples, respectively. The average tag length increased to about 272 bp. After BLAST searching, 61 671 and 48 208 sequences with at least 95% query coverage and at least 95% top match identity remained for the samples collected from MB and TH, respectively.
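The retention figures can be checked with a few lines of arithmetic. The sketch below uses only the read counts quoted in this paragraph; the variable names are illustrative.

```python
# Read counts quoted in the text.
raw_reads = 188_303
filtered = {"MB": 87_789, "TH": 76_817}        # tags left after quality filtering
blast_retained = {"MB": 61_671, "TH": 48_208}  # tags left after the BLAST thresholds

# Fraction of raw reads removed by quality filtering (both samples combined).
removed_by_filter = 1 - sum(filtered.values()) / raw_reads

# Fraction of filtered tags lost at the BLAST step, per sample.
lost_at_blast = {
    site: 1 - blast_retained[site] / filtered[site] for site in filtered
}

print(f"removed by filtering: {removed_by_filter:.1%}")
for site, fraction in lost_at_blast.items():
    print(f"{site}: {fraction:.1%} of filtered tags lost at the BLAST step")
```

The per-sample losses at the BLAST step correspond to the proportions of trimmed tags that were excluded because their top match fell below the coverage or identity threshold.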

Composition and distribution of the picoeukaryotic assemblages

Trimmed tags were assigned to at least 19 high-level taxonomic groups (Supplementary Table 1), and over 93% of them fell into one of nine major groups (Figure 1). Stramenopiles, dinoflagellates, ciliates and prasinophytes were the dominant groups, comprising about 27%, 19%, 11% and 11% of the picoeukaryotes, respectively. Whereas stramenopiles and dinoflagellates together contributed more than half (55.9%) of the total community in the sample from TH, five groups (stramenopiles, prasinophytes, dinoflagellates, ciliates and the novel alveolates group III (NAGIII)) made similar contributions (each between 10% and 25%) in the sample from MB. NAGI, NAGIII and prasinophytes were at least threefold more abundant in MB than in TH. Conversely, picobiliphytes and cercozoans were at least twice as abundant in TH as in MB. The relative contributions of NAGII, ciliates and dinoflagellates were broadly comparable between the samples from TH and MB. Dinophyceae and stramenopiles were more abundant in TH, but the relative contributions of their phylotype OTUs were higher in MB (Supplementary Figure). In contrast, NAGIII was more abundant in MB than in TH, although the two samples had comparable contributions of its phylotype OTUs.

Figure 1

Relative abundance of the nine dominant high-level taxonomic groups of picoeukaryotes. MB, Mirs Bay; TH, Tolo Harbor; NAG, novel alveolates group.

In both samples, the top 20 phylotype OTUs each contributed at least 1% and together represented about 60% of the entire picoeukaryote community (61.7% in MB and 56.7% in TH) (Table 2). In addition, more than half of the OTUs represented uncultured environmental clones (MB, 14; TH, 13). Although sequence reads resembling six GenBank entries (Pseudochattonella verruculosa, Verrucophora farcimen, Pterosperma cristatum, uncultured Dinophyceae clone dhot2b10, AMT15_1B_26 and uncultured NAGII clone SA1_4G10) were common in both bays, the abundance of all other top phylotype OTUs differed between TH and MB. More than half of the top phylotype OTUs were at least four times more abundant in terms of percentage contribution in one sample than in the other.

Table 2 Top 20 phylotype OTUs in samples from the two sampling sites

Diversity of the picoeukaryotic assemblages

The coverage of the libraries at a cluster distance of 0.05 was high for both samples, with rarefaction curves reaching saturation. However, rarefaction curves at a cluster distance of 0.02 were still rising (Figure 2). At a cluster distance of 0.05, the numbers of OTUs detected were generally close to those estimated by the nonparametric ACE and Chao1 estimators, with detected OTUs amounting to over 75% of the estimated OTUs (Table 3). At a cluster distance of 0.02, the difference between the observed and estimated values was large. The difference was largest when only identical tags were grouped into unique OTUs, in which case detected OTUs amounted to less than 20% of the estimated OTUs.

Figure 2

Rarefaction curves of similarity-based operational taxonomic units (OTUs) at cluster distance values of 0.02, 0.05 and 0.10. MB, Mirs Bay; TH, Tolo Harbor.

Table 3 Similarity-based OTUs and species richness estimates of the two libraries

Discussion

Methodology

In recent years, the cloning and sequencing approach has been widely used to study the diversity and community composition of picoeukaryotes (Díez et al., 2001; López-García et al., 2001; Moon-van der Staay et al., 2001). However, with an examination of typically 100 clones per library (Vaulot et al., 2008), the data generated by this approach only provide general information on the structure of picoeukaryote communities but are not sufficient for meaningful comparisons among libraries. In this study, water samples were collected from two sites of different trophic status and about 100 000 sequence reads were derived from each water sample to facilitate an in-depth investigation into the structure of the picoeukaryote assemblages. However, as only one sample per site was examined, conclusions about picoeukaryotic communities are limited to the respective samples.

In a previous study at the same two sites, picoeukaryotic diversity was studied using the traditional cloning and sequencing method (Cheung et al., 2008). Partial sequencing of PCR products amplified with the widely adopted primer set Euk328f and Euk329r (Moon-van der Staay et al., 2001), using the primer Euk528f (Elwood et al., 1985), allowed us to recover at least 19 high-level taxonomic groups of picoeukaryotes. In this study, at least 19 high-level taxonomic groups were recovered using the primer set A-528F and B-706R, which targets the same hypervariable V4 region (Supplementary Table 1). Although the dominant groups were retrieved by both methods, sequences belonging to nucleomorphs and ellobiopsids were detected only in the previous study, and sequences belonging to NAGIII, centrohelids and some groups of chlorophytes other than prasinophytes were recovered only in this study. However, except for NAGIII, which is a newly defined alveolates group (Guillou et al., 2008), all of these groups were represented only in low abundance. The results indicate that the two methods, which use different primer sets targeting the same DNA region, give comparable coverage of high-level taxonomic groups.

It is difficult to assign a correct taxonomic identity to tags that are too divergent from the sequences available in the reference database, particularly when the database is far from exhaustive, as is the case for the eukaryotic SSU rRNA database used here. Thus, after BLAST searching, sequences that showed a query coverage or top match identity below 95% were excluded from further analyses. This stringent process removed about 30% of the MB and 37% of the TH trimmed tags but secured a more accurate taxonomic identification of the remaining reads. Indeed, a significant proportion (up to about 30%) of fairly divergent sequence tags, with GAST distances greater than 0.10, was also recovered in a previous study of protists using the 454 technology (Amaral-Zettler et al., 2009). It is worth noting that as more sequences accumulate in the database, the assigned taxonomic identity of tags with relatively low similarity to existing entries may change. Therefore, the results presented here are limited by our current knowledge of picoeukaryotes.

Intrinsic errors of the 454 pyrosequencing technology could lead to overestimation of species richness. We followed the quality filtering strategy proposed by Huse et al. (2007), which ensures that the per-base error rates of GS20 pyrosequencing reads are comparable to or lower than those of conventional Sanger sequencing. However, an examination of the 16S rRNA genes of a single Escherichia coli strain by Kunin et al. (2009) suggested that diversity was still being overestimated in pyrosequencing data sets after the quality filtering process. According to their analysis, our diversity estimates at cluster distances of 0.10, 0.05 and 0.02 may have been overestimated by twofold, threefold and more than ninefold, respectively.

Diversity of the picoeukaryotic assemblages

Parametric rarefaction curves and nonparametric ACE and Chao1 estimators were used to estimate OTU richness in this study. The nonparametric richness estimators have been used in diversity studies of picoeukaryotes/protists only recently (Massana et al., 2004; Zuendorf et al., 2006; Countway et al., 2007). Using a 95% phylotype cutoff, Countway et al. (2007) studied protistan diversity from 18S rRNA clone libraries and reported ACE and Chao1 values of around 400 for the euphotic zone of the Gulf Stream. Using 454 pyrosequencing technology and the same phylotype cutoff, Amaral-Zettler et al. (2009) obtained incidence-based ICE and Chao2 estimates of about 4000 for protistan diversity in a polluted estuary site off the coast of Massachusetts. For picoeukaryotic diversity, Massana et al. (2004) reported one of the highest Chao1 estimates of nearly 200 in an oligotrophic coastal site at Blanes Bay by grouping clones sharing the same RFLP patterns as unique OTUs. In this study, the calculated ACE and Chao1 estimates for picoeukaryotic diversity in subtropical coastal waters ranged from 700 in the TH sample at a cluster distance of 0.10 to 6600 in the MB sample at a cluster distance of 0.02. These values are generally much higher than those published by other investigators for picoeukaryotic/protistan diversity. However, the above comparison is rough as it is difficult to compare estimates based on different OTU defining and/or sequencing strategies. Also, in contrast to prokaryotes, the copy number of SSU rRNA genes varies widely among eukaryotes (Zhu et al., 2005), which in turn may affect the estimates calculated. According to the correlation between cell length and SSU rRNA gene copy numbers of eukaryotes estimated by QPCR, picoeukaryotes are expected to have less than 10 copies of SSU rRNA genes.

Rarefaction curves predict that additional sampling would lead to significantly higher estimates of total diversity at a cluster distance of 0.02, even after examining nearly 100 000 sequences per sample (Figure 2). As this threshold corresponds roughly to the genus/species level of picoeukaryotes (Romari and Vaulot, 2004), the results suggest that further increasing the sampling effort would reveal more genera/species of picoeukaryotes in the water samples. This finding supports the view that the diversity of picoeukaryotes is unexpectedly high and still largely unknown (Mackey et al., 2002).

A unimodal relationship between diversity and productivity for herbaceous vegetation has long been observed in terrestrial systems (Grime, 1973). Recent analysis of a global data set has revealed a similar pattern between marine phytoplankton diversity and phytoplankton biomass (Irigoien et al., 2004). Lefranc et al. (2005) used the cloning and sequencing method to study the genetic diversity of small eukaryotes (<5 μm) in three lakes in France and reported a higher diversity in the oligomesotrophic lake than in the oligotrophic and eutrophic lakes. In this study, both parametric rarefaction curves and nonparametric richness estimators revealed a higher diversity of picoeukaryotes in the oligomesotrophic MB than in the eutrophic TH (Table 3, Figure 2). This finding agrees with that of Lefranc et al. (2005) and is consistent with the classic unimodal diversity–productivity relationship.

Composition and distribution of the picoeukaryotic assemblages

Water samples collected from two bays along the subtropical Pacific coast harbored different assemblages of picoeukaryotes, as revealed by the few shared top phylotype OTUs (Table 2) as well as the differential contribution by the most represented groups (Figure 1).

Prasinophytes recovered from this study (Table 2) and from other environmental genetic libraries (Worden, 2006) included mainly uncultured Bathycoccus, Micromonas and Ostreococcus spp. within the order Mamiellales. These organisms are more common in coastal areas than in open oceans (Not et al., 2005). In our study, prasinophytes were more abundant in open waters of MB than in the eutrophic waters of the semi-enclosed TH (Figure 1). A similar result was found in our previous study using the cloning and sequencing strategy (Cheung et al., 2008). Viprey et al. (2008) reported a greater dominance by Micromonas and Ostreococcus environmental sequences in relatively mesotrophic waters than in oligotrophic coastal waters. Taken together, these results suggest that these genera are well adapted to coastal waters of intermediate productivity, although the role of water temperature should not be neglected (Lovejoy et al., 2007).

Picobiliphytes are a recently defined algal group with a phycobiliprotein-containing plastid (Not et al., 2007). Our previous study showed that picobiliphytes are prevalent in eutrophic waters (Cheung et al., 2008), and this finding is supported by the results obtained in this study (Figure 1). The repeated recovery of picobiliphytes in our samples also provides further evidence of their prevalence in subtropical waters (Cheung et al., 2008; Cuvelier et al., 2008). NAGIII, a newly defined alveolates group, contains environmental sequences from oceanic and coastal waters (Guillou et al., 2008). Because the group was discovered only recently, little is known about its ecology. Our results suggest that it is more abundant in waters of lower productivity (Figure 1), but more evidence is needed to confirm this.

Most of the top phylotype OTU matches in the water sample from MB were from environmental studies performed in oceanic or coastal waters (Supplementary Table 2). In contrast, a noticeable number of OTUs in the water sample from TH were retrieved either from anoxic sediments or from water samples collected during algal blooms. For instance, the third most abundant phylotype OTU in the MB sample (uncultured prasinophyte clone PROSOPE.C1-80m.96) was retrieved from the Mediterranean Sea (Viprey et al., 2008), whereas the third most abundant OTU in the TH sample (uncultured ciliate clone 9_153) was recovered from the anoxic layer of the East Sea sediment in the northwestern Pacific (Park et al., 2008). In addition, stramenopiles, which have been found abundantly in anoxic and extreme environments (Stoeck and Epstein, 2003; Luo et al., 2005), seemed to dominate the TH sample (Figure 1). Because high levels of phytoplankton biomass could lead to hypoxia/anoxia (Roman et al., 1993), the presence of OTUs with high adaptability to anoxic environments in the eutrophic waters of TH is not unexpected.

Conclusions

In this study, we successfully applied the massively parallel 454 sequencing-by-synthesis technology to describe the composition and genetic diversity of marine picoeukaryotes off the subtropical western Pacific coast. Our results showed that, even after examining nearly 100 000 sequences per sample, a greater sampling effort would likely reveal more taxa of picoeukaryotes. Water samples of different trophic status contained different high-level taxonomic groups and phylotype OTUs of picoeukaryotes. With the ultrahigh resolution provided by 454 pyrosequencing, more in-depth examinations of picoeukaryotic diversity in various aquatic ecosystems are possible. In the future, this technology can be used to answer ecological questions related to spatial and temporal patterns and to identify ecological responses to changing environmental conditions, both of which require an accurate characterization of community composition.