Introduction

Protozoa are essential organisms in soil ecosystems, primarily because they have a significant role in the soil food web as bacterial grazers (Ekelund and Rønn, 1994). Moreover, protozoan diversity is a good index of ecosystem function (Griffiths et al., 2000); in particular, it has been suggested that the high diversity allows the soil protozoan community to respond to both seasonal and environmental change (Bamforth, 1995). Because protozoa are fundamentally aquatic organisms, one would a priori expect them to be negatively affected by drought; however, the empirical evidence is ambiguous. Whereas Eisenhauer et al. (2012) reported that drought decreased protozoan abundance, Schmitt and Glaser (2011) found that drought increased protozoan diversity. Because of the large unknown protozoan diversity, and the difficulty in identifying many species morphologically, it has been problematic to evaluate such contrasting views.

However, the fast development in high-throughput sequencing techniques (HTS) now offers tools to answer such questions; still, only few such studies have specifically targeted soil protozoa. One reason for this is that heterotrophic soil protozoa include members from at least four kingdoms/supergroups (Baldauf et al., 2000). Therefore, they can inherently only be amplified with general eukaryotic primers, which will unavoidably also amplify DNA from other organisms abundant in soil. Thus, sequences from metazoans, fungi and plants often dominate samples, and must be removed prior to further analysis. Further, in case of micro-eukaryotes, we only have a comprehensive reference database for the small subunit ribosomal RNA (18S) gene. The copy number of this gene may vary considerably between different eukaryotic groups (Zhu et al., 2005), which complicates quantitative comparison of community composition based on sequence abundance.

Here, we tackle these problems by targeting a specific protozoan group, the Cercozoa; for a phylogenetic overview of this group see Bass et al. (2009b). Cercozoa encompass a high morphological diversity. They include large testate amoebae such as Euglypha and Trinema, naked filose/reticulate amoebae such as vampyrellids, granofiloseans and Filoreta, and gliding flagellates such as cercomonads, glissomonads and thaumatomonads (Bass et al., 2009a; Hess et al., 2012; Berney et al., 2013). Owing to this high morphological and physiological diversity, Cercozoa also show high functional and ecological diversity. Hence, changes in the relative abundance of different cercozoan groups could potentially be a valuable indicator of environmental change. For example, paleohydrological studies have shown that testate amoebae are sensitive indicators of water content in peat bogs (Charman and Warner, 1992; Booth, 2001) and there is evidence that testate amoebae also respond negatively to drought in more arid soils (Lousier, 1974a, b; Wilkinson and Mitchell, 2010). Hence, it is likely that testate Cercozoa, such as Euglypha and Trinema, would decrease in abundance in response to drought.

Morphological methods have long suggested that Cercozoa is one of the dominant groups of free-living eukaryotic microorganisms in temperate soils (Sandon, 1927; Ekelund et al., 2001). This has been confirmed by recent HTS-based studies. Bates et al. (2013) found that Cercozoa accounted for ~30% of the identifiable protozoan 18S reads in arid or semi-arid soils and ~15% in more humid soils. In a transcriptomic analysis of soil protist activity, Geisen et al. (2015) found that 40–60% of all identified protozoan small subunit ribosomal RNAs in forest and grassland soils could be assigned to Cercozoa. Cercozoa are also abundant in marine benthic and interstitial communities; a recent HTS study using general eukaryotic primers found Cercozoa to comprise between 9 and 24% of all assigned eukaryotic operational taxonomic units (OTUs) on the ocean floor (Pawlowski et al., 2011).

Several recent papers have named many new cercozoan taxa at species, genus and family level from temperate topsoil (Bass et al., 2009b; Howe et al., 2009, 2011; Chatelain et al., 2013). Sanger sequencing of environmental DNA in Cercozoa (Bass and Cavalier-Smith, 2004) has shown a further undescribed diversity. However, the number of sequences obtained by this method is low compared with HTS methods, and though HTS is now ubiquitous in studies of microbial ecology, to our knowledge no HTS-based study has yet targeted Cercozoa directly. Reasons for this likely include lack of standard protocols with consensus on using a suitable region of the 18S ribosomal RNA gene and lack of good primers. Moreover, existing reference databases with sequences for OTU identification are rather limited. For example, as of November 2015, the Silva database contained only ~300 full-length cercozoan 18S sequences.

Here we present an HTS-based analysis of soil Cercozoa. We first identified the best suited 18S region in Cercozoa; next we designed appropriate primers to target this region. We then used these primers to target soil Cercozoa directly in an experiment where we tested the hypothesis that testate Cercozoa respond negatively to drought. To annotate sequences, we used an as yet unpublished database (David Bass, in preparation) that contains 966 high-quality cercozoan 18S sequences, which cover the entire group. We hope that this work provides the theoretical and practical foundations needed to establish a framework for future comparative molecular analyses of cercozoan diversity.

Materials and methods

Study site and soil sampling

Projected climate change for Denmark in the 21st century indicates drier summers, which are experimentally simulated on the CLIMAITE study site (Larsen et al., 2011). The CLIMAITE experimental site is situated in a dry heath-/grassland 50 km NW of Copenhagen, Denmark (55° 53′ N, 11° 58′ E). The mineral fraction of the soil consists of 92% sand, 5.8% silt and 2.2% clay (Nielsen et al., 2009). The site is well drained with an organic top layer (O-horizon). The pHCaCl2 in the O-horizon is 3.3 increasing to 4.5 in the lower B-horizon. The dominant vegetation consists of the dwarf shrub Calluna vulgaris (c. 30% cover) and the perennial grass Deschampsia flexuosa (c. 70% cover). The annual mean temperature is 8 °C with a mean precipitation of 613 mm (Danish Meteorological Institute, www.dmi.dk). Since 2005, a complete three-factorial treatment with increased CO2, temperature and summer drought has been maintained in 12 four-chamber octagons (7 m in diameter), where the fourth octagon is a control plot with no treatments. Each treatment is replicated six times. These treatments are intended to mimic the projected climate change for the region in 2075. Drought is induced once or twice a year by automatic rain shelters, which exclude precipitation continuously for 2–5 weeks until the water content drops below 5% by volume in the upper 20 cm of the soil. Larsen et al. (2011) provide a detailed description of the experiment.

In November 2010, we sampled topsoil (0–8 cm, O-horizon) from two of the six drought plots 1–2 (Dry1) and 3–1 (Dry2) and from two of the control plots 9–1 (Control1) and 11–4 (Control2). To minimise the spatial variation, we pooled and mixed three subsamples from each plot for the subsequent analyses.

Comparisons of hypervariable regions

The entire 18S region typically spans a length of ~2000 bp, whereas HTS methods have so far been limited to amplicons of at most 500–700 bp. Therefore, it is necessary to identify good representative parts of the 18S region for HTS analyses. Prime candidates are the eight hypervariable regions labelled V1–V5 and V7–V9 (the V6 hypervariable region found in bacterial 16S is absent from eukaryotic 18S; Howe et al., 2011). As the V4 region is the longest of the eight hypervariable regions, we found it the most attractive for taxonomic annotation. However, we wished to make sure that the diversity in V4 correlated reasonably well with the total 18S diversity as compared with the other hypervariable regions.

Choosing an appropriate clustering threshold constitutes another special problem in HTS analyses, because a percentage-wise separation threshold must be chosen when clustering the obtained sequences into OTUs. To obtain a robust theoretical foundation for HTS analysis, we first obtained two data sets of named cercozoan 18S Sanger-generated sequences. In June 2015, we obtained one set of 63 species from GenBank. This set contained the whole 18S region including all eight hypervariable regions in their entirety. We used the 63-sequence set to identify the V4 region as the best representative of the entire 18S diversity. The other, larger set consisted of 193 partial sequences that were at least 1500 bp long and all contained the V4 region. We included only named species documented in recent papers (Ekelund et al., 2004; Hoppenrath and Leander, 2006; Lara et al., 2007; Wylezich et al., 2007; Bass et al., 2009a, b; Burki et al., 2010; Chantangsi and Leander, 2010a, b; Heger et al., 2010; Heger et al., 2011; Howe et al., 2011; Yabuki and Ishida, 2011). This sequence set represented all nine cercozoan classes sensu Cavalier-Smith and Chao (2003), and contained no duplicate names or synonyms. We used this set of 193 sequences to evaluate the interspecific distances in the 18S region most suitable for separation of OTUs in the Cercozoa, and to test the effects of different OTU separation thresholds. Names and accession numbers of the 63+193 sequences are listed in Supplementary data (tables 1 and 2).

To identify the precise position of the hypervariable regions in the Cercozoa, we used the E-ins-i algorithm in MAFFT (Katoh and Standley, 2013) to align the 63 sequences along with the complete sequence of the fungus AF258606 Scytalidium hyalinum, which had the start and end of each of the V1–V9 regions fully annotated in its documentation on GenBank. Using the command dist.seqs in MOTHUR (Schloss et al., 2009), counting all indels as one event without penalising end gaps, we then calculated all possible uncorrected P-distances between the sequences for the whole 18S and for each of the eight hypervariable regions V1–V9, and correlated these P-distances for each region with those for the complete 18S. In this manner, we tested how well the genetic diversity of the respective regions correlated with that of the complete 18S region. All graphics and statistics were done in R (Ihaka and Gentleman, 1996).
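The distance and correlation steps described above can be sketched as follows. This is a minimal illustration in Python, not the MOTHUR or R code used in the study: the function names, the handling of columns where both sequences carry a gap, and the toy alignment are all assumptions made for the example.

```python
from itertools import combinations

GAP = "-"

def p_distance(a, b):
    """Uncorrected p-distance between two rows of one alignment.

    A run of gaps in one sequence counts as a single difference, and
    leading or trailing gaps are not penalised. Columns that are gaps
    in both sequences are skipped (an assumption of this sketch).
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    # Restrict the comparison to columns where both sequences have started
    # and neither has ended.
    start = max(len(a) - len(a.lstrip(GAP)), len(b) - len(b.lstrip(GAP)))
    end = min(len(a.rstrip(GAP)), len(b.rstrip(GAP)))
    differences = compared = 0
    open_gap = None  # which sequence currently has an open gap run
    for x, y in zip(a[start:end], b[start:end]):
        if x == GAP and y == GAP:
            continue
        if x == GAP or y == GAP:
            side = "a" if x == GAP else "b"
            if side != open_gap:  # a new indel event starts here
                differences += 1
                compared += 1
            open_gap = side
        else:
            open_gap = None
            compared += 1
            differences += x != y
    return differences / compared if compared else 0.0

def pairwise(seqs):
    """All pairwise p-distances, in itertools.combinations order."""
    return [p_distance(a, b) for a, b in combinations(seqs, 2)]

def r_squared(x, y):
    """Squared Pearson correlation between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy * sxy / (sxx * syy)

# Hypothetical toy alignment; columns 2-7 stand in for one hypervariable region.
toy = ["ACGTACGTAC", "ACGTACGAAC", "ACCTACGAAT", "TCCTTCGAAT"]
full_distances = pairwise(toy)
region_distances = pairwise([s[2:8] for s in toy])
r2 = r_squared(region_distances, full_distances)
```

In the study itself the same comparison was made for each of the eight hypervariable regions against the complete 18S alignment of the 63 sequences.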

Sequence variation between known species

To examine the congruence between already described Cercozoa and the genetic distances in V4, we aligned the data set of 193 sequences with MAFFT using the E-ins-i algorithm and calculated the uncorrected P-distances for all 18 528 pairs in MOTHUR. We then clustered the V4 region of these 193 sequences with the furthest-neighbour algorithm (implemented in MOTHUR) for all thresholds between 0 and 10%.
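As an illustration of the clustering step, the sketch below implements a plain furthest-neighbour (complete-linkage) procedure in Python and confirms the number of sequence pairs. It is not the MOTHUR implementation, which applies its own rounding and cutoff conventions; the function name and the four-sequence distance table are hypothetical.

```python
from itertools import combinations
from math import comb

def furthest_neighbour_otus(dist, n, threshold):
    """Cluster n sequences into OTUs by furthest-neighbour linkage.

    dist maps frozenset({i, j}) to the distance between sequences i and j.
    Two clusters are merged only while the largest distance between any
    of their members does not exceed the threshold.
    """
    clusters = [{i} for i in range(n)]

    def linkage(c1, c2):
        return max(dist[frozenset((i, j))] for i in c1 for j in c2)

    while len(clusters) > 1:
        d, a, b = min(
            (linkage(c1, c2), a, b)
            for (a, c1), (b, c2) in combinations(enumerate(clusters), 2)
        )
        if d > threshold:
            break
        clusters[a] |= clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters

# Number of unordered pairs among the 193 named sequences.
n_pairs = comb(193, 2)

# Hypothetical distances between four sequences.
toy_dist = {frozenset(pair): 0.20 for pair in combinations(range(4), 2)}
toy_dist[frozenset((0, 1))] = 0.01
toy_dist[frozenset((0, 2))] = 0.04
toy_dist[frozenset((1, 2))] = 0.06

otus_at_5_percent = furthest_neighbour_otus(toy_dist, 4, 0.05)
```

In the toy data, sequence 2 lies within 5% of sequence 0 but 6% from sequence 1, so furthest-neighbour linkage keeps it in a separate OTU at the 5% threshold, whereas a nearest-neighbour rule would have merged it.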

DNA extraction, primer design and initial cloning check

We extracted DNA from 0.5 g of fresh soil within 24 h of soil sampling. We used a genomic mini spin kit for universal DNA isolation (A&A biotechnology, Gdynia, Poland) with a standard protocol (Yu and Mohn, 1999). Based on the 193-sequence alignment, we designed primers that would amplify the majority of named key soil cercozoan genera within Granofilosea, Imbricatea, Cryomonadida, Cercomonadida, Glissomonadida and Euglyphida with no—or in some cases one—mismatch in the primer sequence. We accepted a slight bias against some members within these taxa, and against some genera, for example, Cyphoderia, Platyreta, Arachnula and Filoreta (two mismatches) and some bias against Chlorarachniophyta, Phytomyxea and Ascetosporea (notably Haplosporida and Mikrocytida) and other endomyxan lineages, and the genera Helkesimastix, Sainouron, Cholamonas (Cavalier-Smith et al., 2009), Reticulamoeba (Bass et al., 2012), and Rosculus and Guttulinopsis (Bass et al., submitted) with three or more mismatches in each primer.

From the alignment of all these genera, we used the representative sequence AF411270 Cercomonas longicauda as template in Primer3 (Rozen and Skaletsky, 2000) to find two compatible primers: the forward primer Cerc479F (5’-TGTTGCAGTTAAAAAGCTCGT-3’, Tm=57.8 °C) and the reverse primer Cerc750R (5’-TGAATACTAGCACCCCCAAC-3’, Tm=57.5 °C). To check the specificity of the primers, we performed an initial PCR and cloning of 50 sequences. The PCR master mix (25 μl) consisted of 1 × High Fidelity buffer (Invitrogen, Carlsbad, CA) with MgCl2, 0.25 mM deoxynucleotides mixture, 1 μl 100 × bovine serum albumin, 0.5 IU Phusion Hot Start DNA polymerase (5 units μl–1, Invitrogen), 0.4 μM of each primer and 1 μl DNA template. The PCR incubation conditions consisted of an initial denaturation step of 94 °C for 5 min; 30 cycles of denaturation at 94 °C for 60 s, annealing at 55 °C for 60 s and elongation at 68 °C for 60 s; and finally, an extension step of 72 °C for 7 min. We chose to lower the annealing temperature from the theoretical optimum of the primers to compensate for the mismatches. Cloning was performed using the TOPO TA Cloning Kit from Invitrogen, and sequencing of 50 supposedly positive clones from this PCR was done by MACROGEN in Seoul, South Korea.
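The mismatch criterion used when evaluating the primers can be illustrated with a short Python sketch. The two primer sequences are those given above; the helper names and the target-site sequence are hypothetical and serve only to show the counting.

```python
CERC479F = "TGTTGCAGTTAAAAAGCTCGT"  # forward primer, 5'-3'
CERC750R = "TGAATACTAGCACCCCCAAC"   # reverse primer, 5'-3'

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def count_mismatches(primer, site):
    """Number of positions at which a primer differs from its binding site."""
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    return sum(p != s for p, s in zip(primer, site))

def gc_percent(seq):
    """GC content of a sequence in percent."""
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)

# The reverse primer binds the opposite strand, so on the sense strand of
# the template its site reads as the reverse complement of the primer.
reverse_site_on_sense_strand = reverse_complement(CERC750R)

# Hypothetical forward binding site differing from Cerc479F at two positions.
hypothetical_site = "TGTTGCAGTTAAGAAGCTCGC"
mismatches = count_mismatches(CERC479F, hypothetical_site)
```

A genus whose binding site looked like the hypothetical one would fall in the two-mismatch category described above.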

DNA amplification and GS-FLX Pyrosequencing

The samples were prepared for GS-FLX pyrosequencing in a two-step PCR. We used a Platinum Taq DNA High Fidelity polymerase (5 units μl−1, Invitrogen); otherwise the master-mix and the PCR incubation conditions were as above. To eliminate as many primer-dimers as possible, the products were incubated at 70 °C for 4 min and then stored immediately on ice before electrophoresis. We loaded the PCR products on a 1% agarose gel with ethidium bromide, which confirmed the presence of a single band in the desired length of ~250–300 bp with ultraviolet illumination. The bands of PCR products were excised from the agarose gel and purified by the Montage DNA Gel Extraction kit (Millipore, Bedford, MA).

The second PCR amplification was performed with fusion primers consisting of the raw primers above with the B-adaptors and four MID-tag barcodes of 10 bp added to the forward primer, using only 15 PCR cycles. Otherwise, PCR incubation conditions, electrophoresis, gel excision and purification were as above. The amplified DNA from the second PCR was quantified with the Qubit dsDNA HS Assay Kit and the Qubit fluorometer (Invitrogen, Life technologies, Carlsbad, CA, USA) and mixed in approximately equal molar concentration (5 × 10⁶ copies μl−1) to ensure an approximately equal representation of sequences from each sample. A GS-FLX Titanium sequencing run was then performed on a 70_75 GS PicoTiterPlate (PTP) using a GS-FLX Titanium pyrosequencing system according to manufacturer instructions (Roche Diagnostics, Basel, Switzerland) at the National High-throughput DNA Sequencing Centre (Copenhagen, Denmark).
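The conversion from a fluorometric mass concentration to a copy number can be sketched as below. This is only an illustration of the arithmetic behind equimolar pooling: the average mass of 650 g/mol per base pair is a common approximation, and the amplicon length and sample concentrations are hypothetical values, not measurements from the study.

```python
AVOGADRO = 6.02214076e23       # molecules per mole
MASS_PER_BP = 650.0            # g/mol per base pair (assumed average)
TARGET_COPIES_PER_UL = 5e6     # pooling concentration stated in the text

def copies_per_ul(ng_per_ul, length_bp):
    """Double-stranded DNA copies per microlitre from a mass concentration."""
    grams_per_ul = ng_per_ul * 1e-9
    return grams_per_ul * AVOGADRO / (length_bp * MASS_PER_BP)

def dilution_factor(ng_per_ul, length_bp, target=TARGET_COPIES_PER_UL):
    """Fold dilution needed to bring a sample down to the target copy number."""
    return copies_per_ul(ng_per_ul, length_bp) / target

# Hypothetical Qubit readings (ng/ul) and a hypothetical amplicon length
# that includes adaptor and MID-tag sequence.
readings = {"Control1": 2.4, "Control2": 3.1, "Dry1": 1.8, "Dry2": 2.9}
amplicon_bp = 330
dilutions = {name: dilution_factor(c, amplicon_bp) for name, c in readings.items()}
```

Diluting each sample by its own factor brings all four to the same copy number, so equal volumes then contribute equal numbers of template molecules to the pool.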

Bioinformatic analyses

In several taxonomic groups, including Rhizaria, error rates primarily linked to singletons and homopolymers may cause a considerable overestimation of diversity; especially in the V4 compared with the V9 region. GS-FLX Titanium was particularly susceptible to such errors compared with the GS-FLX standard kit, even when reads are clustered up to a 3% level (Behnke et al., 2011). Hence, to eliminate such errors, we applied a strict quality sorting approach in our analysis; we eliminated singletons and long homopolymers, and chose a conservative 5% clustering threshold.

The Titanium run produced 689 988 reads. We analysed it through the QIIME pipeline (Caporaso et al., 2010) and discarded all reads that had a quality score below 25 or had any mismatches in the primer or MID-tag sequences. We also discarded reads with a length outside 200–1000 bp, as the shortest cercozoan among the 193 named species had a V4 length of 218 bp. This left 494 963 reads, which were run through ACACIA (Bragg et al., 2012) to discard homopolymers >6 bp. Chimeras were removed with UCHIME (Edgar et al., 2011). These two steps removed a further 22 254 and 12 310 reads, respectively. The rest were clustered at 5% with UCLUST (Edgar, 2010), and 838 post-clustering singletons were subsequently discarded. Representative sequences from the resulting 1745 OTUs were searched against GenBank using blastn (Altschul et al., 1990). We removed another 160 OTUs that either had no BLAST hit (2 OTUs), were presumed chimeras with different BLAST hits for the 5’ and 3’ ends (2), had top hits to non-target organisms (17 fungi, 2 ciliates, 1 heterokont), or had a query coverage of 60% or below and thus were perhaps chimeras (136 OTUs). All the rest blasted to Cercozoa with a similarity of 80% or more to the most similar hit in GenBank. The final data set consisted of 443 350 sequences, distributed on the plots with 85 305 from Control1, 94 975 from Control2, 116 388 from Dry1 and 146 682 from Dry2. To obtain comparable data for rarefaction curves and statistical tests, we further resampled down to 85 305 sequences per plot. The data (the sff file) and barcode information have been archived on GenBank in the Sequence Read Archive under the experiment number SRX1054896.
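The resampling to equal depth can be sketched in Python as random subsampling of reads without replacement. The per-plot read totals are those reported above; the function name, the fixed seed and the toy OTU table are assumptions for illustration, and the sketch is not the QIIME routine used in the analysis.

```python
import random
from collections import Counter

# Reads per plot in the final data set, as reported in the text.
PLOT_READS = {"Control1": 85305, "Control2": 94975, "Dry1": 116388, "Dry2": 146682}
RAREFACTION_DEPTH = min(PLOT_READS.values())

def rarefy(otu_counts, depth, seed=0):
    """Subsample `depth` reads without replacement from an OTU count table."""
    pool = [otu for otu, n in otu_counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of reads in the sample")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return Counter(rng.sample(pool, depth))

# Hypothetical OTU table for one plot, rarefied to 100 reads.
toy_plot = {"OTU_a": 120, "OTU_b": 60, "OTU_c": 15, "OTU_d": 5}
rarefied = rarefy(toy_plot, 100)
observed_otus = len(rarefied)
```

Repeating the subsampling at a series of increasing depths and recording the number of observed OTUs at each depth gives the points of a rarefaction curve.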

Taxonomic affiliation of OTUs

In some groups of organisms, an argument for choosing a particular clustering threshold can be made by identifying a ‘barcoding gap’ (or barcoding window); that is, a gap between non-overlapping distributions of taxa. This approach has, for example, been used in several fungal groups (Frøslev et al., 2007; Jargeat et al., 2010; Harder et al., 2013). Unfortunately, our analysis of the 193 sequences assigned with a name shows that no such barcoding gap exists in Cercozoa (see also results and discussion), as the distribution of the interspecific diversity extends continuously all the way down to 0%. Hence, to eliminate as much artefactual diversity as possible without over-compromising phylogenetic resolution, we chose a conservative 5% level for OTU separation. We first blastn-searched the remaining 1585 OTUs locally against a custom cercozoan database (David Bass, in preparation). This enabled us to group them roughly into higher level groups approximating to Order/Class. We then used GenBank to blastn-search for representatives for these higher level groups. The top blastn hits, including as many named or otherwise characterized database sequences as possible, were retrieved and aligned with the OTUs generated in this study using the E-ins-i algorithm in MAFFT (Katoh and Standley, 2013) and phylogenetically analysed using RAxML BlackBox (Stamatakis et al., 2008). We constructed an ML tree in RAxML BlackBox (using the GTRGAMMA model) of the HTS sequences within a set of longer 18S reference sequences using the approach described in Dunthorn et al. (2014). An approach using only V4 for the whole analysis gave a similar result but with less backbone resolution because this approach removes informative data from the analysis.

We used the resulting trees to assign OTUs as far as possible to named genera, higher level groups or environmental clades. We used a similar taxonomic framework as the one used in several recent studies of cercomonads, glissomonads, Granofilosea and other rhizopodial forms, as well as Cercozoa in general (Bass et al., 2009a, b; Howe et al., 2009, 2011). We included the % sequence identity in the OTU name to indicate the degree of similarity of the OTU to the best-matched database sequence. We made no attempt to assign any OTUs to species level. This may be possible for 100% complete 18S ribosomal DNA reads, however, in the vast majority of cases it would be misleading to imply such a high resolution. We considered OTUs assigned to the orders Euglyphida, Cryomonadida, and Thecofilosea, and the genera Trinema, Rhogostoma, Corythion, Cyphoderia, Ovulinata, Euglypha, Trachelocorythion, Assulina, Pseudodifflugia, Tracheleuglypha and Ebria as testate.

Results and discussion

Primer specificity, choice of amplification region and clustering threshold

The V4 region is the longest of the 18S hypervariable regions (average V4 length=253.5 bp) and the only one long enough for serious phylogenetic analysis. For this reason we would have considered this region preferable even if it had correlated slightly worse than a much shorter region. However, the sequence variability of the V4 region also turned out to correlate more strongly with the entire 18S of the 63 complete sequences than the other hypervariable regions (R2=0.89, Figure 1). The V2 region also correlates well (R2=0.88) but is much shorter with 161.8 bp on average. All other hypervariable regions are shorter than 100 bp on average and have R2 values below 0.7 (Figure 1). This is important as the V9 region has been used extensively in HTS analyses of eukaryotic diversity (Amaral-Zettler et al., 2009; Stoeck et al., 2009; Behnke et al., 2011) and has been suggested as a prime candidate especially for measuring protist lineage richness (Amaral-Zettler et al., 2009). There were three reasons for recommending V9. First, its comparatively short length (75–150 bp) enabled sequencing with first-generation Illumina/Solexa. Second, it had an apparently lower sequencing error rate compared with the longer V4 region (Behnke et al., 2011). Third, it appeared to yield less-biased results across the broad taxonomic groups with general eukaryotic primers (Stoeck et al., 2010).

Figure 1

Linear regression of the percentage-wise DNA distance (uncorrected P-distance) of the eight hypervariable regions (V1–V9) against the entire 18S sequence from 63 complete cercozoan sequences. Dark grey shading=99% confidence intervals, light grey shading=95% confidence intervals.

However, the rapid development in HTS analyses means that the read lengths of, for example, standard Illumina MiSeq (2 × 250 bp) now easily cover any hypervariable region. Furthermore, Behnke et al. (2011) found that the error rate in V4 could be greatly reduced by rigorous elimination of singletons and conservative clustering approaches, as we did here. Finally, our analysis of the hypervariable regions in Cercozoa shows that V4 correlates much better with the entire 18S than V9. This is in line with results found for other protist groups. Thus, V4 seems more appropriate as a barcode region than V9 in specific studies on ciliates (Dunthorn et al., 2012); it has been suggested as a barcode in diatoms (Luddington et al., 2012), and has been argued to be the best ‘pre-barcoding’ region for protists as a whole (Pawlowski et al., 2012). In dinoflagellates, the V1–V4 regions outperform the other half of the 18S in taxonomic resolution (Ki, 2011). A recent study of oceanic eukaryotes with general V9 primers obtained a good representative sample across eukaryotic phyla, but close to 90% of the protist sequences could not be assigned to genus or species level (de Vargas et al., 2015). All in all, this suggests that, although V9 may be a good choice for studies that target eukaryotes using general primers, V4 (and adjacent regions) is a better choice for 18S-based diversity analyses that specifically target subgroups such as Cercozoa and several other protist groups.

However, we must stress that the V4 region is too conservative to separate several of the sequences assigned a species name in the 193-sequence set that we used. Many 18S-identical cercozoan strains exhibit consistently different phenotypes and ecological preferences, and one single 18S-type may harbour several different ITS1-types (Bass et al., 2007). Hence, application of an OTU separation threshold of close to zero would be needed to get close to identifying species such as Sandona mutans, S. dimutans, S. trimutans, S. tetramutans, S. pentamutans, or Bonamia sp. ex Ostrea chilensis and B. sp. ex Crassostrea ariakensis in clustering analyses (Figure 2a). Accordingly, V4 contains no barcoding gap (Figure 2b), as the distribution of the pairwise distances between the 193 named sequences extends continuously to below 1%. We would need ITS or another more sensitive marker to analyse diversity at this level. However, at present, the use of ITS for HTS studies in Cercozoa would be impractical because of the lack of good databases. GenBank currently contains <50 full-length ITS sequences for named cercozoan species, and in an HTS analysis one would only be able to annotate a fraction of the resulting OTUs. For the time being, HTS community analyses of Cercozoa will therefore have to be based on 18S data. To compare such analyses, researchers must agree on a specific region and on a reasonable clustering threshold.

Figure 2

(a) Number of OTUs in 193 named cercozoan morpho-species as a function of OTU separation threshold of the V4 region. (b) Histogram of the pairwise comparisons of the V4 region from 193 named cercozoan morpho-species. The dashed red line shows our separation threshold of 5%. The bulk of the interspecific diversity is well above dissimilarities of 3 or 5%, but it extends all the way down to zero with no separation window. (c) Number of OTUs as a function of OTU separation threshold in the cleaned 454 data set of 443 350 V4 sequences. Choosing a separation threshold of 0.5–1% consistent with the morpho-species concepts would retrieve >9000 OTUs. (d) Rarefaction curve of the observed species (OTUs at 5% threshold) with error bars from combining the two drought and control plots, resampled to 85 305 sequences per tag. The samples are reasonably saturated with no difference in raw diversity between the two treatments.

Our analysis shows that the decision about an appropriate clustering threshold in V4 region of Cercozoa can only be pragmatically based. Our clustering threshold of 5% is high. However, when we used the Bayesian-based ribosomal database project classifier (Wang et al., 2007) and the Silva database (at a 60% bootstrap support) to compare the 1585 OTUs at the 5% level and the 3421 OTUs returned by the 3% level, we found very little difference in the proportion of OTUs that could be assigned to the different taxonomic groups. For example, Silicofilosea comprises 17.1% of the reads at the 5%-threshold compared with 16.3% at the 3%-threshold, and the percentages of testate genera at the 5% and 3%-thresholds are 13.4 and 13.7%, respectively. As the choice of threshold had little apparent effect on our overall conclusions and could not be supported by taxonomy, we ultimately preferred the high value of 5%. This minimises the biases from sequencing artefacts in HTS sequencing (especially on the now defunct GS-FLX Titanium platform) that artificially inflate diversity. The clearly exponential decline in diversity as a function of clustering threshold in Figure 2c suggests that this effect is still substantial at the 3% level.

Still, most of the interspecific genetic variation is well above 5% in the diversity of the 193 known sequences (Figure 2b). The number of retained OTUs only declines from 165 to 152 when the dissimilarity threshold increases from 3 to 5% (Figure 2a). When the dissimilarity threshold increases from 0 to 5%, nearly all the 30–40 known species that lump in this process are congeneric and/or otherwise closely related. Hence, when we take into account both HTS error rates and genetic diversity in known reference species, we conclude that a dissimilarity threshold of 5% in cercozoan V4 sequence data is reasonable for capturing the main taxa and major functional groups.

Annotation, diversity and ecology

It is commonplace that HTS studies find a high unknown diversity of microorganisms in almost all habitats. Hence, we were not surprised to find a high level of unknown cercozoan diversity. However, we find it remarkable that even with our highly conservative data treatment, the number of OTUs on our single biotope still exceeds by a factor of more than four the number of Cercozoa that have been assigned a name based on their morphology. This is high for a single biotope by any standard or OTU separation threshold (Figure 2c). Moreover, ITS1-level diversity of Cercozoa is likely to be many times higher than that at 18S level. Thus, our study shows that an extrapolation of the existing practice of assigning names to Cercozoa harbours an immense potential for naming new hitherto unknown species.

The numbers of OTUs in the control and drought sites were almost identical (Figure 2d). However, our HTS data also allow us to evaluate the relative abundances of different Cercozoan genera on this soil type, as can be seen on the heatmaps in Figure 3. It has been suggested that because of their small size (5–10 μm), Glissomonadida are the quantitatively dominant Cercozoa in soils (Howe et al., 2009), and indeed, glissomonads are the most abundant group in our analysis (Figure 3). Among the taxa that could be identified confidently, the euglyphid testate amoeba Trinema was the most abundant genus (Figure 3b). Trinema also presented the proportionally largest changes in abundance of cercozoan genera between the two treatments. Overall, testate amoebae constituted ~19.9 and 17.7% on the two control sites and 15.9 and just below 9% on the drought sites, respectively (Figure 3a). Thus, although our results are not significant, possibly due to the small sample size, they suggest a negative response of testate amoebae to drought in accordance with previous findings (Lousier, 1974a, b; Wilkinson and Mitchell, 2010).

Figure 3

Heatmaps displaying the taxonomic abundance and distribution of the sequences (n=85 305) between the four sample sites, at (a) order and (b) genus level. The heatmaps are drawn using the heatmap.2 function of the gplots package, and dendrogram distances are based on Euclidean distance. Taxa constituting <1% of the total taxa have been lumped into the ‘other assigned’ category. Testate taxa are marked with an asterisk. Glissomonads and Euglyphids are by far the most abundant Cercozoa in all samples. Of all taxa, the genus Trinema shows the largest relative change in response to the drought treatment.

Testate amoebae are generally large (members of Trinema are usually in the length range of 30–100 μm, or 6–12 times longer than most glissomonads). Hence, even though the cell number of glissomonads in the samples exceeds that of testate amoebae by a factor of three or four, the same will not be true for their biomass. Since cell size of protozoan predators is one of the most important morphological factors determining the bacterial community composition (Glucksman et al., 2010), our results suggest a quantitative importance of cercozoan testate genera that merits more attention.
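A rough sense of scale for this biomass argument can be obtained from the figures above under a strong simplifying assumption: that both groups have the same shape and density, so that cell volume scales with the cube of cell length. That assumption is ours, introduced only for this back-of-the-envelope Python sketch, and real testate amoebae and flagellates differ in shape.

```python
def biomass_ratio(length_ratio, abundance_ratio):
    """Testate-to-glissomonad biomass ratio under isometric scaling.

    length_ratio:    how many times longer a testate cell is
    abundance_ratio: how many glissomonad cells there are per testate cell
    Assumes volume is proportional to length cubed (illustrative only).
    """
    return length_ratio ** 3 / abundance_ratio

# Bounds taken from the text: 6-12 times longer, 3-4 times less numerous.
lower_bound = biomass_ratio(6, 4)
upper_bound = biomass_ratio(12, 3)
```

Under these assumptions the testate biomass would exceed that of the glissomonads by well over an order of magnitude, which is the direction of the argument made above; the absolute values should not be read as estimates.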

Conclusions

We found the V4 hypervariable region to be the best single region in 18S ribosomal DNA for exploring cercozoan diversity by HTS analyses. Further, our analysis using this region showed a high diversity at the studied sites. Cercozoan species have traditionally been defined by morphology supported by 18S Sanger sequencing, and if this concept is to be taken to represent the best estimate of the ‘true diversity’, our analysis of the V4 region from cultured cercozoan species demonstrates that more variable taxonomic markers should be investigated. However, as our conservative treatment of V4 HTS sequence data reveals the existence of a large unknown diversity in just one single biotope, the diversity revealed by less-conservative markers would be quite high indeed.

Our results suggest that the soil protozoan diversity per se is largely resilient to the levels of drought expected from the climate change scenarios projected for the Northern temperate latitudes over the 21st century, at least over shorter timeframes. However, they also suggest that among Cercozoa, testate forms are the most sensitive to drought and hence good indicator organisms to detect soil community changes in the early stages of climate change. Finally, our data indicate that a sampling of close to 10⁵ sequences is necessary to reach sampling saturation in HTS studies of Cercozoa in soil samples. This should be taken into account in future HTS studies, especially those that wish to use general eukaryotic primers to gain an understanding of eukaryotic subgroups.