Investigations of the Bacteria and Archaea living in geothermal systems have been critical to our knowledge of the diversity and history of life on Earth [1, 2]. Ecological studies of these microbes have expanded our understanding of adaptations to extreme environments [3], with basic research into geothermal microbial communities having contributed to advances in numerous fields, including microbiology, biomedical diagnostics, and biotechnology [4,5,6]. One of the most prominent examples is the development of the heat-stable enzyme Taq polymerase [7] from the thermophilic bacteria Thermus aquaticus isolated from hot springs in Yellowstone National Park [8].

While the vast majority of microbiology research in geothermal systems has focused on Bacteria and Archaea, there has been little work investigating protistan diversity. Protists (e.g., microbial eukaryotes) are ubiquitous, single-celled eukaryotic organisms with diverse functional strategies. They are critical to many ecosystem processes—yet their diversity and community dynamics remain overlooked in many environments [9]. Surprisingly, although the Bacteria and Archaea found in hot springs and other geothermal environments have been extensively studied for decades, we know of no comparable study of the protists (broadly defined here as any microbial eukaryote excluding fungi) found in geothermal springs. This is likely in part due to the persistent paradigm that eukaryotic organisms are poorly suited to life in extreme environments, despite increasing evidence that protists are present and active in many of these unique habitats.

Early geothermal spring literature suggested thermophilic protists occur at temperatures up to at least 60 °C [10] with more recent reports suggesting some protists may survive at temperatures up to 70 °C [11]. Amoebae such as Echinamoeba [12] and Naegleria [13] have been identified as thermophilic, with growth optima up to 55 °C. Other protists are also known to withstand acidic environments [14, 15] and there have been detailed studies into a few acid-associated microbial eukaryotes such as the alga Cyanidium caldarium, which is known to thrive in acidic environments up to 55 °C [16]. Other phototrophs, including diatoms and chlorophytes, have been found in acidic lakes and some Euglena may tolerate both high temperature and acidity [17]. Despite these examples of “extremophilic” protists, investigations into diversity across the eukaryotic tree of life have been limited to a few geothermal springs [18, 11]. Yet, we know from work conducted in other environments, including soils and marine systems, that protists are important contributors to ecosystem processes as primary producers [19] predators, decomposers, and parasites [20, 21]. Thus, we expect that these environments likely harbor novel and diverse communities.

Geothermal springs also represent a unique system for studying the factors that structure microbial eukaryotic communities and the environmental constraints on eukaryotic diversity [22, 11]. Geothermal spring systems often have extreme ranges in environmental conditions, including temperature, pH, and salinity, that can vary independently across spring features [23]. This makes them useful for testing the relative importance of different environmental factors in shaping ecological communities. While temperature and pH are known to be strong drivers of bacterial and archaeal community diversity and composition in diverse environments [23,24,25], it is less clear to what extent these factors structure protistan communities. The relative importance of pH versus temperature is also an open question as few protist studies cover wide ranges of temperature and pH. Based on work conducted in other systems, including soils (e.g., refs. 26, 27), we may expect that the biogeographical patterns exhibited by microbial eukaryotes may parallel those observed for bacteria, but it is unclear if this is true in geothermal environments.

New Zealand geothermal systems are profuse and diverse, with over 10,000 features in the Taupō Volcanic Zone and including some of the world’s largest thermal pools [28, 29]. We analyzed 160 geothermal spring samples collected from features across the Taupō Volcanic Zone, New Zealand to investigate the diversity and the relative importance of environmental gradients in structuring protistan communities. The sampled springs represent broad environmental gradients (including temperatures and pH levels that span nearly 50 °C and 8 pH units). We also directly compared how bacterial and archaeal communities [29] were structured across the same geothermal spring samples to determine if micro-eukaryotes and prokaryotes share similar community dynamics and environmental constraints.

Results and discussion

Protistan community composition across geothermal springs

Across all springs, we found a remarkably high number of different protist lineages, with 1088 protistan phylotypes identified, spanning diverse phylogenetic lineages (Fig. 1). This level of diversity was unexpected, as previous geothermal studies focused on a relatively small number of springs have found only a few to dozens of unique protists [11]. Further, our estimate of the overall species diversity is likely an underestimate, as we performed several filtering steps including: (1) clustering phylotypes at the >97% sequence similarity level and (2) filtering out phylotypes represented by fewer than 10 reads in a given sample. As with bacterial and archaeal communities [2, 30], geothermal environments harbor an unexpected amount of protistan taxonomic and phylogenetic diversity.

Fig. 1
figure 1

New Zealand geothermal springs harbor diverse protists. a Phylogenetic tree of the 1088 protist phylotypes recovered across 160 geothermal springs. The tree is a conservative estimate of diversity as it only includes phylotypes present (>10 reads) in a given sample. The outer ring bars are scaled to show the relative abundance of each phylotype. Ring and clade color indicate taxonomic group. For a few taxa, clade membership did not reflect taxonomic assignment and no corrections were made. b Most spring communities are dominated by a few phylotypes and individual protist phylotypes are rarely found in more than a few springs. (Top) The number of phylotypes found in each spring. (Bottom) The number of springs in which each phylotype occurred

Most spring communities were dominated by only a few phylotypes (Fig. 1). While the diversity recovered in individual geothermal springs ranged from 2 to 172 phylotypes, the median richness was 21 phylotypes per spring. Unexpectedly, not a single major group was found across all or even most springs, and 93% of all phylotypes were found in less than 10% of springs with ~43% of phylotypes only found in a single spring. Even the most widespread phylotype (a ciliate, Oxytricha) was only found in 65% of the springs surveyed. Given the very close distances between many of the sampled geothermal features (some were just a few meters apart) we expected greater overlap in protistan communities. This low level of overlap across springs may in part be due to sequencing depth, as rarefaction curves did not plateau. However, most protists appear to generally be restricted in their distributions across springs.

Geothermal spring samples were dominated by Ciliophora (ciliates) which made up 57% of spring communities on average (Supplementary Table 1). However, this does not necessarily mean that most protistan cells in the sampled springs were ciliates; the high rRNA gene copy number in ciliates are known to over represent this taxon in high-throughput data [21], thus ciliates are likely overrepresented. Within Ciliophora, members of the Trimyemidae and Oxytrichidae groups made up 14 and 9% of the 18S rRNA gene reads recovered across springs. Other dominant groups included Ochrophyta, Chlorophyta, and Amoebozoa with average relative abundances of 15, 11, and 5% respectively. We found many protistan groups that have been previously been identified in geothermal spring environments including phototrophic lineages such as Cyanidium, Pinnularia, Chlamydomonas, Chlorella [31, 11] and a diversity of non-photosynthetic lineages including Echinamoeba, Cyclidium, Platyophyra, Frontonia, Ochromonas [18, 11]. However, within these lineages we frequently found multiple novel phylotypes. For example, within Echinamoeba we recovered 17 phylotypes, with up to 7% sequence divergence from previously known Echinamoeba. We also found many lineages that were uncharacterized or not previously associated with geothermal environments, as we explain in more detail below.

New Zealand springs harbor novel diversity

We investigated to what extent the diversity of the most abundant phylotypes in New Zealand geothermal springs compares to known protist diversity (Supplementary Fig. 1 with references in Supplementary Table 2). We highlighted eight clades that either had no known database representatives or only included protists from other extreme environments, particularly within Amoebozoa, Alveolata, Archaeplastida, and Stramenopiles (Supplementary Fig. 1).

As mentioned, the Amoebozoan clade containing Echinamoeba thermarum was of interest as it appears to be a diverse group with an evolutionary adaptation to withstand high temperatures. Additionally, a second Amoebozoan clade contains phylotypes from this study, other geothermal springs, and an amoeba recovered from hospital hot water samples [32]. Within Alveolata, a potentially high temperature-associated clade contains Trimyema minutum thermophilium, an anaerobic ciliate recovered from high temperature, shallow sea hydrothermal vents [33]. The Stramenopiles clade containing Ochromonas may be adapted to low pH as some Ochromonas species are known to be acid associated [34], and the other nearest neighbors were recovered from acidic environments (Supplementary Fig. 1).

Environmental factors driving protistan diversity and composition

Surprisingly, while pH explained the most variation in Shannon diversity across springs, temperature was not predictive of diversity. We ran multiple linear regression to compare protist diversity with environmental variables, and seven variables (pH, nitrate, Ba, and S; p < 0.5 and for conductivity, H2S, and sulfate, p = 0.07–0.08) together accounted for only 30% of the overall variability in diversity (Supplementary Table 3). Of the measured variables, pH was the most important (Supplementary Fig. 2). To investigate the relationship between pH and diversity, we fit a linear (R2 = 0.15; p < 0.0001) model which revealed a decline in diversity in the more acidic springs.

Previous studies investigating the temperature–diversity relationship in protists are inconclusive, likely in part due to individual studies having sampled a low number of springs. For example, Brown and Wolfe [11] found the most protist diversity of all of the sites they sampled in a few high temperature and highly acidic lake samples in Lassen Volcanic National Park although the authors suggest diversity generally declined with decreasing pH and increasing temperature. In our study, the lack of a significant diversity–temperature relationship for protists suggests that there may be more protist lineages than commonly assumed [35] that are adapted to high temperature environments. In contrast to temperature, the reduction of microbial diversity with decreasing pH is well recognized across diverse environments [36, 15]. Protists living in extremely acidic conditions likely require highly specialized physiological adaptations and life history strategies to survive in these environments, which may be why only a few protist phylotypes were recovered from highly acidic geothermal springs.

Despite the considerable variability in the composition of the protistan communities, composition across springs was nevertheless moderately predictable from the measured environmental conditions. Our results based on multiple regression of pairwise distances in overall community composition (model correlation with community dissimilarity = .43 and MRM overall explanatory power: R2 = 0.16) indicated that the composition of the geothermal protistan communities was best predicted by three factors (Supplementary Figs. 3, 4): pH (rM = 0.35; p < 0.001), temperature (rM = 0.24; p < 0.001), and bicarbonate (rM = 0.22). Other environmental factors were also correlated with community composition but did not contribute to the overall power of the explanatory model (e.g., maximize correlation with community dissimilarity). For example, conductivity (e.g., salt tolerance, rM = 0.10; p < 0.01; Supplementary Fig. 4) was moderately correlated with differences in community composition, but inclusion of this variable in our models did not improve overall model fit.

Our findings suggest that both temperature and pH explain in part what types of protists are found in springs. Amaral-Zettler [15] also found clustering of community composition by pH across diverse aquatic environments, despite low overall similarity between sites. Our finding that protistan communities in high temperature springs are generally different from those communities found in lower temperature springs supports the hypothesis that these high temperature springs (e.g., 50–65 °C) harbor unique communities and that sequences recovered from high temperature springs likely originate from taxa living within the springs, rather than allochthonous infall from nearby areas with cooler temperatures.

Protistan lineages associated with environmental conditions

To identify protistan lineages associated with specific environmental conditions, we performed indicator species analyses. We found strong preferences for specific pH and temperature ranges for several protists (Fig. 2, with corresponding information in Supplementary Table 4). For those taxa that were found in at least ten springs (n = 134), we identified 15 phylotypes that prefer hot springs, nine associated with moderate to cooler temperature springs, five with acidic conditions, and 12 with neutral to alkaline springs (Figs. 2 and 3). Echinamoeba appear to be particularly well-suited to high temperature springs, as 60% of the thermophilic taxa identified were members of this group, consistent with Baumgartner et al. [12]. The most abundant phylotype, an unknown phylotype related to Trimyema minutum thermophilum was also associated with high temperature springs (Fig. 2 and Supplementary Table 4; correlation with temperature: ρ = 0.55; p < 0.001 and indicator value = 0.43; p = 0.001). Some acidic phylotypes were related to known acidophiles, such as Hypotrichia and Pinnularia (Bacillariophyta; [37, 38]). Our analyses suggest that there are diverse protists associated with extremes in pH and temperature, with strong phylogenetic clustering of the protists associated with hot springs as 9 of the 15 identified are within the single genus Echinamoeba (Fig. 3).

Fig. 2
figure 2

Distribution of individual protist lineages across temperature (°C) and pH for groups identified as sensitive to either environmental variable. The plots are arranged according to the observed preferences for particular conditions. Each circle represents a geothermal spring sample; gray circles indicate a particular taxon was not found in a spring and if found, circle color ranges from yellow to red based on log-relative-abundance. Size of circles corresponds to their proportional abundance

Fig. 3
figure 3

Phylogenetic distribution of the protist phylotypes sensitive to temperature and pH across springs. Clade markers signify type for each taxon: red is hot, blue is cold, yellow is acidic, and green is alkaline

Direct comparisons to bacterial and archaeal communities

We investigated if bacterial and archaeal communities were structured by similar drivers, and to what extent bacterial, archaeal, and protistan community structures were related. As with protists, we found pH to be the single most important factor in predicting both variation in bacterial and archaeal diversity (Supplementary Fig. 3 and Supplementary Table 2) and community composition across springs (rM of pH = 0.59; p < 0.001; Supplementary Figs. 3 and 4). These findings suggest that the environmental factors shaping microbial community assembly may be more universal than previously thought.

As with our protist results, temperature explained no additional variation in bacterial and archaeal Shannon diversity across springs. Our finding that temperature does not explain diversity patterns across the sampled geothermal springs for either protists or prokaryotes differs from a previous report of bacterial and archaeal communities, where both temperature and pH explained differences in diversity across geothermal soils ranging in temperature from 7.5 to 99 °C [23]. We suspect this may be due to the more restricted temperature range of this study (17.5 to 64.9 °C), and that the diversity of spring communities may be more closely with temperature at its extremes.

We also found that protistan community composition was moderately correlated with bacterial and archaeal community composition (rM = 0.52, p < 0.001; Supplementary Fig. 4), a pattern that is driven by shared environmental drivers, close associations across protists and prokaryotes, or both. To further decipher the structure of the associations between protistan, bacterial, and archaeal communities across the temperature and pH gradient we used network analyses of co-occurrence patterns (Fig. 4). We recovered five groupings of taxa (e.g. network modules) that represent clusters of bacterial and archaeal and protistan taxa that share similar environmental preferences (acidic, high temperature, alkaline-neutral, and moderate-cool temperature). Recovering network modules (particularly the high temperature and acidic clusters) independently validated our findings of specific protist lineages that are sensitive to temperature and pH, as many of the protist indicator taxa (Fig. 4) were clustered into modules with bacteria that are known to persist in particular environments. For example, four of the five protistan taxa identified as acidic indicator taxa clustered in a module with known acidophilic bacteria, including Acidithiobacillus and Acidiphilium [39]. Likewise, taxa such as Echinamoeba and Trimyema that we identified as being relatively more abundant under high temperature conditions were found to co-occur with known thermophilic bacterial lineages including Thermus, a well-studied genus known for tolerating high temperatures [7] and Venenivibrio, a thermophilic genus of bacteria with a known temperature optimum of 70 °C [40].

Fig. 4
figure 4

The protistan, bacterial, and archaeal taxa that co-occur across geothermal springs. Node size is proportional to abundance of taxa and edge width corresponds to strength of co-occurrence (positive ρ value). Gray nodes represent bacterial and archaeal taxa and white nodes represent protistan taxa. Node outline colors indicate the five modules (groups of co-occurring taxa) identified using the Louvain community detection algorithm [55]. Node fill color designate protist taxonomic groups that contain the phylotypes identified as indicators for a particular environmental condition (shown in Fig. 3). There are 138 nodes (taxa) with 940 connections (edges)

While it remains unclear to what extent protists contribute to the ecological dynamics of springs, our findings that bacterial, archaeal, and protistan communities are strongly associated indicates the potential for interactions between microbial taxa. As many of the protists identified here are likely heterotrophic, protistan grazing may impact bacterial and archaeal communities, as observed in other environments [41, 42].

While the adaptive mechanisms which allow protists to withstand such extreme acidic or high temperature environments remain unknown, we have identified lineages that will be prime targets for future investigations. For example, our work suggests that some Amoebozoa are adapted to high temperatures. This information could be leveraged to investigate if there are unique physiological adaptations related to enabling life at extremes. We also found that protistan communities were best predicted by pH and then by temperature. Bacterial and archaeal composition was also best predicted by pH across the same springs, suggesting shared environmental drivers and ecological constraints. Although often overlooked in geothermal springs, protists are present and diverse in these environments.

Materials and methods

Sample collection and characterization

Geothermal spring water samples from geothermal springs were collected across the Taupō Volcanic Zone in New Zealand between 2013–2015 as part of the 1000 Springs Project (29, Our study utilized 160 geothermal spring samples, collected from 14 geothermal systems across four districts in the North Island of New Zealand; Taupō, Rotorua, Turangi, and Whakatane. These 160 geothermal features represented very diverse and often extreme physicochemical environments. Temperatures ranged from 17.5 to 64.9 °C, the pH across springs was 2.02 to 9.7 and geothermal features also had a broad range in conductivity (salt content; 222–10,420 µS/cm), and turbidity (water clarity; 0.1–484 FNU). The temperature and pH profiles of some geothermal springs are known to rapidly change thus, we note that the temperature measured refers to the initial temperature measured at the start of sample collection. A comprehensive suite of physicochemical parameters was measured for each spring. Details on the specific water column sampling strategies, and subsequent processing, storage and analytical methods are provided in Supplementary Table 5.

Samples were immediately processed after collection (within 2 h), either using a mobile laboratory on site or at the GNS Science laboratory in Taupō. In brief, 2 l of water were collected per sample and filtered using a peristaltic pump system with a Sterivex-GP 0.22 µm PES column filter (Merck Millipore) to capture biomass from the water column. Filters were immediately cooled to 4 °C and then stored at −20 °C prior to DNA extraction. Filter extractions were performed at the Thermophile Research Unit at the University of Waikato, using a modified CTAB extraction protocol [43]. After extraction and sequencing, remaining DNA was preserved and stored at room temperature with DNAstable® following the manufacturer’s protocols.

Molecular and bioinformatic analyses

To characterize the composition and diversity of protistan communities across geothermal spring samples, we amplified a 516 bp fragment of the 18S rRNA gene using the 616*F/1132R primer set [44] modified with appropriate Illumina adapters. This primer pair was selected as it amplifies a phylogenetically informative gene region (the V4–V5 hypervariable region) and likely exhibits few biases against major protistan lineages [44]. Triplicate PCR reactions were performed for each of the 160 extracted DNA samples, and we included and sequenced multiple negative controls per plate to check for possible contamination. PCR products were cleaned with the MoBio Ultra Clean PCR Clean-Up Kit. Next, we performed PCR-mediated Nextera barcode ligation following the manufacturer’s instructions, adding unique barcodes onto amplicons, to allow for multiplexed sequencing. Samples were normalized with the SequalPrep Normalization Plate Kit (Invitrogen) prior to sequencing on the Illumina MiSeq platform at the University of Colorado Next Generation Sequencing Facility, running the 2×300 bp paired-end chemistry.

Sequences were demultiplexed with the custom Python script “” [45] and then paired-end sequences were merged, quality filtered, and clustered into phylotypes with the UPARSE pipeline [46]. When merging reads, we set a minimum overlap region of 20 bp and merged reads had to be more than 200 bp long. The maximum allowed expected error frequency was set to 0.5 per sequence for quality filtering. Phylotypes were clustered at ≥97% similarity and the merged raw reads were mapped to a de novo database at ≥97% similarity with USEARCH (v7; [47]). This threshold for defining phylotypes (≥97% sequence similarity) is not a specific estimation of the species-level diversity of protists [48]. Rather, we chose this threshold as it is a more conservative assessment of protistan diversity than standard eukaryotic species delineations (often considered ≥99%). We refer to the 97% consensus sequences as “phylotypes”. All sequences were dereplicated and phylotypes represented by only a single read were excluded. We also removed highly divergent sequences by filtering against the Protist Ribosomal Reference database (PR2, v4.3; [49]), excluding phylotypes with less than a 75% identity match. Taxonomy was assigned with the RDP classifier [50] against the PR2 database. Reads assigned to bacteria, archaea, metazoa, and embryophyta (land plants) were removed prior to downstream analyses, as here we focused specifically on protists. We also removed fungi from the dataset as the fungal 18S rRNA gene provides limited taxonomic and phylogenetic resolution.

To ensure conservative estimates of the diversity and number of phylotypes across geothermal features, we filtered reads with no taxonomic assignment at broad taxonomic levels (levels 2, 3, and 4; taxonomies in the PR2 database are unranked). To account for possible sequencing bias, samples with less than 2700 identified protist reads were excluded from analyses due to relatively low coverage. Estimates of diversity were highly correlated with and without rarefaction (Pearson correlation = 0.99 and p < 0.0001) as were the patterns in pairwise community similarity levels (Mantel statistic r = 0.97 and p < 0.001), thus we used the unrarefied samples in all downstream analyses to retain more phylotypes from deeply sequenced samples (e.g., samples with more reads). All sequenced blanks (multiple per 96-well plate) were well below the 2700 minimum read threshold. We also removed all phylotypes with less than ten reads in a given sample to be highly conservative in what phylotypes we consider “present” in any given sample. Raw sequence data have been deposited in the Sequence Read Archive (SUB2814786 with NCBI BioProject accession no. PRJNA392095) and the phylotype tables with corresponding metadata and consensus sequences for all phylotypes and co-occurrence results used to create Fig. 4 are available via Figshare (doi: 10.6084/m9.figshare.5146378).

Data analyses

All statistical analyses were performed in R (R Core Team, 2015). To test for correlations between measured environmental parameters we ran pairwise Pearson’s correlations (Supplementary Fig. 5). As some of the 45 measured physicochemical factors were correlated, we reduced collinearity by identifying highly correlated variables (r > = 0.600). Our final set of factors included 26 variables.

To investigate what environmental factors may drive differences in protistan diversity (Shannon diversity index), we ran a multiple linear regression model. Variables for the model were selected using forward and backward stepwise model selection with the Akaike information criterion (AIC) as applied with the R function stepAIC [51]. To assess the relative importance of the regressors, we used the relaimpo package (v. 2.2–2; [52]) with the “lmg” model. We applied a linear regression model to further investigate the relationship for the factor (pH) explaining the most variability in diversity across springs.

Differences in microbial community composition were determined by computing pairwise Bray-Curtis distances from square-root-transformed relative abundances (Vegan v2.4.1; [53]) and these dissimilarities were visualized using nonmetric multidimensional scaling with three dimensions (k = 3), 1000 random starts (trymax = 1000), and the proportion of site pairs with no shared species was set to 0.1 (noshare = 0.1). 3D visualizations were made with plotly (v4.5.2). To find the best explanatory environmental variables associated with community composition we first ran “bioenv” in Vegan to identify the subset of variables (of the 26) that maximize the correlation with community dissimilarity, and calculated the correlation of each identified variables (pH, temperature, and bicarbonate) to community composition with Mantel tests based on 5000 permutations. We then performed multiple regression on distance matrices with “MRM” in the Ecodist package (v 1.2.9, [54]) to estimate the overall explanatory power of the model. We recognize that our estimates of the relative abundance of eukaryotes in a sampled environment are potentially biased by focusing on the 18S rRNA gene as there can be substantial variation in the length and number of copies of this gene across eukaryotic taxa [55]. For this reason, we confirmed that the pairwise dissimilarities between samples calculated from relative abundances (using the Bray-Curtis metric) were highly correlated with pairwise dissimilarities calculated using a presence-absence metric (Jaccard distance) (Mantel statistic ρ = 0.9 and p < 0.0001).

We next sought to determine which specific protistan taxa were preferentially found in acidic or alkaline springs and in hot versus moderate temperature springs. We filtered our taxa table to include only those taxa that were found in at least ten springs to set a high minimum occurrence threshold for these analyses. We then calculated indicator values of protist phylotypes specific to temperature or pH with the Dufrene-Legendre Indicator Species Analysis as implemented in the labdsv package ([56], v1.8.0). For this analysis, spring samples were divided into four temperature categories (17.5–64.9 °C by 10–15 °C increments; temperature <35 °C = moderate and >45 °C = hot) and two pH categories (pH < 5 = acidic and >5 = neutral alkaline). We confirmed the fidelity of identified taxa by separately investigating the abundance of taxa across the pH or temperature range via Spearman’s correlations.

To assess the uniqueness of protists found in this study, we compared the phylotypes (only including those occurring in at least 10 springs) to sequences in the SILVA and GenBank databases. Phylogenetic trees were built to assess the extent to which specific lineages were uniquely represented in New Zealand geothermal springs. Nearest neighbors were identified with the “search and classify” function within the Silva Incremental Aligner (SINA v1.2.11) with the top neighbor per query sequence. Additional neighbor sequences were also manually added from Genbank as many shorter-length protist sequences are not in the Silva database. All trees were aligned using SINA with default parameters [57]. After aligning, we trimmed gaps with trimAl [58] with a gap threshold of 0.2. We used FastTree with a GTR model of nucleotide evolution (v2.1.9; [59]) to infer approximately maximum-likelihood phylogenetic trees with an archaeon sequence included as the outgroup to root trees. Visualization was done with Graphlan [60].

Finally, we compared the community structure of bacteria and archaea across 156 of the 160 springs for which we obtained protistan community data. To do this, we obtained bacterial and archaeal 16S rRNA gene sequence data from the same spring samples [29], enabling us to make direct comparisons. We first investigated whether bacterial and archaeal communities were structured by similar environmental factors as protistan communities. We assessed both the factors that explain differences in Shannon diversity across the springs and differences in community composition, with the same methods described above for the protistan communities. We then used network analyses to identify co-occurrence patterns between bacterial, archaeal, and protistan taxa and to identify environmental clusters. For network inference, we created a Spearman’s rank correlation matrix (including bacteria, archaea, and protists), with an abundance threshold of 0.1% to remove poorly represented taxa. The threshold for a correlation to be considered robust was set at ρ > 0.3 and p-value < 0.0001. The edge weights in the network were weighted by the strength (ρ) of the correlation between two nodes (e.g., dependent on the number of times the two taxa co-occur). Networks were visualized with Gephi [61]. To describe the resulting community topology of the network, we identified network modules with the Louvain community detection algorithm [62].