Introduction

Biogeography is the study of the spatial distribution of life forms across landscapes. For macroorganisms, global biogeographic patterns are strongly influenced by geographic barriers to dispersal. Dispersal barriers limit gene flow between regions, and local evolutionary processes such as drift or adaptation cause endemism in geographically restricted areas. However, because single-celled organisms are very small, highly abundant, metabolically plastic, and disperse easily, whether and how microorganisms are affected by geography has been the subject of debate (Finlay, 2002). Traditionally, microbial ecologists have assumed that the effects of dispersal limitations are minimal, and have considered the biogeographical distribution of microorganisms largely a result of environmental selection (Baas Becking, 1934; de Wit and Bouvier, 2006). However, improvements in molecular genetic methods and lower DNA sequencing costs have provided the necessary tools to detect genetic divergence among microbial populations with much greater resolution. As a result, there is now evidence that physical isolation is important in shaping biogeographical distributions and in facilitating speciation in microorganisms as well as macroorganisms (Papke and Ward, 2004; Martiny et al., 2006; Whitaker, 2006; Hanson et al., 2012).

In recent years, population genetic studies using high-resolution techniques have suggested that microorganisms can exhibit a high degree of endemism (Cho and Tiedje, 2000; Papke et al., 2003). In particular, pioneering research on hot spring microbiota by Whitaker et al. (2003) showed that the genetic distance among thermophilic archaeal populations was correlated with the geographic distance that separated them. Evidence for endemism and similar distance–decay patterns has been identified for other microbial taxa, including both extremophilic and non-extremophilic populations (Escobar-Páramo et al., 2005; Vos and Velicer, 2008; Hahn et al., 2015; Raymond and Alsop, 2015). These findings suggest that allopatric speciation is an important and underappreciated force in microbial evolution (Whitaker, 2006). However, not all microorganisms seem to be geographically restricted over large spatial scales (e.g., Roberts and Cohan, 1995; Sikorski and Nevo, 2005; van Gremberghe et al., 2011; de Rezende et al., 2013; Ryšánek et al., 2014), and evidence for long-distance microbial dispersal is present even in studies that identified geographic clustering using high-resolution genetic analyses. For example, Vos and Velicer (2008) found a robust pattern of increasing genetic divergence among populations of the spore-forming soil bacterium Myxococcus xanthus, but also found that identical genotypes were present at distant sites and that some globally separated populations were not significantly genetically differentiated. Foti et al. (2006) found that, despite geographic clustering, identical genotypes of Thioalkalivibrio were present in soda lakes on different continents.

Here we report on the biogeography of extremely acidic biofilms known as snottites from hydrogen sulfide (H2S)-rich caves. Sulfidic caves are fed by anoxic springs that degas H2S(g) into the cave atmosphere and provide energy for sulfur-oxidizing microorganisms on the cave walls and ceilings. In most caves, snottites are observed where H2S(g) concentrations in the cave air are between 0.2 and 25 parts-per-million by volume (ppmv), and rarely occur where concentrations are outside of this range (Figure 1) (Hose et al., 2000; Macalady et al., 2007). Previous research has shown that snottites are inhabited by low-diversity communities containing a few species of acidophilic bacteria and archaea (Hose et al., 2000; Vlasceanu et al., 2000; Macalady et al., 2007; Jones et al., 2012, 2014). The most abundant microorganism is Acidithiobacillus thiooxidans, which is a sulfide-oxidizing autotroph and likely responsible for forming the snottite biofilms (Jones et al., 2012).

Figure 1

(a) Schematic depicting the snottite habitat in sulfidic caves. Snottites form in close proximity to H2S-degassing cave streams, and typically occur where cave air H2S(g) concentrations are between 0.2 and 25 ppm. (b, c) Representative photographs of snottites sampled in this study, from (b) Acquasanta, Italy and (c) Villa Luz, Mexico. Black scale bar in (b) is 1 cm.

Snottites therefore present an opportunity to test whether and how geographic barriers affect microbial biogeography. Because sulfidic caves are found around the world, and because snottites are found in multiple chambers within each cave system, they can be sampled at spatial scales ranging from meters to thousands of kilometers. Because snottites are extremely acidic (pH 0–1.5), whereas the surrounding areas are circumneutral, snottite microorganisms seem unlikely to survive transit between caves. Geochemical isolation constitutes a potential barrier to microbial dispersal in addition to the physical barrier already imposed by their location in the terrestrial subsurface, which presumably has reduced microbial transport by wind and large animal vectors compared with surface environments.

We therefore hypothesized that this physical and geochemical isolation would result in different caves harboring genetically distinct snottite populations, and that genetic distance among the Acidithiobacillus spp. would be correlated with the geographic distance separating cave locations. We used a population genetics approach to compare snottite Acidithiobacillus populations at different levels of genetic resolution, and by combining results from neutrally evolving markers with functional gene information from metagenomics, we propose a model in which isolation and stochastic colonization sculpt the biogeographic relationships among snottite Acidithiobacillus spp.

Materials and methods

Sample collection

Snottite biofilm samples were collected from four sulfidic caves: Cueva de Villa Luz and Cueva Luna Azufre in Tabasco, Mexico (Hose and Pisarowicz, 1999; Hose et al., 2000), and le Grotte di Frasassi (Galdenzi and Maruoka, 2003) and Grotta Nuova di Rio Garrafo (hereafter Acquasanta, following Jones et al., 2010) in the Marche region, Italy (Supplementary Table S1 and Supplementary Figure S1). Samples are summarized in Table 1, and location information is provided in Supplementary Figure S1. Permits for access to Villa Luz were granted by C Rogers Morales Mendez and L Felino Arevalo Gallegos. Sampling and personal safety equipment was thoroughly washed between cave trips, and separate gear was used for Italian and Mexican cave expeditions.

Table 1 Summary of Acidithiobacillus isolates and environmental sequences

Sampling locations were selected to encompass multiple spatial scales as well as a range of H2S(g) concentrations. Cave air concentrations of H2S(g), CO2(g) and SO2(g) were measured using Dräger tubes (Dräger Safety, Pittsburgh, PA, USA) and/or an ENMET MX2100 portable gas detector (ENMET Corp., Ann Arbor, MI, USA). Gas concentrations ranged from below detection to 30 ppmv for H2S(g), from 700 to 6000 ppmv for CO2(g) and from below detection to 36 ppmv for SO2(g) (Supplementary Table S2). At the time of collection, the pH values of 5–10 snottites in the area were measured with pH paper (range 0–2.5), and all values were between 0 and 1.5. (pH paper is necessary because the small size and viscous texture of the biofilms prohibit measurement by pH electrodes.) Biofilm aliquots for DNA extraction were preserved in RNAlater (Thermo Fisher Scientific, Waltham, MA, USA) and stored at −20 °C. Aliquots for enrichment culturing were stored at 4 °C until inoculation.

Enrichment and isolation of Acidithiobacillus strains

Acidithiobacillus strains were enriched in liquid media with either thiosulfate or elemental sulfur as the electron donor (ATCC 1353 Thiobacillus albertis medium or 125 Thiobacillus medium, http://www.atcc.com) (Harrison, 1982). Isolation was achieved by plating enrichments onto solid thiosulfate media (3% agar, 2 × 1353 thiosulfate media, pH 4). Colonies were re-plated, picked and re-grown in liquid 1353 medium prior to DNA extraction. Initially, successful isolation was assessed by uniform colony morphology on solid media and microscopically by uniform cell morphology. All strains were passaged and plated the same number of times, except for strain RS2a, which was isolated in 2005 and passaged weekly for 1 year. Strains from samples RS2, GB30 and GS6 all grew similarly on liquid elemental sulfur media at initial pH values from 5 down to 0.2.

DNA extraction, 16S rRNA gene and ITS sequencing

DNA was extracted from 46 isolates (Table 1) using the Mo Bio UltraClean Microbial DNA Isolation Kit, according to the manufacturer's instructions (Mo Bio Laboratories, Inc., Carlsbad, CA, USA). Partial 16S rRNA genes and complete ITS sequences were amplified from each strain via PCR using forward primer 27f (AGA GTT TGA TCC TGG CTC AG) and reverse primer Sag2 (TGG CTG GGT TGC CCC ATT C), modified from Sagredo et al. (1992). PCR was performed with 5 min initial denaturation at 95 °C, followed by 25 cycles of 60 s denaturation at 95 °C, 30 s annealing at 50 °C and 2 min elongation at 72 °C, and with 12 min final elongation at 72 °C. Products were purified with the QIAquick PCR purification kit (Qiagen Inc., Valencia, CA, USA) and directly sequenced using primers 27f, Sag2 and 1392r (ACG GGC GGT GTG TRC) at the Penn State Core Genomics Facility with an ABI Hitachi 3730XL DNA analyzer and BigDye fluorescent terminator chemistry v3.1 (Applied Biosystems, Foster City, CA, USA).

For environmental biofilm samples, DNA extraction, amplification and cloning of environmental 16S rRNA sequences from samples RS2 and PC1 were described in Macalady et al. (2007). For all other samples, DNA was extracted from snottite biofilms exactly as in Jones et al. (2012). Acidithiobacillus 16S rRNA gene sequences were cloned from Villa Luz and Luna Azufre snottites using bacterial-specific primers 27f and 1492r (GGT TAC CTT GTT ACG ACT T) following the amplification, cloning and colony PCR procedures of Macalady et al. (2008). The 16S rRNA gene+ITS sequences were amplified from Frasassi and Acquasanta snottites using primers 27f and Sag2 as described above for isolates. PCR products were purified using the Qiaex II gel extraction kit (Qiagen Inc.). Cloning was performed by ligating the purified product into the pCR4-TOPO plasmid, which was then used to transform either One Shot Mach1 T1 or TOP10 chemically competent Escherichia coli (Invitrogen Corp., Carlsbad, CA, USA). Inserts were amplified by colony PCR as in Macalady et al. (2008) with M13 primers (Invitrogen Corp.) and sequenced at the Penn State Core Genomics Facility as described above.

Multi-locus sequence typing

MLST was performed by amplifying and sequencing six loci (recA, rpoB, atpD, ileS, rpl1p and rps2p) from each isolate (Table S2). Each of these loci is from a housekeeping gene typically represented by a single copy per genome. Primers for each locus were designed based on nucleotide sequences of full-length genes from Acidithiobacillus ferrooxidans strains ATCC 23270 and ATCC 53993 and using assembled At. thiooxidans sequences from preliminary metagenomic sequencing (below, and Jones et al., 2012). Primers for recA and atpD were modified from Amouric et al. (2011). Loci were amplified using primers and PCR conditions provided in Table S2. PCR products were purified with the Omega E.Z.N.A. Cycle-Pure Kit (Omega Bio-Tek, Inc., Norcross, GA, USA) and sequenced at the Penn State Core Genomics Facility.

Phylogenetic and statistical analyses

Raw DNA sequences were assembled and manually checked for quality with CodonCode Aligner v.2.0 (CodonCode Corporation, Dedham, MA, USA). rRNA gene sequences were aligned using the NAST aligner at Greengenes (DeSantis et al., 2006), chimera checked with Bellerophon 3 (Huber et al., 2004) and imported and further aligned in ARB (Ludwig et al., 2004). Sequences from the six loci for MLST were concatenated, and then aligned using ClustalX v.2.1 (Larkin et al., 2007).

Phylogenetic analyses were performed in PAUP* v.4b10 (Swofford, 2000). Maximum likelihood analysis of the concatenated MLST data set was performed using the Tamura-Nei nucleotide substitution model and parameters selected by the Bayesian information criterion with jModelTest v.2.1.1 (Posada, 2008). Maximum likelihood analysis of 16S rRNA genes was also performed with the Tamura-Nei substitution model, selected by the Bayesian information criterion in jModelTest. Maximum parsimony bootstrap analyses (2000 replicates) were performed with 25 random addition replicates and tree-bisection reconnection branch swapping, and neighbor joining bootstrap analyses (2000 replicates) with Jukes-Cantor distance. Bayesian analysis was implemented in MrBayes v.3.0b4 (Huelsenbeck and Ronquist, 2001) with six substitution rate categories and gamma-distributed rate variation. Bayesian analysis was run for 500 000 generations, trees were saved every 100 generations and posterior probabilities calculated after discarding the first 20% of trees. Prior to analysis, 16S rRNA gene sequences and MLST sequences were clipped to equal length (1371 and 5777 characters, respectively). No positions were masked.
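For readers unfamiliar with the Jukes-Cantor correction used for the neighbor joining analyses, the following is a minimal illustrative sketch in Python. It is not the PAUP* implementation; the function name and the choice to skip gapped or ambiguous positions are assumptions made here for illustration only.

```python
import math

def jukes_cantor_distance(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor corrected distance between two aligned nucleotide sequences.

    Assumption (illustrative): positions where either sequence has a gap or an
    ambiguous base are skipped; real software may treat these differently.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    valid = set("ACGT")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in valid and b in valid:
            compared += 1
            if a != b:
                mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    p = mismatches / compared  # observed proportion of differing sites
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    # d = -(3/4) * ln(1 - (4/3) * p)
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)
```

The corrected distance is always at least as large as the observed proportion of differences, because the model accounts for multiple substitutions at the same site.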

ITS sequences were aligned in a custom ARB database. Phylogenetic analyses of ITS regions were accomplished in three ways. (i) First, all positions with >50% gaps were masked, and sequences were analyzed by neighbor joining and maximum parsimony as described above (367 positions, 73 variable sites, 52 parsimony informative sites). Positions with <50% gaps were included to increase the number of variable sites for phylogenetic analysis. (ii) Second, a separate analysis was performed with all gapped positions masked from the alignment, and sequences were analyzed by both neighbor joining and maximum parsimony analyses. A strict consensus tree was constructed from equally parsimonious topologies (339 positions, 59 variable positions, 38 parsimony informative sites). (iii) A third analysis was performed in which gaps in the ITS alignment were treated as characters for parsimony analysis. Alignment gaps represent insertion or deletion (indel) events, and contain additional phylogenetic information that is not included in the analysis of nucleotide characters alone. Gaps were quantified by simple indel coding (Simmons and Ochoterena, 2000) using the software package GapCoder (Young and Healy, 2003) (Supplementary Figure S2 and Supplementary Table S3). Simple indel coding is a conservative approach in which indels with different 5′ and 3′ termini are considered separate characters, and smaller gaps that occur completely within larger gaps are treated as missing data for the sequences with the larger gaps. The presence–absence indel matrix was subjected to maximum parsimony analysis in PAUP* and a strict consensus tree was calculated. Prior to phylogenetic analysis, ITS clones with one nucleotide difference were grouped using the pre.cluster command in Mothur v.1.24 (Schloss et al., 2009). The identity of the tRNA sequences in the ITS region was confirmed with tRNAscan-SE v.1.21 (Schattner et al., 2005).
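The logic of simple indel coding described above can be sketched as follows. This is a simplified Python illustration, not the GapCoder implementation: the function names and the toy alignment are hypothetical, and leading or trailing gaps are not given any special treatment here, although dedicated software may handle them differently.

```python
def find_gaps(seq, gap_char="-"):
    """Return (start, end) index pairs (inclusive) for each contiguous gap run."""
    gaps = []
    start = None
    for i, ch in enumerate(seq):
        if ch == gap_char:
            if start is None:
                start = i
        elif start is not None:
            gaps.append((start, i - 1))
            start = None
    if start is not None:
        gaps.append((start, len(seq) - 1))
    return gaps

def simple_indel_coding(alignment):
    """Build a presence-absence indel matrix from a dict of aligned sequences.

    Each gap with distinct termini is one character. A sequence is scored
    '1' if it has exactly that gap, '?' (missing) if it has a larger gap that
    completely contains it, and '0' otherwise.
    """
    gaps_by_seq = {name: find_gaps(seq) for name, seq in alignment.items()}
    characters = sorted({g for gaps in gaps_by_seq.values() for g in gaps})
    matrix = {}
    for name, gaps in gaps_by_seq.items():
        row = []
        for start, end in characters:
            if (start, end) in gaps:
                row.append("1")
            elif any(gs <= start and end <= ge for gs, ge in gaps):
                row.append("?")
            else:
                row.append("0")
        matrix[name] = "".join(row)
    return characters, matrix

# Hypothetical toy alignment, for illustration only.
toy = {"s1": "ACGTACGT", "s2": "AC--ACGT", "s3": "A----CGT"}
chars, coded = simple_indel_coding(toy)
```

In the toy alignment, the two-base gap in s2 lies entirely within the four-base gap in s3, so s3 is scored as missing data for that character rather than as present or absent.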

Mantel and partial Mantel tests were performed in R v.2.6.1 (R Core Development Team, 2007) with the Vegan package (Oksanen et al., 2008). All Mantel tests were performed using Pearson's product–moment correlation. For Mantel tests of genetic distance versus environmental distance among sites, geochemical variables were first relativized to the maximum value for each variable in all samples ('relativization by maximum', McCune and Grace, 2002), and environmental dissimilarity among samples was calculated with Euclidean distance. Partial Mantel tests of genetic versus geographic distance were calculated while controlling for environmental dissimilarity based on Euclidean distance as described above.
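As an illustration of the statistical procedure, the sketch below implements relativization by maximum, Euclidean dissimilarity, and permutation-based Mantel and partial Mantel tests in plain Python. It is not the Vegan code used in this study; the function names, the default of 999 permutations and the one-sided P value are assumptions chosen for the example.

```python
import math
import random

def relativize_by_max(rows):
    """Divide each variable (column) by its maximum value across samples."""
    maxima = [max(col) for col in zip(*rows)]
    return [[v / m if m else 0.0 for v, m in zip(row, maxima)] for row in rows]

def euclidean_matrix(rows):
    """Pairwise Euclidean distances between samples (rows)."""
    return [[math.dist(a, b) for b in rows] for a in rows]

def lower_triangle(matrix, order=None):
    """Flatten below-diagonal entries, optionally after reordering samples."""
    n = len(matrix)
    idx = order if order is not None else list(range(n))
    return [matrix[idx[i]][idx[j]] for i in range(n) for j in range(i)]

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear effect of z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def mantel(x, y, z=None, permutations=999, seed=0):
    """Mantel r for matrices x and y; partial Mantel controlling for z if given.

    Significance is assessed by permuting the sample order of x and counting
    permuted statistics at least as large as the observed one (one-sided).
    Returns (r, p).
    """
    rng = random.Random(seed)
    yv = lower_triangle(y)
    zv = lower_triangle(z) if z is not None else None

    def statistic(order=None):
        xv = lower_triangle(x, order)
        r_xy = pearson(xv, yv)
        if zv is None:
            return r_xy
        return partial_r(r_xy, pearson(xv, zv), pearson(yv, zv))

    observed = statistic()
    order = list(range(len(x)))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        if statistic(order) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)
```

Under these assumptions, a call such as mantel(genetic, geographic, z=environmental) would correspond to a partial Mantel test of genetic versus geographic distance while controlling for environmental dissimilarity, where the three arguments are square, symmetric distance matrices over the same samples.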

MLST, ITS and rRNA gene sequences have been submitted to GenBank under the following accession numbers: KU249220-KU249459 (MLST); KU249460-KU249649 and KU341209-KU341239 (ITS and rRNA gene sequences from isolate and environmental samples); and DQ499162-DQ499330 and KU341124-KU341208 (other environmental rRNA gene sequences).

Metagenomic analysis

Two metagenomic data sets were generated from snottite biofilm samples AS08-5 and RS09-1 (hereafter AS5 and RS9) from the Acquasanta and Frasassi cave systems, respectively. DNA was extracted from the biofilm samples as described above, and metagenomic data sets were generated by submitting the environmental DNA extracts for pyrosequencing on a Roche GS 454 FLX with FLX Titanium chemistry (454 Life Sciences, Branford, CT, USA). Raw metagenomic data sets from AS5 and RS9 included 112.7 and 104.7 megabase pairs (Mbp) of sequence data, respectively. Preliminary analysis and binning of these data sets were reported in Jones et al. (2014). Briefly, AS5 and RS9 were assembled with the Newbler assembler (gsAssembler) version 2.6 (454 Life Sciences) using default parameters, except with minimum overlap identity 95 and minimum overlap length 60. Sample AS5 assembled into 4855 contigs longer than 500 bp, with an N50 value of 1453 bp and a longest contig of 17 kilobase pairs (kbp). Sample RS9 assembled into 3424 contigs longer than 500 bp, with an N50 value of 3085 bp and a longest contig of 37.6 kbp. At. thiooxidans sequences were identified based on coverage, tetranucleotide frequency and sequence similarity to other Acidithiobacillus genome sequences as described in Jones et al. (2014).
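For reference, the N50 statistic reported above is the contig length at which contigs of that length or longer contain at least half of the assembled bases. A minimal Python sketch follows; the function name and the optional length filter are illustrative assumptions, and whether the reported values were computed before or after applying a 500 bp cutoff is not specified here.

```python
def n50(contig_lengths, min_length=0):
    """Return the N50 of contigs strictly longer than min_length (in bp)."""
    lengths = sorted((n for n in contig_lengths if n > min_length), reverse=True)
    if not lengths:
        raise ValueError("no contigs pass the length filter")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that brings the running total to half the assembly
        if running >= half_total:
            return length
```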

Homologs of genes involved in the oxidation of reduced inorganic sulfur compounds were identified in the At. thiooxidans bins from Jones et al. (2014). To further compare genomic differences among the At. thiooxidans populations in AS5 and RS9, we compared the two metagenomes against the genomes of three publicly available At. thiooxidans strains. Predicted protein-coding genes from At. thiooxidans ATCC 19377 (Valdes et al. 2011), At. thiooxidans Licanantay (Travisany et al., 2014) and At. thiooxidans A01 (Yin et al., 2014a, 2014b) were downloaded from the NCBI database (http://www.ncbi.nlm.nih.gov/bioproject/) under the accessions PRJNA36587 (ATCC 19377), PRJNA245008 (Licanantay) and PRJNA230432 (A01). In order to determine homologs that were shared between the metagenomes and each of these three genomes, we used TBLASTN (Altschul et al., 1997) to compare predicted protein-coding genes from strains ATCC 19377, Licanantay and A01 against the AS5 and RS9 assemblies in turn.
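The bookkeeping behind this comparison can be sketched in Python as below. The sketch assumes that TBLASTN results have already been parsed into (gene identifier, E-value) pairs for each assembly; the function names and the E-value cutoff of 1e-5 are hypothetical, since the thresholds used to call homologs are not stated in this section.

```python
def genes_with_hits(hits, max_evalue=1e-5):
    """Return the set of query genes with at least one hit passing the cutoff.

    hits: iterable of (gene_id, evalue) pairs from a parsed search report.
    max_evalue: hypothetical cutoff, chosen only for illustration.
    """
    return {gene for gene, evalue in hits if evalue <= max_evalue}

def compare_presence(all_genes, hits_as5, hits_rs9, max_evalue=1e-5):
    """Partition an isolate's genes by presence of homologs in each metagenome."""
    genes = set(all_genes)
    in_as5 = genes_with_hits(hits_as5, max_evalue) & genes
    in_rs9 = genes_with_hits(hits_rs9, max_evalue) & genes
    return {
        "both": in_as5 & in_rs9,
        "as5_only": in_as5 - in_rs9,
        "rs9_only": in_rs9 - in_as5,
        "neither": genes - in_as5 - in_rs9,
    }

# Hypothetical toy input, for illustration only.
example = compare_presence(
    ["g1", "g2", "g3", "g4"],
    hits_as5=[("g1", 1e-30), ("g2", 1e-10), ("g3", 0.5)],
    hits_rs9=[("g1", 1e-20), ("g4", 1e-8)],
)
```

Repeating this partition for each of the three isolate genomes gives counts of genes with homologs in one metagenome only, in both, or in neither.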

Metagenomic data are available at the Sequence Read Archive (SRA, http://www.ncbi.nlm.nih.gov/sra) under accession numbers SRX225604 (RS09-1) and SRX225750 (AS08-5), and assembled contigs can be accessed at the integrated microbial genomes (IMG) system (http://img.jgi.doe.gov) under Taxon Object IDs 3300000824 and 3300000825.

Results

16S rRNA gene sequence analysis of Acidithiobacillus populations

We isolated 46 strains of Acidithiobacillus spp., 40 from Italy and 6 from Mexico (Table 1). We sequenced 16S rRNA genes from the isolates, and also cloned Acidithiobacillus 16S rRNA genes from 10 environmental samples (Table 1). Acidithiobacillus spp. from Italy are strains of At. thiooxidans, while strains from the Mexico group are a sister clade to At. caldus, hereafter referred to as At. group II (Figure 2). All Acidithiobacillus 16S rRNA gene sequences from Italy share >99% identity, as do all 16S rRNA genes from Mexico. Further phylogenetic relationships could not be resolved among Acidithiobacillus sequences from caves on the same continent based on 16S rRNA genes.

Figure 2

Maximum likelihood phylogram of 16S rRNA gene sequences from the genus Acidithiobacillus. Representative sequences from this study are shown in bold. Numbers indicate bootstrap support by neighbor joining and maximum parsimony, in that order (only values >75% shown).

ITS sequence analysis

Because analysis of environmental and isolate 16S rRNA genes indicated that Acidithiobacillus from Italy and Mexico represent separate species (Figure 2), we only analyzed the 16S–23S ITS region from Italian acidithiobacilli, and did not attempt to isolate additional strains from Mexico. We successfully sequenced the ITS region from 25 of the At. thiooxidans isolates, and cloned the ITS region from six environmental samples (Table 1). However, for isolates from samples RS20, RS30 and RS31, as well as isolates GS6d, GS6f and GS3b from samples GS6 and GS3, the ITS region amplified but the result was a mixture of two sequences that could not be resolved. Therefore, we cloned six ITS sequences each from isolates RS30a and RS31a, and identified two distinct ITS sequences from each strain (dashed box in Figure 3 and Supplementary Figure S2b). The presence of different ITS sequences in these strains likely represents intragenomic divergence among multiple copies of the rrn operon (Sagredo et al., 1992; Stewart and Cavanaugh, 2007).

Figure 3

Phylogenetic analysis of ITS sequences from environmental and isolate At. thiooxidans strains. Sequence names are colored by cave location, names in bold indicate isolates and italicized names indicate environmental clones. The numbers of clones from the same sample represented by each sequence are given in parentheses. The base tree is a neighbor joining phylogram constructed after excluding alignment positions with >50% gaps, and numbers indicate maximum parsimony and neighbor joining bootstrap support for nodes connecting the four major clades. Stars indicate nodes compatible with both neighbor joining analysis and maximum parsimony consensus trees after excluding all gapped positions, and the solid circles indicate nodes compatible with maximum parsimony analysis of indels (Supplementary Figure S2). The dashed box in clade ITS_4 indicates ITS regions that were cloned from isolates RS30a and RS31a.

Phylogenetic analysis of the ITS region was complicated by multiple gaps in the sequence alignment, presumably due to insertion or deletion events (indels) (Supplementary Figure S2). The beginning and end of the ITS region is conserved, and the start includes both a tRNAala and a tRNAile while the end contains a box A antiterminator sequence (Venegas et al., 1988; Sagredo et al., 1992) (Supplementary Figure S2a). Most of the variable positions in the alignment are in a heavily gapped middle region. Phylogenetic analysis of isolate and environmental ITS sequences produced four major clades, designated ITS_1, ITS_2, ITS_3 and ITS_4 (Figure 3). All phylogenetic techniques were in agreement, except that maximum parsimony analysis of indels did not reproduce group ITS_4 (Figure 3 and Supplementary Figure S2c).

In the consensus phylogeny (Figure 3), clade ITS_1 includes environmental and isolate sequences from Acquasanta, and Frasassi isolates from samples GB30 and GB31. Clades ITS_2 and ITS_3 contain all Frasassi environmental ITS sequences as well as most Frasassi isolates. Clade ITS_4 contains Frasassi isolates from samples RS30, RS31 and RS2. Because ITS sequences from isolates RS30a, RS31a, GB30a–f and GB31a–b occur in different clades than environmental ITS sequences cloned from the same biofilm samples (Figure 3), those isolates do not represent the dominant snottite populations in their samples.

Environmental ITS sequences cloned from samples RS, GS and GB occur in both clades ITS_2 and ITS_3 (Figure 3). As with isolates RS30a and RS31a, we attribute this to intragenomic divergence among multiple rrn operons in the dominant At. thiooxidans populations.

MLST of Acidithiobacillus isolates

For MLST, we sequenced six loci from all 40 Italian At. thiooxidans isolates. Total aligned length of the six concatenated loci is 5777 bp, which includes 627 variable sites. Phylogenetic analysis of the concatenated sequences produced three major clades (Figure 4): (i) clade MLST_1, which includes all isolates from Acquasanta, as well as all isolates from Frasassi samples GB30 and GB31; (ii) clade MLST_2, which contains Frasassi isolates from samples RS30, RS31 and RS2; and (iii) clade MLST_3, which contains all other isolates from Frasassi. At. thiooxidans isolates from Frasassi sites GB30 and GB31 are identical to isolates from Acquasanta site AS1 based on MLST.

Figure 4

Maximum likelihood analyses of At. thiooxidans isolates based on MLST. Numbers at each node indicate posterior probabilities and bootstrap support from Bayesian, maximum parsimony and neighbor joining analyses, in that order. Sequence names are colored by cave location.

However, the comparison between environmental and isolate ITS regions described above indicates that isolates from samples RS30, RS31, RS2, GB30 and GB31 do not represent the dominant At. thiooxidans populations at those sites (Figure 3). Therefore, we conclude that those isolates represent 'weeds' that were selected for in vitro. Sequences in clade MLST_2 and isolates GB30a–f and GB31a–b were thus excluded from subsequent statistical analyses.

For the remaining 21 isolates, pairwise genetic distance is positively correlated with geographic distance (Mantel test, r=0.967, P<0.001) (Figure 5a). Among Frasassi isolates alone, pairwise genetic distance is also positively correlated with geographic distance (Mantel test, r=0.51, P=0.005) (Figure 5b). Genetic distance among isolates is also positively correlated with environmental distance, based on concentrations of H2S(g), CO2(g) and SO2(g) in the cave air (Supplementary Table S4) (Mantel tests: all isolates, r=0.67, P<0.001; Frasassi only, r=0.39, P=0.02). However, partial Mantel tests indicate that genetic distance among isolates remains positively correlated with geographic distance after controlling for environmental variables (partial Mantel tests: all isolates, r=0.94, P<0.001; Frasassi only, r=0.35, P=0.02).

Figure 5

Genetic distance versus geographic distance for At. thiooxidans isolates from (a) the Frasassi and Acquasanta cave systems, and (b) the Frasassi cave system only. Isolates from RS30, RS31, RS2, GB30 and GB31 are excluded (see text for details). Genetic distance among strains was determined from MLST (Figure 4). A small random number was added to spread out overlapping points along the x-axis. Dashed lines are least squares regression lines, and Pearson's correlations for the depicted relationships are statistically significant (a: r=0.93, P<<0.001; b: r=0.75, P<<0.001).

Metagenomic analysis of Acidithiobacillus metabolic potential

Because ITS sequence analysis and MLST indicated that At. thiooxidans populations from Acquasanta and Frasassi were genetically divergent, we generated metagenomic data sets from one biofilm sample from each cave system (samples RS9 and AS5) to compare functional genetic potential that could impact physiological attributes of the populations. The At. thiooxidans population in sample RS9 falls in clade MLST_3, while strains in sample AS5 fall in group MLST_1 (Figure 4). Coverage of Acidithiobacillus sequences in each metagenome is roughly 20×. AS5 also contains a second Acidithiobacillus population at roughly 10× coverage (see Jones et al., 2014), consistent with two populations identified by ITS cloning (clone AS3cl10 versus other AS3 clones in Figure 3). Additional rare At. thiooxidans populations are likely present in both samples at <5× coverage (Jones et al., 2014). Other taxa in the metagenomes include archaea (Ferroplasma and G-plasma) and Acidimicrobium-like organisms (Jones et al., 2014), consistent with previous work (Macalady et al., 2007; Jones et al., 2012).

Identification of putative sequences encoding sulfur oxidation enzymes revealed that the main At. thiooxidans populations in AS5 and RS9 differ in their sulfur oxidation pathways. The At. thiooxidans in AS5 has a homolog of sulfur oxygenase reductase (SOR), while the RS9 population does not (Figure 6). Based on coverage and phylogenetic analysis, the SOR in AS5 unambiguously belongs to the most abundant At. thiooxidans population in that sample (Jones et al., 2014). Coverage of At. thiooxidans in the AS5 and RS9 data sets is sufficient to ensure that the complete genetic complement of the dominant At. thiooxidans populations is present in the metagenomes, and this result is also consistent with an earlier report that At. thiooxidans from the site RS lack a SOR homolog (Jones et al., 2012). Accordingly, a SOR is present in two of the three At. thiooxidans isolates whose genomes have been sequenced to date: At. thiooxidans strains Licanantay (Travisany et al., 2014) and A01 (Yin et al., 2014a) have a SOR, while ATCC 19377 does not (Valdes et al., 2011). The absence of SOR in some snottite At. thiooxidans supports the view that the absence of SOR in strain ATCC 19377 is likely a real biological difference, rather than an artifact of incomplete sequencing (Yin et al., 2014a).

Figure 6

Cartoon of enzymatic sulfur transformations involved in the oxidation of reduced inorganic sulfur compounds by snottite At. thiooxidans, inferred from metagenomic analysis. Abbreviations: HDR, heterodisulfide reductase; SDO, sulfur dioxygenase; SOR, sulfur oxygenase reductase; SOX, multicomponent sulfur oxidation pathway; SQR, sulfide:quinone oxidoreductase; TQO, thiosulfate:quinone oxidoreductase; TTH, tetrathionate hydrolase.

Other than the presence of SOR, the AS and RS populations appear to have similar sulfur oxidation capabilities. Both populations encode homologs for the major enzyme complexes in partial SOX systems (SoxAX, SoxB and SoxYZ but not SoxCD), four structurally distinct sulfide quinone reductases (SqrA, SqrC, SqrE and SqrF), thiosulfate dehydrogenase (DoxD) and tetrathionate hydrolase (TetH) (Figure 6). They both have homologs of hdrABC, which has been proposed to be involved in the oxidation of S0 in Acidithiobacillus spp. based on transcriptomic evidence (Quatrini et al., 2009). Both data sets also contain homologs of all three of the proposed sulfur dioxygenases identified by Yin et al. (2014a), although experimental evidence will be required to confirm the sulfur dioxygenase function of these proteins.

To further compare potential metabolic differences among the At. thiooxidans populations in AS5 and RS9, we compared the two metagenomes against three isolate At. thiooxidans strains whose genomes are publicly available. RS9 has homologs of 39–78 of the predicted protein-coding genes in the three isolate genomes that are absent in AS5 (Table 2). AS5 shares 172–360 homologs with the isolates that are absent in RS9, and the three isolates contain 111–540 protein-coding genes with no homolog in either AS5 or RS9 (Table 2). Most of the apparent genomic differences between the AS5 and RS9 populations encode hypothetical proteins. However, the presence and absence of homologs for genes involved in motility (e.g., pilT) and transport functions (e.g., peptide and sugar transporters) indicate further differences in metabolic capabilities between the AS5 and RS9 At. thiooxidans (Supplementary Table S5), in addition to their sulfur oxidation pathways (Figure 6). More detailed characterization of the physiological differences among the snottite At. thiooxidans strains is beyond the scope of this study. However, it seems clear that the At. thiooxidans strains from Frasassi encode metabolic capabilities not present in the Acquasanta population, and vice versa.

Table 2 Total protein-coding genes in publicly available At. thiooxidans isolate genomes with homologs in either the AS5 or the RS9 snottite metagenome

Discussion

Biogeography of snottite Acidithiobacillus populations

Acidithiobacillus 16S rRNA gene sequences from the Italy and Mexico locations share less than 95% nucleotide identity, and therefore should be considered separate species, At. thiooxidans and 'At. group II' (Figure 2). The presence of separate Acidithiobacillus spp. in Italy and Mexico is not likely due to geographic isolation of those clades, because At. thiooxidans sequences have been found around the world, as have sequences from the At. group II (Figure 2). Therefore, the geographic pattern we observe is either due to environmental selection for At. thiooxidans in Italy and At. group II in Mexico, or due to stochastic colonization events that led to the establishment of At. thiooxidans and At. group II in separate cave systems. We are unable to distinguish between these two hypotheses using our existing data. However, the ecological success of both species suggests that snottite biofilm formation is an adaptation to the sulfidic cave environment, rather than a strategy associated with a single species of Acidithiobacillus.

The distance–decay relationship revealed by MLST suggests that evolutionary relationships among the dominant snottite At. thiooxidans populations in Italy result from restricted dispersal within and among caves (Figure 5). However, in addition to genetic differences based on housekeeping genes (Figure 4), populations in Acquasanta sample AS5 encode a sulfur oxidation enzyme that is absent from At. thiooxidans in RS9. This difference, while subtle, clearly indicates a potential for different sulfur oxidation capabilities among the At. thiooxidans populations that is unlikely to have resulted from genetic drift. It is possible that the Acquasanta and Frasassi populations originated from the same ancestral population, and that the observed difference in sulfur oxidation pathways resulted from gene loss or horizontal gene transfer (Figure 7, dashed arrow). However, a more parsimonious explanation is that the snottite populations in AS5 and RS9 originated from two ancestral populations with slightly different sulfur oxidation capabilities (Figure 7, solid arrows). The distance–decay relationship in Figure 5a could therefore partially result from genetic differences among the ancestral populations (founder effects), rather than purely from local adaptation or drift. Differences in functional attributes among populations from within the Frasassi cave system were not explored in our study, so we cannot distinguish whether radiation within caves or multiple colonization events shaped the genetic relationships among Frasassi populations (Figures 4 and 5b). It is worth noting, however, that snottite populations in the Acquasanta, Frasassi and Mexico caves all seem to reflect colonization by different strains or species of Acidithiobacillus.

Figure 7

Proposed evolutionary histories of snottite populations in Italy. The caves could have been colonized from a single ancestral population (dashed arrow) that diverged by genetic drift or adaptation. Alternatively, the different cave populations could have originated from separate surface-dwelling populations with slight physiological differences (bold arrows), which then continued to diverge in situ.

Evidence for recent colonization events

The presence of rare At. thiooxidans strains in snottites that do not conform to the distance–decay relationship in Figure 5 makes it clear that the biogeography of sulfidic cave snottites is complex (Supplementary Figure S3). We were initially surprised that identical MLST genotypes could be cultured from Acquasanta site AS1 and Frasassi site GB (Figure 4), as these two cave sites are separated by more than 80 km. It is extremely unlikely that this result is due to lab contamination. In addition to stringent protocols designed prior to the culturing study and aimed at eliminating laboratory cross contamination, the AS1 inoculum was collected and used to produce isolates in 2007, while GB30 and 31 samples were collected and cultured in 2008. Moreover, if lab contamination were the source of the matching AS and GB isolates, we would expect to see the same contaminant in other 2008 cultures, and this did not occur (Table 1).

The GB cave site is closer to the surface than any of our other sites, and sample locations GB30 and GB31 are less than 30 m from the cave entrance (Supplementary Figure S1). Site GB is therefore subject to frequent tourist and recreational caving traffic. It is possible that a recent colonization event is responsible for the presence of identical rare strains at Frasassi site GB and Acquasanta site AS1. Human traffic into caves has provided a mechanism for microbial transport that did not exist until recently. Dust, lint and other aerosols from tourist caves advect into more distant passages (Michie, 1999), and in Frasassi, lint can be found cemented into speleothems in remote passages far removed from the heavily trafficked show cave (S Montanari, personal communication). More knowledge of the mechanisms of microbial transport and colonization in cave systems could inform cave management and protection strategies (e.g., Dupont et al., 2007; Bastian and Alabouvette, 2009).

Despite these likely recent colonization events, the genetic distance among the dominant Acquasanta and Frasassi populations may be maintained because low abundance strains have not become established. Microbial biogeographic patterns may be determined by how effectively transplanted strains can take hold and compete with entrenched endemic populations (Gorbushina et al., 2007; Hervàs et al., 2009; Hanson et al., 2012). A history of stochastic colonization and establishment of Acidithiobacillus populations from different surface sources is consistent with our finding that snottites in Acquasanta, Frasassi and Mexico caves are dominated by different strains or species of Acidithiobacillus (Figure 7).

Comparison of techniques for resolving biogeographic relationships among populations

Not surprisingly, we detected genetic divergence among isolates using ITS sequencing and MLST that we did not detect by 16S rRNA sequence analysis. This and other studies make it clear that the 16S rRNA gene is not sufficient to resolve all existing biogeographic relationships among microbial populations (Whitaker et al., 2003; Vos and Velicer, 2008). However, the presence of highly similar or identical 16S rRNA sequences in extreme environments across the planet (e.g., Hollibaugh et al., 2002; Whitaker et al., 2003; Palacios et al., 2008) suggests that global dispersal may occur over long timescales, consistent with the million-year timescale for 16S rRNA gene evolution (Ochman et al., 1999; Itoh et al., 2002; Kuo and Ochman, 2009).

We found that there were both advantages and disadvantages to using ITS sequence analysis and MLST. Although the ITS region is more variable than the 16S rRNA gene, ITS sequence analysis provides less genetic resolution than MLST and is challenging for phylogenetic reconstruction due to the presence of large and frequent insertions and deletions. Alignment is unreliable because the ITS region has no secondary structure, contains multiple large gaps, and is subject to intragenomic divergence. We were able to use the ITS region to identify divergent populations from different cave systems, but genetic relationships among them are not completely resolved (Figure 3). Most significantly, environmental ITS cloning allowed us to evaluate which isolates were representative of the dominant snottite populations. MLST is a powerful technique for strain-level taxonomic identification, phylogenetic analysis and for identifying recombination events (e.g., Falush et al., 2003). However, as we found with isolates from sites RS and GB, MLST is subject to the biases inherent in culturing. A combination of techniques, like those applied here, is useful to balance the strengths, limitations and resolution of different measures.

Implications for microbial biogeography

This work represents a novel contribution to the field of microbial biogeography by applying a population genetics approach to investigate extremely acidophilic microorganisms in caves, an exceptional natural experiment in microbial isolation. We found that different species of Acidithiobacillus inhabit snottite biofilms on different continents, and, as hypothesized, we identified a distance–decay pattern among dominant snottite populations within Italy. However, this pattern emerged only after removing sequences representing rare Acidithiobacillus strains ('weeds'). The distance–decay relationship is maintained among the dominant populations in the presence of these rare strains because the dominant populations are established, because the rare strains are less fit, or both. Furthermore, metagenomic analysis revealed differences in metabolic potential among cave populations, and we found statistically significant correlations between genetic distance and both geographic distance and environmental variables. We argue that the observed biogeographic relationships among snottite Acidithiobacillus populations are best explained by a scenario in which caves were initially colonized by distinct surface-dwelling Acidithiobacillus spp., which then continued to diverge and adapt in situ.
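A correlation between a genetic distance matrix and a geographic distance matrix is commonly assessed with a Mantel permutation test. This section does not name the test that was used, so the sketch below is an assumption about the general approach rather than a reconstruction of the analysis. The function names and the five-site toy data are hypothetical.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mantel(genetic, geographic, permutations=999, seed=0):
    """One-sided Mantel test: is genetic distance positively correlated
    with geographic distance? Returns (r, permutation p-value)."""
    n = len(genetic)
    geo = upper_triangle(geographic)
    observed = pearson(upper_triangle(genetic), geo)
    rng = random.Random(seed)
    order = list(range(n))
    at_least = 1  # the observed statistic counts as one permutation
    for _ in range(permutations):
        rng.shuffle(order)
        # Permute rows and columns of the genetic matrix together.
        permuted = [[genetic[order[i]][order[j]] for j in range(n)]
                    for i in range(n)]
        if pearson(upper_triangle(permuted), geo) >= observed:
            at_least += 1
    return observed, at_least / (permutations + 1)


# Toy data: five hypothetical sites on a line, with genetic distance
# exactly proportional to geographic distance.
positions = [0.0, 1.0, 2.0, 3.0, 4.0]
geographic = [[abs(a - b) for b in positions] for a in positions]
genetic = [[0.01 * d for d in row] for row in geographic]
r, p = mantel(genetic, geographic)
print(r, p)
```

Permuting rows and columns together preserves the dependence structure within each matrix, which is why the test shuffles site labels rather than individual pairwise distances.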

While distance–decay patterns are frequently observed in studies of geographically separated microbial populations, the mechanisms behind the genetic relationships often remain elusive. Our findings underscore the role of stochastic colonization in creating or contributing to distance–decay patterns. This study also highlights the importance of using multiple genetic techniques for recognizing the biogeographic relationships among microbial populations, as well as the application of functional genomic information for confirming, rejecting or extending conclusions based on neutrally evolving markers alone. In this case, functional genomic information revealed metabolic differences that improved our interpretation of the processes behind the pattern (Figure 7). Nevertheless, and despite the extraordinarily low diversity and physical and chemical isolation of snottite microbial communities, the genetic relationships among cave Acidithiobacillus spp. are complex, and clearly indicate that multiple processes and timescales are relevant in explaining how microbial populations are distributed in the present day. Future experimental and observational studies will be required to build a more complete understanding of the relative importance of physical and chemical barriers, dispersal, colonization, mutation, drift and other processes in controlling the genetic makeup of the diverse microbial species found in nature.