Main

In most natural environments, microorganisms are predominantly found in surface-associated, matrix-enclosed communities known as biofilms (Costerton et al., 1995). At the Lost City Hydrothermal Field on the Mid-Atlantic Ridge, biofilms coat mineral surfaces of the highly porous carbonate chimneys venting <90 °C, pH 9–11 fluids (Schrenk et al., 2004). Chimney fluids contain abundant hydrogen and methane, but very little carbon dioxide due to the high pH (Kelley et al., 2005). Owing to extreme conditions, very little animal biomass is present at Lost City; instead, the thick, mucilaginous biofilms (containing up to 10⁹ cells per gram of carbonate chimney) are the dominant life forms. Previous studies have highlighted the extremely low microbial diversity in carbonate chimneys. Although a single phylotype belonging to the Methanosarcinales order of methane-cycling archaea constitutes >80% of all active cells in the hottest, anoxic zones of the chimney (Schrenk et al., 2004), a few species of aerobic and microaerophilic bacteria dominate the cooler, oxygenated zones (Brazelton et al., 2006). As the continuous mixing of anoxic hydrothermal fluid with oxygenated seawater creates micro-scale redox gradients within the chimneys, these anaerobic archaea and aerobic bacteria live in close proximity to each other.

The low diversity and high cell density of biofilms make them attractive targets for metagenomic sequencing. We obtained 35 Mb of DNA sequence from 46 361 shotgun reads of two pUC18 libraries constructed by the DOE Joint Genome Institute (Walnut Creek, CA, USA) with DNA extracted from ∼1 kg of a single carbonate chimney sample. Initial characterization of the metagenome with BLAST (Basic Local Alignment Search Tool) (Altschul et al., 1997) searches against the GenBank non-redundant database revealed a large number of hits to genes encoding transposases, enzymes involved in the transposition and integration of mobile genetic elements (that is, DNA that can be transferred within or between genomes). To test whether the apparently high abundance of transposases in the Lost City biofilm metagenome is unusual, we developed a simple method of quantifying numbers of transposase sequences in metagenomic datasets (Table 1). We only compared sets of unassembled reads, not assembled contigs, because it is not straightforward to make quantitative comparisons among assembled metagenomes containing contigs and scaffolds of various numbers, sizes and sequence coverages.

Table 1 Abundance of transposases in the Lost City chimney biofilm metagenome compared with other metagenomes in the CAMERA database

Our results show that the Lost City biofilm contains an unprecedented abundance of transposases. Over 8% of all reads in the metagenome matched one of the transposase protein families with an E value of 10⁻⁵ or better (hereafter referred to as significant hits). Similar results were achieved with an E value cutoff of 10⁻¹⁰, but control searches revealed that this cutoff resulted in some false negatives. Very few reads contained significant hits to more than one transposase family with the 10⁻⁵ cutoff, indicating that our searches were family specific and unlikely to yield non-transposase sequences.
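The per-read counting described above can be illustrated with a minimal sketch. It assumes the search results have already been parsed into (read, family, E value) tuples; the function name, the family labels and the toy data are hypothetical and are not taken from the study.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 1e-5  # cutoff stated in the text


def summarize_hits(hits, total_reads, cutoff=E_VALUE_CUTOFF):
    """Summarize significant hits per read.

    hits: iterable of (read_id, family, e_value) tuples (assumed input format).
    Returns (fraction of all reads with at least one significant hit,
             fraction of those reads that hit more than one family).
    """
    families_by_read = defaultdict(set)
    for read_id, family, e_value in hits:
        if e_value <= cutoff:  # "10^-5 or better" means at or below the cutoff
            families_by_read[read_id].add(family)
    n_hit = len(families_by_read)
    n_multi = sum(1 for fams in families_by_read.values() if len(fams) > 1)
    frac_hit = n_hit / total_reads
    frac_multi = n_multi / n_hit if n_hit else 0.0
    return frac_hit, frac_multi


# Toy data, invented for illustration only.
toy_hits = [
    ("read1", "family_A", 1e-20),
    ("read1", "family_A", 1e-8),   # second hit to the same family, counted once
    ("read2", "family_B", 3e-6),
    ("read3", "family_B", 1e-3),   # fails the cutoff
    ("read4", "family_B", 1e-12),
    ("read4", "family_A", 1e-7),   # read4 hits two families
]
frac_hit, frac_multi = summarize_hits(toy_hits, total_reads=10)
print(frac_hit, frac_multi)
```

Counting each read once, regardless of how many alignments it produces, is what makes the figure comparable across metagenomes of different sizes; the second value corresponds to the family-specificity check mentioned in the text.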

We conducted the same search against all collections of unassembled metagenomic reads in the CAMERA database (Seshadri et al., 2007). Each metagenome contained at least an order of magnitude fewer transposases per read than did the Lost City metagenome (Table 1). Interestingly, the four metagenomes in Table 1 with the highest proportion of reads containing transposases are those derived from biofilms. In contrast, the four metagenomes in Table 1 with the fewest transposases (200–800× fewer per read than the Lost City metagenome) are from water samples with little or no biomass contribution from biofilms. Furthermore, the three viral metagenomes were among those with the fewest transposases, suggesting that the abundance of transposases in the biofilm metagenomes is not easily explained by the presence of viruses.

Further analysis revealed that the transposases in the Lost City biofilm metagenome are diverse as well as abundant. We detected 21 different transposase families with significant hits to a Lost City metagenomic read (Table 2). Two of the families (retroviral integrase and transposase 11) were present in 2353 Lost City reads, comprising 63% of all significant hits. To examine whether the 2353 reads represent just a few genes present in many copies or a large diversity of genes, we constructed multiple sequence alignments with the MUSCLE aligner (Edgar, 2004), including the nucleotide sequence region of each read with a significant TBLASTN alignment with a retroviral integrase or transposase 11 sequence. The Lost City sequences were clustered into operational taxonomic units on the basis of a 3% sequence difference threshold using DOTUR (Schloss & Handelsman, 2005). The Lost City retroviral integrase and transposase 11 sequences each include >100 operational taxonomic units (Figure 1). (Equivalent results were achieved by aligning amino acid sequences.) A similar analysis of 16S rRNA sequences in the Lost City metagenome yielded just 83 reads representing 22 operational taxonomic units (Figure 1). Therefore, genes encoding transposases are much more abundant and diverse than are 16S rRNA genes in the Lost City biofilm.
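To make the 3% threshold concrete, the following is a minimal sketch of threshold-based clustering on aligned sequences. It is not DOTUR's implementation: it uses a simple greedy rule in which a sequence joins an existing cluster only if it differs by at most 3% from every member of that cluster. The distance measure (mismatches over ungapped aligned columns), the function names and the toy sequences are assumptions made for illustration.

```python
def p_distance(a, b):
    """Fraction of mismatches over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = mismatched = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x != y:
            mismatched += 1
    return mismatched / compared if compared else 1.0


def greedy_otus(aligned, threshold=0.03):
    """Group sequence names into clusters; aligned maps name -> aligned sequence."""
    otus = []
    for name, seq in aligned.items():
        for otu in otus:
            # Join only if within the threshold of every current member.
            if all(p_distance(seq, aligned[m]) <= threshold for m in otu):
                otu.append(name)
                break
        else:
            otus.append([name])  # no compatible cluster: start a new one
    return otus


def mutate(seq, positions):
    """Helper for the toy data: substitute the base at each given position."""
    chars = list(seq)
    for i in positions:
        chars[i] = "A" if chars[i] != "A" else "C"
    return "".join(chars)


# Toy alignment of 100 columns, invented for illustration only.
base = "ACGT" * 25
toy = {
    "r1": base,
    "r2": mutate(base, [1, 50]),        # 2% different from r1
    "r3": mutate(base, range(10, 20)),  # 10% different from r1
}
otus = greedy_otus(toy)
rank_abundance = sorted((len(o) for o in otus), reverse=True)
print(otus, rank_abundance)
```

The sorted cluster sizes are the kind of rank-abundance values plotted in Figure 1: many small clusters indicate a large diversity of genes, whereas a few large clusters would indicate a few genes present in many copies.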

Table 2 Abundance of each transposase protein family in the Lost City metagenome
Figure 1

The diversity of transposase sequences in the Lost City metagenome is much greater than 16S rRNA diversity. Each operational taxonomic unit (OTU; a cluster of sequences defined by a 3% nucleotide sequence difference threshold) is listed on the X-axis, and the number of reads with sequences in each OTU is plotted on the Y-axis. The two transposase families, retroviral integrase (red points) and transposase 11 (green), each have >100 OTUs, whereas there are only 22 16S rRNA OTUs (black) representing a small number of reads.

A preliminary assembly of the Lost City metagenome (assembled by the DOE Joint Genome Institute) is consistent with the carriers of the transposases being small molecules of extragenomic DNA. Approximately half of all 41 393 reads assembled into 6324 contigs of 2 or more reads, including 49 contigs >7 kb in length (Figure 2). The largest contigs had a high sequence similarity to the 16S rRNA gene and to many of the open reading frames in the genome of Thiomicrospira crunogena XCL-2 (Scott et al., 2006). (Nearly half of all bacterial 16S rRNA clones sequenced from this sample showed a similarity to species belonging to the Thiomicrospira genus; see Supplementary information for more detail.) The large contigs had a similar %GC and sequencing coverage (∼38% GC and 5–8 × coverage), indicating that they represent the genome of a Thiomicrospira species that previous studies have shown to be widespread in Lost City carbonate chimneys (Brazelton et al., 2006). The 689 contigs containing transposases, by contrast, have a wide range in their %GC, and many have high sequence coverage and are less than 5 kb (Figure 2). These contrasting patterns indicate that most of the transposases do not belong to the same genome as do the Thiomicrospira-like genes. Instead, they are likely to be located on small, extragenomic molecules such as viral genomes, plasmids or extracellular DNA fragments.
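The two contig properties compared here, %GC and sequencing coverage, can be computed as in the sketch below. The text does not state how coverage was calculated for the preliminary assembly, so the definition used here (total length of the reads in a contig divided by the contig length) is an assumption, as are the function names and toy values.

```python
def gc_percent(seq):
    """Percentage of G and C among the unambiguous bases of a sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


def mean_coverage(read_lengths, contig_length):
    """Approximate fold coverage: total read bases divided by contig length."""
    if contig_length <= 0:
        raise ValueError("contig_length must be positive")
    return sum(read_lengths) / contig_length


# Toy values, invented for illustration only.
toy_gc = gc_percent("ATGCATGCAT")
toy_cov = mean_coverage([800, 750, 900], contig_length=1000)
print(toy_gc, toy_cov)
```

Plotting these two values for every contig is one way to separate contigs that share a %GC and coverage, as expected for a single genome, from contigs whose values scatter widely.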

Figure 2

Size in kilobases and sequencing coverage of each contig (black points) in a preliminary assembly of the Lost City metagenome. The 689 contigs with transposases (red) are smaller and have higher coverage than do the 16 contigs with 16S rRNA sequences (green).

To test these possibilities, TBLASTN searches similar to those described above for transposases were conducted with PFAM protein families representing plasmid and viral proteins. None of these searches resulted in many significant hits: only 13 reads contained plasmid replication proteins, 6 reads contained viral capsid proteins and 10 reads contained reverse transcriptases (data not shown). Therefore, if plasmids or viruses are carrying the abundant transposases found in the Lost City biofilm, they do not show significant sequence similarity with previously published plasmids and viruses. We also found little evidence of the presence of integrons and genomic islands; only 1.7% of all transposases in the Lost City biofilm (Table 2) belonged to the phage integrase family (which is associated with integrons and genomic islands), and few transposases were found on the same contigs as tRNA genes, which are frequent insertion sites for these elements (data not shown). (See Supplementary information for further analyses.) It is also possible that transposase genes are present as multiple copies in regions of cellular genomes that are not conducive to assembly into large contigs, but this scenario cannot easily explain the high diversity of transposase sequences. In conclusion, the most likely carriers of transposases in Lost City biofilms are small extracellular fragments of DNA. Many biofilms are known to contain extracellular DNA, and in some cases, it has been shown that the presence of extracellular DNA is required for biofilm formation (Whitchurch et al., 2002).

Although many of the transposase genes detected in this study may not be expressed (Ram et al., 2005), their unprecedented abundance and diversity strongly suggest that lateral gene transfer (LGT) is a frequent occurrence within Lost City biofilms. Considering the low organismal diversity of these biofilms, LGT may be an important source of phenotypic diversity. Mathematical modeling suggests that LGT among genomes within a biofilm can stabilize the coexistence of multiple phenotypes and therefore contribute to the overall fitness of the biofilm community (Chia et al., 2008). Biofilms in other systems are known to generate physiological diversity in response to environmental stresses and gradients, despite limited genetic diversity (Boles et al., 2004; Stewart and Franklin, 2008), and many biofilm communities exhibit highly structured networks of interactions requiring interspecies communication and cooperation (Shapiro, 1998; Stoodley et al., 2002; West et al., 2006). Therefore, further work should investigate whether similar kinds of collective interactions are operating in the low-diversity biofilms of Lost City chimneys, where survival in extreme conditions may require the stable coexistence of multiple phenotypes enabled by LGT.

Lost City carbonate chimneys have been discussed as models for the origin and early evolution of life because of the prevalence of ultramafic environments on early Earth (Grove and Parman, 2004) and Mars (Hamilton and Christensen, 2005), because of the exothermic generation of hydrogen and organic compounds by serpentinization (Proskurowski et al., 2008), because of the potential of chimney pores to concentrate biochemicals (Baaske et al., 2007) and because of the advantage of a high pH for prebiotic chemistry (Martin et al., 2008). In addition, it has been suggested that a community of primitive precells (Baross and Hoffman, 1985) or progenotes (Woese, 1998) undergoing extensive gene transfer represented an early stage of evolution before the advent of free-living cells. Therefore, the biofilms of Lost City carbonate chimneys could serve as a model for this theory, considering the potential for generating phenotypic diversity with limited organismal diversity through rampant LGT.

Accession numbers

This Whole Genome Shotgun project has been deposited at DDBJ/EMBL/GenBank under the project accession ACQI00000000. The version described in this paper is the first version, ACQI01000000. All sequencing reads are deposited under accession numbers ACQI01006325–ACQI01026573, and assembled contigs are deposited under accession numbers ACQI01000001–ACQI01006324.