Introduction

Linking functional traits to bacterial phylogeny remains a fundamental but elusive goal of microbial ecology (Hunt et al., 2008). Without this information, it becomes difficult to resolve meaningful units of diversity and the mechanisms by which bacteria interact with each other and adapt to environmental change. Most bacterial diversity is delineated among clusters of sequences that share >99% 16S rRNA gene sequence identity (Acinas et al., 2004). These sequence clusters are believed to represent fundamental units of diversity, whereas intra-cluster microdiversity is thought to persist because of weak selective pressures (Acinas et al., 2004), suggesting little ecological or taxonomic relevance. Recently, progress has been made in terms of delineating units of diversity that possess the fundamental properties of species by linking genetic diversity with ecology and evolutionary theory (Achtman and Wagner, 2008; Fraser et al., 2009). Despite these advances, there remains no widely accepted species concept for prokaryotes (Gevers et al., 2005), and sequence-based analyses reveal widely varied levels of diversity within assigned species boundaries.

The comparative analysis of bacterial genome sequences has revealed considerable differences among closely related strains (Joyce et al., 2002; Welch et al., 2002; Thompson et al., 2005) and provides a new perspective on genome evolution and prokaryotic species concepts. Genomic differences among closely related strains are concentrated in islands, strain-specific regions of the chromosome that are generally acquired by horizontal gene transfer (HGT) and that harbor functionally adaptive traits (Dobrindt et al., 2004) that can be linked to niche adaptation. The pelagic cyanobacterium Prochlorococcus is an important model for the study of island genes, which in this case are differentially expressed under low nutrient and high light stress in ecologically distinct populations (Coleman et al., 2006). Despite convincing evidence for the adaptive significance of island genes among environmental bacteria, the precise functions of their products have seldom been characterized and their potential role in the evolution of independent bacterial lineages remains poorly understood.

The marine sediment-inhabiting genus Salinispora belongs to the order Actinomycetales, a group of Actinobacteria commonly referred to as actinomycetes. Actinomycetes are a rich source of structurally diverse secondary metabolites and account for the majority of antibiotics discovered as of 2002 (Berdy, 2005). Salinispora spp. have likewise proven to be a rich source of secondary metabolites (Fenical and Jensen, 2006), including salinosporamide A, which is currently in clinical trials for the treatment of cancer (Fenical et al., 2009). At present, the genus comprises three species that collectively constitute a microdiverse sequence cluster (sensu Acinas et al., 2004); that is, they share ⩾99% 16S rRNA gene sequence identity (Jensen and Mafnas, 2006). Although the microdiversity within this cluster has been formally delineated into species-level taxa (Maldonado et al., 2005), it remains to be determined if these taxa represent ecologically or functionally distinct lineages.

We here report the comparative analysis of the complete genome sequences of Salinispora tropica (strain CNB-440, the type strain for the species and thus a contribution to the Genomic Encyclopedia of Bacteria and Archaea project), hereafter referred to as ST, and Salinispora arenicola (strain CNS-205), hereafter referred to as SA, the first obligately marine Actinobacteria to be obtained in culture (Mincer et al., 2002). The aims of this study were to describe, compare and contrast the gene content and organization of the two genomes in the context of prevailing species concepts, identify the functional attributes that differentiate the two species, assess the processes that have driven genome evolution and search for evidence of marine adaptation in this unusual group of Gram-positive marine bacteria.

Materials and methods

Sequencing and ortholog identification

The sequencing and annotation of the SA genome was as previously reported for ST (Udwary et al., 2007). Both genomes were sequenced as part of the Department of Energy, Joint Genome Institute, Community Sequencing Program. Orthologs within the two genomes were predicted using the Reciprocal Smallest Distance method (Wall et al., 2003), which includes a maximum likelihood estimate of amino-acid substitutions. A linear alignment of positional orthologs was created and the positions of rearranged orthologs and species-specific genes identified. Genomic islands were defined as regions >20 kb that are flanked by regions of conservation and within which <40% of the genes possess a positional ortholog in the reciprocal genome. Paralogs within each genome were identified using the blastclust algorithm (Dondoshansky and Wolf, 2000), with a cutoff of 30% identity over 40% of the sequence length. The automated phylogenetic inference system (APIS) was used to identify recent gene duplications (Badger et al., 2005).
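
The island criteria stated above can be illustrated with a minimal Python sketch. It only tests a candidate region against the two numerical thresholds given in the text (length >20 kb and <40% of genes with a positional ortholog). It does not check for flanking conservation, and the `Gene` record, the function name and the coordinates are hypothetical illustrations rather than the pipeline used in this study.

```python
from dataclasses import dataclass
from typing import List

MIN_ISLAND_BP = 20_000        # islands must be longer than 20 kb
MAX_ORTHOLOG_FRACTION = 0.40  # fewer than 40% of genes may have a positional ortholog


@dataclass
class Gene:
    start: int                 # first base of the gene (bp)
    end: int                   # last base of the gene (bp)
    positional_ortholog: bool  # True if a positional ortholog exists in the other genome


def is_island(region: List[Gene]) -> bool:
    """Return True if a candidate region meets both island thresholds."""
    if not region:
        return False
    # Region length spans from the first gene start to the last gene end.
    length_bp = max(g.end for g in region) - min(g.start for g in region) + 1
    # Fraction of genes in the region that have a positional ortholog.
    fraction = sum(g.positional_ortholog for g in region) / len(region)
    return length_bp > MIN_ISLAND_BP and fraction < MAX_ORTHOLOG_FRACTION


# Hypothetical region: ten 2.5-kb genes, three of which have positional orthologs.
candidate = [Gene(i * 2500 + 1, (i + 1) * 2500, i < 3) for i in range(10)]
print(is_island(candidate))
```

In practice, candidate regions would first have to be delimited by the conserved positional orthologs on either side, a step that this sketch leaves out.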

Horizontal gene transfer

All genes were assessed for evidence of HGT based on abnormal DNA composition, phylogenetic, taxonomic and sequence-based relationships, and comparisons with known mobile genetic elements. Genes identified by ⩾2 different methodologies were counted as positive for HGT. To reflect confidence in the assignments, genes displaying positive evidence of HGT were color coded from yellow to red corresponding to total scores from 2 to 6. The results were mapped onto the genome to reveal HGT-clustering patterns and adjacent clusters were merged (Figure 1a). Four DNA compositional analyses included G+C content (obtained from the JGI annotation), codon adaptive index, calculated with the CAI calculator (Wu et al., 2005) using a suite of housekeeping genes as reference, dinucleotide frequency differences (δ*), calculated using IslandPath (http://www.pathogenomics.sfu.ca/islandpath/update/IPindex.pl) (Hsiao et al., 2003), and DNA composition, calculated using Alien_Hunter (http://www.sanger.ac.uk/Software/analysis/alien_hunter/) (Vernikos and Parkhill, 2006). G+C content or codon usage values >1.5 s.d. from the genomic mean and dinucleotide frequency differences >1 s.d. from the mean were scored positive for HGT. Taxonomic relationships in the form of lineage probability index (LPI) values for all protein-coding genes were assigned using the Darkhorse algorithm (Podell and Gaasterland, 2007). Genes with an LPI of <0.5, indicating that the orthologs are not in closely related genomes, were scored positive for HGT. A reciprocal Darkhorse analysis (Podell et al., 2008) was then performed on the orthologs of all positives, and if these genes had an LPI score >0.5, indicating that the match sequence is phylogenetically typical within its own lineage, they were assigned an additional positive score.
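
The compositional thresholds described above amount to flagging genes whose values deviate from the genomic mean by more than a set number of standard deviations. The following Python sketch shows that rule for per-gene G+C content. The function names are hypothetical, and the use of the population standard deviation and of the mean of per-gene values as the "genomic mean" are assumptions, as the text does not specify either.

```python
from statistics import mean, pstdev
from typing import List


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def flag_outliers(values: List[float], threshold_sd: float) -> List[bool]:
    """Flag values lying more than threshold_sd standard deviations from the mean.

    Assumption: population standard deviation across all per-gene values.
    """
    if not values:
        return []
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # No variation, so nothing can be an outlier.
        return [False] * len(values)
    return [abs(v - mu) > threshold_sd * sd for v in values]


# Hypothetical per-gene G+C fractions: nine typical genes and one atypical gene.
gc_values = [0.70] * 9 + [0.50]
print(flag_outliers(gc_values, threshold_sd=1.5))
```

The same function could be applied with `threshold_sd=1.0` to dinucleotide frequency differences, matching the second threshold given in the text.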

Figure 1

Linear alignment of the Salinispora tropica and Salinispora arenicola genomes starting with the origins of replication. (a) Positional orthologs (core) flanked by islands (E and F), heat-mapped HGT genes (D and G), rearranged orthologs (C and H), species-specific genes (B and I), secondary metabolite genes (green), MGEs (pink) with prophage (P) and AICEs (E) indicated (A and J). For genomic islands, predicted (lower case) and isolated (uppercase with structures) secondary metabolites are given (not shown are six non-island secondary metabolic gene clusters of unknown function). Shared positional (blue) and rearranged (red) secondary metabolite clusters are indicated. *Previously isolated from other bacteria. (b) Expanded view of SA pks5 revealing gene and modular architecture. (c) Neighbor-joining phylogenetic tree of KS domains from SA pks5 revealing gene and modular duplication events (erythromycin root, % bootstrap values from 1000 re-samplings). AICEs, actinobacterial integrative and conjugative elements; MGEs, mobile genetic elements.

A phylogenetic approach using the APIS program (Badger et al., 2005) was also employed to assess HGT. Using this program, bootstrapped neighbor-joining trees of all predicted protein-coding genes within each genome were created. All genes cladding with non-actinobacterial homologs were binned into their respective taxonomic groups and given a positive HGT score. Evidence of HGT was also inferred from Reciprocal Smallest Distance analyses of each genome against a compiled set of 27 finished actinobacterial genomes that included at least two representatives of each genus for which sequences were available. Genes present in SA and/or ST and not observed among the 27 actinobacterial genomes were assigned a positive HGT score. Bacteriophages were identified using Prophage (http://bioinformatics.uwp.edu/~phage/ProphageFinder.php) (Bose and Barber, 2006) and Phage Finder (http://phage-finder.sourceforge.net/) (Fouts, 2006). Other insertion elements were identified as prophage or transposon in origin through blastX homology searches. Gene annotation based on searches for identity across PFAM, SPTR, KEGG and COG databases was also used to help identify mobile genetic elements. Each gene associated with a mobile genetic element was assigned a positive HGT score. Test scores were amalgamated and those genes showing evidence of HGT in two or more tests (maximum score 6) were classified as horizontally acquired. The results were mapped onto the genome, and genes identified by only one test but associated with clusters of genes that scored in two or more tests were added to the total HGT pool. Adjacent clusters were merged.
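
The amalgamation step described above can be sketched in Python as follows. Genes scoring in two or more tests are called as horizontally acquired, and genes scoring in exactly one test are added when they sit next to such a gene. The text does not define what "associated with clusters" means in practice, so the immediate-neighbor rule used here is an assumption, as are the function name and the treatment of the gene order as linear rather than circular.

```python
from typing import List


def call_hgt(scores: List[int], min_tests: int = 2) -> List[bool]:
    """Classify genes (in chromosomal order) as horizontally acquired.

    scores[i] is the number of independent tests (0 to 6) in which gene i
    scored positive for HGT.
    """
    # Genes supported by at least min_tests independent tests.
    confident = [s >= min_tests for s in scores]
    calls = list(confident)
    for i, s in enumerate(scores):
        if s == 1:
            # Assumption: a single-test gene is rescued only if an
            # immediately adjacent gene is a confident HGT call.
            left = i > 0 and confident[i - 1]
            right = i + 1 < len(scores) and confident[i + 1]
            if left or right:
                calls[i] = True
    return calls


# Hypothetical scores for eight consecutive genes.
print(call_hgt([0, 1, 3, 2, 1, 0, 1, 0]))
```

Here the single-test genes flanking the two confident calls are added to the HGT pool, whereas the isolated single-test gene is not.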

Clustered regularly interspaced short palindromic repeats (CRISPRs) were identified using CRISPR finder (http://crispr.u-psud.fr/Server/CRISPRfinder.php), whereas repeats larger than 35 bases were identified using Reputer (http://bibiserv.techfak.uni-bielefeld.de/reputer/) (Kurtz et al., 2001). Secondary metabolite gene clusters were manually annotated as in Udwary et al. (2007). Cluster boundaries were predicted using gene clusters reported earlier when available, as in the case of rifamycin. For unknown clusters, loss of gene conservation across the Actinobacteria was used to aid boundary predictions. In the future, programs such as ‘ClustScan’ may prove useful for pathway annotation and product prediction (Starcevic et al., 2008). However, many biosynthetic genes are large (5–10 kb) and highly repetitive, creating challenges associated with gene calling and assembly (Udwary et al., 2007) and the interpretation of operon structure. The ratio of non-synonymous to synonymous mutations (dN/dS) for all orthologs was calculated using the Perl program SNAP (http://www.hiv.lanl.gov), with the alignments for all values >1 checked manually.

Results and discussion

The ST and SA genomes are circular and share 3606 orthologs, representing 79.4% and 73.2% of the respective genomes (Table 1). The average nucleotide identity among these orthologs is 87.2%, well below the 94% cutoff that has been suggested to delineate bacterial species (Konstantinidis and Tiedje, 2005). Despite differing by only seven nucleotides (99.7% identity) in the 16S rRNA gene, the genome of SA is 603 kb (11.6%) larger and possesses 1505 species-specific genes compared with 987 in ST. Seventy-five percent of these species-specific genes are located in 21 genomic islands (Table 1, Supplementary Table S1), none of which are comprised of genes originating entirely from one genome (Figure 1). The presence of genomic islands in the same location on the chromosomes of closely related bacteria is well recognized (Coleman et al., 2006) and facilitated by the presence of transfer RNAs (tRNAs) (Tuanyok et al., 2008). Twelve islands in the Salinispora alignment share at least one tRNA between both genomes and, of those, four share two or more tRNAs within a single island, indicating multiple insertion sites. In addition to tRNAs, direct repeats detected in the same location in both genomes could also act as insertion sites to help create islands.

Table 1 General genome features

The Salinispora genomic islands are enriched with large clusters of genes devoted to the biosynthesis of secondary metabolites (Figure 1). They house all 25 of the species-specific secondary metabolic pathways, whereas 8 of the 12 shared pathways occur in the genus-specific core (Tables 2 and 3). We have isolated and identified the products of eight of these pathways, which include the highly selective proteasome inhibitor salinosporamide A (Feling et al., 2003), as well as sporolide A (Buchanan et al., 2005), which is derived from an enediyne polyketide precursor (Udwary et al., 2007), one of the most potent classes of biologically active agents discovered to date. An earlier analysis of 46 Salinispora strains revealed that secondary metabolite production is the major phenotypic difference among the three species (Jensen et al., 2007), an observation supported at the genomic level by the analysis of the S. tropica genome (Udwary et al., 2007) and now this study.

Table 2 Secondary metabolite gene clusters in Salinispora tropica (ST)
Table 3 Secondary metabolite gene clusters in Salinispora arenicola (SA)

Of the eight secondary metabolites that have been isolated from the two strains, all but salinosporamide A, sporolide A and salinilactam have been reported from unrelated taxa (Figure 1), providing strong evidence for HGT. Further evidence for HGT comes from a phylogenetic analysis of the polyketide synthase (PKS) genes associated with the rifamycin biosynthetic gene cluster (rif) in SA and Amycolatopsis mediterranei, the original source of this antibiotic (Yu et al., 1999). This analysis confirms earlier observations of HGT in this pathway (Kim et al., 2006) and reveals that all 10 of the ketosynthase domains are perfectly interleaved, as would be predicted if the entire PKS gene cluster had been exchanged between the two strains (Supplementary Figure S1). Evidence of HGT coupled with earlier evidence for the fixation of specific pathways, such as rif, among globally distributed SA populations (Jensen et al., 2007) supports vertical inheritance following pathway acquisition (Ochman et al., 2005). This evolutionary history is what might be expected if pathway acquisition fostered ecotype diversification or a selective sweep (Cohan, 2002) resulting from strong selection for the acquired pathway, either of which provides compelling evidence that secondary metabolites represent functional traits with important ecological roles. The concept that gene acquisition provides a mechanism for ecological diversification that may ultimately drive the formation of independent bacterial lineages has been proposed earlier (Ochman et al., 2000). The inclusion of secondary metabolism among the functional categories of acquired genes that may have this effect sheds new light on the functional importance and evolutionary significance of this class of genes. 
Although the ecological functions of secondary metabolites remain largely unknown, and thus it is not clear how these molecules might facilitate ecological diversification, there is mounting evidence that they play important roles in chemical defense (Haeder et al., 2009) or as signaling molecules involved in population or community communication (Yim et al., 2007).

Differences between the two species also occur in CRISPR sequences, which are non-continuous direct repeats separated by variable (spacer) sequences that have been shown to confer immunity to phage (Barrangou et al., 2007). The ST genome carries three intact prophages and three CRISPRs (35 spacers), whereas only one prophage has been identified in the genome of SA, which possesses eight different CRISPRs (140 spacers). The SA prophage is unprecedented among bacterial genomes in that it occurs in two adjacent copies that share 100% sequence identity. These copies are flanked by tRNA att sites and separated by an identical 45-bp att site, suggesting double integration as opposed to duplication (te Poele et al., 2008). Remarkably, four of the SA CRISPRs possess a spacer that shares 100% identity with portions of three different genes found in ST prophage 1 (Figure 2). These spacer sequences have no similar matches to genes in the SA prophage or in any prophage sequences deposited in the NCBI, CAMERA or the SDSU Center for Universal Microbial Sequencing databases. The detection of these spacer sequences provides evidence that SA has been exposed to a phage related to one that currently infects ST and that SA now maintains acquired immunity to this phage genotype as has been reported earlier in other bacteria (Barrangou et al., 2007). This is a rare example in which evidence has been obtained for CRISPR-mediated acquired immunity to a prophage that resides in the genome of a closely related environmental bacterium. Given that SA strain CNS-205 was isolated from Palau, whereas ST strain CNB-440 was recovered 15 years earlier from the Bahamas, it appears that actinophages have broad temporal–spatial distributions or that resistance is maintained on temporal scales sufficient for the global distribution of a bacterial species.
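
The spacer-to-prophage comparison described above rests on finding spacers that match a gene at 100% identity. A minimal Python sketch of such an exact-match search, checking both strands, is given below. The function names and the toy sequences are invented for illustration, and the sketch is not the search procedure used in this study, which is not described in the text.

```python
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_exact_spacer_hits(
    spacers: Dict[str, str], genes: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """List (spacer_id, gene_id, strand) for every 100% identity match."""
    hits = []
    for spacer_id, spacer in spacers.items():
        spacer = spacer.upper()
        spacer_rc = reverse_complement(spacer)
        for gene_id, gene in genes.items():
            gene = gene.upper()
            if spacer in gene:
                hits.append((spacer_id, gene_id, "+"))
            elif spacer_rc in gene:
                hits.append((spacer_id, gene_id, "-"))
    return hits


# Toy sequences, invented for illustration only.
toy_genes = {"geneA": "ATGCCGTTAGGCTAA"}
toy_spacers = {
    "sp1": "CCGTTAGG",  # matches the forward strand of geneA
    "sp2": "TTAGCCTA",  # matches the reverse strand of geneA
    "sp3": "GGGGGGGG",  # no match
}
print(find_exact_spacer_hits(toy_spacers, toy_genes))
```

An exact substring search is sufficient only for the 100% identity case discussed here; detecting similar but non-identical matches would require an alignment-based tool.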

Figure 2

Salinispora tropica prophage and Salinispora arenicola CRISPRs. Four of eight SA CRISPRs (1, 5, 7 and 8) have spacers (color coded) that share 100% sequence identity with genes (Stro numbers and annotation given) in ST prophage 1 (Supplementary Table S2, inverted for visual purposes). Other CRISPRs are colored purple. SA CRISPRs 2–3 and 5–6 share the same direct repeats and may have at one time been a single allele. CRISPR-associated (CAS) genes (red) and genes interrupting CRISPRs (black) are indicated. None of the spacer sequences possessed 100% identity to prophage in the NCBI non-redundant sequence database, the SDSU Center for Universal Microbial Sequencing database or the CAMERA metagenomic database. CRISPRs, clustered regularly interspaced short palindromic repeats.

Enhanced phage immunity, as evidenced by 140 relative to 35 CRISPR spacer sequences, coupled with a larger genome size and a greater number of species-specific secondary metabolic pathways, may account for the cosmopolitan distribution of SA relative to ST, which to date has only been recovered from the Caribbean (Jensen and Mafnas, 2006). Also included among the SA-specific gene pool is a complete phosphotransferase system (Sare4844–4850). Phosphotransferase systems are centrally involved in carbon source uptake and regulation (Parche et al., 2000) and may provide growth advantages that also factor into the relatively broad distribution of SA. However, additional strains will need to be studied before any of these differences can be firmly linked to species distributions.

The 21 genomic islands are not contiguous regions of species-specific DNA but are instead created by a complex process of gene acquisition, loss, duplication and inactivation (Figure 3). The overall composition, evolutionary history and function of the island genes are similar in both strains, with duplication and HGT accounting for the majority of genes and secondary metabolism representing the largest functionally annotated category. Remarkably, 42% of the rearranged island orthologs fall within other islands, indicating that inter-island movement or ‘island hopping’ is common, thus providing support for the hypothesis that islands undergo continual rearrangement (Coleman et al., 2006). There is dramatic, operon-scale evidence of this process in the shared yersiniabactin-related pathways (ST sid2 and SA sid1), which occur in islands 15 and 10, respectively, and in the unknown dipeptide pathways (ST nrps1 and SA nrps3), which occur in islands 4 and 15, respectively. In both cases, these pathways remain largely intact yet are located in different islands in the two strains (Figure 1, Tables 2 and 3). There is also evidence of cluster fragmentation in the 10-membered enediyne gene set SA pks3, which contains the core set of genes associated with calicheamicin biosynthesis (Supplementary Figure S2) (Ahlert et al., 2002), yet is split by the introduction of 145 kb of DNA from three different biosynthetic loci (island 10, Figure 1). The conserved fragments appear to encode the biosynthesis of a calicheamicin analog, whereas flanking genes display a high level of gene duplication and rearrangement indicative of active pathway evolution. Cluster fragmentation is also likely observed in the nine-membered enediyne PKS cluster SA pks1A–C, which is scattered across the genome in islands 4, 10 and 21 (Figure 1 and Table 3).

Figure 3

Composition, evolutionary history and function of island genes in Salinispora tropica (ST) and Salinispora arenicola (SA). (a) In all, 3040 genes comprising 21 genomic islands were analyzed for positional orthology (that is, the gene is part of the shared ‘core’ genome), re-arranged orthology (that is, the gene is present in the other genome but not in the same position or island) and species specificity (gene totals presented in wedges). (b) The ST and SA species-specific island genes were analyzed for evidence of paralogy, xenology and HGT. Pseudogenes and the number of genes with no evidence for any of these processes were also identified. (c) Functional annotation of the species-specific island genes. (d) Distribution of species-specific island genes that have no evidence for HGT or paralogy among 27 actinobacterial genomes. HGT, horizontal gene transfer.

The genomic islands are also enriched in mobile genetic elements including prophage, integrases, and actinobacterial integrative and conjugative elements (AICEs) (Burrus et al., 2002) (Supplementary Tables S2, S3), the latter of which are known to play a role in gene acquisition and rearrangement. The Salinispora AICEs possess traB homologs, which promote conjugal plasmid transfer in mycelial streptomycetes (Reuther et al., 2006), suggesting that hyphal tip fusion is a prominent mechanism driving gene exchange in these bacteria. AICEs have been linked to the acquisition of secondary metabolite gene clusters (te Poele et al., 2007), and their occurrence in island 7 (SA AICE1), which includes the entire 90-kb rif cluster, and island 10 (SA AICE3), which contains biosynthetic gene clusters for enediyne, siderophore and amino-acid-derived secondary metabolites, provides a mechanism for the acquisition of these pathways (Figure 1). Six additional secondary metabolite gene clusters (ST nrps1, ST spo, SA nrps3, SA pks5, SA cym and SA pks2) are flanked by direct repeats, providing further support for HGT. In the case of cym (Schultz et al., 2008), which is clearly inserted into a tRNA, the pseudogenes preceding and following it are all related to transposases or integrases, providing a mechanism for chromosomal integration.

Despite exhaustive analyses of HGT, only 22% of the 127 genes in the five biosynthetic pathways (rif, sta, des, lym and cym) whose products have also been observed in other bacteria (Figure 1 and Table 3) scored positive for HGT. This observation suggests either that the pathways originated in Salinispora or that the exchange of these biosynthetic genes has occurred largely among closely related bacteria and therefore gone undetected with the HGT methods applied in this study. The latter scenario is supported by the observation that all five of the shared biosynthetic pathways were previously reported in other actinomycetes. The acquisition of genes from closely related bacteria likely accounts for many of the species-specific island genes for which no evidence of evolutionary history could be determined (Figure 3b). These genes were poorly conserved among 27 actinobacterial genomes (Figure 3d), providing additional support that they were acquired most likely from environmental Actinobacteria that are not well represented among sequenced genomes. Although gene loss was not quantified, this process is also a likely contributor to island formation. In support of an adaptive role for island genes, 7.6% (44/573) of the orthologs show evidence of positive selection (dN/dS>1) compared with 1.6% (49/3027) of the non-island pairs. Given that the majority of island genes display evidence of HGT, the increased dN/dS ratio is in agreement with the observation that acquired genes experience relaxed functional constraints (Hao and Golding, 2006).

Functional differences between related organisms can be obscured when orthologs are taken out of the context of the gene clusters in which they reside. For example, the PKS genes, Sare1250 and Stro2768, are orthologous and perform similar functions, yet they reside in the rif and slm pathways, respectively, and thus contribute to the biosynthesis of dramatically different secondary metabolites. Likewise, intra-cluster PKS gene duplication (Sare3151 and Sare3152; Figure 1) has an immediate effect on the product of the pathway by the introduction of an additional acyl group into the carbon skeleton of the macrolide as opposed to the more traditional concept of duplication facilitating mutation-driven functional divergence (Prince and Pickett, 2002). Subgenic, modular duplications are also observed (Sare3156 modules 4 and 5; Figure 1), which likewise have an immediate effect on the structure of the secondary metabolite produced by the pathway. Although HGT is considered a rapid method for ecological adaptation in bacteria (Ochman et al., 2000), PKS gene duplication provides a complementary evolutionary strategy (Fischbach et al., 2008) that could lead to the rapid production of new secondary metabolites that subsequently drive the creation of new adaptive radiations.

Salinispora species are the first marine Actinobacteria reported to require seawater for growth (Maldonado et al., 2005). Unlike Gram-negative marine bacteria, in which seawater requirements are linked to a specific sodium ion requirement (Kogure, 1998), Salinispora strains are capable of growth in osmotically adjusted, sodium-free media (Tsueng and Lam, 2008). An analysis of the Salinispora core for evidence of genes associated with this unusual osmotic requirement reveals a highly duplicated family of 29 polymorphic membrane proteins that include homologs associated with polymorphic outer membrane proteins. Polymorphic outer membrane proteins remain functionally uncharacterized; however, there is strong evidence that they are type V secretory systems (Henderson and Lam, 2001), making this the first report of type V autotransporters outside of Proteobacteria (Henderson et al., 2004). Phylogenetic analyses provide evidence that the Salinispora polymorphic membrane proteins were acquired from aquatic, Gram-negative bacteria and that they have continued to undergo considerable duplication subsequent to divergence of the two species (Supplementary Figure S3). The occurrence of this large family of polymorphic membrane protein autotransporters in marine Actinobacteria may represent a low-nutrient adaptation that renders cells susceptible to lysis in low osmotic environments.

Conclusion

In conclusion, the comparative analysis of two closely related marine actinobacterial genomes provides new insights into the functional traits associated with genomic islands. It has been possible to assign precise, physiological functions to island genes and link differences in secondary metabolism to fine-scale phylogenetic architecture in two distinct bacterial lineages, which by all available metrics maintain the fundamental characteristics of species-level units of diversity. It is clear that gene clusters devoted to secondary metabolite biosynthesis are dynamic entities that are readily acquired, rearranged and fragmented in the context of genomic islands, and that the results of these processes create natural product diversity that can have an immediate effect on fitness or niche utilization. The high level of species specificity associated with secondary metabolism suggests that this functional trait may represent a previously unrecognized force driving ecological diversification among closely related, sediment-inhabiting bacteria.