Introduction

The Roseobacter clade can comprise upwards of 25% of the marine bacterioplankton and consists of at least 41 major phylogenetic clusters (in total more than 141 phylogenetic clusters) based on ⩾99% 16S rRNA gene sequence similarity (Buchan et al., 2005; Wagner-Döbler and Biebl, 2006). Unlike most other numerically abundant marine bacterial lineages, strains of Roseobacter are readily cultivated and this clade, as a whole, is well represented by cultured isolates. Among the 141 recognized Roseobacter clusters, 120 clusters (85%) contain cultivable representatives (Buchan et al., 2005). The studied members of the Roseobacter group contain diverse specific biological and ecological features and can consume various carbon and sulfur compounds such as dimethyl sulfoniopropionate (DMSP) (González et al., 1999; Miller and Belas, 2004) and carbon monoxide (Moran et al., 2004; Tolli et al., 2006). Thus, they are important to global biogeochemical carbon and sulfur cycles. Isolation and molecular ecology studies have revealed that the Roseobacter clade occupies diverse marine environments but is most predominant in coastal waters (Buchan et al., 2005). Furthermore, many roseobacters have been found in association with algal blooms (González et al., 2000; Alavi et al., 2001; West et al., 2008), and some roseobacters have been isolated from, and found to be dominant in polar environments (Brown and Bowman, 2001; Brinkmeyer et al., 2003; Selje et al., 2004; Prabagaran et al., 2007). The first complete genome sequence of a marine roseobacter, Silicibacter pomeroyi, was reported by Moran et al. (2004), and there are now complete or draft genome sequences for ca. 40 marine roseobacters, representing diverse clusters (Brinkhoff et al., 2008). This information will facilitate studies of these important organisms at the genomic level.

Bacteria in the Rhodobacter clade occur more frequently in freshwater or estuarine environments (Crump et al., 1999, 2004; Kan et al., 2007), and can also occur in high abundance in some marine environments (Hiraishi and Ueda, 1995). Similar to Roseobacter, members of Rhodobacter are also metabolically versatile and exhibit genomic complexity (Nereng and Kaplan, 1999; Choudhary et al., 2004, 2007).

An unusual bacteriophage-like vehicle of genetic exchange known as a gene transfer agent (GTA) was first reported in the purple non-sulfur bacterium Rhodobacter capsulatus (Marrs, 1974). GTA is a small phage-like particle released by the bacteria, and each particle contains a random ca. 4.5-kb fragment of bacterial genomic DNA (Solioz and Marrs, 1977) that can be transferred between cells (Solioz et al., 1975; Biers et al., 2008). This has made GTA a very valuable tool for the study of R. capsulatus for more than 30 years, aiding with the construction of new strains (for example, Scolnik and Haselkorn, 1984; Lilburn et al., 1992; Lang and Beatty, 2001, 2002) and mapping of genetic loci (for example, Wall et al., 1975; Yen and Marrs, 1976; Wall and Braddock, 1984).

The R. capsulatus GTA (RcGTA) was genetically characterized by Lang and Beatty (2000). The RcGTA gene cluster comprises 15 genes, and conserved GTA gene clusters were recently found in other species; they appear to be limited to the Rhodobacterales and some other Alphaproteobacteria (Lang and Beatty, 2007; Biers et al., 2008). Currently, all but one of the available Rhodobacterales genomes contain a conserved set of genes that constitute the RcGTA gene cluster (Lang and Beatty, 2007; Biers et al., 2008; Paul, 2008), but these genes are not found in other major bacterioplankton lineages, such as Prochlorococcus, Synechococcus and Pelagibacter spp. GTA-related gene transfer has been suggested as a potential adaptation mechanism that allows these bacteria to maintain metabolic flexibility in the changing marine environment (Biers et al., 2008). Among the GTA genes, the major capsid gene (g5) is highly conserved among all of these bacteria, and the phylogeny based on g5 is consistent with that based on 16S rRNA genes (Lang and Beatty, 2007).

Chesapeake Bay is the largest estuary in North America. It receives a great amount of freshwater and nutrient inputs from tributary rivers. Salinity, nutrient and other environmental factors display significant spatial and seasonal variation in the bay (Kan et al., 2006). Similarly, the bacterioplankton in the bay exhibit strong distribution patterns (Kan et al., 2006). Members of the class Alphaproteobacteria, particularly Roseobacter, are one of the dominant bacterial populations in Chesapeake Bay and their sequences can constitute more than a third of 16S rRNA gene clone libraries (Kan et al., 2007). In a recent study of the bay, 11 Roseobacter clusters were identified based on bacterial rRNA operon clone library analysis, and greater genetic diversity of the Roseobacter clade was found in winter than in summer (Kan et al., 2008). Notably, several novel clusters of Roseobacter appeared to be unique to Chesapeake Bay (Kan et al., 2008). Members of the Rhodobacter (related to Pseudorhodobacter spp.) were detected in the upper bay during the wintertime, but were not detected in the summer (Kan et al., 2007).

In this study, we developed a primer set to target the Roseobacter and Rhodobacter GTA g5 gene to explore (1) whether we can detect Roseobacter and Rhodobacter GTA major capsid genes in the natural aquatic environment; (2) the diversity of g5 gene sequences in Roseobacter and Rhodobacter and (3) the spatial and temporal genetic variations of g5 gene diversity in Chesapeake Bay.

Materials and methods

Sample collection and microbial DNA preparation

The four Chesapeake Bay samples used in this study included three samples collected from stations 908, 818 and 707 (Supplementary Figure S1) in March 2003, and one sample collected from station 818 in July 2003. Stations 908, 818 and 707 in March 2003 were chosen to represent the upper, middle and lower bay in winter, whereas station 818 in July 2003 represents a middle bay sample in summer (Table 1). The winter DNA samples used in this study were the same as those used for earlier 16S rRNA gene clone library analyses (Kan et al., 2007, 2008). The sample collection and DNA preparation methods were described in detail by Kan et al. (2006). Briefly, 500 ml sea water was filtered onto 0.2-μm-pore-size polycarbonate filters (47 mm diameter; Millipore, Billerica, MA, USA) and microbial community DNA was extracted from the filters using the phenol–chloroform protocol as described earlier (Kan et al., 2006).

Table 1 Characteristics of environmental parameters and clone information for each sampling station

Bacterial counts, chlorophyll a and nutrient data

The biological and hydrological data in Table 1 were acquired from the existing database of the Microbial Observatories of Virioplankton Ecology (MOVE) project (http://www.virusecology.org/MOVE/Home.html), and the methods for measuring these parameters have been described elsewhere (Kan et al., 2006, 2007).

Isolation of Roseobacter and Rhodobacter

Roseobacter strains isolated from Chesapeake Bay in July 2007 were obtained by direct plating of sea water onto agar plates containing 500 μM DMSP as the sole carbon source. The growth medium contained 200 mM NaCl, 50 mM MgSO4, 5 mM CaCl2, 150 μM K2HPO4, 8 mM NH4Cl and 5 mM Tris-HCl (pH 7.6). Filter-sterilized vitamin, trace metal and iron stock solutions were prepared as described earlier (Budinoff and Hollibaugh, 2007) and, along with DMSP, were added to the autoclaved basal salt solution. Agar plates were prepared with purified agar at 0.8% w/v and incubated aerobically in a temperature-controlled incubator (20 °C). Alternatively, several Roseobacter and Rhodobacter strains were obtained from Chesapeake Bay in April 2008 and Virginia Beach in March 2007 by using the previously described DMSP enrichment method (González et al., 2003) with some modification. Sea water samples were diluted with sterilized artificial sea water, then amended with 100 μM DMSP. After 1 week of incubation (20 °C, in the dark), 10 μl of the enrichment culture was plated on low-nutrient sea water medium plates (González et al., 1999) containing 100 μM DMSP and kept at 20 °C in the dark. Colonies were randomly selected, purified and identified by 16S rRNA gene analysis as described earlier (Giovannoni, 1991).

Primer design and PCR amplification

Degenerate PCR primers specific for the Roseobacter and Rhodobacter clades of the order Rhodobacterales were designed based on an alignment of the GTA capsid protein gene (g5) sequences from 13 members of the Rhodobacterales, the genome sequences of which were known (Supplementary Figure S2). The conserved amino-acid domains GY/FLVDPQT and AKPHVLF (corresponding to amino-acid residues 109–116 and 362–368 for S. pomeroyi DSS-3, respectively) were selected for design of the forward primer MCP-109F, 5′-GGCTA(T/C)CTGGT(G/C)GATCC(G/C)CA(G/A)AC-3′ and reverse primer MCP-368R, 5′-TAGAACAG(G/C)AC(G/A)TG(G/C)GG(T/C)TT(G/T)GC-3′, respectively. All the PCR reactions were performed in 50 μl volume containing 1 × reaction buffer (Genescript, Scotch Plains, NJ, USA) with 1.5 mM MgCl2, 100 μM of each deoxynucleoside triphosphate, 10 pmol of each primer and 1 U Taq DNA polymerase (Genescript). For Roseobacter isolates, DNA released from boiled exponentially growing cultures or extracted using the DNeasy Blood and Tissue kit (Qiagen, Valencia, CA, USA) were used as templates. To ensure the quality and quantity of DNA inputs for PCR amplification of environmental samples, extracted bacterial community DNA samples were first amplified with GenomiPhi V2 (GE Healthcare, Piscataway, NJ, USA) according to the manufacturer's protocol and ca. 100 ng DNA was subsequently used as a template for each PCR reaction. The PCR program included an initial denaturing step at 94 °C for 3 min, followed by 35 cycles of 94 °C for 1 min, annealing at 58 °C for 30 s, and 72 °C for 1 min, and followed by a final extension step at 72 °C for 10 min.
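The relationship between the degenerate primers and the conserved amino-acid motifs can be checked with a short script. The sketch below is illustrative only and is not part of the original analysis: it expands each primer, as written above, into the individual sequences in the degenerate pool, then translates the forward primer (and the reverse complement of the reverse primer) in frame to see which residues are encoded. The function names are hypothetical, and the reading frame is assumed to start at the first base of each oligonucleotide.

```python
from itertools import product

# Primer sequences copied from the text; "(X/Y)" marks a degenerate position.
MCP_109F = "GGCTA(T/C)CTGGT(G/C)GATCC(G/C)CA(G/A)AC"
MCP_368R = "TAGAACAG(G/C)AC(G/A)TG(G/C)GG(T/C)TT(G/T)GC"

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def parse_primer(primer):
    """Split a primer string into positions, each a tuple of allowed bases."""
    positions, i = [], 0
    while i < len(primer):
        if primer[i] == "(":
            end = primer.index(")", i)
            positions.append(tuple(primer[i + 1:end].split("/")))
            i = end + 1
        else:
            positions.append((primer[i],))
            i += 1
    return positions


def expand(primer):
    """Return every non-degenerate sequence contained in the primer pool."""
    return ["".join(combo) for combo in product(*parse_primer(primer))]


def reverse_complement(seq):
    """Reverse-complement an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons from the first base; trailing bases are ignored."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


forward_pool = expand(MCP_109F)
reverse_pool = expand(MCP_368R)
print("forward degeneracy:", len(forward_pool))
print("reverse degeneracy:", len(reverse_pool))
print("forward encodes:", sorted({translate(s) for s in forward_pool}))
print("reverse encodes:",
      sorted({translate(reverse_complement(s)) for s in reverse_pool}))
```

Under these assumptions the forward pool holds 16 sequences and the reverse pool 32. A check of this kind shows which residues of each motif the written primer actually covers, for example at the Y/F position of the forward motif.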

Cloning, sequencing and phylogenetic analysis

The purified g5 amplicons were cloned with the TOPO-TA cloning kit and ligated plasmids were transformed into TOP10 competent Escherichia coli cells according to the manufacturer's instructions (Invitrogen, Carlsbad, CA, USA). Transformants were selected on Luria–Bertani agar plates containing ampicillin (50 μg ml−1) and X-Gal (5-bromo-4-chloro-3-indolyl-beta-D-galactopyranoside) (20 μg ml−1). White colonies were screened by PCR using the vector primers M13F (5′-GTAAAACGACGGCCAG-3′) and T7R (5′-TAATACGACTCACTATAGGG-3′). Purified PCR products were then sequenced using an automated sequencer ABI310 (Applied Biosystems, Foster City, CA, USA) in the Biological and Analytical Laboratory at the Center of Marine Biotechnology, University of Maryland Biotechnology Institute. For each library, 30–44 clones with appropriately sized inserts were sequenced.

All g5 sequences were edited using the MacVector 7.1 program (GCG) to remove plasmid vector and primer sequences, and the DNA sequences were subsequently translated into amino-acid sequences. The resulting capsid protein sequences were aligned and compared with reference sequences in the database. Neighbor-joining phylogenetic trees were constructed using MEGA 4.0 software (Tamura et al., 2007). For the translated amino-acid sequences of the g5 gene, evolutionary distances were calculated under the Jones–Taylor–Thornton model with rate variation among sites and complete deletion of gaps. For DNA sequences of the g5 and 16S rRNA genes, evolutionary distances were calculated using the maximum composite likelihood method with complete deletion of gaps.
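The "complete deletion of gaps" option can be illustrated with a small sketch. The code below is not the MEGA implementation, and it deliberately substitutes the uncorrected p-distance (the fraction of differing sites) for the Jones–Taylor–Thornton and maximum composite likelihood models, which are considerably more involved. The sequence names and the toy alignment are hypothetical. It only shows the preprocessing step: every alignment column containing a gap in any sequence is dropped before pairwise distances are computed.

```python
def complete_deletion(alignment, gap_chars="-?"):
    """Drop every alignment column in which at least one sequence has a gap."""
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    keep = [
        col for col in range(len(seqs[0]))
        if all(s[col] not in gap_chars for s in seqs)
    ]
    return {name: "".join(s[col] for col in keep)
            for name, s in zip(names, seqs)}


def p_distance(a, b):
    """Uncorrected distance: fraction of sites at which two sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(alignment):
    """Pairwise p-distances after complete deletion of gapped columns."""
    trimmed = complete_deletion(alignment)
    names = list(trimmed)
    return {(p, q): p_distance(trimmed[p], trimmed[q])
            for p in names for q in names}


# Hypothetical toy alignment (not data from this study).
toy = {
    "clone_1": "GYLVDPQTAK",
    "clone_2": "GYLV-PQTAK",
    "clone_3": "GFLVDPQSAK",
}
matrix = distance_matrix(toy)
for pair in [("clone_1", "clone_2"), ("clone_1", "clone_3")]:
    print(pair, round(matrix[pair], 3))
```

Because a gapped column is removed for all sequences, every pairwise distance is computed over the same set of sites. The cost is that information is discarded when a single sequence carries a long gap.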

Statistical analyses

Rarefaction analysis was performed using Analytic Rarefaction version 1.3 (http://www.uga.edu/~strata/software). The coverage of each clone library (C, the fraction of the population represented by the phylotypes discovered in that library) was calculated by the equation C = [1 − (n/N)] × 100, where n is the number of unique clones and N is the total number of clones examined (Ravenschlag et al., 1999). An operational taxonomic unit was defined as a group of sequences that are ⩾98% identical at the DNA sequence level. The statistical methods used for the estimation of species richness and diversity indices were based on coverage. Coverage-based estimations of species richness and evenness as well as the Shannon–Weaver index (H) and Simpson's index (D) were calculated by using PAST (Hammer et al., 2001).
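As a worked illustration of these statistics, the sketch below computes library coverage as a percentage, (1 − n/N) × 100, together with the Shannon–Weaver index and a Simpson index from a list of clone counts per operational taxonomic unit. It is not the PAST implementation. The use of the natural logarithm for H and of the dominance form D = Σp² (with 1 − D as the complementary diversity value) are assumptions, because the text does not state which conventions were reported. The example counts are hypothetical.

```python
import math


def coverage(n_unique, n_total):
    """Library coverage in percent: (1 - n/N) * 100."""
    if n_total <= 0 or not 0 <= n_unique <= n_total:
        raise ValueError("need 0 <= n_unique <= n_total and n_total > 0")
    return (1 - n_unique / n_total) * 100


def shannon(counts):
    """Shannon-Weaver index H = -sum(p_i * ln p_i) over non-empty OTUs."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson_dominance(counts):
    """Simpson's dominance D = sum(p_i^2); 1 - D is the diversity form."""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)


# Hypothetical library: 40 clones spread evenly over 4 OTUs,
# with 4 clones assumed to be unique.
otu_counts = [10, 10, 10, 10]
print("coverage (%):", coverage(4, sum(otu_counts)))
print("Shannon H:", round(shannon(otu_counts), 3))
print("Simpson D:", simpson_dominance(otu_counts))
print("Simpson 1 - D:", 1 - simpson_dominance(otu_counts))
```

For a perfectly even library, H equals the natural logarithm of the number of operational taxonomic units and D equals the reciprocal of that number, which gives a quick sanity check on any implementation.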

Nucleotide sequence accession numbers

The g5 nucleotide sequences determined in this study have been deposited in GenBank under accession numbers EU929030 to EU929055.

Results and discussion

Testing the g5 primers

The g5 gene-specific primers (MCP-109F and MCP-368R) were tested against 22 Roseobacter and 4 Rhodobacter strains representing diverse Rhodobacterales isolates (Table 2). For all isolates shown in Table 2, PCR products of the expected size (782–794 bp) were recovered and verified to be g5 by sequencing. All but one of a collection of Roseobacter and Rhodobacter strains isolated from Chesapeake Bay yielded a PCR product of the appropriate size with this primer set (Table 2; CB and AB, unpublished data). These results indicate that this primer set is suitable for recovering g5 gene sequences from diverse groups of Roseobacter and Rhodobacter. Application of the g5 primer set to four microbial communities collected from different locales in Chesapeake Bay yielded one specific amplicon of the expected size (data not shown), pointing to the suitability of this primer for the amplification of g5 genes from environmental samples. To explore the diversity of g5 genes and their spatial and temporal variations in the Chesapeake Bay, the PCR products from these four samples were cloned and sequenced.

Table 2 Summary of Roseobacter and Rhodobacter strains used in this study

Diverse and unique Roseobacter in the Chesapeake Bay

GTA diversity was assessed for four natural microbial assemblages representing upper, middle and lower bay stations (Supplementary Figure S1) during winter (March 2003) and middle bay in summer (July 2003). A total of 158 g5 sequences were recovered from these four clone libraries. Phylogenetic analysis of the environmental g5 clone sequences placed them all within Roseobacter and Rhodobacter clades and revealed 12 distinct phylogenetic clusters (designated as A–L; Figure 1). Eleven of these clusters fell within the Roseobacter clade and one cluster belonged to the Rhodobacter clade.

Figure 1

Neighbor-joining phylogenetic tree based on partial g5 amino-acid sequences (ca. 250 aa) showing the phylogenetic diversity of g5 from Chesapeake Bay bacterial communities and from Roseobacter and Rhodobacter isolates (bold type). Bootstrap values are shown as percentages based on 1000 replicates; values of less than 50 were omitted. Numbers in parentheses represent the number of closely related clones in that cluster. *Strains without accession numbers; their sequences can be found at http://research.venterinstitute.org/moore/. (a) Six additional Chesapeake Bay isolates (CB1032, CB1023, CB1028, CB1083, CB1030 and CB1088) are closely related to CB1005.

Among the 12 clusters retrieved from the Chesapeake Bay, only two clusters contain cultivated representatives (clusters E and J; Figure 1). Other clusters were divergent from known g5 sequences derived from either genome-sequenced Rhodobacterales strains or our bay isolates (<90% amino-acid identity). This finding mirrors recent 16S rRNA gene analysis-based studies of these same communities and further supports the finding that Chesapeake Bay contains unique Roseobacter clusters (Kan et al., 2007, 2008). Additional efforts to isolate and characterize more indigenous roseobacters from the bay, particularly from winter samples, may help us better understand the distribution and ecology of these unique bacteria.

The coverage of each clone library ranged from 83% to 93% (Table 1). Rarefaction analysis showed that all clone libraries have a nearly sufficient number of clones to represent g5 richness (Supplementary Figure S3). The diversity of g5 gene sequences varied with sampling seasons and sites. In wintertime, both Shannon–Weaver and Simpson indices reveal that g5 diversity is higher in the middle and upper bay compared with the lower bay (Tables 1 and 3). The summer sample had lower g5 diversity compared with the winter sample (Tables 1 and 3).

Table 3 Comparison of g5 composition and distribution in four clone libraries

Variation of GTA capsid genotypes along the Chesapeake Bay

A spatial variation of g5 composition from upper to lower bay during the wintertime was evident (Table 3). Clusters A and B were unique to the upper bay, whereas clusters C and D occurred only in the lower bay. Clusters G–L were present in all three of the stations during winter with variable clone distribution frequency. Unlike the upper and lower bay, no unique cluster was found in the middle bay sample.

Cluster A constitutes more than 20% of the g5 clones in the upper bay sample and is more closely related to the Rhodobacter clade (Figure 1). A prior investigation of the phylogenetic diversity of the microbial community from this same sample revealed that members of the Rhodobacter group are abundant, representing >15% of the clones in a 16S rRNA gene library (Kan et al., 2007). Several Rhodobacter strains have complete genome sequences available and are known to contain the GTA gene clusters (Lang and Beatty, 2007; Biers et al., 2008). As more Rhodobacter genome sequences become available, it may be prudent to re-evaluate whether specific g5 primer sets can be designed to discriminate between the Rhodobacter and Roseobacter groups.

Clusters C and D appeared only in the lower bay in winter and were most closely related to Roseobacter sp. CCS2 (88% and 89% amino-acid identity, respectively; Figure 1), which was isolated from Pacific coastal waters (http://www.roseobase.org/roseo/ccs2.html). The lower bay station is near the mouth of Chesapeake Bay, and is largely influenced by Atlantic coastal waters. Therefore, the lower bay has higher salinity (23 p.p.t.) than the upper and middle bay (Table 1). Thus, it is possible that these two clusters are more representative of typical coastal ocean roseobacters in comparison to other clusters recovered in this study.

Distinct GTA capsid genotypes are found between winter and summer

Clusters E and F (Figure 1) together accounted for 93% of the summer clone library and were not detected in any winter samples (Table 3), suggesting that there was a major shift in the Roseobacter population composition between winter and summer. This strong seasonal pattern in population composition was also evident in 16S rRNA gene clone libraries (Kan et al., 2007, 2008). PCR amplicon yields of the g5 gene from the summer sample were lower than those from the winter (March) samples (data not shown). We therefore speculate that Roseobacter abundance could be lower in summer than in winter. A quantitative PCR analysis based on the g5 gene could be used to test this hypothesis in the future.

Cluster E contains 13 clones derived from the middle bay summer library (July 2003) as well as 7 additional bay strains that were also isolated several years later from the middle bay during summer (July 2007). This result is consistent with the recurring seasonal pattern of bacterioplankton in the bay (Kan et al., 2006) and suggests these isolates represent a stable, abundant and unique summer-type Roseobacter in the Chesapeake Bay. These seven closely related Roseobacter isolates (represented by CB1005 in Table 2) also have no close match to publicly available clone or isolate sequences based on 16S rRNA gene homology searches (Table 2). Thus, characterization of these strains may reveal features that contribute to the success of these populations during the summer season. Another unique summer Roseobacter population (represented by 16 environmental clones) is cluster F (Figure 1). However, no Chesapeake Bay isolates fell within cluster F.

Most of the winter clone sequences are related to Sulfitobacter or Loktanella genera. Clusters J, H and K are affiliated with the Sulfitobacter group and likely represent the cold-adapted Roseobacter species (Figure 1). It has been demonstrated earlier that Sulfitobacter-related clones dominate the winter Chesapeake Bay bacterioplankton community (Kan et al., 2007). Cluster J contains 12 clones recovered from all three winter libraries and strain Roseobacter sp. GAI-101 isolated from Georgia coastal sea water (USA) in winter (González et al., 1999). Nearly all of the closest 16S rRNA gene sequences in GenBank to GAI-101 represent clones or isolates recovered from polar regions. Taken together, these data suggest that the unique winter clones in clusters J, H and K may represent roseobacters adapted to the winter cold environment in a temperate estuary. Several distinct Roseobacter clusters have been identified in temperate and polar regions based on the 16S rRNA gene marker (Selje et al., 2004; Prabagaran et al., 2007).

The g5 clones in clusters G and I dominated the upper bay and their abundance declined from the upper to lower bay during the winter time. Such a trend could be related to the distribution pattern of phytoplankton Chl a (Table 1). Phytoplankton (that is, diatoms and dinoflagellates) blooms occur frequently during the winter–spring period in the upper Chesapeake Bay. It has been demonstrated that phytoplankton biomass and temperature are the two main factors regulating the bacterial population dynamics in the Bay (Kan et al., 2006).

Congruency between the 16S rRNA gene and g5 gene markers

The comparison between g5 and 16S rRNA gene phylogenies showed that the majority of environmental clones retrieved from the same winter samples with the two different gene markers come from the same Roseobacter groups (Sulfitobacter and Loktanella; Figures 2a and b). In March 2003, clones that belong to the Sulfitobacter and Loktanella groups together accounted for 85% of all the Roseobacter g5 clones and 80% of the Roseobacter 16S rRNA gene clones. The GTA g5 gene appears to be a conserved gene marker for Rhodobacterales. It is likely that the GTA gene clusters were acquired prior to the separation of the last common ancestor of all roseobacters and have been well preserved since then in different Roseobacter/Rhodobacter lineages with little lateral gene exchange (Lang and Beatty, 2007). Currently, no g5 genes from other bacterial groups fall into the Rhodobacterales group; additional bacterial genome sequences will help to test whether g5 is monophyletic within the Rhodobacterales.

Figure 2

Phylogenetic trees based on (a) g5 nucleotide sequences (ca. 750 bp) and (b) 16S rRNA gene sequences (ca. 1300 bp). Clones from Chesapeake Bay are shown in bold. Bootstrap values are shown as percentages based on 1000 replicate trees; values of less than 50 were omitted. *Strains without accession numbers; their sequences can be found at http://research.venterinstitute.org/moore/.

Exploring Roseobacter diversity and abundance based on the GTA capsid gene g5 has several advantages: (1) all known Roseobacter genomes but one (strain HTCC2255), representing diverse members of the Roseobacter clade, contain GTA gene clusters (strain HTCC2255 seems to diverge from mainstream roseobacters, as it was isolated from oligotrophic waters and contains a much smaller genome (∼2.3 Mb) than typical Roseobacter genomes (Biers et al., 2008)); (2) the GTA g5 gene is highly conserved among the roseobacters; (3) each Roseobacter genome contains only a single copy of the conserved g5 gene; and (4) the g5 phylogeny is congruent with the 16S rRNA gene phylogeny.

Conclusion

This is the first study examining the diversity of GTA capsid protein genes in the natural aquatic environment. The high diversity of g5 sequences retrieved from Chesapeake Bay microbial communities and the concordance between g5- and 16S rRNA-based phylogenies demonstrate that the GTA capsid gene is an ideal group-specific gene marker for investigating the spatial and temporal distribution of Roseobacter and Rhodobacter groups in the natural environment. The high frequency of conservation of GTA genes in these abundant organisms suggests that GTA may be important in aquatic bacterial populations.