Introduction

Microorganisms are the most abundant and diverse life forms on the Earth and play integral, often unique, roles in the biogeochemical cycling of elements that are crucial to ecosystem function and sustainability (Whitman et al., 1998). However, the definition of prokaryotic species is arbitrary in that it is more of a function of the methodology than of any actual relationship (Harayama and Kasai, 2006; Stackebrandt and Goebel, 1994; Whitman et al., 1998). By contrast, the definition of species in multicellular organisms is well defined: two organisms that reproduce sexually to produce fertile offspring. The definition of species in bacteria is problematic, because reproduction is generally asexual, and the exchange of genetic material can be horizontal. As such, there have been a number of attempts to proffer a working definition of what constitutes a bacterial species (Wayne et al., 1987; Cohan, 2002; Harayama and Kasai, 2006). One well-accepted definition is based on whole-genome DNA–DNA hybridization, by which a procaryotic species is defined as an entity comprised of strains sharing an approximately 70% or greater reassociation value (Wayne et al., 1987). Although this definition is pragmatic and universally applicable within the procaryotic world (Rossello-Mora and Amann, 2001; Stackebrandt et al., 2002; Konstantinidis and Tiedje, 2004), there is no theoretical foundation for setting the value of 70% reassociation value as a criterion for species designation. Nonetheless, it is still considered as the ‘gold standard’ for delineating procaryotic species (Harayama and Kasai, 2006; Stackebrandt, 2006). Almost 5000 bacterial species have been described (Garrity et al., 2004), many of them classified predominantly on the basis of whole-genome DNA–DNA reassociation experiments (Rossello-Mora, 2006). 
The practice of DNA–DNA hybridization has allowed the establishment of a relatively stable and operative classification system for procaryotes (Stackebrandt et al., 2002).

Over the last 50 years, various hybridization-based techniques have been developed for microbial taxonomy (Rossello-Mora, 2006). All of these methods measure the extent and/or stability of complementary DNAs to renature under stringent conditions. Two main strategies have been used for performing re-association experiments: one is to carry out the hybridization reaction in free solution (Brenner et al., 1969; Crosa et al., 1973; Popoff and Coynault, 1980; Ziemke et al., 1998) and the other is to immobilize the test DNA onto a solid surface, such as membrane filters and microtiter plates (De Lay and Tijtgat, 1970; Ezaki et al., 1989; Amann et al., 1992; Adnan et al., 1993; Kaznowski, 1995; Cardinali et al., 2000; Gade et al., 2004; Mehlen et al., 2004). Relative binding ratio, the proportion of double-stranded hybrid DNA for a given pair of genomes relative to that of the reference DNA under identical renature conditions, and/or the increment of melting temperature (ΔTm) can be used to measure genome similarities. However, DNA–DNA hybridization is extremely labor intensive, tedious and time consuming because only pairwise comparisons can be made per experiment, thus prohibiting implementation of this method for analysis of large culture collections. More effective high-throughput approaches are needed to allow large-scale parallel analysis of many microbial genomes.

Compared to the conventional nucleic acid hybridization with porous membranes, real-time PCR and other molecular approaches, microarray-based hybridization offers advantages of high-throughput and parallel detection. DNA microarrays have been widely used to analyze gene expression in pure-culture studies (Schena et al., 1995, 1996; Lockhart et al., 1996; DeRisi et al., 1997; Ye et al., 2000; Liu et al., 2003; Gao et al., 2004), environmental microbial community analysis (Wu et al., 2001; Loy et al., 2002; Bodrossy et al., 2003; Rhee et al., 2004; Tiquia et al., 2004; Schadt et al., 2005; Zhang et al., 2007) and the comprehensive comparison of genomes among closely related species (Behr et al., 1999; Salama et al., 2000; Dong et al., 2001; Kato-Maeda et al., 2001; Murray et al., 2001). Two types of microarrays have been used for determining species relatedness. One is to use the whole-genome open-reading frame array-based hybridization approach to reveal genome diversity and relatedness among closely related organisms (Murray et al., 2001). The other is to use DNA microarrays containing random genomic fragments to delineate species relationships where the genome sequence does not exist (Cho and Tiedje, 2001). These types of microarrays are useful in revealing the genomic diversity and relatedness of closely related organisms with higher resolution compared to traditional DNA–DNA hybridization methods (Zhou, 2003). These approaches, however, are more time consuming and costly to develop and such arrays would have more limited use because many of the arrayed probes would be used for each reference microorganism (Zhou, 2003).

We have previously developed a novel type of microarray, termed the community genome array (CGA), that contained whole-genomic DNA of different microorganisms (Wu et al., 2004). This prototype array was tested for specificity, sensitivity, quantitation and environmental applications with respect to microbial community analysis (Wu et al., 2004). Our results suggested that CGA-based hybridization has potential as a specific, sensitive and quantitative tool for the detection and identification of microorganisms in environmental samples. Additionally, CGA-based hybridization is potentially useful for high-throughput parallel determination of species relatedness of many microorganisms. To test this possibility, we evaluated the CGA-based hybridization as an alternative method to measure DNA relatedness, and compared predicted relationships with traditional whole-genome DNA–DNA hybridization methods, small subunit (SSU) rRNA and gyrB gene analysis, and genomic fingerprinting. Our results indicated that the CGA-based hybridization is comparable to traditional DNA–DNA hybridization methods in terms of species delineation and could serve as a powerful, high-throughput format for determining species relatedness of microorganisms.

Materials and methods

Bacterial strains and genomic DNA isolation

Both closely and distantly related representative bacterial strains (n=66) were selected for this study on the basis of phylogenetic relationships, GC content, availability of DNA–DNA reassociation data and other molecular studies, and/or the accessibility of the strains (Supplementary Table_SM1). Fifty-five of the 66 representative bacterial strains were from the laboratory culture collections of JZ and JT, and included 16 Shewanella strains, 30 Pseudomonas strains and 9 Azoarcus strains. The genotypic and phenotypic descriptions and taxonomic classification of some of these bacteria have been reported elsewhere (Rossello et al., 1991; Zhou et al., 1995; Song et al., 1999; Venkateswaran et al., 1999; Cladera et al., 2004). The strain collection also included one Halomonas strain, one α-proteobacterium strain, five Marinobacter strains and two Bacillus strains (Braker et al., 2000). Escherichia coli S17-1/pir and a Saccharomyces cerevisiae strain were also included as negative controls.

The genomic DNAs were isolated from pure cultures using previously described protocols (Zhou et al., 1995). All genomic DNA samples were treated with RNase A (Sigma, St Louis, MO, USA) and analyzed on agarose gels stained with ethidium bromide prior to microarray fabrication. DNA concentration was determined in the presence of ethidium bromide by fluorometric measurement of the excitation at 360 nm and emission at 595 nm using a HTS700 BioAssay Reader (Perkin-Elmer, Norwalk, CT, USA).

DNA sequencing and genomic fingerprinting

Small subunit rRNA gene sequences for all of the strains used in this study were retrieved from the Ribosomal Database Project (http://rdp.cme.msu.edu/html/). DNA topoisomerase subunit B (gyrB) gene sequences of some representative strains were PCR amplified as described elsewhere (Yamamoto and Harayama, 1995), and the products cloned directly into the pCR2.1 vector according to the manufacturer's instructions (Invitrogen, Carlsbad, CA, USA). The gyrB sequences were determined using the BigDye Terminator kit (Applied Biosystems, Foster City, CA, USA) with a 3700 DNA analyser (Perkin-Elmer, Wellesley, MA, USA) according to the manufacturer's instructions. Sequences were assembled and edited manually using Sequencher 4.7 (Gene Codes Corp., Ann Arbor, MI, USA) and aligned in ClustalW (Thompson et al., 1994). Neighbor-joining phylogenies were constructed in MEGA 3.1 (Kumar et al., 2003) from Poisson correction distances, which account for multiple substitution events per site, using pairwise deletion of gaps and missing data. Branch support was assessed with bootstrap values (500 re-samplings).
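As an illustration of the distance measure named above, the following minimal sketch computes a Poisson correction distance, d = −ln(1 − p), where p is the proportion of differing sites, with pairwise deletion of gaps and missing data. It is not MEGA's implementation; the function name and the choice of '-' and '?' as gap/missing symbols are assumptions made for this example.

```python
import math

# Assumption: '-' marks an alignment gap and '?' marks missing data.
MISSING = {"-", "?"}


def poisson_distance(seq_a, seq_b):
    """Poisson correction distance between two aligned sequences.

    Sites where either sequence has a gap or missing symbol are skipped
    for this pair only (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in MISSING or b in MISSING:
            continue  # pairwise deletion of this site
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites after pairwise deletion")
    p = differing / compared  # proportion of differing sites
    if p >= 1.0:
        return math.inf  # the correction is undefined at p = 1
    return -math.log(1.0 - p)
```

The correction grows faster than p itself, so strongly diverged pairs are placed further apart than their raw proportion of differences would suggest.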

The genomic fingerprinting methods of BOX-PCR and REP-PCR were performed according to a previously described protocol (Versalovic et al., 1994). The REP-PCR and BOX-PCR genomic fingerprints for each isolate were linearly combined using Molecular Analyst 1.6 (BioRad, Hercules, CA, USA) with a resolution of 400 as described previously (Rademaker et al., 2000). The similarity between pairs of combined fingerprints was calculated, and cluster analysis of the similarity values was determined using a UPGMA (unweighted pair group method using averages) algorithm.
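The UPGMA step can be sketched as follows. This is a minimal illustration, not the algorithm as implemented in Molecular Analyst: the function name, the nested-tuple tree format and the conversion of percent similarity to distance as 100 − similarity are all assumptions made for this example.

```python
def upgma(names, similarity):
    """Build a UPGMA tree from a symmetric matrix of percent similarities.

    Returns a nested tuple (left, right, height); leaves are strain names.
    Assumption: distance = 100 - similarity.
    """
    # Active clusters: id -> (subtree, number of leaves)
    clusters = {i: (names[i], 1) for i in range(len(names))}
    # Pairwise distances between active clusters
    dist = {}
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            dist[frozenset((i, j))] = 100.0 - similarity[i][j]
    next_id = len(names)
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)  # closest pair of clusters
        a, b = sorted(pair)
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        d_ab = dist.pop(pair)
        # Distance from the merged cluster to every other cluster is the
        # size-weighted average, i.e. the mean over all leaf pairs.
        for c in clusters:
            d_ac = dist.pop(frozenset((a, c)))
            d_bc = dist.pop(frozenset((b, c)))
            dist[frozenset((next_id, c))] = (
                size_a * d_ac + size_b * d_bc
            ) / (size_a + size_b)
        clusters[next_id] = ((tree_a, tree_b, d_ab / 2.0), size_a + size_b)
        next_id += 1
    tree, _size = next(iter(clusters.values()))
    return tree


# Hypothetical similarity values, for illustration only.
example_tree = upgma(
    ["A", "B", "C"],
    [[100, 90, 40],
     [90, 100, 50],
     [40, 50, 100]],
)
```

In this toy input, A and B merge first because they share the highest similarity, and C joins at the average of its distances to A and B.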

Microarray construction, probe labeling and hybridization

The genomic DNA probe concentration used for printing was 200 ng μl−1, and each DNA sample was suspended in 50% DMSO. Five S. cerevisiae genes were also included on the arrays as negative controls. All 71 probes (including negative controls) were arranged as a matrix of 15 rows × 5 columns (denoted columns a–e). Genomic DNA samples were prepared for deposition and printed as described previously (Wu et al., 2001). Each glass slide contained three replicates of genomic DNA from individual strains. Following printing, glass slides were post-processed and evaluated for spot quality as described previously (Wu et al., 2001).

Representative strains from different bacterial groups, including three Shewanella strains, two Azoarcus strains, nine Pseudomonas strains, one Halomonas strain, one Bacillus strain, one Marinobacter strain and one α-proteobacterium, were labeled and hybridized against the CGAs. For convenience, these labeled strains are referred to as reference strains. The genomic DNAs (1 μg) from the reference strains were fluorescently labeled using the random priming method and purified as described previously (Wu et al., 2001). Microarray experiments were carried out in triplicate (a total of nine replicates per genomic DNA probe) to enable statistical analyses. Microarray hybridization was performed as described previously (Wu et al., 2004).

Microarray scanning and signal intensity quantitation

A ScanArray 5000 Microarray Analysis System (Perkin-Elmer) was used for scanning microarrays. A quick scan at a resolution of 50 μm was performed prior to the real scanning at a resolution of 5 μm and laser power and photomultiplier tube gain were adjusted to avoid saturation of spots. Scanned image displays were saved as 16-bit TIFF files and analyzed by quantifying the pixel density (intensity) of each spot using ImaGene version 4.0 (Biodiscovery Inc., Los Angeles, CA, USA). Mean signal intensity was determined for each spot and the local background signals were subtracted automatically from the hybridization signal of each spot. Fluorescence intensity values for all replicates of the negative control genes (five yeast genes) were averaged and then subtracted from the background-corrected intensity values for each hybridization signal. The signal-to-noise ratio (SNR) was also calculated based on the formula derived from Verdnik et al. (2002), where SNR=(Signal Intensity−Background)/Standard Deviation of Background. Spots with SNRs lower than 2 were defined as empty spots. Empty spots and poor spots defined by the program based on the morphology and signal saturation of the spots were removed prior to data analysis. The outliers, represented by the data points that were not consistently reproducible and had a disproportionately large effect on the statistical result, were detected and removed at P<0.01. When the absolute value of a data point minus the mean was larger than 2.90 σ, this data point was determined as an outlier and removed.
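The spot-filtering rules described above can be sketched in a few lines. This is an illustrative reading of the text, not the ImaGene workflow: the function names are hypothetical, and because the text does not state how σ was estimated, the sketch accepts σ as an optional argument and otherwise falls back to the sample standard deviation of the supplied values.

```python
import statistics

SNR_CUTOFF = 2.0       # spots with SNR below this are treated as empty
OUTLIER_SIGMAS = 2.90  # outlier threshold in units of sigma


def signal_to_noise(signal, background_mean, background_sd):
    """SNR = (signal intensity - background) / SD of background."""
    return (signal - background_mean) / background_sd


def is_empty_spot(signal, background_mean, background_sd):
    """True when the spot fails the SNR cutoff."""
    return signal_to_noise(signal, background_mean, background_sd) < SNR_CUTOFF


def remove_outliers(values, sigma=None, n_sigmas=OUTLIER_SIGMAS):
    """Drop values lying more than n_sigmas * sigma from the mean.

    Assumption: if sigma is not supplied, it is estimated as the sample
    standard deviation of `values`.
    """
    mean = statistics.mean(values)
    if sigma is None:
        sigma = statistics.stdev(values)
    if sigma == 0:
        return list(values)
    return [v for v in values if abs(v - mean) <= n_sigmas * sigma]
```

One reason for leaving σ as a parameter: within a set of only nine values, no point can lie more than 8/3 (about 2.67) sample standard deviations from the set's own mean, so a 2.90 σ rule only has an effect when σ is estimated from a larger pool of data.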

DNA similarity calculation and cluster analysis

Routine statistical analysis was performed using SigmaPlot 5.0 (Jandel Scientific, San Rafael, CA, USA). DNA similarity of the strain of interest relative to a reference strain (Si) was calculated using the following formula: Si=(fi/fr) × 100%, where fi is the average (nine replicated spots) signal intensity of the spots corresponding to the strain of interest, and fr is the average fluorescence signal intensity of the spots of the reference strain. Relationships among different strains from the microarray hybridizations were determined using hierarchical cluster analysis (CLUSTER) and visualized with TREEVIEW (Eisen et al., 1998).
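A minimal sketch of the similarity calculation follows, assuming that Si is the ratio of the mean test-strain intensity (fi) to the mean reference-strain intensity (fr), expressed as a percentage. The function name and the example intensities are hypothetical.

```python
from statistics import mean


def dna_similarity(test_spots, reference_spots):
    """Percent similarity of a test strain relative to the labeled reference.

    test_spots: background-corrected intensities of the test strain's spots.
    reference_spots: intensities of the reference strain's own spots on the
    same array. Assumption: S_i = 100 * f_i / f_r.
    """
    f_i = mean(test_spots)
    f_r = mean(reference_spots)
    if f_r <= 0:
        raise ValueError("reference intensity must be positive")
    return 100.0 * f_i / f_r


# Hypothetical intensities for nine replicate spots of each strain.
example = dna_similarity([4200] * 9, [8400] * 9)
```

Because the reference strain hybridizes to its own genomic DNA on the same array, its spots define the 100% point for each hybridization.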

Results

Species relatedness revealed by the CGA-based hybridization

To determine whether the CGA-based hybridization can be used to determine the relationships of closely related species, the genomic DNAs from the selected representative strains were fluorescently labeled and hybridized to the CGA. The DNA similarity data based on hybridization signal intensities were then used to visualize the relationships among the organisms spotted on the arrays. As expected, all strains examined were well separated at different taxonomic levels, which were more or less consistent with previous classifications, especially at the genus/species levels (Figure 1). For instance, the Gram-positive Bacillus species (Figure 1 branch D) were separated from most Proteobacterial species (Figure 1). Different subdivisions of the Proteobacteria were also fairly well separated from each other (Figure 1). ‘Pseudomonas’ sp. G179 (an α-proteobacterium), for example, did not cluster with true Pseudomonas species (γ-Proteobacteria). Species in the Azoarcus genus (Figure 1, branch B) of the β-Proteobacteria were well separated from the γ-Proteobacteria, represented by Shewanella (Figure 1, branch A) and Pseudomonas species (Figure 1, branch C). In addition, different species or strains within a genus formed distinctive cohesive clusters, such as Azoarcus, Pseudomonas and Shewanella clusters, and were clearly separated from each other. Finally, species relatedness was also revealed based on the labeled strains, which formed distinct clusters as expected (Figure 1, top panel).

Figure 1

Hierarchical cluster analysis of species relationships based on DNA similarities obtained from the CGA-based hybridization. This figure was generated using hierarchical cluster analysis (CLUSTER) and visualized with TREEVIEW. (A) Microarray hybridization patterns at 55 °C plus 50% formamide with the labeled genomic DNAs from the selected species indicated in each column. Each row represents the hybridization signal observed for each species (that is, the test strain) when the genomic DNA from the species (that is, the reference strain) indicated in the column was fluorescently labeled and used for hybridization. Black represents no detectable hybridization above background levels, while red represents positive hybridization signals. The color darkness indicates differences in hybridization signal intensity. The columns correspond to the hybridization patterns obtained with Cy5-labeled genomic DNA from the following species: Pseudomonas sp. G179, Bacillus methanolicus F6-2, Azoarcus tolulyticus Td-21, A. tolulyticus Td-1, P. stutzeri B2-2, P. stutzeri E4-2, Pseudomonas stutzeri ATCC 17587, an unknown α-proteobacterium C1-4, Shewanella algae BrY, S. oneidensis MR-1, Shewanella oneidensis MR-4, Halomonas variabilis B9-12, and Marinobacter sp. 2–25.

Genome differences among different species within different genera were also clearly revealed by CGA-based hybridization. For instance, all Azoarcus strains showed very strong hybridization to the labeled Azoarcus reference strains (Td-1, Td-21) but weaker hybridization to other labeled reference strains (Figure 1). Within the Azoarcus genus, three subclusters were formed (Figure 1, branch B). A. tolulyticus strains Td-17 and Td-21 grouped together as the first subcluster (Figure 1, b1); Td-1, Td-2, BL-11 and Td-15 formed the second subcluster (Figure 1, b2), whereas Td-3 and Td-19 grouped together as the third subcluster (Figure 1, b3). Although some strains of Azoarcus tolulyticus could not be unambiguously resolved under hybridization conditions of 55 °C and 50% formamide, they could be differentiated at higher temperatures (65 and 75 °C with 50% formamide) (data not shown) (Wu et al., 2004). These results are consistent with those based on conventional DNA–DNA hybridization and/or SSU rRNA gene analysis (see below).

Similarly, all Pseudomonas strains showed very strong hybridization to the labeled P. stutzeri strains (B2-2, E4-2 and ATCC 17587), but not to other labeled reference strains (Figure 1). All Pseudomonas stutzeri strains clustered together (Figure 1, branch C). Interestingly, the strain P. balearica DSM 6083 (Figure 1, branch C) formed a distinct cluster with all P. stutzeri strains, and this strain was originally classified as a Pseudomonas stutzeri strain (Rossello et al., 1991). Four subclusters were observed within P. stutzeri (Figure 1, c1–c4). ATCC strains 17595, 17592 and 17587 as well as marine isolates B2-2, E4-2 and F9-2 grouped within the first subcluster (Figure 1, c1), while ATCC strains 17594, 11256, 27591 and DNSP21 formed the second subcluster (Figure 1, c2). P. stutzeri ATCC 50238, P. balearica DSM 6083 and marine isolate C5-1 formed the third subcluster (Figure 1, c3). P. stutzeri ATCC 50238 showed relatively higher hybridization to the reference strain ATCC 17587, while C5-1 as well as P. balearica DSM 6083 showed weaker hybridization to this reference strain (Figure 1), indicating that the marine isolate C5-1 is less related to P. stutzeri than it is to P. balearica DSM 6083. The two isolates D7-6 and D8-12 from the same marine environment formed the fourth subcluster (Figure 1, c4). Both strains showed very strong hybridization with the reference strain ATCC 17587 (Figure 1). Interestingly, while strain D8-12 showed weaker hybridization to the labeled reference strains (B2-2, E4-2), strain D7-6 had stronger hybridization signals to these reference strains (Figure 1), indicating the existence of some genetic differences among these strains. The above results are consistent with available DNA–DNA hybridization data using conventional approaches (see below).

All Shewanella strains hybridized relatively well to the labeled Shewanella reference strains (BrY, MR-1 and MR-4) (Figure 1). S. algae strains BrY and OK-1, S. oneidensis MR-1, and Shewanella sp. MR-4 showed very strong hybridization with their corresponding labeled reference strains, while all other Shewanella strains examined showed weak hybridization (Figure 1), but their hybridization signals were stronger than those with non-Shewanella strains. These results indicated the relatively high genome diversity of the Shewanella strains examined. Within the Shewanella genus, MR-1 is more closely related to MR-4 as expected, while S. algae strain BrY is more closely related to strain OK-1. S. woodyi strain MS32 and S. amazonensis SB2B formed a separate subcluster. Most of the isolates from marine sediments (Stapleton et al., 2005) formed a distinct subcluster except for Shewanella sp. A8-3. In addition, the three Marinobacter sp. strains, 2-25 (Braker et al., 2000), C10-5 and D5-10, formed a distinct cluster (Figure 1), which is consistent with SSU rRNA gene-based analysis.

In summary, all of the above results indicated that the CGA-based hybridization could be used to reliably visualize the relationships among different microorganisms at the genus and species levels. However, it could be difficult to use CGA-based hybridization to provide reliable classification for distantly related species at higher taxonomic levels. For instance, as illustrated in Figure 1, the Bacillus strains more closely grouped with Pseudomonas strains than with Shewanella strains. This is consistent with the conventional whole-genome DNA–DNA hybridization approach.

Comparison of the CGA-based hybridization to conventional DNA–DNA hybridizations

To determine whether the species relatedness revealed by the CGA-based whole-genome hybridization is similar to that by traditional DNA–DNA reassociation approaches, DNA similarities determined by the CGA-based hybridization were compared to previously reported DNA–DNA hybridization data (Rossello et al., 1991; Song et al., 1999; Venkateswaran et al., 1999; Sepulveda-Torres et al., 2001; Cladera et al., 2004). Detailed DNA similarity data among Pseudomonas strains with various reference strains, P. stutzeri KC, P. chlororaphis ATCC17811, P. putida ATCC12633, P. fluorescens 13525 and P. aeruginosa 15692, are also listed (Supplementary Table_SM2). Significant overall linear relationships were observed between DNA–DNA reassociation values and similarities based on CGA-based hybridization data (n=43, r=0.82, P<0.0001; Figure 2a) among all strains examined, thus suggesting that the resolving power of CGA-based hybridization could be comparable to that of traditional whole-genome DNA–DNA hybridization. The 70% DNA–DNA reassociation value corresponds to about 62% similarity based on CGA hybridization data under the conditions examined (Figure 2a). However, the majority of the DNA–DNA hybridization data points have less than 40% similarity. A reliable linear relationship did not extend above a similarity of 40% based on the traditional DNA–DNA hybridization method, which corresponds to about 50% similarity obtained using the CGA-based hybridization method under the conditions examined.

Figure 2

Relationship between DNA similarities (%) obtained from the CGA-based hybridization and DNA similarities (%) reported previously using different methods. (a) Overall comparison of DNA similarities by the microarray method to all available DNA similarities by conventional methods (DNA differences were transformed to similarities). DNA similarities by the microarray method were then compared to (b) DNA differences by hydroxyapatite and dot filter methods (Rossello et al., 1991) for reference strain P. stutzeri 17587 (identical to 17591), and to DNA similarities by (c) membrane filter methods, for reference strains P. aeruginosa 15692, P. fluorescens 13525, P. putida DSM 4467 and P. stutzeri KC, and (d) S1 nuclease method for reference strains A. tolulyticus Td-1 and Td-21.

When the comparisons were made among individual groups, even stronger significant linear relationships (r=0.9–0.99) were obtained. DNA similarities from the CGA-based whole-genome DNA–DNA hybridization for various Pseudomonas strains were consistent with those determined using hydroxyapatite and/or the dot filter method (Supplementary Table_SM2). For example, differences in ΔTm derived from DNA–DNA hybridization with strain 17591 using the hydroxyapatite and/or dot filter method (Rossello et al., 1991) correlated well with microarray-based similarities (n=7, r=−0.92, P<0.01; Figure 2b). The CGA-based hybridization revealed that the strains 17592 and 17595 were 73.5% and 78.2% similar, respectively, to reference strain 17587 (Supplementary Table_SM2). This is consistent with previously reported DNA differences (0.5 and 1 °C ΔTm, respectively) of these strains compared to strain 17591, which is identical to the reference strain 17587 (Rossello et al., 1991) used in this study. Generally, the 70% DNA–DNA reassociation value corresponds to 5 °C ΔTm. As shown in Figure 2b, the data suggested that the 5 °C ΔTm is equivalent to 57% similarity from CGA-based hybridization. In addition, significant correlations of DNA similarities (r=0.90, P<0.01; Figure 2c) were observed between the CGA-based approach and the traditional membrane filter method for the strains P. fluorescens 13525, P. stutzeri KC, P. aeruginosa 15692 and P. putida 12633. However, the DNA similarities derived from microarray hybridization were somewhat lower than those from the membrane methods (Supplementary Table_SM2). Similarly, the 70% DNA–DNA reassociation values for these strains correspond to about 57.5% similarity based on CGA hybridization data under the conditions examined (Figure 2c).

Phenotypic and genotypic heterogeneity among P. stutzeri strains has been documented in previous studies (Palleroni et al., 1970). Indeed, Rossello et al. (1991) divided P. stutzeri into seven clear-cut genomic groups with significant DNA differences. The DNA similarities derived from CGA-based hybridization agreed with Rossello's classification of the P. stutzeri strains. As shown in Figure 1, the genetic group II (17595, 17592 and 17587) strains in P. stutzeri are distinctly separated from group I strains (ATCC strains 17594, 11256). In addition, ATCC strain 27591 was not included when various P. stutzeri strains were classified into different genetic groups (Rossello et al., 1991). However, the phylogenetic analysis based on gyrB sequence similarities showed a close relationship with genetic group I (Cladera et al., 2004). According to the DNA similarities from the CGA-based hybridizations, ATCC strain 27591 also clustered with genetic group I strains (ATCC strains 17594 and 11256) (Figure 1, c2). Finally, CGA-based hybridization supports the previous results that strain KC should belong to a new species other than P. stutzeri (Sepulveda-Torres et al., 2001), because its CGA hybridization-based DNA similarities to all other P. stutzeri strains were relatively low, ranging from 12.8% to 47.8% (Supplementary Table_SM2 and Table_SM3).

The CGA hybridization-derived DNA similarities matched well with the DNA reassociation values using the S1 DNA nuclease method for different A. tolulyticus strains (r=0.99, P<0.0001; Figure 2d and Table 1). CGA-based hybridization revealed 89.7% similarity of Td-17 to the reference strain Td-21, which is very similar to that (89%) determined by the S1 method. The DNA similarities from CGA-based hybridization of Td-2 and BL-11 with the reference strain Td-1 were 97.4% and 55%, respectively, which are consistent with the reassociation values derived from S1 method (99% and 56%, respectively). The microarray-based DNA similarities between the other Azoarcus strains and the reference strains (Td-1 and Td-21) ranged from 15% to 35%, which also agreed well with the corresponding DNA–DNA reassociation values (Table 1). The 70% DNA–DNA reassociation values for the Azoarcus strains correspond to about 69% similarity based on CGA hybridization data under the conditions examined (Figure 2d).
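The way a reassociation threshold is mapped onto the CGA scale can be illustrated with an ordinary least-squares fit. The sketch below uses only the three Azoarcus pairs quoted in the paragraph above (S1 nuclease value, CGA value), not the full data set behind Figure 2d, so it is a worked illustration rather than a reproduction of the reported analysis. The function and variable names are hypothetical.

```python
import math

# (S1 nuclease reassociation %, CGA similarity %) pairs quoted in the text:
# Td-17 vs Td-21, Td-2 vs Td-1, BL-11 vs Td-1.
PAIRS = [(89.0, 89.7), (99.0, 97.4), (56.0, 55.0)]


def linear_fit(pairs):
    """Least-squares line y = intercept + slope * x and Pearson's r."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


slope, intercept, r = linear_fit(PAIRS)
# CGA similarity predicted for the 70% reassociation threshold
cga_at_70 = intercept + slope * 70.0
```

Even with these three pairs alone, the fitted line places the 70% reassociation threshold close to the value of about 69% reported for the Azoarcus strains.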

Table 1 Comparison of DNA similarities from microarray hybridization and traditional methods for Azoarcus strains

The DNA similarities obtained from CGA-based whole-genome DNA–DNA hybridization supported a different classification of the A. tolulyticus strains capable of aromatic compound utilization under denitrifying conditions into different genomic groups or new species (Zhou et al., 1995; Song et al., 1999). The strains Td-17 and Td-21 were considered to be well separated from genomic group 1 of A. tolulyticus and were reclassified as a new species (that is, A. toluvorans) based on the DNA similarities measured by DNA–DNA hybridization using the S1 method and SSU rRNA gene similarities (Song et al., 1999). The CGA-based hybridization is able to delineate these differences as well. As shown in Figure 1, strains Td-17 and Td-21 were distinctly separated from Td-1.

Comparison of the CGA hybridization-derived similarities to SSU rRNA gene sequence similarities

The SSU rRNA genes are commonly used as markers for establishing phylogenetic relationships among the Bacteria and Archaea. To further understand whether the CGA-based hybridization could be used to determine species relatedness, the relationships between the CGA hybridization-derived similarities and the nucleotide sequence similarities of SSU rRNA genes were examined. Overall, a significant linear relationship was obtained between the DNA similarities derived from the CGA-based hybridization and the similarities based on SSU rRNA gene sequences (n=54, r=0.79, P<0.0001; Figure 3a). This linear relationship became slightly weaker for those microorganisms exhibiting SSU rRNA gene sequence similarities of >92% (n=47, r=0.77, P<0.0001; Supplementary Table_SM3). This is because the SSU rRNA gene sequences of the A. tolulyticus strains with distant DNA relatedness are almost identical (Song et al., 1999). However, the relationships between DNA similarities derived from the CGA-based hybridization and those from SSU rRNA genes vary considerably among different individual reference strains (Supplementary Table_SM3). Also, the tree topology based on the CGA-hybridization data is more or less congruent with that based on SSU rRNA gene sequences (Supplementary Figure SM1 and Figure 1), although noticeable differences were observed at high taxonomic levels and fine-scale strain levels. These results also indicated that the percent similarity value based on the CGA hybridization could be a good predictor of fine-scale phylogenetic distances among different microorganisms.

Figure 3

Relationship between DNA similarities by the CGA-based hybridization and percent similarity in nucleotide sequences for SSU rRNA and gyrB genes and fingerprinting patterns. DNA similarities were compared to (a) the SSU rRNA gene sequence similarities in the range of >90%; (b) the gyrB gene sequence similarities in the range of >70%; and (c) the REP-PCR similarities (combined REP-PCR and BOX-PCR), for the closely related strains.

Previous studies showed that the 70% DNA–DNA reassociation value corresponds roughly to 97% homology based on SSU rRNA genes (Stackebrandt and Goebel, 1994). This value is generally used to define a new procaryotic species based on SSU rRNA gene sequences (Venter et al., 2004; Harayama and Kasai, 2006). From the data available in this study, the 97% homology of SSU rRNA genes corresponds to about 47% similarity based on CGA hybridization data under the conditions examined (Figure 3a). This result also indicated that the 97% homology of SSU rRNA genes as a cutoff for defining new species is a very conservative estimation.

Comparison of DNA similarities from CGA-based hybridizations to gyrB gene sequence similarities

Although SSU rRNA gene sequences are widely used as a phylogenetic marker for bacterial systematics and ecology, it is difficult to use them to obtain fine-scale resolution at species and strain levels (Yamamoto and Harayama, 1995), because the rate of evolution of this molecule is low. An alternative phylogenetic marker, gyrB, which encodes the β-subunit of DNA gyrase, has been widely used for differentiating closely related species/strains (Yamamoto and Harayama, 1995, 1996, 1998; Yamamoto et al., 1999; Rodrigues et al., 2003; Yan et al., 2003; Holmes et al., 2004; Maeda et al., 2006). A phylogenetic tree based on gyrB sequences results in a 10-fold greater resolution than a tree based on SSU rRNA genes. To further understand whether the CGA-based hybridization could be used to determine species relatedness, the relationships between the CGA hybridization-based similarities and the nucleotide sequence similarities of gyrB genes were also compared at the species/strain levels. Significant overall linear relationships (n=122, r=0.95, P<0.0001) between the CGA hybridization-based similarities and the gyrB gene sequence similarities were observed for strains that exhibited >70% similarities of gyrB gene sequences (Figure 3b). The comparisons were also made for the hybridizations with individual reference strains. All correlations (n=13–17, r=0.88–0.96, P<0.01; Supplementary Table_SM3) were significant, although their strength varied among different reference strains. Based on the data available in this study, the 70% DNA–DNA reassociation values for various Shewanella and Pseudomonas strains correspond to about 94% homology based on gyrB genes, which is also equivalent to 71% similarity based on CGA hybridization data under the conditions examined (Figure 3b).

The phylogenetic trees based on gyrB gene sequences for Shewanella and Pseudomonas strains were constructed (Figure 4). The tree topology based on gyrB sequences is comparable to that derived from the CGA-based hybridizations. For instance, in the gyrB gene tree, Pseudomonas sp. G179 is distant from other Pseudomonas strains, and a similar result was obtained with the CGA-based DNA similarity trees (Figure 1). Pseudomonas stutzeri strains 17595, 17587 and 17592 grouped in the same cluster, which is also found in the DNA similarity tree. Pseudomonas stutzeri strain 27951 grouped in the same branch with strain 11256, as revealed by both the gyrB gene and DNA similarity tree. In both gyrB gene and DNA similarity tree, S. algae strains BrY and OK-1 grouped together. Also both trees revealed the close relationship between S. oneidensis MR-1 and Shewanella sp. MR-4 (Figures 1 and 4). Most of the Shewanella marine isolates clustered in a similar way in both the gyrB gene and DNA similarity trees, although there were fine differences (Figures 1 and 4).

Figure 4

Phylogenetic relationship of gyrB gene nucleotide sequences of Shewanella and Pseudomonas strains (partial, 780 bp). The dendrogram was generated by phylogenetic distance analysis with a neighbor-joining algorithm with Azoarcus sp. Td-15 as the outgroup. Values indicate the percentage of 100 replicate trees supporting the branching order. Bootstrap values below 50 were omitted. Scale bar, 10 mutations per 100 sequence positions. The percentage and color indicate the nucleotide sequence similarities among the strains in groups.

Comparison of CGA-based similarities to those derived from genomic fingerprinting (REP-PCR and BOX-PCR)

REP-PCR and BOX-PCR (repetitive DNA sequence of the BOXA subunit of the BOX element of Streptococcus pneumoniae) are commonly used methods for differentiating bacterial strains (Versalovic et al., 1994; Rademaker et al., 2000). To further understand whether different bacterial strains could be differentiated using CGA-based hybridization, the relationship between the CGA hybridization-derived similarities and the similarities based on combined REP-PCR and BOX-PCR fingerprint patterns was examined for closely related Pseudomonas, Shewanella and Azoarcus strains. Similarly, a linear relationship was obtained for these two methods when the REP-PCR and BOX-PCR values were above 60% (n=65, r=0.82, P<0.0001) (Figure 3c). The linear relationship was stronger for the strains within species (n=65, r=0.82, P<0.0001, Figure 3c) than for relationships within a genus (n=76, r=0.73, P<0.01, data not shown). The results suggested that the CGA-based hybridization could distinguish closely related bacterial strains of Pseudomonas, Shewanella and Azoarcus species. Based on the data available in this study, the 70% DNA–DNA reassociation value for various Shewanella, Pseudomonas and Azoarcus strains corresponded to about 87% similarity based on genome fingerprinting, which is also equivalent to 71% similarity based on CGA hybridization data under the conditions examined (Figure 3c).

Discussion

Determination of procaryotic species and the degree of their relationship is a great challenge for microbiologists. In the last 50 years, many different molecular methods, including whole-genome DNA–DNA hybridization, SSU rRNA sequencing, multiple locus sequencing of protein-encoding genes (for example, gyrB, recA) and average nucleotide identity, have been proposed for delineating bacterial species. Although the SSU rRNA gene-based method is a valuable, convenient and rapid tool for the determination of the phylogenetic relationships among different microorganisms, it provides poor resolution at the species and subspecies levels (Yamamoto and Harayama, 1998). Also, many procaryotic species have virtually identical SSU rRNA gene sequences but only have 25% DNA similarity (Stackebrandt and Goebel, 1994). Thus, the SSU rRNA analysis method is not a valid approach for determining species/strain relationships. Protein-coding genes could provide high resolution for species/strain determination, but the difficulty in using sets of protein-coding genes for phylogenetic evaluation lies in selecting appropriate gene targets and designing amplification primers useful for large sets of microorganisms (Harayama and Kasai, 2006). In addition, phylogenetic analyses based on complete microbial genome sequences are possible, but it is unlikely that all the sequenced genomes needed for comparison will be available (Brutlag, 1998). Finally, whole-genome sequence analysis is a powerful approach for resolving the major problems of evolution, phylogeny and systematics of living organisms, but its use in general taxonomic studies is not currently practical (Tourova, 2000). Such analyses will require larger genomic data sets and more carefully designed sampling of natural populations (Konstantinidis and Tiedje, 2007).
The average nucleotide identity of the shared genes between two strains was proposed to be a robust approach to determine genetic relatedness among different strains (Konstantinidis and Tiedje, 2004). The average nucleotide identity value of 94% corresponded to the traditional 70% DNA–DNA reassociation standard of the current species definition. Although the average nucleotide identity approach is simple, it still relies on the availability of whole-genome sequences and hence it will have a limited use. Nevertheless, whole-genome DNA–DNA hybridization is still considered to be the cornerstone for bacterial species determination and will have to be used to circumscribe procaryotic species (Rossello-Mora, 2006).

The development and application of microarray-based genomic technology for microbial detection and community analysis have received a great deal of attention. Because of its high-density and high-throughput capacity, it is expected that microarray-based genomic technologies will revolutionize the detection, identification and characterization of microorganisms. Therefore, in this study, we have developed the CGA-based hybridization approach for determining species relationships. Experimental comparisons of the CGA hybridization-based results with available traditional DNA–DNA hybridization data, SSU rRNA and gyrB gene sequences and genome fingerprinting methods indicate that the CGA-based hybridization could be a useful alternative to the traditional whole-genome DNA–DNA hybridization approaches for determining procaryotic species relationships.

Overall, DNA similarities from the CGA-based hybridizations were comparable to those from various conventional whole-genome DNA–DNA hybridization approaches. When the actual values for genome relatedness were compared between different methods, the results from the CGA-based hybridization were more consistent with those from the S1 method than with those from the membrane filter method (Goris et al., 1998), as indicated by smaller average differences (<15%) between the similarity values derived from CGA-based hybridization and S1 methods for A. tolulyticus strains. DNA similarities from the CGA-based hybridizations for reference strain P. stutzeri ATCC 17587 also matched well with the ΔTm values of the P. stutzeri strains by hydroxyapatite and/or dot filter method with the reference strain P. stutzeri ATCC 17591 (identical to 17587). However, DNA similarities from CGA-based hybridizations with multiple Pseudomonas strains were significantly lower than those from membrane filter methods, although strong linear relationships were observed. One possible explanation might be related to differences in hybridization stringency. Higher similarity values are expected if the hybridization is carried out under relatively low-stringency conditions. The hybridization conditions may need to be optimized for different target microbial groups by considering GC content and genome size. It is also important to point out that DNA similarities determined by the traditional hybridization methods vary significantly among different methods (Goris et al., 1998).

Although significant relationships between the CGA hybridization-based similarities and those from SSU rRNA genes, gyrB genes or genomic fingerprinting were observed, the strength of the correlations differs considerably. The correlations of the CGA hybridization-derived similarities to gyrB sequences are stronger than those to SSU rRNA sequences and genomic fingerprinting. It appears that the taxonomic resolution of the CGA-based hybridization is similar to or slightly higher than that of gyrB sequence analysis. While SSU rRNA sequence analysis can provide reliable information about species relationships at higher taxonomic levels (for example, genus or above), whole-genome DNA–DNA hybridizations are useful in providing insight into phylogenetic relationships at the species/strain levels. For example, the SSU rRNA gene sequence similarities among the A. tolulyticus strains tested in this study all fall within 98%–100%; however, the DNA similarities determined by both the CGA-based method and the S1 method varied across a wide range (15.2%–93.9% for the CGA method, 18%–99% for the S1 method) (Song et al., 1999). Owing to these differences in resolving phylogenetic relationships, integrating CGA-based hybridization with SSU rRNA and gyrB gene sequence analysis could provide a reliable, rapid approach for delineating procaryotic species relationships.

Genome fingerprinting analysis is suitable for the elucidation of strain-level relationships (Versalovic et al., 1994), and was shown to be highly correlated to DNA–DNA reassociation values for xanthomonads (Rademaker et al., 2000). In this study, strong correlations between CGA hybridization-based similarities and the similarities derived from fingerprinting approaches were observed among closely related strains of P. stutzeri and A. tolulyticus, but not among distantly related species from Pseudomonas, Azoarcus or Shewanella genera. The genomic fingerprinting similarity values above 60% correlated well with CGA hybridization values for the tested P. stutzeri and A. tolulyticus strains. These results indicated that CGAs could provide meaningful insight into relationships between closely related strains. However, the power to resolve relationships at the strain level will be lower with CGA-based hybridization than with genome fingerprinting approaches. For instance, based on the CGA hybridization results, A. tolulyticus Td-3 was not separated from Td-19, and P. stutzeri DNSP21 (genetic group IV) could not be separated from P. stutzeri ATCC 11256 (genetic group I) (Figure 1), but they were well separated based on genome fingerprinting methods. Lower resolution was also observed for some Shewanella species (data not shown).

Compared to the traditional DNA–DNA reassociation approach, CGAs have several advantages for the determination of species relatedness, including high-throughput capacity, parallel analyses and quantitation. CGA hybridization differs from membrane filter-based hybridization approaches in that the non-porous surface has advantages of miniaturization, hybridization kinetics, sample volume, reagent absorption, signal detection approaches and reproducibility (Schena and Davis, 2000). The capability of accurate and precise miniaturization with robots on non-porous substrates with the use of fluorescence-based detection offers significant advantages. In addition, multiple pairwise comparisons can be done with smaller amounts of genomic DNA (that is, 1 μg). This is important for determining the relationships between procaryotic species that are difficult to cultivate. CGAs could provide a high-throughput means for rapid identification of microbial species/strains. Because of its high capacity, one can construct a CGA containing bacterial type strains plus appropriately related strains. By hybridizing genomic DNA from unknown strains with this type of microarray, one should be able to quickly and reliably identify unknown strains, provided a suitably related probe is on the array. Generally, SNRs for hybridizations with perfect match DNAs are significantly higher than those with mismatch DNAs from other strains of the same species (Wu et al., 2004). Thus, species identification can be achieved based on the differences in hybridization intensity. However, as a low level of cross-hybridization could occur among different strains, establishing appropriate SNR thresholds to differentiate self-hybridization, cross-hybridization and background hybridization among different strains should be useful. 
In addition, when using CGAs for species identification, lower-stringency hybridization conditions (for example, 42 °C and 50% formamide) should be used first to ensure that good hybridization signals can be obtained for distantly related target species.
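
The SNR-threshold idea discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the SNR definition used here (background-subtracted signal divided by the standard deviation of the background) is one common convention and is assumed rather than taken from this section, and the function names and cutoff values are hypothetical. Appropriate cutoffs would have to be established empirically for each array and hybridization condition.

```python
def signal_to_noise(signal, background_mean, background_sd):
    """Assumed SNR convention: (signal - mean background) / background SD."""
    if background_sd <= 0:
        raise ValueError("background standard deviation must be positive")
    return (signal - background_mean) / background_sd


def classify_spot(snr, background_cutoff, self_cutoff):
    """Assign a spot to one of three classes using two SNR cutoffs.

    Both cutoffs are hypothetical and must be calibrated experimentally.
    """
    if background_cutoff >= self_cutoff:
        raise ValueError("background_cutoff must be below self_cutoff")
    if snr < background_cutoff:
        return "background"
    if snr < self_cutoff:
        return "cross-hybridization"
    return "self-hybridization"


if __name__ == "__main__":
    # Hypothetical intensities and cutoffs, for illustration only.
    snr = signal_to_noise(signal=1100.0, background_mean=100.0, background_sd=50.0)
    print(classify_spot(snr, background_cutoff=3.0, self_cutoff=15.0))
```

Using two cutoffs rather than one reflects the three categories named in the text; a single threshold would only separate detectable from undetectable hybridization.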

Microbial diversity is extremely high and the majority of microorganisms are as yet uncultivated. This could be a limitation in using CGA-based hybridization to determine species relatedness of uncultured microorganisms. However, the CGA-based hybridization itself does not require culturing. With the recent advances in environmental genomics, high-molecular-weight DNA from uncultivated microorganisms could be accessed through bacterial artificial chromosomes or fosmid cloning. High-molecular-weight bacterial artificial chromosomes/fosmid clones could also be used to fabricate CGAs, thus allowing the determination of relationships of target strains/clones to the uncultivated components of a complex microbial community. Because the size of bacterial artificial chromosomes/fosmid clones is generally 50- to 200-fold smaller than that of an entire genome, it is expected that microarrays fabricated with high-molecular-weight bacterial artificial chromosomes/fosmid clones should have performance characteristics similar to those of CGAs.