Introduction

Bradyrhizobium japonicum is a symbiotic nitrogen-fixing soil bacterium that has the ability to form root nodules on soybeans. Genotypic and phenotypic variations of B. japonicum strains have been reported in terms of DNA fingerprints (Hartmann et al., 1992; Minamisawa et al., 1999), internal transcribed spacer (ITS) sequences between 16S and 23S rDNA (van Berkum and Fuhrmann, 2000), uptake hydrogenase (van Berkum, 1990), denitrification (Sameshima-Saito et al., 2006), symbiotic associations (Ishizuka et al., 1991) and nitrogen fixation (Basit et al., 1991).

A notable feature of the genome of B. japonicum strain USDA110 (9.1 Mb) is the existence of a large genomic island (GI), termed a ‘symbiosis island’ (681 kb), carrying a cluster of symbiotic genes with lower GC contents, which is structurally inserted into a val-tRNA gene on the genome (Kaneko et al., 2002). Besides the symbiosis island, 14 smaller GIs (4–97 kb) were found on the genome (Kaneko et al., 2002). However, little is known about genomic diversity among B. japonicum strains and its correlation with the presence of GIs.

Genomic islands are horizontally acquired DNA regions that are usually inserted in the vicinity of tRNA genes on chromosomes and flanked by direct repeat sequences (Dobrindt et al., 2004). The GC content of GIs often differs from that of the core genomes of bacteria, and GIs harbor several functional genes that encode for proteins involved in pathogenicity, xenobiotic degradation, iron uptake, antibiotic resistance, secondary metabolism or symbiosis (Dobrindt et al., 2004).

Bradyrhizobium japonicum is a member of the family Bradyrhizobiaceae, which belongs to the order Rhizobiales in the Alphaproteobacteria (Gupta and Mok, 2007). Members of Bradyrhizobiaceae include a number of nonsymbiotic bacteria with diverse biochemical functions such as photosynthesis (Molouba et al., 1999; Larimer et al., 2004; Giraud et al., 2007), oligotrophy (Saito et al., 1998; King, 2007), 2,4-dichlorophenoxyacetic acid degradation (Kamagata et al., 1997) and nitrification (Starkenburg et al., 2006).

Comparative genomic hybridizations (CGHs) by DNA micro- and macroarrays have been employed to reveal the evolution and function of pathogenicity in genomic terms in organisms such as Escherichia coli (Dobrindt et al., 2003; Carter et al., 2008), Yersinia pestis (Hinchliffe et al., 2003), Yersinia pseudotuberculosis (Zhou et al., 2004), Xylella fastidiosa (Koide et al., 2004), Campylobacter jejuni (Pearson et al., 2003), Streptococcus agalactiae (Brochet et al., 2006) and Streptococcus pneumoniae (Obert et al., 2006). The results of these studies indicate that mobile genetic elements such as phages, transposons and GIs contribute to pathogenicity acquisition and environmental adaptation. The CGH approach is suitable for efficient determination of the global genome variations among target bacterial populations, although the comparison is limited to the gene repertoire of the template genome. In rhizobia, DNA micro- and macroarrays have been used mainly for global gene expression analysis of B. japonicum (Chang et al., 2007; Pessi et al., 2007; Brechenmacher et al., 2008; Wei et al., 2008), Mesorhizobium loti (Uchiumi et al., 2004) and Sinorhizobium meliloti (Becker et al., 2004). CGH analyses of four natural isolates of S. meliloti by microarray revealed a significant fraction of variable genes, including transposase and unknown genes, on the pSymA megaplasmid (Giuntini et al., 2005). However, no other array-based CGH analysis of rhizobia, including B. japonicum, has been reported.

Our aims were to clarify the variations in the genomes of B. japonicum and other members of the family Bradyrhizobiaceae and to determine the involvement of these genomic variations in the symbiotic phenotypes of B. japonicum strains.

Materials and methods

Bacterial strains and media

The strains used are listed in Table 1. B. japonicum, B. elkanii, Agromonas oligotrophica, G14130 and Bradyrhizobium sp. HWK12 and HW13 were grown aerobically at 30 °C in HM salt medium (Nieuwkoop et al., 1987) supplemented with 0.1% arabinose and 0.025% Difco yeast extract (Becton, Dickinson and Company, Sparks, MD, USA). The other strains were grown aerobically at 30 °C in Difco Nutrient broth (Becton, Dickinson and Company). Total bacterial DNA was prepared from cultured cells as described earlier (Minamisawa et al., 2002).

Table 1 Bacterial strains used

Comparative genomic hybridization

Total DNA was sonicated for 30 s with an Ultrasonic Cleaner Vs-25 (As One, Osaka, Japan). Twenty-five nanograms of fragmented DNA was labeled with [α-33P] dCTP (2500 Ci mmol−1; Amersham Biosciences, Pittsburgh, PA, USA) by using a Rediprime II random primer labeling system (Amersham Biosciences). A DNA macroarray of B. japonicum USDA110 (Ito et al., 2006; Wei et al., 2008) was used for CGH analyses. The macroarray contains 3960 spots on a nylon membrane, on which 2.7-kb DNA segments (on average) of USDA110 brb libraries (Kaneko et al., 2002) were mainly spotted. Hybridization was carried out as described earlier (Uchiumi et al., 2004; Ito et al., 2006; Wei et al., 2008). The membrane was washed twice for 15 min at 55 °C in 2 × saline sodium citrate (SSC) containing 0.1% sodium dodecyl sulfate and twice for 15 min at 55 °C in 0.1 × SSC (0.2 × SSC was used for bacteria other than B. japonicum) containing 0.1% sodium dodecyl sulfate (1 × SSC is 0.15 M NaCl plus 0.015 M sodium citrate). Image acquisition and data analysis were performed as described earlier (Uchiumi et al., 2004; Ito et al., 2006; Wei et al., 2008). At least three sets of biologically independent array analyses were performed for each strain. Signal values were normalized against total signal values. After normalization, the signals of the tested strains were expressed as ratios to the signal of USDA110. Principal component analysis of the CGH profile was performed with CANOCO (version 4.5 for Windows; Microcomputer Power, Ithaca, NY, USA). The default parameters were used except for intersample scaling to generate ordination plots based on the scores of the first two principal components. Signal differences among respective spots were confirmed by t-test (P<0.05) between the tested strain and USDA110.
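The normalization and ratio steps described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' analysis pipeline: the function names and the example spot intensities are hypothetical, and averaging over replicate arrays is omitted.

```python
def normalize_to_total(signals):
    """Scale each spot signal by the summed signal of the whole array."""
    total = sum(signals)
    if total <= 0:
        raise ValueError("total signal must be positive")
    return [s / total for s in signals]


def signal_ratios(test_signals, reference_signals):
    """Per-spot ratio of normalized test signal to normalized reference signal."""
    if len(test_signals) != len(reference_signals):
        raise ValueError("both arrays must contain the same spots")
    test_norm = normalize_to_total(test_signals)
    ref_norm = normalize_to_total(reference_signals)
    # Assumes every reference spot has a non-zero signal.
    return [t / r for t, r in zip(test_norm, ref_norm)]


# Hypothetical raw intensities for four spots (not data from this study).
usda110 = [100.0, 200.0, 300.0, 400.0]
tested = [200.0, 400.0, 60.0, 1340.0]
ratios = signal_ratios(tested, usda110)
```

Because each array is first divided by its own total, a uniformly brighter or dimmer membrane does not by itself change the ratios; only spots whose share of the total signal differs from that in USDA110 move away from 1.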

PCR amplification and sequence analysis

Twelve primer pairs were designed for PCR amplification of the boundary regions of the GIs observed in the USDA110 genome (Supplementary Table 1). LA Taq polymerase (Takara, Osaka, Japan) was used for PCR amplification. The reaction mixture was first incubated at 94 °C for 1 min; then subjected to 14 cycles of 98 °C for 20 s and 68 °C for 20 min; 16 cycles of 98 °C for 20 s and 68 °C for 20–24 min (15-s increase per cycle); and finally to incubation at 72 °C for 10 min. Amplified DNA fragments were separated on agarose gels and purified by using a Wizard SV Gel and PCR Clean-Up System (Promega, Madison, WI, USA). Direct sequencing was carried out by using the PCR products as templates and the PCR primers with an ABI PRISM 310 DNA sequencer and Big Dye Terminator v.3.1 Cycle Sequencing Kit (Applied Biosystems, Foster City, CA, USA). PCR amplification and sequence analysis of 16S-23S rDNA ITS sequences were performed as described by Saeki et al. (2005, 2006).

Phylogenetic analysis

Table 1 lists the DDBJ/GenBank/EMBL accession numbers for the 16S rRNA genes and ITS sequences between 16S- and 23S-rRNA that are used in this study. For the phylogenetic analysis, the neighbor-joining method and Clustal W were used as described earlier (Saito et al., 2008).

Determination of fixed nitrogen

Soybean (Glycine max (L.) Merr. ‘Enrei’) was cultivated in a greenhouse at Hiroshima University. Pots (3 l) containing granitic regosol, perlite and peatmoss at 2:1:1 (v v−1) were sterilized by autoclave, and then B. japonicum cells cultured in yeast extract-mannitol liquid medium (Jordan, 1984) at 30 °C for 7 days were mixed into the pots (10⁶ cells per seed). One week later, five surface-sterilized soybean seeds were planted in each pot. For cultivation, sterilized inorganic nutrient solution was supplied as described earlier (Masuda et al., 1989). Plants were harvested 56 days after seed planting. Plants were carefully sampled from the pots and dissected into leaves, stem, roots and nodules. The plant samples were dried individually at 80 °C, weighed and then ground with a vibration sample mill (Model Tl-100; Heiko Co. Ltd, Iwaki, Fukushima, Japan) for nitrogen analysis. The powdered samples were digested with sulfuric acid and acid mixture to quantify total nitrogen (TN) by the Kjeldahl method. The amount of TN was calculated from the sum of nitrogen amounts in each part (leaves, stem, roots and nodules). Fixed nitrogen per plant (FN) was calculated by subtracting the amount of TN in uninoculated plants from that in the plants inoculated with each strain.
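The TN and FN calculations above amount to a sum over plant parts followed by a subtraction. The short Python sketch below makes the arithmetic explicit; the function names and the nitrogen amounts are hypothetical and are not data from this study.

```python
PLANT_PARTS = ("leaves", "stem", "roots", "nodules")


def total_nitrogen(n_by_part):
    """TN per plant: sum of the nitrogen amounts measured in each plant part."""
    return sum(n_by_part[part] for part in PLANT_PARTS)


def fixed_nitrogen(inoculated, uninoculated):
    """FN per plant: TN of an inoculated plant minus TN of an uninoculated plant."""
    return total_nitrogen(inoculated) - total_nitrogen(uninoculated)


# Hypothetical nitrogen amounts per plant part (arbitrary units).
inoculated = {"leaves": 100.0, "stem": 30.0, "roots": 20.0, "nodules": 50.0}
uninoculated = {"leaves": 25.0, "stem": 8.0, "roots": 7.0, "nodules": 0.0}
fn = fixed_nitrogen(inoculated, uninoculated)
```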

Correlation and multiple regression analyses

Correlation analyses between genomic variable regions and host plant parameters were performed with Excel software (Microsoft Corporation, Redmond, WA, USA). To examine the effect of each profile on the phenotypes of the inoculated soybeans, we conducted multiple regression analyses by using R 2.0.1 (Ihaka and Gentleman, 1996; available at http://www.R-project.org). Profiles A, B, C and D were all included in the analyses as candidate independent variables (for ease of notation, we assigned the numbers 1, 2, 3 and 4 to profiles A, B, C and D, respectively). For each profile, we tested a model in which the profile was included as an independent variable and a model in which it was excluded. That is, the observed phenotypic value of strain $i$ ($i=1, 2, \ldots, 9$), $y_i$, can be described by the linear model

$$y_i = b_0 + \sum_{j=1}^{4} \delta_j b_j x_{ij} + e_i$$

where $b_0$ is an intercept, $b_j$ is the effect (that is, coefficient) associated with profile $j$, $e_i$ is an error term that follows a normal distribution and $x_{ij}$ denotes the results of genomic Southern analysis of profile $j$ for strain $i$ and is defined by 1 for a positive result and 0 for a negative result. $\delta_j$ is an indicator variable where $\delta_j=1$ corresponds to the case in which profile $j$ ($j=1, 2, \ldots, 4$) is included in the model and $\delta_j=0$ implies exclusion. All possible models were tested (that is, a total of $2^4=16$). Models were selected in accordance with the Akaike Information Criterion (Akaike, 1973). That is, the model that showed the minimum Akaike Information Criterion was selected as the best model. The significance of the best model and the significance of each effect included in the best model were evaluated by F-test and t-test, respectively.
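The model-selection procedure described above (all 16 subsets of the four profiles, ranked by the Akaike Information Criterion) can be sketched as follows. This is an illustration under stated assumptions rather than the R analysis used in the study: the presence/absence patterns and phenotype values are hypothetical, the least-squares fit is a plain normal-equations solver, and the AIC is the usual Gaussian form n·ln(RSS/n) + 2k, which omits an additive constant that is the same for every model and so does not affect the ranking.

```python
import math
from itertools import combinations


def fit_ols(X, y):
    """Ordinary least squares via the normal equations; returns (coefficients, RSS)."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    aug = [
        [sum(row[a] * row[b] for row in X) for b in range(p)]
        + [sum(row[a] * yi for row, yi in zip(X, y))]
        for a in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    coefs = [aug[i][p] / aug[i][i] for i in range(p)]
    rss = sum(
        (yi - sum(c * x for c, x in zip(coefs, row))) ** 2 for row, yi in zip(X, y)
    )
    return coefs, rss


def aic(rss, n, n_coefs):
    """Gaussian AIC up to an additive constant; the +1 counts the error variance."""
    return n * math.log(rss / n) + 2 * (n_coefs + 1)


def select_model(profiles, y):
    """Fit every subset of profiles and return [(AIC, model name), ...] sorted by AIC."""
    names = sorted(profiles)
    n = len(y)
    results = []
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            # delta_j = 1 for the profiles in `subset` and 0 for the others.
            X = [[1.0] + [float(profiles[name][i]) for name in subset] for i in range(n)]
            _, rss = fit_ols(X, y)
            label = "+".join(subset) or "intercept only"
            results.append((aic(rss, n, len(subset) + 1), label))
    return sorted(results)


# Hypothetical presence (1) / absence (0) of each profile in nine strains.
profiles = {
    "A": [1, 1, 0, 0, 0, 0, 0, 0, 0],
    "B": [1, 1, 1, 1, 0, 0, 0, 0, 0],
    "C": [1, 1, 1, 0, 1, 1, 0, 0, 0],
    "D": [1, 1, 1, 1, 1, 1, 1, 0, 0],
}
# Hypothetical phenotype values, one per strain (arbitrary units).
y = [6.55, 6.45, 3.52, 3.47, 3.54, 3.46, 3.51, 2.03, 1.98]

ranked = select_model(profiles, y)
best_aic, best_model = ranked[0]
```

In this toy data set the phenotype was constructed to depend on profiles A and D, so the minimum-AIC model necessarily contains both; whether B or C is also retained depends on how much residual variation each happens to absorb relative to the penalty of 2 per added coefficient.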

Expression analysis

For expression analysis by macroarray, surface-sterilized soybean seeds (G. max ‘Enrei’) were germinated in sterile vermiculite for 2 days at 25 °C and transplanted into a Leonard jar (Leonard, 1943; Trung and Yoshida, 1983) that contained sterile vermiculite and nutrient solution (Minamisawa et al., 2002). B. japonicum was inoculated at 10⁹ cells per seed. Plants were grown in a phytotron (Koitotron type KC; Koito Industries, Tokyo, Japan) for 28 days with a day temperature of 25 °C for 16 h and a night temperature of 23 °C for 8 h; daylight was supplied at a photon flux density of 277 μmol m−2 s−1. Total RNA of bacteroids (isolated from 2 g of ‘Enrei’ nodules) and free-living cells (cultured in 100 ml of Yeast Mold medium) was prepared by the hot phenol method as described earlier (Ditta et al., 1987; Uchiumi et al., 2004). Bacteroid preparation, cDNA labeling, hybridization, image acquisition and data analysis were performed as described earlier (Uchiumi et al., 2004; Ito et al., 2006; Wei et al., 2008).

Nucleotide sequence accession number and microarray data

DNA sequences determined in this study were deposited under the following accession numbers in the DDBJ DNA database: boundary regions around GIs, AB282934–AB282955; and ITS sequences, AB278125 (NC4), AB278126 (NC6), AB278127 (NK2), AB278128 (T7) and AB278129 (T9). The CGH and expression data are available at the MacroArray Analysis of Bradyrhizobium japonicum website (http://orca10.bio.sci.osaka-u.ac.jp/array02/).

Results

Conditions and presentation of CGH analysis

Total DNAs extracted from the respective strains (USDA110, USDA122, USDA124, USDA6, NC4, NC6, NK2, T7, T9, S58, G14130, BTAi1, ORS278, HW13, HWK12, CGA009 and USDA76 in Table 1) were labeled with radioactive 33P and were hybridized to DNA macroarray membranes of USDA110. Unlike in the nine strains of B. japonicum, weak signals were detected in CGH analysis of the other eight strains, S58, G14130, BTAi1, ORS278, HW13, HWK12, CGA009 and USDA76 (referred to here as the ‘non-Bj’ strains). This is probably because the genomes of the non-Bj strains were less similar to the USDA110 genome than were the genomes of the B. japonicum strains. Therefore, a low-stringency second wash (0.2 × SSC) was adopted for the non-Bj strains after hybridization, as described in the Materials and Methods, whereas B. japonicum strains were subjected to a normal second wash (0.1 × SSC). The signal ratios of 3960 probes in the tested strains were calculated as compared with those of USDA110, and the ratios along with the genomic positions of USDA110 were plotted as CGH profiles (Figure 1).

Figure 1

Comparative genomic hybridization (CGH) analysis of 17 strains of Bradyrhizobiaceae. Horizontal axis indicates array probes arranged in order on the USDA110 genome, and the vertical axis in the CGH window (lanes 1–17) shows the signal ratio for each strain (signal of tested strain versus signal of USDA110). Lanes 1–9 show the CGH profiles of Bradyrhizobium japonicum strains, and lanes 10–17 show the profiles of other members of the Bradyrhizobiaceae (see Table 1). Lane 18 shows the GC content of B. japonicum strain USDA110, calculated by using a 10-kb window. *GIs with lower signals in strains USDA122, NC4 and NK2 of genome type 122 as compared with those of USDA110. **GIs that appear to be conserved among strains of genome types 110 and 122. The names of the trn elements are those given as GIs by the whole-genome sequencing of B. japonicum USDA110 (Kaneko et al., 2002). The position of the symbiosis island in B. japonicum USDA110 (1.88–2.29 Mb) is boxed. GI, genomic island.

Intraspecies comparison of B. japonicum CGH profiles

When B. japonicum NC6 was compared with USDA110, the profiles of both signal intensities were almost the same, except at the 8.9-Mb position (lanes 1 and 2; Figure 1). In contrast, in the other strains of B. japonicum (USDA122, NK2, NC4, USDA124, USDA6, T7 and T9), we found many genomic regions with lower signal ratios (LSR) in clusters, the positions of which appeared to be shared by these strains (lanes 3–9; Figure 1). Interestingly, the LSR regions corresponded to those of low G+C content in the USDA110 genome (lane 18; Figure 1), and they included 14 GIs previously reported by Kaneko et al. (2002) (‘GI-like’ in Figure 1). These results suggest that LSR regions are commonly missing from the genomes of many B. japonicum strains. When the DNA sequences in and around the LSR regions were surveyed on the USDA110 genome, we found seven additional GI-like regions flanking tRNA genes, although the direct repeats of the tRNA genes that are structurally characteristic of genomic islands were not observed (‘GI-like’ in Figure 1).

To compare the CGH profiles of B. japonicum strains, a principal component analysis was conducted based on the signal ratios (Figure 2). The nine strains of B. japonicum fell into three groups on the principal component plot, in which 62% of the variance was explained by principal components PC1 and PC2 (Figure 2a). On this basis, we defined three genome types: genome type 110 (USDA110 and NC6), genome type 122 (USDA122, NC4 and NK2) and genome type 6 (USDA6, USDA124, T7 and T9). This grouping of the genome types of B. japonicum strains was supported by the CGH profiles (Figure 1).

Figure 2

Principal component analysis of CGH profiles. (a) Principal component plots generated from all the CGH profiles of nine strains of Bradyrhizobium japonicum. (b) Principal component plots generated from the genome core of CGH profiles without the symbiosis island of B. japonicum and from non-Bj strains. CGH, comparative genomic hybridization.

Relationships between phylogenetic analysis and CGH profile among B. japonicum strains

To examine whether the CGH profiles of B. japonicum strains reflected the phylogeny of the strains, phylogenetic trees were constructed based on the 16S rRNA gene and the ITS sequences between the 16S and 23S rRNA genes (Figure 3). The tree based on the 16S rRNA gene divided the B. japonicum strains into two main clusters, BJ1 and BJ2, as described earlier (Sameshima et al., 2003; Sameshima-Saito et al., 2006) (Figure 3a). B. japonicum strains belonging to the two genome types 110 and 122, as defined by the CGH profile (Figures 1 and 2), still formed a single clade, BJ1 (Figure 3a). The ITS analysis tree enhanced the resolution of the phylogenetic relationships, as reported by van Berkum and Fuhrmann (2000): the B. japonicum strains were subdivided into genome types 110, 122 and 6, with high bootstrap values (Figure 3b). These results showed that the CGH profiles certainly reflected the phylogeny within B. japonicum.

Figure 3

Phylogenetic relationships of Bradyrhizobium japonicum and other Bradyrhizobiaceae members based on (a) 16S rRNA gene sequences and (b) internal transcribed spacer (ITS) sequences. (a) Tree constructed on the basis of the 16S rRNA gene sequences of the 17 strains that we tested (asterisked), together with those of Blastobacter denitrificans (AF338176) and Nitrobacter winogradskyi (CP000115), by the neighbor-joining (NJ) method. (b) Tree constructed on the basis of ITS sequences by the NJ method. The black arrowhead shows the estimated positions of horizontal transfer of seven GIs (trnR1, trnF1, trnQ1, trnS1, trnK1, trnP1 and trnR2) during microevolution of B. japonicum. The white arrowhead shows the estimated positions of horizontal transfer of trnM1 and trnK4 (see Discussion). Asterisk shows the estimated position of acquisition of an ancestral symbiosis island associated with soybeans (see Discussion). For both trees, Mesorhizobium loti MAFF303099 was used as the outgroup. Numbers at the nodes are bootstrap values from 1000 replications. Bars show base substitutions per nucleotide. GI, genomic island.

PCR and sequence analysis of variable regions

The intraspecies comparisons of CGH analyses suggested that the variable genomic regions of seven B. japonicum strains (lanes 3–9; Figure 1) corresponded to the positions of GIs on the USDA110 genome (lane 18; Figure 1). To examine whether GIs were structurally missing on the genomes of these strains, PCR amplification was conducted targeting the GIs of B. japonicum strains. PCR primers were designed in the sequences flanking the GIs in the genome of USDA110 for the respective GI targets (Figure 4a, Supplementary Table 1).

Figure 4

PCR amplification and sequence analysis of genomic islands. (a) Schematic presentation of PCR primers and variable regions around GIs in genome types 110 and 122, as revealed by DNA sequencing of the PCR products. Bold lines indicate the genome core and ‘DRs’ are direct repeats derived from target tRNA genes. FP, forward primer; RP, reverse primer. (b) Gel electrophoresis of PCR products of USDA122 (lanes 1, 4, 7, 10, 13, 16, 19, 21), NC4 (lanes 2, 5, 8, 11, 14, 17, 20, 22) and NK2 (lanes 3, 6, 9, 12, 15, 18, 23). trnR1, trnF1, trnI2, trnQ1, trnS1, trnK1, trnP1 and trnR2 are the genomic islands targeted by using the primer sets in Supplementary Table 1. M1 and M2 are DNA size markers of Lambda DNA HindIII digest and φX174 DNA HaeIII digest, respectively. No PCR product was observed for Bradyrhizobium japonicum strains USDA124, USDA6, T7 and T9, which belong to genome type 6 (Figures 1 and 2).

When total DNA of strain USDA110 or NC6 was used as a template (Figure 4a), no PCR products for trnR1, trnI1, trnI2, trnK1, trnP1 and trnR2 were observed, but PCR products (6–9 kb) for trnF1, trnQ1 and trnS1 were detected (Supplementary Table 1). The failure to detect most PCR products is probably due to the large target sizes of the GIs on the USDA110 genome. However, when total DNA of strains USDA122, NC4 or NK2 of genome type 122 was used as a template, short PCR products could be detected for eight GIs (trnR1, trnF1, trnI2, trnQ1, trnS1, trnK1, trnP1 and trnR2), except with the combination of NK2 and the trnP1 primer set (Figure 4b). The sizes of the PCR products suggested the occurrence of direct connections, without GIs (Supplementary Table 1). Examination of the DNA sequences of these amplified DNA fragments revealed that at least seven GIs (trnR1, trnF1, trnQ1, trnS1, trnK1, trnP1 and trnR2) were indeed absent in genome type 122 (USDA122, NC4 and NK2) with direct connections of the genome core; the exception was trnP1 in strain NK2 (Figure 4b, Supplementary Table 1). Analysis of the DNA sequence of the 7-kb PCR product obtained with the trnI2 primer set and the NC4 DNA template (Figure 4b) suggested partial truncation of the structure of the original trnI2 on the USDA110 genome (Figure 4a). With strains USDA124, USDA6, T7 and T9 of genome type 6, no PCR product was observed (data not shown), probably because of sequence divergence around the GIs in these strains. These results indicate that at least seven GIs (trnR1, trnF1, trnQ1, trnS1, trnK1, trnP1 and trnR2) are missing in the B. japonicum strains belonging to genome type 122.

Determination of regions missing in B. japonicum strains

Strains of B. japonicum are divided into two genotypes for the presence and absence of hup structural genes encoding uptake hydrogenase (van Berkum, 1990) and nos structural genes encoding nitrous oxide reductase (Sameshima-Saito et al., 2006; Table 1). These genes were used to determine a criterion for the absence and presence of certain genomic regions in B. japonicum strains. When hup+ strains were used as templates to array probe brb11967 including hupSL (bll6941 and bll6942), the resultant signal ratios ranged from 0.8 to 1.5, whereas the signal ratios ranged from 0.2 to 0.5 in hup− strains. When the nos+ strains were used as templates to array probe brb00265 including a nitrous oxide reductase gene (nosZ), the signal ratios ranged from 0.9 to 1.1, whereas the signal ratios ranged from 0.2 to 0.3 in nos− strains. Probably, cross-hybridization of different genomic regions gives rise to higher background signals because of the large size and complexity of the B. japonicum genome (Kaneko et al., 2002; Ito et al., 2006).

Therefore, if the signal ratio of a certain probe (spot) was less than 0.5 and differed significantly by t-test (P<0.05) between the tested strain and USDA110, we considered that the test strain lacked the corresponding region on the genome and produced a matrix consisting of 0 (‘missing’) or 1 (‘present’) in the tested strains, along with the array probe positions (Figure 5a). Similar criteria have been adopted in macroarray CGH analysis of Streptococcus species (Brochet et al., 2006).
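The calling rule above can be written as a small function. The sketch below is illustrative only: the function name and the input values are hypothetical, and the per-spot P values are assumed to have been computed beforehand by a t-test between the tested strain and USDA110.

```python
def call_presence(ratios, p_values, ratio_cutoff=0.5, alpha=0.05):
    """Return 0 ('missing') or 1 ('present') for each array spot.

    A spot is called missing only when both conditions hold: the signal
    ratio is below ratio_cutoff and the difference from USDA110 is
    significant at level alpha.
    """
    if len(ratios) != len(p_values):
        raise ValueError("ratios and p_values must have the same length")
    return [
        0 if (ratio < ratio_cutoff and p < alpha) else 1
        for ratio, p in zip(ratios, p_values)
    ]


# Hypothetical signal ratios and t-test P values for four spots.
example_ratios = [1.0, 0.3, 0.3, 0.45]
example_p_values = [0.5, 0.01, 0.2, 0.04]
calls = call_presence(example_ratios, example_p_values)
```

Requiring both conditions means that a low but noisy ratio (the third spot) is still scored as present, which keeps the cross-hybridization background mentioned above from producing spurious missing calls.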

Figure 5

Correlation and expression analysis of genome structure and symbiotic nitrogen fixation phenotype among Bradyrhizobium japonicum strains. (a) Missing genomic regions in nine strains of B. japonicum, as determined from CGH profiles. Black lines indicate missing regions (signal ratio <0.5; P<0.05) on the genome in tested strains. Arrowheads indicate positions of the GIs; asterisks are as explained in Figure 1. (b) Correlation analysis between the existence of certain genomic regions and symbiosis parameters. The symbiosis parameters are total nitrogen per plant (TN), fixed nitrogen per plant (FN), nitrogen content of leaves (NCL), nitrogen content of stem (NCS), nitrogen content of roots (NCR), nitrogen content in nodules (NCN), total dry weight of plant (TDW) and nodule dry weight per plant (NDW). Black lines indicate significant positive correlations between each array probe and the symbiosis parameter. (c) Expression in profiles A and D in bacteroids and free-living cells of B. japonicum USDA110. Dotted line shows average relative expression throughout all the genomic regions. CGH, comparative genomic hybridization; GI, genomic island.

When the missing regions determined for the nine B. japonicum strains were aligned according to USDA110 genome position, strains USDA124, USDA6, T7 and T9, belonging to genome type 6, appeared to lack trnM1, trnK4 and a region in the vicinity of the 4-Mb position (Figure 5a). In other words, these regions appear to be conserved among strains of genome types 110 and 122.

Strain variations of symbiotic phenotype in B. japonicum

The above results strongly indicate that B. japonicum USDA110 has acquired many DNA fragments, such as GIs, in its genome (Figures 1 and 3). When we surveyed the possible functions of genes on GIs and other variable regions of the USDA110 genome, there were many candidate genes (approximately 2138), 57% of which had no known function. This situation was similar to the CGH results for S. meliloti (Giuntini et al., 2005). These results raise the question of how the acquired foreign regions function in B. japonicum USDA110. Strain USDA110 of B. japonicum is highly adapted to modern soybean cultivars and often has superior symbiotic nitrogen-fixation capability (Israel et al., 1986; Basit et al., 1991). These facts prompted us to examine whether the variable regions containing GIs enhance symbiotic nitrogen fixation in soybeans.

When soybean plants were inoculated with the nine strains used in this study and cultivated in a greenhouse from seed for 56 days, we observed strain differences in TN, FN and total dry weight (TDW) of the inoculated soybean cultivar Enrei (Table 2). USDA110 showed the highest mean values in TN, FN and TDW as compared with other B. japonicum strains. Statistical analysis indicated that these indexes of USDA110 were significantly higher than those of USDA122, NC4, USDA124, USDA6, T7 and T9 (genome types 122 and 6), although NK2 had high errors (s.d.) in TN, FN and TDW values. On the other hand, T7 and T9 (genome type 6) showed markedly low values of TN and FN, which were statistically significant as compared with USDA110 and NC6 (genome type 110).

Table 2 Total nitrogen content, fixed nitrogen, total dry weight and nodule weight in soybean plants inoculated with various Bradyrhizobium japonicum strains

Correlation of region absence and symbiotic phenotype

Using data on the nine strains of B. japonicum, we calculated, for each of the 3960 array probes, the correlation coefficient between the presence of the corresponding genomic region (X=0 or 1; regions scored as missing, X=0, are drawn as lines in Figure 5a) and each of the eight parameters of the inoculated host plants (Figure 5b): TN, FN, TDW, nodule dry weight and the nitrogen contents of leaves, stem, roots and nodules. Where significant positive correlations (P<0.01) were detected, lines were drawn at the corresponding positions on the USDA110 genome (Figure 5b).
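The per-probe correlation described above can be sketched as follows. The text does not state which correlation coefficient was used, so this example assumes the Pearson coefficient (the one computed by Excel's CORREL function); the function names and all values are hypothetical, and the significance test at P<0.01 is not reproduced here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient, or None if either variable is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A region present (or missing) in every strain carries no information.
        return None
    return sxy / math.sqrt(sxx * syy)


def correlate_probes(presence_by_probe, phenotype):
    """Correlate each probe's 0/1 presence vector with one plant parameter."""
    return [pearson_r(presence, phenotype) for presence in presence_by_probe]


# Hypothetical presence (1) / absence (0) calls for two probes in nine strains.
presence_by_probe = [
    [1, 1, 0, 0, 0, 0, 0, 0, 0],  # region found in only two strains
    [1, 1, 1, 1, 1, 1, 1, 1, 1],  # region conserved in all strains
]
# Hypothetical plant parameter (for example TN), one value per strain.
phenotype = [6.5, 6.5, 3.5, 3.5, 3.5, 3.5, 3.5, 2.0, 2.0]
r_values = correlate_probes(presence_by_probe, phenotype)
```

Probes for regions conserved in all nine strains return `None` rather than a coefficient, which reflects why only the variable regions can show up as lines in a plot such as Figure 5b.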

The positions where significant correlations were detected generally corresponded to the missing regions in strains USDA122, NC4, NK2, USDA124, USDA6, T7 and T9 belonging to genome types 122 and 6 (Figures 5a and b). Correlations were found for indexes of nitrogen fixation (TN and FN), plant growth (TDW) and the nitrogen contents of leaf, stem and root (Figure 5b). On the other hand, no correlation was observed for nodule nitrogen content and only weak correlations were observed for nodule dry weight (Figure 5b). In particular, many missing regions (Figure 5a) were strongly correlated with indexes of nitrogen fixation (TN and FN) and TDW (Figure 5b). When similar patterns of the missing regions (Figure 5a) and the positive correlation with the indexes (Figure 5b) were sorted irrespective of the probe positions on the USDA110 genome, four major variable regions appeared; we designated them as profiles A, B, C and D (Figure 5b below, Supplementary Figure 1).

Profile A on the USDA110 genome was completely missing in genome type 122 (strains USDA122, NC4 and NK2) and genome type 6 (strains USDA6, USDA124, T7 and T9). A high rate of occurrence of GIs was found in profile A: 10 of the 14 confirmed GIs on the USDA110 genome (71%) were concentrated in profile A. Profile D had missing regions only in strains T7 and T9. Profiles C and D did not include GIs.

Multiple regression analysis

Linear models were used to conduct a multiple regression analysis of the correlations between the four profiles and the parameters measured in the inoculated soybeans. For TN, the model that included all the profiles, ‘A+B+C+D’, had the minimum Akaike information criterion (Akaike, 1973) among the 16 models and was selected as the best model (Table 3). This model explained about 95% of the total variation observed among the nine strains (that is, R²=0.951). Among the four profiles, A and D had highly significant (that is, P<0.01) positive effects on this trait. The coefficients of profiles B and C were not significant by the t-test. For FN, ‘A+D’ was selected as the best model (Table 3). Both profiles A and D had highly significant (that is, P<0.01) positive effects on FN. For TDW, ‘A+B’ was selected as the best model (Table 3). Profile A had highly significant (that is, P<0.01) positive effects on TDW.

Table 3 Regression analyses with various models

Gene expression in variable regions under symbiotic conditions

The above results suggested that symbiotic nitrogen fixation is increased by the genes present in profiles A and D. It is reasonable to expect that the genes highly expressed in nodule bacteroids would be the genes in profiles A and D that enhance nitrogen fixation. Transcriptome analysis indicated that several parts of profiles A and D were selectively expressed in bacteroids and free-living cells of USDA110 (Figure 5c). Comparison of our expression data with microarray results (Chang et al., 2007) revealed that blr6420 (one of three pobA homologs; 4-hydroxybenzoate hydroxylase) in GI trnK2 and bll6377 (putative acyl-CoA dehydrogenase) and blr6378 (transcriptional regulatory protein LacI family) in trnK3 were symbiotically upregulated genes with known functions.

CGH profiles of non-Bj strains

The patterns of the CGH profiles of non-Bj strains were similar to each other (Figure 1), although the washing conditions for non-Bj strains were slightly different from those for B. japonicum strains. In particular, the low hybridization signals on the symbiosis island (Figure 1) suggest that the symbiosis island was missing in non-Bj strains, with the exception of B. elkanii USDA76, a soybean endosymbiont. Indeed, no symbiosis island containing nif/fix and nod genes has been found on the genomes of Rhodopseudomonas palustris (Larimer et al., 2004) or Bradyrhizobium sp. BTAi1 and ORS278 (Giraud et al., 2007), although BTAi1 and ORS278 are stem-nodulating and nitrogen-fixing bacteria (Giraud et al., 2007).

When we examined the relatedness of the core genomic regions outside the symbiosis island by principal component analysis, non-Bj strains, except for B. elkanii USDA76, formed a compact cluster (Figure 2b). In contrast, B. japonicum USDA110, R. palustris CGA009 and Bradyrhizobium sp. BTAi1 and ORS278 differed from each other in terms of genome synteny (Supplementary Figure 2A). Therefore, CGH profiles based on the B. japonicum USDA110 array were not useful for evaluating the genomes of the non-Bj strains, probably because of their overall low level of synteny with the USDA110 genome.

Discussion

Horizontal gene transfer is an evolutionary phenomenon that involves the occurrence of genetic exchanges between different evolutionary lineages (Jordan and Koonin, 2004). We found that the GIs on the genome of B. japonicum USDA110 were variable genome regions that were not conserved universally on the genomes of the Bradyrhizobiaceae (Figure 1). We demonstrated that at least seven GIs (trnR1, trnF1, trnQ1, trnS1, trnK1, trnP1 and trnR2) on the USDA110 genome were absent from the strains of genome type 122 within B. japonicum (Figure 4). The trnK3-, trnK2- and trnR3-missing regions were commonly observed among B. japonicum strains of genome types 122 and 6 (Figures 1 and 5a), although their boundary sequences could not always be determined by our PCR strategy. CGH profile analysis suggested that the absence of GIs on the genome types 122 and 6 in B. japonicum extends to non-Bj strains (Figure 1). So far, the GI sequences have proven specific for B. japonicum USDA110 by our BLAST search (http://www.ncbi.nlm.nih.gov/blast) of data on the genomes of 10 strains of the family Bradyrhizobiaceae (Gupta and Mok, 2007), although we found partial homologous sequences within the GIs in Bradyrhizobium sp. BTAi1 and ORS278, R. palustris CGA009, BisA53, BisB18, BisB5 and HaA2, and Rhizobium leguminosarum bv. viciae pRL plasmids.

These lines of evidence strongly suggest that the GIs were horizontally inserted into the ancestral genome of genome type 110. As the CGH profiles on the USDA110 symbiosis island in B. japonicum USDA124, USDA6, T7 and T9 (genome type 6) were more conserved than those in B. elkanii USDA76 and non-Bj strains (Figure 1), the ancestor of B. japonicum might have diverged into two lineages, 110–122 and 6, after the acquisition of a symbiosis island for soybean associations (asterisk in Figure 3b). The B. japonicum strains of genome type 6 have diverged in terms of phylogeny (Figures 3a and b) and CGH profiles (Figures 1 and 2a), and they yielded no PCR products with USDA110 primers (Figure 4). Thus, the genome information of USDA6 and a corresponding array system would also be required to cover all soybean-associated B. japonicum strains for precise CGH analysis.

From an examination of the ITS sequence phylogeny (Figure 3b) and the distribution of missing regions among B. japonicum strains (Figures 1 and 5a), we can speculate that there were two steps of GI acquisition by the ancestral genome: (1) first, two GIs, trnK4 and trnM1, were acquired before the divergence of strains of genome types 110 and 122 (white arrowhead in Figure 3b); (2) then trnR1, trnF1, trnQ1, trnS1, trnP1, trnR2, trnK3, trnK2 and trnR3 were acquired as GIs after the divergence of strains of genome types 110 and 122 (black arrowhead in Figure 3b). Although we do not know whether the truncated GI trnI2 (Figure 4a) was generated through partial deletion or insertion, GIs might be highly dynamic entities on B. japonicum genomes.
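The two-step reasoning above can be sketched as a simple rule applied to GI presence or absence in genome types 110 and 122. The snippet below is a minimal illustration only: the presence values are read from the interpretation given in the text for a subset of GIs, and the dictionary layout and the function name acquisition_step are hypothetical.

```python
# Presence (True) or absence (False) of a subset of GIs in genome types
# 110 and 122, as implied by the discussion text (illustrative only).
gi_presence = {
    "trnK4": {"110": True, "122": True},
    "trnM1": {"110": True, "122": True},
    "trnR1": {"110": True, "122": False},
    "trnF1": {"110": True, "122": False},
    "trnQ1": {"110": True, "122": False},
}

def acquisition_step(presence):
    """Assign a GI to an acquisition step under a simple parsimony assumption."""
    if presence["110"] and presence["122"]:
        # Shared by both genome types: acquired by their common ancestor.
        return "before 110/122 divergence"
    if presence["110"]:
        # Found only in genome type 110: acquired after the split.
        return "after 110/122 divergence"
    return "undetermined"

for gi, presence in gi_presence.items():
    print(gi, "->", acquisition_step(presence))
```

This rule assumes each GI was gained once and never lost, so it cannot distinguish a late gain in genome type 110 from an early gain followed by loss in genome type 122.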

The recent completion of genome-sequencing projects in several rhizobial and bradyrhizobial species enables us to apply global postgenomic approaches to these bacteria, including the analysis of transcriptomes, systematic mutant libraries, ORFeomes, proteomes, transportomes (Mauchline et al., 2006) and metabolomes, toward the further development of functional genomics (MacLean et al., 2007). In this study, we aimed to reveal bradyrhizobial genomic regions relevant to symbiotic nitrogen fixation by using statistical model analysis to correlate variable genomic regions with symbiotic phenotypes of the host plants. The multiple regression model analysis showed that the variable regions of profiles A and D were significantly correlated with symbiotic nitrogen fixation (Figure 5b, Table 3).
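To illustrate the general idea of regressing a symbiotic phenotype on the presence of variable genomic regions, the following Python sketch fits an ordinary least-squares model with NumPy. It is not the model used in this study: the strains, the two profile indicators, the phenotype values and the function name fit_profiles are all hypothetical, and no significance testing is performed.

```python
import numpy as np

def fit_profiles(presence, phenotype):
    """Ordinary least squares of a phenotype on profile presence (0/1) columns."""
    X = np.asarray(presence, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    # Prepend an intercept column to the indicator matrix.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return coef, 1.0 - ss_res / ss_tot

# Hypothetical data: 6 strains, presence (1) or absence (0) of two profiles.
presence = [[1, 1], [1, 0], [0, 1], [0, 0], [1, 1], [0, 0]]
# Made-up phenotype constructed as 2 + 3*profile_1 + 1*profile_2.
phenotype = [6.0, 5.0, 3.0, 2.0, 6.0, 2.0]
coef, r2 = fit_profiles(presence, phenotype)
print("intercept, profile_1, profile_2:", np.round(coef, 3))
print("R^2:", round(r2, 3))
```

A real analysis would additionally need significance tests for each coefficient and some control for the phylogenetic relatedness of the strains, neither of which this sketch attempts.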

Profile A included 10 of the 14 GIs on the USDA110 genome (Figure 5c). The horizontal transfer of complete genes raises the important issue of how the expression of these genes is regulated in the new host. H-NS protein, a pleiotropic repressor of transcription, has a key role in selectively silencing the transcription of large numbers of horizontally acquired AT-rich genes, including pathogenicity islands (Dorman, 2007). It was recently found that Rho termination is required to suppress similar silencing (Cardinale et al., 2008). If horizontally transferred genes function to enhance symbiotic nitrogen fixation, we can expect them to be expressed or upregulated in bacteroids. Our expression data suggest that, overall, the expression of genes in profiles A and D in bacteroids is lower than the average expression level, but some regions are upregulated in symbiosis (Figure 5c). Therefore, one possible approach to verifying the involvement of profiles A and D in symbiotic nitrogen fixation is to examine the symbiotic phenotypes of B. japonicum USDA110 mutants lacking these expressed regions. An alternative approach is fine mapping of these profiles by using many strains of B. japonicum that share similar genomic backgrounds within genome types 110 and 122.

Traditionally, efforts have been made to improve the ability of rhizobial inoculant strains to promote symbiotic nitrogen fixation. The best-characterized example of this is the uptake hydrogenase (Hup) system, which takes up hydrogen generated by nitrogenase and enhances the energy efficiency of nitrogen fixation (Maier and Triplett, 1996; Baginsky et al., 2005). As genes for the Hup system were included in GI trnM in profile B (Figures 5a and b), we first expected to detect this region by regression model analysis. However, profiles A and D, rather than profile B, were identified as the correlated variable genomic regions, suggesting the existence of novel systems other than the Hup system that enhance symbiotic nitrogen fixation.

In this study, soybean cultivar Enrei, a modern Japanese soybean cultivar, was used consistently for the determination of symbiotic phenotypes (Table 3) and for expression analysis in bacteroids (Figure 5). However, symbiotic effectiveness for nodulation and nitrogen fixation is sometimes dependent on host legume genotype (Israel et al., 1986; Nautiyal et al., 1988; Ishizuka et al., 1991; Sadowsky et al., 1991). Thus, the soybean genotype should be taken into account when comparing our results with those of others.

Although rhizobial postgenome studies have provided valuable insights into rhizobial–legume symbioses, marked limitations remain within these types of studies (MacLean et al., 2007). On the basis of the diversity of E. coli O157 (Ishii and Sadowsky, 2008), Manning et al. (2008) analyzed the association between SNP clades and severe disease, and they suggested that the presence of a particular clade lineage was a critical determinant of severe disease. Little is known about the factors in symbiotic bacteria that contribute to variation in symbiotic phenotypes. Our approach combining CGH, symbiotic phenotype and expression analyses should open a new window on how rhizobia have developed efficient systems for symbiotic nitrogen fixation.

Recently, a model system of nodC- and nec1-targeted monitoring was developed to diagnose rhizobial function (Bontemps et al., 2005) and disease control (Koyama et al., 2007). Our CGH profiles can be used to examine overall genotypic diversity, including the hup and nos genotypes of B. japonicum field isolates. In addition, the genomic phylogeny and relatedness of field isolates could be evaluated within B. japonicum (Figure 2). We therefore expect that this method will be applicable to the diagnosis of field populations of B. japonicum for soybean production and environmental conservation.