Introduction

Purple bacteria are anoxygenic phototrophic bacteria that comprise purple sulfur bacteria (PSB) and purple non-sulfur bacteria (PNSB). They are widely distributed in natural environments and have versatile metabolic potential. Purple bacteria also serve as models for clarifying the biogeochemistry of C, N, S and Fe during the evolution of the Earth1. YL28, a member of the PSB, was isolated from a mangrove ecosystem2. It synthesizes abundant rhodopin (rather than spirilloxanthin) as its carotenoid component under anaerobic conditions in the light. It can also use reduced sulfur compounds, nitrogen compounds or molecular hydrogen as electron donors2. Moreover, YL28 utilizes ammonium, nitrite or nitrate as the sole nitrogen source for phototrophic growth. Notably, it was found to completely remove nitrite (up to 200 mg/L)3,4 and to perform simultaneous heterotrophic nitrification and denitrification under anaerobic conditions5.

Nitrogen is one of the most abundant elements on Earth. It makes up the majority of the Earth’s atmosphere and is one of the primary nutrients6. The nitrogen cycle is the most complex biogeochemical cycle in the biosphere7,8. However, it has been drastically disrupted by overexploitation and modern agricultural activities9. Nearly half of the nitrogen reaches the coastal ocean via river input and/or atmospheric deposition10. This leads to extensive eutrophication of waters and coastal zones and increases inventories of potent greenhouse gases such as nitrous oxide. Elevated concentrations of NH3-N and NO2-N, the major pollutants, are problematic in aquatic ecosystems11.

Up to 10 nitrogen cycle pathways have been reported9. For example, purple bacteria have three nitrogen reduction pathways, in which nitrogen compounds are reduced (nitrogen fixation, nitrate assimilation and denitrification), and one nitrogen oxidation pathway (nitrite oxidation to nitrate, with nitrite as the electron donor for photosynthesis)12,13,14,15. However, neither nitrification nor the dissimilatory nitrate reduction to ammonium (DNRA) pathway has yet been reported in purple bacteria. Denitrification and assimilatory nitrate reduction have been reported only in PNSB and are unknown in PSB16. Currently, the molecular mechanisms of the nitrogen cycle in purple bacteria remain unclear17,18,19. To elucidate them, we sequenced the genome of YL28 as a representative strain and compared its nitrogen cycle genes with those of 35 other sequenced purple bacteria. The genes involved in sulfur metabolism, salt tolerance and the stress response were also investigated in detail to gain insight into the survival mechanisms of YL28 in highly selective environments.

Results

Genome features and phylogenetic inference

The general features of the genomes of YL28 and the other 35 sequenced purple bacteria are presented in Table 1. Removal of short contigs and sequences with potential contamination resulted in 120 contigs with an N50 of 93,809 bp. The largest contig was 176,489 bp. The assembled genome comprised 3.8 million nucleotides with a GC content of 68.84%. Gene annotation was carried out with the NCBI Prokaryotic Genome Annotation Pipeline (PGAP). A total of 3,361 genes gave a coding capacity of 3.3 million nucleotides (genes/genome, 86.4%). There were at least 69 RNA sequences, including 56 tRNAs, 9 rRNAs and 4 RNAs of unknown function. Among the protein-encoding genes, 1,752 could be assigned putative functions, while 1,612 were predicted to encode hypothetical proteins. At least 1,671 proteins were assigned to 25 functional categories and 368 subsystems using SEED subsystems in RAST (Supplementary Fig. SF1). Forty genes involved in nitrogen metabolism were found (Supplementary Fig. SF1). Compared with its relative M. purpuratum 984, YL28 has more genes for nitrogen metabolism, photosynthesis and sulfur metabolism (Supplementary Fig. SF2, Tables 2 and 3).
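For readers who wish to reproduce assembly statistics of this kind, the following is a minimal Python sketch of how an N50 and a GC content can be computed from a set of contigs. It is not the pipeline used in this study; the function names and the toy inputs are hypothetical and serve only as illustration.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contigs supplied")


def gc_content(sequence):
    """Return the GC content (in percent) of a nucleotide sequence,
    ignoring ambiguous bases such as N."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


# Toy example (hypothetical contig lengths, not the YL28 assembly).
toy_lengths = [80, 70, 50, 40, 30, 20]
toy_n50 = n50(toy_lengths)
toy_gc = gc_content("GGCCAT")
```

Applied to the lengths of the 120 filtered contigs and to their concatenated sequence, functions of this kind would be expected to yield the N50 and GC content reported above, provided the same filtering is used.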

Table 1 The genome characteristics of purple bacteria used in this study.
Table 2 The key enzymes responsible for different nitrogen metabolism in purple bacteria.
Table 3 The key enzymes responsible for different sulfur metabolism in purple sulfur bacteria (PSB).

No plasmid sequences were discovered in YL28. However, at least one plasmid was found in Rhodovulum spp., Rhodobacter spp., Allochromatium spp., R. rubrum ATCC 11170, T. mobilis 8321 and R. palustris CGA009 (Table 1). Moreover, most of the Rhodobacter strains possess up to two chromosomes. The genome size of the purple bacteria ranges from 2.6 to 5.4 Mb. Except for T. mobilis 8321 and T. violascens DSM 198, the genomes of PSB are smaller than those of PNSB (Table 1). Overall, most PSB genomes are smaller than 4.0 Mb, with the smallest one (2.68 Mb) in Halorhodospira halophila SL1. Except for Blastochloris, PNSB genomes are generally larger than 4.0 Mb; among them, Rhodoplanes sp. Z2-YC6860 has the largest genome (8.19 Mb). Collectively, a high GC content (>55%) is often observed in purple bacteria.

Phylogenetic inference

The phylogenetic analysis based on 16S rDNA sequences (Fig. 1A) showed that PSB and PNSB derived from a common ancestor and formed two clades, which is consistent with previous studies20. Twenty-eight PNSB strains clustered into three subdivisions: the first consisted of Rhodopseudomonas, Blastochloris, Rhodoplanes and Rhodomicrobium vannielii ATCC 17100; the second included Rhodobacter (freshwater species) and Rhodovulum (marine species); the third contained Rhodospirillum and Rhodocista. Seven PSB species clustered together in the same clade. Both the whole-genome and the core-genome trees (Fig. 1B,C) showed that some PSB species were included in PNSB. Interestingly, Rubrivivax gelatinosus IL144 (a member of the PNSB) was the closest phylogenetic neighbor of the PSB based on both the 16S rDNA sequence and the core genome.

Figure 1

Phylogenetic analyses of PSB and PNSB. (A) Neighbor-joining (NJ) tree of the 16S rRNA genes of the 36 purple bacterial strains, constructed with MEGA 6.06; (B) Mauve guide tree of the 35 strains based on whole-genome similarity at the nucleotide level, generated with the multiple genome comparison tool Mauve; (C) NJ tree of 77 conserved proteins shared among the 36 strains, constructed with BPGA.

Gene repertoire of PSB

The power trend line has not reached a plateau (Fig. 2A), demonstrating that PSB display an open accessory genome. The core genome analysis of PSB shows that the number of shared genes decreases as input genomes are added and is predicted to converge to about 550 (Fig. 2A). The singleton development plot indicates that up to 1,594 new genes are expected with each newly added genome (Fig. 2B). The core genome of the seven PSB species contains at least 539 CDSs per genome, so the current core genome represents the PSB quite well. YL28 and M. purpuratum 984 share at least 2,869 genes (data not shown); only 241 and 210 genes occur uniquely in YL28 and 984, respectively. The number of genes shared between YL28 and each of the other six PSB species ranges from 732 to 2,869 (data not shown).
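The notion of a power trend line that does not plateau can be made concrete with a small sketch. Assuming the pan-genome size follows a power law P(n) = a·n^b in the number of genomes n (the form commonly used for pan-genome curves; the exact fitting procedure applied to Fig. 2A is not restated here), a and b can be estimated by ordinary least squares on log-transformed values. The function name and the input numbers below are hypothetical and are not the data of this study.

```python
import math


def fit_power_law(n_genomes, pan_sizes):
    """Fit pan_size = a * n**b by linear least squares in log-log space.
    Returns the tuple (a, b)."""
    xs = [math.log(n) for n in n_genomes]
    ys = [math.log(p) for p in pan_sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                        # slope = power-law exponent
    a = math.exp(mean_y - b * mean_x)    # intercept back-transformed
    return a, b


# Hypothetical pan-genome sizes for 1..7 genomes (illustration only).
genomes = [1, 2, 3, 4, 5, 6, 7]
sizes = [3000 * n ** 0.5 for n in genomes]
a, b = fit_power_law(genomes, sizes)
```

Under this parameterization, an exponent b between 0 and 1 is conventionally read as a pan-genome that keeps growing with each added genome, whereas b close to 0 would indicate that the curve is approaching a plateau.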

Figure 2

Pan, core and singleton genome evolution according to the number of selected purple bacteria genomes. (A) Number of genes (core genome and Pan-genome) for the selected purple bacteria genomes sequentially added. (B) Number of unique genes for the selected purple bacteria genomes sequentially added.

Nitrogen cycle genes in YL28 and purple bacteria

YL28 possesses six nitrogen cycle pathways (Fig. 3 and Table 2). Genes encoding NirS and NorBC (the key enzymes for nitrite reduction) exist in YL28. These two enzymes possibly help to transform toxic nitrite into the less cytotoxic N2O by denitrification5. The genes encoding NasAB and NirA (the key enzymes for assimilatory nitrite reduction) were also detected in YL28, suggesting that it has a complete nitrate reduction ability. YL28 has the genes encoding Nrf, which is critical in fermentative nitrate reduction (DNRA). However, the DNRA pathway in YL28 may be incomplete because the full complement of enzymes is lacking. Genes related to ammonium assimilation and ammonification were detected in all purple bacteria. The other four nitrogen cycle pathways vary among the examined strains (Table 2). The presence of genes encoding the key NifDKH enzymes in all purple bacteria shows that this group of bacteria has the potential for nitrogen fixation. Interestingly, the alternative nitrogenase (AnfG) from Methanosarcina was found only in PNSB and not in PSB21.

Figure 3

The model of nitrogen metabolism in M. gracile YL28. There are six nitrogen metabolic pathways in YL28, whose key enzymes were confirmed by CDART and CDD. (A) Ammonification pathway; (B) ammonium assimilation pathway; (C) fermentative nitrate reduction; (D) denitrification; (E) assimilatory nitrate reduction; (F) nitrogen fixation.

Sulfur metabolism genes in YL28

YL28 has genes involved in at least three sulfur metabolism pathways: oxidation of sulfide, reversed dissimilatory sulfite reduction and the Sox system (Table 3). The sqr gene and the dsr gene cluster (dsrABEFHCMKLJOPNR) encode the key enzymes for the oxidation of sulfide and the reversed dissimilatory sulfite reduction. These processes oxidize toxic sulfide into S0 or sulfate20. In addition, the sox gene clusters in YL28 suggest that thiosulfate is possibly converted into S0 or sulfate by a truncated Sox system. Moreover, the presence of the aprAB and sat genes (encoding adenylylsulfate reductase and sulfate adenylyltransferase, respectively) suggests that YL28 possibly possesses an alternative sulfite oxidation pathway that converts toxic sulfite into sulfate.

Halo-tolerance

YL28 possesses a gene cluster involved in salt-alkali tolerance (nhaABCDEFG, encoding a Na+/H+ antiporter). Moreover, a novel putative Na+/H+ antiporter gene (duf 2062) exists in YL28. The unique stress response subsystems in YL28 are listed in Table 4. There are six unique proteins participating in choline and betaine uptake and betaine biosynthesis (betaine aldehyde dehydrogenase, choline dehydrogenase), synthesis of osmoregulated periplasmic glucans (phosphoglycerol transferase I, cation channel protein) and heavy metal resistance (DNA-binding heavy metal response regulator, heavy metal sensor histidine kinase).

Table 4 The unique stress response proteins in YL28.

Genomic islands (GIs) of YL28

At least 9 GIs were identified in YL28 (Fig. 4A,B and Supplemental Table ST1) by both the IslandPick and IslandPath-DIMOB methods. The GI size ranges from 8 to 31.3 kb (Supplemental Table ST1). These GIs were annotated as carrying mobile element proteins, transposases, phage structural proteins, integration host factor, proteins involved in the CRISPR system, a 5-methylcytosine-specific restriction related enzyme, cephalosporin hydroxylase, flagellum synthesis components (CheABR), carbohydrate/nitrogen metabolism proteins (phenylalanyl-tRNA metabolism, threonyl-tRNA biosynthesis, amylomaltase, asparagine synthetase, glycosyl transferase, aspartate aminotransferase), and DNA replication and proofreading systems (DNA recombination and repair protein RecF, type II restriction enzyme, ATP-dependent endonuclease of the OLD family, ATP-dependent DNA helicase PcrA, DNA processing chain A) (Fig. 4A,B and Supplemental Table ST1).

Figure 4

Whole-genome comparisons in three genera of purple bacteria. The color intensity in each ring represents the BLAST match identity. (A) Whole-genome comparison of all the strains considered in this work. (B) Whole-genome comparison of strains of purple sulfur bacteria and T118, IL144; from inner to outer ring: M. purpuratum 984, A. vinosum DSM 180, T. violascens DSM 198, T. mobilis 8321, Ectothiorhodospira sp. BSL-9, H. halophila SL1, R. gelatinosus IL144, R. ferrireducens T118. (C) Whole-genome comparison of 12 strains of purple non-sulfur bacteria; from inner to outer ring: R. palustris BisA53, BisB18, BisB5, DX-1, HaA2, TIE-1, B. viridis, B. viridis ATCC19567, B. viridis DSM133, Rhodoplanes sp. Z2-YC6860, R. vannielii ATCC 17100. (D) Whole-genome comparison of 11 strains of the two genera Rhodobacter and Rhodovulum in purple non-sulfur bacteria; from inner to outer ring: R. capsulatus SB 1003, Rhodobacter sp. LPB0142, R. sphaeroides ATCC 17025, R. sphaeroides ATCC 17029, R. sphaeroides KD131, R. sphaeroides MBTLJ-13, R. sphaeroides MBTLJ-8, R. sulfidophilum DSM1374, R. sulfidophilum DSM2351, R. sulfidophilum SNK001. (E) Whole-genome comparison in Rhodobacter; from inner to outer ring: R. rubrum F11, R. centenum SW, P. photometricum DSM 122.

Among these predicted GIs, GI-IX is the largest (31.3 kb) in YL28. Notably, GI-IX has many genes related to DNA metabolism, including those encoding DNA recombination and repair protein RecF, ATP-dependent endonuclease of the OLD family, ATP-dependent DNA helicase PcrA, ATP-dependent DNA helicase RecQ, DNA processing chain A and a type II restriction enzyme. It also contains genes encoding diverse enzymes such as aspartate aminotransferase, methylase and ATP-dependent protease. The second largest GI (GI-II) carries a chemoreceptor gene cluster including cheA, cheB, cheR and a gene encoding the methyl-accepting chemotaxis protein. Moreover, GI-II has a predicted two-component hybrid sensor and regulator, 4-alpha-glucanotransferase (amylomaltase) and a mobile element protein (Supplemental Table ST1). Immediately downstream of GI-II, there are genes involved in DNA metabolism, such as those encoding 5-methylcytosine-specific restriction related enzyme, DNA helicase, phosphatase 2C homolog and 5-methylcytosine-specific restriction related enzyme. All other GIs are smaller than GI-IX.

There are five GIs in Rhodopseudomonas, Blastochloris and Rhodoplanes (Supplemental Table ST2). These GIs carry nitrogen-fixing genes (the nitrogenase gene and the alternative nitrogenase gene) and genes for LysR family transcriptional regulators, glycosyltransferase family proteins and arsenic resistance (ArsH). There is an alternative nitrogenase (AnfG) in a GI region of R. palustris, which replaces nitrogenase (NifDKH) for nitrogen fixation22. At least 10 GIs are predicted in Rhodobacter (Supplemental Table ST3). Among these GIs, there are many genes involved in nitrogen metabolism, including a nitrogen-fixing island (nitrogenase genes, oxidoreductase/nitrogenase component), a nitrogen-rich23 island (nitrate/sulfonate/bicarbonate ABC transporter, nitrogen regulatory protein P-II), and a solid-island24 (metallophosphoesterase). Moreover, some GIs were found to have several genes that are involved in sulfur metabolism, flagellar biosynthesis and phage infection, or that encode BadM/Rrf2 family transcriptional regulators. Rhodobacter possibly acquired nitrogen fixation and sulfur metabolism by horizontal gene transfer (HGT). At least 11 GIs were predicted in the genus Rhodospirillum (Supplemental Table ST4). Among those GIs, Rhodospirillum has gene elements related to CRISPR, sulfide metabolism, arsenic resistance, and DNA and ribosomal protein synthesis, allowing this bacterium to survive in toxic environments containing certain concentrations of arsenic and sulfide.

Synteny analysis

Good collinearity was shown within the PSB by synteny plot analysis (Supplementary Fig. SF3, with YL28 as the reference genome). YL28, T. violascens DSM 198, A. vinosum DSM 180, T. mobilis 8321 and M. purpuratum 984 are more closely related to one another than to Ectothiorhodospira sp. BSL-9 and H. halophila SL1. When R. palustris CGA009, R. sphaeroides 2.4.1 or R. rubrum ATCC 11170 was selected as the reference sequence, PNSB members showed diverse collinearity relationships (Supplementary Figs SF4–SF6). They fall into three groups: the first consists of the four genera Rhodopseudomonas, Blastochloris, Rhodoplanes and Rhodomicrobium (Fig. SF4); the second consists of the two genera Rhodobacter and Rhodovulum (Fig. SF5); the third contains only Rhodospirillum (except R. centenum SW) (Fig. SF6). Interestingly, good collinearity was found between R. ferrireducens T118 (a non-phototrophic member) and PSB. In contrast, relatively poor collinearity was found among PSB, R. gelatinosus IL144 and R. centenum SW. Collectively, the synteny pattern is consistent with the whole-genome phylogenetic tree (Fig. 1B). The poorest collinearity was shown between YL28 and R. gelatinosus IL144.

Discussion

Mangroves lie at the interface between land and sea in tropical and subtropical latitudes25. PSB and PNSB are predominantly detected in mangrove ecosystems. The two bacterial groups contribute significantly to the primary productivity of coastal seaboards and to the food-web dynamics of various tropical coastline ecosystems26,27. High salinity, limited nutrients and sulfur-rich (high-sulfate) conditions often occur in this niche25. However, it remains unclear how mangrove-associated microorganisms survive in mangrove ecosystems and contribute to host physiology. In particular, the nitrogen utilization mechanisms of PSB and PNSB have not been well studied. In this study, we sequenced the genome of YL28, investigated its nitrogen cycle pathways and compared them with those of other purple bacteria. Our study will help to elucidate bacterial survival mechanisms in the special mangrove ecosystem.

YL28 utilizes ammonium, nitrite or nitrate as the sole nitrogen source for phototrophic growth3,4,5. Our results indicate that its strong ability to utilize nitrate and nitrite may be due to its six nitrogen metabolic pathways. Ammonium is the preferred nitrogen source for all species of purple bacteria28. YL28 may grow on ammonium via the ammonium assimilation pathway and convert organic nitrogen into ammonium by ammonification. Moreover, in the absence of ammonium, YL28 utilizes N2 as a nitrogen source to support cell growth by nitrogen fixation. Interestingly, YL28 grew chemoheterotrophically using nitrate and/or nitrite as the electron acceptor. The denitrification, fermentative nitrate reduction and/or assimilatory nitrate reduction pathways may contribute to this process under anoxic conditions. These diverse nitrogen utilization pathways allow YL28 to grow on different nitrogen compounds. YL28 can convert nitrite (a toxic compound in mangrove ecosystems) into non-toxic or low-toxicity products by the fermentative nitrate reduction or assimilatory nitrate reduction pathways. Denitrification may also contribute to this process5.

Our bioinformatic analyses further show that purple bacteria have 6 nitrogen cycle pathways (Table 2). The alternative nitrogenase genes are frequently observed in purple bacteria. The genes encoding two nitrogenase subunits are highly conserved in many phyla of bacteria and archaea, suggesting that nitrogen fixation genes evolved once and subsequently spread by vertical inheritance or by HGT9. Nitrate/nitrite reduction-related genes were found only in the accessory genomes. This observation agrees with previous studies showing that nitrate and nitrite are not preferred nitrogen sources29. The partial denitrification pathway has been widely found in PNSB30,31. To the best of our knowledge, our results are the first to demonstrate that the partial denitrification pathway exists in PSB, suggesting that some PSB possibly also play important roles in the nitrogen cycle, as PNSB such as Rhodopseudomonas and Rhodobacter do. The partial denitrification pathway helps to remove excess reducing power and mitigates the toxicity of certain nitrogen oxide intermediates20. Fermentative nitrate reduction is poorly known in purple bacteria, and assimilatory nitrate reduction has been reported only in one member of the PNSB (Rhodobacter capsulatus E1F1). However, our results show that fermentative nitrate reduction (DNRA) possibly occurs in purple bacteria and that assimilatory nitrate reduction occurs in PSB.

The use of reduced sulfur compounds as electron donors for anoxygenic photosynthesis has been found in all groups of purple bacteria (PSB and PNSB). Remarkably, PSB utilize reduced sulfur compounds at higher concentrations than PNSB and accumulate elemental sulfur inside the cells20,32,33. Our previous studies showed that YL28 is capable of utilizing diverse reduced sulfur compounds and depositing sulfur granules inside the cells. It demonstrated a higher tolerance to sulfide (3.6 mM) than PNSB (0.5–2 mM)1,2. YL28 has diverse sulfur metabolism-related genes such as sqr, dsr, sox, apr and sat. The oxidation of sulfide and the reversed dissimilatory sulfite reduction (key enzyme genes sqr and dsr) possibly contribute to the detoxification of sulfide. Thiosulfate may be converted into S0 or sulfate by the truncated Sox system. Moreover, an alternative pathway (key genes apr and sat) also allows the detoxification of sulfite. However, sulfite may not be directly converted into sulfate via a two-electron transfer because purple bacteria lack the sulfite:cytochrome c oxidoreductase33. Without soxC/soxD, the sulfane sulfur atom cannot be subsequently oxidized34. However, YL28 has soxD. Whether soxD enables complete sulfur oxidation in YL28 remains to be investigated. Some genes involved in classical sulfite metabolism were not detected in most purple bacteria34 (except for some freshwater PSB species such as A. vinosum DSM180)35. However, YL28 has both the sulfide:quinone reductase gene (sqr), with which it could directly oxidize S2− to S0, and the APS reductase gene (apr). This suggests that YL28 may oxidize sulfite (SO32−) to sulfate (SO42−) and thereby reduce the toxicity of sulfite to the cells. The sulfite reductase gene (dsr) was observed in all of the selected PSB but not in PNSB. The dsr gene may therefore be a useful taxonomic or systematic marker for PSB phylogeny.

Our analyses of the 16S rDNA, core genome and whole genome sequences revealed a distinctive difference in halo-tolerance traits between freshwater and salt-dependent species. The salt-dependent or salt-requiring species also differed from one another in halo-tolerance. For example, YL28 possesses unique stress response genes (betA, duf 2062, mrp, glucan and heavy metal response genes, etc.). These genes possibly contribute to its tolerance of high-salinity environments36,37.

The genome size and the number and size of GIs in PSB were smaller than those in PNSB, implying that PSB have a more flexible response to environmental change. A gene transfer agent (GTA) is an unusual bacteriophage-like element that transfers random fragments of host genomic DNA (4–14 kb in size) between closely related bacteria38,39,40. Genes involved in photosynthesis may be horizontally transferred within the same phylum by GTA41. Our results revealed that GTA gene clusters, including ICEs (integrative and conjugative elements, which are self-transmissible mobile genetic elements), were present in all examined genomes. PNSB may acquire more genes by HGT to survive in their niches. The flagellum biosynthesis and methyl-accepting chemotaxis genes (such as cheA, cheB and cheR) were found in the larger GIs, implying that purple bacteria possibly employ a complex set of chemosensory pathways to swim towards carbon and nitrogen sources, light and/or oxygen42.

Conclusion

This study provides novel insight into the mechanisms of the diverse nitrogen cycle, habitat specificity and toxic nitrite utilization in YL28 and purple bacteria (including PSB and PNSB). Purple bacteria possess 6 nitrogen cycle pathways. Denitrification, complete assimilatory nitrate reduction and fermentative nitrate reduction were demonstrated in PSB for the first time. Fermentative nitrate reduction possibly occurs widely in purple bacteria. YL28 has a good ability to utilize toxic nitrite, which is possibly linked to the combination of three nitrogen cycle pathways (denitrification, fermentative nitrate reduction and complete assimilatory nitrate reduction). Collectively, the genes involved in diverse nitrogen cycles (6 pathways), sulfur cycles (3 pathways), unique salt-alkali tolerance, stress response and other traits contribute to bacterial adaptation to the mangrove habitat.

Materials and Methods

DNA extraction, genome sequencing and annotation

YL28 was isolated from an intertidal sediment of an inshore mangrove in Fujian, China2. Genomic DNA of YL28 was extracted using a TIANamp Bacterial DNA kit following the protocol recommended by the manufacturer. Genome sequencing was performed by Shanghai Majorbio Bio-Pharm Technology Co. (China) using the Illumina HiSeq2000 sequencer system with a 500 bp paired-end library. The reads were assembled using SOAPdenovo v2.04. Putative protein-encoding genes were identified using Glimmer43 (version 3.02). Annotation was performed by BLAST+ 2.2.24 searches against databases including those of the National Center for Biotechnology Information (NCBI), Clusters of Orthologous Groups of Proteins (COG), the Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO). The genome sequence of M. gracile YL28 was deposited in the GenBank database44 under the accession number LSYU00000000.

Acquisition and re-annotation of the selected genome sequences

The genome sequences of the other 35 sequenced purple bacteria, all with an assembly level of complete, were obtained from the NCBI database. To avoid possible deviations due to different annotation methods, we used the Rapid Annotation using Subsystem Technology (RAST) server to re-annotate45 the selected genomes. The Glimmer algorithm was used for gene calling.

Phylogeny

Three different methods were used to construct the phylogenetic trees. The 16S rDNA sequences of the selected microorganisms were first used to infer phylogenies using the Neighbor-Joining (NJ) method in MEGA46 (version 6.06). The core genomes of the selected genomes were next clustered by USEARCH, and the phylogenetic trees of the core and accessory genomes were constructed by the Neighbor-Joining algorithm using BPGA47 (version 1.3). The 36 genome sequences were aligned by progressive MAUVE to generate a phylogenetic guide tree48.

Comparisons of conserved and variable regions

The genome of YL28 was used as the reference sequence to align the genomes of all 35 other purple bacteria. Similarly, the genome of R. palustris CGA009 was selected as the reference to align those of Rhodopseudomonas, Blastochloris and Rhodoplanes; the genome of R. sphaeroides 2.4.1 was used as the reference to align the genomes of the two genera Rhodobacter and Rhodovulum; and R. rubrum ATCC11170 was selected as the reference to align the genomes of the genus Rhodospirillum. The multiple whole-genome alignment was conducted using the progressive alignment algorithm implemented in MAUVE. Synteny plots were generated by aligning the regions of the predicted and reference genomes that differed, using default parameters. All regions were aligned and displayed in MAUVE48 (version 2.3.1). The circular genomic map was generated by the BLAST Ring Image Generator (BRIG, version 0.95)49 by aligning against the reference genome with a local BLAST+ installation, using standard parameters (lower and upper identity cut-offs of 50% and 70%, and an E-value of 10). The ring color gradients correspond to varying degrees of identity of the BLAST matches. The circular genomic maps also include information on GC skew and GC content.
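The GC content and GC skew tracks mentioned above can be illustrated with a short sketch. It assumes the usual definition of GC skew, (G − C)/(G + C), computed in fixed-size windows along the sequence; the window size, step and function name are hypothetical choices for illustration and are not the settings used to draw Fig. 4.

```python
def gc_windows(sequence, window, step):
    """Slide a window along the sequence and return a list of
    (start, gc_fraction, gc_skew) tuples, one per full window."""
    seq = sequence.upper()
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        gc_fraction = (g + c) / window
        # GC skew is undefined when a window has no G or C; report 0.0 then.
        gc_skew = (g - c) / (g + c) if (g + c) else 0.0
        results.append((start, gc_fraction, gc_skew))
    return results


# Toy sequence (hypothetical), two non-overlapping windows of 4 bases.
tracks = gc_windows("GGGCATAT", window=4, step=4)
```

On a real chromosome, windows of several kilobases are typical, and the sign change of the skew along the sequence is often used as a rough indicator of the replication origin and terminus.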

Comparative analyses of core genome, accessory genome and unique genome

To depict the core and accessory genomes in each genus, a reciprocal best hit search was performed using the BPGA software47. Orthologous clusters (OCs) were assigned by grouping all protein sequences in the 36 genomes using USEARCH based on their sequence similarity (E-value < 10^-5, >50% coverage). A series of built-in scripts was used to (i) parse the results, (ii) upload them to the MySQL relational database, (iii) perform a reciprocal best hit analysis to form pairs of sequences, and (iv) normalize the E-values for all the pairs formed. E-values were normalized by averaging over all recent orthologs (in-paralogs) and dividing the value for each ortholog pair by that average. The pan-core plot gives box plots and dot plots of the core and pan-genome generated from a chosen number of unique combinations of genomes. The atypical GC analysis gives the sequences of core, accessory and unique genes with atypical (extreme) GC content. The COG and KEGG distributions of the core, accessory and unique gene families were calculated based on representative sequences. Keywords were used to query the orthologous families for nitrogen metabolism functions and to count the matches of those functions, using custom bash commands.
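The two central ideas of this workflow, reciprocal best hits and the partition of gene families into core, accessory and unique sets, can be sketched in a few lines of Python. This is an illustrative simplification and not the BPGA or USEARCH implementation: it ranks hits by a generic similarity score rather than normalized E-values, and all identifiers (function names, gene labels, cluster names) are hypothetical.

```python
def reciprocal_best_hits(hits_ab, hits_ba):
    """Each hit is (query, subject, score). Return sorted (a, b) pairs in
    which b is the best hit of a and a is the best hit of b."""
    def best(hits):
        top = {}
        for query, subject, score in hits:
            if query not in top or score > top[query][1]:
                top[query] = (subject, score)
        return {query: subject for query, (subject, _) in top.items()}

    best_ab = best(hits_ab)
    best_ba = best(hits_ba)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


def classify_clusters(clusters, n_genomes):
    """clusters maps a cluster id to the set of genomes it occurs in.
    Core = present in all genomes, unique = present in one, else accessory."""
    labels = {}
    for cluster_id, genomes in clusters.items():
        if len(genomes) == n_genomes:
            labels[cluster_id] = "core"
        elif len(genomes) == 1:
            labels[cluster_id] = "unique"
        else:
            labels[cluster_id] = "accessory"
    return labels


# Toy data (hypothetical gene and cluster names).
pairs = reciprocal_best_hits(
    [("a1", "b1", 200.0), ("a1", "b2", 150.0), ("a2", "b2", 90.0)],
    [("b1", "a1", 198.0), ("b2", "a1", 140.0)],
)
labels = classify_clusters(
    {"c1": {"YL28", "984", "DSM180"}, "c2": {"YL28"}, "c3": {"YL28", "984"}},
    n_genomes=3,
)
```

In this toy example only the pair a1/b1 is reciprocal, because the best hit of b2 is a1 rather than a2; the three clusters are labelled core, unique and accessory, respectively.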

Genomic island prediction

Genomic islands (GIs) were predicted using IslandViewer50 (version 3), which includes the IslandPick, IslandPath-DIMOB and SIGI-HMM methods.

Nitrogen metabolism, sulfur metabolism and stress response analysis

We chose three RAST subsystems to further analyze the nitrogen metabolism, sulfur metabolism and stress response pathways. The key protein sequences51,52,53 responsible for nitrogen metabolism and sulfur metabolism were used as reference sequences to search against the 36 genomes of purple bacteria by BlastP. The sequences obtained by BLAST were further evaluated by a combination of KEGG, RAST, CDART (the conserved domain architecture retrieval tool)54 and CDD (the conserved domain database)55.