Comparative genome analysis reveals the molecular basis of nicotine degradation and survival capacities of Arthrobacter

Arthrobacter is one of the most prevalent genera of nicotine-degrading bacteria; however, studies of nicotine degradation in Arthrobacter species remain at the plasmid level (plasmid pAO1). Here, we report the bioinformatic analysis of a nicotine-degrading Arthrobacter aurescens M2012083, and show that the moeB and mogA genes that are essential for nicotine degradation in Arthrobacter are absent from plasmid pAO1. Homologues of all the nicotine degradation-related genes of plasmid pAO1 were found to be located on a 68,622-bp DNA segment (nic segment-1) in the M2012083 genome, showing 98.1% nucleotide acid sequence identity to the 69,252-bp nic segment of plasmid pAO1. However, the rest sequence of plasmid pAO1 other than the nic segment shows no significant similarity to the genome sequence of strain M2012083. Taken together, our data suggest that the nicotine degradation-related genes of strain M2012083 are located on the chromosome or a plasmid other than pAO1. Based on the genomic sequence comparison of strain M2012083 and six other Arthrobacter strains, we have identified 17 σ70 transcription factors reported to be involved in stress responses and 109 genes involved in environmental adaptability of strain M2012083. These results reveal the molecular basis of nicotine degradation and survival capacities of Arthrobacter species.

The complete genome sequences of four other Arthrobacter species including three soil strains, Arthrobacter sp. FB24 (FB24), Arthrobacter chlorophenolicus A6 (A6), and Arthrobacter phenanthrenivorans Sphe3 (Sphe3), and a food-processing strain, Arthrobacter arilaitensis Re117 (Re117), were reported. Strains FB24, A6 and Sphe3 were isolated by their ability to degrade dimethylbenzene, 4-chlorophenol, and phenanthrene, respectively. Genome analyses revealed many metabolic and stress-related genes in these environmental strains, which reflected their pollutantdegrading abilities and niche specializations. Strain Re117 was isolated from the surface of Reblochon cheese; its genetic adaptation to the habitat via genes such as those involved in iron acquisition, salt tolerance, and catabolism of fatty acids, amino acids, and lactic acid, which are major carbon substrates present at the cheese surface, was also revealed by genome sequence analysis 32,[34][35][36] .
Although nicotine is a typical N-heterocyclic pollutant and Arthrobacter is one of the most prevalent genera of nicotine-degrading bacteria, studies of nicotine degradation by Arthrobacter species remain at the plasmid level, while the complete genome of another prevalent bacterium, Pseudomonas putida S16, has been sequenced and extensively studied [37][38][39][40][41][42][43] . A catabolic plasmid of 165-kb isolated from Arthrobacter nicotinovorans (formerly classified as Arthrobacter oxydans) was sequenced, and almost all the genes involved in nicotine degradation by the bacterium were shown to locate on it 37,38,44 . Two related genes that were absent from the plasmid are required for molybdenum cofactor (MoCo) biosynthesis; one encodes MoeB, which is essential for ATP-dependent activation of the MoaD subunit of molybdopterin (MPT) synthase; the other encodes a MoaB/MogA homolog, which is probably required for MPT-adenylate formation 38,45 . These genes must locate on the chromosome, and genome sequence analysis will help us to establish this. Furthermore, the survival capacity of nicotine-degrading Arthrobacter strains, which is very important to their application potential, is still unknown. Therefore, it is necessary to obtain and analyze the genome sequence of a nicotine-degrading Arthrobacter strain in order to uncover the molecular basis of its survival capacity.
Many catabolic plasmids are thought to take part in the spread of catabolic traits by horizontal gene transfer between bacteria 46-48 . Recently, Mihasan and Brandsch proposed that the nicotine-degrading genes of A. nicotinovorans were possibly obtained via plasmid pAO1 or via a precursor plasmid from the chromosome of a strain that was probably related to Rhodococcus opacus or another Arthrobacter species. However, Arthrobacter species harboring nicotine-degrading genes on the chromosome have not yet been reported 39 .
Recently, we isolated a new nicotine-degrading strain, Arthrobacter sp. M2012083 (M2012083), from tobacco waste, and obtained and published its genome sequence, but without further analysis 49 . In the present study, we classified this strain as Arthrobacter aurescen by physiological and biochemical identification, 16S rDNA phylogenetic analysis, and genomic analysis; we compared the genomic sequence of strain M2012083 with the six available Arthrobacter complete genomes and the sequence of nicotine-degrading plasmid pAO1, in order to provide a better understanding of the genetic basis of the nicotine-degradation and survival capacities of this genus.

Results
Classification of strain M2012083. The nicotine-degrading strain M2012083 was isolated from tobacco waste and deposited in the China Center for Type Culture Collection (CCTCC; collection number M2012083). Strain M2012083 is gram-positive, aerobic, and asporous. Transmission electron microscopic observation revealed that the cells of strain M2012083 were long rods in their lag phase (6 h) and became coccus-shaped during the log phase (36 h) in nicotine-containing medium. These morphological characteristics are representative of the genus Arthrobacter, but were also detected in members of other genera such as Brevibacterium and Rhodococcus 50 . The results of carbon source oxidation tests, enzyme activity tests, and carbon source utilization tests of strain M2012083 are shown in Table S1, Table S2, and Table  S3, respectively. The 16S rDNA sequence of strain M2012083 was sequenced and submitted to GenBank (accession number KF893303). Nucleotide BLAST searches of the GenBank database and Ribosomal Database Project library (http://blast.ncbi.nlm.nih.gov/Blast.cgi) indicated that the 16S rDNA sequence of strain M2012083 exhibited more than 99% nucleotide acid sequence identities with that of TC1 and Rue61a. We selected 55 16S rDNA sequences from different strains in order to construct a phylogenetic tree by using MEAG5 analysis with the neighbor-joining method 51,52 ; strain M2012083 is closest in phylogenetic tree to type strains of A. aurescens and A. nitroguajacolicus (Fig. 1). Furthermore, genome comparison using the RAST Prokaryotic Genome Annotation Server showed that the genomic sequence of M2012083 was more similar to the complete genome of TC1 (comparison score: 546) than to that of Rue61a (comparison score: 524) (Fig. 2). Based on the above results, strain M2012083 was classified as A. aurescens.
General genome features and comparative genomics. The general features of the seven subject Arthrobacter genomes are summarized in Table 1. Pan-genome analyses, such as an overview of genomic conserved regions, orthologous group categorization, and orthologous relationship analysis, were performed in order to identify conserved and strain-specific CDSs. BLASTatlas tool was used to compare the genomes of the seven Arthrobacter strains and to provide a quick overview of the conserved genomic regions 53 (Fig. 3). The genome of TC1 was used as a reference because it is most similar to the genome of M2012083 and has been thoroughly studied 18,27 (Fig. 1, 2). For the seven different Arthrobacter genomes, 68-79% of protein-coding genes were functionally categorized based on the Clusters of Orthologous Groups (COGs) database 54 (Fig. 4).
Orthologous relationship analysis was performed using the OrthoMCL method 55 . All protein-coding genes of the seven subject strains were clustered into a total of 5,368 orthologous groups with 1,653 conserved groups, while the protein-coding genes of M2012083 were clustered into a total of 3,774 groups (DataSet1, Fig. 5). The strain-specific CDSs of each of the seven Arthrobacter strains, which have no homologous genes in the other six strains, are listed in DataSet2, along with their ORF IDs and annotations.
Survival capacity of Arthrobacter strains. Genome analysis and comparisons showed that more than 100 ORFs related to survival capacity occurred in all subject Arthrobacter strains except Re117, which contained only 74 ORFs (Table 2). ORFs involved in survival capacity are listed in detail for all subject Arthrobacter strains in DataSet4.

Discussion
The assembled genome of M2012083 had approximately 329-fold sequence coverage 49 , with 4,312 protein-coding genes. The genome size of the seven subject Arthrobacter strains ranges from 3.8 to 5.3 Mbp. The smallest genome is that of Re117, which is the only food-processing bacterium among the subject strains, resulting in the fewest protein-coding genes (  www.nature.com/scientificreports the different genomic G 1 C content of the seven studied Arthrobacter strains can provide data on the links between G 1 C content, genetic relationship, and function. For example, N-heterocyclic pollutant-degrading strains M2012083, TC1 and Rue61a, which are close on the phylogenetic tree, show similar G 1 C content, while strains A6 and Re117, which are far away from M2012083 on the phylogenetic tree, show the highest and the lowest G 1 C content, respectively (Table 1, Fig. 1). In the genome of M2012083, there are 1,694 conserved CDSs, most of which are related to cell cycle control, translation, and nucleotide transport and metabolism; conserved CDSs involved in carbohydrate transport and metabolism are the fewest, indicating that the kinds of carbohydrate that the seven subject Arthrobacter strains utilize are very different (Table 1, Fig. 3, DataSet1). In addition, there are 434 strain-specific CDSs in M2012083, including most of the nicotine-degrading genes ( Table 1, Fig. 3, DataSet2). COG comparisons among the seven Arthrobacter genomes showed that the number of genes involved in carbohydrate transport and metabolism (COG G) was highest in the genomes of the six environmental Arthrobacter strains, FB24, TC1, A6, Sphe3, Rue61a, and M2012083, while genes related to replication, recombination, and repair (COG L) were the most numerous in the Re117 genome, in which the percentage of COG G proteins (6.5%) was much lower than that in the other subject genomes (.11.8%); this is probably because strain Re117 locates on the surface of cheese, which is rich in proteins rather than carbohydrates. Furthermore, the numbers of genes categorized as COG C, K, R, S, and T in the Re117 genome are also much lower than those in other subject genomes, which indicates that surviving on the surface of cheese requires less functions related to energy transformation, transcription, or signal transduction. The COGs have highly similar distributions among the six environmental Arthrobacter strains, except for COGs L and N. Genes with COG L annotations, representing replication, recombination, and repair, are more numerous in Sphe3 than in the other five strains. There are 15 genes annotated as COG N in A6, which is far more than that in other strains (only 0 or 1), suggesting that all the subject Arthrobacter strains lacked cell motility, except strain A6. Genes annotated as COG B, the group related to chromatin structure and dynamics, were not found in any of the seven subject genomes (Fig. 4). Comparisons between the M2012083 genome and each of the other six subject genomes indicated that 3,575 orthologous groups in M2012083 were conserved in Rue61a, while 199 groups were specific to M2012083 compared with Rue61a. In contrast, M2012083 and Re117 only share 2,004 conserved groups, and 1,770 groups in M2012083 are specific to M2012083 compared with Re117. There were 3,369 conserved groups among the genomes of the N-heterocyclic pollutant-degrading bacteria, M2012083, TC1, and Rue61a, including 356 groups that were not found in the other four subject genomes. Among the six environmental Arthrobacter strains, M2012083, FB24, TC1, A6, Sphe3, and Rue61a, 2,323 conserved groups were identified, including 670 groups that were not found in the food-processing strain, Re117 (Fig. 5).
All the nicotine degradation-related genes, such as those encoding nicotine dehydrogenase subunits (ndhL, ndhM, and ndhS), 6hydroxy-L-nicotine oxidase (6hlno), 6-hydroxy-D-nicotine oxidase (6hdno), and ketone dehydrogenase subunits (kdhL, kdhM, and  kdhS), and those involved in c-N-methylaminobutyrate catabolism (mgaba), MoCo biosynthesis (moco), assembly and quality control of the (abc) 2 holoenzyme complexes (aqc), compound transport (perm1 and perm2), and gene regulation (tr1, tr2, and tr3), were located on a 68,622-bp DNA segment (nic segment-1) of contig 058 in the M2012083 genome, which showed 98.1% nucleotide acid sequence identity to the 69,252-bp nic segment of pAO1 (Fig. 6,  DataSet3). Furthermore, the only nicotine degradation-related ORFs absent from plasmid pAO1, the two which encode the MoeB protein and a MogA homolog that are essential for MoCo biosynthesis were also identified in the M2012083 genome (1197706.3. peg.1697 and 1197706.3.peg.1089). However, the xerD integrase gene and the IS1473 insertion sequence that contains the resolvase gene are absent from the nic segment-1 of M2012083, while the sequence of the Tn554 transposon is incomplete within it (Fig. 6,  DataSet3). Plasmid pAO1 sequences other than the nic segment, which contain all the plasmid-function genes, show no significant similarity to the M2012083 genome sequences (Fig. 6). Most of the  pAO1 ORFs involved in plasmid function, such as the ORFs involved in replication (CDS_28 and CDS_120), partitioning (PAR_1 and CDS_9), maintenance (CDS_21 and CDS_22), conjugation (CDS_5, CDS_6, CDS_83, and CDS_92), stress response (CDS_7), abortive infection phage resistance (CDS_118), cell-cell contact (CDS_18), and cell signaling (CDS_113), and many predicted membrane proteins that may participate in conjugation (CDS_8, CDS_14, CDS_30, CDS_31, CDS_84, CDS_85, CDS_86, CDS_87, CDS_88, CDS_89, CDS_90, CDS_91, and CDS_114), were not found in the M2012083 genome (DataSet3) 37,38 . These data suggest that the nicotine degradation-related genes of A. aurescens M2012083 are located on the chromosome or a plasmid other than pAO1. It has been reported that alternative s-factor induction is an important strategy for bacteria to deal with environmental stress 27     similar to that in the other two N-heterocyclic pollutant-degrading Arthrobacter strains (TC1 and Rue61a) and considerably more than that in strain Re117, which only encodes six ( Table 2). The expression of universal stress-related proteins (USPs) is induced in cells in response to heat shock, oxidant and UV exposure, carbon, nitrogen, and phosphate starvation, and entering stationary phase 60 . As in strain TC1, M2012083 contains eight potential USPs, far fewer than FB24 and Rue61a (15 and 13, respectively) but more than Re117 Figure 6 | Comparison among the genome of M2012083 and plasmid pAO1. The synteny plot of plasmid pAO1 versus M2012083 genome was generated by using MUMmer program. Red ringlets indicate the forward conserved genes between M2012083 genome and plasmid pAO1; blue ringlets indicate the reverse conserved genes between M2012083 genome and plasmid pAO1. The names, arrangements and functions of genes, and the nicotine-degrading pathway in Arthrobacter are shown according to Ganas P, Igloi GL and Brandsch R 38 . The rectangles in different colors represent gene clusters involved in carbohydrate catabolism (ch), plasmid function (plf), c-N-methylaminobutyrate catabolism (mgaba), assembly and quality control of the (abc) 2 holoenzyme complexes of NDH and KDH (aqc), nicotine degradation (nic1 and nic2), molybdenum cofactor biosynthesis (moco) and transposons (Tn554), and some insertion sequences (IS and IS1473). ORFs were indicated by arrows, and the hollow arrows represent hypothetical protein genes (hyp). See text and DataSet3 for details.  M2012083  35  17  8  35  29  21  4  12  109  FB24  29  13  15  33  31  18  3  11  111  TC1  34  17  8  32  30  19  3  12  104  A6  28  12  11  34  30  20  6  11  112  Re117  22  6  6  20  26  15  3  4  74  Sphe3  27  10  11  30  32  19  6  10  108  Rue61a  35  18  13  35  31  20  3 DataSet4). In addition, ORFs involved in osmotic and oxidative stress responses, cold-and heat-shock responses, detoxification, and carbon starvation are present, and distributed differently, among strain M2012083 and the other subject Arthrobacter strains ( Table 2, DataSet4). Trehalose, glycogen, and glycine betaine can help bacteria tolerate osmotic stress [61][62][63] . Many ORFs associated with the synthesis and catabolism of trehalose and glycogen were identified in the six environmental Arthrobacter strains, while no trehalose synthase or glycogen debranching ORFs were identified in strain Re117. These findings suggest that all the subject strains, except Re117, can synthesize or catabolize trehalose and glycogen in order to regulate intracellular osmotic pressure when necessary. Although they have different choline-intake abilities, all seven Arthrobacter strains can use exogenic choline as a substrate for synthesizing betaine in order to maintain osmotic balance. An operon that includes the choline dehydrogenase gene (betA) and betaine aldehyde dehydrogenase gene (betB), and a gene cluster that encodes the four subunits of sarcosine oxidase, are present in all seven Arthrobacter genomes. A second copy of betA was identified in FB24, TC1, and A6, while a second copy of betB is found only in FB24 and A6. Many glycine betaine transporter genes, such as clusters of the glycine betaine ABC transport system permease and opuD genes, were also present in all seven Arthrobacter genomes. Only strains FB24 and A6 contained both the choline binding protein A gene and high-affinity choline uptake protein (BetT) gene; only strains M2012083, TC1, Sphe3, and Rue61a contained the choline binding protein A gene, while only strain Re117 contained the betT gene. Moreover, in strain M2012083, osmotic stress may be reduced by the regulation of water movement, glycerol uptake, and osmoregulated periplasmic glucan synthesis, via aquaporin Z, the glycerol uptake facilitator protein, and opgC, respectively (DataSet4).
Three to five cold-shock protein cspA genes were found in environmental Arthrobacter strains (M2012083, FB24, TC1, A6, Sphe3, and Rue61a), while only one was found in the food-processing Arthrobacter strain, Re117. Every subject genome contained an ORF encoding cold-shock protein CspC, except the genome of Rue61a. In addition, all the Arthrobacter strains contained a dnaK operon (encoding chaperone proteins DnaJ and DnaK, and heat shock protein GrpE), an hspR gene encoding a transcriptional repressor of the dnaK operon, a heat-inducible transcription repressor hrcA gene, and several other genes encoding heat-shock response proteins. Interestingly, an additional one or two dnaJ, dnak, and grpE genes were identified in the M2012083 genome, but these genes were not located in an operon; similar data were observed in strain Rue61a (DataSet4).
Selenium is essential for living organisms, at low concentration, and occurs extensively in the Earth's crust. However, in aerobic environments, selenium occurs as selenate and selenite, which are toxic and mutagenic at excess concentration. The dedA gene is reported to be very important for the regulation of selenite uptake in a high selenite-resistant bacterium, Ralstonia metallidurans CH34 65 . Three dedA genes were identified in strain M2012083 and two in FB24, TC1, A6, Sphe3, and Rue61a, while only one dedA gene was identified in Re117. However, only the ATP-binding component of the selenite uptake transporter is present in all the subject strains, while the periplasmic substrate-binding and permease components of the ABC transporter exist only in strains A6 and Sphe3 (DataSet4). D-Amino acids may be toxic to growing cells because they may be mistakenly attached to tRNAs. D-Tyrosyl-tRNA Tyr deacylase can remove misattached D-amino acids from tRNAs, thus detoxifying D-amino acids 66 . A Re117-specific D-tyrosyl-tRNA Tyr deacylase ORF was detected, which suggests that Re117 is capable of detoxifying D-amino acids, while other subject Arthrobacter strains are not (DataSet4).
In addition to s 70 factors, s B and RpoS family members, which are alternative s factors for RNA polymerase transcription, play major roles in the general stress response and environmental survival of bacteria. s B activity is inhibited by anti-s B factor RsbW, while it is stimulated by serine phosphatase RsbU and anti-s B factor antagonist RsbV 67 . Strains M2012083, FB24, TC1, A6, and Rue61a contain genes that encode RNA polymerase s factor SigB, RsbU, and RsbV, while strain Sphe3 contains genes that encode SigB and RsbU. In contrast to other subject Arthrobacter strains, strain Re117 does not contain any genes involved in s B stress-response regulation. The rsbW gene was not detected in any of the subject strains (DataSet4).
During the response to carbon starvation, rpoS gene expression is inhibited by RpsA, which is a starvation-sensing protein 68 . The rpsA gene and a carbon starvation protein A gene were identified in strain M2012083 and the other environmental subject strains. In contrast, genes involved in the carbon starvation response were not detected in strain Re117 (DataSet4). Expression of the rpoS gene is also regulated by homoserine lactones (HSLs) and a derivative thereof 68 . Genes encoding aspartate kinase, aspartate semialdehyde dehydrogenase, and homoserine dehydrogenase, which has been suggested catalyze HSL synthesis from L-asparate 27 , were identified in the M2012083 genome (1197706.3.peg.3479, 1197706.3.peg.1452, and 1197706.3.peg.2912, respectively) and the other six Arthrobacter genomes (DataSet1). Based on all the genome analyses, strain Re117 contains the fewest carbohydrate metabolism and stress-response genes of the seven subject Arthrobacter strains; this is probably because the habitat of Re117, the surface of cheese, is much more stable than soil.

Methods
Strain identification. The M2012083 strain was cultured at 30uC and 220 rpm in lysogenic broth or nicotine medium, as previously described 69 . Morphological characteristics were observed using a transmission electron microscope. Physiological and biochemical properties, such as the oxidation and utilization of different carbon sources, and the activities of different enzymes were determined by the CCTCC.
The 16S rDNA of strain M2012083 was amplified by PCR with the universal primer pair of 27F (59-AGAGTTTGATCCTGGCTCA-39) and 1492R (59-GGTTACCTTGTTACGACTT-39) from extracted genomic DNA. The PCR product was purified and sequenced, then analyzed by homology alignment using the BLAST search program (http://www.ncbi.nlm.nih.gov/BLAST.html). A phylogenetic tree was constructed by the neighbor-joining method using MEGA5 analysis 51,70 .
Genome sequencing and annotation. The genomic DNA of M2012083 was extracted using the Wizard Genomic DNA Purification Kit (Promega, Madison, WI, USA). Whole-genomes of strain M2012083 were sequenced by the Chinese National Human Genome Center (Shanghai, China). The Velvet program was used to assemble the pair-end reads de novo with manually optimized settings. The genome sequences of FB24, TC1, A6, Re117, Sphe3, and Rue61a were obtained from GenBank. The genomes of all these Arthrobacter strains were submitted to the RAST web service for automated annotation followed by manual checking. The annotations are available via a guest account at the RAST website. Information on COGs and from the Conserved Domain Database was also used for comparisons.

www.nature.com/scientificreports
Genome comparison and analysis. The National Institutes of Allergy and Infectious Diseases/Pathosystems Resource Integration Center (PATRIC) and RAST were used for genome comparisons and metabolic pathway analyses. The tBLASTn program was used to search for homologs of plasmid pAO1 protein-coding genes in the M2012083 genome with coding amino acid sequence identity . 30% at an e-value , 1e-5. The MUMmer program was used to generate a synteny plot of plasmid pAO1 versus the M2012083 genome. The OrthoMCL method was used to analyze the orthologous relationships between protein-coding genes in different genomes. Genes in different genomes with coding amino acid sequence identity . 50% at an e-value , 1e-5 were identified as orthologous genes. We used the CRISPRFinder program in CRISPR's web server to identify clustered regularly interspaced short palindromic repeats in the CRISPR/Cas RNA-guided nuclease system 71 . The genomic context was visualized using the Circos software.