Effect of over expressing protective antigen on global gene transcription in Bacillus anthracis BH500

Protective antigen (PA) of Bacillus anthracis is being considered as a vaccine candidate against anthrax and its production has been explored in several heterologous host systems. Since the systems tested introduced adverse issues such as inclusion body formation and endotoxin contamination, the production from B. anthracis is considered as a preferred method. The present study examines the effect of PA expression on the metabolism of B. anthracis producing strain, BH500, by comparing it with a control strain carrying an empty plasmid. The strains were grown in a bioreactor and RNA-seq analysis of the producing and non-producing strain was conducted. Among the observed differences, the strain expressing rPA had increased transcription of sigL, the gene encoding RNA polymerase σ54, sigB, the general stress transcription factor gene and its regulators rsbW and rsbV, as well as the global regulatory repressor ctsR. There were also decreased expression of intracellular heat stress related genes such as groL, groES, hslO, dnaJ, and dnaK and increased expression of extracellular chaperons csaA and prsA2. Also, major central metabolism genes belonging to TCA, glycolysis, PPP, and amino acids biosynthesis were up-regulated in the PA-producing strain during the lag phase and down-regulated in the log and late-log phases, which was associated with decreased specific growth rates. The information obtained from this study may guide genetic modification of B. anthracis to improve PA production.

profiling also showed that bottlenecks can develop in different pathways in E. coli depending on the type and behavior of the recombinant protein, e.g. interferon-β (inclusion body), xylanase (soluble), and GFP (soluble) 16 . Marciniak et al. showed that by using 8 different recombinant proteins that B. subtilis gene responses depended on both the origin of the proteins (endogenous vs. heterologous) and on their cellular localization (secreted, membrane, lipid anchored) 19 . At the present time, there is no information on gene expression patterns in B. anthracis expressing recombinant proteins at high cell density. Most transcriptome studies in B. anthracis have focused on networks involved in host-pathogen interactions 20,21 and metabolism 22 . The present study seeks to identify plausible bottlenecks restricting overexpression of PA protein by analyzing the whole genome transcriptional changes in producing and non-producing (control) recombinant BH500 strains grown in a bioreactor. Preliminary studies showed that genes present in the backbone of the empty pSW4 vector cause a significant decrease in the growth rate when compared to the untransformed BH500 strain. Therefore, to identify transcriptional changes caused specifically by PA expression, we compared strains containing either the pagA gene-containing pYS5 plasmid or the empty parental vector pSW4. Changes in gene expression were determined for bioreactor-grown cultures sampled in lag, log, and late-log phases. The differences seen in essential pathways required for protein expression including: central carbon metabolism, amino acid biosynthesis, transcription, translation, folding and secretion, were evaluated to identify plausible bottlenecks. The genes identified provide targets for genetic engineering to increase the effectiveness of B. anthracis strains as production hosts 11 .

Results
Growth of B. anthracis BH500 expressing and not expressing recombinant protective antigen (rPA). Growth parameters of two B. anthracis BH500 strains, one harboring plasmid pYS5 expressing PA and the other harboring control plasmid pSW4, are shown in Fig. 1(a). PA expressing and non-expressing cultures were grown without kanamycin selection pressure, where plasmid stability tests confirmed the culture purity and no generation of non-recombinants. Kanamycin was avoided since previous studies showed a decrease in the growth rate of cultures growing in the presence of kanamycin compared with cultures without kanamycin. The specific growth rates of the two cultures are seen in Fig. 1(b). The strain expressing PA reached a maximum of 0.8 h −1 and then declined as the culture OD 600 exceeded 10, whereas the control strain reached a maximum specific growth rate of 1 h −1 that started to decline as the culture reached OD 600 ~ 6. The highest PA expression at the end of the batch run was ~180 mg/L. The lag, log, and late-log growth phase samples of PA expressing, and non-expressing culture were processed with live/dead cell assay, which showed no significant difference and thus PA expression had no significant effect on cell viability.
Transcriptome analysis of the PA-producing and non-producing strains during lag, log, and late-log phases. Gene transcription analyses of the strain producing recombinant PA and the control strain at lag, log, and late-log growth phases were performed by quantifying transcript levels using RNA-seq (Table 1). RNA samples from lag, log, and late-log phase cultures were converted to cDNA and sequenced on the Illumina Hiseq 2500 platform. Triplicate data were obtained from the biological replicates of the three growth phases of PA expressing and non-expressing cultures. Average read quality was close to ~40% in all samples and fraction of no calls (%N) was 0% in all samples. The single end sequencing was done for 50 base pair amplifications and the total number of reads for each sample was above the acceptable range. The unaligned reads were then trimmed to remove the low-quality bases, identified by the probability that they are called incorrectly. All samples were ~97% uniquely aligned with the reference genome and seem to be of satisfactory quality. Principal component analysis (PCA) showed that the principal gene components in the biological triplicates of RNAseq data were distributed close to each other, representing a higher correlation ( Fig. 2(a)). PCA also represented a significant difference in the overall gene expression distribution among the biological triplicate sets of samples from PA expressing and non-expression cultures. By aligning transcripts with the reference B. anthracis Ames ancestor genome (AE017334.2), transcripts for 5507 genes were identified. Statistical analysis using ANOVA was applied   on the 5507 genes, which identified 3212 differentially expressed genes. By applying further filtering criteria of p value ≤ 0.05 and fold change (FC) cut-off <2>, 31 up-regulated and 43 down-regulated genes were identified in the lag phase of the PA-producing culture as compared to the control. The analysis identified 54 up-regulated and 47 down-regulated genes in log phase, 44 up-regulated, and 154 down-regulated genes in the late-log phase ( Fig. 2(b)). A small fraction (2.5-3%) of the transcripts did not align with the B. anthracis reference genome. Many of these transcripts aligned with the plasmid sequence and specifically with the neo, repB and bla genes. As expected, pagA transcripts (from rPA gene) were found only in the PA-producing culture. There was 1.36-fold increase in the number of pagA transcripts in the PA-producing culture from lag phase to log phase, which then decreased as the culture progressed to late-log phase.
Transcriptional changes specific to lag, log, or late-log phases. To characterize the distribution of genes expressed differentially in the three growth phases in the producing vs. the non-producing strains, Venn analysis was conducted, producing the result shown in Fig. 3. Twenty-four genes were differentially expressed in all three growth phases. Thirty-seven genes were differentially expressed only in the lag phase and 11 genes were shared between the lag and log phases. Only 2 genes were differentially expressed in the lag and the late-log phases, which clearly shows the distinct differences in gene transcription during the different phases. The log phase had a higher number of differentially expressed genes; there were 60 genes uniquely present in this set. The log phase gene set shared 11 and 6 genes with the lag and late-log phase sets, respectively. The late-log phase gene set was the largest of the three comparison sets where 165 genes expressed differentially, while just 2 and 6 differentially expressed genes were shared with the lag and log phases, respectively. The specific genes in each set, along with their log 2 fold change expression values, up and down-regulation state and their protein product are summerized in Table 2.
Over-representation of differentially expressed genes in various metabolic pathways. Pathway enrichment analysis on the differentially expressed genes from each growth phase was done to identify the effects of PA expression. Bar graph presentation of the important pathways containing differentially expressed genes is shown in Fig. 4, along with the enrichment scores (only pathways with scores >1 are shown). As the cultures progressed from lag to log and late-log, more pathways showed differentially expressed genes. Pyrimidine metabolism was among those most affected, since genes from this class (such as pyrC, pyrD, pyrE and pyrF) had high enrichment scores in the lag phase. These genes remained overrepresented in log as well as late-log phases together with the carB, which encodes a subunit of carbamoyl phosphate synthase and GBAA_3162 (5′-nucleotidase) respectively. Amino acid metabolism which is critical for cell growth was found to be up-regulated in the lag phase, likely to support increased requirements for amino acid for PA biosynthesis. Specifically, alanine, aspartate and glutamate metabolism, tryptophan metabolism, and phenylalanine, tyrosine and tryptophan biosynthesis genes, were overrepresented in the lag phase data. The down-regulation of the amino acid biosynthesis genes in log and late-log phase probably slowed down the overall amino acid metabolism. Lag phase data was mostly overrepresented with energy and central carbon metabolism pathways such as oxidative phosphorylation, glycolysis, and TCA. It is important to mention that in the brief period when the culture progressed from log phase to late-log phase, the number of pathways with enrichment score >1 increased more than three times: 198 genes were differentially expressed in the late-log phase (the largest number of genes compared with the other growth phases). The outcome was enrichment scores >1 in 28 different metabolic pathways (Fig. 4(c)) (The full data set  generation, inducing genes such as ctaD and ctaE for cytochrome c oxidase subunit I & III; and nuoD, nuoH, nuoI, nuoJ, nuoK, nuoL, nuoM, and nuoN encoding different subunits of NADH dehydrogenase; qseA and qseB coding menquinol-cytochrome c reductase iron sulfur subunit, and sdhC of succinate dehydrogenase. In addition, central carbon metabolism pathways were also overrepresented in the late-log phase. This included fbp2, GBAA_2222, GBAA_4896, pdhA, and pdhB genes from glycolysis pathways, acnA, GBAA_1862, pdhA, pdhB, pyc, sdhC, and sucC from TCA cycle, and dra, fbp2, and tkt1 from the pentose phosphate pathway. Quorum sensing, sulfur metabolism, and cysteine and methionine metabolism genes were also differentially expressed in late-log phase (Supplementary Table 1).

Discussion
As shown in the results section, the cells expressing PA grew more slowly than the control cells. This could be due to the diversion of resources to protein expression and/or interference with cell surface processes by high protein secretion. These stresses are expected to lead to changes in gene transcription. The changes identified by RNA-seq in pathways that play vital roles in cellular physiology are summarized here. Amino acid biosynthesis. Although the cells were growing in the presence of amino acids and short peptides, high regulation of amino acid biosynthesis genes such as ilvN (3.02), tyrA (1.83), and GBAA_2958 (1.90) was observed in the lag phase of the PA-expressing cells, probably to fulfil the requirement for precursor amino acids needed for PA expression. As the culture advanced into the log and late-log phase, amino acid biosynthesis slows down in the producing culture compared with the control (Fig. 6). The down-regulation of amino acid transcript limited the supply needed for cellular building block and PA synthesis, pointing towards possible bottleneck where the supply of molecules needed for PA translation is decreased. This was followed by declining rate of recombinant PA expression in log and late-log phase. The slowdown in amino acid biosynthesis with the decreased energy metabolism was associated with the decline in the specific growth rate. . But later, during the log and the late-log phases more than 95% of the tRNA transcripts were down regulated. Aminoacyl-tRNA synthetase GBAA_1424 (hisZ, ATP phosphoribosyltransferase) play a regulatory role and is an essential component of histidine biosynthesis 23 . The PA expressing culture showed up-regulation of GBAA_1424, (1.55 in lag, 1.85 in log and 2.52 in late log phases), an indication that this gene is essential for PA biosynthesis. Some RNA polymerase transcripts such as rpoA, rpoB, rpoC, and rpoE, were down-regulated throughout PA expression. In addition, a mechanism to keep transcription active was observed by up-regulation of the rpoZ transcript in the late-log phase, which facilitates the interaction between alpha and beta subunits of RNA polymerase enzyme assembly. During the lag and the log phases of the PA expressing culture, there was also up-regulation of sigL. This gene encodes the RNA polymerase factor, Sigma 54, that enhances expression of nitrogen assimilation and metabolism genes 24,25 that reflected in faster growth during lag to log phases. Another study has shown the existence of a catabolite responsive element (CRE) site in sigL where transcription factor CcpA (catabolite control protein) binds and blocks mRNA synthesis 26 . Similar regulation was observed in our data where sigL expression was higher because its transcriptional repressor ccpA was relatively less expressed in the PA producing culture. The general stress transcription factor, sigB, was expressed differentially in PA expressing culture. sigB together with its The large dot represents the average (mean) of all data values for subsystem genes belonging to a pathway while small dots represent a data value for an individual subsystem/gene) (generated using EcoCyc omics dashboard). Blue color represent lag phase (pYS5) vs lag phase (pSW4), orange color represent log phase (pYS5) vs log phase (pSW4) and gold color represents late-log phase (pYS5) vs late-log phase (pSW4).  27 . sigB was found to be induced in B. subtilis at the stationary phase while it remains unaffected in B. anthracis 28 . In this work sigB was up-regualted throughout the growth (lag 2.5, to log 3.0 and latelog 2.7) in the PA expressing strain, together with the regulators of sigma B operon, rsbV and rsbW. This suggests that sigB can also be induced, due to stress features associated with PA expression rather than just at stationary phase. In addition, many other stress related genes were up-regulated in the PA expressing strain such as htrA, dps, GBAA_1113, GBAA_0326, GBAA_5290 and GBAA_4875.

Glycolysis
Protein folding and protein synthesis associated processes. Protein expression from high copy number plasmids may generate large amounts of nascent polypeptide chains which can outpace the supply of protein folding resources. It was reported, that when recombinant proteins are expressed in Gram positive bacteria, the heat shock proteins (HSPs) were up-regulated 29 to provide chaperones to assist in protein folding. PA is transported across the cytoplasmic membrane unfolded and later folded by extracellular chaperones. We also observed that grpE, groEL/ES, and dnaK were down-regulated in PA expressing culture. It is possible that this down-regulation is mediated by HrcA repressor 30,31 . However, since hrcA was down-regulated, it cannot be responsible for the repression of groEL/ES and dnaK. Therefore, if up-regulated HSPs are an indicator of the stress, it is possible that the data presented in this work indicates that PA expression does not impose intracellular folding stress in the BH500 strain. This was strengthened with the finding that the clp genes (encoding Clp ATP dependent proteases) that have a role in stress survival 32 , were down-regulated in PA expressing culture. It is possible that the clp genes were down-regulated because of the regulatory repression of ctsR, stress and heat shock repressor belonging to class III heat shock response in Gram positive bacteria 33 . The ctsR gene was up-regulated in lag (4.03), log (4.53) and late-log (4.91) in PA expressing culture (Fig. 7).

Extracellular secretion. Secretion of expressed proteins is a characteristic of the Bacillus species 34,35 , so
it is not surprising that csaA, a key protein secretion chaperone, was up-regulated in the PA expressing strain throughout the growth. It was reported that csaA expression prevented in-vitro aggregation of thermally inactivated luciferase 36 by interaction with secreted precursor proteins 37 . Although expression of csaA in E. coli was shown to prevent growth and secretion issues 38 this was not the case in the B. anthracis. It is possible that in the Bacillus, this gene is responsible for maintaining efficient PA secretion and thereby prevents the impact of protein accumulation on cellular health and growth. Another protein secretion chaperone, PrsA, was reported to play a crucial role in cell viability, folding, and expression of alpha-amylases, and that an increased prsA copy number led to a six-fold increase in alpha-amylase secretion in B. subtilis 39 . There are three prsA homologues in B. anthracis as compared to just one copy of prsA genes in B. subtilis. The three homologues in B. anthracis have shown to have a different effect on the rPA yield and it was suggested that it might be because of the their different or overlapping substrate specificities 40 . It is possible that they have completely different functions and selecting the right homologues for manipulation could lead to significant improvement in the rPA yield. In the present study, prsA1 and prsA2 were up-regulated while prsA remained down-regulated in the PA producing culture. The expression trends of prsA1 and prsA2 were opposite; prsA1 was highly up-regulated in lag-phase (6.6), which later decreased to 1.6-fold in late-log phase, and prsA2 was less up-regulated (1.8) in the lag-phase, and then increased to 6.1 in the late-log phase (Fig. 7). The higher expression of homologue prsA2 during higher expression of rPA in the late log phase suggests its key role in increasing production level of properly folded protein, therefore this gene could be a candidate for further improving PA expression.
Transporter genes. Transporter genes that are responsible for nutrient intake and for product secretion were expressed differentially in the PA-expressing bacteria compared to the control strain (Supplementary Table 2). In the PA expressing bacteria, the heavy metal transporting ATPase GBAA_0595 involved in maintaining heavy metal homeostatis 41,42 was up-regulated until the late-log phase (2.6) while the GBAA_3859 and GBAA_0410 were down-regulated, indicating reduced cellular health. Glycine betaine/L-proline ABC transporter, permease and substrate-binding protein GBAA_2280 (1.5) remained up-regulated in the PA expressing strain during lag, log, and late-log phase, while glycine betaine transporter gene opuD1 (1.1), glycine betaine/L-proline ABC transporter, and ATP-binding protein proV1 (1.5) were only up-regulated in the late-log phase. The up-regulation of these transport membrane proteins supports the import of osmolytes like glycine betaine and proline from the culture medium, which assist in maintaining proper osmolarity that prevents misfolding or aggregation of expressed proteins 43,44 . Although PA is secreted, it is likely that its increased biosynthesis during late-log phase required more osmolarity adjustment, which was reflected in up-regulation of an additional number of osmolyte transport genes. In addition, the ABC transporter genes which facilitate import of small peptides by carrying out ATP hydrolysis 45,46 and transfer nutrients from culture broth and support growth of bacteria 47 were differentially expressed. For example, the ATP binding protein GBAA_0384, was up-regulated (>3 fold) in lag, log, and late-log phases of PA expressing culture compared with the control. At the same time, the expression level of amino acid ABC transporter substrate-binding protein GBAA_0855, amino acid ABC transporter permease GBAA_0856, and amino acid ABC transporter ATP-binding GBAA_0857 protein, were down regulated −4.37, −6.33, and −8.16 respectively. Also, the complete operon of oligopeptide ABC transporter permease that includes five genes: GBAA_0231, GBAA_0232, GBAA_0233, GBAA_0234, and GBAA_0235 was down-regulated. The decrease in expression of amino acid and oligopeptide ABC transporters indicates the decline in the culture capability to import nutrients, which directly affects the cellular health and recombinant PA protein expression.

Conclusion
The effect of protective antigen expression in Bacillus anthracis on the profile of the bacterial transcriptome is presented in this work. The expressing strain was compared with strain carrying empty plasmid at all three growth phases. This unique approach eliminated the background noise caused by antibiotics, expression of plasmid genes and natural growth phase transition. By using Venn diagram among the three growth phases of rPA expressing strain comparing it to the non-expressing strain, a set of 24 genes, mostly not well characterized, were found to be differentially expressed in all three growth phases. These genes were not differentially expressed in the control strain, representing their unique connection with expression of rPA (Fig. 3). Log to late-log phase transition was associated with decline in specific growth rate of the expression culture and down-regulation of several biosynthetic pathways. Especially six genes (GBAA_1162, hemH2, carB, GBAA_2459, sigB, GBAA_4072) were found to be differentially expressed in the log and late-log phase of rPA expressing culture (Table 2). A schematic of major metabolic pathways and functions affected in B. anthracis due to rPA expression are presented in Fig. 8. The upregulation of sigL the gene encoding RNA polymerase sigma factor 54 is especially important, it increased the expression of nitrogen assimilation and metabolism genes that likely improve growth rate and rPA expression. Increased expression of the down-regulated sigL gene in the late-log phase might support improved growth rate and rPA expression. In addition, up-regulation of the extracellular secretion chaperones csaA and prsA may increase the rPA export. The csaA was shown to be effective in solving secretion issues in E. coli 38 , while increased expression of prsA lead to improved expression of alpha amylase and rPA in B. subtilis 39 and B.anthracis 40 respectively. This study predicted prsA to be an important target which matched with the previous findings 3 . This study also identified that only two out of three prsA variants i.e. prsA1 and prsA2 might be good candidates for improved rPA expression in B.anthracis ames strain.

Materials and Methods
Strains and plasmids. B. anthracis Ames BH500 strain 11 Supplementary Fig. S1. The plasmid pYS5 contains original pagA promoter from pXO1, including 162 bp of B. anthracis DNA upstream of the PA start codon. This promoter comprises both P1 and P2 start transcription sites identified by Koehler et al. 48 . The pagA and the upstream region were inserted into pUB110 plasmid downstream of kanamycin and truncated bleomycin resistance genes, producing pYS5 plasmid. Both antibiotic resistance genes contain own promoters. Plasmid pSW4 represents pYS5 plasmid with deleted pagA gene 49 . The strain was transformed with plasmid pYS5 50  Sample collection. Samples were collected at three different growth phases: lag phase (OD 3), log phase (OD 10), and late-log phase (OD 16). All dilutions and washes to store cells for further analysis were done in PBS containing 1 mM CaCl 2 and 2 mM MgCl 2 . Each sample was measured after dilution to an 0.2-0.5 OD 600 . Based on these readings, and within 3-5 min, a sample of 3-10 ml was removed and immediately diluted to give >30 ml at a calculated A 600 = 1.0. This diluted cell suspension was treated differently for different analysis: (i) for RNA-seq: 4.0 ml was added into tubes containing 40 ml RNAprotect (Qiagen, Valencia, Calif.), and the contents were immediately mixed and incubated for 5 min at room temperature. The cells were concentrated by centrifugation at 4000 g, supernatant discarded and the pellet was suspended in 1 ml TRIzol Reagent (GIBCO BRL, Invitrogen), quickly frozen in dry ice and stored at −80 °C; (ii) for protein content determination: 1 ml was pelleted in microfuge tubes, washed with PBS, and the pellets were frozen for BCA assay; *(iii) for determination of plasmid retention, cfu, and purity: two 1 ml samples were diluted to 10 4 -and 10 6 -fold and 100 µl was plated on LB agar plates (no antibiotics), and isolated colonies were streaked to LB agar plates with and without 50 µg/ml kanamycin; (iv) for viability measurement: two 1 ml samples collected at mid-log and late-log time points were assayed with a live/ dead dye stain, (v) for PA quantitation using SDS-PAGE: undiluted 1 ml sample was centrifuged and supernatant was collected, filtered through 0.22 µm and stored at −20 °C.
Plasmid stability test. Time point samples from each run were collected and diluted 10 4 -and 10 6 -fold into PBS and 100 µl from each was plated on LB plates with and without kanamycin (50 µg/ml). Each colony from the non-antibiotic plate was re-streaked on kanamycin plate (50 µg/ml). LB plates were incubated at 37 °C overnight.
Quantitation of total protein and PA. Total    RNA-seq data processing. PartekFlow (Partek Inc., St Louis, MO, USA) was used for the bioinformatic analysis. The QA/QC of these unaligned reads were checked to get an insight into the sequencing quality and the average read quality was close to ~40%. The unaligned reads were trimmed to remove low-quality bases from the 3-prime end and checked for quality. These reads were aligned with the reference organism B. anthracis Ames ancestor AE017334.2, using Bowtie2 method, which resulted in >97% alignment for each sample. The annotation model file from Ensembl was used to quantify transcriptome where we used the Partek E/M algorithm. PCA was performed to look for the distance between the sample populations, which showed that replicates for a given time point were closer to each other than to replicates of other time points. Before comparing different time point data, normalization was performed to minimize the impact of possible sources of systemic variations, e.g. sequencing depth, gene length, composition, or technical variations. Normalization was performed using the transcripts per million (TPM) method, and then +1 value was added to expression value before taking log 2 . Three biological replicates for each time point sample were used to perform statistical tests to identify differentially expressed genes.
Pathway and network analysis. Pathway enrichment analysis on the differentially expressed genes was done by integrating B. anthracis Ames KEGG pathway database with the Partek Pathway tool in Partek Genomics Suite (Partek Inc., St Louis, MO, USA). The pathway enrichment, as well as its ranking were analyzed based on p-value and fold-change values. EcoCyc database and previous literature was also used to look for gene functions and their possible interconnections. Since many genes in the B. anthracis Ames ancestor genome are not annotated, BLAST was also used to find homologous genes and infer function 53 .
Data submission. The RNA-seq raw and processed data files were deposited in the NCBI Gene Expression Omnibus (GEO) database with the following accession number: GSE108973.

Data Availability
RNA-seq data are available in the NCBI Gene Expression Omnibus (GEO) database with the following accession number: GSE108973.