The propagation of perturbations in rewired bacterial gene networks

What happens to gene expression when you add new links to a gene regulatory network? To answer this question, we profile 85 network rewirings in E. coli. Here we report that concerted patterns of differential expression propagate from reconnected hub genes. The rewirings link promoter regions to different transcription factor and σ-factor genes, resulting in perturbations that span four orders of magnitude, changing up to ∼70% of the transcriptome. Importantly, factor connectivity and promoter activity both associate with perturbation size. Perturbations from related rewirings have more similar transcription profiles and a statistical analysis reveals ∼20 underlying states of the system, associating particular gene groups with rewiring constructs. We examine two large clusters (ribosomal and flagellar genes) in detail. These represent alternative global outcomes from different rewirings because of antagonism between these major cell states. This data set of systematically related perturbations enables reverse engineering and discovery of underlying network interactions.


Supplementary
. Growth timecourses and growth rates for selected rewired constructs. 5 ml precultures were grown in LB medium for 16 hours, and diluting to an OD 600 of 0.0015 (around 1/800 dilution), -well plate. Absorbance was then measured on a platereader every 360 s, at 595 nm (A 595 ). Growth rates were calculated as the slope of the natural logarithm of A 595 , with respect to time (a moving average over 6 consecutive readings). To aid comparison, growth rates were plotted against A 595 . (a), Growth rates at A 595 = 0.07 (initial growth, as soon as can be detected by the platereader) and A 595 = 0.27 (midlog growth). (b), Growth timecourses (left) and growth rates versus A 595 (right) for selected constructs. The identity of each construct (together with number of differentiallyexpressed genes in brackets) is shown to the top-right of each dataset. The control construct (Co) is used as a reference throughout (purple). (c), Growth timecourses (left) and growth rates versus A 595 (right) for ribosome cluster constructs. Note the greater growth of pheS-arcA, relative to the control, Co (easiest to see on the A 595 growth curve). Other ribosomal constructs have various combinations of faster growth, slower growth, or delayed growth, relative to the control Co. Error bars are 1 s.e.m. for 10 replicates (colonies) throughout.  Figure 6 (qp-graph 30% precision) with the two public databases. 51 network interactions are common to all 3 databases. (b), A comparison of the 50% precision graph in Figure S7 (qp-graph 50%) with two public databases. 41 network interactions are common to all 3 databases. Figure 7. Reversed-engineered network using qpgraph at 50% precision (0.5qpgraph). Connected nodes are shown as circles (red=transcription factor -factor; grey=regulated gene). Edges (direct predicted connections) are shown by lines (blue=absent in RegulonDB 3 ; pink=present in RegulonDB). The inset table outlines the number of interactions recovered by qpgraph, divided into interactions that are present or absent in regulonDB. Figure 8. Testing new interactions in a subnetwork predicted by reverse engineering. (a), The ExuR 50% precision subnetwork predicted by qpgraph contains potential new regulatory connections to gmk and mazEF that are absent in RegulonDB. Connected nodes are shown as circles (red=transcription factor (TF); grey=regulated gene). Edges (direct predicted connections) are shown by lines (blue=absent in RegulonDB 3 ) (b), MEME 5 was used to predict a sequence logo for ExuR DNA binding. (c), MEME analysis of promoter regions described as being bound by ExuR in RegulonDB or in our qpgraph prediction. MEME finds putative ExuR sites on the gmk promoter region but not the mazEF promoter. (d), DNaseI footprinting analysis to test potential binding sites of ExuR. Sequences that are protected from DNaseI cleavage in the red ExuR sample, reveal the blue control traces. Footprint sites, and their approximate positions and sequences, are highlighted on the traces for the exuT and gmk promoters; no footprints are found in the mazEF promoter, in agreement with the MEME analysis. a.u., arbitrary units. ORF  ORF outdegree   mean

Affymetrix E. coli Genome 2.0 microarrays
The 85 chosen gene network rewiring plasmids (from 6 ) were transformed into E. coli TOP10 cells and grown under standardised conditions: bacteria were freshly plated onto LB Agar plates (supplemented with 100 µg ml -1 Ampicillin and 50 µg ml -1 Streptomycin) and incubated overnight at 37°C to form colonies. For each biological replicate, single colonies (maximum 3 days old) were used to inoculate separate 5 ml overnight pre-cultures in LB containing 100 µg ml -1 Ampicillin and 50 µg ml -1 Streptomycin. Constructs were grown for 37°C, at 220 rpm in an orbital shaker. The precultures were diluted to an OD 600 of 0.0015 (approximately 1:800 dilution) in 2 ml of the same medium, in 14 ml culture tubes. Constructs were grown for 16h at 37°C, at 220 rpm in an orbital shaker. These conditions were chosen to match previous work 6 .
Total bacterial RNA was extracted with a RNeasy Protect Bacteria Mini Kit (Qiagen, Cat. 74106). RNAprotect Bacteria Reagent (Qiagen, Cat. 76506) was added following the manufacturer's instructions (2 volumes reagent: 1 volume bacterial culture). RNA concentration was measured with a NanoDrop 2000 spectrophotometer. For microarray analysis, RNA integrity number was analysed with an Agilent 2100 Bioanalyzer. 10 µg of total bacterial RNA with a RNA integrity number > 7.0 was used for transcriptome analysis with Affymetrix GeneChip E. coli Genome 2.0 Arrays. MIAME compliant microarray data files are available at EBI ArrayExpress (ID: E-MTAB-3233). There are 255 raw array files for the 85 networks in biological triplicate samples (representing different colonies) and 5 files for the Co control 6 .

Microarray differential expression analysis
The microarrays probed for the relative RNA expression levels of 3891 annotated E. coli genes with unique Entrez IDs. All network constructs (biological triplicates from different colonies) were compared to 5 biological replicates of a "wild-type" reference standard (E. coli transformed with an empty promoterless GFP plasmid; denoted "Co" 6 ). Data analysis was performed using the R programming language and the Bioconductor software packages. Each Affymetrix chip was background adjusted, normalized and log2 transformed using the Robust Multichip Averaging (RMA) algorithm 7 . Differential expression analysis was performed by using the Bioconductor package limma 8 . The genes which were differentially expressed (<5% False Discovery Rate 9 , >1.2-fold-change) were extracted and used to analyse the scale of network perturbations.
EPCLUST (http://www.bioinf.ebc.ee/EP/EP/EPCLUST/) 10 was used for clustering the array data of fold-changes in mRNA levels. The chosen options for clustering provided on the webserver are annotated in Supplemental Data 1.

pBAD arabinose-inducible constructs
ORFS were obtained via PCR, from 15 ng E. coli TOP10 genomic DNA, and were cloned using the pBAD202 Directional TOPO Expression Kit (Invitrogen, Cat. K4202-01). After sequence verification, positive clones in TOP10 cells were grown in LB medium, supplemented with 30 µg ml -1 Kanamycin and 0.002% arabinose, to induce the pBAD promoter. RNA was extracted as described above.

Quantitative real-time PCR (qRT-PCR)
E. coli TOP10 cells containing the different promoter-gene constructs were grown for RNA extraction as described above, except that they were grown in 500 µl LB (ribosome experiments; Figure 4), in 2.2 ml 96 well culture plates (PEQLAB Cat. 82-0932-A). The plates were sealed with Breathe-Easy™ sealing membrane (Sigma, Cat. Z380059).
RNA . All primer sets were tested for specificity by melting curve analysis standard curves with E. coli genomic DNA. All samples were normalized against levels of gnd (a housekeeping gene) in the same sample. Fold-change of expression over the Co control sample was calculated by dividing the gnd-normalized concentration of the sample by the gnd-normalized concentration of the Co sample.

Statistics
Linear regression correlation analysis was carried out on microarray data with StatPlus:mac.LE.2009 ANOVA F test to calculate p-values. 95% confidence intervals of growth curves (fluorescence and A595) were estimated using ± 1.96  standard error mean for each time point.

Biclustering
Biclustering was implemented by using the Iterative Signature Algorithm (ISA) 11 , as implemented in the R Bioconductor eisa package (v. 1.4.1) 12 . The gene (features) and sample thresholds were set to 2.1 and 1.5, respectively. Gene filter cutoffs: genes with log2 fold-changes >0.25 or <-0.25 in at least three points were used.

Gene Ontology Enrichment Analysis
Gene ontology 13 analysis was performed using R Bioconductor package GOstats (v. 2.18) 13 . Affymetrix IDs were converted into Entrez Gene identifiers and then mapped against the GO database (R Bioconductor annotation packages ecoli2.db and GO.db). All tests were done on the Biological Process (BP), Molecular Functions (MF) and Cellular Components (CC) ontology. Enrichment analysis used hypergeometric-based tests with a 'universe' of 4070 annotated MG1655 E. coli genes. p-value threshold was set to 0.01 and significance was determined with the FDR-adjusted p-value cutoff at 0.05 9 .

Reverse-engineering with qpgraph
The Bioconductor package qpgraph 14 was used to estimate the structure of the network from the 255 microarrays. Non-rejection rates (NRR) were calculated, which are based on partial correlations of order q < (n-2) 15 . The NRR gives an estimate of the strength of a direct interaction between two genes and can be understood as a linear measure of association over all marginal distributions of size q that is calculated for every gene pair. We calculated NRRs for every possible value of q and calculated an average NRR 14 which was used to rank all regulator-target associations and to estimate precision-recall curves based on RegulonDB 7 3 . 30%-precision (% number of true positives per number of predicted edges whose genes belong to at least 1 RegulonDB interaction) was selected to generate the full network .

Network visualization
Network visualization was done with Cytoscape Version 2.8.2 16 . To calculate topological parameters of the network we used the Cytoscape plug-in NetworkAnalyzer v2.7 17 . ClueGo v1.4 plug-in was used for gene set enrichment analysis with GO Biological process annotations 18 .

De-novo detection of overrepresented motifs
Upstream DNA regions of co-expressed genes were obtained from Ecogene 3.0 19 and the MEME online Motif Search tool was used to screen the -1000 upstream regions (-1000bp from transcription start sites) for overrepresented motifs 5 .

In vitro transcription-translation and DNaseI footprinting
The ORF for exuR was PCR amplified from E. coli Top10 genomic DNA with KOD Hot Start Polymerase (Novagen) and was cloned via NdeI-BamHI into the control plasmid from the PURExpress in vitro protein synthesis kit (NEB)(primers: GCGCGCCATATGGAAATCACTGAA; CCCGGGATCCTCATCATTTACTGCCGCT). The -1000 upstream regions was also obtained by genomic PCR, with primers containing 5'-complementary sequence to universal fluorescent (HEX) labelled primers, for generating fluorescent end-labelled promoter fragments. Primer sets for DnaseI footprinting: ExuR plasmid or negative control plasmid was added to the protein synthesis kit and expression was verified by SDS-PAGE. After expression the reactions were incubated with 1 µg poly-dIdC (Sigma-Aldrich #81349) (on ice, 10 min), labelled probe (37˚C, 20 min) and 0.05-0.1 units DNaseI (26˚C, 5 min) in Valmeekam ExuR Binding Buffer: 7.5 mM Tris-HCl pH 7.4, 0.5 mM EDTA, 5 mM KCl, 2.5% glycerol, 0.5 mM DTT, 0.4 µg BSA. Samples were heat inactivated (75˚C, 10 min), column purified, and mixed 1:6 with HiDi Formamide and 0.4 µl ROX 1000 GeneScan size standard. Fragment analysis was performed on a GeneScan ABI3130xl DNA analyzer. Samples were analysed with GeneMapper Version 4.0. Sequences were obtained in parallel with Thermo Sequenase Dye Primer Manual Cycle Sequencing Kit. Sequences and fragments were aligned using the ROX 1000 internal size standard.