Mutations in sigma 70 transcription factor improves expression of functional eukaryotic membrane proteins in Escherichia coli

Eukaryotic integral membrane proteins (IMPs) are difficult to study due to low functional expression levels. To investigate factors for efficient biogenesis of eukaryotic IMPs in the prokaryotic model organism Escherichia coli, important, e.g., for isotope-labeling for NMR, we selected for E. coli cells expressing high levels of functional G protein-coupled receptors (GPCRs) by FACS. Utilizing an E. coli strain library with all non-essential genes systematically deleted, we unexpectedly discovered upon whole-genome sequencing that the improved phenotype was not conferred by the deleted genes but by various subtle alterations in the “housekeeping” sigma 70 factor (RpoD). When analyzing effects of the rpoD mutations at the transcriptome level we found that toxic effects incurred on wild-type E. coli during receptor expression were diminished by two independent and synergistic effects: a slower but longer-lasting GPCR biosynthesis and an optimized transcriptional pattern, augmenting growth and expression at low temperature, setting the basis for further bacterial strain engineering.


Whole genome sequencing
The genomic DNA of the clone to be sequenced was extracted from approximately 1 × 10^9 E. coli cells with the GenElute™ Bacterial Genomic DNA Kit (Sigma, Cat. No. NA2110) and quantified with the Quant-iT™ PicoGreen dsDNA kit (Invitrogen, Cat. No. P7589). Prior to sequencing, the quality of the isolated genomic DNA was checked with a Bioanalyzer 2100 instrument (Agilent Technologies).
The genomes were sequenced using either four single-molecule real-time (SMRT) sequencing cells on a PacBio RS II or an Illumina MiSeq instrument (Nextera XT library kit), both at the Functional Genomics Center Zurich. The concentration of the input DNA was determined using the Qubit Fluorometer dsDNA Broad Range assay (Life Technologies, Cat. No. Q32850).
PacBio RS II. The whole-genome DNA sequences of Keio clone ΔqseB and the wild-type strain BW25113 were obtained using the PacBio RS II with SMRT cells to get long paired-end reads and thus to be able to also detect large rearrangements.
The SMRTbell was produced using the DNA Template Prep Kit 2.0 (Pacific Biosciences, Cat. No. 001-540-835) according to the 3-kb or 10-kb template preparation and sequencing protocol provided by Pacific Biosciences. 10 μg of genomic DNA were mechanically sheared to an average size distribution of 10 kb, using a Covaris gTube (Kbiosciences, Cat. No. 520079). A Bioanalyzer 2100 12K DNA Chip assay (Agilent Technologies, Cat. No. 5067-1508) was used to assess the fragment size distribution. 5 μg of sheared genomic DNA were incubated with polishing enzymes to repair damage at the ends of the DNA fragments. A blunt-end ligation reaction followed by exonuclease treatment was performed to create the SMRTbell template. The quality of the library was inspected with the Agilent Bioanalyzer 12K DNA Chip and the Qubit Fluorometer. A ready-to-sequence SMRTbell-polymerase complex was created using the P4 DNA/Polymerase binding kit 2.0 according to the manufacturer's instructions (Pacific Biosciences, Cat. No. 100-236-500).
The Pacific Biosciences RS II instrument was programmed to load and sequence each sample on 4 SMRT cells v3.0 (Pacific Biosciences, Cat. No. 100-171-800), recording one 120-minute movie per SMRT cell. A MagBead loading method (Pacific Biosciences, Cat. No. 100-133-600) was chosen in order to improve the enrichment of the longer fragments. After the run, a sequencing report was generated for every cell via the SMRT portal to assess the adapter dimer contamination, the sample loading efficiency, the average read length obtained and the number of filtered sub-reads.
For the wild-type BW25113 strain, a total of 71,682 reads with a mean length of 3,332 bp were assembled with 20-fold coverage into 1 contig. For Keio clone ΔqseB, a total of 71,237 reads with a mean length of 3,621 bp were assembled with 30-fold coverage into 4 contigs. The genomes were compared and the replacement of the qseB gene by the kanamycin resistance cassette in the Keio clone was confirmed.
Illumina MiSeq. As Nextera XT requires a maximum of 1 ng of total genomic DNA in 5 µl of starting volume, each sample was diluted to a concentration of 0.2 ng/µl genomic DNA as input dsDNA. The library preparation with individual library barcoding and normalization of the respective libraries was performed using the Nextera XT kit (Illumina, Cat. No. FC-131-1096) according to the manufacturer's protocol. Nextera XT libraries were quantified using Qubit and the size profile was analyzed on the 2200 TapeStation (Agilent). The libraries were pooled together and diluted to 4 nM. The library pool was denatured and further diluted prior to loading on a MiSeq paired-end 500 cycle (v2) sequencing run. This yielded 2 × 250 bp paired-end reads and an average genome coverage of at least 25×.
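The input amount stated above follows from simple arithmetic: 5 µl at 0.2 ng/µl corresponds to 1 ng of genomic DNA. The following is a minimal Python sketch of that calculation and of the corresponding C1·V1 = C2·V2 dilution. The function names and the 10 ng/µl stock concentration in the usage note are illustrative assumptions, not values from the protocol.

```python
def input_mass_ng(concentration_ng_per_ul: float, volume_ul: float) -> float:
    """Mass of DNA (ng) contained in a given volume at a given concentration."""
    return concentration_ng_per_ul * volume_ul


def dilution_volumes(stock_ng_per_ul: float, target_ng_per_ul: float,
                     final_volume_ul: float) -> tuple:
    """Return (stock volume, diluent volume) in µl, from C1 * V1 = C2 * V2."""
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is more dilute than the target concentration")
    stock_volume = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_volume, final_volume_ul - stock_volume


# 5 µl of a 0.2 ng/µl dilution gives the 1 ng input mentioned in the text.
mass = input_mass_ng(0.2, 5.0)
```

For a hypothetical 10 ng/µl stock, `dilution_volumes(10.0, 0.2, 50.0)` gives 1 µl of stock plus 49 µl of diluent for 50 µl at 0.2 ng/µl.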

Site-directed mutagenesis in the E. coli genome
We developed a method for site-directed mutagenesis in the E. coli genome. For this purpose, we used the methodology named Splicing by Overlap Extension 1 to create a DNA fusion between the kanamycin resistance cassette targeted to the non-essential mug gene (downstream of rpoD) and the last 350 bp of the rpoD gene. We used a DNA fragment containing the required mutation (rpoD-E575V) and, in parallel, one containing the wild-type rpoD sequence. Next, we followed the Datsenko method for gene deletions 2 , using the fused DNA fragment as input, and then used kanamycin resistance to select the new E. coli strains, which differ only by the desired point mutation.
To be able to use this method with the E. coli BL21 strain, we needed to add the E. coli recA gene to the lambda Red recombinase system, as this gene is deleted in the BL21 strain. This was done by using the Red/ET recombination kit from Gene Bridges ® 3 instead of the Datsenko and Wanner plasmids.

RNA-sequencing
Total RNA was extracted from approximately 5 × 10^8 E. coli cells using the RNeasy Mini kit (Qiagen, Cat. No. 74104). Briefly, bacterial cell cultures were directly mixed with twice the volume of RNAprotect Bacteria Reagent (Qiagen, Cat. No. 76506) and the recommended protocol of lysozyme-mediated lysis and digestion with Proteinase K was followed. The RNase-Free DNase Set (Qiagen, Cat. No. 79254) was used for an on-column DNase digestion for 30 min prior to RNA elution.

Bioinformatics
After sequencing, reads were analyzed using SUSHI 4 , an NGS data analysis workflow management system developed at the Functional Genomics Center Zurich. First, reads were quality-checked with FastQC (Babraham Bioinformatics) and low-quality ends were clipped (5 bases from the start, 10 bases from the end). Trimmed reads were aligned and mapped to the reference genome and transcriptome of E. coli K-12 DH10 (FASTA and GTF files, respectively, downloaded from Ensembl) with Bowtie version 2.1 5 .
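The fixed-length clipping described above (5 bases from the start, 10 bases from the end) can be illustrated with a minimal Python sketch. The tool that SUSHI used for this step is not named in the text, so the function `clip_read` is a hypothetical helper that only illustrates the operation on one read and its quality string.

```python
def clip_read(sequence: str, quality: str, clip_start: int = 5, clip_end: int = 10):
    """Remove a fixed number of bases from both ends of a read.

    Returns the clipped (sequence, quality) pair, or None if no bases remain.
    """
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality strings must have equal length")
    stop = len(sequence) - clip_end
    if stop <= clip_start:
        return None  # read is too short to survive clipping
    return sequence[clip_start:stop], quality[clip_start:stop]


# Toy read: 5 leading bases, a 12-base core, 10 trailing bases.
toy_sequence = "A" * 5 + "CGT" * 4 + "T" * 10
toy_quality = "I" * len(toy_sequence)
clipped = clip_read(toy_sequence, toy_quality)
```

Reads no longer than the combined clip length are discarded here. That is a design choice of this sketch, not something stated in the text.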
For the whole genome sequencing experiments, polymorphisms were detected using GATK version 2.2.0, following the recommended DNA-seq best practices 6 , and introduced into the NCBI reference E. coli K-12 MG1655 using the GATK tool FastaAlternateReferenceMaker. This new FASTA file was then used as the background to identify the variants between the individuals in the sample groups. In every case, polymorphisms were considered to pass the filter if they showed at least 15-fold coverage and a minimum quality score of 50.
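The pass criterion in the preceding paragraph (at least 15-fold coverage and a minimum quality score of 50) can be sketched as a filter over VCF records. This Python sketch assumes that coverage is read from the DP key of the INFO column and quality from the QUAL column. The function name and the toy records are hypothetical and are not taken from the study's data.

```python
MIN_DEPTH = 15    # minimum coverage stated in the text
MIN_QUAL = 50.0   # minimum quality score stated in the text


def passes_filter(vcf_line: str, min_depth: int = MIN_DEPTH,
                  min_qual: float = MIN_QUAL) -> bool:
    """Return True if a tab-separated VCF record meets both thresholds."""
    fields = vcf_line.rstrip("\n").split("\t")
    qual = fields[5]  # QUAL column
    # INFO column: semicolon-separated key=value pairs (flag-only keys are skipped)
    info = dict(item.split("=", 1) for item in fields[7].split(";") if "=" in item)
    if qual == "." or "DP" not in info:
        return False  # missing values cannot pass the filter
    return float(qual) >= min_qual and int(info["DP"]) >= min_depth


# Toy records (hypothetical): CHROM POS ID REF ALT QUAL FILTER INFO
toy_records = [
    "chr\t1234\t.\tA\tG\t87.3\t.\tDP=22;AF=1.00",
    "chr\t5678\t.\tC\tT\t42.0\t.\tDP=30",
    "chr\t9012\t.\tG\tA\t60.0\t.\tDP=9",
]
kept = [record for record in toy_records if passes_filter(record)]
```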
The Unified Genotyper was used with the following options: BAQ gap open penalty (whole-genome analysis) set to 30; minimum consensus coverage to genotype indels set to 8 (default: 5); minimum depth set to 19; minimum base quality score and minimum variant Phred score set to 15; minimum variant quality score set to 50.
Variants were annotated using SnpEff version 3.4 7 , and the distribution of the reads across genomic isoform expression was quantified using the R package GenomicRanges 8 from Bioconductor Version 3.0.
For the transcriptome analysis, mapped reads for each annotated gene were counted using countOverlaps in the Bioconductor package GenomicRanges 8 . The differentially expressed genes were identified with the Bioconductor package edgeR 9 , in which the raw counts were normalized using the TMM (trimmed mean of M values) method 10 . The sequencing reads and raw counts have been deposited in the Gene Expression Omnibus of NCBI under accession number GSE109819.
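The per-gene read counting mentioned above can be pictured with a small interval-overlap sketch in Python. It counts a read for a gene when the two closed intervals share at least one base and it ignores strand. The analysis itself used countOverlaps from GenomicRanges in R, so the function below, the gene names and the coordinates are illustrative assumptions only.

```python
def count_overlaps(genes: dict, reads: list) -> dict:
    """Count, for each gene, the reads that share at least one base with it.

    genes: gene name -> (start, end); reads: list of (start, end).
    All intervals are closed and use the same coordinate system.
    """
    counts = {}
    for name, (gene_start, gene_end) in genes.items():
        counts[name] = sum(
            1 for read_start, read_end in reads
            if read_start <= gene_end and read_end >= gene_start
        )
    return counts


# Toy annotation and alignments (hypothetical coordinates).
toy_genes = {"geneA": (100, 200), "geneB": (300, 400)}
toy_reads = [(90, 110), (150, 160), (200, 250), (250, 299), (390, 450)]
toy_counts = count_overlaps(toy_genes, toy_reads)
```

The nested loop is quadratic in the worst case, which is fine for illustration. Interval-tree or sorted-sweep approaches scale better on real data.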
Enrichment analyses of the gene-expression data were performed using the web tools at BioCyc.org, in particular the EcoCyc Database 11 . SmartTables and Omics Dashboard 12 enrichment parameters were set to include results whose p-values were less than 0.05 using Fisher's exact test. In addition, statistical analyses specifically targeting sigma factor enrichment were done with the free statistical computing environment R v. 3.4.3 13 using the fisher.test command and the experimental sigma factor-gene interaction dataset from RegulonDB v. 9.0 14 . In all cases, p-values were first false discovery rate (FDR)-adjusted using the multiple hypothesis testing correction procedure of Benjamini and Hochberg 15 .
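As an illustration of the statistics named above, the following Python sketch computes a one-sided (over-representation) Fisher exact p-value from the hypergeometric distribution and applies the Benjamini-Hochberg adjustment. The analysis itself was run with fisher.test in R, whose default alternative is two-sided, so the one-sided form, the function names and the toy numbers are assumptions made for illustration.

```python
from math import comb


def fisher_enrichment_p(hits: int, set_size: int,
                        annotated: int, background: int) -> float:
    """One-sided Fisher exact p-value, P(X >= hits).

    hits: genes in the tested set that carry the annotation
    set_size: number of genes in the tested set
    annotated: genes in the background that carry the annotation
    background: total number of genes in the background
    """
    upper = min(annotated, set_size)
    numerator = sum(
        comb(annotated, i) * comb(background - annotated, set_size - i)
        for i in range(hits, upper + 1)
    )
    return numerator / comb(background, set_size)


def benjamini_hochberg(p_values: list) -> list:
    """Return FDR-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy example: all 5 genes of a set are among the 5 annotated genes out of 10.
p_toy = fisher_enrichment_p(hits=5, set_size=5, annotated=5, background=10)
adjusted_toy = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```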

Quantitative real time PCR
The whole experiment was performed following the MIQE guidelines (minimum information for publication of quantitative real-time PCR experiments) 16 . Total RNA was extracted as described for RNA sequencing analysis. Isolated RNA was further treated with the TURBO DNA-free kit (Ambion, Cat. No. AM1907) to remove residual genomic DNA. The purity and integrity of the RNA were evaluated by agarose gel electrophoresis and by measuring the ratio of the absorbance at 260/280 nm on a Nanodrop spectrophotometer. The RNA concentration was estimated using the Quant-iT™ RiboGreen RNA Assay Kit (Invitrogen, Cat. No. R11490). Total RNA (1 μg) was reverse-transcribed to obtain cDNA with a SuperScript First-Strand Synthesis kit using random hexamers (Invitrogen, Cat. No. 11904018). Primers were designed with Primer3 software 17 or obtained from PrimerBank 18 .
The quantitative PCR was performed in a Mx3005P qPCR System (Agilent) using 5 μl of 20-fold diluted cDNA product, the reagent SYBR Select Master Mix (Applied Biosystems, Cat. No. 4472908) and 10 pmol of specific primers for each gene in a 20 μl reaction volume. The temperature profile was 95°C for 2 min, followed by 40 cycles of 95°C for 15 s and 60°C for 1 min. A post-amplification melting-curve analysis was done to discard primer-dimer artifacts and to ensure reaction specificity by heating the products to 95°C for 5 s, followed by cooling to 60°C and heating to 95°C while monitoring fluorescence. PCR products of the correct lengths were verified by agarose gel electrophoresis. Samples without reverse transcriptase treatment were measured in parallel to determine the concentration of any contaminating DNA.
For each strain, three biological replicates were analyzed and three technical replicates were carried out for each qPCR measurement. The cycle threshold (CT) and efficiency values obtained were used for further analysis and calculation of relative expression levels using the 2^(-ΔΔCt) method 19 . Each sample was normalized using TATAA Universal RNA Spike II (TATAA Biocenter AB) as a spike-in internal control, and then the results from samples X and Y were compared to those in Z, as a calibrator sample. Tests for enzymatic inhibition and RNA extraction yield were performed as suggested for the TATAA Universal RNA Spike II (TATAA Biocenter AB).

Supplementary Figures and Tables
Figure S1: Scheme of selection and sorting process of the Keio mutants according to their GPCR expression. The Keio clones were transformed with a GPCR-encoding plasmid (NTR1), the mutant strains were grown and GPCR expression was induced. The outer cell membrane was then permeabilized, and functional receptors became labeled upon binding of the fluorescent ligand. E. coli cells were sorted by FACS to enrich for highly expressing mutants.
E. coli strain BW25113 (wt) and 4 of the selected Keio clones harboring the plasmid pRG-NTR were grown in M9 minimal medium (MM). Growth was estimated with OD600nm measurements after 20 hours of GPCR expression at 20°C. Results are normalized to values for the E. coli wt strain. Results of growth in rich medium 2xYT as in Figure S3 were included for comparison. The x-axis label indicates the gene that is deleted in the respective Keio clone. Means and standard deviations from three independent experiments are shown.
Figure S11: Summary of gene enrichment analysis using Pathway Tools Omics Dashboard 12 with the RNA-seq data. Numbers are an enrichment score: -log10(p-value), where p-values were computed using Grossmann's parent-child-union variation of the Fisher-exact test, and applying the specified multiple hypothesis correction. Analyses were done using subsets of up- or down-regulated genes in each comparison.
RNA-seq data (see Table S1 for the full set of data) were used to analyze the pattern of global gene expression in the different E. coli strains. In comparing E. coli BW25113 harboring pRG-NTR versus E. coli BW25113 (without NTR), log2 ratios of gene expression are shown in blue in descending order from left to right. Only those genes with a log2 ratio greater than 1 or smaller than -1 are plotted. In the same gene order, log2 ratios of gene expression are shown in red when comparing the E. coli rpoD mutant harboring pRG-NTR versus E. coli BW25113 harboring pRG-NTR.
Table S1: Spreadsheet with RNA-seq data (separate file)