The transcription unit architecture of the Escherichia coli genome


Bacterial genomes are organized by structural and functional elements, including promoters, transcription start and termination sites, open reading frames, regulatory noncoding regions, untranslated regions and transcription units. Here, we iteratively integrate high-throughput, genome-wide measurements of RNA polymerase binding locations and mRNA transcript abundance, 5′ sequences and translation into proteins to determine the organizational structure of the Escherichia coli K-12 MG1655 genome. Integration of the organizational elements provides an experimentally annotated transcription unit architecture, including alternative transcription start sites, 5′ untranslated region, boundaries and open reading frames of each transcription unit. A total of 4,661 transcription units were identified, representing an increase of >530% over current knowledge. This comprehensive transcription unit architecture allows for the elucidation of condition-specific uses of alternative sigma factors at the genome scale. Furthermore, the transcription unit architecture provides a foundation on which to construct genome-scale transcriptional and translational regulatory networks.


Figure 1: Flowchart of the systematic iterative integration process.
Figure 2: Integration of the organizational components.
Figure 3: Modular units.
Figure 4: Determination of transcription units and use of alternative TSSs.

Accession codes


Gene Expression Omnibus




Acknowledgements

The authors thank Derek Lovley at the University of Massachusetts, Amherst for his insightful discussion and Marc Abrams for editing the manuscript. Proteomics experiments were performed using EMSL, a national scientific user facility sponsored by the Department of Energy's Office of Biological and Environmental Research and located at Pacific Northwest National Laboratory. This work was supported by the US National Institutes of Health Grant GM062791 and by the Office of Science (BER), US Department of Energy, cooperative agreement DE-FC02-02ER63446.

Author information

B.-K.C., K.Z., Y.Q., E.M.K. and B.Ø.P. conceived and designed experiments. B.-K.C., Y.S.P., Y.G. and E.M.K. performed genome-scale experiments. All data analyses were performed by B.-K.C., K.Z., Y.Q., Y.S.P. and C.L.B. The manuscript was written by B.-K.C., K.Z. and B.Ø.P.

Correspondence to Bernhard Ø Palsson.

Supplementary information

Supplementary Text and Figures

Supplementary Figs. 1–9 and Supplementary Tables 1–14 (PDF 26256 kb)


About this article

Cite this article

Cho, B., Zengler, K., Qiu, Y. et al. The transcription unit architecture of the Escherichia coli genome. Nat Biotechnol 27, 1043–1049 (2009).
