  • Commentary

Bioinformatics for the 'bench biologist': how to find regulatory regions in genomic DNA

The combination of bioinformatic and biological approaches constitutes a powerful method for identifying gene regulatory elements. High-quality genome sequences are available in public databases for several vertebrate species. Comparative cross-species sequence analysis of these genomes shows considerable conservation of noncoding sequences in DNA. Biological analyses show that an unexpectedly high number of the conserved sequences correspond to functional cis-regulatory regions that influence gene transcription. Because research biologists are often unfamiliar with the bioinformatic resources at their disposal, this commentary discusses how to integrate biological and bioinformatic methods in the discovery of gene regulatory regions and includes a tutorial on widely available comparative genomics programs.
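
As a rough illustration of how conservation of noncoding sequence can be scored, the sketch below computes percent identity in a sliding window over two sequences that have already been aligned. It is a minimal toy example, not the method used by any of the programs covered in the tutorial: the function names, the window size and the identity threshold are all assumptions chosen for illustration, and real tools differ in how they treat gaps and in their default parameters.

```python
def window_identity(aligned_a, aligned_b, window):
    """Percent identity for every full window of a pairwise alignment.

    Both inputs are rows of the same alignment (equal length, '-' for gaps).
    A column counts as a match only if both bases are equal and not a gap.
    (Hypothetical helper; gap handling here is an illustrative choice.)
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if window <= 0 or window > len(aligned_a):
        raise ValueError("window must be between 1 and the alignment length")

    matches = [
        int(a == b and a != "-")
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
    ]

    # Keep a running count of matches so each step is O(1).
    running = sum(matches[:window])
    scores = [100.0 * running / window]
    for i in range(window, len(matches)):
        running += matches[i] - matches[i - window]
        scores.append(100.0 * running / window)
    return scores


def conserved_windows(scores, threshold):
    """Start positions of windows whose identity meets the threshold."""
    return [i for i, score in enumerate(scores) if score >= threshold]


if __name__ == "__main__":
    # Toy aligned fragments, not real mouse or human sequence.
    mouse = "ACGTTGCA-ACGTAGGCTA"
    human = "ACGTTGCATACGAAGCCTT"
    scores = window_identity(mouse, human, window=8)
    print(conserved_windows(scores, threshold=75.0))
```

In practice the window would span tens to hundreds of base pairs, and windows passing the threshold would be treated only as candidates to test experimentally for regulatory function.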

Figure 1: TH2 cytokine locus on mouse chromosome 11.
Figure 2: Identification of orthologs.
Figure 3: Alignment of a section of the IL4 locus on mouse chromosome 11 with the corresponding region of human chromosome 5.

Supplementary information

Supplementary Tutorial

Bioinformatics for the Bench Scientist. (HTM 4 kb)

About this article

Cite this article

Nardone, J., Lee, D., Ansel, K. et al. Bioinformatics for the 'bench biologist': how to find regulatory regions in genomic DNA. Nat Immunol 5, 768–774 (2004). https://doi.org/10.1038/ni0804-768

  • DOI: https://doi.org/10.1038/ni0804-768
