  • Analysis

Detecting genomic islands using bioinformatics approaches

Key Points

  • Genomic islands (GIs) are large genomic regions (typically 10–200 kb in length) that are found in bacterial genomes and that have probably been horizontally acquired.

  • GIs disproportionately carry genes related to various functions of medical and environmental importance and have been named accordingly as 'pathogenicity islands', antibiotic 'resistance islands' and 'metabolic islands'.

  • The location of GIs can be computationally predicted by identifying one or more of the various features associated with GIs, such as sequence composition bias, known integration sites and genes of particular function, as well as abnormal phyletic patterns.

  • The accuracy of GI prediction programs varies widely, with some having high precision and others having high recall.

  • Many other bioinformatics tools can complement GI prediction programs, such as whole-genome alignment programs, genome viewers, genome annotators and databases of previously identified GIs.

  • Although various methods exist for the identification of GIs, manual curation is still often required to verify the predictions. Increased genomic sampling should improve the accuracy of many of the methods that are currently available, and future methods that combine various GI prediction approaches and improve the identification of GI boundaries should further help researchers to identify these important genomic regions.
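
As a rough illustration of two of the points above (sequence composition bias as a predictive feature, and precision versus recall as accuracy measures), the following Python sketch flags fixed-size windows whose GC content deviates from the genome-wide window mean and then scores the flagged regions against a reference set of islands. It is a minimal sketch and not the method of any program discussed in the article: the function names, the window size, the z-score cutoff and the per-nucleotide scoring scheme are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def gc_fraction(seq):
    """Fraction of G or C bases in seq (case-insensitive)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def atypical_windows(genome, window=5000, step=5000, z_cutoff=2.0):
    """Return (start, end, gc) for windows whose GC content lies more than
    z_cutoff standard deviations from the mean over all windows.
    Coordinates are 0-based and half-open. All defaults are hypothetical."""
    spans = [(i, min(i + window, len(genome)))
             for i in range(0, len(genome), step)]
    gcs = [gc_fraction(genome[s:e]) for s, e in spans]
    mu, sigma = mean(gcs), pstdev(gcs)
    if sigma == 0:  # perfectly uniform composition: nothing stands out
        return []
    return [(s, e, gc) for (s, e), gc in zip(spans, gcs)
            if abs(gc - mu) / sigma > z_cutoff]


def covered_positions(intervals):
    """Expand half-open (start, end) intervals into a set of positions."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end))
    return positions


def precision_recall(predicted, reference):
    """Per-nucleotide precision and recall of predicted intervals
    against reference intervals."""
    pred = covered_positions(predicted)
    ref = covered_positions(reference)
    overlap = len(pred & ref)
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(ref) if ref else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy genome: balanced background with one AT-rich insert.
    genome = "ACGT" * 250 + "AATT" * 25 + "ACGT" * 250
    hits = atypical_windows(genome, window=100, step=100)
    predicted = [(start, end) for start, end, _ in hits]
    # Hypothetical reference island, twice as long as the insert.
    print(hits, precision_recall(predicted, [(1000, 1200)]))
```

A scan of this kind only captures composition bias; it says nothing about integration sites, gene function or phyletic patterns, and it would miss islands whose composition resembles that of the host genome.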

Abstract

Bacterial genomes contain clusters of genes that are acquired by horizontal transfer, called genomic islands (GIs). GIs are frequently associated with microbial adaptations that are of medical and environmental interest, and they have had a substantial impact on bacterial evolution. Therefore, there is growing interest in efficiently identifying GIs in newly sequenced bacterial genomes. Several computational methods for detecting GIs have been developed recently, presenting researchers with a myriad of choices. Here, we discuss the limitations and benefits of the main approaches that are available and present guidelines to aid researchers in effectively identifying these important genomic regions.

Figure 1: Mobile genetic elements.
Figure 2: Graphical representation of several features associated with genomic islands.
Figure 3: Example of genomic island prediction results for four islands in Pseudomonas aeruginosa str. LESB58.

Acknowledgements

We gratefully acknowledge the Simon Fraser University and University of British Columbia's Bioinformatics Training Program, which is funded by the Canadian Institutes of Health Research (CIHR) and the Michael Smith Foundation for Health Research (MSFHR), for providing initial funding. F.S.L.B. is the recipient of a MSFHR Senior Scholar award and a CIHR New Investigator award. Support for analyses was also provided by Genome Canada and the Cystic Fibrosis Foundation.

Author information

Corresponding author

Correspondence to Fiona S. L. Brinkman.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Related links

DATABASES

Entrez Genome Project

Escherichia coli

Escherichia fergusonii ATCC 35469

Pseudomonas aeruginosa

Shigella flexneri 2a str. 301

FURTHER INFORMATION

Fiona S. L. Brinkman's homepage

ACT

Alien_Hunter

Artemis

BLAST

CUTG codon usage database

HMMer

IMG

Islander

IslandPath

IslandViewer

Mauve

MobilomeFinDER

MOSAIC

MUMmer

PAIDB

PFAM

PredictBias

Shuffle-LAGAN

SIGI-HMM

VFDB

Glossary

Horizontal gene transfer

Transfer of genetic material from one organism to another organism that is not its offspring.

Pathogenicity island

A subset of genomic islands that contribute to the pathogenicity of a bacterium.

Genomic island

In a bacterial genome, a cluster of genes for which there is evidence of horizontal origins.

Mobile genetic element

Any sequence of DNA that is physically moved within the genome of an organism or between different organisms.

Prophage

A viral genome that has integrated into a bacterial host genome.

Integron

A gene capture system that assembles tandem arrays of genes and provides them with a promoter for expression. Integrons are often found in other mobile elements.

Conjugative transposon

An integrated DNA element that can excise and transfer, by conjugation, to another bacterial host.

Integrative conjugative element

A self-transmissible mobile genetic element (MGE) that is transferred by conjugation and integrates into the genome in order to replicate.

Phyletic pattern

The presence or absence of evolutionarily related genes or organisms.

Integrase

An enzyme that is often used by phages for site-specific recombination between two DNA strands, catalysing the integration or excision of DNA and resulting in the formation of a transient covalent bond with the DNA substrate.

Transposase

An enzyme that is encoded by transposons and insertion sequence elements and is required for site-specific recombination between two DNA elements that specifically does not involve the formation of a covalent enzyme–substrate intermediate.

Insertion sequence element

A short mobile DNA sequence similar to a transposon but only encoding genes for its transposition.

k-mer

A nucleotide sequence that is k nucleotides long.
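
For illustration, the overlapping k-mers of a sequence can be tabulated in a few lines of Python. This is a minimal sketch; the function name `kmer_counts` is hypothetical and is not taken from any of the tools listed above.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq (case-insensitive).
    Returns an empty Counter when seq is shorter than k."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


if __name__ == "__main__":
    # Tabulate the 3-mers of a short toy sequence.
    for kmer, count in sorted(kmer_counts("ATGATG", 3).items()):
        print(kmer, count)
```

Comparing k-mer counts from a candidate region with those of the whole genome is one way in which sequence composition bias can be quantified.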

Hidden Markov Model

A statistical model used for pattern recognition that can be used to analyse DNA sequences.
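
To make the definition concrete, the sketch below decodes a DNA sequence with a toy two-state hidden Markov model using the Viterbi algorithm. Every number in it is a hypothetical value chosen for illustration: the state names, the start, transition and emission probabilities, and the assumption that island DNA is AT-rich relative to the host are not taken from any published GI predictor.

```python
import math

STATES = ("host", "island")

# Hypothetical parameters, chosen only for illustration.
START = {"host": 0.9, "island": 0.1}
TRANS = {
    "host": {"host": 0.95, "island": 0.05},
    "island": {"host": 0.10, "island": 0.90},
}
EMIT = {
    "host": {"A": 0.20, "C": 0.30, "G": 0.30, "T": 0.20},
    "island": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},
}


def viterbi(seq):
    """Return the most probable state path for seq (log-space Viterbi).
    Assumes seq contains only the uppercase letters A, C, G and T."""
    if not seq:
        return []
    # Log-probability of the best path ending in each state at position 0.
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]])
             for s in STATES}
    back = []  # back[i][s]: best predecessor of state s at position i + 1
    for base in seq[1:]:
        prev, score, pointers = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            pointers[s] = best
            score[s] = (prev[best] + math.log(TRANS[best][s])
                        + math.log(EMIT[s][base]))
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    toy = "GC" * 20 + "AT" * 20 + "GC" * 20
    print("".join("I" if s == "island" else "h" for s in viterbi(toy)))
```

The hidden states are the unobserved labels (here, host or island) and the observed symbols are the bases. In practice, the parameters would be estimated from data rather than set by hand.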

About this article

Cite this article

Langille, M., Hsiao, W. & Brinkman, F. Detecting genomic islands using bioinformatics approaches. Nat Rev Microbiol 8, 373–382 (2010). https://doi.org/10.1038/nrmicro2350

  • DOI: https://doi.org/10.1038/nrmicro2350
