Key Points
- New sequencing technologies, such as Solexa, 454 pyrosequencing and SOLiD, developed by Illumina, Roche and Applied Biosystems, respectively, are set to revolutionize microbiology by dramatically increasing throughput and reducing the cost of DNA sequencing.
- These technologies present new technical and computational challenges, as well as new research opportunities.
- Applications include de novo genome sequence assembly, metagenomics, small RNA (sRNA) discovery, detection of polymorphisms, expression profiling and epigenetics.
- Many freely available software packages can handle the large datasets generated by these applications.
- As well as sequence alignment and assembly, there is a need for downstream processing of data into a form that is accessible to biologists.
- Standards are emerging for the analysis and archiving of data generated by the new technologies.
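The alignment step mentioned in the key points can be illustrated with a minimal, hypothetical Python sketch (`build_kmer_index` and `map_read` are names introduced here for illustration, not part of any package). It indexes every k-mer of a reference sequence, uses the first k-mer of each read as an exact-match seed, and accepts placements with a small number of mismatches. It deliberately ignores the reverse strand, insertions, deletions and base-quality scores, all of which real short-read aligners must handle.

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Record every position at which each k-mer starts in the reference."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)


def map_read(read: str, reference: str, index: dict[str, list[int]],
             k: int, max_mismatches: int = 2) -> list[tuple[int, int]]:
    """Place a read on the forward strand of the reference (hypothetical helper).

    The read's first k-mer is looked up as an exact-match seed; each candidate
    position is then compared base by base without gaps, and placements with
    at most `max_mismatches` differences are returned as (position, mismatches).
    """
    hits = []
    for start in index.get(read[:k], []):
        window = reference[start:start + len(read)]
        if len(window) < len(read):
            continue  # the read would run off the end of the reference
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


reference = "ACGTACGTTAGCCGATAGGCTTACG"
index = build_kmer_index(reference, k=5)
print(map_read("TAGCCGATTG", reference, index, k=5))  # [(8, 1)]
```

Hashing fixed-length words of the reference trades memory for speed: each read triggers only a handful of base-by-base comparisons instead of a scan of the whole genome.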
Abstract
New sequencing methods generate data that can allow the assembly of microbial genome sequences in days. With such revolutionary advances in technology come new challenges in methodologies and informatics. In this article, we review the capabilities of high-throughput sequencing technologies and discuss the many options for getting useful information from the data.
Acknowledgements
We are grateful to S. Kamoun, E. Kemen, S. Foster and M. Pallen for useful discussions and suggestions on the manuscript. This work was supported by Gatsby Foundation core funding to The Sainsbury Laboratory.
Related links
DATABASES
Entrez Genome Project
Francisella tularensis subsp. holarctica
Salmonella enterica subsp. enterica serovar Typhi
Glossary
- De novo assembly: Construction of longer sequences, such as contigs or genomes, from shorter sequences, such as sequence reads, without prior knowledge of the order of the reads or reference to a closely related sequence.
- Contig: A fragment of genome sequence derived by assembling shorter sequence reads into larger constructs on the basis of overlap between the reads.
- Paired-end read: One of a pair of sequence reads known to originate from genomic positions separated by a limited, approximately known number of nucleotides. This extra information constrains how far apart the two reads can be placed during assembly or alignment, allowing more accurate placement of reads and construction of contigs.
- Epigenetics: The study of inherited changes in gene function that cannot be explained by changes in DNA sequence.
- de Bruijn graph: In mathematics, a network structure is called a graph; the entities that are connected are called nodes and the connections are called edges. A de Bruijn graph is a graph in which the nodes are fixed-length strings of symbols (such as short stretches of nucleotides taken from sequence reads) and the edges connect strings that overlap by all but one symbol. This is a convenient way to represent data such as overlapping sequence reads.
- k-mer: A piece of nucleotide sequence of length k. The term usually refers to a subsequence computationally extracted from an experimentally derived sequence, such as a read or a genome.
- N50: A measure of contig length. If all contigs generated in an assembly are placed end to end in order of length (longest first), the N50 is the length of the contig that, when added, causes the total length of the chain to reach or exceed half of the length of the genome being sequenced. The longer the contigs, the longer the contig that crosses this threshold, so a higher N50 indicates a more contiguous assembly.
- BLAST (Basic Local Alignment Search Tool): A computer program for finding sequences in a database that are similar to a query sequence. BLAST has been available since 1990 and is the most widely used sequence search tool.
- MIGS (Minimum information about a genome sequence): A proposed metadata standard that aims to capture essential information about the species, the source of the strain, and other phylogenetic and experimental data about a sequenced organism. Collecting such data facilitates the cataloguing and searching of species in large-scale databases.
- Finished genome: A genome sequence that has been shotgun sequenced and subjected to post-assembly procedures, such as long PCR, to close the gaps that occur between contigs.
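The constraint that paired-end reads impose can be sketched as a simple consistency check. This is a minimal Python illustration under stated assumptions: `Placement` and `pair_is_consistent` are hypothetical names, the library is assumed to be inward-facing (upstream read on the forward strand, downstream read on the reverse strand, as in a typical paired-end library), and the insert size is assumed to be roughly normally distributed with a known mean and standard deviation. Other library types, such as mate-pair libraries, use different orientations.

```python
from typing import NamedTuple


class Placement(NamedTuple):
    """Hypothetical record of where one read of a pair aligned."""
    start: int     # 0-based leftmost reference coordinate of the aligned read
    length: int    # number of reference bases covered by the read
    forward: bool  # True if the read aligned to the forward strand


def pair_is_consistent(upstream: Placement, downstream: Placement,
                       insert_mean: float, insert_sd: float,
                       n_sd: float = 3.0) -> bool:
    """Return True if two placements fit the assumed pair geometry.

    Assumes an inward-facing library: the upstream read is on the forward
    strand, the downstream read is on the reverse strand, and the implied
    insert size lies within n_sd standard deviations of the library mean.
    """
    if not upstream.forward or downstream.forward:
        return False  # orientation does not match the assumed library type
    # Implied fragment length: from the start of the forward read
    # to the end of the reverse read.
    insert = downstream.start + downstream.length - upstream.start
    return abs(insert - insert_mean) <= n_sd * insert_sd


# Example: a library with a mean insert of 300 bp (s.d. 30 bp) and 36-base reads
r1 = Placement(start=1000, length=36, forward=True)
print(pair_is_consistent(r1, Placement(1264, 36, False), 300, 30))  # True (insert 300)
print(pair_is_consistent(r1, Placement(5000, 36, False), 300, 30))  # False (insert 4036)
```

An aligner or assembler can use such a test to choose between alternative placements of a read in a repeat: the placement that keeps its partner at a plausible distance is preferred.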
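To make the de Bruijn graph and k-mer definitions concrete, here is a minimal Python sketch. It assumes one common formulation, in which nodes are (k-1)-mers and every k-mer observed in a read contributes an edge from its prefix to its suffix; assemblers differ in these details. The helper names are hypothetical, and the walk is deliberately naive: it extends a contig only while the current node has exactly one successor and stops at branches, dead ends or cycles. It does not model sequencing errors, reverse complements or coverage.

```python
from collections import defaultdict


def de_bruijn_graph(reads: list[str], k: int) -> dict[str, set[str]]:
    """Build a de Bruijn graph: nodes are (k-1)-mers, and each k-mer
    found in a read adds an edge from its prefix to its suffix."""
    graph: dict[str, set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def walk_unambiguous(graph: dict[str, set[str]], start: str) -> str:
    """Naively extend a contig from `start` while the current node has exactly
    one successor; stop at a branch, a dead end or a revisited node."""
    contig = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:
            break  # cycle, e.g. caused by a repeat
        contig += nxt[-1]  # consecutive nodes overlap by all but one base
        seen.add(nxt)
        node = nxt
    return contig


reads = ["ACGTTGCA", "TTGCATGA", "CATGACCT"]
graph = de_bruijn_graph(reads, k=4)
print(walk_unambiguous(graph, "ACG"))  # ACGTTGCATGACCT
```

Because overlaps are found implicitly through shared k-mers rather than by comparing every pair of reads, this representation scales to the millions of short reads produced by the new instruments.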
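The N50 calculation in the glossary can be written as a short function. In this sketch (the function name is hypothetical), passing `genome_size` follows the glossary's definition and uses half the genome length as the threshold; when it is omitted, half of the total assembly length is used instead, which is the usual convention when the genome size is not known (the genome-based variant is sometimes reported as NG50). A contig counts as crossing the threshold when the running total reaches or exceeds it.

```python
from typing import Optional


def n50(contig_lengths: list[int], genome_size: Optional[int] = None) -> int:
    """Return the N50 of an assembly, or 0 if the threshold is never reached.

    Contigs are taken longest first; the N50 is the length of the contig whose
    addition makes the running total reach half of `genome_size` (as in the
    glossary) or, if `genome_size` is None, half of the total assembly length.
    """
    total = genome_size if genome_size is not None else sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer test for running >= total / 2
            return length
    return 0


lengths = [20, 100, 40, 80, 60]
print(n50(lengths))                   # 80  (100 + 80 = 180 >= 300 / 2)
print(n50(lengths, genome_size=400))  # 60  (100 + 80 + 60 = 240 >= 200)
```

The two calls show why the reference length matters: the same contigs give a lower value when judged against a genome larger than the assembled sequence.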
About this article
Cite this article
MacLean, D., Jones, J. & Studholme, D. Application of 'next-generation' sequencing technologies to microbial genetics. Nat Rev Microbiol 7, 96–97 (2009). https://doi.org/10.1038/nrmicro2088
DOI: https://doi.org/10.1038/nrmicro2088