Abstract
Areas of life sciences research that were previously distant from each other in ideology, analysis practices and toolkits, such as microbial ecology and personalized medicine, have all embraced techniques that rely on next-generation sequencing instruments. Yet our capacity to generate these data greatly outpaces our ability to analyse them. Sequencing technologies are now more mature and accessible than the methods available to individual researchers for moving, storing, analysing and presenting data in a transparent and reproducible way. Here we discuss pressing issues in the analysis, interpretation, reproducibility and accessibility of these data, present promising solutions and venture into potential future developments.
Acknowledgements
The authors are grateful for the support of the Galaxy Team (E. Afgan, G. Ananda, D. Baker, D. Blankenberg, D. Bouvier, D. Clements, N. Coraor, C. Eberhard, J. Goecks, J. Jackson, G. Von Kuster, R. Lazarus, R. Marenco and S. McManus). The visual analytics framework shown in Figure 1 has been built by J. Goecks. The authors' laboratories are supported by US National Institutes of Health grants HG005133, HG004909 and HG006620 and US National Science Foundation grant DBI 0850103. Additional funding is provided, in part, by the Huck Institutes for the Life Sciences at Penn State, the Institute for Cyberscience at Penn State and a grant with the Pennsylvania Department of Health using Tobacco Settlement Funds. The Department specifically disclaims responsibility for any analyses, interpretations or conclusions.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary information S1 (table)
A survey of papers using next-generation sequencing for variant detection (PDF 223 kb)
Supplementary information S2 (reference list)
A survey of analyses using the bwa mapper (PDF 115 kb)
Glossary
- Application-programming interfaces: specifications that define how different software components interact with each other. In the context of cloud computing, an application-programming interface defines how user-provided software interacts with the underlying cloud platform resources.
- Virtual machines: computing resources that appear to be physical machines with a defined computing environment but may be simulated on another computing platform. In the context of cloud computing, virtual machines can be provisioned on demand and then accessed over the Internet like any other machine.
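To make the two glossary terms concrete, here is a minimal sketch, not drawn from the article, that models a cloud application-programming interface as a Python abstract class and a toy provider that "provisions" virtual machines on demand. `CloudAPI`, `InMemoryCloud`, `Instance`, `run_analysis` and the image name are all hypothetical illustrations and do not correspond to any real cloud provider's interface.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
import itertools


@dataclass
class Instance:
    """A provisioned virtual machine as seen by user code (hypothetical model)."""
    instance_id: str
    image: str
    cpus: int
    state: str = "running"


class CloudAPI(ABC):
    """Hypothetical API contract: the only way user software touches cloud resources."""

    @abstractmethod
    def provision(self, image: str, cpus: int) -> Instance: ...

    @abstractmethod
    def terminate(self, instance_id: str) -> None: ...


class InMemoryCloud(CloudAPI):
    """Toy provider that simulates on-demand VMs without any real hardware."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.instances: dict[str, Instance] = {}

    def provision(self, image: str, cpus: int) -> Instance:
        if cpus < 1:
            raise ValueError("cpus must be positive")
        inst = Instance(f"vm-{next(self._ids)}", image, cpus)
        self.instances[inst.instance_id] = inst
        return inst

    def terminate(self, instance_id: str) -> None:
        # Mark the VM as released; a real provider would free the resources.
        self.instances[instance_id].state = "terminated"


def run_analysis(cloud: CloudAPI, image: str) -> str:
    """User code depends only on the CloudAPI interface, not on a specific provider."""
    vm = cloud.provision(image, cpus=4)
    try:
        return f"analysis ran on {vm.instance_id} ({vm.image}, {vm.cpus} CPUs)"
    finally:
        # Release the VM even if the analysis step fails.
        cloud.terminate(vm.instance_id)


if __name__ == "__main__":
    cloud = InMemoryCloud()
    print(run_analysis(cloud, "example-image"))
```

Because `run_analysis` only calls methods defined by the interface, the same user code could, in principle, run against any provider that implements it; swapping `InMemoryCloud` for another implementation would not require changing the analysis code.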
Cite this article
Nekrutenko, A., Taylor, J. Next-generation sequencing data interpretation: enhancing reproducibility and accessibility. Nat Rev Genet 13, 667–672 (2012). https://doi.org/10.1038/nrg3305
This article is cited by
- Genome-wide association study reveals that the IBSP locus affects ear size in cattle. Heredity (2023)
- A novel synthesis of two decades of microsatellite studies on European beech reveals decreasing genetic diversity from glacial refugia. Tree Genetics & Genomes (2023)
- Molecular epidemiology of antimicrobial-resistant Pseudomonas aeruginosa in a veterinary teaching hospital environment. Veterinary Research Communications (2023)
- Local ancestry and selection in admixed Sanjiang cattle. Stress Biology (2023)
- ARPEGGIO: Automated Reproducible Polyploid EpiGenetic GuIdance workflOw. BMC Genomics (2021)