  • Opinion

Next-generation sequencing data interpretation: enhancing reproducibility and accessibility

Abstract

Areas of life sciences research that were previously distant from each other in ideology, analysis practices and toolkits, such as microbial ecology and personalized medicine, have all embraced techniques that rely on next-generation sequencing instruments. Yet the capacity to generate the data greatly outpaces our ability to analyse them. Existing sequencing technologies are more mature and accessible than the methodologies that are available for individual researchers to move, store, analyse and present data in a fashion that is transparent and reproducible. Here we discuss currently pressing issues with analysis, interpretation, reproducibility and accessibility of these data, and we present promising solutions and venture into potential future developments.

Figure 1: A prototype visual analytics framework for next-generation sequencing analysis.

Acknowledgements

The authors are grateful for the support of the Galaxy Team (E. Afgan, G. Ananda, D. Baker, D. Blankenberg, D. Bouvier, D. Clements, N. Coraor, C. Eberhard, J. Goecks, J. Jackson, G. Von Kuster, R. Lazarus, R. Marenco and S. McManus). The visual analytics framework shown in Figure 1 has been built by J. Goecks. The authors' laboratories are supported by US National Institutes of Health grants HG005133, HG004909 and HG006620 and US National Science Foundation grant DBI 0850103. Additional funding is provided, in part, by the Huck Institutes for the Life Sciences at Penn State, the Institute for Cyberscience at Penn State and a grant with the Pennsylvania Department of Health using Tobacco Settlement Funds. The Department specifically disclaims responsibility for any analyses, interpretations or conclusions.

Author information

Corresponding authors

Correspondence to Anton Nekrutenko or James Taylor.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary information S1 (table)

A survey of papers utilizing variant detection with next generation sequencing (PDF 223 kb)

Supplementary information S2 (reference list)

A survey of analyses using bwa mapper (PDF 115 kb)

Related links

FURTHER INFORMATION

Anton Nekrutenko's homepage

1000 Genomes Project

1000 Genomes analysis description

1000 Genomes software tools

BioExtract

CloudMan

CloVR

Crossbow

dbGaP

dbSNP

Dryad

Figshare

Galaxy

Galaxy Tool Shed

GenePattern

GeneProf

GParc

HapMap

Mobyle

Myrna

Picard

SAMtools

Software Carpentry

XSEDE

Glossary

Application-programming interfaces

These define how different software components interact with each other. In the context of cloud computing, an application-programming interface defines how user-provided software interacts with the underlying cloud platform resources.

Virtual machines

Computing resources that appear to be physical machines with a defined computing environment but may be simulated on another computing platform. In the context of cloud computing, virtual machines can be provisioned on demand and then accessed over the Internet like any other machine.

About this article

Cite this article

Nekrutenko, A., Taylor, J. Next-generation sequencing data interpretation: enhancing reproducibility and accessibility. Nat Rev Genet 13, 667–672 (2012). https://doi.org/10.1038/nrg3305

  • DOI: https://doi.org/10.1038/nrg3305
