Advances in technology across all areas of science have ushered in an era of big data, providing researchers with unprecedented opportunities to understand how biological systems function and interact. Scientists are now faced with the challenge of developing sophisticated computational tools capable of unravelling these data and uncovering important biological signals. Computational biology will continue to play a key role in facilitating multi-disciplinary collaborations, encouraging data sharing and establishing experimental and analytical standards in the life sciences.

Produced with support from IBM Research and IBM Watson Health.

Nature Biotechnology Primers

What is flux balance analysis?

Jeffrey D Orth, Ines Thiele and Bernhard Ø Palsson


Nature Biotechnology 28, 245-248 (2010)

What is a support vector machine?

William S Noble


Nature Biotechnology 24, 1565-1567 (2006)

What is principal component analysis?

Markus Ringnér


Nature Biotechnology 26, 303-304 (2008)

How does gene expression clustering work?

Patrik D'haeseleer


Nature Biotechnology 23, 1499-1501 (2005)

How to map billions of short reads onto genomes

Cole Trapnell and Steven L Salzberg


Nature Biotechnology 27, 455-457 (2009)

What is a hidden Markov model?

Sean R Eddy


Nature Biotechnology 22, 1315-1316 (2004)

How does multiple testing correction work?

William S Noble


Nature Biotechnology 27, 1135-1137 (2009)

Where did the BLOSUM62 alignment score matrix come from?

Sean R Eddy


Nature Biotechnology 22, 1035-1036 (2004)

What are DNA sequence motifs?

Patrik D'haeseleer


Nature Biotechnology 24, 423-425 (2006)

How to apply de Bruijn graphs to genome assembly

Phillip E C Compeau, Pavel A Pevzner and Glenn Tesler


Nature Biotechnology 29, 987-991 (2011)

What is the expectation maximization algorithm?

Chuong B Do and Serafim Batzoglou


Nature Biotechnology 26, 897-899 (2008)

What are artificial neural networks?

Anders Krogh


Nature Biotechnology 26, 195-197 (2008)

How do RNA folding algorithms work?

Sean R Eddy


Nature Biotechnology 22, 1457-1458 (2004)

How do shotgun proteomics algorithms identify proteins?

Edward M Marcotte


Nature Biotechnology 25, 755-757 (2007)

Inference in Bayesian networks

Chris J Needham, James R Bradford, Andrew J Bulpitt and David R Westhead


Nature Biotechnology 24, 51-53 (2006)

What are decision trees?

Carl Kingsford and Steven L Salzberg


Nature Biotechnology 26, 1011-1013 (2008)

How does DNA sequence motif discovery work?

Patrik D'haeseleer


Nature Biotechnology 24, 959-961 (2006)

What is dynamic programming?

Sean R Eddy


Nature Biotechnology 22, 909-910 (2004)

What is Bayesian statistics?

Sean R Eddy


Nature Biotechnology 22, 1177-1178 (2004)

How to visually interpret biological data using networks

Daniele Merico, David Gfeller and Gary D Bader


Nature Biotechnology 27, 921-924 (2009)

SNP imputation in association studies

Eran Halperin and Dietrich A Stephan


Nature Biotechnology 27, 349-351 (2009)

Analyzing 'omics data using hierarchical models

Hongkai Ji and X Shirley Liu


Nature Biotechnology 28, 337-340 (2010)

How does eukaryotic gene prediction work?

Michael R Brent


Nature Biotechnology 25, 883-885 (2007)

Understanding genome browsing

Melissa S Cline and W James Kent


Nature Biotechnology 27, 153-155 (2009)

Maximizing power in association studies

Eran Halperin and Dietrich A Stephan


Nature Biotechnology 27, 255-256 (2009)


Nature Biotechnology Research Articles

Discovering microRNAs from deep sequencing data using miRDeep

Marc R Friedländer, Wei Chen, Catherine Adamidi, Jonas Maaskola, Ralf Einspanier, Signe Knespel and Nikolaus Rajewsky


Nature Biotechnology 26, 407-415 (2008)

Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences

Morgan G I Langille, Jesse Zaneveld, J Gregory Caporaso, Daniel McDonald, Dan Knights, Joshua A Reyes, Jose C Clemente, Deron E Burkepile, Rebecca L Vega Thurber, Rob Knight, Robert G Beiko and Curtis Huttenhower


Nature Biotechnology 31, 814-821 (2013)

Extracting a cellular hierarchy from high-dimensional cytometry data with SPADE

Peng Qiu, Erin F Simonds, Sean C Bendall, Kenneth D Gibbs, Robert V Bruggner, Michael D Linderman, Karen Sachs, Garry P Nolan and Sylvia K Plevritis


Nature Biotechnology 29, 886-891 (2011)

viSNE enables visualization of high dimensional single-cell data and reveals phenotypic heterogeneity of leukemia

El-ad David Amir, Kara L Davis, Michelle D Tadmor, Erin F Simonds, Jacob H Levine, Sean C Bendall, Daniel K Shenfeld, Smita Krishnaswamy, Garry P Nolan and Dana Pe'er


Nature Biotechnology 31, 545-552 (2013)

Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data

Joshua C Denny, Lisa Bastarache, Marylyn D Ritchie, Robert J Carroll, Raquel Zink, Jonathan D Mosley, Julie R Field, Jill M Pulley, Andrea H Ramirez, Erica Bowton, Melissa A Basford, David S Carrell, Peggy L Peissig, Abel N Kho, Jennifer A Pacheco, Luke V Rasmussen, David R Crosslin, Paul K Crane, Jyotishman Pathak, Suzette J Bielinski, Sarah A Pendergrass, Hua Xu, Lucia A Hindorff, Rongling Li, Teri A Manolio, Christopher G Chute, Rex L Chisholm, Eric B Larson, Gail P Jarvik, Murray H Brilliant, Catherine A McCarty, Iftikhar J Kullo, Jonathan L Haines, Dana C Crawford, Daniel R Masys and Dan M Roden


Nature Biotechnology 31, 1102-1111 (2013)

Crowdsourced analysis of clinical trial data to predict amyotrophic lateral sclerosis progression

Robert Küffner, Neta Zach, Raquel Norel, Johann Hawe, David Schoenfeld, Liuxia Wang, Guang Li, Lilly Fang, Lester Mackey, Orla Hardiman, Merit Cudkowicz, Alexander Sherman, Gokhan Ertaylan, Moritz Grosse-Wentrup, Torsten Hothorn, Jules van Ligtenberg, Jakob H Macke, Timm Meyer, Bernhard Schölkopf, Linh Tran, Rubio Vaughan, Gustavo Stolovitzky and Melanie L Leitner


Nature Biotechnology 33, 51-57 (2015)

Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning

Babak Alipanahi, Andrew Delong, Matthew T Weirauch and Brendan J Frey


Nature Biotechnology 33, 831-838 (2015)

MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification

Jürgen Cox and Matthias Mann


Nature Biotechnology 26, 1367-1372 (2008)

Full-length transcriptome assembly from RNA-Seq data without a reference genome

Manfred G Grabherr, Brian J Haas, Moran Yassour, Joshua Z Levin, Dawn A Thompson, Ido Amit, Xian Adiconis, Lin Fan, Raktima Raychowdhury, Qiandong Zeng, Zehua Chen, Evan Mauceli, Nir Hacohen, Andreas Gnirke, Nicholas Rhind, Federica di Palma, Bruce W Birren, Chad Nusbaum, Kerstin Lindblad-Toh, Nir Friedman and Aviv Regev


Nature Biotechnology 29, 644-652 (2011)

Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation

Cole Trapnell, Brian A Williams, Geo Pertea, Ali Mortazavi, Gordon Kwan, Marijke J van Baren, Steven L Salzberg, Barbara J Wold and Lior Pachter


Nature Biotechnology 28, 511-515 (2010)

Network-based prediction of human tissue-specific metabolism

Tomer Shlomi, Moran N Cabili, Markus J Herrgård, Bernhard Ø Palsson and Eytan Ruppin


Nature Biotechnology 26, 1003-1010 (2008)

Hybrid error correction and de novo assembly of single-molecule sequencing reads

Sergey Koren, Michael C Schatz, Brian P Walenz, Jeffrey Martin, Jason T Howard, Ganeshkumar Ganapathy, Zhong Wang, David A Rasko, W Richard McCombie, Erich D Jarvis and Adam M Phillippy


Nature Biotechnology 30, 693-700 (2012)