Collection

Computational Biology

Advances in technology across all areas of science have ushered in an era of big data, providing researchers with unprecedented opportunities to understand how biological systems function and interact. Scientists are now faced with the challenge of developing sophisticated computational tools capable of unravelling these data and uncovering important biological signals. Computational biology will continue to play a key role in facilitating multi-disciplinary collaborations, encouraging data sharing and establishing experimental and analytical standards in the life sciences.


Flux balance analysis is a mathematical approach for analyzing the flow of metabolites through a metabolic network. This primer covers the theoretical basis of the approach, several practical examples and a software toolbox for performing the calculations.

Primer | Nature Biotechnology

Support vector machines (SVMs) are becoming popular in a wide variety of biological applications. But what exactly are SVMs, and how do they work? And what are their most promising applications in the life sciences?

Primer | Nature Biotechnology

Statistical models called hidden Markov models are a recurring theme in computational biology. What are hidden Markov models, and why are they so useful for so many different problems?

Primer | Nature Biotechnology
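
As a rough illustration of the kind of inference a hidden Markov model supports, the sketch below decodes the most probable sequence of hidden states for a DNA string with the Viterbi algorithm. The two-state model (AT-rich versus GC-rich regions) and all of its probabilities are made-up assumptions chosen for this example; they do not come from the primer.

```python
import math

# Hypothetical toy model: two hidden states emitting nucleotides.
STATES = ("AT_rich", "GC_rich")
START_P = {"AT_rich": 0.5, "GC_rich": 0.5}
TRANS_P = {
    "AT_rich": {"AT_rich": 0.9, "GC_rich": 0.1},
    "GC_rich": {"AT_rich": 0.1, "GC_rich": 0.9},
}
EMIT_P = {
    "AT_rich": {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1},
    "GC_rich": {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4},
}


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most probable state path, its log probability)."""
    if not observations:
        raise ValueError("observations must not be empty")

    # Log space avoids numerical underflow on long sequences.
    first = observations[0]
    scores = [{s: math.log(start_p[s]) + math.log(emit_p[s][first])
               for s in states}]
    back = [{}]  # back[t][s] = best predecessor of state s at position t

    for obs in observations[1:]:
        prev = scores[-1]
        column, pointers = {}, {}
        for s in states:
            best_prev = max(
                states, key=lambda p: prev[p] + math.log(trans_p[p][s])
            )
            column[s] = (prev[best_prev]
                         + math.log(trans_p[best_prev][s])
                         + math.log(emit_p[s][obs]))
            pointers[s] = best_prev
        scores.append(column)
        back.append(pointers)

    # Trace back from the best final state.
    last = max(states, key=lambda s: scores[-1][s])
    path = [last]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, scores[-1][last]


path, log_prob = viterbi("AAAAGGGG", STATES, START_P, TRANS_P, EMIT_P)
print(path)
```

With these assumed parameters, the decoded path switches from the AT-rich state to the GC-rich state where the sequence composition changes. The same dynamic-programming idea carries over to other hidden-state labelling problems.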

Sequence motifs are becoming increasingly important in the analysis of gene regulation. How do we define sequence motifs, and why should we use sequence logos instead of consensus sequences to represent them? Do they have any relation with binding affinity? How do we search for new instances of a motif in this sea of DNA?

Primer | Nature Biotechnology
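
One way to see what a consensus sequence hides is to compute the per-position information content that a sequence logo displays. The sketch below does this for a small set of aligned sites; the example sites are invented for illustration, and the calculation omits the small-sample correction that logo tools often apply.

```python
import math
from collections import Counter


def column_information(sites, alphabet="ACGT"):
    """Information content (bits) of each column of aligned sites.

    For DNA the maximum is log2(4) = 2 bits; a column's value is that
    maximum minus the Shannon entropy of its observed base frequencies.
    No small-sample correction is applied (a simplifying assumption).
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")

    max_bits = math.log2(len(alphabet))
    info = []
    for i in range(length):
        counts = Counter(site[i] for site in sites)
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log2(c / total)
                       for c in counts.values())
        info.append(max_bits - entropy)
    return info


def consensus(sites):
    """Most frequent base at each position."""
    return "".join(
        Counter(site[i] for site in sites).most_common(1)[0][0]
        for i in range(len(sites[0]))
    )


# Hypothetical aligned binding sites, invented for this example.
SITES = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]

print(consensus(SITES))
print([round(bits, 2) for bits in column_information(SITES)])
```

The consensus string treats every position as equally certain, whereas the per-column values show that the third and fourth positions are less conserved than the rest.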

Artificial neural networks have been applied to problems ranging from speech recognition to prediction of protein secondary structure, classification of cancers and gene prediction. How do they work and what might they be good for?

Primer | Nature Biotechnology

Programs such as MFOLD and ViennaRNA are widely used to predict RNA secondary structures. How do these algorithms work? Why can't they predict RNA pseudoknots? How accurate are they, and will they get better?

Primer | Nature Biotechnology

Bayesian networks are increasingly important for integrating biological data and for inferring cellular networks and pathways. What are Bayesian networks and how are they used for inference?

Primer | Nature Biotechnology

Decision trees have been applied to problems such as assigning protein function and predicting splice sites. How do these classifiers work, what types of problems can they solve and what are their advantages over alternatives?

Primer | Nature Biotechnology

How can we computationally extract an unknown motif from a set of target sequences? What are the principles behind the major motif discovery algorithms? Which of these should we use, and how do we know we've found a 'real' motif?

Primer | Nature Biotechnology

Only a subset of single-nucleotide polymorphisms (SNPs) can be genotyped in genome-wide association studies. Imputation methods can infer the alleles of 'hidden' variants and use those inferences to test the hidden variants for association.

Primer | Nature Biotechnology

Only a subset of genetic variants can be examined in genome-wide surveys for genetic risk factors. How can a fixed set of markers account for the entire genome by acting as proxies for neighboring associations?

Primer | Nature Biotechnology
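
How well one marker can stand in for a neighbor is commonly summarized by the squared correlation r² between the two sites. The sketch below computes r² from phased haplotypes with alleles coded 0 and 1; the input format and the example data are assumptions made for illustration, not a description of any particular study's pipeline.

```python
def r_squared(haplotypes):
    """Squared correlation (r^2) between two biallelic sites.

    `haplotypes` is a list of (a, b) pairs, one per chromosome, where
    a and b are the alleles (0 or 1) carried at the two sites.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("at least one haplotype is required")

    p_a = sum(a for a, _ in haplotypes) / n           # freq. of allele 1, site A
    p_b = sum(b for _, b in haplotypes) / n           # freq. of allele 1, site B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n

    d = p_ab - p_a * p_b                              # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom


# Hypothetical data: the first pair of sites always travels together,
# the second pair is statistically independent.
linked = [(0, 0), (1, 1), (0, 0), (1, 1)]
unlinked = [(0, 0), (0, 1), (1, 0), (1, 1)]

print(r_squared(linked), r_squared(unlinked))
```

An r² near 1 means the genotyped marker is a near-perfect proxy for its neighbor, while a value near 0 means it carries almost no information about it.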

RNA-Seq enables rapid sequencing of total cellular RNA and should allow the reconstruction of spliced transcripts in a cell population. Trapnell et al. achieve both transcript reconstruction and quantification using only paired-end RNA-Seq data and an unannotated genome sequence, and apply the approach to characterize isoform switching over a developmental time course.

Letter | Nature Biotechnology

Metabolic network modeling in multicellular organisms is confounded by the existence of multiple tissues with distinct metabolic functions. By integrating a genome-scale metabolic network with tissue-specific gene- and protein-expression data, Shlomi et al. adapt constraint-based approaches used for microorganisms to predict metabolism in ten human tissues. Their computational approach should facilitate interpretation of expression data in the context of metabolic disorders.

Analysis | Nature Biotechnology

Single-molecule sequencing technologies can produce multikilobase-long reads, which are more useful than short reads for assembling genomes and transcriptomes, but their error rates are too high. Koren et al. correct long reads from a PacBio instrument using high-fidelity, short reads from complementary technologies, facilitating assembly of previously intractable sequences.

Article | Nature Biotechnology

From the archives

The Bowtie 2 software achieves fast, sensitive, accurate and memory-efficient gapped alignment of sequencing reads using the full-text minute index (FM-index) and hardware-accelerated dynamic programming algorithms.

Brief Communication | Nature Methods

Despite the need for new psychoactive drugs, there are few robust approaches for discovering novel neuroactive molecules. Development of a behavior-based high-throughput screen in zebrafish led to the discovery of molecules with neurological effects. Translating the complex behavioral phenotypes elicited by compounds into a simple barcode enabled identification of their mechanism of action.

Article | Nature Chemical Biology

Eukaryotic genomes do not exist in vivo as naked DNA, but in complexes known as chromatin. Chromatin contains nucleosomes, short stretches of DNA tightly wrapped around a histone protein core, which exclude most DNA binding proteins and so act as repressors. A combined computational and experimental approach has been used to determine DNA sequence preferences of nucleosomes and to predict genome-wide nucleosome organization. The yeast genome encodes an intrinsic nucleosome organization that explains about half of the in vivo nucleosome positions. Highly conserved across eukaryotes, the code directs transcription factors to their binding sites and facilitates many other specific chromosome functions. An accompanying News and Views piece discusses the role of DNA sequence and other regulators in nucleosome positioning. The cover graphic represents a stretch of chromatin including several nucleosomes.

Article | Nature

A natural polypeptide chain can fold into a native protein in microseconds, but predicting such stable three-dimensional structure from any given amino-acid sequence and first physical principles remains a formidable computational challenge. Aiming to recruit human visual and strategic powers to the task, Seth Cooper, David Baker and colleagues turned their 'Rosetta' structure-prediction algorithm into an online multiplayer game called Foldit, in which thousands of non-scientists competed and collaborated to produce a rich set of new algorithms and search strategies for protein structure refinement. The work shows that even computationally complex scientific problems can be effectively crowd-sourced using interactive multiplayer games.

Letter | Nature

The analysis of protein-interaction networks is essential to an understanding of the regulatory processes in a living cell. Many methods have been developed with a view to predicting protein–protein interactions (PPIs) at a genome-wide level, although the differences obtained using these approaches suggest that there are still factors unaccounted for. Barry Honig and colleagues have developed a new way of predicting PPIs that is based on the proteins' three-dimensional structures and functional data. Tests of several predictions of the new algorithm, known as PREPPI, confirm the accuracy of the results.

Letter | Nature

Our ability to multitask and our capacity for cognitive control decline linearly as we age. A new study shows that cognitive training can help repair this decline. In older adults aged between 60 and 85 who trained at home by playing NeuroRacer, a custom-designed 3D video game, both multitasking and cognitive control improved, with effects persisting for six months. The benefits of this training extended to untrained cognitive functions such as sustained attention and working memory. These findings suggest that the ageing brain may be more robustly plastic than previously thought, allowing for cognitive enhancement using appropriately designed strategies.

Letter | Nature

Owen Rackham, Jose Polo, Julian Gough and colleagues present a method, Mogrify, for predicting sets of transcription factors that can induce transdifferentiation between cell types. They show that Mogrify is able to predict known factors for published cell conversions and experimentally validate factors for two new conversions.

Letter | Nature Genetics

Identifying molecular predictors of effective vaccination is an important clinical and technical goal. Pulendran and colleagues use a systems biology approach to study human responses to vaccination against influenza and determine the correlates of immunogenicity.

Resource | Nature Immunology

The authors examined neuronal responses in V1 and V2 to synthetic texture stimuli that replicate higher-order statistical dependencies found in natural images. V2, but not V1, responded differentially to these textures, in both macaque (single neurons) and human (fMRI). Human detection of naturalistic structure in the same images was predicted by V2 responses, suggesting a role for V2 in representing natural image structure.

Article | Nature Neuroscience

The authors develop a new method to mine genomic cancer data to uncover complex indels. These simultaneous deletions and insertions have been overlooked by previous sequencing data analysis methods, and the Pindel-C algorithm uncovers new information about their potential contribution to tumorigenesis.

Analysis | Nature Medicine