The scale and complexity of genetic and genomic data are ever-expanding, requiring biologists to apply increasingly sophisticated computational tools in the analysis, interpretation and storage of these data. This series contains articles that focus on the application of these software tools in genetics and genomics.
In this Review, the authors provide an overview of key algorithmic developments, popular tools and emerging technologies used in the bioinformatic analysis of genomes. They also describe how such analysis can identify point mutations, copy number alterations, structural variations and mutational signatures in cancer genomes.
Machine learning is widely applied in various fields of genomics and systems biology. In this Review, the authors describe how responsible application of machine learning requires users to understand and mitigate several common pitfalls in order to avoid unreliable results.
In this Review, the authors discuss computational methods for interpreting the molecular and clinical effects of genetic variants. They focus on methods leveraging machine learning, including those that characterize the effects on wider molecular networks.
The interactions between tumours and the immune system are highly complex. This article discusses methods — primarily computational tools — for characterizing diverse aspects of cancer–immune cell interactions, including antigen presentation, T cell repertoires and heterogeneity in cell types and cell states. The Review particularly highlights the insights from single-cell data from both sequencing technologies and in situ imaging of tissues.
The repetitive nature of transposable elements (TEs) creates bioinformatic challenges that frequently result in them being disregarded (‘masked’) in analyses. As physiological and pathological roles for TEs become increasingly appreciated, this Review discusses bioinformatics tools dedicated to TE analysis, including for genomic annotation, TE classification, identifying polymorphisms and assessing likely functional impacts.
Fine-mapping is the process by which a trait-associated region from a genome-wide association study (GWAS) is analysed to identify the particular genetic variants that are likely to causally influence the examined trait. This Review discusses the diverse statistical approaches to fine-mapping and their foundations, strengths and limitations, including integration of trans-ethnic human population data and functional annotations.
Various genomics-related fields are increasingly taking advantage of long-read sequencing and long-range mapping technologies, but making sense of the data requires new analysis strategies. This Review discusses bioinformatics tools that have been devised to handle the numerous characteristic features of these long-range data types, with applications in genome assembly, genetic variant detection, haplotype phasing, transcriptomics and epigenomics.
Next-generation sequencing technologies have fuelled a rapid rise in data, which require vast computational resources to store and analyse. This Review discusses the role of cloud computing in genomics research to facilitate data sharing and new analyses of archived sequencing data, as well as large-scale international collaborations.
Cancer immunotherapies are promising strategies for cancer treatment. However, their optimized use will require a comprehensive understanding of the diverse cell types, antigens and genetic variants (both germline and somatic) that comprise the tumour–immune system interface. This Review discusses various bioinformatics tools that process multi-level omics data for insights into tumour–immune cell interactions.
Computer simulation of next-generation sequencing data can be extremely useful for assessing and validating biological models, benchmarking sequence analysis tools or gaining an understanding of specific data sets. Here, the authors review the functionality, requirements and applications of 23 currently available simulation tools and provide a guide for the selection of the most appropriate one.
With biomedical datasets growing exponentially in size and number, efforts to increase their utility and availability are essential, but much work remains to maximize exploitability. This Review summarizes trends, developments and future perspectives in the rapidly advancing field of human genotype–phenotype databases.
The rapid accumulation and increasing quality of human DNA sequence-variation data brought about by advances in genome-scale sequencing present opportunities to investigate human evolution. The authors discuss the statistical methods and models that can be used to gain insight into the evolution of human populations from analyses of large-scale genomic data sets, as well as the challenges associated with these approaches.
The field of cancer genomics has been transformed by recent advances in sequencing and the development of new computational methods. This Review outlines the available cancer genomics software and describes recent insights gained from the application of these tools.
Functional interactions between proteins and within proteins result in co-evolutionary signatures in amino acid sequences that serve as clues to various forms of interdependence. This Review discusses the principles and distinctions of the large range of computational tools to analyse protein co-evolution and the biological insight that they are providing.
As the use of next-generation sequencing has proliferated, so has the range of sequencing applications and software tools that are available for assembling sequences. To help readers to make informed choices about assembly techniques, this Review discusses the available options and practical trade-offs.
Text mining — retrieving information from papers and databases — is increasingly used in data-rich fields such as genomics, systems biology and biomedical research. This Review discusses recent tools that can aid researchers and sets out the potential of enhancing integrative research using text mining.
The analysis and interpretation of genome-wide DNA methylation data pose unique bioinformatics challenges. In this article, the tools that are available for processing, visualizing and interpreting these epigenetic data sets are discussed, and the relative advantages of various methods are considered.
Phylogenetic analysis is pervading every field of biological study. The authors review and assess the main methods of phylogenetic analysis — including parsimony, distance, likelihood and Bayesian methods — and provide guidance for selecting the most appropriate approach and software package.
Although genome sequencing is becoming routine, genome annotation is becoming increasingly challenging. The authors provide an overview of the steps and software tools that are available for annotating eukaryotic genomes, and describe the best practices for sharing, quality checking and updating the annotation.
Computer simulations can be valuable components of studies in many fields, including population genetics, evolutionary biology, genetic epidemiology and ecology. The recent increase in the available range of software packages is now making simulation an accessible option for researchers with limited bioinformatics experience.
Repeat sequences in DNA remain one of the most challenging aspects of next-generation sequencing data analysis and interpretation. This Review explains the problems and current strategies for handling repeats; ignoring repeats risks missing important biological information.
Studies of the composition, dynamics and function of the human microbiome have taken off in the past two years thanks to the development of new sequencing technologies and advanced algorithms. This article provides a guide to the experimental and analytical best practices in this flourishing field.
Systems biology is intrinsically reliant on software tools and data resources. Through looking at each stage in a systems biology workflow, this Review presents the available options and key challenges, and sets out the concept of an integrated software platform.
Advances in sequencing technologies, assembly algorithms and computing power are making it feasible to assemble the entire transcriptome from short RNA reads. This article reviews transcriptome assembly strategies, their advantages and limitations, and how to apply them effectively.
The recent surge in sequencing output has uncovered a wealth of genetic variation, but interpretation of these data remains a challenge. This Review discusses computational and experimental methods for estimating the deleteriousness and functional significance of genetic variants to better identify those that are potentially causal for disease.
This article gives an overview of the steps required to convert next-generation sequencing (NGS) data into accurately called SNPs and genotypes, a process that is crucial for the many downstream analyses of NGS data.
Structural variation in the genome can influence disease, complex traits and evolution, but comprehensive characterization of variants is challenging. This Review compares current methods — particularly microarray platforms and sequencing-based computational analysis — and considers future research strategies.
This Review describes the different types of computational environments — such as cloud and heterogeneous computing — that are increasingly being used by life scientists to manage and analyse large multidimensional data sets.
A huge range of genome-scale data sets — including genomic, epigenomic and transcriptomic information — are now available, and it is widely acknowledged that combining several data sets can provide important biological insights. However, there are practical, conceptual and computational challenges to data integration.