This Review outlines a broad, universal framework for systems biology applied to infectious disease research. From study design and omics data collection, analysis, visualization and interpretation to translational outcomes, the authors illustrate how systems biology can provide insights into host–pathogen relationships for the betterment of human health.
Good experiments start with good planning. The analysis of genetic and molecular information involves increasingly sophisticated tools and methodologies, many of which are advancing at a fast pace. The articles in this series provide detailed advice on the most appropriate and up-to-date means for designing experiments, with the aim of helping researchers to maximise the value of their data.
The increased adoption of DNA sequencing in genetic association studies is uncovering a wide range of population genetic variation, including rare genetic variants. Although this rarity limits the statistical power of associating individual rare variants with phenotypes, this Review discusses the diverse methods for leveraging the collective effects of rare variants in order to uncover important roles in complex traits, particularly human diseases.
Various types of observational studies can provide statistical associations between factors, such as between an environmental exposure and a disease state. This Review discusses the various genetics-focused statistical methodologies that can move beyond mere associations to identify (or refute) various mechanisms of causality, with implications for responsibly managing risk factors in health care and the behavioural and social sciences.
The rapid development of CRISPR-based gene manipulation has enabled various approaches for high-throughput functional genomics. This Review guides users through the practicalities of CRISPR-based functional genomics screens, including study design options, best-practice approaches, pitfalls to avoid and data analysis strategies.
Advances in genome sequencing, editing and synthetic biology have enhanced the feasibility of large-scale genome engineering, termed genome writing. In this Opinion article, Chari and Church discuss the strengths and limitations of diverse strategies for genome writing, including extensively modifying existing genomes versus synthesizing genomes de novo, and they provide future visions for writing large genomes.
Although it has been known for decades that RNA is subjected to numerous covalent modifications, there has been a recent surge in interest driven by sequencing-based transcriptome-wide detection methods and the realization that RNA modifications have important roles in diverse biological processes. This Review describes the range of detection strategies for RNA modifications, their particular strengths and limitations, and how responsible and complementary application of these techniques will be required to ensure the quality and interpretability of the rapidly accumulating data sets.
The use of phylogenetics in cancer genomics is increasing owing to a growing appreciation of the importance of evolutionary theory to cancer progression. The authors provide guidance on the design and analysis of tumour phylogeny studies by surveying the range of phylogenetic methods and tools available to the cancer researcher and discussing their key applications and the unsolved problems in the field.
To characterize the genetic underpinnings of speciation, genome scans can identify genomic regions that differ between divergent populations of wild organisms. In this Review, Wolf and Ellegren describe the methodological details of these approaches and how genomic islands of differentiation should be interpreted cautiously in the search for 'speciation genes'. They also discuss methodological best practice that takes into consideration genomic differentiation occurring through speciation-independent evolutionary processes.
Loss-of-function (LOF) approaches are powerful experimental tools for characterizing gene functions. However, emerging discrepancies when genes are investigated using different tools or organisms have triggered debate about how such LOF results should be biologically interpreted. In this Review, experts from varied fields discuss how understanding the underlying features of each LOF approach can provide explanations for different experimental outcomes and can guide their optimal and reliable application.
A genome sequence is only useful once the information encoded in it can be deciphered. In this Review, Mudge and Harrow describe the latest approaches to higher eukaryote gene annotation, including making the best use of complex transcriptome data sets, integrating evidence for functionality and extending annotations to encompass regulatory features.
Considerable resources are required to gain maximal insights into the diverse big data sets in biomedicine. In this Review, the authors discuss how crowdsourcing, in the form of collaborative competitions (known as Challenges), can engage the scientific community to provide the diverse expertise and methodological approaches that can robustly address some of the most pressing questions in genetics, genomics and biomedical sciences.
Phenotypic data from electronic health records and epidemiological studies can be exploited to study the impact of genetic variation on the phenome. This Review highlights challenges that need to be overcome for the characterization of the complex human genome–phenome relationship using phenome-wide association studies (PheWAS).
Technical differences between the many variant methods that are based on restriction site-associated DNA sequencing (RADseq) lead to trade-offs in experimental design and analysis. Here, the authors comprehensively review the various RADseq approaches and provide general considerations for designing a RADseq study.
Directed evolution uses laboratory-based evolution to enhance the properties of biomolecules, primarily to generate proteins with optimized or novel activities. This Review discusses the diverse range of technologies for the directed evolution of proteins, particularly methods for generating diversity in the gene library and approaches for screening and selecting for variants with desired properties. The relative strengths and limitations of these approaches are highlighted to guide readers to appropriate strategies.
Before genome-wide association studies, linkage analysis was the primary approach used for genetic mapping of complex traits in humans. Now, with the widespread application of whole-genome sequencing (WGS), linkage analysis based on WGS data is emerging as a useful tool for the identification of susceptibility genes for human disease. This Review reiterates the main principles of linkage analysis and provides guidelines for performing linkage analysis on WGS data.
Next-generation sequencing methods can be used to examine features of chromatin biology, although the outputs of these methods can be subject to various potential biases. This Review describes the ways in which biases can be introduced to such experiments and outlines methods to detect and mitigate their effect.
This Review discusses the principles and applications of significance testing and power calculation, including recently proposed gene-based tests for rare variants.
Heritability estimates provide a useful means of understanding the genetic and environmental contributions to phenotypic variance. The authors define heritability, discuss how to estimate and interpret it in the context of disease and examine how biases in heritability estimates arise.
In addition to somatic mutations in tumours, inherited genetic variants can influence how a patient with cancer responds to drug treatment. This Review considers best practice in design and analysis for pharmacogenomic studies to identify such variants, potentially leading to personalized oncology.
ChIP–seq and beyond: new and improved methodologies to detect and characterize protein–DNA interactions
This Review discusses recent improvements to ChIP–seq and a range of complementary techniques, such as DNaseI hypersensitivity mapping, for studying protein–DNA interactions. Functional characterization of protein binding can be improved by methods analysing chromatin conformation or allele-specific binding.
Revealing genetic influences on metabolic phenotypes is important in further understanding the aetiology of many complex diseases. Here, the authors introduce study design considerations and applications for genome-wide association studies with metabolic traits.
Twin studies have long been used for dissecting the relative contributions of genetics and other factors to various phenotypes. This Review discusses how these traditional studies are now being integrated with modern omics technologies to provide a wide range of biological insights.
Biological processes are inherently dynamic and therefore capturing data about gene expression at multiple time points can provide valuable insights into biological systems. This Review discusses experimental and analytical considerations for studies of gene expression dynamics, and the possibilities for integration with other data sets.
Phylogenetic analysis is pervading every field of biological study. The authors review and assess the main methods of phylogenetic analysis — including parsimony, distance, likelihood and Bayesian methods — and provide guidance for selecting the most appropriate approach and software package.
This Review describes how genome-wide analyses have provided various insights into the most lethal malarial parasite, Plasmodium falciparum, including determinants of antimalarial drug resistance. The authors also propose how genetic tools can be refined to monitor future therapeutic interventions.
Although genome sequencing is becoming routine, genome annotation is becoming increasingly challenging. The authors provide an overview of the steps and software tools that are available for annotating eukaryotic genomes, and describe the best practices for sharing, quality checking and updating the annotation.
Computer simulations can be valuable components of studies in many fields, including population genetics, evolutionary biology, genetic epidemiology and ecology. The recent increase in the available range of software packages is now making simulation an accessible option for researchers with limited bioinformatics experience.
Studies of the composition, dynamics and function of the human microbiome have taken off in the past two years thanks to the development of new sequencing technologies and advanced algorithms. This article provides a guide to the experimental and analytical best practices in this flourishing field.
Systems biology is intrinsically reliant on software tools and data resources. Through looking at each stage in a systems biology workflow, this Review presents the available options and key challenges, and sets out the concept of an integrated software platform.
Although many studies claim to have detected an adaptive allele, this label is not always applied rigorously. The authors argue that obtaining direct evidence that specific alleles are adaptive requires approaches which functionally connect genotype, phenotype and fitness.
Exome sequencing is a powerful approach for accelerating the discovery of the genes underlying Mendelian disorders and, increasingly, of genes underlying complex traits. This Review describes the experimental and analytical options for applying exome sequencing and the key challenges in using this approach.
Advances in sequencing technologies, assembly algorithms and computing power are making it feasible to assemble the entire transcriptome from short RNA reads. The article reviews the transcriptome assembly strategies, their advantages and limitations and how to apply them effectively.
The authors review the experimental and computational approaches for determining haplotype phase, focusing on statistical methods, the factors that influence the strategy used and the value of using information on identity-by-descent.
The increased genetic diversity in populations with recent ancestry from more than one continent may help in the identification of genetic variants underlying disease risk. This Progress article discusses recent developments in methods to study complex traits in these admixed populations, including combining SNP and admixture association signals.
Technological advances now allow large-scale studies of human disease-associated epigenetic variation, specifically variation in DNA methylation. Such epigenome-wide association studies present novel opportunities, but, as discussed here, they also create new challenges that are not encountered in genome-wide association studies.
The authors describe the best practices for a growing number of methods that use next-generation sequencing to rapidly discover and assess genetic markers across any genome, with applications from population genomics and quantitative trait locus mapping to marker-assisted selection.
An overview of the steps required in converting next-generation sequencing (NGS) data into accurate called SNPs and genotypes, a process that is crucial for the many downstream analyses of NGS data.
Prediction of genetic values using whole-genome markers has been successfully applied in commercial breeding. This article outlines the use of this method for predicting health-related outcomes in humans.
There is increasing interest in investigating the influence of rare variants on common diseases, aided by high-throughput sequencing. However, the statistical approaches that are essential for analysing associations between rare variants and traits of interest are urgently in need of evaluation and refinement.
Identity by descent (IBD) — the probability that two alleles descended from a common ancestor — is used in fundamental applications such as gene mapping and estimating heritability. The authors offer a solution to the confusion between IBD and identity by state (IBS) that is caused by the common practice of using dense SNPs to estimate IBD.
This article reviews the increasing range of genome-scale methods that are being used to analyse eukaryotic DNA replication. Studies in different species and of replication timing or origin location have yielded varying degrees of success; technical hurdles remain, but important biological insights have been gained.
Cancer is fundamentally a disease of the genome and so high-throughput sequencing technologies offer great potential for improving our understanding of the biology and treatment of cancer. Experimental strategies, computational approaches and cancer-specific considerations for detecting different types of genomic alterations are discussed.
Despite the yield of genome-wide association studies, the variants identified explain little of the heritability of most complex diseases. This unexplained heritability could be partly due to gene–environment (G×E) interactions. This Review provides a guide to designs and analytical approaches for studying specific G×E interactions.
Mapping DNA methylation is vital for understanding the importance of this epigenetic mark in health and disease. Recent years have seen rapid progress in the development of techniques for genome-scale methylation profiling; this Review introduces and evaluates the available methods.
Genome-wide association studies are not widespread in Africa, partly because of the challenges of dealing with population structure and high genomic diversity. New approaches in statistical imputation and whole-genome sequencing are now set to exploit these features for fine mapping causal variants.
There is an increasing demand for next-generation sequencing technologies that rapidly deliver high volumes of accurate genome information at a low cost. This Review provides a guide to the features of the different platforms, and describes the recent advances in this fast-moving area.
Bayesian analyses are increasingly being used in genetics, particularly in the context of genome-wide association studies. This article provides a guide to using Bayesian analyses for assessing single-SNP associations and highlights the advantages of these methods compared with standard frequentist analyses.
Coupling next-generation sequencing to chromatin immunoprecipitation has transformed the resolution and genomic coverage of DNA-binding protein and nucleosome mapping studies. However, successful ChIP–seq requires careful consideration of the experimental and analytical approaches; this Review evaluates the current strategies and challenges.
Uncovering the genetic determinants of individual variation in gene expression in humans can improve our understanding of gene regulation and help to identify disease risk alleles. Further advances might be achieved by testing under different conditions, by using larger sample sizes or through network analysis.
Advances in mass spectrometry-based proteomics have led to an increasing use of proteomics data for the analysis of mutant phenotypes. Integrating this proteomic information with genomics and phenomics data into networks represents a promising route for modelling how phenotypes emerge.
Microfluidic 'lab-on-a-chip' devices can be used to study the dynamics of gene networks in single cells. This Review discusses the various designs of these devices and the insights into modelling the complex dynamics of gene regulation that these new technologies have provided.
FST describes the processes that lead to genetic differentiation among and within populations and is widely used in population and evolutionary genetics. This article describes the meaning of FST and how it should be estimated and interpreted.
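As a rough illustration of the quantity named above, one common textbook definition is Wright's FST = (HT - HS) / HT for a single biallelic locus, where HT is the expected heterozygosity of the pooled population and HS is the weighted mean expected heterozygosity within subpopulations. The sketch below assumes known allele frequencies and ignores sampling error. The function name and the `weights` parameter are hypothetical choices for this example and are not taken from the article, which may favour other estimators.

```python
def fst_biallelic(p_subpops, weights=None):
    """Wright's F_ST = (H_T - H_S) / H_T for one biallelic locus.

    p_subpops: allele frequency of one allele in each subpopulation.
    weights:   relative subpopulation sizes (assumed equal if omitted).
    """
    n = len(p_subpops)
    if weights is None:
        weights = [1.0 / n] * n

    # Pooled allele frequency and total expected heterozygosity.
    p_bar = sum(w * p for w, p in zip(weights, p_subpops))
    h_t = 2.0 * p_bar * (1.0 - p_bar)

    # Weighted mean expected heterozygosity within subpopulations.
    h_s = sum(w * 2.0 * p * (1.0 - p) for w, p in zip(weights, p_subpops))

    if h_t == 0.0:
        raise ValueError("locus is monomorphic in the pooled population; F_ST is undefined")
    return (h_t - h_s) / h_t
```

With identical frequencies in every subpopulation the function returns 0 (no differentiation), and with alternative alleles fixed in two equally weighted subpopulations it returns 1.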
The limited success of many human complex disease studies is often attributed to the existence of interactions between loci. This article reviews and assesses the methods and software packages that have been developed to detect these gene-by-gene interactions.
Genome-wide association studies have identified many promising links between genetic variants and human traits. However, the steps from the initial identification of associated markers to the reliable validation of the causal variant are long and tortuous, as the authors describe.
Mapping genetic variants that cause changes in transcript levels is a new tool that can give insight into the biology of disease risk loci identified by genome-wide association studies; here the potential power and technical challenges of this approach are discussed.
A realistic understanding of how a biological system arises from interactions between its parts increasingly depends on quantitative mathematical and statistical modelling. This Review explains how statistical inferences and stochastic modelling are the best tools we have for describing heterogeneous biological systems.
The development of high-throughput DNA sequencing methods provides a new method for mapping and quantifying transcriptomes — RNA sequencing (RNA-Seq). This article explains how RNA-Seq works, the challenges it faces and how it is changing our view of eukaryotic transcriptomes.
RNAi, a common gene knockdown technique, has been widely used in a variety of genetic screens. As part of our 'art and design of genetic screens' series, the authors discuss RNAi assay design and analytical approaches for large-scale screening experiments in cells and whole-animal experiments.
Genome-wide association studies have led to an improved understanding of the genetic basis of common diseases. Following the first wave of such studies, this Review takes a critical look at progress so far and considers how future studies can be optimized.