Needles in stacks of needles: finding disease-causal variants in a wealth of genomic data

Cooper, Gregory M.; Shendure, Jay

doi:10.1038/nrg3046

Review Article
Published: 18 August 2011

Nature Reviews Genetics volume 12, pages 628–640 (2011)

Key Points

Genome and exome sequencing yield extensive catalogues of genetic variation in many individuals, but purely genetic approaches are often insufficiently powered to specifically identify the few variants that are causally related to any given phenotype. Indeed, variant interpretation is an increasingly important challenge at the interface of genetics, statistics and biology.
Addressing this challenge will require non-uniform estimates of the prior probability that each variant is biologically functional. For disease studies, this translates into the need to estimate variant deleteriousness.
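The role of a non-uniform prior can be illustrated with Bayes' theorem in odds form, where the posterior odds equal the prior odds multiplied by the Bayes factor (the likelihood ratio supplied by the genetic data). The sketch below is a minimal illustration, not a method from the review: the 20,000 candidate variants, the Bayes factor of 100 and the tenfold prior boost for a variant predicted to be deleterious are all hypothetical numbers.

```python
def posterior_probability(prior: float, bayes_factor: float) -> float:
    """Posterior probability that a variant is causal, via Bayes' theorem in odds form.

    prior: prior probability (strictly between 0 and 1) that the variant is causal.
    bayes_factor: likelihood ratio of the genetic data under 'causal' vs 'not causal'.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical numbers, for illustration only.
n_candidates = 20_000          # assumed number of candidate variants, e.g. in one exome
uniform_prior = 1.0 / n_candidates
bayes_factor = 100.0           # assumed strength of the genetic evidence

# Uniform prior: every candidate is treated as equally likely to be causal.
p_uniform = posterior_probability(uniform_prior, bayes_factor)

# Non-uniform prior: assume a deleteriousness predictor raises this variant's prior tenfold.
p_weighted = posterior_probability(10 * uniform_prior, bayes_factor)

print(f"uniform prior:  {p_uniform:.4f}")
print(f"weighted prior: {p_weighted:.4f}")
```

With identical genetic evidence, raising the prior tenfold raises the posterior from about 0.5% to just under 5%, which is the sense in which deleteriousness estimates can boost discovery power when the genetic data alone are weak.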
Nearly all computational methods to predict deleteriousness use comparative sequence analysis, exploiting the fact that natural selection removes deleterious variants and tends to conserve the identities of important positions within genes and genomes.
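The comparative principle can be sketched with a deliberately crude measure: for each column of a multiple sequence alignment, the fraction of non-human species that share the human base. This identity fraction is an assumption made for illustration; it ignores phylogeny and neutral substitution rates, which phylogeny-aware constraint scores such as GERP explicitly model. The function name and the toy alignment are hypothetical.

```python
from typing import Dict, List


def identity_conservation(alignment: Dict[str, str], reference: str = "human") -> List[float]:
    """Per-column fraction of non-reference species whose base matches the reference base.

    Gaps ('-') in other species count as mismatches; columns where the reference itself
    has a gap score 0.0. This toy measure ignores phylogeny entirely (unlike GERP).
    """
    ref_seq = alignment[reference]
    others = [seq for name, seq in alignment.items() if name != reference]
    if any(len(seq) != len(ref_seq) for seq in others):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for i, ref_base in enumerate(ref_seq):
        if ref_base == "-" or not others:
            scores.append(0.0)
            continue
        matches = sum(1 for seq in others if seq[i] == ref_base)
        scores.append(matches / len(others))
    return scores


# Invented alignment, for illustration only.
toy_alignment = {
    "human":   "ACGTAC",
    "chimp":   "ACGTAC",
    "mouse":   "ACGAAT",
    "dog":     "ACGTGT",
    "chicken": "ACTA-T",
}

scores = identity_conservation(toy_alignment)
for position, score in enumerate(scores, start=1):
    print(position, round(score, 2))
```

Positions 1 and 2 are identical in every species and would be ranked as the most likely to be constrained; in real analyses, the number and evolutionary distance of the aligned species (the phylogenetic scope) strongly affect how informative such identity is.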
Assessment of protein-altering variants leverages both biochemical and evolutionary information, whereas non-coding variation is more challenging to study because the molecular functions of non-coding sequences are less well understood than those of coding sequences.
Experimental assessments of the functional impact of variants have historically relied on low-throughput assays. However, projects such as the Encyclopedia of DNA Elements (ENCODE) and the clever use of next-generation sequencing technologies are increasingly facilitating large-scale, systematic experimental assessment of genomic variation of many types.
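Sequencing-based functional assays often summarize each variant by how its abundance changes through a functional selection. A minimal sketch, assuming read counts per variant before and after selection plus a wild-type reference, computes a log2 enrichment ratio relative to wild type; the pseudocount, function name, variant labels and counts are hypothetical, and published assays differ in their exact normalization.

```python
import math
from typing import Dict


def log2_enrichment(pre: Dict[str, int], post: Dict[str, int],
                    wild_type: str = "WT", pseudocount: float = 0.5) -> Dict[str, float]:
    """log2 enrichment of each variant relative to wild type, after vs before selection.

    Assumes a selection that enriches functional molecules: scores near 0 suggest
    wild-type-like function, strongly negative scores suggest the variant was depleted.
    The pseudocount avoids division by zero and log of zero for unobserved variants.
    """
    wt_pre = pre[wild_type] + pseudocount
    wt_post = post[wild_type] + pseudocount
    result = {}
    for variant, count_pre in pre.items():
        if variant == wild_type:
            continue
        ratio_pre = (count_pre + pseudocount) / wt_pre
        ratio_post = (post.get(variant, 0) + pseudocount) / wt_post
        result[variant] = math.log2(ratio_post / ratio_pre)
    return result


# Invented read counts; variant names are hypothetical.
before = {"WT": 1000, "A12V": 200, "G45R": 150, "L78P": 180}
after = {"WT": 2000, "A12V": 390, "G45R": 10, "L78P": 0}

scores = log2_enrichment(before, after)
for variant in sorted(scores):
    print(f"{variant}\t{scores[variant]:+.2f}")
```

Here the hypothetical G45R and L78P variants are strongly depleted after selection and would be flagged as likely damaging in this assay, whereas A12V behaves much like wild type.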
Ultimately, unified predictive methods that are applicable to both coding and non-coding variants that leverage both functional and evolutionary information will be crucial for the meaningful interpretation of personal genomes. However, important unknowns and unsolved phenomena, including the relative abundance and penetrance of coding versus non-coding variants, disagreements between evolutionary and experimental definitions of molecular functionality, and the vocabularies that define transcriptional regulatory elements, must first be addressed.

Abstract

Genome and exome sequencing yield extensive catalogues of human genetic variation. However, pinpointing the few phenotypically causal variants among the many variants present in human genomes remains a major challenge, particularly for rare and complex traits wherein genetic information alone is often insufficient. Here, we review approaches to estimate the deleteriousness of single nucleotide variants (SNVs), which can be used to prioritize disease-causal variants. We describe recent advances in comparative and functional genomics that enable systematic annotation of both coding and non-coding variants. Application and optimization of these methods will be essential to find the genetic answers that sequencing promises to hide in plain sight.

**Figure 1: Assessing variant deleteriousness to boost discovery power of genetic analyses.**

**Figure 2: Functional and evolutionary annotations highlight disease variation at the *HBB* locus.**

**Figure 3: High-throughput experimental assessment of variant function.**

References

1. Shendure, J. & Ji, H. Next-generation DNA sequencing. Nature Biotech. 26, 1135–1145 (2008).
2. Bentley, D. R. et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456, 53–59 (2008).
3. The 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073 (2010).
4. Lander, E. S. Initial impact of the sequencing of the human genome. Nature 470, 187–197 (2011).
5. Manly, K. F., Nettleton, D. & Hwang, J. T. Genomics, prior probability, and statistical tests of multiple hypotheses. Genome Res. 14, 997–1001 (2004). This is a valuable review of the relationships between prior probability, statistical significance and false-discovery rates as they pertain to genome-wide analyses.
6. Morton, N. E. Sequential tests for the detection of linkage. Am. J. Hum. Genet. 7, 277–318 (1955).
7. Ng, S. B. et al. Exome sequencing identifies the cause of a mendelian disorder. Nature Genet. 42, 30–35 (2010).
8. Ng, S. B. et al. Targeted capture and massively parallel sequencing of 12 human exomes. Nature 461, 272–276 (2009). This is the first demonstration of exome sequencing being used to identify the causal variants for a Mendelian disease. Protein-based annotations of functional deleteriousness were essential to this effort.
9. Choi, M. et al. Genetic diagnosis by whole exome capture and massively parallel DNA sequencing. Proc. Natl Acad. Sci. USA 106, 19096–19101 (2009).
10. Erlich, Y. et al. Exome sequencing and disease-network analysis of a single family implicate a mutation in KIF1A in hereditary spastic paraparesis. Genome Res. 21, 658–664 (2011).
11. Kimura, M. The Neutral Theory of Molecular Evolution (Cambridge Univ. Press, New York, 1983).
12. Cooper, G. M. & Brown, C. D. Qualifying the relationship between sequence conservation and molecular function. Genome Res. 18, 201–205 (2008).
13. McAuliffe, J. D., Jordan, M. I. & Pachter, L. Subtree power analysis and species selection for comparative genomics. Proc. Natl Acad. Sci. USA 102, 7900–7905 (2005).
14. Stone, E. A., Cooper, G. M. & Sidow, A. Trade-offs in detecting evolutionarily constrained sequence by comparative genomics. Annu. Rev. Genomics Hum. Genet. 6, 143–164 (2005).
15. Eddy, S. R. A model of the statistical power of comparative genome sequence analysis. PLoS Biol. 3, e10 (2005).
16. The Chimpanzee Sequencing and Analysis Consortium. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437, 69–87 (2005).
17. Davydov, E. V. et al. Identifying a high fraction of the human genome to be under selective constraint using GERP++. PLoS Comput. Biol. 6, e1001025 (2010).
18. The Mouse Genome Sequencing Consortium. Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520–562 (2002).
19. The ENCODE Project Consortium. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447, 799–816 (2007).
20. Boffelli, D. et al. Phylogenetic shadowing of primate sequences to find functional regions of the human genome. Science 299, 1391–1394 (2003).
21. Prabhakar, S. et al. Close sequence comparisons are sufficient to identify human cis-regulatory elements. Genome Res. 16, 855–863 (2006).
22. Kaessmann, H. Origins, evolution, and phenotypic impact of new genes. Genome Res. 20, 1313–1326 (2010).
23. Johnson, M. E. et al. Positive selection of a gene family during the emergence of humans and African apes. Nature 413, 514–519 (2001).
24. Wang, T. et al. Species-specific endogenous retroviruses shape the transcriptional network of the human tumor suppressor protein p53. Proc. Natl Acad. Sci. USA 104, 18613–18618 (2007).
25. Enard, W. et al. Molecular evolution of FOXP2, a gene involved in speech and language. Nature 418, 869–872 (2002).
26. Prabhakar, S. et al. Human-specific gain of function in a developmental enhancer. Science 321, 1346–1350 (2008). This study demonstrates that constraint-based measures may also identify sequences with human-specific functionality.
27. Stone, E. A. & Sidow, A. Physicochemical constraint violation by missense substitutions mediates impairment of protein function and disease severity. Genome Res. 15, 978–986 (2005). The authors describe a combined phylogenetic and biochemical approach to predict the effects of amino acid substitutions. They demonstrate a quantitative relationship between past evolutionary rates of biochemical change and present day deleteriousness.
28. De Gobbi, M. et al. A regulatory SNP causes a human genetic disease by creating a new transcriptional promoter. Science 312, 1215–1217 (2006).
29. Botstein, D. & Risch, N. Discovering genotypes underlying human phenotypes: past successes for mendelian disease, future approaches for complex disease. Nature Genet. 33, 228–237 (2003).
30. Ng, S. B. et al. Exome sequencing identifies MLL2 mutations as a cause of Kabuki syndrome. Nature Genet. 42, 790–793 (2010).
31. MacArthur, D. G. & Tyler-Smith, C. Loss-of-function variants in the genomes of healthy humans. Hum. Mol. Genet. 19, R125–R130 (2010).
32. Grantham, R. Amino acid difference formula to help explain protein evolution. Science 185, 862–864 (1974).
33. Ng, P. C. & Henikoff, S. Predicting the effects of amino acid substitutions on protein function. Annu. Rev. Genomics Hum. Genet. 7, 61–80 (2006).
34. Care, M. A., Needham, C. J., Bulpitt, A. J. & Westhead, D. R. Deleterious SNP prediction: be mindful of your training data! Bioinformatics 23, 664–672 (2007).
35. Adzhubei, I. A. et al. A method and server for predicting damaging missense mutations. Nature Methods 7, 248–249 (2010).
36. Bromberg, Y. & Rost, B. SNAP: predict effect of non-synonymous polymorphisms on function. Nucleic Acids Res. 35, 3823–3835 (2007).
37. Capriotti, E., Calabrese, R. & Casadio, R. Predicting the insurgence of human genetic diseases associated to single point protein mutations with support vector machines and evolutionary information. Bioinformatics 22, 2729–2734 (2006).
38. Ferrer-Costa, C., Orozco, M. & de la Cruz, X. Sequence-based prediction of pathological mutations. Proteins 57, 811–819 (2004).
39. Ng, P. C. & Henikoff, S. Predicting deleterious amino acid substitutions. Genome Res. 11, 863–874 (2001). This describes SIFT (also see reference 46), a commonly used tool to predict the effects of amino acid substitutions and an early demonstration of the importance of sequence conservation to functional predictions.
40. Schwarz, J. M., Rodelsperger, C., Schuelke, M. & Seelow, D. MutationTaster evaluates disease-causing potential of sequence alterations. Nature Methods 7, 575–576 (2010).
41. Thomas, P. D. et al. PANTHER: a library of protein families and subfamilies indexed by function. Genome Res. 13, 2129–2141 (2003).
42. Ye, Z. Q. et al. Finding new structural and sequence attributes to predict possible disease association of single amino acid polymorphism (SAP). Bioinformatics 23, 1444–1450 (2007).
43. Chun, S. & Fay, J. C. Identification of deleterious mutations within three human genomes. Genome Res. 19, 1553–1561 (2009).
44. Bao, L., Zhou, M. & Cui, Y. nsSNPAnalyzer: identifying disease-associated nonsynonymous single nucleotide polymorphisms. Nucleic Acids Res. 33, W480–W482 (2005).
45. Sunyaev, S. et al. Prediction of deleterious human alleles. Hum. Mol. Genet. 10, 591–597 (2001). This paper describes polymorphism phenotyping (PolyPhen) (also see reference 35), a commonly used tool to predict the effects of amino acid substitutions, and illustrates the value of classifiers trained on numerous biochemical and evolutionary features.
46. Ng, P. C. & Henikoff, S. SIFT: predicting amino acid changes that affect protein function. Nucleic Acids Res. 31, 3812–3814 (2003).
47. Lynch, M. & Conery, J. S. The evolutionary fate and consequences of duplicate genes. Science 290, 1151–1155 (2000).
48. Marini, N. J., Thomas, P. D. & Rine, J. The use of orthologous sequences to predict the impact of amino acid substitutions on protein function. PLoS Genet. 6, e1000968 (2010).
49. Dobson, R. J., Munroe, P. B., Caulfield, M. J. & Saqi, M. A. Predicting deleterious nsSNPs: an analysis of sequence and structural attributes. BMC Bioinformatics 7, 217 (2006).
50. Saunders, C. T. & Baker, D. Evaluation of structural and evolutionary contributions to deleterious mutation prediction. J. Mol. Biol. 322, 891–901 (2002).
51. Yue, P., Li, Z. & Moult, J. Loss of protein structure stability as a major causative factor in monogenic disease. J. Mol. Biol. 353, 459–473 (2005).
52. Bao, L. & Cui, Y. Prediction of the phenotypic effects of non-synonymous single nucleotide polymorphisms using structural and evolutionary information. Bioinformatics 21, 2185–2190 (2005).
53. Li, Y. et al. Predicting disease-associated substitution of a single amino acid by analyzing residue interactions. BMC Bioinformatics 12, 14 (2011).
54. Hindorff, L. A. et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl Acad. Sci. USA 106, 9362–9367 (2009).
55. Musunuru, K. et al. From noncoding variant to phenotype via SORT1 at the 1p13 cholesterol locus. Nature 466, 714–719 (2010). This paper describes the precise identification of a common transcriptional regulatory variant that influences cholesterol levels and cardiovascular disease risk.
56. Storey, J. D. et al. Gene-expression variation within and among human populations. Am. J. Hum. Genet. 80, 502–509 (2007).
57. Nicolae, D. L. et al. Trait-associated SNPs are more likely to be eQTLs: annotation to enhance discovery from GWAS. PLoS Genet. 6, e1000888 (2010). This analysis demonstrated that expression-associated variants are enriched among trait-associated variants, suggesting that non-coding regulatory variants are causally relevant for many traits.
58. King, M. C. & Wilson, A. C. Evolution at two levels in humans and chimpanzees. Science 188, 107–116 (1975).
59. Carroll, S. B. Evolution at two levels: on genes and form. PLoS Biol. 3, e245 (2005).
60. Siepel, A. et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034–1050 (2005).
61. Lettice, L. A. et al. A long-range Shh enhancer regulates expression in the developing limb and fin and is associated with preaxial polydactyly. Hum. Mol. Genet. 12, 1725–1735 (2003). This study describes non-coding mutations that cause Mendelian limb defects by affecting enhancers important to developmental sonic hedgehog (Shh) gene regulation. A combination of evolutionary sequence conservation and mouse-based experimental assessments of variant function were used.
62. Stenson, P. D. et al. The Human Gene Mutation Database: providing a comprehensive central mutation database for molecular diagnostics and personalized genomics. Hum. Genomics 4, 69–72 (2009).
63. Treisman, R., Orkin, S. H. & Maniatis, T. Specific transcription and RNA splicing defects in five cloned β-thalassaemia genes. Nature 302, 591–596 (1983).
64. Woolfe, A. et al. Highly conserved non-coding sequences are associated with vertebrate development. PLoS Biol. 3, e7 (2005).
65. Dehal, P. et al. The draft genome of Ciona intestinalis: insights into chordate and vertebrate origins. Science 298, 2157–2167 (2002).
66. Pollard, K. S., Hubisz, M. J., Rosenbloom, K. R. & Siepel, A. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res. 20, 110–121 (2010).
67. Cooper, G. M. et al. Distribution and intensity of constraint in mammalian genomic sequence. Genome Res. 15, 901–913 (2005).
68. Asthana, S., Roytberg, M., Stamatoyannopoulos, J. & Sunyaev, S. Analysis of sequence conservation at nucleotide resolution. PLoS Comput. Biol. 3, e254 (2007).
69. Margulies, E. H., Blanchette, M., Haussler, D. & Green, E. D. Identification and characterization of multi-species conserved sequences. Genome Res. 13, 2507–2518 (2003).
70. Dubchak, I. et al. Active conservation of noncoding sequences revealed by three-way species comparisons. Genome Res. 10, 1304–1306 (2000).
71. Parker, S. C., Hansen, L., Abaan, H. O., Tullius, T. D. & Margulies, E. H. Local DNA topography correlates with functional noncoding regions of the human genome. Science 324, 389–392 (2009).
72. Cooper, G. M. et al. Single-nucleotide evolutionary constraint scores highlight disease-causing mutations. Nature Methods 7, 250–251 (2010). This paper demonstrated that functionally agnostic nucleotide-level constraint scores, defined by GERP (also see references 17 and 67), offer considerable utility for causal variant discovery in exome analyses.
73. Wang, G. S. & Cooper, T. A. Splicing in disease: disruption of the splicing code and the decoding machinery. Nature Rev. Genet. 8, 749–761 (2007).
74. Drake, J. A. et al. Conserved noncoding sequences are selectively constrained and not mutation cold spots. Nature Genet. 38, 223–227 (2006).
75. Katzman, S. et al. Human genome ultraconserved elements are ultraselected. Science 317, 915 (2007).
76. Goode, D. L. et al. Evolutionary constraint facilitates interpretation of genetic variation in resequenced human genomes. Genome Res. 20, 301–310 (2010).
77. Pennacchio, L. A. et al. In vivo enhancer analysis of human conserved non-coding sequences. Nature 444, 499–502 (2006).
78. Margulies, E. H. et al. Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome. Genome Res. 17, 760–774 (2007).
79. The ENCODE Project Consortium. A user's guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol. 9, e1001046 (2011).
80. Ge, B. et al. Global patterns of cis variation in human cells revealed by high-density allelic expression analysis. Nature Genet. 41, 1216–1222 (2009).
81. Nica, A. C. et al. Candidate causal regulatory effects by integration of expression QTLs with complex trait genetic associations. PLoS Genet. 6, e1000895 (2010).
82. Zheng, W., Zhao, H., Mancera, E., Steinmetz, L. M. & Snyder, M. Genetic analysis of variation in transcription factor binding in yeast. Nature 464, 1187–1191 (2010).
83. Patwardhan, R. P. et al. High-resolution analysis of DNA regulatory elements by synthetic saturation mutagenesis. Nature Biotech. 27, 1173–1175 (2009). This paper defined a method to exploit next-generation sequencing to comprehensively yet efficiently assay point mutations in transcriptional promoters.
84. Pitt, J. N. & Ferre-D'Amare, A. R. Rapid construction of empirical RNA fitness landscapes. Science 330, 376–379 (2010).
85. Fowler, D. M. et al. High-resolution mapping of protein sequence-function relationships. Nature Methods 7, 741–746 (2010).
86. Mortazavi, A., Williams, B. A., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-seq. Nature Methods 5, 621–628 (2008).
87. Johnson, D. S., Mortazavi, A., Myers, R. M. & Wold, B. Genome-wide mapping of in vivo protein-DNA interactions. Science 316, 1497–1502 (2007).
88. Cao, A. R. et al. Genome-wide analysis of transcription factor E2F1 mutant proteins reveals that N- and C-terminal protein interaction domains do not participate in targeting E2F1 to the human genome. J. Biol. Chem. 286, 11985–11996 (2011).
89. Botstein, D. & Shortle, D. Strategies and applications of in vitro mutagenesis. Science 229, 1193–1201 (1985).
90. Blow, M. J. et al. ChIP-seq identification of weakly conserved heart enhancers. Nature Genet. 42, 806–810 (2010).
91. Cheng, Y. et al. Erythroid GATA1 function revealed by genome-wide analysis of transcription factor occupancy, histone modifications, and mRNA expression. Genome Res. 19, 2172–2184 (2009).
92. Miller, D. T. et al. Consensus statement: chromosomal microarray is a first-tier clinical diagnostic test for individuals with developmental disabilities or congenital anomalies. Am. J. Hum. Genet. 86, 749–764 (2010).
93. Sebat, J. et al. Strong association of de novo copy number mutations with autism. Science 316, 445–449 (2007).
94. Walsh, T. et al. Rare structural variants disrupt multiple genes in neurodevelopmental pathways in schizophrenia. Science 320, 539–543 (2008).
95. Markiewicz, P., Kleina, L. G., Cruz, C., Ehret, S. & Miller, J. H. Genetic studies of the lac repressor. XIV. Analysis of 4000 altered Escherichia coli lac repressors reveals essential and non-essential residues, as well as “spacers” which do not require a specific sequence. J. Mol. Biol. 240, 421–433 (1994).
96. Rennell, D., Bouvier, S. E., Hardy, L. W. & Poteete, A. R. Systematic mutation of bacteriophage T4 lysozyme. J. Mol. Biol. 222, 67–88 (1991).
97. Loeb, D. D. et al. Complete mutagenesis of the HIV-1 protease. Nature 340, 397–400 (1989).
98. Hardison, R. C. et al. HbVar: a relational database of human hemoglobin variants and thalassemia mutations at the globin gene server. Hum. Mutat. 19, 225–233 (2002).
99. Olivier, M. et al. The IARC TP53 database: new online mutation analysis and recommendations to users. Hum. Mutat. 19, 607–614 (2002).
100. Yip, Y. L. et al. The Swiss-Prot variant page and the ModSNP database: a resource for sequence and structure information on human protein variants. Hum. Mutat. 23, 464–470 (2004).
101. Brown, C. D., Johnson, D. S. & Sidow, A. Functional architecture and evolution of transcriptional elements that drive gene coexpression. Science 317, 1557–1560 (2007).
102. Kim, J., He, X. & Sinha, S. Evolution of regulatory sequences in 12 Drosophila species. PLoS Genet. 5, e1000330 (2009).
103. Moses, A. M., Chiang, D. Y., Kellis, M., Lander, E. S. & Eisen, M. B. Position specific variation in the rate of evolution in transcription factor binding sites. BMC Evol. Biol. 3, 19 (2003).
104. Visel, A. et al. ChIP-seq accurately predicts tissue-specific activity of enhancers. Nature 457, 854–858 (2009).
105. Dekker, J., Rippe, K., Dekker, M. & Kleckner, N. Capturing chromosome conformation. Science 295, 1306–1311 (2002).
106. Liu, D. J. & Leal, S. M. A novel adaptive method for the analysis of next-generation sequencing data to detect complex trait associations with rare variants due to gene main effects and interactions. PLoS Genet. 6, e1001156 (2010). This paper describes an approach to assess the significance of correlations between gene or locus aggregates of rare variants and phenotypes and may also be useful in identifying significant variant interactions.
107. Yandell, M. et al. A probabilistic disease-gene finder for personal genomes. Genome Res. 23 Jun 2011 (doi:10.1101/gr.123158.111). This paper defines a method, VAAST, to predict disease genes or loci on the basis of the total predicted deleteriousness of rare variants observed in affected individuals.
108. Gerke, J., Lorenz, K. & Cohen, B. Genetic interactions between transcription factors cause natural variation in yeast. Science 323, 498–501 (2009).
109. Gerke, J., Lorenz, K., Ramnarine, S. & Cohen, B. Gene–environment interactions at nucleotide resolution. PLoS Genet. 6, e1001144 (2010).
110. Bush, W. S. et al. A knowledge-driven interaction analysis reveals potential neurodegenerative mechanism of multiple sclerosis susceptibility. Genes Immun. (2011).
111. Rual, J. F. et al. Towards a proteome-scale map of the human protein-protein interaction network. Nature 437, 1173–1178 (2005).
112. Emilsson, V. et al. Genetics of gene expression and its effect on disease. Nature 452, 423–428 (2008).
113. The Gene Ontology Consortium. et al. Gene ontology: tool for the unification of biology. Nature Genet. 25, 25–29 (2000).
114. Risch, N. & Merikangas, K. The future of genetic studies of complex human diseases. Science 273, 1516–1517 (1996).
115. Ioannidis, J. P. Why most published research findings are false. PLoS Med. 2, e124 (2005).
116. Rothman, K. J. No adjustments are needed for multiple comparisons. Epidemiology 1, 43–46 (1990).
117. Keinan, A., Mullikin, J. C., Patterson, N. & Reich, D. Measurement of the human allele frequency spectrum demonstrates greater genetic drift in East Asians than in Europeans. Nature Genet. 39, 1251–1255 (2007).

Acknowledgements

We thank C. Brown and E. Stone for comments on an earlier draft and R. Patwardhan for sharing data.

Author information

Authors and Affiliations

HudsonAlpha Institute for Biotechnology, Huntsville, 35806, Alabama, USA
Gregory M. Cooper
Department of Genome Sciences, University of Washington, Seattle, 98115, Washington, USA
Jay Shendure

Corresponding authors

Correspondence to Gregory M. Cooper or Jay Shendure.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Glossary

Private: A genetic variant that is confined to a single individual, family or population.
Prior probability: Otherwise simply known as the 'prior', this is the probability of a hypothesis (or parameter value) without reference to the available data. Priors can be derived from first principles or be based on general knowledge or previous experiments.
Deleterious: A genetic variant that lowers the fitness of an organism: that is, it decreases survival or reproductive success.
Conserved: Shared identity of either protein or nucleotide sequences, which can be indicative of constraint.
Neutral: Sequences that are free to evolve in the absence of natural selection and are therefore subject only to random mutational and genetic drift processes.
Phylogenetic scope: The taxonomic range captured by a given comparative sequence analysis — for example, mammals or eukaryotes.
Constrained: Sequences that are under purifying selection to maintain function, which often, but not always, results in sequence conservation.

About this article

Cite this article

Cooper, G., Shendure, J. Needles in stacks of needles: finding disease-causal variants in a wealth of genomic data. Nat Rev Genet 12, 628–640 (2011). https://doi.org/10.1038/nrg3046

Issue Date: September 2011
