Needles in stacks of needles: finding disease-causal variants in a wealth of genomic data

Key Points

  • Genome and exome sequencing yield extensive catalogues of genetic variation in many individuals, but purely genetic approaches are often insufficiently powered to specifically identify the few variants that are causally related to any given phenotype. Indeed, variant interpretation is an increasingly important challenge at the interface of genetics, statistics and biology.

  • Non-uniform estimates of the prior probability for variants to be biologically functional will be required to address this challenge. For disease studies, this can be translated into the need to estimate variant deleteriousness.

  • Nearly all computational methods to predict deleteriousness use comparative sequence analysis, exploiting the fact that natural selection removes deleterious variants and tends to conserve the identities of important positions within genes and genomes.

  • Assessment of protein-altering variants leverages both biochemical and evolutionary information, whereas non-coding variation is more challenging to study, given a lack of understanding of the molecular functionality of non-coding sequences relative to coding sequences.

  • Experimental assessments of the functional impact of variants have historically relied on low-throughput assays. However, projects such as the Encyclopedia of DNA Elements (ENCODE) and the clever use of next-generation sequencing technologies are increasingly facilitating large-scale, systematic experimental assessment of genomic variation of many types.

  • Ultimately, unified predictive methods that apply to both coding and non-coding variants, and that leverage both functional and evolutionary information, will be crucial for the meaningful interpretation of personal genomes. However, important unknowns and unsolved problems must first be addressed, including the relative abundance and penetrance of coding versus non-coding variants, disagreements between evolutionary and experimental definitions of molecular functionality, and the vocabularies that define transcriptional regulatory elements.
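
The role of a non-uniform prior in the key points above can be illustrated with a small worked example. The following is a minimal sketch, not a method from the article: it applies Bayes' rule in odds form, and the function name `posterior_causal`, the candidate counts and the likelihood ratio are all hypothetical values chosen only for illustration.

```python
def posterior_causal(prior: float, likelihood_ratio: float) -> float:
    """Posterior probability that a variant is causal (Bayes' rule, odds form).

    prior            -- prior probability that the variant is causal (0 < prior < 1)
    likelihood_ratio -- P(data | causal) / P(data | not causal), from the genetic evidence
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical numbers, for illustration only.
uniform_prior = 1 / 20_000   # every one of 20,000 candidate variants treated alike
informed_prior = 1 / 200     # same variant after up-weighting by predicted deleteriousness
evidence = 100.0             # identical genetic evidence in both cases

print(f"uniform prior:  {posterior_causal(uniform_prior, evidence):.4f}")
print(f"informed prior: {posterior_causal(informed_prior, evidence):.4f}")
```

With identical genetic evidence, the variant remains an unlikely candidate under the uniform prior but becomes a plausible one under the informed prior. In this sketch, that is the sense in which estimates of deleteriousness can add power to a purely genetic analysis.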

Abstract

Genome and exome sequencing yield extensive catalogues of human genetic variation. However, pinpointing the few phenotypically causal variants among the many variants present in human genomes remains a major challenge, particularly for rare and complex traits wherein genetic information alone is often insufficient. Here, we review approaches to estimate the deleteriousness of single nucleotide variants (SNVs), which can be used to prioritize disease-causal variants. We describe recent advances in comparative and functional genomics that enable systematic annotation of both coding and non-coding variants. Application and optimization of these methods will be essential to find the genetic answers that sequencing promises to hide in plain sight.

Figure 1: Assessing variant deleteriousness to boost discovery power of genetic analyses.
Figure 2: Functional and evolutionary annotations highlight disease variation at the HBB locus.
Figure 3: High-throughput experimental assessment of variant function.

Acknowledgements

We thank C. Brown and E. Stone for comments on an earlier draft and R. Patwardhan for sharing data.

Author information

Corresponding authors

Correspondence to Gregory M. Cooper or Jay Shendure.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Related links

FURTHER INFORMATION

Gregory M. Cooper's homepage

Jay Shendure's homepage

Encyclopedia of DNA Elements (ENCODE)

The Genome 10K Project

University of California, Santa Cruz (UCSC) Genome Bioinformatics

Human Gene Mutation Database (HGMD)

National Human Genome Research Institute Catalog of Published Genome-Wide Association Studies

Online Mendelian Inheritance in Man (OMIM)

Glossary

Private

A genetic variant that is confined to a single individual, family or population.

Prior probability

Otherwise simply known as the 'prior', this is the probability of a hypothesis (or parameter value) without reference to the available data. Priors can be derived from first principles or be based on general knowledge or previous experiments.

Deleterious

A genetic variant that lowers the fitness of an organism: that is, it decreases survival or reproductive success.

Conserved

Shared identity of either protein or nucleotide sequences, which can be indicative of constraint.

Neutral

Sequences that are free to evolve in the absence of natural selection and are therefore subject only to random mutational and genetic drift processes.

Phylogenetic scope

The taxonomic range captured by a given comparative sequence analysis — for example, mammals or eukaryotes.

Constrained

Sequences that are under purifying selection to maintain function, which often, but not always, results in sequence conservation.
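
The glossary's notion of conservation as shared sequence identity can be made concrete with a toy calculation. The sketch below is only an illustration under stated assumptions: the function name `column_identity` and the four-sequence alignment are invented for this example, and the score is a plain per-column identity fraction. It does not model phylogeny or neutral substitution rates, so it measures conservation and not constraint.

```python
from collections import Counter


def column_identity(alignment: list[str]) -> list[float]:
    """Per-column fraction of sequences that share the most common residue.

    Gap characters ('-') never count as the shared residue, but they do
    count in the denominator, so gapped columns score lower.
    """
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    scores = []
    for column in zip(*alignment):
        residues = [base for base in column if base != "-"]
        if not residues:
            scores.append(0.0)  # column made up entirely of gaps
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


# Hypothetical toy alignment of four species at four positions.
toy_alignment = [
    "ACGT",
    "ACGA",
    "ACTT",
    "ACGT",
]
print(column_identity(toy_alignment))
```

Columns with an identity near 1.0 are conserved in this toy sense. Whether they are also constrained depends on how much change would be expected under neutral evolution across the species compared, which this sketch deliberately leaves out.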

Cite this article

Cooper, G., Shendure, J. Needles in stacks of needles: finding disease-causal variants in a wealth of genomic data. Nat Rev Genet 12, 628–640 (2011). https://doi.org/10.1038/nrg3046
