  • Review Article

Rare-variant collapsing analyses for complex traits: guidelines and applications

Abstract

The first phase of genome-wide association studies (GWAS) assessed the role of common variation in human disease. Advances optimizing and economizing high-throughput sequencing have enabled a second phase of association studies that assess the contribution of rare variation to complex disease in all protein-coding genes. Unlike the early microarray-based studies, sequencing-based studies catalogue the full range of genetic variation, including the evolutionarily youngest forms. Although the experience with common variants helped establish relevant standards for genome-wide studies, the analysis of rare variation introduces several challenges that require novel analysis approaches.

Fig. 1: Outline of the standard collapsing analysis approach.
Fig. 2: Characterizing where the disease risk signal resides.

Acknowledgements

The authors thank T. Hayeck for creating the figure in Box 2.

Author information

Contributions

G.P., S.P. and J.H. researched data for the article. G.P., S.P. and D.B.G. wrote the article. G.P., S.P., J.H., A.S.A. and D.B.G. reviewed/edited the manuscript before submission. All authors contributed to discussing the content of the article.

Corresponding author

Correspondence to David B. Goldstein.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information

Nature Reviews Genetics thanks S. Lee, X. Lin and B. Neale for their contribution to the peer review of this work.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

Bravo/TOPMed: https://bravo.sph.umich.edu/freeze5/hg38/

CADD: https://cadd.gs.washington.edu/

CCR: https://www.rebrand.ly/ccrregions

ExAC: http://exac.broadinstitute.org/

Digenic analysis tool: https://github.com/igm-team/Digenic

gnomAD: http://gnomad.broadinstitute.org/

LOFTEE: https://github.com/konradjk/loftee

MTR: http://mtr-viewer.mdhs.unimelb.edu.au/

PolyPhen-2: http://genetics.bwh.harvard.edu/pph2/

PrimateAI: https://github.com/Illumina/PrimateAI

SIFT: https://sift.bii.a-star.edu.sg/

SpliceAI: https://github.com/illumina/SpliceAI

subRVIS: http://www.subrvis.org/

TraP: http://trap-score.org/

UK Biobank: https://www.ukbiobank.ac.uk/

Glossary

Penetrant alleles

Alleles highly associated with a trait; the more penetrant the allele, the higher the percentage of individuals with that allele who also express a disease phenotype.

Deleterious variation

Genetic variation that is predicted to disrupt gene function and therefore lead to reduced fitness.

Allelic heterogeneity

The presence of different pathogenic variants in the same gene or at the same chromosome locus that all lead to the same or to very similar phenotypes.

Causal allele

A functional allele that increases disease risk.

Haploinsufficient disease genes

Disease-associated genes for which a single functional copy is insufficient to maintain normal function. Therefore, loss-of-function alleles are pathogenic even when heterozygous.

Background variation

Usually benign variants in the general population that are unconnected to the disease.

Bidirectional effects

Effects within a given gene, wherein some variants increase risk of disease, while others reduce risk.

Transition/transversion ratio

(Ti/Tv). Ratio of the number of transitions, which interchange the two-ring purines (A and G) or the one-ring pyrimidines (C and T), to the number of transversions, which interchange a purine and a pyrimidine.
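As an illustration of this definition, the following minimal Python sketch classifies single-nucleotide substitutions and computes the ratio. The function names and the example call list are hypothetical and are not taken from the article; real pipelines would normally read the substitutions from a variant call file.

```python
PURINES = {"A", "G"}       # two-ring bases
PYRIMIDINES = {"C", "T"}   # one-ring bases


def is_transition(ref, alt):
    """Return True for purine<->purine or pyrimidine<->pyrimidine changes."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError(f"not a single-nucleotide substitution: {ref}>{alt}")
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ti_tv_ratio(snvs):
    """Compute Ti/Tv from an iterable of (ref, alt) pairs (hypothetical input format)."""
    snvs = list(snvs)
    transitions = sum(is_transition(ref, alt) for ref, alt in snvs)
    transversions = len(snvs) - transitions
    if transversions == 0:
        raise ValueError("Ti/Tv is undefined without transversions")
    return transitions / transversions


# Made-up example: three transitions and one transversion.
calls = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C")]
print(ti_tv_ratio(calls))
```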

Index samples

Individual samples or patients who are the focus of a study.

Consanguineous populations

Populations in which marriages between people who are second cousins or closer are common.

Bottlenecked populations

Populations that have gone through a severe and abrupt reduction in their number of individuals, which often leads to reduced genetic diversity.

Population stratification

Also known as population structure. Presence of a difference in allele frequencies due to systematic differences in ancestry between cases and controls.

Phred quality

(QUAL). The Phred-scaled posterior probability that every sample in a call set is homozygous for the reference allele at a site.

Genotype Phred quality

(GQ). Represents the Phred-scaled confidence that the genotype assignment is correct for a given sample.
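Both QUAL and GQ are reported on the Phred scale, which by the standard convention maps a probability p to Q = -10 log10(p). The short Python sketch below shows the conversion in both directions; the function names are illustrative assumptions. For GQ, p would be the probability that the assigned genotype is wrong; for QUAL as defined above, p would be the probability that no sample carries a non-reference allele.

```python
import math


def phred_to_prob(q):
    """Convert a Phred-scaled score to the underlying probability p."""
    return 10 ** (-q / 10)


def prob_to_phred(p):
    """Convert a probability p (0 < p <= 1) to a Phred-scaled score."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    return -10 * math.log10(p)


# A score of 20 corresponds to p = 0.01; p = 0.001 corresponds to a score of 30.
print(phred_to_prob(20))
print(prob_to_phred(0.001))
```

On this scale, each increase of 10 in the score corresponds to a tenfold decrease in the probability.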

Quality by depth

(QD). The Phred quality (QUAL) score normalized by allele depth for a variant.

Mapping quality

(MQ). Estimation of the overall mapping quality of reads supporting a variant call.

Variant quality score log-odds

(VQSLOD). A score, produced by the Genome Analysis Toolkit’s variant quality score recalibration, that represents the log-odds ratio of a variant being true versus being false under the trained Gaussian mixture model.

Trio sequencing

Procedure in which the index patient and both parents are sequenced in order to identify causative variants in the patient.

Phase

Alleles are in phase when they lie on the same parental haplotype and therefore affect the same copy of a gene; variants that are not in phase lie on different haplotypes and therefore affect both copies of a gene.

Compound heterozygous

Presence of two different mutant alleles in a particular gene that affect both copies of the gene because they are not in phase.
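The relationship between phase and compound heterozygosity can be sketched in a few lines of Python. The example assumes genotypes written in the phased VCF style (for example `0|1`), where the position before the bar is one parental haplotype and the position after it is the other, and it assumes both variants lie in the same gene and the same phase set. The function names are hypothetical.

```python
def alt_haplotypes(gt):
    """Return the set of haplotype indices (0 or 1) that carry a non-reference allele."""
    if "|" not in gt:
        raise ValueError(f"genotype is not phased: {gt}")
    alleles = gt.split("|")
    if len(alleles) != 2 or not all(a.isdigit() for a in alleles):
        raise ValueError(f"expected a called diploid genotype: {gt}")
    return {i for i, allele in enumerate(alleles) if allele != "0"}


def is_compound_het(gt_a, gt_b):
    """True when two heterozygous variants sit on different haplotypes (not in phase)."""
    hap_a = alt_haplotypes(gt_a)
    hap_b = alt_haplotypes(gt_b)
    # Each variant must be heterozygous, i.e. present on exactly one haplotype.
    if len(hap_a) != 1 or len(hap_b) != 1:
        return False
    return hap_a != hap_b


print(is_compound_het("0|1", "1|0"))  # different haplotypes: both gene copies affected
print(is_compound_het("0|1", "0|1"))  # same haplotype: only one gene copy affected
```

Without phase information, such as that provided by trio sequencing, two heterozygous calls in the same gene cannot be assigned to either case.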

Diagnostic yield

Rate of discovered diagnostic variants within a collection of cases being tested.

About this article

Cite this article

Povysil, G., Petrovski, S., Hostyk, J. et al. Rare-variant collapsing analyses for complex traits: guidelines and applications. Nat Rev Genet 20, 747–759 (2019). https://doi.org/10.1038/s41576-019-0177-4

  • DOI: https://doi.org/10.1038/s41576-019-0177-4
