
A unified test of linkage analysis and rare-variant association for analysis of pedigree sequence data

Abstract

High-throughput sequencing of related individuals has become an important tool for studying human disease. However, owing to technical complexity and lack of available tools, most pedigree-based sequencing studies rely on an ad hoc combination of suboptimal analyses. Here we present pedigree-VAAST (pVAAST), a disease-gene identification tool designed for high-throughput sequence data in pedigrees. pVAAST uses a sequence-based model to perform variant and gene-based linkage analysis. Linkage information is then combined with functional prediction and rare variant case-control association information in a unified statistical framework. pVAAST outperformed linkage and rare-variant association tests in simulations and identified disease-causing genes from whole-genome sequence data in three human pedigrees with dominant, recessive and de novo inheritance patterns. The approach is robust to incomplete penetrance and locus heterogeneity and is applicable to a wide variety of genetic traits. pVAAST maintains high power across study designs ranging from monogenic, high-penetrance phenotypes in a single pedigree to highly polygenic, common phenotypes involving hundreds of pedigrees.
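For readers unfamiliar with the linkage component mentioned above, the classical two-point lod score can be sketched as follows. This is a textbook illustration under simplifying assumptions (phase-known, fully informative meioses); it is not pVAAST's sequence-based model, which the abstract does not specify. The function name `two_point_lod` and the example counts are hypothetical.

```python
import math


def two_point_lod(recombinants: int, meioses: int, theta: float) -> float:
    """Classical two-point lod score for phase-known meioses.

    Compares the likelihood of the observed recombinant count at
    recombination fraction `theta` with the likelihood under no
    linkage (theta = 0.5), on a log10 scale.
    """
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")

    non_recombinants = meioses - recombinants

    # Log-likelihood under linkage: theta^r * (1 - theta)^(n - r).
    log_lik = 0.0
    if recombinants > 0:
        if theta == 0.0:
            # Any observed recombinant is impossible when theta = 0.
            return float("-inf")
        log_lik += recombinants * math.log10(theta)
    if non_recombinants > 0:
        log_lik += non_recombinants * math.log10(1.0 - theta)

    # Log-likelihood under the null of free recombination: 0.5^n.
    log_null = meioses * math.log10(0.5)
    return log_lik - log_null


if __name__ == "__main__":
    # Hypothetical example: 10 informative meioses, no recombinants.
    print(round(two_point_lod(0, 10, 0.0), 2))
```

With 10 informative meioses and no recombinants, the score at theta = 0 is 10 × log10(2) ≈ 3.01, just above the conventional threshold of 3 for declaring linkage.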


Figure 1: A schematic illustration of pVAAST.
Figure 2: Rare Mendelian and common complex disease simulations.
Figure 3: pVAAST results on the enteropathy pedigree.
Figure 4: pVAAST identifies the dominant causal gene GATA4 in cardiac septal defect pedigree.
Figure 5: pVAAST identifies the recessive causal genes for Miller's syndrome (DHODH) and primary ciliary dyskinesia (DNAH5) with a two-generation pedigree.
Figure 6: The genome-wide ranking and lod score of GATA4 in challenging situations of pedigree studies.



Acknowledgements

An allocation of computer time on the University of Texas MD Anderson Research Computing High Performance Computing (HPC) facility is gratefully acknowledged. This work was supported by US National Institutes of Health grants R01 GM104390 (M.Y., L.B.J., C.D.H. and H.H.), R01 DK091374 (S.L.G., C.D.H. and L.B.J.), R01 CA164138 (S.V.T. and C.D.H.), R44HG006579 (M.G.R. and M.Y.) and R01 GM59290 (L.B.J.) as well as the University of Luxembourg–Institute for Systems Biology Program. D.S. was supported by grants from the NHLBI (U01 HL100406 and U01 HL098179) related to this project. H.C. was supported by NIH grants R01 MH094400 and R01 MH099134. H.H. was supported by the MD Anderson Cancer Center Odyssey Program. J.X. was supported by NIH grant R00HG005846.

Author information


Contributions

C.D.H. conceived of the project. C.D.H. oversaw and coordinated the research. C.D.H. and H.H. designed the algorithms. H.H. and B.M. wrote the software. C.D.H., H.H. and P.S. contributed to the statistical development. C.D.H., H.H., J.C.R., M.Y., S.V.T., D.S., K.V.V., L.H., L.B.J., M.G.R. and S.L.G. designed the experiments. H.H., H.C., W.W., R.L.M., J.D.D., S.W., H.L., J.X., Shankaracharya, R.H., B.M., J.C. and G.G. performed the experiments. H.H., C.D.H., M.Y., S.V.T., S.L.G. and L.B.J. analyzed and interpreted the data. H.H. generated the figures. H.H., C.D.H., L.B.J., M.Y., S.L.G., P.S. and S.V.T. wrote the paper. S.L.G., D.S., V.G., D.J.G., L.H., H.L., R.H., K.V.V., R.L.M., J.D.D. and G.G. participated in pedigree identification, recruitment and validation.

Corresponding authors

Correspondence to Mark Yandell or Chad D Huff.

Ethics declarations

Competing interests

M.G.R. is a founder and officer of Omicia, Inc.

Supplementary information

Supplementary Text and Figures

Supplementary Notes 1–4, Supplementary Figures 1–10 and Supplementary Table 1 (PDF 10164 kb)

Supplementary Code

pVAAST source code (ZIP 97 kb)


About this article


Cite this article

Hu, H., Roach, J., Coon, H. et al. A unified test of linkage analysis and rare-variant association for analysis of pedigree sequence data. Nat Biotechnol 32, 663–669 (2014). https://doi.org/10.1038/nbt.2895


  • DOI: https://doi.org/10.1038/nbt.2895
