A unified test of linkage analysis and rare-variant association for analysis of pedigree sequence data

High-throughput sequencing of related individuals has become an important tool for studying human disease. However, owing to technical complexity and lack of available tools, most pedigree-based sequencing studies rely on an ad hoc combination of suboptimal analyses. Here we present pedigree-VAAST (pVAAST), a disease-gene identification tool designed for high-throughput sequence data in pedigrees. pVAAST uses a sequence-based model to perform variant and gene-based linkage analysis. Linkage information is then combined with functional prediction and rare variant case-control association information in a unified statistical framework. pVAAST outperformed linkage and rare-variant association tests in simulations and identified disease-causing genes from whole-genome sequence data in three human pedigrees with dominant, recessive and de novo inheritance patterns. The approach is robust to incomplete penetrance and locus heterogeneity and is applicable to a wide variety of genetic traits. pVAAST maintains high power across study designs ranging from monogenic, high-penetrance phenotypes in a single pedigree to highly polygenic, common phenotypes involving hundreds of pedigrees.
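For orientation, the classical two-point lod score behind parametric linkage analysis can be sketched as below. This is a textbook illustration under simplifying assumptions (phase-known, fully informative meioses and full penetrance). It is not pVAAST's sequence-based model, and the function name `two_point_lod` and its parameters are hypothetical.

```python
import math


def two_point_lod(recombinants: int, meioses: int, theta: float) -> float:
    """Classical two-point lod score for phase-known, fully informative meioses.

    lod(theta) = log10( L(theta) / L(0.5) ), where
    L(theta) = theta**recombinants * (1 - theta)**(meioses - recombinants).
    Illustrative only; this is NOT the pVAAST linkage model.
    """
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")

    non_recombinants = meioses - recombinants

    if theta == 0.0:
        # Complete linkage is impossible once any recombinant is observed.
        if recombinants > 0:
            return float("-inf")
        log_likelihood_linked = 0.0
    else:
        log_likelihood_linked = (
            recombinants * math.log10(theta)
            + non_recombinants * math.log10(1.0 - theta)
        )

    # Null hypothesis: free recombination (theta = 0.5) at every meiosis.
    log_likelihood_null = meioses * math.log10(0.5)
    return log_likelihood_linked - log_likelihood_null


if __name__ == "__main__":
    # Ten informative meioses with no recombinants, evaluated at theta = 0.
    print(round(two_point_lod(0, 10, 0.0), 2))  # 3.01
```

Under these assumptions, ten non-recombinant meioses give a lod score of about 3.01, which is why small pedigrees on their own often cannot reach conventional linkage thresholds.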


Figure 1: A schematic illustration of pVAAST.
Figure 2: Rare Mendelian and common complex disease simulations.
Figure 3: pVAAST results on the enteropathy pedigree.
Figure 4: pVAAST identifies the dominant causal gene GATA4 in cardiac septal defect pedigree.
Figure 5: pVAAST identifies the recessive causal genes for Miller's syndrome (DHODH) and primary ciliary dyskinesia (DNAH5) with a two-generation pedigree.
Figure 6: The genome-wide ranking and lod score of GATA4 in challenging situations of pedigree studies.



An allocation of computer time on the University of Texas MD Anderson Research Computing High Performance Computing (HPC) facility is gratefully acknowledged. This work was supported by US National Institutes of Health grants R01 GM104390 (M.Y., L.B.J., C.D.H. and H.H.), R01 DK091374 (S.L.G., C.D.H. and L.B.J.), R01 CA164138 (S.V.T. and C.D.H.), R44HG006579 (M.G.R. and M.Y.) and R01 GM59290 (L.B.J.) as well as the University of Luxembourg—Institute for Systems Biology Program. D.S. was supported by grants from the NHLBI (U01 HL100406 and U01 HL098179) related to this project. H.C. was supported by NIH grants R01 MH094400 and R01 MH099134. H.H. was supported by the MD Anderson Cancer Center Odyssey Program. J.X. was supported by NIH grant R00HG005846.

Author information

C.D.H. conceived of the project. C.D.H. oversaw and coordinated the research. C.D.H. and H.H. designed the algorithms. H.H. and B.M. wrote the software. C.D.H., H.H. and P.S. contributed to the statistical development. C.D.H., H.H., J.C.R., M.Y., S.V.T., D.S., K.V.V., L.H., L.B.J., M.G.R. and S.L.G. designed the experiments. H.H., H.C., W.W., R.L.M., J.D.D., S.W., H.L., J.X., Shankaracharya, R.H., B.M., J.C. and G.G. performed the experiments. H.H., C.D.H., M.Y., S.V.T., S.L.G. and L.B.J. analyzed and interpreted the data. H.H. generated the figures. H.H., C.D.H., L.B.J., M.Y., S.L.G., P.S. and S.V.T. wrote the paper. S.L.G., D.S., V.G., D.J.G., L.H., H.L., R.H., K.V.V., R.L.M., J.D.D. and G.G. participated in pedigree identification, recruitment and validation.

Correspondence to Mark Yandell or Chad D Huff.

Ethics declarations

Competing interests

M.G.R. is a founder and officer of Omicia, Inc.

Supplementary information

Supplementary Text and Figures

Supplementary Notes 1–4, Supplementary Figures 1–10 and Supplementary Table 1 (PDF 10164 kb)

Supplementary Code

pVAAST source code (ZIP 97 kb)

Cite this article

Hu, H., Roach, J., Coon, H. et al. A unified test of linkage analysis and rare-variant association for analysis of pedigree sequence data. Nat Biotechnol 32, 663–669 (2014) doi:10.1038/nbt.2895
