  • Brief Communication

Accurate liability estimation improves power in ascertained case-control studies

Abstract

Linear mixed models (LMMs) have emerged as the method of choice for confounded genome-wide association studies. However, the performance of LMMs in nonrandomly ascertained case-control studies deteriorates with increasing sample size. We propose a framework called LEAP (liability estimator as a phenotype; https://github.com/omerwe/LEAP) that tests for association with estimated latent values corresponding to severity of phenotype, and we demonstrate that this can lead to a substantial power increase.
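Testing association against an estimated liability, rather than the binary case-control label, rests on the liability-threshold model. The sketch below is not LEAP's implementation (that is in the repository linked above). It assumes a hypothetical linear predictor `eta` (for example, a genetic or covariate-based prediction of an individual's liability) and a residual standard deviation `resid_sd`, and returns the posterior mean of a normally distributed liability given only whether it exceeds the cutoff.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def expected_liability(eta, is_case, cutoff, resid_sd):
    """Posterior mean of a latent liability l = eta + e, with e ~ N(0, resid_sd^2),
    given only whether l exceeds the cutoff (case) or not (control).
    `eta` and `resid_sd` are hypothetical inputs for illustration."""
    z = (cutoff - eta) / resid_sd  # standardized distance from eta to the cutoff
    pdf = STD_NORMAL.pdf(z)
    if is_case:
        # E[l | l > cutoff]: normal truncated from below
        return eta + resid_sd * pdf / (1.0 - STD_NORMAL.cdf(z))
    # E[l | l <= cutoff]: normal truncated from above
    return eta - resid_sd * pdf / STD_NORMAL.cdf(z)


# Cutoff on a standard-normal liability scale for an assumed prevalence K.
K = 0.001
cutoff = STD_NORMAL.inv_cdf(1.0 - K)
print(expected_liability(0.5, False, cutoff, 0.9),
      expected_liability(0.5, True, cutoff, 0.9))
```

Controls with a high predicted liability receive a higher (less negative) estimate than controls with a low one, which is the extra information a binary label discards. For very large standardized distances the upper-tail probability `1 - cdf(z)` loses precision, so a production version would compute tail probabilities on a log scale.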

Figure 1: Synthetic data demonstrating that the power of LEAP increases with sample size and disease heritability.
Figure 2: Analysis of real data sets with LEAP and other methods.

Acknowledgements

This work was supported by the Israeli Science Foundation. This study makes use of data generated by the Wellcome Trust Case Control Consortium. A full list of the investigators who contributed to the generation of the data is available from http://www.wtccc.org.uk. Funding for the project was provided by the Wellcome Trust under award 076113. The MS and ulcerative colitis data sets were filtered by A. Gusev. We thank N. Zaitlen and S. Rosset for helpful discussions.

Author information

Contributions

O.W. and D.H. designed research, conducted experiments, contributed analytic tools, analyzed data and wrote the paper. C.L. and D.G. designed research and wrote the paper.

Corresponding authors

Correspondence to Omer Weissbrod or David Heckerman.

Ethics declarations

Competing interests

C.L. and D.H. performed work on this manuscript while employed by Microsoft.

Integrated supplementary information

Supplementary Figure 1 Liability distributions in balanced case-control data sets.

Individuals whose liability exceeds the prevalence-specific cutoff are cases, and the remainder are controls. The liabilities of controls and of cases follow a zero-mean normal distribution truncated to values below or above the liability cutoff, respectively. As prevalence decreases, the distribution of case liabilities becomes increasingly sharply peaked.
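Under the threshold model in this caption, the cutoff and the moments of the truncated case and control distributions have closed forms. A small sketch, assuming the total liability is standard normal (zero mean, unit variance); the function names are illustrative, not taken from the paper.

```python
from math import sqrt
from statistics import NormalDist

N01 = NormalDist()  # assumed liability scale: standard normal


def liability_cutoff(prevalence):
    """Cutoff t such that P(liability > t) equals the prevalence."""
    return N01.inv_cdf(1.0 - prevalence)


def truncated_moments(prevalence):
    """Mean and SD of case (l > t) and control (l <= t) liabilities."""
    t = liability_cutoff(prevalence)
    phi = N01.pdf(t)
    lam_case = phi / prevalence          # inverse Mills ratio, upper tail
    lam_ctrl = phi / (1.0 - prevalence)  # inverse Mills ratio, lower tail
    case_var = 1.0 + t * lam_case - lam_case ** 2
    ctrl_var = 1.0 - t * lam_ctrl - lam_ctrl ** 2
    return {
        "cutoff": t,
        "case_mean": lam_case,
        "case_sd": sqrt(case_var),
        "ctrl_mean": -lam_ctrl,
        "ctrl_sd": sqrt(ctrl_var),
    }


for k in (0.1, 0.01, 0.001):
    m = truncated_moments(k)
    print(f"K={k}: cutoff={m['cutoff']:.2f}, "
          f"case SD={m['case_sd']:.3f}, control SD={m['ctrl_sd']:.3f}")
```

As K decreases, the case standard deviation shrinks (the sharpening shown in the figure), while the control distribution approaches the untruncated standard normal.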

Supplementary Figure 2 Type 1 error rates under different prevalence levels.

All experiments were run with FST = 0.01, using samples in which 30% of the individuals in one of the two populations belong to sib-pairs. The gray shaded area is the 95% confidence interval of the null distribution.
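The settings in this caption (two populations differentiated at FST = 0.01, with sib-pairs in one of them) can be mimicked with the Balding-Nichols model, a standard way to draw population-specific allele frequencies at a given FST. This is a sketch, not the paper's simulation code: the ancestral allele-frequency range, the single-SNP scope and all function names are assumptions.

```python
import random


def balding_nichols_freqs(ancestral_p, fst, n_pops, rng):
    """Population allele frequencies from Beta(p(1-F)/F, (1-p)(1-F)/F),
    which has mean p and variance F * p * (1 - p)."""
    a = ancestral_p * (1.0 - fst) / fst
    b = (1.0 - ancestral_p) * (1.0 - fst) / fst
    return [rng.betavariate(a, b) for _ in range(n_pops)]


def allele(p, rng):
    """One allele: 1 with probability p, else 0."""
    return 1 if rng.random() < p else 0


def unrelated(p, rng):
    """Genotype (0/1/2) of an unrelated individual under Hardy-Weinberg equilibrium."""
    return allele(p, rng) + allele(p, rng)


def sib_pair(p, rng):
    """Genotypes of two full sibs sharing the same two simulated parents."""
    mother = (allele(p, rng), allele(p, rng))
    father = (allele(p, rng), allele(p, rng))
    # Each sib inherits one randomly chosen allele from each parent.
    return [rng.choice(mother) + rng.choice(father) for _ in range(2)]


def simulate_snp(n_per_pop, fst, sib_fraction, rng):
    """Genotypes at one SNP for two populations; in the second population,
    a fraction `sib_fraction` of individuals belong to sib-pairs."""
    ancestral_p = rng.uniform(0.05, 0.5)  # assumed ancestral frequency range
    p0, p1 = balding_nichols_freqs(ancestral_p, fst, 2, rng)
    pop0 = [unrelated(p0, rng) for _ in range(n_per_pop)]
    n_pairs = int(sib_fraction * n_per_pop) // 2
    pop1 = []
    for _ in range(n_pairs):
        pop1.extend(sib_pair(p1, rng))
    pop1.extend(unrelated(p1, rng) for _ in range(n_per_pop - 2 * n_pairs))
    return pop0, pop1


rng = random.Random(0)
pop0, pop1 = simulate_snp(n_per_pop=1000, fst=0.01, sib_fraction=0.3, rng=rng)
```

Repeating this over many SNPs yields a genotype matrix with both population structure and cryptic relatedness, the two confounders an LMM is meant to absorb.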

Supplementary Figure 3 Type 1 error rates under different population structure and family relatedness levels.

Supplementary Figure 4 Power evaluations with different sample sizes, under 0.1% prevalence.

The mean relative increase in power of each method over an LMM is shown next to its name, as a percentage. For example, a value of 3 indicates that the method's average power is 3% greater than that of an LMM. Also shown is the 95% confidence interval of the mean increase, obtained from 10,000 bootstrap samples.
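The bootstrap interval described in this caption can be sketched as follows, assuming a percentile bootstrap over per-setting relative increases (the resampling unit used in the paper is not stated here). The power values in the usage lines are made up for illustration only.

```python
import random
import statistics


def relative_increase(method_power, lmm_power):
    """Per-setting relative power increase over the LMM, in percent."""
    return [100.0 * (m - l) / l for m, l in zip(method_power, lmm_power)]


def bootstrap_mean_ci(values, n_boot=10_000, seed=0):
    """Mean of `values` and a percentile-bootstrap 95% CI for that mean."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    boot_means = [statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)]
    # n=40 gives 39 cut points at 2.5%, 5%, ..., 97.5%.
    cuts = statistics.quantiles(boot_means, n=40)
    return statistics.fmean(values), cuts[0], cuts[-1]


# Hypothetical per-setting power values, for illustration only.
method = [0.62, 0.55, 0.71, 0.48]
lmm = [0.60, 0.52, 0.70, 0.47]
mean_inc, lo, hi = bootstrap_mean_ci(relative_increase(method, lmm))
print(f"{mean_inc:.1f}% (95% CI {lo:.1f} to {hi:.1f})")
```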

Supplementary Figure 5 Power evaluations under different prevalence levels, with samples of 6,000 individuals.

Supplementary Figure 6 Similarity between the estimated and true liabilities of controls.

The figure shows results for data sets of 6,000 individuals, with 95% confidence intervals (computed via 10-fold cross-validation on each data set and averaged over 10 data sets). The similarity measures are the Pearson correlation and the root mean square error, computed after normalizing the liabilities to zero mean and unit variance. The evaluation was restricted to controls, because the liabilities of cases are trivial to estimate: they are tightly clustered near the liability cutoff (Supplementary Fig. 1).
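Both similarity measures in this caption can be computed as below. If both vectors are standardized with the population standard deviation (an assumption here), the two measures are linked exactly: RMSE squared equals 2(1 - r), because the mean squared difference of two standardized vectors is 1 + 1 - 2r. The example liabilities are hypothetical.

```python
import math
import statistics


def standardize(x):
    """Shift and scale to zero mean and unit population variance."""
    mu = statistics.fmean(x)
    sd = statistics.pstdev(x)
    return [(v - mu) / sd for v in x]


def pearson(x, y):
    """Pearson correlation as the mean product of standardized values."""
    zx, zy = standardize(x), standardize(y)
    return statistics.fmean(a * b for a, b in zip(zx, zy))


def rmse_standardized(x, y):
    """Root mean square error between the standardized vectors."""
    zx, zy = standardize(x), standardize(y)
    return math.sqrt(statistics.fmean((a - b) ** 2 for a, b in zip(zx, zy)))


# Hypothetical true and estimated control liabilities.
true_liab = [0.3, -1.2, 0.8, 2.1, -0.4, 0.0]
est_liab = [0.1, -0.9, 1.1, 1.7, -0.6, 0.2]
r = pearson(true_liab, est_liab)
print(r, rmse_standardized(true_liab, est_liab), math.sqrt(2 * (1 - r)))
```

Under this convention the RMSE carries no information beyond the correlation; with sample (n - 1) standard deviations the identity holds only approximately.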

Supplementary Figure 7 Power evaluations under different heritability levels.

Supplementary Figure 8 Population-structure experiments.

The figure shows, over causal SNPs, the mean ratio between the normalized test statistic of each evaluated method and that of an LMM, under 0.1% prevalence, 30% sib-pairs and various levels of population structure.
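The captions of Supplementary Figures 8, 10, 12 and 14 summarize power as a mean ratio of normalized test statistics, but the normalization is not spelled out in these captions. The sketch below assumes genomic-control normalization (each method's 1-df chi-square statistics divided by their median over null SNPs relative to the expected null median), which is one common choice; the function names are hypothetical.

```python
import statistics
from statistics import NormalDist

# Median of a 1-df chi-square: the square of the standard normal 75th percentile.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def gc_normalize(stats, null_idx):
    """Divide chi-square statistics by a genomic-control factor from null SNPs."""
    lam = statistics.median(stats[i] for i in null_idx) / CHI2_1DF_MEDIAN
    return [s / lam for s in stats]


def mean_stat_ratio(method_stats, lmm_stats, causal_idx, null_idx):
    """Mean over causal SNPs of (normalized method stat) / (normalized LMM stat)."""
    m = gc_normalize(method_stats, null_idx)
    l = gc_normalize(lmm_stats, null_idx)
    return statistics.fmean(m[i] / l[i] for i in causal_idx)
```

Under this normalization, a method whose statistics are uniformly inflated by a constant factor gets a ratio of 1, so only gains at causal SNPs relative to null SNPs count as improvement.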

Supplementary Figure 9 Power evaluations under different population-structure levels.

Supplementary Figure 10 Family relatedness experiments.

The figure shows, over causal SNPs, the mean ratio between the normalized test statistic of each evaluated method and that of an LMM, under 0.1% prevalence, FST = 0.01 and various percentages of individuals in one of the two populations who belong to sib-pairs.

Supplementary Figure 11 Power evaluations under different family relatedness levels.

Supplementary Figure 12 Polygenicity experiments.

The figure shows, over causal SNPs, the mean ratio between the normalized test statistic of each evaluated method and that of an LMM, under 0.1% prevalence and various numbers of causal SNPs.
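The polygenicity and heritability experiments vary the number of causal SNPs and the heritability of liability. A minimal sketch of generating ascertained case-control data under the liability-threshold model follows; the uniform allele-frequency range, Gaussian effect sizes, independent SNPs and the function name are assumptions, not the paper's protocol.

```python
import math
import random
from statistics import NormalDist


def simulate_ascertained(n_cases, n_controls, n_causal, h2, prevalence, seed=0):
    """Draw individuals from a liability-threshold population until the
    requested numbers of cases and controls are collected."""
    rng = random.Random(seed)
    freqs = [rng.uniform(0.05, 0.5) for _ in range(n_causal)]
    # Gaussian effects on standardized genotypes, rescaled so that the population
    # variance of the genetic component equals h2 (independent SNPs in HWE).
    raw = [rng.gauss(0.0, 1.0) for _ in range(n_causal)]
    scale = math.sqrt(h2 / sum(b * b for b in raw))
    betas = [b * scale for b in raw]
    cutoff = NormalDist().inv_cdf(1.0 - prevalence)
    env_sd = math.sqrt(1.0 - h2)  # total liability variance is 1
    cases, controls = [], []
    while len(cases) < n_cases or len(controls) < n_controls:
        geno = [int(rng.random() < p) + int(rng.random() < p) for p in freqs]
        g = sum(b * (x - 2 * p) / math.sqrt(2 * p * (1 - p))
                for b, x, p in zip(betas, geno, freqs))
        liability = g + rng.gauss(0.0, env_sd)
        # Keep the individual only if their class still needs members.
        if liability > cutoff and len(cases) < n_cases:
            cases.append((geno, liability))
        elif liability <= cutoff and len(controls) < n_controls:
            controls.append((geno, liability))
    return cases, controls, cutoff
```

At 0.1% prevalence roughly 1,000 individuals must be drawn per case, so realistic sample sizes call for vectorized code rather than this per-individual loop.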

Supplementary Figure 13 Power evaluations under different numbers of causal SNPs.

Supplementary Figure 14 Mean ratio, over causal SNPs, between the normalized test statistic of each evaluated method and that of an LMM, in the presence of covariates.

LEAP+covar is a variant of LEAP that uses covariates as well as genetic variants for liability estimation (Supplementary Note 2).

Supplementary Figure 15 Power evaluations in the presence of covariates.

LEAP+covar is a variant of LEAP that uses covariates as well as genetic variants for liability estimation (Supplementary Note 2).

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–15, Supplementary Tables 1–3 and Supplementary Notes 1–3 (PDF 5514 kb)

About this article

Cite this article

Weissbrod, O., Lippert, C., Geiger, D. et al. Accurate liability estimation improves power in ascertained case-control studies. Nat Methods 12, 332–334 (2015). https://doi.org/10.1038/nmeth.3285

  • DOI: https://doi.org/10.1038/nmeth.3285
