Abstract
Linear mixed models (LMMs) have emerged as the method of choice for confounded genome-wide association studies. However, the performance of LMMs in nonrandomly ascertained case-control studies deteriorates with increasing sample size. We propose a framework called LEAP (liability estimator as a phenotype; https://github.com/omerwe/LEAP) that tests for association with estimated latent values corresponding to severity of phenotype, and we demonstrate that this can lead to a substantial power increase.
Acknowledgements
This work was supported by the Israeli Science Foundation. This study makes use of data generated by the Wellcome Trust Case Control Consortium. A full list of the investigators who contributed to the generation of the data is available from http://www.wtccc.org.uk. Funding for the project was provided by the Wellcome Trust under award 076113. The MS and ulcerative colitis data sets were filtered by A. Gusev. We thank N. Zaitlen and S. Rosset for helpful discussions.
Author information
Contributions
O.W. and D.H. designed research, conducted experiments, contributed analytic tools, analyzed data and wrote the paper. C.L. and D.G. designed research and wrote the paper.
Ethics declarations
Competing interests
C.L. and D.H. performed work on this manuscript while employed by Microsoft.
Integrated supplementary information
Supplementary Figure 1 Liability distributions in balanced case-control data sets.
Individuals with liability greater than the prevalence-specific cutoff are cases, and the remainder are controls. The liabilities of controls and of cases follow a zero-mean normal distribution, conditioned on being smaller or greater than the liability cutoff, respectively. The distribution of case liabilities becomes increasingly sharply peaked as prevalence decreases.
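The truncated-normal picture described in this caption can be checked numerically. The following is a minimal sketch, assuming a standard normal liability (zero mean, unit variance); the function name `truncated_liability_summary` and the prevalence values are illustrative choices, not taken from the paper or from the LEAP code.

```python
from statistics import NormalDist

def truncated_liability_summary(prevalence):
    """Cutoff and case/control liability moments, assuming a standard normal liability."""
    std_normal = NormalDist()  # zero mean, unit variance (assumption)
    # The cutoff t satisfies P(liability > t) = prevalence.
    cutoff = std_normal.inv_cdf(1.0 - prevalence)
    density = std_normal.pdf(cutoff)
    # Means of a standard normal truncated to (t, inf) and (-inf, t).
    case_mean = density / prevalence
    control_mean = -density / (1.0 - prevalence)
    # Variance of the upper-truncated normal: 1 + t*m - m^2, with m the case mean.
    case_var = 1.0 + cutoff * case_mean - case_mean ** 2
    return cutoff, case_mean, control_mean, case_var

# Illustrative prevalence levels only.
for prevalence in (0.1, 0.01, 0.001):
    cutoff, case_mean, control_mean, case_var = truncated_liability_summary(prevalence)
    print(f"prevalence={prevalence}: cutoff={cutoff:.3f}, case mean={case_mean:.3f}, "
          f"control mean={control_mean:.3f}, case variance={case_var:.3f}")
```

Under these assumptions the case variance shrinks as prevalence decreases, which is the sharpening of the case-liability distribution that the caption describes.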
Supplementary Figure 2 Type 1 error rates under different prevalence levels.
All experiments were run with FST=0.01 and with 30% of the individuals in one of the two populations being sib-pairs. The gray shaded area is the 95% confidence interval of the null distribution.
Supplementary Figure 4 Power evaluations with different sample sizes, under 0.1% prevalence.
The mean relative increase in power of every method over an LMM is shown next to its name, in percentage units. For example, the number 3 indicates that a method has average power 3% greater than that of an LMM. Also shown is the 95% confidence interval of the mean increase, obtained via 10,000 bootstrap samples.
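The relative-increase summary and its bootstrap interval can be sketched as follows. This is an illustration only: it assumes a percentile bootstrap over per-data-set relative increases, which the caption does not specify, and the power values, the helper names `relative_increase_pct` and `bootstrap_mean_ci`, and the seed are all hypothetical.

```python
import random

def relative_increase_pct(method_power, lmm_power):
    """Relative power increase of a method over an LMM, in percent."""
    return 100.0 * (method_power - lmm_power) / lmm_power

def bootstrap_mean_ci(values, n_boot=10_000, level=0.95, seed=0):
    """Mean of `values` with a percentile bootstrap confidence interval (assumed scheme)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    boot_means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))
    lo_idx = int((1.0 - level) / 2.0 * n_boot)
    hi_idx = int((1.0 + level) / 2.0 * n_boot) - 1
    return sum(values) / n, boot_means[lo_idx], boot_means[hi_idx]

# Hypothetical (method power, LMM power) pairs, one per simulated data set.
powers = [(0.52, 0.50), (0.61, 0.58), (0.47, 0.46), (0.66, 0.63), (0.55, 0.54)]
increases = [relative_increase_pct(m, l) for m, l in powers]
point, lower, upper = bootstrap_mean_ci(increases)
print(f"mean relative increase: {point:.2f}% (95% CI {lower:.2f}% to {upper:.2f}%)")
```

A value of 3 from `relative_increase_pct` corresponds to the caption's example of a method with average power 3% greater than that of an LMM.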
Supplementary Figure 6 Similarity between the estimated and true liabilities of controls.
The figure shows results for data sets with 6,000 individuals, and their 95% confidence intervals (computed via 10-fold cross validation for each data set, averaged over 10 data sets). The similarity measures shown are the Pearson correlation and the root mean square error, after normalizing the liabilities to have zero mean and unit variance. The evaluation was restricted to controls, because liabilities of cases are trivial to estimate, as they are tightly clustered near the liability cutoff (Supplementary Fig. 1).
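The two similarity measures named in this caption can be sketched as below. This is a minimal illustration, not the authors' evaluation code: it omits the cross-validation step, standardizes with the population standard deviation (an assumption), and uses made-up liability values and hypothetical function names.

```python
from math import sqrt
from statistics import fmean, pstdev

def standardize(values):
    """Rescale values to zero mean and unit (population) variance."""
    mu = fmean(values)
    sd = pstdev(values)
    return [(v - mu) / sd for v in values]

def liability_similarity(estimated, true):
    """Pearson correlation and RMSE between standardized liability vectors."""
    z_est, z_true = standardize(estimated), standardize(true)
    n = len(z_true)
    # For standardized vectors, the Pearson correlation is the mean product.
    pearson = sum(a * b for a, b in zip(z_est, z_true)) / n
    rmse = sqrt(sum((a - b) ** 2 for a, b in zip(z_est, z_true)) / n)
    return pearson, rmse

# Made-up liabilities for a handful of controls.
true_liab = [-1.2, -0.4, 0.3, -0.9, 0.8, -0.1]
est_liab = [-0.9, -0.5, 0.1, -0.6, 0.5, 0.2]
r, rmse = liability_similarity(est_liab, true_liab)
print(f"Pearson r = {r:.3f}, RMSE = {rmse:.3f}")
```

With this standardization the two measures are tied together by RMSE² = 2(1 − r), so a higher correlation always corresponds to a lower error.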
Supplementary Figure 8 Population-structure experiments.
The figure shows the mean ratio of normalized test statistics for causal SNPs between each evaluated method and an LMM under 0.1% prevalence, 30% sib-pairs, and various population structure levels.
Supplementary Figure 10 Family relatedness experiments.
The figure shows the mean ratio of normalized test statistics for causal SNPs between each evaluated method and an LMM under 0.1% prevalence, FST=0.01 and various percentages of individuals in one of the two populations who are sib-pairs.
Supplementary Figure 12 Polygenicity experiments.
The figure shows the mean ratio of normalized test statistics for causal SNPs between each evaluated method and an LMM under 0.1% prevalence and various numbers of causal SNPs.
Supplementary Figure 14 The mean ratio of normalized test statistics for causal SNPs between each evaluated method and an LMM, in the presence of covariates.
LEAP+covar is a variant of LEAP that uses covariates as well as genetic variants for liability estimation (Supplementary Note 2).
Supplementary Figure 15 Power evaluations in the presence of covariates.
LEAP+covar is a variant of LEAP that uses covariates as well as genetic variants for liability estimation (Supplementary Note 2).
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–15, Supplementary Tables 1–3 and Supplementary Notes 1–3 (PDF 5514 kb)
About this article
Cite this article
Weissbrod, O., Lippert, C., Geiger, D. et al. Accurate liability estimation improves power in ascertained case-control studies. Nat Methods 12, 332–334 (2015). https://doi.org/10.1038/nmeth.3285
DOI: https://doi.org/10.1038/nmeth.3285