FaST linear mixed models for genome-wide association studies

Journal name: Nature Methods
Volume: 8
Pages: 833–835
DOI: 10.1038/nmeth.1681

We describe factored spectrally transformed linear mixed models (FaST-LMM), an algorithm for genome-wide association studies (GWAS) that scales linearly with cohort size in both run time and memory use. On Wellcome Trust data for 15,000 individuals, FaST-LMM ran an order of magnitude faster than current efficient algorithms. Our algorithm can analyze data for 120,000 individuals in just a few hours, whereas current algorithms fail on data for even 20,000 individuals (http://mscompbio.codeplex.com/).
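The abstract names the technique but does not spell out the computation. As a rough, non-authoritative sketch of what spectrally transforming an LMM can look like, the Python below builds a realized relationship matrix (RRM, the similarity measure named in the Figure 1 caption) from standardized SNPs, eigendecomposes it, and evaluates the likelihood in the rotated space, where the covariance is diagonal. The covariance parameterization sigma_g2 * (K + delta * I), the function names, the SNP standardization and the use of NumPy are all assumptions made for illustration. This full-rank version costs cubic time in cohort size, so it does not reproduce the linear-scaling factorization reported for FaST-LMM.

```python
import numpy as np

def rrm(genotypes):
    """Realized relationship matrix K = Z Z^T / s from an (n x s) genotype array.

    Assumption: each SNP is standardized to zero mean and unit variance,
    and monomorphic SNPs (zero variance) are dropped.
    """
    G = np.asarray(genotypes, dtype=float)
    std = G.std(axis=0)
    keep = std > 0
    Z = (G[:, keep] - G[:, keep].mean(axis=0)) / std[keep]
    return Z @ Z.T / Z.shape[1]

def spectral_loglik(y, X, K, delta):
    """Maximum-likelihood log-likelihood of y ~ N(X beta, sigma_g2 * (K + delta * I)).

    beta and sigma_g2 are profiled out in closed form for a fixed delta > 0.
    Returns (log_likelihood, beta, sigma_g2).
    """
    n = y.shape[0]
    S, U = np.linalg.eigh(K)           # K = U diag(S) U^T
    y_rot = U.T @ y                    # rotate phenotype
    X_rot = U.T @ X                    # rotate covariates (and tested SNP, if any)
    w = 1.0 / (S + delta)              # inverse of the now-diagonal covariance
    # Weighted least squares for the fixed effects in the rotated space.
    beta = np.linalg.solve(X_rot.T @ (w[:, None] * X_rot), X_rot.T @ (w * y_rot))
    resid = y_rot - X_rot @ beta
    sigma_g2 = np.sum(w * resid**2) / n
    loglik = -0.5 * (n * np.log(2 * np.pi * sigma_g2) + np.sum(np.log(S + delta)) + n)
    return loglik, beta, sigma_g2
```

Because the eigendecomposition depends only on K, it can be computed once and reused for every tested SNP and every candidate value of delta; each evaluation then needs only elementwise operations and a small linear solve.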

Figures

  1. Figure 1: Computational costs of FaST-LMM and EMMAX.

    (a,b) Memory footprint (a) and run time (b) of the algorithms running on a single processor as a function of the cohort size in synthetic datasets based on GAW14 data. In each run, we used 7,579 SNPs both to estimate genetic similarity (RRM for FaST-LMM and identity by state for EMMAX) and to test for association. In the 'FaST-LMM full' analysis, the variance parameters were re-estimated for each test, and in the FaST-LMM analysis these parameters were estimated only once for the null model, as in EMMAX. FaST-LMM and FaST-LMM full had the same memory footprint. EMMAX would not run on the datasets that contained 20 or more times the cohort size of the GAW14 data because the memory required to store the large matrices exceeded the 32 GB available.

  2. Figure 2: Accuracy of association P values resulting from SNP sampling on WTCCC data for the Crohn's disease phenotype.

    Each point in the plot shows the negative log P values of association for one SNP, obtained from two LMMs: one that computed the RRM from a 4,000-SNP sample and one that computed it from all SNPs. The complete set used all 340,000 SNPs from every chromosome except chromosome 1, whereas the 4,000-SNP sample used equally spaced SNPs from these same chromosomes. All 28,000 SNPs in chromosome 1 were tested. Dashed lines show the genome-wide significance threshold (5 × 10⁻⁷). The correlation for the points in the plot is 0.97.
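The comparison summarized in the Figure 2 caption can be sketched as follows. This is a minimal illustration, assuming base-10 logarithms and a Pearson correlation (the caption specifies neither). The function name and the toy P values are hypothetical; only the significance threshold is taken from the caption.

```python
import numpy as np

def compare_pvalues(p_full, p_sample, threshold=5e-7):
    """Compare two sets of association P values for the same SNPs.

    Returns the Pearson correlation of the -log10 P values and the fraction
    of SNPs that fall on the same side of the significance threshold.
    """
    x = -np.log10(np.asarray(p_full, dtype=float))
    y = -np.log10(np.asarray(p_sample, dtype=float))
    r = np.corrcoef(x, y)[0, 1]
    cut = -np.log10(threshold)
    same_call = float(np.mean((x >= cut) == (y >= cut)))
    return r, same_call

# Made-up P values purely to show the call; these are not data from the paper.
p_full = [1e-9, 3e-4, 0.02, 0.6, 2e-7]
p_sample = [4e-9, 1e-4, 0.05, 0.5, 8e-7]
r, same_call = compare_pvalues(p_full, p_sample)
```

Reporting the agreement of significance calls alongside the correlation is a design choice of this sketch: a high correlation alone does not show whether SNPs near the threshold are classified the same way by both analyses.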


Author information

  1. These authors contributed equally to this work.

    • Christoph Lippert,
    • Jennifer Listgarten &
    • David Heckerman

Affiliations

  1. Microsoft Research, Los Angeles, California, USA.

    • Christoph Lippert,
    • Jennifer Listgarten,
    • Ying Liu,
    • Carl M Kadie,
    • Robert I Davidson &
    • David Heckerman
  2. Max Planck Institutes Tübingen, Tübingen, Germany.

    • Christoph Lippert

Contributions

C.L., J.L. and D.H. designed and performed research, contributed analytic tools, analyzed data and wrote the paper. Y.L. designed and performed research. C.M.K. and R.I.D. contributed analytic tools.

Competing financial interests

C.L., J.L., C.M.K., R.I.D. and D.H. are employees of Microsoft. Y.L. was employed by Microsoft while performing this research.

Supplementary information

PDF files

  1. Supplementary Text and Figures (406K)

    Supplementary Figure 1, Supplementary Notes 1–2

Zip files

  1. Supplementary Software 1 (106M)

    FaST-LMM software and associated files.
