FaST linear mixed models for genome-wide association studies


We describe factored spectrally transformed linear mixed models (FaST-LMM), an algorithm for genome-wide association studies (GWAS) that scales linearly with cohort size in both run time and memory use. On Wellcome Trust data for 15,000 individuals, FaST-LMM ran an order of magnitude faster than current efficient algorithms. Our algorithm can analyze data for 120,000 individuals in just a few hours, whereas current algorithms fail on data for even 20,000 individuals.
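The abstract does not spell out the computation, so the following is only a minimal sketch of the general spectral-transformation idea that the algorithm's name refers to, not the authors' implementation. It assumes a model y ~ N(Xβ, σ_g²(K + δI)) with a genetic similarity matrix K; the function names, the toy data and the grid over δ are hypothetical and chosen for illustration. Note that the sketch uses a full eigendecomposition of K, so it does not itself reproduce the linear scaling reported above.

```python
import numpy as np

def rotated_log_likelihood(delta, S, Uty, UtX):
    """Maximum-likelihood log-likelihood of y ~ N(X beta, sigma_g2 * (K + delta*I)),
    evaluated in the eigenbasis of K = U diag(S) U^T.
    Uty = U^T y and UtX = U^T X are the rotated phenotype and covariates."""
    n = Uty.shape[0]
    d = S + delta                          # eigenvalues of K + delta*I
    Xw = UtX / d[:, None]                  # rows of U^T X scaled by 1/d
    beta = np.linalg.solve(UtX.T @ Xw, Xw.T @ Uty)   # generalized least squares
    r = Uty - UtX @ beta                   # rotated residuals
    sigma_g2 = np.sum(r * r / d) / n       # closed-form ML estimate of sigma_g2
    ll = -0.5 * (n * np.log(2 * np.pi) + np.sum(np.log(d))
                 + n * np.log(sigma_g2) + n)
    return ll, beta, sigma_g2

def direct_log_likelihood(delta, K, y, X, beta, sigma_g2):
    """Same log-likelihood computed from the dense covariance, for checking only."""
    n = y.shape[0]
    V = sigma_g2 * (K + delta * np.eye(n))
    r = y - X @ beta
    _, logdet = np.linalg.slogdet(V)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + r @ np.linalg.solve(V, r))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, n_snps = 200, 500
    G = rng.standard_normal((n, n_snps))          # toy stand-in for genotypes
    K = G @ G.T / n_snps                          # similarity matrix
    X = np.column_stack([np.ones(n), G[:, 0]])    # intercept plus one tested SNP
    y = 0.3 * G[:, 0] + rng.standard_normal(n)

    S, U = np.linalg.eigh(K)                      # done once, reused for every delta
    Uty, UtX = U.T @ y, U.T @ X

    deltas = np.logspace(-3, 3, 61)               # coarse grid instead of a 1-D optimizer
    lls = [rotated_log_likelihood(d, S, Uty, UtX)[0] for d in deltas]
    best = deltas[int(np.argmax(lls))]
    print("delta with highest likelihood on the grid:", best)
```

After the one-off decomposition, each likelihood evaluation touches only vectors of length n and a small covariate matrix, which is what makes repeated evaluation over δ and over many SNPs cheap. The acknowledgements mention Brent's method, so a one-dimensional optimizer would presumably replace the grid used here.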

Figure 1: Computational costs of FaST-LMM and EMMAX.
Figure 2: Accuracy of association P values resulting from SNP sampling on WTCCC data for the Crohn's disease phenotype.


We thank E. Renshaw for help with implementation of Brent's method and the χ² distribution function, J. Carlson for help with tools used to manage the data and deploy runs on our computer cluster, and N. Pfeifer for an implementation of the ATT. A full list of the investigators who contributed to the generation of the Wellcome Trust Case-Control Consortium data we used in this study is available from Funding for the project was provided by the Wellcome Trust (076113 and 085475). The GAW14 data were provided by the members of the Collaborative Study on the Genetics of Alcoholism (US National Institutes of Health grant U10 AA008401).

C.L., J.L. and D.H. designed and performed research, contributed analytic tools, analyzed data and wrote the paper. Y.L. designed and performed research. C.M.K. and R.I.D. contributed analytic tools.

Correspondence to Christoph Lippert, Jennifer Listgarten or David Heckerman.

C.L., J.L., C.M.K., R.I.D. and D.H. are employees of Microsoft. Y.L. was employed by Microsoft while performing this research.

Lippert, C., Listgarten, J., Liu, Y. et al. FaST linear mixed models for genome-wide association studies. Nat Methods 8, 833–835 (2011).

