A resource-efficient tool for mixed model association analysis of large-scale data

Abstract

The genome-wide association study (GWAS) has been widely used as an experimental design to detect associations between genetic variants and a phenotype. Two major confounding factors, population stratification and relatedness, can inflate GWAS test statistics and hence lead to spurious associations. Mixed linear model (MLM)-based approaches can be used to account for sample structure. However, genome-wide association (GWA) analyses in biobank samples such as the UK Biobank (UKB) often exceed the capability of most existing MLM-based tools, especially if the number of traits is large. Here, we develop an MLM-based tool (fastGWA) for GWA analyses of biobank-scale data that controls for population stratification by principal components and for relatedness by a sparse genetic relationship matrix. We demonstrate by extensive simulations that fastGWA is reliable, robust and highly resource-efficient. We then apply fastGWA to 2,173 traits on array-genotyped and imputed samples from 456,422 individuals and to 2,048 traits on whole-exome-sequenced samples from 46,191 individuals in the UKB.

Fig. 1: Median λ of null variants under different simulation scenarios.
Fig. 2: Mean χ2 of causal variants under different simulation scenarios.
Fig. 3: Estimates of genetic variance by fastGWA and BOLT-LMM for 24 traits in the UKB.

Data availability

The individual-level genotype and phenotype data are available through formal application to the UK Biobank (http://www.ukbiobank.ac.uk). All the summary-level statistics are available at our data portal (http://cnsgenomics.com/software/gcta/#DataResource). Source data for Extended Data Figs. 1–3 are available online.

Code availability

fastGWA is available at http://cnsgenomics.com/software/gcta/#fastGWA. The fastGWA online tool was built on the code modified from the PheWeb project (https://github.com/statgen/pheweb/).

Acknowledgements

We thank H. Wang and J. Sidorenko for assistance in data preparation, A. McRae for organizing computing resources, P.-R. Loh for constructive comments on the manuscript, L. Yengo for helpful discussion, the Neale Lab for making the data processing pipelines publicly available, and Alibaba Cloud Australia and New Zealand for hosting the online tool. This research was supported by the Australian Research Council (DP160101343, DP160101056, FT180100186, and FL180100072), the Australian National Health and Medical Research Council (1078037, 1078901, 1113400, and 1107258), and the Sylvia & Charles Viertel Charitable Foundation. This study makes use of data from the UK Biobank (project ID: 12514). A full list of acknowledgements relating to this data set can be found in the Supplementary Note.

Author information

Contributions

J.Y. conceived and supervised the study. J.Y., L.J., and Z.Z. designed the experiment. Z.Z. developed the software tools. L.J. and Z.Z. performed the simulations and data analyses with the assistance and guidance of J.Y., P.M.V., T.Q., N.R.W., and K.E.K. P.M.V., N.R.W., and J.Y. contributed resources and funding. L.J. and J.Y. wrote the manuscript with the participation of all authors. All authors reviewed and approved the final manuscript.

Corresponding author

Correspondence to Jian Yang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Comparison between fastGWA-REML and AI-REML.

The phenotypes were simulated based on real genotypes of 100,000 individuals from the UKB with Vg = 0.4 (see part 5 of the Supplementary Note for details of the simulation method and data). Plotted are the \(\hat \sigma _g^2\) values estimated by fastGWA-REML against those estimated by AI-REML in GCTA. Each dot represents one simulation replicate (100 simulations in total). The Pearson’s correlation coefficient of \(\hat \sigma _g^2\) between the two methods is >0.9999.

Extended Data Fig. 2 Comparison between the approximate and exact fastGWA tests.

We selected four quantitative traits from the UKB for comparison: height (HT, nHT = 455,332), forced expiratory volume in 1 second (FEV, nFEV = 415,931), pulse rate (PR, nPR = 149,082), and educational attainment (EA, nEA = 304,998) (see Supplementary Table 4 for more information about the traits). Plotted are the estimated variant effects (a) or χ2-statistics (b) of 8,531,416 variants computed by the exact fastGWA method (fastGWA-Exact) against those computed by the fastGWA test using the GRAMMAR-GAMMA approximation (see part 2 of the Supplementary Note for details). The Pearson’s correlation coefficients of the estimated variant effect or χ2-statistic between the two methods are >0.9999 for all four traits.

Extended Data Fig. 3 The first and second principal components (PC1 and PC2) of all of the UKB participants of European ancestry (n = 456,422) compared to their self-reported ethnicity.

The red dots represent those individuals who self-reported as ‘British’, the green dots represent those who self-reported as ‘Irish’, and the purple dots represent those who self-reported as ‘other-white background’.

Extended Data Fig. 4 Comparison of \(\hat \sigma _g^2\) estimated by fastGWA-REML to that estimated by BOLT-REML (used in BOLT-LMM) at different degrees of relatedness in simulations.

The x-axis represents different degrees of relatedness with (0, 0) representing no common environmental effect, (1st, 0.1Vp) or (1st, 0.2Vp) representing common environmental effects explaining 10% or 20% of the phenotypic variance (Vp) among 1st degree relatives, (≥2nd, 0.1Vp) or (≥2nd, 0.2Vp) representing common environmental effects explaining 10% or 20% of Vp among all pairs of the 1st and 2nd degree relatives, and (≥2nd, Gradient) representing common environmental effects explaining 20% of Vp among the 1st degree relatives and 10% of Vp among the 2nd degree relatives. The y-axis represents the value of \(\hat \sigma _g^2\). The black dashed line represents the true simulation parameter (h2 = 0.4). Each boxplot represents the distribution of \(\hat \sigma _g^2\) across 100 simulation replicates. The line inside each box indicates the median value, notches indicate the 95% confidence interval of the median, the central box indicates the interquartile range (IQR), and whiskers indicate data up to 1.5 times the IQR. We also show the Haseman–Elston (HE) regression estimate of \(\sigma _g^2\) in the fastGWA model, with a gray bar to indicate its expected value computed using the approximation theory presented in part 9 of the Supplementary Note.

Extended Data Fig. 5 Comparison of false positive rate (FPR) among different association methods.

We used the simulated data as presented in Figs. 1 and 2 to compute the FPR of each association method across different simulation scenarios with different levels of common environmental effects. Each boxplot represents the distribution of FPR across 100 simulation replicates. The line inside each box indicates the median value, notches indicate the 95% confidence interval of the median, the central box indicates the interquartile range (IQR), whiskers indicate data up to 1.5 times the IQR and outliers are shown as separate dots. In each simulation replicate, the P value of each variant was calculated based on the reported effect estimate and s.e. using a \(\chi _{df = 1}^2\) test.
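The P value calculation described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the significance threshold `alpha` are hypothetical, and the sketch assumes the test statistic is the squared ratio of the effect estimate to its s.e., referred to a χ2 distribution with 1 d.f.

```python
import math


def wald_chi2_p_value(beta: float, se: float) -> tuple[float, float]:
    """Return (chi-square statistic, P value) for a 1-d.f. test of beta = 0."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    chi2 = (beta / se) ** 2
    # For 1 d.f., P(X > chi2) equals the two-sided standard normal tail,
    # which is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def false_positive_rate(p_values: list[float], alpha: float = 0.05) -> float:
    """Fraction of null variants with P < alpha (alpha is an assumed threshold)."""
    if not p_values:
        raise ValueError("need at least one P value")
    return sum(p < alpha for p in p_values) / len(p_values)
```

Applying `false_positive_rate` to the P values of the null variants in one replicate gives one point of the kind summarized in each boxplot.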

Extended Data Fig. 6 Genomic inflation and power of fastGWA with the sparse GRM thresholded at different genetic relatedness cut-off values.

This simulation was performed based on real genotypes from the UKB (see simulation settings in part 5 of the Supplementary Note). We constructed different sparse GRMs by setting off-diagonal elements below a certain threshold (varying from 0.03 to 0.10) to 0 and performed fastGWA analyses using these sparse GRMs. Each boxplot represents the distribution of estimates (that is, median λ, or mean χ2) across 100 simulation replicates. The line inside each box indicates the median value, notches indicate the 95% confidence interval of the median, the central box indicates the interquartile range (IQR), and whiskers indicate data up to 1.5 times the IQR.
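The thresholding step described above can be sketched in a few lines. This is an illustrative sketch only: the function name `sparsify_grm`, the dense list-of-lists input, and the dictionary output are assumptions made for the example and do not reflect how fastGWA stores its sparse GRM.

```python
def sparsify_grm(grm: list[list[float]], cutoff: float = 0.05) -> dict[tuple[int, int], float]:
    """Keep the diagonal and any off-diagonal element >= cutoff.

    Returns the lower triangle as {(i, j): value} with i >= j; every
    off-diagonal pair that is absent is implicitly 0.
    """
    sparse = {}
    for i, row in enumerate(grm):
        for j in range(i + 1):
            # Diagonal elements are always kept; off-diagonal elements
            # below the cutoff are treated as 0 and therefore dropped.
            if i == j or row[j] >= cutoff:
                sparse[(i, j)] = row[j]
    return sparse


# Toy 3 x 3 GRM: individuals 0 and 1 are close relatives, 2 is unrelated.
toy_grm = [
    [1.00, 0.50, 0.01],
    [0.50, 1.02, 0.04],
    [0.01, 0.04, 0.98],
]
toy_sparse = sparsify_grm(toy_grm, cutoff=0.05)
```

With a cutoff of 0.05, only the three diagonal elements and the (1, 0) pair survive in the toy example, which is why the resulting matrix is cheap to store and to invert when most pairs in a sample are unrelated.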

Extended Data Fig. 7 Comparison of genomic inflation and power between fastGWA, fastGWA-LOCO, and fastGWA-Ped.

Shown are the results from the analyses of a simulated data set based on the simulation strategy described in part 5 of the Supplementary Note (with \(\sigma _g^2 = 0.4V_p\), \(\sigma _c^2 = 0.1V_p\) or \(0.2V_p\) for all 1st and 2nd degree relatives, and \(\sigma _c^2 = 0\) for all unrelated individuals). We did not observe any increase in power when applying the leave-one-chromosome-out (LOCO) scheme to fastGWA. This is because fastGWA uses a sparse GRM to capture pedigree relatedness, modelling the phenotypic covariance between close relatives due to genetic and/or common environmental effects, and the pedigree relatedness estimated using all autosomes is similar to that estimated using 21 chromosomes under the LOCO scheme. Each boxplot represents the distribution of estimates (that is, median λ, or mean χ2) across 100 simulation replicates. The line inside each box indicates the median value, notches indicate the 95% confidence interval of the median, the central box indicates the interquartile range (IQR), and whiskers indicate data up to 1.5 times the IQR.

Extended Data Fig. 8 Comparison of genomic inflation between BOLT-LMM (estimating the variance components only once using all variants) and BOLT-LMM_fine-tuning (re-estimating the variance components when a chromosome is left out).

The simulation setting was the same as the (0, 0) scenario in Fig. 1. The median λ was computed at the null variants. Each boxplot represents the distribution of median λ across 100 simulation replicates. The line inside each box indicates the median value, notches indicate the 95% confidence interval of the median, the central box indicates the interquartile range (IQR), and whiskers indicate data up to 1.5 times the IQR.
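For readers unfamiliar with the two summary statistics used throughout these figures, the sketch below shows one common way to compute them. It assumes that the median λ is the genomic inflation factor in the genomic-control sense (the median χ2 statistic of the null variants divided by the median of a χ2 distribution with 1 d.f.); the function names are hypothetical and the exact definition used in the paper is given in its Methods, not here.

```python
import statistics

# Median of a chi-square distribution with 1 d.f. (approximately 0.4549).
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(null_chi2: list[float]) -> float:
    """Median chi-square of null variants divided by its expected value."""
    return statistics.median(null_chi2) / CHI2_1DF_MEDIAN


def mean_chi2(causal_chi2: list[float]) -> float:
    """Mean chi-square of causal variants, used here as a measure of power."""
    return statistics.fmean(causal_chi2)
```

Under this definition, a value close to 1 at the null variants indicates well-calibrated test statistics, whereas values above 1 indicate inflation.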

Extended Data Fig. 9 Genomic inflation of BOLT-LMM-Mix using LD score based on different LD window sizes and references.

a, Results from simulations based on the simulated genotype data (part 5 of the Supplementary Note) using the same setting as in the (0, 0) case in Fig. 1. The LD scores were computed from the sample using three window sizes: 1 Mb (BOLT-LMM-Mix_wind-1Mb), 10 Mb (BOLT-LMM-Mix_wind-10Mb), and 20 Mb (BOLT-LMM-Mix_wind-20Mb). b, Results from simulations based on real genotypes (part 5 of the Supplementary Note) using the same settings as in the (0, 0) and (≥2nd, 0.1Vp) cases in Fig. 1. Two sets of LD scores were tested: LD scores computed from the sample using a window size of 1 Mb (BOLT-LMM-Mix_UKB-LDsc) and LD scores obtained from the BOLT-LMM website (BOLT-LMM-Mix_provided-LDsc). Each boxplot represents the distribution of estimates (that is, median λ, or mean χ2) across 100 simulation replicates. The line inside each box indicates the median value, notches indicate the 95% confidence interval of the median, the central box indicates the interquartile range (IQR), and whiskers indicate data up to 1.5 times the IQR.

Extended Data Fig. 10 Comparison between the reported genetic relatedness and the SNP-derived genetic relatedness of the UKB participants.

The y-axis represents the SNP-derived genetic relatedness computed with GCTA using 565,631 common variants on HapMap3 (175,708 individual pairs with estimated genetic relatedness ≥ 0.05). The x-axis represents the expected genetic relatedness based on the pedigree information provided by the UKB (monozygotic twin, 1; parent-offspring/full sib, 0.5; second degree relatives, 0.25; third degree relatives, 0.125; and unlabelled pair, ‘none’). Each circle represents one pair of relatives, the dashed diagonal line represents y = x, and the red horizontal lines represent the mean value of each relatedness group.

Supplementary information

Supplementary Information

Supplementary Figures 1–10, Notes 1–11 and Tables 1–8

Reporting Summary

Source data

Source Data Extended Data Fig. 1

The statistical source data used to generate Extended Data Fig. 1.

Source Data Extended Data Fig. 2

The statistical source data used to generate Extended Data Fig. 2.

Source Data Extended Data Fig. 3

The statistical source data used to generate Extended Data Fig. 3.

About this article

Cite this article

Jiang, L., Zheng, Z., Qi, T. et al. A resource-efficient tool for mixed model association analysis of large-scale data. Nat Genet 51, 1749–1755 (2019). https://doi.org/10.1038/s41588-019-0530-8
