Quality control and conduct of genome-wide association meta-analyses

Winkler, Thomas W; Day, Felix R; Croteau-Chonka, Damien C; Wood, Andrew R; Locke, Adam E; Mägi, Reedik; Ferreira, Teresa; Fall, Tove; Graff, Mariaelisa; Justice, Anne E; Luan, Jian'an; Gustafsson, Stefan; Randall, Joshua C; Vedantam, Sailaja; Workalemahu, Tsegaselassie; Kilpeläinen, Tuomas O; Scherag, André; Esko, Tonu; Kutalik, Zoltán; Heid, Iris M; Loos, Ruth J F

doi:10.1038/nprot.2014.071

Protocol
Published: 24 April 2014

Thomas W Winkler^1,
Felix R Day^2 (ORCID: orcid.org/0000-0003-3789-7651),
Damien C Croteau-Chonka^3,4,
Andrew R Wood^5,
Adam E Locke^6,
Reedik Mägi^7,
Teresa Ferreira^8,
Tove Fall^9,10,
Mariaelisa Graff^11,
Anne E Justice^11,
Jian'an Luan^2,
Stefan Gustafsson^9,
Joshua C Randall^12,
Sailaja Vedantam^13,14,15,
Tsegaselassie Workalemahu^16,
Tuomas O Kilpeläinen^17,
André Scherag^18,19,
Tonu Esko^7,13,14,15,
Zoltán Kutalik^20,21,22,
Iris M Heid^1,na1,
Ruth J F Loos^23,24,25,na1 &
The Genetic Investigation of Anthropometric Traits (GIANT) Consortium

Nature Protocols volume 9, pages 1192–1212 (2014)

Abstract

Rigorous organization and quality control (QC) are necessary to facilitate successful genome-wide association meta-analyses (GWAMAs) of statistics aggregated across multiple genome-wide association studies. This protocol provides guidelines for (i) organizational aspects of GWAMAs and (ii) QC at the study file level, the meta-level across studies and the meta-analysis output level. Real-world examples highlight issues experienced and solutions developed by the GIANT Consortium, which has conducted meta-analyses including data from 125 studies comprising more than 330,000 individuals. We provide a general protocol for conducting GWAMAs and carrying out QC to minimize errors and to guarantee maximum use of the data. We also include details for the use of a powerful and flexible software package called EasyQC. Precise timings will be greatly influenced by consortium size. For consortia of comparable size to the GIANT Consortium, this protocol takes a minimum of about 10 months to complete.

**Figure 1: Workflow of the QC and the meta-analysis.**

**Figure 2: SE-N plots to reveal issues with trait transformations.**

**Figure 3: P-Z plot to reveal analytical issues with beta, standard error and P values.**

**Figure 4: Different patterns of allele frequencies in the EAF plot.**

**Figure 5: Lambda-N plot to reveal issues with population stratification.**

Acknowledgements

This work was supported by grants from the German Federal Ministry of Education and Research (BMBF) (01ER1206 for I.M.H.); the Leenaards Foundation and the Swiss National Science Foundation (31003A-143914 for Z.K.); the US National Institutes of Health (DK078150, T32 HL007427 for D.C.C.-C.; R01DK075787 for T.E.); the UK Medical Research Council (MRC; U106179471, U106179472 for F.R.D.); the European Research Council (SZ-245 50371-GLUCOSEGENES-FP7-IDEAS-ERC for A.R.W.); the Targeted Financing from the Estonian Ministry of Science and Education (SF0180142s08 for T.E.); the Development Fund of the University of Tartu (SP1GVARENG for T.E.); the European Regional Development Fund to the Centre of Excellence in Genomics (EXCEGEN, 3.2.0304.11-0312 for T.E.); and FP7 (313010 for T.E.). We are also thankful for the GIANT Consortium and the many participating research groups that have allowed us to develop this protocol.

Author information

Iris M Heid and Ruth J F Loos: These authors jointly supervised this work.

Authors and Affiliations

Department of Genetic Epidemiology, Institute of Epidemiology and Preventive Medicine, University of Regensburg, Regensburg, Germany
Thomas W Winkler & Iris M Heid
Medical Research Council (MRC) Epidemiology Unit, Institute of Metabolic Science, Addenbrooke's Hospital, Cambridge, UK
Felix R Day & Jian'an Luan
Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, USA
Damien C Croteau-Chonka
Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA
Damien C Croteau-Chonka
Genetics of Complex Traits, University of Exeter Medical School, University of Exeter, Exeter, UK
Andrew R Wood
Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan, USA
Adam E Locke
Estonian Genome Center, University of Tartu, Tartu, Estonia
Reedik Mägi & Tonu Esko
Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
Teresa Ferreira
Department of Medical Sciences, Molecular Epidemiology and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
Tove Fall & Stefan Gustafsson
Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
Tove Fall
Department of Epidemiology, University of North Carolina, Chapel Hill, North Carolina, USA
Mariaelisa Graff & Anne E Justice
Wellcome Trust Sanger Institute, Cambridge, UK
Joshua C Randall
Divisions of Endocrinology and Genetics and Center for Basic and Translational Obesity Research, Boston Children's Hospital, Boston, Massachusetts, USA
Sailaja Vedantam & Tonu Esko
Program in Medical and Population Genetics, Broad Institute, Cambridge, Massachusetts, USA
Sailaja Vedantam & Tonu Esko
Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA
Sailaja Vedantam & Tonu Esko
Department of Nutrition, Harvard School of Public Health, Boston, Massachusetts, USA
Tsegaselassie Workalemahu
The Novo Nordisk Foundation Center for Basic Metabolic Research, Section of Metabolic Genetics, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
Tuomas O Kilpeläinen
Institute for Medical Informatics, Biometry and Epidemiology (IMIBE), University Hospital of Essen, University of Duisburg-Essen, Essen, Germany
André Scherag
Clinical Epidemiology, Integrated Research and Treatment Center, Center for Sepsis Control and Care (CSCC), Jena University Hospital, Jena, Germany
André Scherag
Department of Medical Genetics, University of Lausanne, Lausanne, Switzerland
Zoltán Kutalik
Institute of Social and Preventive Medicine (IUMSP), Centre Hospitalier Universitaire Vaudois (CHUV), Lausanne, Switzerland
Zoltán Kutalik
Swiss Institute of Bioinformatics, Lausanne, Switzerland
Zoltán Kutalik
The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, New York, USA
Ruth J F Loos
The Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
Ruth J F Loos
The Genetics of Obesity and Related Metabolic Traits Program, Icahn School of Medicine at Mount Sinai, New York, New York, USA
Ruth J F Loos

Consortia

The Genetic Investigation of Anthropometric Traits (GIANT) Consortium

Contributions

T.W.W., F.R.D., D.C.C.-C., A.R.W., A.E.L., R.M., T. Ferreira, T.O.K., A.S., T.E., Z.K., I.M.H. and R.J.F.L. comprised the writing group. T.W.W., F.R.D., D.C.C.-C., A.R.W., A.E.L., R.M., T. Ferreira, T.O.K., A.S., T.E. and Z.K. were involved in the pipeline and procedure development. T.W.W., F.R.D., D.C.C.-C., A.R.W., A.E.L., R.M., T. Ferreira, T. Fall, M.G., A.E.J., J.L., S.G., J.C.R., S.V., T.W., T.O.K., A.S., T.E. and Z.K. were the analysts contributing to the QC of the recent GIANT papers.

Corresponding authors

Correspondence to Iris M Heid or Ruth J F Loos.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

A full list of members is available in the Supplementary Note.

Integrated supplementary information

Supplementary Figure 1 Ftp-site directory structure.

The DATA_UPLOAD directory is used for the collection of raw study-specific results, i.e., collaborators upload their results there. Once files from all studies, or at least from >80% of them, have been collected, the DATA_UPLOAD folder should be frozen: it should be protected from further changes and renamed to DATA_UPLOAD_FREEZE, and a new DATA_UPLOAD folder should be created to collect any additional results. The CLEANED_FILES directory should be used for the collection of cleaned files that passed the file-level QC routines. The META_ANALYSIS directory should be used to upload meta-analysis results; it contains one sub-folder for each meta-analyst (folders ANALYST_1 and ANALYST_2) and one to collect and freeze the final meta-analysis results (FINAL_RESULT).
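The directory layout and the freeze step described in this legend can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `create_layout` and `freeze_uploads` are hypothetical, the layout is created on a local file system rather than on an FTP server, and "protected from further changes" is approximated by making the frozen files read-only.

```python
from pathlib import Path


def create_layout(root: Path, analysts=("ANALYST_1", "ANALYST_2")) -> None:
    """Create the upload, cleaned-file and meta-analysis directories."""
    (root / "DATA_UPLOAD").mkdir(parents=True, exist_ok=True)
    (root / "CLEANED_FILES").mkdir(exist_ok=True)
    # One sub-folder per meta-analyst plus one for the frozen final result.
    for sub in (*analysts, "FINAL_RESULT"):
        (root / "META_ANALYSIS" / sub).mkdir(parents=True, exist_ok=True)


def freeze_uploads(root: Path) -> Path:
    """Rename DATA_UPLOAD to DATA_UPLOAD_FREEZE and open a fresh DATA_UPLOAD."""
    frozen = root / "DATA_UPLOAD_FREEZE"
    if frozen.exists():
        raise FileExistsError(f"{frozen} already exists; refusing to overwrite")
    (root / "DATA_UPLOAD").rename(frozen)
    # Assumption: read-only file permissions stand in for "protected from changes".
    for path in frozen.rglob("*"):
        if path.is_file():
            path.chmod(0o444)
    # New, empty upload folder for any results that arrive after the freeze.
    (root / "DATA_UPLOAD").mkdir()
    return frozen
```

Calling `create_layout(Path("giant_ftp"))` once at project start and `freeze_uploads(Path("giant_ftp"))` when enough studies have uploaded would reproduce the sequence above; the root folder name here is made up for the example.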

Supplementary Figure 2 Effect of the trait transformation issue.

Using hip circumference with and without adjustment for BMI (HIP, HIPadjBMI) in the GIANT Metabochip studies (81,000 subjects) as an example, the figure shows that (a) the trait transformation issue affected only the BMI-adjusted trait (SE-N plots; magenta: uncleaned studies affected by the issue; green: cleaned studies), (b) the uncleaned data had reduced power for the BMI-adjusted trait (QQ plot of association P values from the meta-analysis for all SNPs; red: meta-analysis on uncleaned data; green: meta-analysis on cleaned data) and (c) the uncleaned data yielded estimates biased toward the null for the BMI-adjusted trait (estimates from the meta-analysis on uncleaned data on the y axis and from cleaned data on the x axis).

Supplementary Figure 3 EasyQC panel of P-Z plots.

Example EasyQC panel of plots, one per file, to check whether reported P values (x axis, on -log10 scale) match P values calculated from the Z-statistic using the reported beta estimates and standard errors (y axis, on -log10 scale). Several files show deviations; these were due to differing software specifications used by the studies and were resolved with the study analysts.
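The comparison behind a P-Z plot can be illustrated with a short Python sketch. It is not EasyQC's implementation: the function names, the row layout `(snp, beta, se, p_reported)` and the tolerance `tol` are hypothetical, and the recomputed P value assumes a two-sided test against a standard normal distribution (software that uses a t-distribution would give slightly different values).

```python
import math


def p_from_z(beta: float, se: float) -> float:
    """Two-sided P value for Z = beta / se under a standard normal."""
    z = beta / se
    return math.erfc(abs(z) / math.sqrt(2.0))


def pz_mismatches(rows, tol: float = 0.1):
    """Return SNP ids whose reported and recomputed -log10(P) differ by more than tol.

    rows: iterable of (snp, beta, se, p_reported) tuples.
    Rows with a non-positive reported or recomputed P value cannot be put on
    the log scale and are flagged as well.
    """
    flagged = []
    for snp, beta, se, p_reported in rows:
        p_calc = p_from_z(beta, se)
        if p_reported <= 0.0 or p_calc <= 0.0:
            flagged.append(snp)
            continue
        if abs(math.log10(p_calc) - math.log10(p_reported)) > tol:
            flagged.append(snp)
    return flagged
```

A file in which most SNPs are flagged points to a systematic problem (for example, a different test statistic or a mislabeled column) rather than to individual bad markers, which is the kind of deviation the legend describes.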

Supplementary Figure 4 EasyQC panel of EAF-plots.

Example panel of plots to check for issues with allele frequencies. Each plot contrasts the allele frequency in the input file (y axis) with the allele frequency in the reference (x axis). In this case, the meta-analyzed GIANT height results were used as the reference against which study-specific GWA results for height were compared. Several issues can be detected immediately and should be resolved with the study analysts.
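A numerical counterpart to reading an EAF plot might look like the following Python sketch. Everything here is an assumption made for illustration: the function names, the category labels and the tolerance of 0.2 are hypothetical, and interpreting points near the anti-diagonal as possibly swapped effect alleles is a common reading of such plots rather than something stated in this legend.

```python
from collections import Counter


def classify_eaf(study_eaf: float, ref_eaf: float, tol: float = 0.2) -> str:
    """Classify one SNP by how its allele frequency compares with the reference."""
    if abs(study_eaf - ref_eaf) <= tol:
        return "ok"  # close to the diagonal of the plot
    if abs((1.0 - study_eaf) - ref_eaf) <= tol:
        return "possibly_flipped"  # close to the anti-diagonal
    return "outlier"


def summarize(pairs, tol: float = 0.2) -> Counter:
    """Count SNPs per category for an iterable of (study_eaf, ref_eaf) pairs."""
    return Counter(classify_eaf(s, r, tol) for s, r in pairs)
```

Note that frequencies near 0.5 lie close to both the diagonal and the anti-diagonal, so this check cannot distinguish a swapped allele from a correct one there; such SNPs are reported as "ok" by construction.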

Supplementary information

Supplementary Figure 1: Ftp-site directory structure.
Supplementary Figure 2: Effect of the trait transformation issue.
Supplementary Figure 3: EasyQC panel of P-Z plots.
Supplementary Figure 4: EasyQC panel of EAF-plots.
Supplementary Table 1: Description of EasyQC report variables (File-level QC).
Supplementary Table 2: Description of EasyQC report variables (Meta-level QC).
Supplementary Table 3: Description of EasyQC report variables (Meta-analysis QC).
Supplementary Methods: Creation of the SNP identifier reference panel.
Supplementary Manual: Exemplary GWA analysis plan.
Supplementary Note: Membership list of the GIANT Consortium.

About this article

Cite this article

Winkler, T., Day, F., Croteau-Chonka, D. et al. Quality control and conduct of genome-wide association meta-analyses. Nat Protoc 9, 1192–1212 (2014). https://doi.org/10.1038/nprot.2014.071

Published: 24 April 2014
Issue Date: May 2014
DOI: https://doi.org/10.1038/nprot.2014.071
