A saturated map of common genetic variants associated with human height

Common single-nucleotide polymorphisms (SNPs) are predicted to collectively explain 40–50% of phenotypic variation in human height, but identifying the specific variants and associated regions requires huge sample sizes1. Here, using data from a genome-wide association study of 5.4 million individuals of diverse ancestries, we show that 12,111 independent SNPs that are significantly associated with height account for nearly all of the common SNP-based heritability. These SNPs are clustered within 7,209 non-overlapping genomic segments with a mean size of around 90 kb, covering about 21% of the genome. The density of independent associations varies across the genome and the regions of increased density are enriched for biologically relevant genes. In out-of-sample estimation and prediction, the 12,111 SNPs (or all SNPs in the HapMap 3 panel2) account for 40% (45%) of phenotypic variance in populations of European ancestry but only around 10–20% (14–24%) in populations of other ancestries. Effect sizes, associated regions and gene prioritization are similar across ancestries, indicating that reduced prediction accuracy is likely to be explained by linkage disequilibrium and differences in allele frequency within associated regions. Finally, we show that the relevant biological pathways are detectable with smaller sample sizes than are needed to implicate causal genes and variants. Overall, this study provides a comprehensive map of specific genomic regions that contain the vast majority of common height-associated variants. Although this map is saturated for populations of European ancestry, further research is needed to achieve equivalent saturation in other ancestries.

funding from the European Community's Seventh Framework Programme (FP7/2007-2013)/grant agreement HEALTH-F4-2007-201413, by the European Commission under the programme "Quality of Life and Management of the Living Resources" of the 5th Framework Programme (No. QLG2-CT-2002-01254), as well as FP7 project EUROHEADPAIN (nr 602633).

Estonian Biobank (EstBB)
The EstBB gratefully acknowledges the contributions of the participants. Data analyses were carried out in part in the High-Performance Computing Center of the University of Tartu. We would like to thank the participants and support staff of the Estonian Biobank. K.L. was supported by Estonian Research Council grant PUT 1371, EMBO Installation grant 3573, and the European Regional Development Fund. This study was funded by the European Union through the European Regional Development Fund (Project No. 2014-2020.4.01.15-0012 and Project No. 2014-2020, by the European Union through Horizon 2020 grant no. 810645 and by the Estonian Research Council grants PUT (PRG687, PRG1291). T.E. was supported by Estonian Research Council grant PUT (PRG1291). A.M. was supported by the European Union through the European Regional Development Fund for the development of CoEs (Project No. 2014-2020. K.P. was supported by the European Union through the European Regional Development Fund (Project No. 2014-2020. Recruitment and maintenance of the EstBB are supported by annual funding from the Estonian Ministry of Social Affairs.

(FP/2007-2013) (ERC Grant Agreement no. 310644 MACULA). The EUGENDA samples were genotyped as part of the IAMDGC exome chip project supported by CIDR (contract number HHSN268201200008I) and funded by EY022310 (to Jonathan L. Haines, Case Western Reserve University, Cleveland) and 1X01HG006934-01 (to Gonçalo R. Abecasis, University of Michigan, Department of Biostatistics).

Exeter 10,000 Study (EXTEND)
EXTEND is supported by the National Institute for Health Research Exeter Clinical Research Facility.

Ewha Womans University Hospital PCOS Study (pcosN)
The pcosN gratefully acknowledges the contributions of the participants and of the study staff. YSC acknowledges support from the National Research Foundation of Korea (NRF) Grant (2020R1I1A2075302).

Family Heart Study (FamHS)
FamHS acknowledges the contributions of the participants and of the study staff. FamHS was funded by NIDDK R01-DK-089256 and NHLBI R01HL117078.

The Fenland Study (Fenland)
We are grateful to all the volunteers and to the General Practitioners and practice staff for assistance with recruitment. We thank the Fenland Study Investigators, Fenland Study Co-ordination team and the Epidemiology Field, Data and Laboratory teams. The Fenland Study (10.22025/2017.10.101.00001) is funded by the Medical Research Council (MC_UU_12015/1). We further acknowledge support for genotyping from the Medical Research Council (MC_PC_13046).

Finland-United States Investigation of NIDDM Genetics (FUSION)
We would like to thank the Finnish volunteers who generously participated in the FUSION study. Support for FUSION was provided by NIH grants R01-DK062370 (M.B.), R01-DK072193 (K.L.M.), and intramural project number 1Z01-HG000024 (F.S.C.). Genome-wide genotyping was conducted by the Johns Hopkins University Genetic Resources Core Facility SNP Center at the Center for Inherited Disease Research (CIDR), with support from CIDR NIH contract no. N01-HG-65403.

The Finnish Cardiovascular Study (FINCAVAS)
The authors thank the staff of the Department of Clinical Physiology for collecting the exercise test data. The Finnish Cardiovascular Study (FINCAVAS) has been financially supported by the Competitive Research Funding of the Tampere University Hospital (Grant 9M048 and 9N035), the Finnish Cultural Foundation, the Finnish Foundation for Cardiovascular Research, the Emil Aaltonen Foundation, Finland, the Tampere Tuberculosis Foundation, EU Horizon 2020 (grant 755320 for TAXINOMISIS and grant 848146 for To Aition), and the Academy of Finland grant 322098.

Finnish Twin Cohort Study (FTC)
We thank the twins for their active participation and the staff of the study for their hard work. Phenotype and genotype data collection in the twin cohort has been supported by the Wellcome Trust Sanger Institute, the Broad Institute, ENGAGE (European Network for Genetic and Genomic Epidemiology, FP7-HEALTH-F4-2007, grant agreement number 201413), the National Institute of Alcohol Abuse and Alcoholism (grants AA-12502, AA-00145, and AA-09203 to R. J. Rose and AA15416 and K02AA018755 to D. M. Dick) and the Academy of Finland (grants 100499, 205585, 118555, 141054, 264146, 308248, and 312073 to J. Kaprio). J. Kaprio acknowledges support by the Academy of Finland (grants 265240, 263278).

FINRISK
The FINRISK surveys have been mainly funded by budgetary funds from THL. Additional funding has been obtained from the Academy of Finland and various domestic foundations.

Framingham Heart Study
This research was conducted in part using data and resources from the Framingham Heart Study of the National Heart Lung and Blood Institute of the National Institutes of Health and Boston University School of Medicine. The analyses reflect intellectual input and resource development from the Framingham Heart Study investigators participating in the SNP Health Association Resource (SHARe) project. This work was partially supported by the National Heart, Lung and Blood Institute's Framingham Heart Study (Contract Nos. N01-HC-25195 and HHSN268201500001I) and its contract with Affymetrix, Inc for genotyping services. A portion of this research utilized the Linux Cluster for Genetic Analysis (LinGA-II) funded by the Robert Dawson Evans Endowment of the Department of Medicine at Boston University School of Medicine and Boston Medical Center. This research was partially supported by grant R01-DK122503 from the National Institute of Diabetes and Digestive and Kidney Diseases.

GENAF
The GENAF study is funded by the Norwegian Research Council with a Mobility Grant (240149) and Young Research Talent grant (287086); the South-Eastern Health Authorities with a PhD-grant (2019122); Vestre Viken Hospital Trust with a PhD-grant; afib.no, the Norwegian Atrial Fibrillation Research Network; and "Indremedisinsk Forskningsfond" at Baerum Hospital.

Gene-Lifestyle Interactions and Complex Traits Involved in Elevated Disease Risk V2 (GLACIERV2)
We are grateful to the study participants, health professionals, investigators, data managers and support staff who have contributed to GLACIERV2 and to the Northern Sweden Health and Disease Study. Individual investigator effort was funded by Swedish Research Council, Novo Nordisk Foundation, Swedish Heart Lung Foundation, and European Research Council (CoG-2015_681742_NASCENT).

Generation Scotland (GS)
(contract number HHSN268200782096C). Assistance with data cleaning was provided by the GENEVA Coordinating Center (U01 HG 004446; PI Bruce S Weir). Study recruitment and assembly of datasets were supported by a Cooperative Agreement with the Division of Adult and Community Health, Centers for Disease Control and Prevention, and by grants from the National Institute of Neurological Disorders and Stroke (NINDS) and the NIH Office of Research on Women's Health (R01 NS45012, U01 NS069208-01). Dr. Cole was partially supported by an American Heart Association (AHA)-Bayer Discovery Grant (Grant 17IBDG33700328), the AHA Cardiovascular Genome-Phenome Study (Grant 15GPSPG23770000), NIH (Grants: R01-NS114045; R01-NS100178; R01-NS105150), and the US Department of Veterans Affairs.

Genetics of Lipid Lowering Drugs and Diet Network (GOLDN)
GOLDN acknowledges the contributions of the participants and of the study staff. GOLDN was funded by the NHLBI grants R01 HL09135701, R01 HL091357 and R01 HL104135.

German Chronic Kidney Disease Study (GCKD)
We are grateful for the willingness of the patients to participate in the GCKD study. The enormous effort of the study personnel of the various regional centers is highly appreciated. We thank the large number of nephrologists who provide routine care for the patients and collaborate with the GCKD study. The GCKD study was/is funded by grants from the Federal Ministry of Education and Research (BMBF, grant number 01ER0804, K.U.E.) and the KfH Foundation for Preventive Medicine. Genotyping was supported by Bayer Pharma AG. The work of A.K. was supported by the German Research Foundation (DFG), Project-ID 431984000, SFB 1453, and DFG grant KO 3598/5-1. The work of M.W. was supported by the German Research Foundation (DFG), Project-ID 431984000, SFB 1453.

GerMiFS
The GerMiFS gratefully acknowledges the contributions of the participants and of the study staff.

GGAF
The GGAF is supported by funding to the 5 sources that form GGAF. The AF RISK study is supported by the Netherlands Heart Foundation (grant NHS2010B233), and the Center for Translational Molecular Medicine. Both the Young-AF and Biomarker-AF studies are supported by funding from the University Medical Center Groningen. The GIPS-III trial was supported by grant 95103007 from ZonMw, the Netherlands Organization for Health Research and Development. The PREVEND study is supported by the Dutch Kidney Foundation (grant E0.13) and the Netherlands Heart Foundation (grant NHS2010B280). Prof. Dr. Rienstra acknowledges support from the Netherlands Cardiovascular Research Initiative: an initiative supported by the Netherlands Heart Foundation, CVON 2014-9: "Reappraisal of Atrial Fibrillation: interaction between hyperCoagulability, Electrical remodelling, and Vascular destabilization in the progression of AF (RACE V)".

GoDARTS
We are grateful to all the participants in this study, the general practitioners, the Scottish School of Primary Care for their help in recruiting the participants, and to the whole team, which includes interviewers, computer and laboratory technicians, clerical workers, research scientists, volunteers, managers, receptionists, and nurses. The study complies with the Declaration of Helsinki. We acknowledge the support of the Health Informatics Centre, University of Dundee for managing and supplying the anonymised data and NHS Tayside, the original data owner. M.I.M. was a Wellcome Investigator and NIHR Senior Investigator. This work was supported by: NIDDK (U01-DK105535) and Wellcome (090532, 098381, 106130, 203141, 212259). The Wellcome Trust United Kingdom Type 2 Diabetes Case Control Collection (GoDARTS) was funded by The Wellcome Trust (072960/Z/03/Z, 084726/Z/08/Z, 084727/Z/08/Z, 085475/Z/08/Z, 085475/B/08/Z). GoDARTS was funded by the Wellcome Trust (084727/Z/08/Z, 085475/Z/08/Z, 085475/B/08/Z) and as part of the EU IMI-SUMMIT program.

GRAPHIC (Genetic Regulation of Arterial Pressure In humans in the Community)
C.P.N. is funded by the BHF (SP/16/4/32697). C.P.N., P.S.B. and N.J.S. are supported by the National Institute for Health Research (NIHR) Leicester Cardiovascular Biomedical Research Centre (BRC-1215-20010).

Hunter Community Study (HCS)
The authors would like to thank the men and women participating in the HCS as well as all the staff, investigators and collaborators who have supported or been involved in the project to date. The University of Newcastle provided $300 000 from its Strategic Initiatives Fund, and $600 000 from the Gladys M Brawn Senior Research Fellowship scheme; Vincent Fairfax Family Foundation, a private philanthropic trust, provided $195 000; The Hunter Medical Research Institute provided media support during the initial recruitment of participants.

HyperGen-Axiom (Hypertension Genetic Epidemiology Network Axiom chip data)
The HyperGen-Axiom was funded by NIH grant R01HL086718. XZ was funded by NHGRI HG011052.

Indian Diabetes Consortium (INDICO)
The INDICO gratefully acknowledges the contributions of the participants and the study staff. INDICO was supported by the Council of Scientific and Industrial Research (CSIR), Government of India, through the Centre for Cardiovascular and Metabolic Disease Research (CARDIOMED) project (Grant no. BSC0122), and was also partially funded by the Department of Science and Technology, Government of India, through PURSE II CDST/SR/PURSE PHASE II/11 provided to Jawaharlal Nehru University, New Delhi, India.

Indian Diabetes Prevention Study 3 (IDPP3)
The IDPP3 gratefully acknowledges the participation of the volunteers in the study and the field staff who followed the volunteers on a regular basis. Investigator support was provided by the Council of Scientific and Industrial Research, Ministry of Science and Technology, Govt. of India, New Delhi, India.

INGI-FVG (INGI-Friuli Venezia Giulia)
The authors gratefully acknowledge the subjects from the INGI-FVG cohort. This study was funded by 5 per mille 2015 senses, Genetics of senses and related diseases, CUP: C92F17003560001, to PG.

INSPIRE_AF
Dr Cutler is supported by funding from the Dell Loy Hansen Heart Foundation.

Inter99
The Inter99 was initiated by Torben Jørgensen (PI), Knut Borch-Johnsen (co-PI), Hans Ibsen and Troels F. Thomsen. The steering committee comprises the former two and Charlotta Pisinger. The study was financially supported by research grants from the Danish Research Council, the Danish Centre for Health Technology Assessment, Novo Nordisk Inc., Research Foundation of Copenhagen County, Ministry of Internal Affairs and Health, the Danish Heart Foundation, the Danish Pharmaceutical Association, the Augustinus Foundation, the Ib Henriksen Foundation, the Becket Foundation, and the Danish Diabetes Association. Novo Nordisk Foundation Center for Basic Metabolic Research is an independent Research Center, based at the University of Copenhagen, Denmark and partially funded by an unconditional donation from the Novo Nordisk Foundation (www.cbmr.ku.dk) (Grant number NNF18CC0034900).

iPSYCH
Analyses of the iPSYCH cohort were conducted on the GenomeDenmark and Computerome High Performance Computing facilities. iPSYCH was supported by The Lundbeck Foundation, the Stanley Medical Research Institute, the Aarhus and Copenhagen universities and university hospitals; the research was conducted using the Danish National Biobank resource, supported by the Novo Nordisk Foundation.

Jackson Heart Study (JHS)
The Jackson Heart Study (JHS) is supported and conducted in collaboration with Jackson State University (HHSN268201800013I), Tougaloo College (HHSN268201800014I), the Mississippi State Department of Health (HHSN268201800015I) and the University of Mississippi Medical Center (HHSN268201800010I, HHSN268201800011I and HHSN268201800012I) contracts from the National Heart, Lung, and Blood Institute (NHLBI) and the National Institute on Minority Health and Health Disparities (NIMHD). The authors also wish to thank the staff and participants of the JHS. The views expressed in this manuscript are those of the authors and do not necessarily represent the views of the National Heart, Lung, and Blood Institute; the National Institutes of Health; or the U.S. Department of Health and Human Services. The project described was supported by the National Center for Advancing Translational Sciences, National Institutes of Health, through Grant KL2TR002490 (LMR). LMR was also supported by T32HL129982.

Japan PGx Data Science Consortium (JPDSC)
The authors thank the Japan PGx Data Science Consortium (JPDSC) for kindly providing genotype and phenotype data. The JPDSC was comprised of six pharmaceutical companies in Japan, namely Astellas Pharma, Inc.; Daiichi Sankyo Co., Ltd.; Mitsubishi Tanabe Pharma Corporation; Otsuka Pharmaceutical Co., Ltd.; Taisho Pharmaceutical Co., Ltd.; and Takeda Pharmaceutical Co., Ltd.

JHU_AF
Dr. Nazarian is supported by grants from the US NIH/NHLBI, as well as Biosense Webster, ImriCor, and ADAS software.

Johnston County Osteoarthritis Project (JoCoOA)
The investigators wish to thank the staff and participants in the Johnston County Osteoarthritis Project, without whom this work would not be possible. The JoCoOA is supported in part by S043, S1734, & S3486 from the CDC/Association of Schools of Public Health; 5-P60-AR30701 & 5-P60-AR49465 from NIAMS/NIH, and U01 DP003206, U01 DP006266 from the CDC.

Justification for the Use of Statins in Primary Prevention: an Intervention Trial Evaluating Rosuvastatin (JUPITER)
The JUPITER trial (PI: Ridker) and its substudy for genetics (PIs: Chasman and Ridker) were funded by AstraZeneca.

Kangbuk Samsung Cohort Study (KSCS)
The KSCS gratefully acknowledges the contributions of the participants and of the study staff. We are thankful for the computing resources provided by the Global Science experimental Data hub Center (GSDC) Project and the Korea Research Environment Open NETwork (KREONET) of the Korea Institute of Science and Technology Information (KISTI). HNK acknowledges support from the National Research Foundation of Korea (NRF) Grant (NRF-2020R1A2C1012931) and the Medical Research Funds from Kangbuk Samsung Hospital.

Kita-Nagoya Genome Epidemiology Study (CAGE-KING)
CAGE-KING was supported in part by Grants-in-Aid from MEXT (nos. 24390169, 16H05250, 15K19242, 16H06277, 19K19434, 20K10514, 21H03206) as well as by a grant from the Funding Program for Next-Generation World-Leading Researchers (NEXT Program, no. LS056).

Korea Association REsource (KARE)
This study was conducted with bioresources from the National Biobank of Korea, the Korea Disease Control and Prevention Agency, Republic of Korea. The Korean Association Resource (KARE) was supported by grants from the National Institute of Health, Republic of Korea, and intramural grants from the Korea National Institute of Health (2019-NG-053-02).

Korea National Diabetes Program (KNDP)
The KNDP gratefully acknowledges the contributions of the participants and of the study staff. This study was supported by a grant from the Korea Healthcare Technology R&D Project, Ministry of Health and Welfare, Republic of Korea (A102065).

Kuwait Obesity and Diabetes Genetics Programme (KODGP)
The KODGP gratefully acknowledges the contributions of the participants and of the study staff. The KODGP was supported by institutional funding from the Kuwait Foundation for the Advancement of Sciences.

LIFE-Adult
We thank all participants of the LIFE-Adult study for contributing their time and blood samples. LIFE-Adult genotyping was performed at the Cologne Center for Genomics (CCG, University of Cologne, Peter Nuernberg and Mohammad R. Toliat). For genotype imputation, compute infrastructure provided by ScaDS (Dresden/Leipzig Competence Center for Scalable Data Services and Solutions) at the Leipzig University Computing Centre was used. LIFE-Adult is funded by the Leipzig Research Center for Civilization Diseases (LIFE). LIFE is an organizational unit affiliated to the Medical Faculty of the University of Leipzig. LIFE is funded by means of the European Union, by the European Regional Development Fund (ERDF) and by funds of the Free State of Saxony within the framework of the excellence initiative.

Lifelines Cohort Study
The authors wish to acknowledge the services of the Lifelines Cohort Study, the contributing research centers delivering data to Lifelines, and all the study participants. The Lifelines Biobank initiative has been made possible by funding from the Dutch Ministry of Health, Welfare and Sport, the Dutch Ministry of Economic Affairs, the University Medical Center Groningen (UMCG, the Netherlands), University of Groningen and the Northern Provinces of the Netherlands. The generation and management of GWAS genotype data for the Lifelines Cohort Study is supported by the UMCG Genetics Lifelines Initiative (UGLI). UGLI is partly supported by a Spinoza Grant from NWO, awarded to Cisca Wijmenga.

Living Biobank
The Living Biobank was supported by grants from the Ministry of Health, Singapore, the National University of Singapore and the National University Health System, Singapore. In addition, genotyping for Living Biobank was funded by the Agency for Science, Technology and Research, Singapore, and Merck Sharp & Dohme Corp., Whitehouse Station, NJ, USA.

London Life Sciences Prospective Population Study (LOLIPOP)
The LOLIPOP study is supported by the National Institute for Health Research (NIHR) Comprehensive Biomedical Research Centre Imperial College Healthcare NHS Trust. We acknowledge support of the MRC-PHE Centre for Environment and Health, and the NIHR Health Protection Research Unit on Health Impact of Environmental Hazards. The work was carried out in part at the NIHR/Wellcome Trust Imperial Clinical Research Facility. The views expressed are those of the author(s) and not necessarily those of the Imperial College Healthcare NHS Trust, the NHS, the NIHR or the Department of Health. We thank the participants and research staff who made the study possible. The LOLIPOP was funded by the British Heart Foundation (SP/04/002), the Medical Research Council (G0601966, G0700931), the Wellcome Trust (084723/Z/08/Z, 090532 & 098381), the NIHR (RP-PG-0407-10371), the NIHR Official Development Assistance (ODA, award 16/136/68), the European Union FP7 (EpiMigrant, 279143) and H2020 programs (iHealth-T2D, 643774). JC is supported by the Singapore Ministry of Health's National Medical Research Council under its Singapore Translational Research Investigator (STaR) Award (NMRC/STaR/0028/2017).

Long Life Family Study (LLFS)
LLFS acknowledges the contributions of the participants and of the study staff. LLFS was funded by NIA grants U01AG023746, U01AG023712, U01AG023749, U01AG023755, U01AG023744, and U19AG063893.

LURIC
We thank the LURIC study team who were either temporarily or permanently involved in patient recruitment as well as sample and data handling, in addition to the laboratory staff at the Ludwigshafen General Hospital and the Universities of Freiburg, Ulm, and Heidelberg, Germany. LURIC was supported by the 7th Framework Program RiskyCAD (grant agreement number 305739) of the European Union and the H2020 Program TO_AITION (grant agreement number 848146) of the European Union. The work of G.E.D. is supported by the European Union's Horizon 2020 research and innovation programme under the ERA-Net Cofund action N° 727565 (OCTOPUS project) and the German Ministry of Education and Research (grant number 01EA1801A).

Metabolic Syndrome in Men (METSIM)
The METSIM study was funded by the Academy of Finland (grant nos. 77299 and 124243).

Mexican hypertriglyceridemia (MexTG)
The MexTG cohort gratefully acknowledges the contributions of the participants and of the study staff. The MexTG cohort was funded by NIH grant R01 HL095056.

Mexican-American Coronary Artery Disease (MACAD) Study
This research was supported in part by the Mexican-American Coronary Artery Disease (MACAD) National Heart, Lung, and Blood Institute, contracts R01-HL088457, R01-HL-60030.

Mexico City 1 and Mexico City 2 (MC1 & MC2)
MC1 & MC2 gratefully acknowledge the contributions of the participants. We thank Miguel Alexander Vazquez Moreno, Daniel Locia and Araceli Méndez Padrón for technical support in Mexico. The Mexico City 1 and Mexico City 2 studies were supported in Mexico by the Fondo Sectorial de Investigación en Salud y Seguridad Social (SSA/IMSS/ISSSTE-CONACYT, project 150352), Temas Prioritarios de Salud Instituto Mexicano del Seguro Social (2014-FIS/IMSS/PROT/PRIO/14/34), and the Fundación IMSS. In Canada, this research was enabled in part by two CIHR Operating grants to EJP, a CIHR New Investigator Award to EJP and by support provided by Compute Ontario (www.computeontario.ca), and Compute Canada (www.compute.canada.ca).

MGH Cardiology and Metabolic Patient Cohort (CAMP)
The MGH Cardiology and Metabolic Patient Cohort is comprised of 3850 subjects recruited from the ambulatory MGH Cardiology Practice between 2009 and 2012. Dr. Lubitz is supported by NIH grants 1R01HL139731 and R01HL157635, and American Heart Association 18SFRN34250007. Dr. Ellinor is supported by the Fondation Leducq (14CVD01), the NIH (1R01HL092577, R01HL157635, K24HL105780) and the American Heart Association (18SFRN34110082).

MGH Stroke
Dr. Lubitz is supported by NIH grants 1R01HL139731 and R01HL157635, and American Heart Association 18SFRN34250007.

Minnesota Center for Twin and Family Research (MCTFR)
Work conducted within the MCTFR was supported by grants from the National Institutes of Health DA044283, DA042755, DA037904, AA009367, DA005147, and DA036216.

Montreal Heart Institute Biobank (MHIBB)
We thank all participants and staff of the André and France Desmarais MHI Biobank.

MrOS Gothenburg
MrOS in Sweden is supported by the Swedish Research Council, the Swedish Foundation for Strategic Research, the ALF/LUA research grant in Gothenburg, the Lundberg Foundation, the Knut and Alice Wallenberg Foundation, the Torsten Soderberg Foundation, and the Novo Nordisk Foundation.

Multiethnic cohort - African American Breast Cancer (MEC-AABC); Multiethnic cohort - African American Prostate Cancer (MEC-AAPC); Multiethnic cohort - Latina American Breast Cancer (MEC-LABC); Multiethnic cohort - Latino American Prostate Cancer (MEC-LAPC)
The Multiethnic Cohort (MEC) is a population-based prospective cohort study including approximately 215,000 men and women from Hawaii and California. All participants were 45-75 years of age at baseline, and primarily of 5 ancestries: Japanese Americans, African Americans, European Americans, Hispanic/Latinos, and Native Hawaiians (PMIDs: 10695593; 23449381). MEC was funded by the National Cancer Institute in 1993 to examine lifestyle risk factors and genetic susceptibility to cancer. All eligible cohort members completed baseline and follow-up questionnaires.

Myocardial Infarction Genetics Consortium (MIGEN)
Funding support was provided by grants 1K08HG010155 and 1U01HG011719 (to A.V.K.) from the National Human Genome Research Institute.

The Nagahama study
We are extremely grateful to the Nagahama City Office and the nonprofit organization Zeroji Club for their help in performing the Nagahama study. The work was supported by a university grant, the Center of Innovation Program, the Global University Project from the Ministry of Education, Culture, Sports, Science and Technology of Japan; the Practical Research Project for Rare/Intractable Diseases (ek0109070, ek0109283, ek0109196, ek0109348), and the Program for an Integrated Database of Clinical and Genomic Information (kk0205008), from the Japan Agency for Medical Research and Development (AMED); and the Takeda Medical Research Foundation.

National Survey of Health and Development (NSHD)
We thank the NSHD participants for their continued support and lifelong contribution to the study. The NSHD is funded by the UKRI Medical Research Council [MC_UU_00019/1].

NESCOG
NESCOG research was part of Science Live, the innovative research program of science center NEMO that enables scientists to carry out their research using NEMO visitors as volunteers. The Netherlands Organization for Scientific Research (NWO) Division for the Social Sciences (MaGW) provided funding for this research through VIDI 016-065-318 to D.P.

Netherlands Epidemiology of Obesity Study (NEO)
We greatly appreciate all participants of the Netherlands Epidemiology of Obesity study, and all participating general practitioners for inviting eligible individuals. We furthermore thank P.R. van Beelen and all research nurses for collecting the data, P.J. Noordijk and her team for sample handling and storage and I. de Jonge, MSc for all data management of the NEO study. The NEO study is supported by the participating Departments, the Division and the Board of Directors of the Leiden University Medical Centre, and by the Leiden University, Research Profile Area 'Vascular and Regenerative Medicine'.

The Netherlands Study of Depression and Anxiety (NESDA)
For NESDA, funding was obtained from the Netherlands Organization for Scientific Research (Geestkracht program grant 10-000-1002); the Center for Medical Systems Biology (CSMB, NWO opinions presented are solely those of the authors and do not necessarily represent those of the NICOLA Study team. The Atlantic Philanthropies, the Economic and Social Research Council, the UKCRC Centre of Excellence for Public Health Northern Ireland, the Centre for Ageing Research and Development in Ireland, the Office of the First Minister and Deputy First Minister, the Health and Social Care Research and Development Division of the Public Health Agency, the Wellcome Trust/Wolfson Foundation and Queen's University Belfast provide core financial support for NICOLA. The analysis of molecular biomarkers for NICOLA's Wave 1 was funded by the Economic and Social Research Council, award reference ES/L008459/1. Generic analysis of data was supported by ESRC (ES/L008459/1) and the Science Foundation Ireland-Department for the Economy (SFI-DfE) Investigator Program Partnership Award (15/IA/3152). LJS is supported by an award from NI HSC R&D division STL/5569/19; UKRI (Medical Research Council) MC_PC_20026.

Nurses' Health Study (NHS)
The authors would like to thank the participants and staff of the Nurses' Health Study for their valuable contributions. The Nurses' Health Study was supported by NIH grants UM1 CA186107, P01 CA87969 and R01 CA49449.

Nurses' Health Study II (NHS II)
The authors would like to thank the participants and staff of the Nurses' Health Study II for their valuable contributions. The Nurses' Health Study II was supported by NIH Grants U01 CA176726 and R01 CA67262.

Ogliastra Genetic Park (OGP)
The OGP expresses its gratitude to all the study participants for their contributions and to the municipal administrations for their economic and logistic support. The OGP study was supported by a grant from the Italian Ministry of Education, University and Research (MIUR), no. 5571/DSPAR/2002.

Orkney Complex Disease Study (ORCADES)
DNA extractions were performed at the Edinburgh Clinical Research Facility, University of Edinburgh. We would like to acknowledge the invaluable contributions of the research nurses in Orkney, the administrative team in Edinburgh and the people of Orkney. PRHJT acknowledges funding from the Medical Research Council Doctoral Training Programme in Precision Medicine (MR/N013166/1). The Orkney Complex Disease Study (ORCADES) was supported by the Chief Scientist Office of the Scottish Government (CZB/4/276, CZB/4/710), a Royal Society URF to J.F.W., the MRC Human Genetics Unit quinquennial programme "QTL in Health and Disease", Arthritis Research UK and the European Union framework program 6 EUROSPAN project (contract no. LSHG-CT-2006-018947).

Oxford Biobank
This research was funded by the National Institute for Health Research Oxford Biomedical Research Centre and the British Heart Foundation [RG/17/1/32663].

Penn Medicine Biobank (PMBB)
The PMBB gratefully acknowledges the contributions of the participants and of the study staff. The Penn Medicine BioBank is funded by a gift from the Smilow family, the National Center for Advancing Translational Sciences of the National Institutes of Health under CTSA Award Number UL1TR001878, and the Perelman School of Medicine at the University of Pennsylvania. SMD is supported by IK2-CX001780. This publication does not represent the views of the Department of Veterans Affairs or the United States Government.

Physicians' Health Study (PHS)
We are grateful to the participants and staff of the Physicians' Health Study for their valuable contributions. The Physicians' Health Study was supported by grant CA141298.

Precocious Coronary Artery Disease (PROCARDIS)
PROCARDIS was supported by the European Community Sixth Framework Program (LSHM-CT-2007-037273), AstraZeneca, the Swedish Research Council, the Knut and Alice Wallenberg Foundation, the Swedish Heart-Lung Foundation, the Torsten and Ragnar Soderberg Foundation, the Strategic Cardiovascular Program of Karolinska Institutet and Stockholm County Council, the Foundation for Strategic Research and the Stockholm County Council (560283). Wellcome Trust core award (090532/Z/09/Z, 203141/Z/16/Z, 201543/B/16/Z); HEALTH-F2-2013-601456 (CVGenes@Target), the TriPartite Immunometabolism Consortium [TrIC]-Novo Nordisk Foundation's Grant number NNF15CC0018486, VIAgenomics (SP/19/2/344612) and support from the NIHR Oxford Biomedical Research Centre. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health. HW is supported by the British Heart Foundation Centre for Research Excellence. Maria Sabater-Lleal is supported by a Miguel Servet contract from the ISCIII Spanish Health Institute (CP17/00142) and co-financed by the European Social Fund.

PROMIS
AVK was funded by NHGRI 1K08HG010155. GH was funded by the Swedish Research Council grant 2016-06830.

PROspective Study of Pravastatin in the Elderly at Risk for vascular disease (PROSPER)
The PROSPER study was supported by an investigator initiated grant obtained from Bristol-Myers Squibb. Prof. Dr. J. W. Jukema is an Established Clinical Investigator of the Netherlands Heart Foundation (grant 2001 D 032). Support for genotyping was provided by the seventh framework program of the European commission (grant 223004) and by the Netherlands Genomics Initiative (Netherlands Consortium for Healthy Aging grant 050-060-810). We would also like to thank the following for funding support:

FHCRC
The FHCRC studies were supported by grants R01-CA080122, R01-CA056678, R01-CA082664, and R01-CA092579, and K05-CA175147 from the US National Cancer Institute, National Institutes of Health, with additional support from the Fred Hutchinson Cancer Research Center (P30-CA015704). We thank all the individuals who participated in these studies.

IMPACT
The IMPACT study was funded by The Ronald and Rita McAulay Foundation, CR-UK Project grant (C5047/A1232), Cancer Australia, AICR Netherlands A10-0227, Cancer Australia and Cancer Council Tasmania, NIHR, EU Framework 6, Cancer Councils of Victoria and South Australia, and a philanthropic donation to Northshore University Health System. We acknowledge support from the National Institute for Health Research (NIHR) to the Biomedical Research Centre at The Institute of Cancer Research and Royal Marsden Foundation NHS Trust. We acknowledge the IMPACT study steering committee, collaborating centres and participants.

Prostate cancer Genome-wide Association Study to Uncover Susceptibility loci (PEGASUS)
PEGASUS gratefully acknowledges contributions of the study participants. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services nor does mention of trade names, commercial products, or organization indicate endorsement by the U.S. Government. PEGASUS was supported by the Intramural Research Program of the Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH (ZIA CP010152-20).

Pune Maternal Nutrition Study (PMNS)
The PMNS gratefully acknowledges the contributions of the participants and of the study staff. Investigator support was provided by the Council of Scientific and Industrial Research, Ministry of Science and Technology, Govt. of India, New Delhi, India.

Quebec Family Study (QFS)
Thanks are expressed to the participants in the Québec Family Study and the staff of the Physical Activity Sciences Laboratory at Université Laval for their contribution in this study. The Quebec Family Study was funded by multiple grants from the Medical Research Council of Canada and the Canadian Institutes of Health Research. This work was supported by a team grant from the Canadian Institutes of Health Research (FRN-CCT-83028).

Ragama Health Study (RHS)
The RHS was supported by a Grant from the National Center for Global Health and Medicine (NCGM).

The Raine Study
The authors are grateful to the Raine Study participants and their families, and to the Raine Study team for cohort coordination and data collection. The authors gratefully acknowledge the NHMRC for their long-term funding to the study over the last 30 years and also the following institutes for providing funding for Core Management of the

Relationship between Insulin Sensitivity and Cardiovascular disease Study (RISC)
The RISC Study was supported by European Union grant QLG1-CT-2001-01252 and AstraZeneca.

The Religious Orders Study and the Rush Memory and Aging Project batch 1 (ROSMAP1)
The ROSMAP investigators are grateful to the participants of the studies, and to the faculty and staff of the Rush Alzheimer's Disease Center. Funding was provided by NIA grants P30AG10161, P30AG72975, R01AG17917, RF1AG15819, R01AG30146, U01AG46152 and U01AG61256, and by the Translational Genomics Research Institute.

Shanghai Breast Cancer Study (SBCS)
The generation and management of GWAS genotype data for the SBCS was supported by R01CA64277 and R01CA15847.

Shanghai Women's Health Study (SWHS)
The generation and management of GWAS genotype data for the SWHS was supported by UM1CA182910 and R01CA148677.

Singapore Chinese Health Study - Coronary Artery Disease (SCHS CAD)
We thank Siew-Hong Low of the National University of Singapore for supervising the field work of the Singapore Chinese Health Study and the Ministry of Health in Singapore for assistance with the identification of AMI cases via database linkages. We also acknowledge the founding, longstanding principal investigator of the Singapore Chinese Health Study, Mimi C. Yu. The Singapore Chinese Health Study (SCHS) was supported by the U.S. National Institutes of Health (Grant Numbers R01CA144034 and UM1 CA182876), and by the Singapore National Medical Research Council (Grant Number 1270/2010). Genotyping of the SCHS CAD subset was funded by the HUJ-CREATE Programme of the National Research Foundation, Singapore (Project Number 370062002).

SORBS
We thank all those who participated in the study. Sincere thanks are given to Dr. Knut Krohn (University of Leipzig) for the genotyping support. This work was supported by grants from the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation - Projektnummer 209933838 - SFB 1052; B03, C01; SPP 1629 TO 718/2-1).

Special Turku Coronary Risk Factor Intervention Project (STRIP) parents
We thank the study participants and their families as well as the research group who collected the data.

SR - Silk Road
The authors gratefully acknowledge the subjects from the SR cohort. This study is part of the scientific activities carried out within the scientific expedition Marcopolo 2010. We thank the Terramadre organization and the Terramadre communities who participated in the project. This study was funded by the Italian Ministry of Health-RC 01/21 to MPC and D70-RESRICGIROTTO to GG.

The Rare Variants for Hypertension in Taiwan Chinese (THRV)
The Rare Variants for Hypertension in Taiwan Chinese (THRV) is supported by the National Heart, Lung, and Blood Institute (NHLBI) grant (R01HL111249) and its participation in TOPMed is supported by an NHLBI supplement (R01HL111249-04S1). THRV is a collaborative study between Washington University in St. Louis, The Lundquist Institute at Harbor UCLA, University of Texas in Houston, Taichung Veterans General Hospital, Taipei Veterans General Hospital, Tri-Service General Hospital, National Health Research Institutes, National Taiwan University, and Baylor University. THRV is based (substantially) on the parent SAPPHIRe study, along with additional population-based and hospital-based cohorts. SAPPHIRe was supported by NHLBI grants (U01HL54527, U01HL54498) and Taiwan funds, and the other cohorts were supported by Taiwan funds.

Tracking Adolescents' Individual Lives Survey - Population Cohort (TRAILS Pop) and Clinical Cohort (TRAILS CC)
We are grateful to all adolescents who participated in this research and to everyone who worked on this project and made it possible. TRAILS has been financially supported by grants from the Netherlands Research Infrastructure BBMRI-NL (CP 32), the participating universities, and Accare Centre for Child and Adolescent Psychiatry. Statistical analyses were carried out on the Genetic Cluster Computer (http://www.geneticcluster.org), which is financially supported by the Netherlands Scientific Organization (NWO 480-05-003) along with a supplement from the Dutch Brain Foundation.

Tromso-6 Migraine (TromsoMig)
We thank all participants who attended the Tromsø Study.

The Trøndelag Health Study (HUNT)
The Trøndelag Health Study (HUNT) is a collaboration between HUNT Research Centre (Faculty of Medicine and Health Sciences, NTNU, Norwegian University of Science and Technology), Trøndelag County Council, Central Norway Regional Health Authority, and the Norwegian Institute of Public Health. The genetic investigations of the HUNT Study are a collaboration between investigators from the HUNT study, the University of Michigan Medical School and the University of Michigan School of Public Health. The K.G. Jebsen Center for Genetic Epidemiology is financed by Stiftelsen Kristian Gerhard Jebsen; Faculty of Medicine and Health Sciences, NTNU, Norwegian University of Science and Technology; and Central Norway Regional Health Authority.

TwinGene
TwinGene is part of the Swedish Twin Registry, managed by Karolinska Institutet and receiving funding through the Swedish Research Council under the grant no 2017-00641.

TwinsUK (TUK)
We are grateful to the twins who took part in TwinsUK and the whole TwinsUK team, which includes academic researchers, clinical staff, laboratory technicians, administrative staff and research managers.

UK Biobank
We are grateful to UK Biobank participants. This research has been conducted using the UK Biobank Resource under project 12505. LY was funded by the Australian Research Council (DE200100425). PMV was funded by the Australian Research Council (FL180100072) and the Australian National Health and Medical Research Council (Grant #1113400). JY was funded by the Australian National Health and Medical Research Council (Grant #1113400) and Westlake Education Foundation. R.E.M. was supported by US National Institutes of Health (NIH) grant K25 HL150334. PRL was funded by US NIH grant DP2 ES030554, a Burroughs Wellcome Fund Career Award at the Scientific Interfaces, the Next Generation Fund at the Broad Institute of MIT and Harvard, and a Sloan Research Fellowship.

Understanding Society: The UK Household Longitudinal Study (UKHLS)
These data are from Understanding Society: The UK Household Longitudinal Study, which is led by the Institute for Social and Economic Research at the University of Essex and funded by the Economic and Social Research Council. The data were collected by NatCen and the genome wide scan data were analysed by the Wellcome Trust Sanger Institute. The Understanding Society DAC have an application system for genetics data and all use of the data should be approved by them. The application form is at: https://www.understandingsociety.ac.uk/about/health/data. The UK Household Longitudinal Study was funded by grants from the Economic & Social Research Council (ES/H029745/1) and the Wellcome Trust (WT098051).

VA Million Veteran Program (MVP)
The

Wake Forest School of Medicine Study (WFSM)
WFSM gratefully acknowledges the contributions of the participants and of the study staff. Genotyping services were provided by the Center for Inherited Disease Research (CIDR). CIDR is fully funded through a federal contract from the National Institutes of Health to The Johns Hopkins University, contract number HHSC268200782096C. This work was supported by National Institutes of Health grants R01 DK087914, R01 DK066358, R01 DK053591, U01 DK105556 and by the Wake Forest School of Medicine grant M01 RR07122 and Venture Fund.

Wenzhou Medical University Biobank -Tibetans (WMUB-T)
The WMUB-T gratefully acknowledges the contributions of the participants and of the study staff. Jian Y. is funded by the Westlake Education Foundation.

Whitehall II (WHII)
We thank all the participating civil service departments and their welfare, personnel, and establishment officers;

Wellcome Genetic (WELLGEN)
The WELLGEN is grateful and obliged to the study participants and other staff members who contributed to the study. Investigator support was provided by the Council of Scientific and Industrial Research, Ministry of Science and Technology, Govt. of India, New Delhi, India.

Women's Genome Health Study (WGHS)
The WGHS is supported by the National Heart, Lung, and Blood Institute (HL043851 and HL080467) and the National Cancer Institute (CA047988 and UM1CA182913) to Julie Buring and I-Min Lee, with funding for genotyping provided by Amgen. (Ridker, Chasman, PIs).

Women's Health Initiative (WHI)
The WHI program is funded by the National Heart, Lung, and Blood Institute, National Institutes of Health, U.S. Department of Health and Human Services through contracts 75N92021D00001, 75N92021D00002, 75N92021D00003, 75N92021D00004, 75N92021D00005.

Yonsei Avellino Corneal Dystrophy Study (ACD)
The ACD gratefully acknowledges the contributions of the participants and of the study staff. We are thankful for the computing resources provided by the Global Science experimental Data hub Center (GSDC) Project and the Korea Research Environment Open NETwork (KREONET) of the Korea Institute of Science and Technology Information (KISTI).

Young Finns Study (YFS)
We thank the teams that collected data at all measurement time points; the persons who participated as both children and adults in these longitudinal studies; and biostatisticians Irina Lisinen, Johanna Ikonen, Noora Kartiosuo, Ville Aalto, and Jarno Kankaanranta for data management and statistical advice. The Young Finns Study has been financially supported by the Academy

Suppl. Fig. 1. Principal components analysis of contributing studies to the height meta-analysis alongside 26 genetic ancestry groups from the 1000 Genomes Project. Using data from 2,504 samples from the 1000 Genomes Project (1KGP), genotypes for 354,568 HapMap3 SNPs with frequency data from all participating studies were extracted. LD-pruning was subsequently performed using PLINK with a window size of 1Mb, a shift size of 50 variants, and an LD $r^2$ cut-off of 0.1 (PLINK command: --indep-pairwise 1000 50 0.1). After LD-pruning, 18,125 SNPs remained for subsequent analysis. Allele frequencies for the pruned set of variants were subsequently calculated within each of the 26 1KGP ancestral groups and aligned to the same reference allele. Principal components analysis was subsequently performed by using the 1KGP frequency data to build the model prior to projection of participating studies, having ensured study allele frequencies were also aligned to the same 1KGP reference allele.
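
As a rough illustration of the projection step described in the Fig. 1 legend, the sketch below builds principal components from a matrix of reference-group allele frequencies and then projects a study's allele frequencies onto them. It is a minimal sketch and not the authors' pipeline: the function names, the centering choice (per-SNP mean of the reference frequencies) and the random toy frequencies are all assumptions made for illustration.

```python
import numpy as np

def fit_frequency_pca(ref_freq, n_components=2):
    """Fit PCs on a (groups x SNPs) matrix of allele frequencies.

    All frequencies are assumed to be aligned to the same reference allele.
    Returns the per-SNP centering vector and the SNP loadings.
    """
    ref_freq = np.asarray(ref_freq, dtype=float)
    center = ref_freq.mean(axis=0)  # hypothetical centering choice
    # Rows of vt are orthonormal SNP loadings, ordered by explained variance.
    _, _, vt = np.linalg.svd(ref_freq - center, full_matrices=False)
    return center, vt[:n_components]

def project(freq, center, loadings):
    """Project allele frequencies (same SNP order) onto the fitted PCs."""
    return (np.asarray(freq, dtype=float) - center) @ loadings.T

# Toy data: 26 reference groups x 500 pruned SNPs (random, illustrative only).
rng = np.random.default_rng(0)
ref = rng.uniform(0.05, 0.95, size=(26, 500))

center, loadings = fit_frequency_pca(ref, n_components=2)
ref_scores = project(ref, center, loadings)

# A hypothetical study whose frequencies are a 50/50 mix of two groups
# lands halfway between those groups in PC space.
study = 0.5 * (ref[0] + ref[1])
study_scores = project(study, center, loadings)
```

Because the projection is linear, a study whose allele frequencies are a mixture of two reference groups falls on the line joining those groups, which is why admixed cohorts appear between reference clusters in this kind of plot.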

Suppl. Fig. 2. Frequency and imputation accuracy distribution of HapMap 3 SNPs across ancestry groups. Panel a. Minor allele frequency (MAF) distribution of HapMap 3 SNPs across 5 ancestries: European (EUR), Hispanic (HIS), African (AFR), East-Asian (EAS) and South-Asian (SAS). Panel b. Average (across cohorts) proportion (y-axis) of HapMap 3 SNPs with an imputation accuracy statistic (INFO) above a certain threshold (x-axis). Vertical lines highlight two thresholds (0.3 and 0.9) commonly used to ascertain SNPs on imputation accuracy. Overall, Panel b shows that HapMap 3 SNPs are well imputed across all ancestry groups, with >98% of SNPs with INFO>0.3 and >80% of SNPs with INFO>0.9.

Suppl. Fig. 3. Biases in conditional and joint effect estimates. Panel a: Prediction accuracy (squared correlation $R^2$) of polygenic scores (PGS) based on genome-wide significant (GWS) SNPs identified in 5 ancestry groups (EUR: European, EAS: East-Asian, HIS: Admixed Hispanic ethnicity, AFR: African (mostly Admixed African American) and SAS: South-Asian). For each set of GWS SNPs, Panel a compares the prediction accuracy obtained when the PGS is calculated from marginal SNP effects ($\mathrm{PGS}_{\mathrm{GWAS}}$) versus when the PGS is calculated from joint SNP effects ($\mathrm{PGS}_{\mathrm{COJO}}$) estimated with the GCTA-COJO algorithm using default parameters. In all ancestry groups except HIS and AFR, the accuracy of $\mathrm{PGS}_{\mathrm{COJO}}$ is larger than that of $\mathrm{PGS}_{\mathrm{GWAS}}$. Panel b contrasts the per-chromosome correlation between $\mathrm{PGS}_{\mathrm{GWAS}}$ and height (x-axis) with the per-chromosome correlation between $\mathrm{PGS}_{\mathrm{COJO}}$ and height (y-axis). These correlations were calculated in 2 samples of African ancestry participants from the UK Biobank (UKB, $N_{\mathrm{AFR\text{-}UKB}}$=6,911) and the PAGE study ($N_{\mathrm{AFR\text{-}PAGE}}$=8,238) and in 1 sample of Admixed Hispanic individuals also from the PAGE study ($N_{\mathrm{HIS\text{-}PAGE}}$=5,798). Panel c represents for each chromosome (x-axis) the slope from regressing marginal SNP effects on joint SNP effects (y-axis). Regression slopes were estimated for GWS SNPs identified on each chromosome in GWAS meta-analyses in African and Hispanic ancestry groups. These results are further described and discussed in Suppl. Note 1.

Suppl. Fig. 4. Impact of the collinearity threshold (CT) used in the GCTA-COJO algorithm on prediction accuracy of polygenic scores (PGS) based on estimated joint effects. By default, the GCTA-COJO algorithm implements a stepwise model selection to identify sets of jointly significant SNPs such that no more than CT=90% of the variance of genotypes at any SNP retained in the final model is explained by other SNPs included in the model. We varied the value of the CT between 0.9 (least stringent) and 0.1 (most stringent) and monitored the prediction accuracy of PGS in different samples. Each panel represents CT on the x-axis and the prediction accuracy on the y-axis in a given ancestry group matched with the ancestry of GWAS participants.

) onto SNP effects estimated in a standard population-based GWAS ( ). The x-axis in Panel d is the $-\log_{10}$ of the p-value threshold used to ascertain SNPs for estimating the regression slope. For each p-value threshold, the effects of ascertained SNPs were corrected for winner's curse (Methods). The horizontal red dotted line represents the expected slope under assortative mating (AM) assuming an equilibrium heritability $h^2 = 0.8$ and a spousal correlation $r = 0.25$ (Suppl. Note 2).

Suppl. Fig. 6. Quantification of confounding due to population stratification (PS) in various non-European ancestry (non-EUR) GWAS of height. Panels a and e, b and f, c and g, d and h represent quantifications of PS in GWAS conducted in individuals from the African (AFR), East-Asian (EAS), South-Asian (SAS) and Hispanic (HIS) ancestry groups, respectively. The x-axes in all the top panels (a-d) represent pairs of subpopulations in the 1000 Genomes Project (1KG) within the corresponding ancestry groups. The y-axes in Panels (a-d) show estimates of (+/- standard errors; S.E.) for SNP effects estimated in the corresponding ancestry group along axes of genetic differentiation between subpopulations indicated on the x-axis. Red dots on top of each bar indicate statistical significance as defined in Suppl. Note 2 based on different thresholds according to the number of pairs of subpopulations within ancestry groups. The x-axes in all the bottom panels (e-h) correspond to 20 principal components (PC) calculated in 1KGP samples from the corresponding ancestry group. The y-axes in Panels e-h show the squared correlations between PC loadings and marginal SNP effects in the corresponding ancestry-specific GWAS meta-analysis. These results are further described and discussed in Suppl. Note 2.

Suppl. Fig. 7. Quantification of confounding due to population stratification (PS) in our cross-ancestry GWAS meta-analysis of height. The x-axis in panels (a-e) shows pairs of subpopulations in the 1000 Genomes Project (1KG) within 5 ancestry groups indicated by the title of the panel (African: AFR, East-Asian: EAS, South-Asian: SAS, Hispanic: HIS, European: EUR). The y-axis in panels (a-e) shows estimates of (+/- standard errors; S.E.) for SNP effects in our cross-ancestry meta-analysis along axes of genetic differentiation between subpopulations indicated on the x-axis. Each dot in Panel f represents the squared correlation (y-axis) between SNP effects and loadings of principal components (PC: 1 to 20, indicated on the x-axis) calculated in 5 ancestry groups indicated in the panel legend. These results are further described and discussed in Suppl. Note 2.

Suppl. Fig. 8. Correlation of marginal SNP effects between European ancestry (EUR, on the x-axis) and non-EUR GWAS (i.e. Hispanic (HIS), African (AFR), East-Asian (EAS) and South-Asian (SAS) on the y-axis). Correlations r(b) were corrected for estimation errors as described in the Methods section. Each estimate of the correlation of SNP effect is based on SNPs reaching marginal genome-wide significance in any of the 5 ancestries analysed. Standard errors for r(b) were obtained using jackknife. Error bars denote standard errors of SNP effects.

Suppl. Fig. 9. Correlation of marginal SNP effects between East-Asian ancestry (EAS, on the x-axis) and non-EAS GWAS (i.e. Hispanic (HIS), African (AFR), European (EUR) and South-Asian (SAS) on the y-axis). Correlations r(b) were corrected for estimation errors as described in the Methods section. Each estimate of the correlation of SNP effect is based on SNPs reaching marginal genome-wide significance in any of the 5 ancestries analysed. Standard errors for r(b) were obtained using jackknife. Error bars denote standard errors of SNP effects.

Suppl. Fig. 10. Correlation of marginal SNP effects between Hispanic ethnicity (HIS, on the x-axis) and non-HIS GWAS (i.e. European (EUR), African (AFR), East-Asian (EAS) and South-Asian (SAS) on the y-axis). Correlations r(b) were corrected for estimation errors as described in the Methods section. Each estimate of the correlation of SNP effect is based on SNPs reaching marginal genome-wide significance in any of the 5 ancestries analysed. Standard errors for r(b) were obtained using jackknife. Error bars denote standard errors of SNP effects.

Suppl. Fig. 11. Correlation of marginal SNP effects between African ancestry (AFR, on the x-axis) and non-AFR GWAS (i.e. Hispanic (HIS), European (EUR), East-Asian (EAS) and South-Asian (SAS) on the y-axis). Correlations r(b) were corrected for estimation errors as described in the Methods section. Each estimate of the correlation of SNP effect is based on SNPs reaching marginal genome-wide significance in any of the 5 ancestries analysed. Standard errors for r(b) were obtained using jackknife. Error bars denote standard errors of SNP effects.

Suppl. Fig. 12. Correlation of marginal SNP effects between South-Asian ancestry (SAS, on the x-axis) and non-SAS GWAS (i.e. Hispanic (HIS), African (AFR), East-Asian (EAS) and European (EUR) on the y-axis). Correlations r(b) were corrected for estimation errors as described in the Methods section. Each estimate of the correlation of SNP effect is based on SNPs reaching marginal genome-wide significance in any of the 5 ancestries analysed. Standard errors for r(b) were obtained using jackknife. Error bars denote standard errors of SNP effects.

Suppl. Fig. 13. Cross-ancestry correlation of marginal SNP effects ascertained at various significance thresholds. The x-axis in each panel represents the significance thresholds used to ascertain SNPs and the y-axis the correlation r(b) of estimated marginal effects (Methods) across 10 pairs of GWAS performed in 5 ancestry groups (African: AFR, East-Asian: EAS, South-Asian: SAS, Hispanic: HIS, European: EUR). Each panel represents which of the five ancestry-group-specific GWAS was used to ascertain SNPs. SNPs were ascertained using GCTA-COJO with a linkage disequilibrium reference set from the corresponding ancestry group. Within each panel, correlations were calculated using the subset of COJO SNPs whose marginal significance also met the significance threshold indicated on the x-axis. Each colour represents a pair of ancestry groups.

Suppl. Fig. 14. Correlation of marginal and conditional SNP effects between discovery and replication (Estonian Biobank - EBB) GWAS. Panels a and b show correlations of marginal SNP effects for a subset of jointly associated SNPs (identified using the GCTA-COJO method; Methods) whose marginal effects also reach genome-wide significance. Panels c and d represent joint effects re-estimated using approximate conditional analyses (implemented in the GCTA software). Genotypes of ~350,000 unrelated participants of the UK Biobank were used as linkage disequilibrium (LD) reference.

Suppl. Fig. 15. Schematic representation of the measure of signal density. The horizontal arrow represents a chromosome and each circle a specific association. For each association, the density is defined as the number of other independent associations within a certain window. In the example above, the window around the first SNP contains 1 SNP, so its density is 1. Similarly, the density at the third SNP (from the left) is 0 because the window around it does not contain any other association.
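
The signal-density measure described in the schematic (Fig. 15) can be sketched in a few lines. This is an illustrative sketch only: the function name, the input layout (a list of chromosome and base-pair pairs) and the interpretation of the window as plus or minus `window_bp` around each association are assumptions, not the authors' implementation.

```python
from bisect import bisect_left, bisect_right

def signal_density(associations, window_bp=100_000):
    """Count, for each association, the other associations within the window.

    associations: list of (chromosome, base_pair_position) tuples.
    The window is taken as +/- window_bp around each position (assumption).
    Returns densities in the same order as the input.
    """
    # Group positions by chromosome and sort them for binary search.
    by_chrom = {}
    for chrom, bp in associations:
        by_chrom.setdefault(chrom, []).append(bp)
    for positions in by_chrom.values():
        positions.sort()

    densities = []
    for chrom, bp in associations:
        positions = by_chrom[chrom]
        lo = bisect_left(positions, bp - window_bp)
        hi = bisect_right(positions, bp + window_bp)
        densities.append(hi - lo - 1)  # subtract 1 to exclude the SNP itself
    return densities

# Toy example mirroring the schematic: two nearby signals and isolated ones.
toy = [("1", 100_000), ("1", 150_000), ("1", 900_000), ("2", 120_000)]
toy_density = signal_density(toy)  # [1, 1, 0, 0]
```

In the toy input, the first two associations lie within 100 kb of each other and each gets a density of 1, while the third has no neighbour in its window and gets a density of 0, matching the description in the legend.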

Suppl. Fig. 16. Distribution of signal density. Signal density (x-axes) is defined for each height-associated SNP as the number of other associations detected within 100 kb based on the METAFE and ancestries group specific meta-analyses. Y-axes represent the number of height-associated SNPs with a signal density indicated on the x-axis. Suppl. Fig. 17. Independent signal density at the ACAN gene locus across ancestries. Independent associations were identified from GWAS performed in 5 ancestries (African: AFR; European: EUR; East-Asian: EAS; South-Asian: SAS and Hispanic: HIS) as well as from the meta-analysis of all ancestries (ALL). Genomic segments with a signal density >1 are found in each ancestry group. Fig. 18. Variance of height explained by SNPs in genome-wide significant (GWS) loci defined with various window sizes. Stratified SNP-based heritability (ℎ SNP 2 ) estimates were obtained for three partitions of the genome: (1) GWS SNPs alone vs. all other HapMap 3 (HM3) SNPs; (2) GWS SNPs +/-all HM3 SNPs within 10 kb vs. all other HM3 SNPs and (3) GWS SNPs +/-all HM3 SNPs within 35 kb vs. all other HM3 SNPs. Analyses were performed in samples of four different ancestries: European (EUR: meta-analysis of UK Biobank (UKB); N=14,587 + Lifelines data; N=14,058), African (AFR: UKB), East-Asian (EAS: UKB) and South-Asian (SAS: UKB). Estimates from stratified analyses were compared with SNP-based heritability estimates obtained from analysing all SNPs jointly (horizontal red bar; dotted lines represented standard errors). Analyses were repeated using a random set of 12,111 SNPs (and redefining loci relative to those), which minor allele frequency and linkage disequilibrium distribution matched that of GWS SNPs (RND: gold bars). Fig. 19. Variance of body mass index (BMI) explained by height-associated genome-wide significant (GWS) loci defined with various window sizes. 
Stratified SNP-based heritability (ℎ SNP 2 ) estimates were obtained for three partitions of the genome: (1) GWS SNPs alone vs. all other HapMap 3 (HM3) SNPs; (2) GWS SNPs +/-all HM3 SNPs within 10 kb vs. all other HM3 SNPs and (3) GWS SNPs +/-all HM3 SNPs within 35 kb vs. all other HM3 SNPs. Analyses were performed in UK Biobank samples of four different ancestries: European (EUR), African (AFR), East-Asian (EAS) and South-Asian (SAS). Estimates from stratified analyses were compared with SNP-based heritability estimates obtained from analysing all SNPs jointly (horizontal red bar; dotted lines represented standard errors). Analyses were repeated using a random set of 12,111 SNPs (and redefining loci relative to those), which minor allele frequency and linkage disequilibrium distribution matched that of GWS SNPs (RND: gold bars). Fig. 20. Optimal weighting of PGS and parental information in simulated data. We simulated a population of N=2,000 individuals and a trait controlled by M=1,000 causal variants. For simplicity, we assumed that a variable proportion of causal SNPs is used to calculate the PGS, and that SNP effects are estimated with negligible errors. We show below in Equation (3.11) how this proportion is chosen to achieve the desired prediction accuracy. We considered two scenarios: (i) random mating, i.e. r=0 (Panels a and b) and assortative mating (for 20 generations) based on a spousal phenotypic correlation r=0.25 (Panels c and d). In all simulations, we assumed a heritability h 2 = 0.8 and varied the expected prediction accuracy (R SNP 2 ) between 0.05 and 0.8. The notation R SNP 2 is general and applies to any PGS based on independent SNPs, not just genome-wide significant as described in the main text. For each simulated population, we compared our predictions from Equation (S3.1) and (S3.2) with estimated regression coefficients obtained from regressing y on ŷ and y ̅ p . 
The vertical green bar in panel c, denotes the threshold above which PGS information outweighs parental information. The vertical grey bar in panels a and c denotes the threshold when R SNP 2 = ℎ 2 2 ⁄ . This threshold is predicted using Equation (S3.4). We also compared our variance explained by fitting both predictors with our predicted expectation from Equation (S3.3). Each dot is generated using 100 replicates. Overall, we found a perfect consistency between our theoretical and simulation results, which provides an empirical validation of these predictions. Fig. 21. Selection of the number of gene set clusters using the "Elbow method". Gene sets were hierarchically clustered at different sizes and the distance between each cluster evaluated. 20 clusters was chosen as an appropriate number of gene set clusters to evaluate for enrichment. Suppl. Fig. 22. Proportional heritability (h 2 ) explained, enrichment, and normalized estimates for MAGMA (panels A-C) and DEPICT (panels D-F). The error bars represent the 95% confidence interval, calculated as estimate +/-1.96*standard error. The labels for each subpanel indicate the ancestry represented in the GWAS used for LDSC, and the x-axis labels indicate the ancestry represented in the "discovery" GWAS used to prioritize the genes. EUR: European ancestry; AA: African-Americans of admixed European and African ancestries; EAS: East Asian ancestry. Analyses underlying this figure are further described in Suppl. Note 5. Suppl. Fig. 23. Gene-level saturation of GWAS discoveries as a function of sample size. Increase in sample size from ~4 million to ~5 million is achieved by including ~1 million participants of non-European ancestry. 
Panel a shows the enrichment of genome-wide significant (GWS) SNPs identified from an approximate conditional and joint (COJO) analysis within 462 genes associated with skeletal growth disorders from the Online Mendelian Inheritance in Man (OMIM) database (yaxis) as a function of GWAS sample size (x-axis). Standard Error (S.E.) were calculated as the standard deviation of enrichment statistics (odds ratio in the 2x2 contingency table contrasting for each gene: "is the gene an OMIM gene" vs. "does the gene contain a GWS SNP") across 1,000 randomly sampled sets of 462 non-OMIM genes length-matched with OMIM genes. The average enrichment calculated across the 1,000 random gene sets is represented with dotted lines. Presence of a GWS SNP within a gene was assessed relative to gene start and stop position, considering flanking regions within 0 kb, 10 kb, 20 kb and 30 kb. Panel b shows the proportion of OMIM overlapping genes with at least one GWS SNPs (y-axis). As in Panel a, dotted lines represents the null distribution from 1,000 random sets of genes length-matched with OMIM genes. Standard errors (S.E.) were calculated as the standard deviation of the proportion observed across the 1,000 draws from the null distribution. Panel c represents the proportion of OMIM genes near GWS SNPs after subtracting the mean of the null distribution at each sample size. Panel d represents the enrichment of OMIM genes as a function of the strength of association of 12,111 independent GWS SNPs identified in our largest GWAS (N~5.4M). GWS SNPs were grouped into 10 decile groups of ~1,211 SNPs. Enrichment near OMIM genes is stronger of SNPs explaining a larger proportion of height variance (top decile). Panel e shows the median per-SNP variance explained (y-axis) as a function of the median distance to the closest OMIM gene. Large GWAS tend to identify variants with smaller effect sizes and further away from OMIM genes. 
Panel f shows the number of genes prioritised using Summary-data based Mendelian Randomization (SMR; $P < 5 \times 10^{-8}$), whose expression may act as a mediator of the effects of SNPs on height. SMR analyses were based on expression quantitative trait loci (eQTL) identified in the GTEx and eQTLgen studies (Methods). The z-axis (in red) shows the number of OMIM genes overlapping with SMR genes identified from analysing GWAS with various sample sizes (x-axis).

Suppl. Fig. 24. Variant-level saturation of GWAS discoveries as a function of sample size. Increase in sample size from ~4 million to ~5 million is achieved by including ~1 million participants of non-European ancestry. Panel a shows the number of independent genome-wide significant (GWS) SNPs and loci identified at various GWAS sample sizes (details about down-sampled GWAS are given in Table 2). GWS loci were defined using various window sizes including 35 kb, 50 kb and 100 kb. Panel b shows the percentage of the genome covered by GWS loci. Coverage was calculated as the cumulative length of GWS loci in Mb divided by 3,039 Mb, the estimated length of the human genome. Panel c shows the prediction accuracy ($R^2_{GWS}$) of various polygenic scores based on GWS SNPs identified at various sample sizes. In Panels a and c, dotted lines represent y-axis values for our largest European ancestry GWAS (N~4 million).

Suppl. Fig. 25. Partitioned SNP-based heritability of height in African ancestry individuals. Panel a represents partitioned SNP-based heritability estimates from a sample of 6,911 unrelated African ancestry (AFR) individuals from the UK Biobank, independent of our discovery GWAS. This analysis focuses on 16,374,566 SNPs with a minor allele frequency (MAF)>1% in AFR. These SNPs were further stratified according to their MAF in European ancestry (EUR) populations: 7,365,878 SNPs with MAF>1% in EUR (47%) vs. 8,114,046 SNPs with MAF<1% in EUR (53%) and their position within vs. outside genome-wide significant (GWS) loci. Panel b shows the MAF distribution of the 16M SNPs in AFR and panels c-d the distribution of these SNPs in EUR. The SNP-based heritability contributed by SNPs within GWS loci is denoted $h^2_{GWS}$, while that contributed by SNPs outside these loci is denoted $h^2_{other}$. These results are further discussed in Suppl. Note 6.

Long range LD in admixed populations can bias estimation of approximate conditional SNP effects
We compared the prediction accuracy of polygenic scores (PGS) based on genome-wide significant (GWS) SNPs identified in each ancestry group. For each set of ancestry-specific GWS SNPs we calculated a PGS using either marginal SNP effects (hereafter denoted PGSGWAS) or conditional effects (hereafter denoted PGSCOJO) approximated using GCTA-COJO (Methods). Given that COJO is designed to detect secondary signals, i.e. signals explaining additional trait variance, we expect the prediction accuracy of PGSCOJO to be similar to, if not higher than, that of PGSGWAS.
Consistently, we found that PGSCOJO yields a higher accuracy than PGSGWAS in most cases except in HIS and AFR (Suppl. Fig. 3a). We further investigated that observation and found that the poor performance of PGSCOJO relative to PGSGWAS in HIS and AFR was driven by specific chromosomes such as chromosomes 6, 9 and 20 (Suppl. Fig. 3b), where estimated conditional effects were abnormally large (Suppl. Fig. 3c).
We hypothesized that these unexpected observations could be explained by estimation errors during the stepwise model selection procedure, e.g., because of collinearity between SNPs included in the model. Note that GCTA sets a default threshold of 0.9 for collinearity between SNPs, which means that no more than 90% of the genotype variance at a given SNP included in the model can be explained by all other SNPs included in the model. To explore the impact of this parameter on our observations we performed a sensitivity analysis by varying the collinearity threshold between 0.1 and 0.9.
We found that using a more stringent collinearity threshold reduces the prediction accuracy of PGSCOJO in EUR and EAS (accuracy in SAS remained unchanged) but produces an opposite effect in AFR and HIS (Suppl. Fig. 4). More specifically, setting the collinearity threshold below 0.5 restored the prediction accuracy of PGSCOJO up to a level comparable to that of PGSGWAS. Therefore, our sensitivity analyses suggest that stringent collinearity thresholds are preferable when applying COJO to GWAS from admixed ancestry groups such as HIS and AFR.
Consequently, the COJO results presented in the main text are based upon a collinearity threshold of 0.1 for HIS and AFR and the default threshold of 0.9 for all other COJO analyses. We chose 0.1 because it produces the most parsimonious model (i.e. the fewest associations) without impacting prediction accuracy.

Impact of ancestry composition of LD reference panel on COJO results
We re-analysed summary statistics from our cross-ancestry GWAS meta-analysis using two LD reference sets. First, we randomly selected 37,900 EUR, 4,400 EAS, 4,250 HIS, 2,750 AFR and 700 SAS individuals (i.e. 50,000 individuals in total) to form an LD reference set with ancestry proportions matching those in our cross-ancestry meta-analysis. The second set contained 50,000 individuals with EUR ancestries. We restricted analyses with both LD reference sets to 882,755 HM3 SNPs, which passed quality control (Hardy-Weinberg Equilibrium test, missingness and imputation quality) in all five ancestry groups.
We found that COJO based on the multi-ancestry LD panel detected only 3,635 (3,380 using a collinearity threshold of 0.1) independent associations vs. 11,001 associations using the EUR LD reference. The latter number is smaller than the 12,111 reported in the main text but consistent with a ~10-15% smaller number of HM3 SNPs used as input. We also repeated analyses using the 37,900 EUR individuals as LD reference and found that COJO detects 11,065 SNPs, indicating that using this multi-ancestry LD panel leads COJO to underestimate the number of associations. This conclusion is supported by the fact that a PGS based on 11,001 COJO SNPs detected using a EUR LD panel explains a significantly larger amount of height variance than a PGS based on only 3,380 COJO SNPs detected with a multi-ancestry panel (EUR: 38.2% vs 26.4%; SAS: 20.3% vs 13.4%; EAS: 19.5% vs 13.3%; and AFR: 9.0% vs 5.0%). Moreover, we ran another COJO analysis of our cross-ancestry GWAS using LD information from 10,636 AFR individuals (i.e. the same LD panel as for our AFR GWAS meta-analysis). Note that 242,891 / 882,755 (i.e. 27.5%) SNPs were filtered out by GCTA prior to analysis because of expected large differences in allele frequencies between our cross-ancestry GWAS, which includes >75% of EUR individuals, and the AFR LD panel (by default GCTA excludes SNPs with an absolute frequency difference >0.1). Nevertheless, we detected 5,701 quasi-independent joint associations (i.e. more associations than using a mixed-ancestry panel), explaining 24.6%, 13.2%, 10.9% and 3.5% of height variance in EUR, SAS, EAS and AFR individuals respectively. The latter predictive performances are lower than those obtained with a PGS from 3,380 COJO SNPs.
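The allele-frequency filter mentioned above can be illustrated with a minimal Python sketch. It is not GCTA's implementation: the function name, the toy SNP identifiers and the toy frequencies are hypothetical, and the threshold of 0.1 is simply the value quoted in the text.

```python
def filter_by_frequency_difference(gwas_freq, ref_freq, max_abs_diff=0.1):
    """Split SNP IDs into (kept, excluded) based on |f_GWAS - f_reference|.

    gwas_freq and ref_freq map SNP ID -> allele frequency (same allele coded).
    """
    kept, excluded = [], []
    for snp, f_gwas in gwas_freq.items():
        if snp not in ref_freq:
            continue  # SNP absent from the LD reference: ignored in this sketch
        if abs(f_gwas - ref_freq[snp]) > max_abs_diff:
            excluded.append(snp)
        else:
            kept.append(snp)
    return kept, excluded


# Hypothetical toy frequencies (not taken from the study).
gwas_freq = {"rs1": 0.30, "rs2": 0.12, "rs3": 0.55, "rs4": 0.80}
ref_freq = {"rs1": 0.34, "rs2": 0.45, "rs3": 0.50, "rs4": 0.62}

kept, excluded = filter_by_frequency_difference(gwas_freq, ref_freq)
print("kept:", kept)
print("excluded:", excluded)

# Proportion of SNPs reported as filtered out in the text.
proportion_filtered = 242_891 / 882_755
print(f"{proportion_filtered:.1%}")
```

With a mostly EUR GWAS and an AFR reference panel, many SNPs would fall in the excluded list, which is the behaviour described in the paragraph above.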
Altogether, these results demonstrate that COJO with a composite LD reference panel does not improve and likely hinders the detection of associations in our cross-ancestry GWAS meta-analysis. We emphasize that extending the COJO methodology for analysing multi-ancestry GWAS is an independent research question, which goes beyond the scope of our study.

LD score regression analysis of European ancestries GWAS
First, we performed an LD score regression (LDSC) analysis 16 of our GWAS meta-analysis of EUR participants (N=4,080,687). We assessed the degree of population stratification (PS) using the attenuation ratio statistic (RLDSC), which provides a quantification of PS that is independent of sample size. The estimated RLDSC is ~3.8% (S.E. 0.8%), suggesting that most of the inflation of association test statistics is explained by polygenicity and not PS. In comparison, GWAS of height without any adjustment for population stratification produce values of RLDSC ~10-13%. 67 Our estimated RLDSC is slightly smaller than that from LDSC analyses of previously published GWAS of height (Wood et al. (2014); Yengo et al. (2018)). However, as a measure of PS, RLDSC is not strictly comparable across studies. In fact, the expectation of RLDSC (across repeated GWAS) not only depends on how much trait variance is explained by PS, but also on the degree of genetic differentiation between cohorts (FST) and the trait heritability within each cohort. 16 The last two factors can vary from one GWAS meta-analysis to another as a function of cohort composition. In summary, these LDSC analyses suggest that uncorrected PS only marginally affects SNP effects from our large GWAS of height in EUR participants.
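For reference, the attenuation ratio is conventionally computed from the LDSC intercept and the mean chi-square statistic as (intercept − 1)/(mean χ² − 1). The Python sketch below only illustrates that arithmetic; the input values are hypothetical and are not the intercept or mean χ² of the GWAS described here.

```python
def attenuation_ratio(intercept, mean_chi2):
    """LDSC attenuation ratio: share of test-statistic inflation not
    attributed to polygenicity, (intercept - 1) / (mean chi2 - 1)."""
    if mean_chi2 <= 1.0:
        raise ValueError("mean chi-square must exceed 1 for the ratio to be defined")
    return (intercept - 1.0) / (mean_chi2 - 1.0)


# Hypothetical inputs chosen for illustration only.
ratio = attenuation_ratio(intercept=1.5, mean_chi2=11.0)
print(f"attenuation ratio = {ratio:.1%}")
```

Because both numerator and denominator grow with sample size, the ratio stays comparable across GWAS of different sizes, which is the property the text relies on.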

Assessment of allele frequencies for height increasing alleles across the North-South axis of Europe reveals attenuated correlation relative to previous studies
As an alternative to RLDSC, we next quantified PS in our GWAS using the correlation between strength of association (p-value) and height-increasing allele frequency differences between Great Britain (GBR sample in the 1000 Genomes Project, 1KGP) and the Italian Tuscan population (TSI sample in 1KGP). This statistic was previously introduced by Sohail and colleagues 10 to reveal biases in SNP effect estimates from the Wood et al. study that were induced by uncorrected PS along the North-South gradient of Europe. More precisely, the strategy implemented by Sohail et al. consists in grouping SNPs based on strength of association, then regressing the mean height-increasing allele frequency difference between GBR and TSI for each SNP bin onto the mean p-value of the corresponding bin. The slope of that linear regression measures the degree of PS. We estimated this slope using 101,360 near-independent HM3 SNPs with MAF>1% and calculated standard errors using a bootstrap strategy based on 1,000 independent draws.
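The binning-and-regression strategy can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the code used by Sohail et al. or in this study: the number of bins (20), the equal-count binning, the number of bootstrap draws in the example (200) and the toy data are all hypothetical, and the sign and scaling conventions of the published statistic are not reproduced.

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed onto x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def binned_slope(p_values, freq_diffs, n_bins=20):
    """Group SNPs by p-value into equal-count bins, then regress the mean
    frequency difference of each bin onto the mean p-value of that bin."""
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    bin_size = len(order) // n_bins
    mean_p, mean_diff = [], []
    for b in range(n_bins):
        end = (b + 1) * bin_size if b < n_bins - 1 else len(order)
        idx = order[b * bin_size:end]
        mean_p.append(sum(p_values[i] for i in idx) / len(idx))
        mean_diff.append(sum(freq_diffs[i] for i in idx) / len(idx))
    return ols_slope(mean_p, mean_diff)


def bootstrap_se(p_values, freq_diffs, n_draws=200, n_bins=20, seed=1):
    """Standard error of the slope from resampling SNPs with replacement."""
    rng = random.Random(seed)
    n = len(p_values)
    slopes = []
    for _ in range(n_draws):
        idx = [rng.randrange(n) for _ in range(n)]
        slopes.append(binned_slope([p_values[i] for i in idx],
                                   [freq_diffs[i] for i in idx], n_bins))
    mean = sum(slopes) / n_draws
    return (sum((s - mean) ** 2 for s in slopes) / (n_draws - 1)) ** 0.5


# Toy data: frequency differences are an exact linear function of the p-value.
n_snps = 1000
p_values = [(i + 0.5) / n_snps for i in range(n_snps)]
freq_diffs = [0.02 * (1.0 - p) for p in p_values]

slope = binned_slope(p_values, freq_diffs)
se = bootstrap_se(p_values, freq_diffs)
print(slope, se)
```

In the toy data the relationship is noise-free, so the bootstrap standard error is essentially zero; with real summary statistics the resampling spread provides the S.E. used for testing.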
As previously reported, we found a significant slope of ~1.28% (S.E. 0.08%; $P = 2.3 \times 10^{-61}$) using summary statistics of the Wood et al. (2014) study but no significant slope from a within-family GWAS in 17,942 independent UKB sibling pairs (slope = -0.08%, S.E. = 0.07%; P=0.27). We show in Suppl. Fig. 5a-b estimates of the slope from GWAS summary statistics of Yengo et al. (2018), EUR participants of 23andMe (23andMe-EUR), all EUR participants of the UKB (UKB-456k; GWAS using BOLT-LMM), unrelated EUR participants of the UKB (UKB-350k; GWAS using PLINK), the meta-analysis of EUR participants from multiple cohorts of the GIANT consortium (GIANT-EUR; N~1.6M); and the meta-analysis of GIANT-EUR and 23andMe-EUR. Overall, we find that the slope decreases with sample size, consistent with an increased signal-to-noise ratio. In particular, the slope is ~0.13% (P=0.1) in our largest GWAS meta-analysis of N~4.1M EUR participants, which demonstrates a better correction of PS than previously published EUR GWAS of height.
Furthermore, we assessed the squared correlation between estimated effects at these 101,360 independent HM3 SNPs and SNP loadings from 20 principal components (PCs) calculated in 503 EUR samples from 1KGP. Across SNPs, we found that SNP loadings on PC2 explain most of the variance in estimated SNP effects (Suppl. Fig. 5c). This observation is not surprising given that PC2 is the PC that correlates the most with the North-South axis of Europe, which explains the consistency with our results based on the regression slope. However, <0.3% of the variance of SNP effects estimated in our largest GWAS is explained by SNP loadings, which is much lower than the ~2.3% obtained when analysing SNP effects from Wood et al. (2014).

Comparison of estimated SNP effects between GWAS meta-analyses and family-based GWAS
Finally, we directly compared SNP effects from our GWAS ($\hat{\beta}_{GWAS}$) with those of a within-family GWAS in 17,942 independent UKB sibling pairs ($\hat{\beta}_{SIB}$). We used $S_{PS} = \mathrm{cov}(\hat{\beta}_{GWAS}, \hat{\beta}_{SIB})/\mathrm{var}(\hat{\beta}_{GWAS})$ as our metric of interest in this comparison, where both $\mathrm{cov}(\hat{\beta}_{GWAS}, \hat{\beta}_{SIB})$ and $\mathrm{var}(\hat{\beta}_{GWAS})$ are calculated across SNPs. When SNP effects are estimated using ordinary least-squares (OLS) regression, the expectation of $S_{PS}$ in the absence of PS is $E[S_{PS}] = 1$. Therefore, a significant deviation of $S_{PS}$ below 1 may indicate confounding due to residual PS. However, the statistical properties of $S_{PS}$ based on SNP effects estimated using linear mixed models (LMM) (or meta-analyses of OLS and LMM estimates) are not well characterised, which may affect our interpretation below.
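The metric defined above is the slope of a regression of within-family effect estimates onto GWAS effect estimates, computed across SNPs. A minimal Python sketch follows; the variable names and the toy effect sizes are hypothetical, and sampling noise in the estimates (which matters in real data) is ignored.

```python
def s_ps(beta_gwas, beta_sib):
    """cov(beta_GWAS, beta_SIB) / var(beta_GWAS), both computed across SNPs."""
    n = len(beta_gwas)
    mean_g = sum(beta_gwas) / n
    mean_s = sum(beta_sib) / n
    cov = sum((g - mean_g) * (s - mean_s)
              for g, s in zip(beta_gwas, beta_sib)) / (n - 1)
    var = sum((g - mean_g) ** 2 for g in beta_gwas) / (n - 1)
    return cov / var


# Toy example: within-family effects are uniformly 20% smaller than GWAS effects.
beta_gwas = [0.01, -0.02, 0.03, -0.01, 0.02]
beta_sib = [0.8 * b for b in beta_gwas]

value = s_ps(beta_gwas, beta_sib)
print(value)
```

A value of 1 corresponds to identical effects in the two designs, and values below 1 indicate GWAS effects that are larger in magnitude than within-family effects.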
We found that estimates of $S_{PS}$ based on SNPs strongly associated with height are much closer to 1 than when all SNPs are used, i.e. regardless of strength of association (P<1; Suppl. Fig. 5d). We observed the lowest value of $S_{PS}$, ~0.29 (S.E. 0.01), when using effects of all 101,360 SNPs estimated in the Wood et al. study. In comparison, estimated SNP effects from our largest EUR GWAS yield an $S_{PS}$ >0.8 regardless of strength of association, yet still significantly lower than 1 ($P < 7 \times 10^{-30}$). Lee et al. 68 previously showed that assortative mating (AM) on height can produce values of $S_{PS} < 1$. Under the assumption that the population has reached an equilibrium after many generations of AM with a constant spousal correlation ($r$), they showed that

$E[S_{PS}] \approx 1 - r h^2$   (1.1)

where $h^2$ is the full narrow-sense heritability in the current generation (at equilibrium). We note here an error in the Supplementary Notes of Lee et al. (2018), who used in their derivations the heritability in the base population undergoing random mating ($h_0^2$) instead of the equilibrium heritability, $h^2$. In practice, differences between $h_0^2 = (h^2 - r h^4)/(1 - r h^4)$ and $h^2$ are small. Therefore, using one or the other heritability has a limited impact on the expected value of $S_{PS}$. Using Equation (1.1) and assuming an equilibrium heritability $h^2 = 0.8$ and a spousal correlation $r = 0.25$, we expect $S_{PS}$ to be ~1 − 0.8 × 0.25 = 0.8, which is consistent with our observations (Suppl. Fig. 5d).
Altogether, these analyses show that estimated SNP effects from our EUR GWAS are inflated by ~10-20% relative to those from a family-based GWAS and that this inflation is not larger than expected because of phenotypic AM on height.
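As a worked check of the arithmetic in this paragraph, the Python sketch below evaluates the expectation 1 − r × h² used in the numerical example, together with the base-population heritability. The latter is written with the spousal correlation r made explicit, h0² = (h² − r·h⁴)/(1 − r·h⁴); this form is an assumption of the sketch, obtained by supposing that the equilibrium additive variance equals the base-population additive variance divided by (1 − r·h²) while environmental variance is unchanged. Function names are illustrative.

```python
def expected_s_ps(h2_eq, r):
    """Expected ratio under assortative mating at equilibrium: 1 - r * h2."""
    return 1.0 - r * h2_eq


def base_heritability(h2_eq, r):
    """Heritability in the random-mating base population, given the
    equilibrium heritability h2_eq and spousal correlation r
    (assumed form: (h2 - r*h2^2) / (1 - r*h2^2))."""
    return (h2_eq - r * h2_eq ** 2) / (1.0 - r * h2_eq ** 2)


h2_eq, r = 0.8, 0.25  # values used in the text
s_expected = expected_s_ps(h2_eq, r)
h2_base = base_heritability(h2_eq, r)

print(f"expected S_PS = {s_expected:.2f}")
print(f"base-population heritability = {h2_base:.3f}")
print(f"expected S_PS using base heritability = {expected_s_ps(h2_base, r):.3f}")
```

With these inputs the base-population heritability is about 0.76, so substituting it for the equilibrium value shifts the expectation only slightly (about 0.81 instead of 0.80), in line with the statement that the choice of heritability has a limited impact.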

Assessment of population stratification in non-European ancestries GWAS
We extended the previous analyses performed in EUR to quantify the impact of residual PS in GWAS of height performed in the four other ancestry groups, i.e. HIS, SAS, EAS and AFR.

LD score regression analysis of non-European ancestries GWAS
We performed LD score regression analyses using LD scores estimated from the same ancestry-matched samples as in our COJO analyses (i.e. 10,636 AFR samples, 5,875 EAS samples, 9,448 SAS samples and 4,883 HIS samples). LD scores were calculated from imputed HapMap3 SNPs using the LDSC software (version 1.0.1) with a window size of 1 cM.
It is noteworthy that values of RLDSC above 20%*, as observed in our SAS GWAS, may also reflect strong LD differences between GWAS participants and samples used to estimate LD scores. We applied the DENTIST method (Detecting Errors iN analyses of summary staTISTics) to distinguish these two potential explanations. In brief, DENTIST compares the observed distribution of Z-scores from GWAS to an expected distribution based on a reference LD matrix. Deviations from that expected distribution reflect errors in the GWAS summary statistics or inconsistencies in LD patterns. DENTIST detected 213 outlier SNPs in the SAS GWAS ($P < 5 \times 10^{-8}$) relative to LD patterns from 9,448 unrelated SAS individuals from the UKB. However, excluding these 213 outlier SNPs did not substantially affect the value of the RLDSC statistic (26.2%; S.E. 3.6%).
Altogether, these LD score regression analyses suggest the presence of residual PS that might potentially confound estimates of SNP effects in our HIS, EAS and SAS GWAS.

Correlation between height-increasing allele frequencies and genetic differentiation within four ancestry groups
Next, we estimated the regression slope described above (mean height-increasing allele frequency differences regressed onto mean p-values) for each non-EUR GWAS meta-analysis along various axes of within-continent genetic differentiation defined by pairs of 1KGP subpopulations. For example, we estimated this slope in our AFR GWAS meta-analysis along an axis that differentiates Yoruba populations in Nigeria (West Africa) from Luhya populations in Kenya (East Africa), as well as in our EAS GWAS meta-analysis along an axis that differentiates Japanese populations from Han Chinese populations. We used ancestry-specific significance thresholds calculated as 0.05 divided by the number of pairs of subpopulations within the corresponding 1KGP ancestry group. More specifically, we considered 7 subpopulations in AFR (21 pairs), 5 subpopulations in EAS (10 pairs), 5 subpopulations in SAS (10 pairs) and 4 subpopulations in HIS (6 pairs).
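The pair counts and the resulting ancestry-specific thresholds follow directly from the number of subpopulations, as the short Python sketch below illustrates (variable names are illustrative; the subpopulation counts are those given in the text).

```python
from math import comb

# Number of 1KGP subpopulations considered per ancestry group (from the text).
n_subpopulations = {"AFR": 7, "EAS": 5, "SAS": 5, "HIS": 4}

# Number of unordered pairs of subpopulations, and Bonferroni-style threshold.
n_pairs = {anc: comb(k, 2) for anc, k in n_subpopulations.items()}
thresholds = {anc: 0.05 / pairs for anc, pairs in n_pairs.items()}

for anc in n_subpopulations:
    print(f"{anc}: {n_pairs[anc]} pairs, threshold = {thresholds[anc]:.4f}")
```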
* a rule-of-thumb recommended by the authors of the LDSC software.
In conclusion, we detected a small amount of residual PS in all non-EUR GWAS meta-analyses, in particular in SAS (smallest sample size).

Effect of residual population stratification on cross-ancestry GWAS meta-analysis
In this final section, we focus on SNP effects from our cross-ancestry GWAS meta-analysis (referred to as METAFE in the main text). Using these estimated SNP effects, we quantified the same regression slope along multiple axes of within-continent genetic differentiation, as well as the squared correlations between SNP effects and within-ancestry PC loadings (as in the previous section). Overall, slope estimates remain below 0.5% across all pairs of 1KGP subpopulations (Suppl. Fig. 7a-e), and the squared correlation between SNP effects and PC loadings was also smaller than 0.15% (Suppl. Fig. 7f).
In summary, the various analyses shown here demonstrate that residual PS has a minimal confounding effect on estimated SNP effects from our cross-ancestry GWAS meta-analysis.

Supplementary Note 4: Optimal weighting of PGS and parental information to maximize prediction accuracy in the presence of assortative mating

Overview of theory, simulations and application to real data from the UK Biobank
For a given individual, we denote $y$ their phenotype, $y_m$ and $y_f$ the phenotypes of their mother and father respectively, $\bar{y}_p = (y_m + y_f)/2$ the average of their parents' phenotypes and $\hat{y}$ their own PGS. We consider a combined predictor that is a linear combination of $\hat{y}$ and $\bar{y}_p$. Under the assumption that the resemblance between relatives is solely due to genetic factors, our main result is that the optimal weighting $\alpha_{PGS}\hat{y} + \alpha_{PA}\bar{y}_p$ is given by Equations (S3.1) and (S3.2), where $h^2$ denotes the heritability of the trait in the current population, $r$ the correlation between spouses' phenotypes in the population, and $R^2_{\hat{y},y} = \mathrm{corr}(\hat{y}, y)^2$ the prediction accuracy of the PGS. The expected accuracy ($R^2_{\hat{y}+\bar{y}_p}$) of the combined predictor using these optimal weights is given by Equation (S3.3), which involves the term $1 - R^2_{\hat{y},y}(1 + r)/2$.
Suppl. Fig. 20 shows the results of simulations performed to verify the results from Equations (S3.1-S3.3). These simulations use an arbitrary number of SNPs included in the PGS and are not designed to match the number of SNPs used in various PGS analyses presented in the main text. Nevertheless, our conclusions are general and applicable to our empirical data under the assumption that each SNP in the PGS contributes about the same amount of genetic variance. We define the regression weights as $\omega_{PGS} = \alpha_{PGS}/(\alpha_{PGS} + \alpha_{PA})$ and $\omega_{PA} = \alpha_{PA}/(\alpha_{PGS} + \alpha_{PA})$. Therefore, values of $\omega_{PGS} > 0.5$ imply that the PGS has a stronger weight than the parental average.
Next, we estimated $\alpha_{PGS}$, $\alpha_{PA}$ and $R^2_{\hat{y}+\bar{y}_p}$ in 981 trios from the UK Biobank (Methods). For this analysis, we used a PGS based on 12,111 GWS SNPs identified in our largest GWAS meta-analysis. We found $\alpha_{PGS} \approx 0.375$ (S.E. = 0.025) and $\alpha_{PA} \approx 0.634$ (S.E. = 0.034). The variance explained by fitting both predictors is $R^2_{\hat{y}+\bar{y}_p} = 0.542$ (S.E. = 0.032), which is larger than the accuracy of each single predictor ($R^2_{\hat{y}} = 0.38$, S.E. 0.031; and $R^2_{\bar{y}_p} = 0.439$, S.E. 0.032). Next, we estimated the spousal correlation $r = 0.233$ (S.E. = 0.031) and the heritability $\hat{h}^2 = 0.894$ (S.E. = 0.032) using mid-parent regression. In addition, the prediction accuracy of the PGS is $R^2_{\hat{y},y} \approx 0.4$. Therefore, from these estimates of $r$, $h^2$ and $R_{\hat{y},y}$ we predict using Equations (S3.1-S3.3) that $\alpha_{PGS} = 0.377$, $\alpha_{PA} = 0.656$ and $R^2_{\hat{y}+\bar{y}_p} = 0.599$. These three predictions are not statistically distinct from estimated values, which further validates our model.
We assume that $R_{\hat{y},y}$ is known, e.g., from quantifying the accuracy of the PGS in some validation sample.
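The predicted values quoted above can be reproduced with a standard two-predictor least-squares calculation, sketched below in Python. The closed forms are a reconstruction under explicit assumptions and not a transcription of Equations (S3.1-S3.3): the phenotype has variance 1, the PGS is standardized to variance 1 with corr(ŷ, y) = R, the mid-parent value has variance (1 + r)/2 and covariance h²(1 + r)/2 with the offspring phenotype, and the covariance between the offspring PGS and the mid-parent value is R(1 + r)/2. The function name is hypothetical.

```python
from math import sqrt


def optimal_weights(h2, r, r2_pgs):
    """Least-squares weights for predicting y from a standardized PGS and the
    mid-parent phenotype, under the assumptions listed in the lead-in."""
    k = (1.0 + r) / 2.0                # variance of the mid-parent value
    denom = 1.0 - r2_pgs * k           # shared denominator
    alpha_pgs = sqrt(r2_pgs) * (1.0 - h2 * k) / denom
    alpha_pa = (h2 - r2_pgs) / denom
    # Variance explained = sum of weight x covariance with the phenotype.
    r2_combined = alpha_pgs * sqrt(r2_pgs) + alpha_pa * h2 * k
    return alpha_pgs, alpha_pa, r2_combined


# Estimates reported in the text for the UK Biobank trios.
alpha_pgs, alpha_pa, r2_combined = optimal_weights(h2=0.894, r=0.233, r2_pgs=0.4)
omega_pgs = alpha_pgs / (alpha_pgs + alpha_pa)

print(f"alpha_PGS = {alpha_pgs:.3f}")
print(f"alpha_PA = {alpha_pa:.3f}")
print(f"combined R2 = {r2_combined:.3f}")
print(f"omega_PGS = {omega_pgs:.3f}")
```

Under these assumptions the sketch gives weights of about 0.377 and 0.656 and a combined accuracy of about 0.60, close to the predictions reported in the text; the small difference in the last digit of the combined accuracy may reflect rounding of the inputs. The shared denominator matches the term 1 − R²(1 + r)/2 that appears in the expression for the expected accuracy.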

Prediction accuracy and proportion of causal variants captured
We assume that the trait of interest is underlain by M independent causal SNPs and that m (m ≤ M) of them are included in a PGS. Moreover, we assume that the population has been undergoing assortative mating for multiple generations, until an equilibrium is reached. We derive below how large m needs to be for the prediction accuracy of the derived PGS, in the equilibrium population, to equal $R^2_{\hat{y},y}$.
Using a similar reasoning, Yengo et al. 71 (Eq. 1.20 in their Supplementary Note) derived the relationship between 0 and the proportion = h SNP 2 /h 2 of equilibrium heritability explained by the m SNPs included in the PGS as: (1 − ) ⁄ .

Proof of Equation (Int. 3.1) and (Int. 3.2)
We assume an infinitesimal model, where each causal SNP explains the same amount of trait variance. For simplicity, we assume the squared effect size of each causal SNP to equal $\beta^2 = \sigma^2_{g,0}/M$, and that SNP effects are estimated with negligible error so that they can be assumed equal to their true values. Finally, we assume that the first m SNPs are included in the PGS.

Overview and main results
We assessed the enrichment of broad categories of biological pathways for different GWAS sample sizes, using two different gene set enrichment methods, DEPICT 42 and MAGMA. 43 Specifically, we evaluated the prioritization of 14,462 gene sets, hierarchically clustered into 20 groups of related gene sets based on gene set membership (see Methods below, Suppl. Fig. 21, Suppl. Table 13). We observed an enrichment of OMIM genes in clusters 1, 2, 5, 6, 11, 16, and 17 (Bonferroni P < 0.05 vs. random genes; Extended Data Fig. 8, Suppl. Table 14). At all sample sizes tested (range N=130,010 to N=5,314,291), similar sets of clusters consistently showed significant enrichments in DEPICT (clusters 2, 5, 11, 16, and 17) and MAGMA-prioritized gene sets (clusters 5, 11, 16, and 17; Suppl. Fig. 21). Thus, the broad patterns of gene set enrichment are apparent even at moderate sample sizes and remain quite stable as sample sizes increase.
In contrast with clusters of gene sets, individual genes may require larger sample sizes or multiple ancestries to be implicated by GWAS. To address these questions, we assessed the fraction of OMIM genes that contain an approximately independent genome-wide significant signal (identified with COJO) across the range of GWAS sample sizes. As sample size increases and the number of independent signals increases, the percentage of the 462 OMIM genes overlapping a signal also increases (Suppl. Fig. 23b); however, after subtracting the null background from randomly sampled sets of 462 genes, the percentage above background of OMIM genes that overlap a GWAS signal plateaus at a sample size of ~2.5 M (Suppl. Fig. 23c). In comparing the trans-ancestry meta-analysis with the largest European-ancestry-only GWAS, we did not observe a noticeable increase in overlapping OMIM genes above background.
We also sought to examine more directly whether the height GWAS results implicate highly similar biology across different continental ancestries. We used MAGMA and DEPICT to prioritize genes based on GWAS results for EUR, EAS, and AA ancestries. We then compared the enrichment of heritability with stratified LD score regression (LDSC) 39,40 for each set of prioritized genes, evaluated either in the same ancestry or in the other two ancestries. Genes prioritized in one ancestry by both MAGMA and DEPICT showed comparable enrichment of heritability when evaluated either in that ancestry or in the other two ancestries (Suppl. Fig. 22, Suppl. Table 15), strongly confirming the shared biology implicated by GWAS results from different ancestries.

Methods
Evaluation of gene set enrichment analysis (GSEA) methods across sample sizes. For GWAS summary statistics from multiple sample sizes (Tables 1-2), two GSEA approaches were applied (DEPICT and MAGMA). DEPICT release 173 was used; the top 1000 SNPs pruned by p-value from each set of summary statistics were used as input for each sample. MAGMA version v1.07b was used; SNPs were annotated with genes within 100 kb, and genes were removed if the missingness of their pathway membership was over 0.2.
To evaluate the ability of GSEA methods to identify groups of gene sets at different sample sizes, 14,462 gene sets, each consisting of Z-scores for 19,987 genes (gene sets selected and gene membership Z-scores calculated in ref. 42), were hierarchically clustered into 20 clusters as follows. Pairwise distances between gene sets were defined as the Euclidean distance between the gene sets' Z-scores, and the elbow method was used to choose the number of clusters, evaluating average distances between cluster centroids as the number of clusters is varied (Suppl. Fig. 21). For DEPICT and MAGMA, enrichment of prioritization in each cluster was defined as the number of prioritized gene sets in each cluster divided by the size of each cluster; a gene set was considered prioritized if it was in the top 10% of gene sets as prioritized by the GSEA method. Enrichment of OMIM genes in each gene set was defined as the number of OMIM genes in each gene set divided by the size of each gene set divided by the proportion of all genes in OMIM, and then enrichment of OMIM genes in each cluster was defined as the average of the enrichment of OMIM genes in each gene set in that cluster. Genes were defined to be "in a gene set" if the gene's gene-set Z-score is > 1.96, as described previously. 72 Null distributions for each cluster were generated by randomly selecting prioritized gene sets (for DEPICT and MAGMA) or prioritized genes to evaluate enrichment significance.
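The definition of OMIM enrichment given above can be written out as a small Python sketch. All names and the toy gene lists are hypothetical; only the arithmetic (membership cutoff of 1.96, per-gene-set ratio, cluster average) follows the description in the text.

```python
def gene_set_members(z_scores, cutoff=1.96):
    """Genes counted as 'in a gene set': membership Z-score above the cutoff."""
    return {gene for gene, z in z_scores.items() if z > cutoff}


def omim_enrichment(gene_set, omim_genes, all_genes):
    """(OMIM genes in set / set size) divided by the proportion of all genes in OMIM."""
    background = len(omim_genes & all_genes) / len(all_genes)
    in_set = len(gene_set & omim_genes) / len(gene_set)
    return in_set / background


def cluster_enrichment(gene_sets, omim_genes, all_genes):
    """Average OMIM enrichment over the gene sets belonging to one cluster."""
    values = [omim_enrichment(gs, omim_genes, all_genes) for gs in gene_sets]
    return sum(values) / len(values)


# Toy universe of 10 genes, 2 of which are OMIM genes (20% background).
all_genes = {f"G{i}" for i in range(1, 11)}
omim_genes = {"G1", "G2"}

set_a = gene_set_members({"G1": 3.1, "G2": 2.5, "G3": 2.0, "G4": 2.2, "G5": 0.4})
set_b = {"G5", "G6", "G7", "G8", "G9"}

enrichment_a = omim_enrichment(set_a, omim_genes, all_genes)
enrichment_cluster = cluster_enrichment([set_a, set_b], omim_genes, all_genes)
print(enrichment_a, enrichment_cluster)
```

In this toy example half of the genes in the first set are OMIM genes against a 20% background (enrichment 2.5), the second set contains none (enrichment 0), and the cluster-level value is their average.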
To evaluate saturation of height-associated gene identification, the percentage of OMIM genes overlapping independent COJO signals was calculated. "Overlapping" was defined as having at least one COJO SNP within the gene body, as defined with the plink version 1.9 hg19 gene list (URL: https://www.cog-genomics.org/static/bin/plink/glist-hg19). A null distribution was calculated by drawing an equivalent number of random genes (binned by size into 20 bins, same number of genes per bin) to match OMIM genes, and calculating the percent of the random genes near a COJO SNP.
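A size-matched null distribution of this kind can be sketched as follows in Python. This is an illustration under assumptions, not the authors' code: bins are taken to be equal-count bins of all genes sorted by length, each draw samples from every bin as many non-OMIM genes as that bin contains OMIM genes, and the toy gene lengths, OMIM set and set of genes overlapping a COJO SNP are hypothetical.

```python
import random


def make_size_bins(gene_lengths, n_bins):
    """Sort genes by length and split them into n_bins equal-count bins."""
    ordered = sorted(gene_lengths, key=lambda g: (gene_lengths[g], g))
    n = len(ordered)
    return [ordered[b * n // n_bins:(b + 1) * n // n_bins] for b in range(n_bins)]


def matched_null(gene_lengths, omim_genes, hit_genes, n_bins=20, n_draws=1000, seed=0):
    """Proportion of randomly drawn, size-matched non-OMIM genes that overlap a signal."""
    rng = random.Random(seed)
    bins = make_size_bins(gene_lengths, n_bins)
    proportions = []
    for _ in range(n_draws):
        draw = []
        for members in bins:
            n_omim = sum(g in omim_genes for g in members)
            pool = [g for g in members if g not in omim_genes]
            draw.extend(rng.sample(pool, min(n_omim, len(pool))))
        proportions.append(sum(g in hit_genes for g in draw) / len(draw))
    return proportions


# Toy data: 200 genes of increasing length; every 10th gene is an "OMIM" gene;
# genes with an even index are treated as overlapping a COJO SNP.
gene_lengths = {f"G{i}": 1000 * i for i in range(1, 201)}
omim_genes = {f"G{i}" for i in range(10, 201, 10)}
hit_genes = {f"G{i}" for i in range(2, 201, 2)}

observed = sum(g in hit_genes for g in omim_genes) / len(omim_genes)
null = matched_null(gene_lengths, omim_genes, hit_genes, n_bins=4, n_draws=100)
null_mean = sum(null) / len(null)
print(f"observed = {observed:.2f}, null mean = {null_mean:.2f}, "
      f"above background = {observed - null_mean:.2f}")
```

Subtracting the null mean from the observed proportion gives the "percentage above background" that is tracked across sample sizes.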
Benchmarking of gene prioritization across different ancestries. We applied DEPICT and MAGMA to prioritize genes on height GWAS of European, African-American and East Asian ancestry, resulting in three sets of prioritized genes for each method. To allow for a fair comparison, we used subsets of the available cohorts to create three equally sized GWAS (N~100,000). For MAGMA, we converted gene set prioritizations to gene prioritizations as described previously. 72 For both DEPICT and MAGMA, we then used Benchmarker 72 to evaluate the performance of these three sets of genes in each of the three different ancestries, resulting in three within-ancestry and six cross-ancestry scenarios.
The Benchmarker method is based on a leave-one-chromosome-out approach where one chromosome is withheld, and GWAS results for the remaining 21 chromosomes are used to prioritize genes on the withheld chromosome, iterating across each withheld chromosome. For each of the discovery GWAS ancestries, we selected the top 10% of the prioritized genes on each left-out chromosome, resulting in 1,893 prioritized genes. We subsequently annotated SNPs within 50 kb of the prioritized genes to generate an LD score annotation for these SNPs using LDSC. 16 Lastly, we applied stratified LDSC 39 (S-LDSC) to compare the three annotation sets to the full GWAS results for each of the three ancestries to determine whether genes prioritized and then evaluated in the same ancestry would be more enriched for heritability than those prioritized and evaluated in different ancestries. Reference panels were based on the 1000 Genomes Phase 3 reference panels 5 for LD score estimation, matching the reference panel ancestry with the GWAS results for that same ancestry. In addition, a category of SNPs located within 50 kb of any gene in the prioritization method and a set of 53 annotations of known genomic importance were included in the S-LDSC as conditional covariates. The analysis was based on 1,217,311 HapMap3 SNPs. The results of the S-LDSC are summarized using proportional $h^2_{SNP}$ (proportion of heritability explained by the annotation), the regression coefficient (average per-SNP contribution of the annotation to heritability), and enrichment in heritability ($h^2$ divided by the proportion of SNPs in the annotation). To assess the performance difference between two annotations, we calculated p-values based on standard errors from the different estimates.
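The summary quantities described above can be illustrated with a short Python sketch. The enrichment ratio follows the definition in the text; the 95% confidence interval uses the usual estimate ± 1.96 × S.E.; the comparison of two estimates is shown as a two-sided z-test that treats the estimates as independent, which is an assumption of this sketch rather than a statement of the exact test used. All numerical inputs are hypothetical.

```python
from math import erfc, sqrt


def heritability_enrichment(prop_h2, prop_snps):
    """Proportion of heritability in the annotation divided by its proportion of SNPs."""
    return prop_h2 / prop_snps


def confidence_interval_95(estimate, se):
    """Normal-approximation 95% confidence interval."""
    return estimate - 1.96 * se, estimate + 1.96 * se


def compare_estimates(est_a, se_a, est_b, se_b):
    """Two-sided z-test for the difference of two estimates assumed independent."""
    z = (est_a - est_b) / sqrt(se_a ** 2 + se_b ** 2)
    p_value = erfc(abs(z) / sqrt(2.0))
    return z, p_value


# Hypothetical example: an annotation covering 10% of SNPs and 30% of heritability.
enrichment = heritability_enrichment(prop_h2=0.30, prop_snps=0.10)
ci_low, ci_high = confidence_interval_95(estimate=3.0, se=0.3)
z, p_value = compare_estimates(est_a=3.0, se_a=0.3, est_b=2.0, se_b=0.4)

print(f"enrichment = {enrichment:.1f}")
print(f"95% CI = ({ci_low:.2f}, {ci_high:.2f})")
print(f"z = {z:.2f}, p = {p_value:.4f}")
```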