FinnGen provides genetic insights from a well-phenotyped isolated population

Kurki, Mitja I.; Karjalainen, Juha; Palta, Priit; Sipilä, Timo P.; Kristiansson, Kati; Donner, Kati M.; Reeve, Mary P.; Laivuori, Hannele; Aavikko, Mervi; Kaunisto, Mari A.; Loukola, Anu; Lahtela, Elisa; Mattsson, Hannele; Laiho, Päivi; Della Briotta Parolo, Pietro; Lehisto, Arto A.; Kanai, Masahiro; Mars, Nina; Rämö, Joel; Kiiskinen, Tuomo; Heyne, Henrike O.; Veerapen, Kumar; Rüeger, Sina; Lemmelä, Susanna; Zhou, Wei; Ruotsalainen, Sanni; Pärn, Kalle; Hiekkalinna, Tero; Koskelainen, Sami; Paajanen, Teemu; Llorens, Vincent; Gracia-Tabuenca, Javier; Siirtola, Harri; Reis, Kadri; Elnahas, Abdelrahman G.; Sun, Benjamin; Foley, Christopher N.; Aalto-Setälä, Katriina; Alasoo, Kaur; Arvas, Mikko; Auro, Kirsi; Biswas, Shameek; Bizaki-Vallaskangas, Argyro; Carpen, Olli; Chen, Chia-Yen; Dada, Oluwaseun A.; Ding, Zhihao; Ehm, Margaret G.; Eklund, Kari; Färkkilä, Martti; Finucane, Hilary; Ganna, Andrea; Ghazal, Awaisa; Graham, Robert R.; Green, Eric M.; Hakanen, Antti; Hautalahti, Marco; Hedman, Åsa K.; Hiltunen, Mikko; Hinttala, Reetta; Hovatta, Iiris; Hu, Xinli; Huertas-Vazquez, Adriana; Huilaja, Laura; Hunkapiller, Julie; Jacob, Howard; Jensen, Jan-Nygaard; Joensuu, Heikki; John, Sally; Julkunen, Valtteri; Jung, Marc; Junttila, Juhani; Kaarniranta, Kai; Kähönen, Mika; Kajanne, Risto; Kallio, Lila; Kälviäinen, Reetta; Kaprio, Jaakko; Kerimov, Nurlan; Kettunen, Johannes; Kilpeläinen, Elina; Kilpi, Terhi; Klinger, Katherine; Kosma, Veli-Matti; Kuopio, Teijo; Kurra, Venla; Laisk, Triin; Laukkanen, Jari; Lawless, Nathan; Liu, Aoxing; Longerich, Simonne; Mägi, Reedik; Mäkelä, Johanna; Mäkitie, Antti; Malarstig, Anders; Mannermaa, Arto; Maranville, Joseph; Matakidou, Athena; Meretoja, Tuomo; Mozaffari, Sahar V.; Niemi, Mari E. 
K.; Niemi, Marianna; Niiranen, Teemu; O´Donnell, Christopher J.; Obeidat, Ma´en; Okafo, George; Ollila, Hanna M.; Palomäki, Antti; Palotie, Tuula; Partanen, Jukka; Paul, Dirk S.; Pelkonen, Margit; Pendergrass, Rion K.; Petrovski, Slavé; Pitkäranta, Anne; Platt, Adam; Pulford, David; Punkka, Eero; Pussinen, Pirkko; Raghavan, Neha; Rahimov, Fedik; Rajpal, Deepak; Renaud, Nicole A.; Riley-Gillis, Bridget; Rodosthenous, Rodosthenis; Saarentaus, Elmo; Salminen, Aino; Salminen, Eveliina; Salomaa, Veikko; Schleutker, Johanna; Serpi, Raisa; Shen, Huei-yi; Siegel, Richard; Silander, Kaisa; Siltanen, Sanna; Soini, Sirpa; Soininen, Hilkka; Sul, Jae Hoon; Tachmazidou, Ioanna; Tasanen, Kaisa; Tienari, Pentti; Toppila-Salmi, Sanna; Tukiainen, Taru; Tuomi, Tiinamaija; Turunen, Joni A.; Ulirsch, Jacob C.; Vaura, Felix; Virolainen, Petri; Waring, Jeffrey; Waterworth, Dawn; Yang, Robert; Nelis, Mari; Reigo, Anu; Metspalu, Andres; Milani, Lili; Esko, Tõnu; Fox, Caroline; Havulinna, Aki S.; Perola, Markus; Ripatti, Samuli; Jalanko, Anu; Laitinen, Tarja; Mäkelä, Tomi P.; Plenge, Robert; McCarthy, Mark; Runz, Heiko; Daly, Mark J.; Palotie, Aarno

doi:10.1038/s41586-022-05473-8

Article
Open access
Published: 18 January 2023

Mitja I. Kurki^1,2,3,4,
Juha Karjalainen^1,2,3,4,
Priit Palta^1,5,
Timo P. Sipilä¹,
Kati Kristiansson⁶,
Kati M. Donner¹,
Mary P. Reeve¹,
Hannele Laivuori^1,7,8,9,
Mervi Aavikko¹,
Mari A. Kaunisto¹,
Anu Loukola¹⁰,
Elisa Lahtela¹,
Hannele Mattsson⁶,
Päivi Laiho⁶,
Pietro Della Briotta Parolo¹,
Arto A. Lehisto¹,
Masahiro Kanai^1,2,3,4,11,
Nina Mars¹,
Joel Rämö¹,
Tuomo Kiiskinen¹,
Henrike O. Heyne^1,2,3,12,13,
Kumar Veerapen^1,2,3,4,
Sina Rüeger¹,
Susanna Lemmelä^1,6,
Wei Zhou^2,3,4,
Sanni Ruotsalainen¹,
Kalle Pärn¹,
Tero Hiekkalinna⁶,
Sami Koskelainen⁶,
Teemu Paajanen⁶,
Vincent Llorens¹,
Javier Gracia-Tabuenca¹⁴,
Harri Siirtola¹⁴,
Kadri Reis⁵,
Abdelrahman G. Elnahas⁵,
Benjamin Sun^15,16,
Christopher N. Foley^17,18,
Katriina Aalto-Setälä¹⁹,
Kaur Alasoo²⁰,
Mikko Arvas²¹,
Kirsi Auro²²,
Shameek Biswas²³,
Argyro Bizaki-Vallaskangas²⁴,
Olli Carpen¹⁰,
Chia-Yen Chen²⁵,
Oluwaseun A. Dada¹,
Zhihao Ding²⁶,
Margaret G. Ehm²⁷,
Kari Eklund^28,29,
Martti Färkkilä³⁰,
Hilary Finucane^2,3,4,
Andrea Ganna^1,2,3,4,
Awaisa Ghazal¹,
Robert R. Graham³¹,
Eric M. Green³¹,
Antti Hakanen³²,
Marco Hautalahti³³,
Åsa K. Hedman^34,35,
Mikko Hiltunen³⁶,
Reetta Hinttala^37,38,39,
Iiris Hovatta^40,41,
Xinli Hu³⁴,
Adriana Huertas-Vazquez⁴²,
Laura Huilaja^43,44,
Julie Hunkapiller⁴⁵,
Howard Jacob⁴⁶,
Jan-Nygaard Jensen²⁶,
Heikki Joensuu⁴⁷,
Sally John²⁵,
Valtteri Julkunen^48,49,
Marc Jung²⁶,
Juhani Junttila⁵⁰,
Kai Kaarniranta^51,52,
Mika Kähönen^19,53,
Risto Kajanne¹,
Lila Kallio³²,
Reetta Kälviäinen^54,55,
Jaakko Kaprio^1,56,
FinnGen,
Nurlan Kerimov²⁰,
Johannes Kettunen^6,38,57,
Elina Kilpeläinen¹,
Terhi Kilpi⁶,
Katherine Klinger⁵⁸,
Veli-Matti Kosma^59,60,
Teijo Kuopio⁶¹,
Venla Kurra^62,63,
Triin Laisk⁵,
Jari Laukkanen^61,64,
Nathan Lawless²⁶,
Aoxing Liu¹,
Simonne Longerich⁴²,
Reedik Mägi⁵,
Johanna Mäkelä⁶⁵,
Antti Mäkitie^66,67,
Anders Malarstig^68,69,
Arto Mannermaa^59,60,
Joseph Maranville²³,
Athena Matakidou⁷⁰,
Tuomo Meretoja⁴⁷,
Sahar V. Mozaffari³¹,
Mari E. K. Niemi¹,
Marianna Niemi^19,71,
Teemu Niiranen^6,72,
Christopher J. O´Donnell⁷³,
Ma´en Obeidat⁷³,
George Okafo²⁶,
Hanna M. Ollila^1,74,
Antti Palomäki⁷²,
Tuula Palotie^75,76,
Jukka Partanen^21,77,
Dirk S. Paul⁷⁰,
Margit Pelkonen⁷⁸,
Rion K. Pendergrass⁴⁵,
Slavé Petrovski⁷⁰,
Anne Pitkäranta⁷⁹,
Adam Platt⁸⁰,
David Pulford⁸¹,
Eero Punkka¹⁰,
Pirkko Pussinen⁷⁶,
Neha Raghavan⁴²,
Fedik Rahimov⁴⁶,
Deepak Rajpal⁵⁸,
Nicole A. Renaud⁷³,
Bridget Riley-Gillis⁴⁶,
Rodosthenis Rodosthenous¹,
Elmo Saarentaus¹,
Aino Salminen⁷⁶,
Eveliina Salminen^67,82,
Veikko Salomaa⁶,
Johanna Schleutker³²,
Raisa Serpi⁵⁰,
Huei-yi Shen¹,
Richard Siegel⁸³,
Kaisa Silander⁶,
Sanna Siltanen⁸⁴,
Sirpa Soini⁶,
Hilkka Soininen⁸⁵,
Jae Hoon Sul⁴²,
Ioanna Tachmazidou⁷⁰,
Kaisa Tasanen^43,44,
Pentti Tienari^86,87,
Sanna Toppila-Salmi⁸⁸,
Taru Tukiainen¹,
Tiinamaija Tuomi^1,89,90,91,
Joni A. Turunen^47,92,
Jacob C. Ulirsch^2,3,
Felix Vaura^6,93,
Petri Virolainen³²,
Jeffrey Waring⁴⁶,
Dawn Waterworth⁹⁴,
Robert Yang⁹⁵,
Mari Nelis⁹⁶,
Anu Reigo⁵,
Andres Metspalu⁵,
Lili Milani⁵,
Tõnu Esko⁵,
Caroline Fox⁴²,
Aki S. Havulinna^1,6,
Markus Perola⁶,
Samuli Ripatti¹,
Anu Jalanko¹,
Tarja Laitinen⁸⁴,
Tomi P. Mäkelä⁹⁷,
Robert Plenge²³,
Mark McCarthy⁴⁵,
Heiko Runz²⁵,
Mark J. Daly^1,2,3,4^na1 &
…
Aarno Palotie ORCID: orcid.org/0000-0002-2527-5874^1,2,3,4^na1

Nature volume 613, pages 508–518 (2023)

An Author Correction to this article was published on 24 February 2023

This article has been updated

Abstract

Population isolates such as those in Finland benefit genetic research because deleterious alleles are often concentrated on a small number of low-frequency variants (0.1% ≤ minor allele frequency < 5%). These variants survived the founding bottleneck rather than being distributed over a large number of ultrarare variants. Although this effect is well established in Mendelian genetics, its value in common disease genetics is less explored^1,2. FinnGen aims to study the genome and national health register data of 500,000 Finnish individuals. Given the relatively high median age of participants (63 years) and the substantial fraction of hospital-based recruitment, FinnGen is enriched for disease end points. Here we analyse data from 224,737 participants from FinnGen and study 15 diseases that have previously been investigated in large genome-wide association studies (GWASs). We also include meta-analyses of biobank data from Estonia and the United Kingdom. We identified 30 new associations, primarily low-frequency variants, enriched in the Finnish population. A GWAS of 1,932 diseases also identified 2,733 genome-wide significant associations (893 phenome-wide significant (PWS), P < 2.6 × 10^–11) at 2,496 (771 PWS) independent loci with 807 (247 PWS) end points. Among these, fine-mapping implicated 148 (73 PWS) coding variants associated with 83 (42 PWS) end points. Moreover, 91 (47 PWS) had an allele frequency of <5% in non-Finnish European individuals, of which 62 (32 PWS) were enriched by more than twofold in Finland. These findings demonstrate the power of bottlenecked populations to find entry points into the biology of common diseases through low-frequency, high impact variants.

Main

Large biobank studies have become an important source of genetic discoveries. The FinnGen study aims to construct a resource that combines the power of nationwide biobanks, structured national healthcare data and a unique, isolated population. Owing to increased genetic drift, isolated populations with recent bottlenecks can have deleterious, disease-predisposing alleles at considerably higher frequencies than permitted by selection in larger and older outbred populations. Counterbalancing this enrichment of specific low-frequency alleles, the other consequence of a recent bottleneck is that isolated populations have considerably fewer rare variants overall^1,3. As a result, isolated populations provide an opportunity to identify high-impact disease variants that are rare in other populations^1,2. In Finland, a strong founding bottleneck occurred about 120 generations ago followed by rapid population expansion. This bottleneck effect has resulted in numerous strongly deleterious alleles that occur more frequently in Finland compared with other European populations. This is manifested in the Finnish Disease Heritage, a set of 36 mostly recessive diseases that are more prevalent in Finland than elsewhere in the world⁴. This population history (which facilitates the identification of low-frequency deleterious alleles) combined with longitudinal information from registers that record hospital in-patient and outpatient diagnoses, purchases of prescription medications and many other national health registries centrally collected for decades provides valuable opportunities for understanding the genetic basis of health and disease.

FinnGen is a public–private partnership research project that combines imputed genotype data generated from newly collected and legacy samples from Finnish biobanks and digital health record data from Finnish health registries (https://www.finngen.fi/en) with the aim of providing new insights into disease genetics. FinnGen includes 9 Finnish biobanks, research institutes, universities and university hospitals, 13 international pharmaceutical industry partners and the Finnish Biobank Cooperative (FINBB) in a pre-competitive partnership. As of August 2020 (release 5 described in this article), samples from 412,000 individuals have been collected and 224,737 have been analysed, with the aim of reaching a cohort of 500,000 participants (Supplementary Methods, section 2). The project utilizes data from the nationwide longitudinal health register collected since 1969 from every resident in Finland.

Here we describe the FinnGen project and its current genotype and phenotype content and highlight a series of genetic discoveries from the first data collection phase. In other articles, we describe more detailed studies that showcase different aspects of the rich data available from population registries. Here we first show that FinnGen register-based phenotypes are comparable to those used in disease-specific GWASs in 15 previously well-studied common diseases. We demonstrate the power of the combination of data from an isolated population and other registers to discover new low-frequency variant associations, even in previously well-studied diseases in which FinnGen has a much smaller number of cases than in published disease-specific GWASs. Finally, through a GWAS of 1,932 end points followed by statistical fine-mapping, we demonstrate the ability to identify probable causal coding variants even with low allele frequencies (AFs).

Phenotyping and genotyping

In Finland, similar to the other Nordic countries, there are nationwide electronic health registers that were originally established primarily for administrative purposes to monitor the usage of health care nationwide and over the lifespan of each Finnish resident. These registers have almost complete coverage of major health-related events such as hospitalizations, prescription drug purchases (not including hospital-administered medications), medical procedures or deaths, with a history of data collection spanning more than 50 years. Phenotypes based on health registers (end points) were created by combining data (mainly using classification codes from the International Classification of Diseases (ICD) and the Anatomical Therapeutic Chemical (ATC) classification) from one or more nationwide health registers (Extended Data Fig. 1, Supplementary Table 1 and Supplementary Figs. 1–4). For the phenome-wide GWAS, we initially constructed more than 2,800 end points by combining data from different health registers, including hospital discharge registers, prescription medication purchase registers and cancer registers (Fig. 1 and Supplementary Methods, section 1; see also https://r5.risteys.finngen.fi/).
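As a rough illustration of how a register-based end point can be assembled from coded records, the sketch below flags a person as a case when any of their records matches a list of code prefixes. The record layout, the class names and the example prefixes (ICD-10 `E11`, ATC `A10B`) are assumptions chosen for illustration and are not FinnGen's actual end point definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegisterRecord:
    person_id: str
    register: str      # hypothetical label, e.g. "hospital_discharge"
    code_system: str   # "ICD10" or "ATC"
    code: str


@dataclass(frozen=True)
class EndpointDefinition:
    name: str
    icd10_prefixes: tuple = ()
    atc_prefixes: tuple = ()

    def matches(self, record: RegisterRecord) -> bool:
        # Pick the prefix list that belongs to the record's code system.
        if record.code_system == "ICD10":
            prefixes = self.icd10_prefixes
        elif record.code_system == "ATC":
            prefixes = self.atc_prefixes
        else:
            prefixes = ()
        return any(record.code.startswith(p) for p in prefixes)


def cases_for(endpoint: EndpointDefinition, records) -> set:
    """Return the IDs of people with at least one matching record."""
    return {r.person_id for r in records if endpoint.matches(r)}


# Illustrative definition only: diagnosis codes plus drug purchase codes.
example_endpoint = EndpointDefinition(
    name="T2D_EXAMPLE", icd10_prefixes=("E11",), atc_prefixes=("A10B",)
)
example_records = [
    RegisterRecord("p1", "hospital_discharge", "ICD10", "E11.9"),
    RegisterRecord("p2", "drug_purchase", "ATC", "A10BA02"),
    RegisterRecord("p3", "hospital_discharge", "ICD10", "I10"),
]
example_cases = cases_for(example_endpoint, example_records)
```

Combining several registers in this way means that a person can become a case through a diagnosis, a medication purchase, or both.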

**Fig. 1: FinnGen sample collection and phenotyping.**

FinnGen release 5 presented here contains genotype data for 224,737 individuals after quality control (QC). A total of 154,714 individuals were genotyped with a custom Axiom FinnGen1 array. Data on 70,023 additional individuals were derived from legacy collections (Supplementary Table 2) genotyped with non-custom genotyping arrays (QC details provided in Supplementary Methods, section 3). We developed and utilized a population-specific imputation reference panel built from high-coverage (25–30 times) whole-genome sequence data for 3,775 Finnish individuals, containing 16,962,023 single nucleotide polymorphisms, and insertions and deletions (minor allele count of ≥3) (Supplementary Methods, section 3). The majority (16,387,711) of the variants were confidently imputed (information (INFO) score of >0.6; Supplementary Fig. 5).

Population structure and relatedness

To study the genetic ancestry data of 224,737 FinnGen participants that passed genotyping QC ( Supplementary Methods, section 3), we combined the FinnGen data with 2,504 phase 3 reference samples from the 1000 Genomes Project⁵ and used principal component analysis (PCA) to identify FinnGen participants who have non-Finnish genetic ancestry. Most participants have broadly Finnish ancestry; 3,676 out of 224,737 (1.63%) outliers were removed (Extended Data Fig. 2 and Supplementary Methods, section 4). We estimated that 165,448 (73.6%) of FinnGen participants have third-degree or closer relatives, which is higher than the estimated 30.3% in the UK Biobank (UKBB)⁶; this result is partially explained by the family-based legacy cohorts in FinnGen. We removed 5,780 duplicates and monozygotic twins (one from each pair removed randomly) and genetic population outliers (Supplementary Methods, section 4) and built a set of approximately unrelated individuals for which the relation between any pair is third degree or higher. In total, we obtained data for 156,977 independent individuals, which were used to compute the PCA, and data for the 61,980 related individuals were projected onto these principal components (PCs) (Supplementary Methods, section 4, and Supplementary Table 5). The first two PCs captured the well-known east–west and north–south genetic differences in Finland⁷ (Supplementary Fig. 9). Out of the total remaining 218,957 genotyped samples, we had phenotype data for 218,792 individuals (56.5% females (123,579)), which were then used in all analyses.
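The sample counts in this paragraph can be tied together with a short bookkeeping check. The sketch below only uses numbers quoted in the text; it assumes that the 5,780 removed samples already include the 3,676 ancestry outliers, which is what the arithmetic suggests. Variable names are illustrative.

```python
# Counts quoted in the text (FinnGen release 5).
genotyped_after_qc = 224_737
removed = 5_780               # duplicates, monozygotic twins and population outliers
unrelated = 156_977           # used to compute the PCA
related_projected = 61_980    # projected onto the resulting PCs
with_phenotypes = 218_792
females = 123_579
with_close_relatives = 165_448

remaining = genotyped_after_qc - removed


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(remaining)                                      # 218957
print(unrelated + related_projected)                  # 218957
print(pct(females, with_phenotypes))                  # 56.5
print(pct(with_close_relatives, genotyped_after_qc))  # 73.6
```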

GWAS of nationwide health registries

To benchmark our register-based phenotyping and to explore the value of the isolated setting of Finland, we selected 15 diseases with more than 1,000 cases in FinnGen and for which well-powered GWAS data have been published. We evaluated the accuracy of our phenotyping by comparing the genetic correlations and effect sizes with the previous GWAS results (Supplementary Table 6). None of the genetic correlations were significantly lower than 1 (the lowest genetic correlation was 0.89 (standard error = 0.07) in age-related macular degeneration (AMD); Supplementary Table 6). For diseases with a large number of cases in FinnGen, the effect sizes of lead variants in known loci were largely consistent between FinnGen and previously published meta-analyses. This result demonstrates that our register-based phenotyping is comparable to existing disease-specific GWASs (Fig. 1e, Supplementary Information and Supplementary Table 6). The effect sizes varied more in some diseases that have a smaller number of cases in FinnGen (for example, ankylosing spondylitis, n = 1462, r² = 0.62).
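One simple way to read "not significantly lower than 1" is a one-sided test based on a normal approximation of the estimate. The sketch below applies this to the AMD figure quoted above; whether the authors used exactly this test is an assumption, and the function name is illustrative.

```python
import math


def one_sided_p_below(estimate, se, null_value=1.0):
    """P value for the alternative 'true value < null_value' (normal approximation)."""
    z = (null_value - estimate) / se
    # Upper-tail probability of a standard normal at z.
    return 0.5 * math.erfc(z / math.sqrt(2))


# Lowest genetic correlation reported in the text: 0.89 with standard error 0.07.
p_amd = one_sided_p_below(0.89, 0.07)
print(f"z = {(1.0 - 0.89) / 0.07:.2f}, one-sided P = {p_amd:.3f}")
```

Under this approximation the estimate sits about 1.6 standard errors below 1, which does not reach the conventional 0.05 level.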

GWAS of these 15 diseases identified 235 loci (that is, regions selected for fine-mapping; Methods) and 275 independent genome-wide significant associations (here onwards, ‘association’ means an independent signal) outside the human leukocyte antigen (HLA) region (GRCh38, chromosome 6: 25–34 Mb). A phenome-wide association study (PheWAS) of FinnGen imputed classical HLA gene alleles has been previously reported⁸. Overall, 44 of the non-HLA associations were driven by low-frequency lead variants (we define ‘low frequency’ as an AF of <5% in non-Finnish, Swedish or Estonian European (NFSEE) individuals in the Genome Aggregation Database (gnomAD; v.2.0.1)⁹) that were more than twice as frequent in Finnish individuals compared with NFSEE individuals. We use NFSEE as a general continental European reference point, excluding individuals from Finland, Sweden and Estonia. As there were large-scale migrations from Finland to Sweden in the twentieth century, many of the chromosomes from sequencing studies of Swedish individuals are of recent Finnish origin. Moreover, the geographically close and linguistically and genetically similar⁹ population of Estonia is likely to share elements of the same ancestral founder effect.
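The two criteria used here, low frequency in NFSEE individuals and more than twofold enrichment in Finland, can be written as a pair of small predicates. This is a minimal sketch with illustrative function names; treating an allele absent from the reference population as infinitely enriched is an assumption of the sketch, not something stated in the text.

```python
def fold_enrichment(af_finnish, af_nfsee):
    """Ratio of the Finnish allele frequency to the NFSEE allele frequency."""
    if af_nfsee == 0:
        # Allele not observed in the reference population.
        return float("inf")
    return af_finnish / af_nfsee


def is_low_frequency(af_nfsee, threshold=0.05):
    """Low frequency is defined in the text as AF < 5% in NFSEE individuals."""
    return af_nfsee < threshold


def is_enriched_low_frequency(af_finnish, af_nfsee, min_fold=2.0):
    """True for low-frequency variants more than `min_fold` times as frequent in Finland."""
    return is_low_frequency(af_nfsee) and fold_enrichment(af_finnish, af_nfsee) > min_fold


# Hypothetical variants: (Finnish AF, NFSEE AF).
print(is_enriched_low_frequency(0.02, 0.005))   # enriched fourfold, rare in NFSEE
print(is_enriched_low_frequency(0.30, 0.28))    # common and not enriched
```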

Replication of many such enriched variant associations in the Finnish population is hindered by low AFs or missingness in other European populations. People from Finland are genetically more similar to people from Estonia than other European countries⁹. Therefore we first conducted replication using data from 136,724 individuals from the Estonian Biobank (EstBB) and then extended the analysis to individuals from the UKBB (Methods and see Supplementary Table 7 for definitions of end points and case–control numbers). The effect sizes in genome-wide significant hits in FinnGen were mostly concordant with the EstBB (average inverse variance weighted slope of 1.5 (with FinnGen higher) and r² = 0.69) and the UKBB (slope = 1.1, r² = 0.84) (Extended Data Fig. 3). FinnGen had a higher case prevalence in the 15 disease diagnoses than in the UKBB, which is probably due to slightly different ascertainment schemes. By contrast, the EstBB had the highest case prevalence in ophthalmic diseases (AMD and glaucoma) and inflammatory skin conditions (atopic dermatitis and psoriasis) (Fig. 2a).

**Fig. 2: Comparison of previously unknown and known lead variants in loci identified in the 15 studied diseases.**

After a meta-analysis of the EstBB and UKBB data, 241 of the 275 associations remained genome-wide significant (Supplementary Table 8). We performed a further meta-analysis of 232 associations that did not meet the genome-wide significance threshold in FinnGen (5 × 10⁻⁸ < P < 1 × 10⁻⁶), and 57 of those were genome-wide significant after meta-analysis. This meta-analysis resulted in 298 genome-wide significant associations (see also Supplementary Table 8 for results after multiple testing correction for 15 end points).
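This section does not spell out the meta-analysis model. A common choice for combining per-biobank effect estimates is fixed-effect inverse-variance weighting, sketched below as an assumption rather than as the authors' exact procedure. The input values are hypothetical.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance weighted meta-analysis.

    Returns the pooled effect, its standard error, the z score and a
    two-sided P value from a normal approximation.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, z, p


# Hypothetical log odds ratios and standard errors from two biobanks.
pooled_beta, pooled_se, pooled_z, pooled_p = ivw_meta([0.2, 0.2], [0.05, 0.05])
single_p = ivw_meta([0.2], [0.05])[3]
print(pooled_p < 5e-8, single_p < 5e-8)
```

In this toy case neither study alone reaches P < 5 × 10⁻⁸, but the pooled estimate does, which mirrors how sub-threshold FinnGen signals can become genome-wide significant after meta-analysis.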

To determine whether the observed associations have been previously reported, we queried the GWAS Catalog association database (and largest recent relevant GWAS) for genome-wide significant (P < 5 × 10⁻⁸) variants that are in linkage disequilibrium (LD) (r² > 0.1 in the FinnGen imputation panel) with observed lead variants in FinnGen. As the lowest AF of the new findings was low (0.15%), in addition to published GWASs, we checked whether credible set variants in these loci have also been previously reported in ClinVar. We observed six known pathogenic or likely pathogenic variants, such as a frameshift variant in PALB2 (p.Leu531fs; AF of 0.1%, not observed outside Finland in gnomAD; Supplementary Table 8) associated with breast cancer. Thirty out of the 298 associations have not been previously reported in the largest published meta-analysis so far (Supplementary Table 6), in a manual literature search, the GWAS Catalog or in ClinVar (Table 1). As expected, we observed that lead variants in novel loci were mostly of low frequency and enriched in Finland compared with known loci from previous GWASs. Specifically, 27 lead variants had minor allele frequency (MAF) values of <5% in gnomAD NFSEE individuals, and 88% of novel and 11% of known loci (after LD pruning, see below) had gnomAD NFSEE MAF values of <5% (Fisher’s exact test, P = 4.29 × 10⁻¹⁷). In most cases, the AFs of lower frequency variants (MAF < 5% in gnomAD NFSEE population) were the highest in FinnGen followed by the EstBB and lowest in NFSEE individuals in gnomAD (Fig. 2d).

Table 1 A total of 30 previously unreported associations identified in a GWAS of 15 selected, previously extensively studied phenotypes

Full size table

Next we performed statistical fine-mapping (Methods) on all 298 genome-wide significant associations (each association is independent; that is, 298 credible sets). Coding variants (missense, frameshift, canonical splice site, stop gained, stop lost or inframe deletion) with posterior inclusion probability (PIP) values of ≥0.05 were observed in 44 (18.7%) out of the 95% credible sets (17 coding variants had PIP > 0.5). Here onwards, we report coding variants with PIP > 0.05 as putatively causal. We recognize that there may be occasions in which assignment of the causal variant to a coding variant is incorrect (see our accompanying paper¹⁰ for discussions on fine-mapping calibration and replicability). In addition to identifying putative causal coding variants, we sought to identify potential gene expression regulatory mechanisms by colocalizing credible sets with fine-mapped expression quantitative trait locus (eQTL) datasets from the eQTL Catalogue (Methods).
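To make the credible set and PIP terminology concrete, the sketch below builds a 95% credible set by adding variants in order of decreasing PIP until the cumulative probability reaches the coverage target, and then reports coding variants above the PIP cut-off. It assumes a single causal signal and hypothetical PIP values; dedicated fine-mapping software handles multiple signals and LD in ways this sketch does not.

```python
def credible_set(pips, coverage=0.95):
    """Smallest set of top-ranked variants whose PIPs sum to at least `coverage`."""
    ranked = sorted(pips.items(), key=lambda item: item[1], reverse=True)
    chosen, total = [], 0.0
    for variant, pip in ranked:
        chosen.append(variant)
        total += pip
        if total >= coverage:
            break
    return chosen


def putative_causal_coding(pips, coding_variants, min_pip=0.05):
    """Coding variants in the credible set with PIP above `min_pip`."""
    return [
        v for v in credible_set(pips)
        if v in coding_variants and pips[v] > min_pip
    ]


# Hypothetical PIPs for four variants in one locus; "c" and "d" are coding.
example_pips = {"a": 0.60, "b": 0.30, "c": 0.06, "d": 0.04}
example_cs = credible_set(example_pips)
example_coding = putative_causal_coding(example_pips, {"c", "d"})
```

Here the credible set contains three variants, and only one coding variant passes the PIP > 0.05 filter.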

We then wanted to describe the AF spectrum and putative mechanisms of action of risk variants. To do so, we LD pruned the 298 genome-wide significant associations and prioritized the most significant phenotype among the same hits to represent a single putative causal variant (LD r² value between lead variants of <0.2). This process resulted in 281 unique associations (27 new).
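LD pruning of this kind is often done greedily: hits are visited from most to least significant, and a hit is kept only if its r² with every hit already kept is below the threshold. The sketch below follows that scheme under stated assumptions; the `r2` lookup and the example values are hypothetical, and the authors' exact procedure is described in their Methods.

```python
def ld_prune(hits, r2, max_r2=0.2):
    """Greedy LD pruning.

    hits: list of (variant_id, p_value) pairs.
    r2:   function (variant_a, variant_b) -> LD r² between two variants.
    Keeps the most significant hit of each group of correlated variants.
    """
    kept = []
    for variant, p_value in sorted(hits, key=lambda hit: hit[1]):
        if all(r2(variant, other) < max_r2 for other, _ in kept):
            kept.append((variant, p_value))
    return kept


# Hypothetical pairwise LD table; unlisted pairs are treated as unlinked.
ld_table = {("a", "b"): 0.8, ("a", "c"): 0.05, ("b", "c"): 0.1}


def lookup_r2(x, y):
    return ld_table.get((x, y), ld_table.get((y, x), 0.0))


pruned = ld_prune([("b", 1e-9), ("a", 1e-12), ("c", 1e-8)], lookup_r2)
```

In the example, `b` is dropped because it is in strong LD with the more significant `a`, whereas `c` is retained as a separate signal.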

Most of the 281 unique associations were common variant associations. However, 53 of these had a lead variant frequency of less than 5% in NFSEE individuals, and 38 of them were enriched by more than two times in the Finnish population compared with the NFSEE population. We observed a coding variant more often in the credible sets of associations that were enriched by more than twofold (19 out of 38; 50%) than in non-enriched associations (6 out of 15; 40%) at lower frequencies (MAF < 5%).

Following the discovery of 27 new associations, we sought to determine potential mechanisms of action through the identification of coding variants in their credible sets and potential regulatory effects by colocalization with eQTL associations from the eQTL Catalogue. We identified putative causal coding variants in 9 out of 27 loci and eQTL colocalization in 4 out of 27 loci. In two out of the four eQTL loci, we observed a coding variant in credible sets (IL4R and MYH14; the eQTLs point to different genes than the coding variants). The two remaining eQTL colocalizations were breast cancer loci colocalizing with H2BP2 eQTL in lung tissue and type 2 diabetes colocalizing with PRRG4 in lipopolysaccharide-stimulated monocytes. The disease relevance of these eQTLs is currently not evident.

No credible coding variants or eQTLs were identified in 16 out of 27 loci (Supplementary Table 8). The fraction of associations in which we observed eQTLs was small (14.8%). Most of the new associations were driven by variants with low AFs in NFSEE populations (Table 1 and Fig. 2b,d). The low fraction of observed eQTL colocalizations is probably explained by the low AF of 25 out of the 27 of the variants in available eQTL studies (such as GTEx), for which the majority of the samples do not have Finnish or Estonian ancestry.

We next aimed to explore the benefits of the FinnGen dataset in GWAS discovery. We extrapolated observed meta-analysis results in FinnGen, the UKBB and the EstBB to match the sample size of the UKBB in 14 demonstration diseases (excluding Alzheimer’s disease; Supplementary Methods). The distribution of extrapolated P values was shifted towards greater significance in FinnGen compared with those of the UKBB and the EstBB in a matched total sample size scenario for the 14 demonstration diseases (Supplementary Methods and Supplementary Fig. 11). Moreover, frequency enrichment was a major driver in the gain of power in low-frequency variants (Supplementary Fig. 12). In individual end points with similar sample prevalence in FinnGen and the UKBB, such as inflammatory bowel disease (IBD), the greatest gain in power was in variants in which the AFs are <0.5% in the UKBB (see Supplementary Fig. 13 for a comparison for each end point and biobank).

The identification of a new signal for IBD mapping to a single variant in an intron of TNRC18 highlights the value of FinnGen for discovery, even when the case sample size is below that of existing meta-analyses. This variant has a strong risk-increasing effect (AF = 3.6%, odds ratio (OR) = 3.2, P = 2.4 × 10⁻⁶¹), which eclipses the significance of signals at IL23R, NOD2 and the major histocompatibility complex. The variant is enriched by 114-fold in the Finnish population compared with the NFSEE population, in whom the AF is too low (0.04%) to have been identified in previous GWASs (this FinnGen association was also reported in ref. ¹¹). We were, however, able to replicate this association in the EstBB (AF = 1.3%, OR = 3.9, P = 2.8 × 10⁻⁶) owing to the relatively higher frequency in the genetically related Estonian population. This variant was also associated with risk for multiple other inflammatory conditions evaluated in FinnGen, including interstitial lung disease (OR = 1.43, P = 6.3 × 10⁻²⁶), ankylosing spondylitis (OR = 4.2, P = 1.8 × 10⁻³⁴), iridocyclitis (OR = 2.3, P = 1.2 × 10⁻²⁷) and psoriasis (OR = 1.6, P = 1.1 × 10⁻¹³). However, the same allele appears to be protective for an end point that combines multiple autoimmune diseases (https://r5.risteys.finngen.fi/phenocode/AUTOIMMUNE) (OR = 0.84, P = 6.2 × 10⁻¹²; for example, type 1 diabetes (OR = 0.64, P = 2.7 × 10⁻⁷) and hypothyroidism (OR = 0.85, P = 7.8 × 10⁻⁷)).

The highest number (eight loci) of new and enriched low-frequency associations was identified in type 2 diabetes, which is probably due to the large number of patients with type 2 diabetes in FinnGen release 5 (29,193). Other noteworthy observations from this set of 30 findings for 15 well-studied diseases are described in Supplementary Note 1.

Coding variant associations

Motivated by the identification of high-effect coding variant associations within the selected 15 diseases, we performed a PheWAS followed by fine-mapping to identify putative causal coding variants enriched in the Finnish population.

In a GWAS of 1,932 distinct end points and 16,387,711 variants (Supplementary Table 4; case overlap < 50% and n cases > 80), we identified 2,733 independent associations in 2,496 loci across 807 end points (Supplementary Table 9) at a genome-wide significance threshold (P < 5 × 10⁻⁸). Moreover, 893 signals in 771 loci across 247 end points at PWS thresholds (P < 2.6 × 10⁻¹¹) were identified. The HLA region was excluded here, and a PheWAS of imputed classical HLA gene alleles in FinnGen is reported in ref. ⁸.
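The PWS threshold quoted here is numerically consistent with dividing the standard genome-wide threshold by the number of end points tested, that is, a Bonferroni correction across 1,932 end points. The check below illustrates that reading; that the threshold was derived exactly this way is inferred from the arithmetic rather than stated in this section.

```python
GENOME_WIDE_THRESHOLD = 5e-8
N_ENDPOINTS = 1_932

# Bonferroni-style correction across end points.
pws_threshold = GENOME_WIDE_THRESHOLD / N_ENDPOINTS
print(f"{pws_threshold:.2e}")  # 2.59e-11, reported as 2.6e-11 when rounded to two significant figures
```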

Using statistical fine-mapping, we observed a coding variant (missense, frameshift, canonical splice site, stop gained, stop lost or inframe deletion; PIP > 0.05) in 369 associations (13.5% of all associations) spanning 202 end points. Full results with all 2,803 end points (including end points with a case overlap of >50% that are excluded here) are publicly available from a customized browser based on the PheWeb code base (https://r5.finngen.fi) and as summary statistic files (https://www.finngen.fi/en/access_results).

To put the frequency spectrum and putative mechanisms of action in an interpretable context, we chose a single most-significant association per signal by LD-based merging (lead variants with r² > 0.3 were merged), which resulted in 1,838 unique associations in 681 end points (Supplementary Table 10). Overall, 493 of the associations in 112 end points were PWS (P < 2.6 × 10⁻¹¹). Although most of the 493 PWS unique associations were driven by common variants, 143 and 97 had a lead variant frequency of <5% and <1%, respectively, in gnomAD NFSEE populations. We observed that 82 (57.3%) of the 143 low-frequency (MAF < 5%) lead variants were enriched by more than twofold in Finland compared with NFSEE populations. To estimate the number of putative new associations, we searched for known significant associations using the Open Targets API platform (GWAS Catalog and the UKBB) and ClinVar for each of the 1,838 associations. Among these, 864 (47%) were not associated with any phenotype in those databases (75 out of 493 (15%) of the stringent P < 2.6 × 10⁻¹¹ associations). The fraction of previously unreported associations among genome-wide significant (702 out of 841 (84%)) and stringent (69 out of 143 (48%)) associations was notably higher among low-frequency variants (MAF < 5% in NFSEE individuals).

After statistical fine-mapping of the 493 unique PWS associations, we identified a coding variant (PIP > 0.05) in 73 (14.8%) of the credible sets associated with 42 end points (Supplementary Table 10). Most (43) of the fine-mapped coding variants had PIP values of >0.5 and 28 had PIP values of >0.9 (Fig. 3a). The highest proportion and the majority (54 out of 73) of associated coding variants had NFSEE MAF < 10% (Fig. 3b,c). The coding variant associations were more enriched in Finland than noncoding associations in associations driven by variants with AFs of <5% in NFSEE people (Fig. 3d; Wilcoxon rank sum test P = 3.6 × 10⁻³). For example, we observed a coding variant in 42% (34 out of 89) of the associations with a lead variant that was enriched by more than two times in Finland compared with NFSEE people among low-frequency associations (NFSEE MAF < 5%). By contrast, the proportion of coding variants was lower at 21.7% (13 out of 60) in non-enriched associations (see Extended Data Fig. 4 for enrichment in various NFSEE MAF bins). The higher proportion of coding variants in those that were enriched by more than two times persisted when the PIP threshold was increased to 0.2 (enriched, 30 out of 77 (35.8%); non-enriched, 11 out of 58 (18.9%)).

**Fig. 3: Characteristics of unique associations in end points identified in FinnGen.**

The fine-mapping properties and replicability of 67 FinnGen traits across diverse biobanks (FinnGen, Biobank Japan and the UKBB) are explored in detail in another manuscript¹⁰, and functional variant associations in the UKBB and FinnGen are described in ref. ¹².

We next wanted to quantify the benefits of population isolates such as Finland in GWAS discovery. To this end, we assessed whether lower frequency (MAF < 5% in NFSEE people) variants enriched in the Finnish population were more likely to be associated with a phenotype than would be expected by chance. We drew 1,000,000 random samples, each matching the number of genome-wide significant variants observed (143), from a set of frequency-matched variants (MAF NFSEE < 5%) that were not associated with any end point (P > 0.001). None of the 1 million random draws had a higher proportion of variants enriched by more than twofold in the Finnish population than was observed in the significant associations (57.3% observed versus 33% expected; P < 1.0 × 10⁻⁶).
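The resampling scheme described above can be sketched as follows. The null pool here is synthetic (33% enriched variants, matching the expected proportion quoted in the text), the number of draws is reduced for speed, and the add-one estimator for the empirical P value is a common convention assumed for this sketch rather than taken from the paper.

```python
import random


def empirical_enrichment_p(n_enriched, n_variants, null_flags, n_draws=2000, seed=0):
    """Empirical P value for the observed proportion of enriched variants.

    null_flags: list of 0/1 flags, one per non-associated frequency-matched
                variant, where 1 means enriched more than twofold in Finland.
    Each draw samples `n_variants` flags without replacement and compares the
    enriched proportion with the observed one.
    """
    rng = random.Random(seed)
    observed = n_enriched / n_variants
    at_least_as_extreme = 0
    for _ in range(n_draws):
        draw = rng.sample(null_flags, n_variants)
        if sum(draw) / n_variants >= observed:
            at_least_as_extreme += 1
    # Add-one estimator: the P value can never be exactly zero.
    return (at_least_as_extreme + 1) / (n_draws + 1)


# Synthetic null pool with 33% enriched variants.
null_pool = [1] * 330 + [0] * 670
p_empirical = empirical_enrichment_p(82, 143, null_pool)
```

With this estimator, the smallest P value that a given number of draws can support is roughly one over that number, so 1,000,000 draws with no exceedances bound the P value at about 1 × 10⁻⁶.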

Known pathogenic variant associations

Among the genome-wide significant coding variant associations, we identified 13 variant associations (AF range of 0.04–2%) classified as pathogenic or likely pathogenic in ClinVar (Supplementary Table 10). Nine out of the 13 variants were enriched by more than 20-fold in Finland compared with NFSEE populations. Some of these variants have previously been primarily considered recessive. Here, however, we observed that some were risk variants in the heterozygous state. An example is a rare frameshift variant at NPHS1 associated with nephrotic syndrome, including the congenital form (ICD-10: N04; p.Leu41fs; AF FinnGen = 0.9%; gnomAD NFSEE = 0.009%; OR = 185, P = 4.3 × 10⁻²⁷). Congenital nephrotic syndrome in Finnish individuals is a recessively inherited rare disease, and is in the Finnish Disease Heritage database⁴. The pathogenic variant associations listed in ClinVar include a stop-gained variant in XPA (xeroderma pigmentosum) associated with non-melanoma neoplasm of skin (‘other malignant neoplasm of skin’) (p.Arg228Ter; AF FinnGen = 0.02%, gnomAD NFSEE = 0%; OR = 4.4, P = 8.3 × 10⁻¹⁸), and the abovementioned frameshift variant in PALB2 associated with breast cancer (p.Leu531fs, ‘malignant neoplasm of breast’; p.Ala82Pro; AF FinnGen = 0.2%, gnomAD NFSEE = 0%; OR = 28.8, P = 3.7 × 10⁻³³). Furthermore, a known pathogenic recessively acting missense variant in CERKL was associated with hereditary retinal dystrophy (p.Cys125Trp; AF FinnGen = 0.6%, gnomAD NFSEE = 0%; OR = 98,716, P = 5.15 × 10⁻²⁵). This association is, however, driven by compound heterozygotes, as previously detailed¹³. These associations demonstrate that imputation using a population-specific genotyping array and an imputation panel combined with national-registry-based phenotyping in the isolated Finnish population can successfully identify associations and fine-map causal variants even in rare variants and phenotypes.
An extended study of ClinVar variants and variants with specific biallelic Mendelian effects in FinnGen is provided in a companion paper¹³.

Associations in known disease genes

In the remaining 135 genome-wide significant coding variant associations not reported as pathogenic in ClinVar, 77 had NFSEE MAF values of <5%. Of the 77 variants, 54 were more than 5 times more common in Finland than in NFSEE populations, and 19 had not been previously observed in NFSEE people (Supplementary Table 2). Nine out of the 19 variants are in a gene in which other variants are pathogenic for various traits, 3 of which are for the same or related traits. These FinnGen associations include the following variants: an RFX6 frameshift variant associated with type 2 diabetes (p.His293LeufsTer7; AF = 0.15%, OR = 3.7, P = 1.2 × 10⁻¹⁰; ClinVar, ‘monogenic diabetes and others’); a TERT missense variant (AF = 0.15%, OR = 1,032, P = 6.5 × 10⁻²¹) associated with idiopathic pulmonary fibrosis (ClinVar, ‘idiopathic pulmonary fibrosis’); a missense variant in MYH14 associated with sensorineural hearing loss (p.Ala1156Ser; AF = 0.04%, OR = 19.9, P = 1 × 10⁻¹⁵; ClinVar, ‘non-syndromic hearing loss’ and others); and a stop-gained variant in TG associated with autoimmune hypothyroidism (p.Gln655Ter; AF = 0.1%, OR = 3.2, P = 3.9 × 10⁻¹¹). These variants in RFX6, TERT and TG have been previously observed in Finnish and Nordic cohorts¹⁴,¹⁵,¹⁶, but had uncertain significance (single carrier in TG) or conflicting interpretation (TERT) in ClinVar. Pathogenic variants in RFX6 cause Mitchell–Riley syndrome with recessive inheritance (characterized by neonatal diabetes). However, heterozygote enrichment of RFX6-truncating variants has been observed in maturity-onset diabetes of the young¹⁴, for which the same variant observed here was identified in a replication in Finnish data. RFX6 is a regulator of transcription factors involved in beta-cell maturation and has a specific role in releasing gastric inhibitory peptide (GIP) and GLP1 in response to meals.
Our results suggest that around 1:330 individuals in Finland carry a frameshift variant that has been previously shown to reduce incretin levels and to lead to isolated diabetes¹⁴. It is tempting to speculate that early administration of GLP1 analogues would benefit carriers of this diabetes-associated variant.

New disease associations

Among the previously undescribed genome-wide significant coding variant associations without previous associations in Open Targets (GWAS Catalog and the UKBB) or ClinVar, we observed 29 that had NFSEE MAF values of <5% and were more than 2 times more frequent in Finland, 9 of which had no copies in NFSEE populations (Supplementary Table 11). We summarize selected new discoveries and biological knowledge gained in Supplementary Table 12. A missense variant not observed outside Finland (p.Val70Phe; AF = 0.2%, OR = 3.0, P = 2.1 × 10⁻⁹) in PLTP was associated with coronary revascularization (n = 12,271 coronary angioplasty or bypass grafting). PLTP is a lipid-transfer protein in human plasma that transfers phospholipids from triglyceride-rich lipoproteins to high-density lipoprotein, and its activity is associated with atherogenesis in humans and mice¹⁷. Noncoding variations near PLTP independent of p.Val70Phe are associated with lipid levels (high-density lipoprotein and triglycerides)¹⁸ and coronary artery disease¹⁹. The identification of a coding variant in this gene provides support for PLTP as the causal gene for symptomatic atherosclerosis in this locus. Other variants associated with coronary artery disease included a missense variant (p.Gly567Arg; AF = 0.9%, OR = 2.0, P = 5.2 × 10⁻¹²) in HHIPL1, which was associated with coronary revascularization (n = 12,271), and a splice acceptor variant (c.7325-2A>G; AF = 0.7%, OR = 2.5, P = 2.9 × 10⁻⁸) in NBEAL1, which was associated with coronary artery bypass grafting (n = 5,779). Both genes are susceptibility loci for coronary artery disease¹⁹ and have been suggested as causal, although for NBEAL1 the evidence is inconsistent²⁰. HHIPL1 encodes a secreted sonic hedgehog regulator that modulates atherosclerosis-relevant smooth muscle cell phenotypes and promotes atherosclerosis in mice²¹.
NBEAL1 regulates cholesterol metabolism by modulating low-density lipoprotein (LDL) receptor expression, and genetic variants in NBEAL1 are associated with decreased expression of NBEAL1 in arteries²². Our results strengthen the evidence that both these genes are causal in the loci.

A missense variant in LAG3 (p.Pro67Thr; AF = 0.08%, gnomAD NFSEE = 0%) was associated with autoimmune hypothyroidism (n = 22,997, OR = 3.2, P = 4.6 × 10⁻⁸, lead variant P = 4.57 × 10⁻⁸). LAG3 encodes an immune checkpoint protein that is involved in inhibitory signalling of immune response, especially in T cells²³. LAG3 has been a target of active immune checkpoint inhibitor cancer immunotherapy development. One such immunotherapy was recently approved by the US Food and Drug Administration as a combination treatment for unresectable or metastatic melanoma²⁴. Immune checkpoint inhibition therapies aim to enhance immune responses against tumour cells. Excessive immune responses, however, can exert deleterious effects on healthy tissue and lead to autoimmune disease. A common side effect of immune checkpoint inhibitors, including those that target LAG3, is hypothyroidism. The p.Pro67Thr variant could be acting as an inhibitor of LAG3 immunoregulatory activity, which in turn leads to susceptibility to hypothyroidism. In a PheWAS of p.Pro67Thr, we observed a nominally increased risk for other immune-related conditions (for example, psoriatic arthropathies (M13_PSORIARTH_ICD10) n = 1,455, OR = 7.8, P = 3.3 × 10⁻³; urticaria and erythema (L12_URTICARIAERYTHEMA), n = 6,328, OR = 3.7, P = 2.7 × 10⁻⁴; and streptococcal septicaemia (AB1_STREPTO_SEPSIS), n = 1,090, OR = 15, P = 2.2 × 10⁻³), but we did not observe protective effects with any cancers. It should be noted, however, that owing to the rarity of the variant, the data were not sufficiently powered to detect more subtle effects.

We found a missense variant (p.Tyr212Phe, rs35937944) in COLGALT2 that was enriched by >20-fold in the Finnish population. This variant was associated with a reduced risk for arthrosis (OR = 0.79, P = 2.57 × 10⁻¹⁰), coxarthrosis (OR = 0.68, P = 1.34 × 10⁻¹⁹) and gonarthrosis (OR = 0.80, P = 7.5 × 10⁻⁷). A noncoding variant near COLGALT2 has recently been described as a GWAS locus for osteoarthritis²⁵. COLGALT2 encodes the procollagen galactosyltransferase 2, which initiates post-translational modification of collagens by transferring β-galactose to hydroxylysine residues, an important step to ensure structure and function of bone and connective tissue. Modulating COLGALT2 enzymatic activity with drugs could be a potential strategy to reduce arthritis risk.

CD63 is a cell surface protein involved in basophil activation and mast cell degranulation. We identified a missense variant in CD63 (rs148781286) that was enriched by >42-fold in the Finnish population. This variant was associated with childhood asthma (OR = 3.5, P = 3.37 × 10⁻⁹). In a combined analysis with data from the EstBB and the UKBB, this variant was also associated with atopic dermatitis²⁶. Mediators secreted by basophils and mast cells correlate with asthma severity in the clinic, and a CD63-based basophil activation test has been reported to predict asthma outcome in young children with wheezing episodes²⁷. The observation of a putative causal relationship between genetic variations in CD63, basophil activation and childhood asthma risk and severity may point to a new intervention point for targeted asthma therapies.

A missense variant in TUBA1C (p.Ala331Val; AF = 0.2%, OR = 35.2, P = 1.4 × 10⁻¹⁰) was associated with sudden idiopathic hearing loss (n = 1,491). No relevant phenotype has previously been reported for variants in TUBA1C. TUBA1C encodes an α-tubulin isotype. The precise roles of α-tubulin isotypes are unknown, but mutations in other tubulins can cause various neurodevelopmental disorders²⁸. The p.Ala331Val variant was also associated with vestibular neuritis (inflammation of the vestibular nerve; n = 1,224, OR = 40.9, P = 3.2 × 10⁻¹⁰). Pure vestibular neuritis presents acutely with vertigo but not hearing loss; however, accurate diagnosis of vertigo in acute settings is challenging, and misdiagnosis is possible.

A >30-fold-enriched missense variant, p.Thr155Met (rs145955907), in ZAP70 was associated with sarcoidosis (OR = 2.05, P = 1.03 × 10⁻⁸). Previously, homozygote or compound heterozygote mutations in ZAP70 have been described in cell-mediated combined immunodeficiency caused by abnormal T cell receptor signalling²⁹. Heterozygous variants in ZAP70 have not been associated with any disease so far. Given the crucial role of ZAP70 in T cell signalling, its association with sarcoidosis is in line with its key role in immunity.

A 75-fold-enriched missense variant, p.Ala777Thr (rs199680517), in PPP1R26 was associated with endometriosis (OR = 1.97, P = 3.41 × 10⁻⁸). PPP1R26 (protein phosphatase 1 regulatory subunit 26) has been associated with tumour formation and has been observed to be upregulated in various malignancies. Cellular GWAS analyses have identified one variant to be associated with carboplatin-induced toxicity³⁰. In one study, a copy number variant has been associated with endometriosis, but how this gene contributes to endometriosis susceptibility remains speculative³¹.

We also report several of these coding associations in separate manuscripts. One such new observation is a missense variant (p.Arg20Gln; AF = 3%, gnomAD NFSEE = 0.7%) in SPDL1 with a pleiotropic association. It is associated with a strongly increased risk of idiopathic pulmonary fibrosis (OR = 3.1, P = 1.0 × 10⁻¹⁵) but is protective against an end point that combines all cancers (OR = 0.82, P = 2.1 × 10⁻¹⁵)³². Other associations between variants and disease described in separate manuscripts include the following: an inframe insertion in MFGE8 and coronary atherosclerosis (p.Asn239dup; AF = 2.9%, gnomAD NFSEE = 0%, OR = 0.74, P = 5.4 × 10⁻¹⁵)³³; a frameshift variant in MEPE (p.Lys101IlefsTer26; AF = 0.3%, gnomAD NFSEE = 0.07%, OR = 18.9, P = 1.5 × 10⁻¹¹) and otosclerosis³⁴; and a missense variant in ANGPTL7 (p.Arg220Cys; AF = 4.2%, gnomAD NFSEE = 0.06%, OR = 0.7, P = 7.2 × 10⁻¹⁶) and glaucoma³⁵.

Coding variants associated with drug use

A notable registry available in FinnGen is a prescription medication purchase registry (KELA; Supplementary Table 1), which links all prescription medication purchases for all FinnGen participants since 1995. Using prescription records from this registry, we identified two enriched low-frequency coding variants that were associated with drug purchase of statin medications (three or more purchases per individual) (Supplementary Table 11). A missense variant in TM6SF2 (p.Leu156Pro, rs187429064) was associated with a decreased likelihood of being prescribed statins (AF = 5.2%, gnomAD NFSEE = 1.2%; OR = 0.86, P = 3.8 × 10⁻¹³) but with an increased likelihood for insulin medication for diabetes (OR = 1.17, P = 8.2 × 10⁻¹¹) and type 2 diabetes (OR = 1.15, P = 2.6 × 10⁻⁸). In addition, the same variant was associated with a strongly increased risk of hepatocellular carcinoma (ICD-10 C22 ‘hepatic and bile duct cancer’; OR = 3.7, P = 5.9 × 10⁻¹⁰). The hepatic and bile duct cancer association did not change after conditioning on statin medication (OR = 3.7, P = 7.1 × 10⁻¹⁰). Consistent with a decrease in the likelihood of being prescribed statins, TM6SF2 p.Leu156Pro and another independent (r² = 0.003) missense variant (p.Glu167Lys, rs58542926) have previously been associated with decreased LDL and total cholesterol levels³⁶. In a mouse model, both p.Glu167Lys and p.Leu156Pro lead to increased protein turnover and reduced cellular TM6SF2 levels³⁷. TM6SF2 p.Glu167Lys leads to decreases in hepatic secretion of large very-low-density lipoprotein particles and increases in intracellular lipid accumulation³⁸. These effects probably explain its associations with non-alcoholic fatty liver disease³⁹, alcohol-related cirrhosis⁴⁰, hepatocellular carcinoma⁴¹ and incident type 2 diabetes⁴².
Our results provide, in a single PheWAS analysis, strong evidence of a previously unknown p.Leu156Pro variant that has similar consequences of decreasing circulating lipid levels and increasing the risk of diabetes, cirrhosis and liver cancer, as observed for p.Glu167Lys. Such pleiotropy of the variant can be explored in the custom PheWeb browser (http://r5.finngen.fi/variant/19-19269704-A-G).

Conclusions

In this paper and accompanying publications, we present FinnGen, one of the largest nationwide genetic studies with access to comprehensive electronic health register data of all participants. The final aim of the study is to collect data for 500,000 biobank participants by the end of 2023. The interim releases of FinnGen have already contributed to many new discoveries and insights into human genetic variation and how it affects disease and health³⁵,⁴³,⁴⁴,⁴⁵,⁴⁶,⁴⁷, including contributions to the COVID-19 host genetics initiative⁴⁸ and the global biobank meta-analysis initiative⁴⁹. Summary statistics from each data release will be made publicly available after a 1-year embargo period, and all summary statistics described here are freely available at www.finngen.fi/en/access_results.

An important feature of FinnGen compared with other similar projects, such as the UKBB⁶, is the specific genetic makeup of the Finnish population. In the GWAS of selected, well-studied diseases, we were able to identify several new associations with a fraction of the cases compared with the largest published GWAS. These associations were largely observed with variants that were increased in frequency in the Finnish population bottleneck and would have required prohibitively large sample sizes in older, non-bottlenecked populations (Fig. 2d).

Moreover, in the GWAS of 1,932 end points, we observed that variants in the Finnish population that were enriched by more than twofold were 1.7-times more likely to be associated with a phenotype than would be expected by chance.

Furthermore, we observed that putative coding variant associations were not only of lower AF but also more often enriched in Finland than noncoding variant associations (Fig. 3). This observation is expected, as coding variant associations are more deleterious on average and selection drives the AFs down. However, some of these deleterious alleles survived the bottleneck and increased in frequency, which facilitated the identification of their associations with diseases.

Imputation with a population-specific imputation panel provides high imputation accuracy down to very low AFs (Supplementary Fig. 5), which enabled the identification of associations with low-frequency variants using a GWAS approach instead of direct sequencing. This high imputation accuracy combined with broad population registry-based phenotyping facilitates the identification of very low-frequency variants associated with rare phenotypes, which have largely been missed in the majority of GWASs published so far⁵⁰. We demonstrated this by identifying known ClinVar variant associations with diseases such as congenital nephrotic syndrome or polycystic liver disease, which are both registered in the Finnish Disease Heritage database. Furthermore, we uncovered new low-frequency variant associations with common and rare phenotypes, including clinically challenging but genetically understudied conditions such as sudden idiopathic hearing loss or carpal tunnel syndrome. The recently reported³⁵ Gln175His variant in ANGPTL7, which is enriched in the Finnish population and is protective against glaucoma, is also an example of the benefit of the bottleneck effect in the discovery of disease-associated variants.

The university-hospital-based recruitment, together with legacy case cohorts of several diseases, is another feature of FinnGen. This strategy captures cases in many disease areas and distinguishes it from many working-age population cohorts. For example, in the UKBB, in which recruitment was based on postal invitation to individuals aged 40–69 years and living within 40 km (25 miles) of one of the assessment centres⁵¹, the participants are likely to be healthier than in hospital-based collections. The approach in FinnGen has advantages and disadvantages. For many disease-focused studies, it provides a higher number of cases and a relatively economical way of recruiting a large sample within a feasible time frame. For example, in the 15 common diseases studied in this paper, the sample prevalence in FinnGen was higher than in the UKBB. The difference was the most extreme for Alzheimer’s disease (2.7% in FinnGen compared with 0.2% in UKBB), a disease of old age, and the most similar in asthma (9.4% in FinnGen compared with 7.4% in the UKBB) (Fig. 2a). FinnGen also has a relatively high sample prevalence of severe mental disorders such as schizophrenia (2.5%, n = 5,562) and bipolar disease (2.1%, n = 4,501), which are often underrepresented in biobank studies. A key aspect of the recruitment strategy for the Finnish biobank is that legislation enables participants to donate samples with broad consent to medical research in general. This makes recruitment cost-effective, as the same samples and data can be used, after appropriate application steps, for many medical research studies. However, owing to the recruitment strategy, FinnGen is not epidemiologically representative, and some disease prevalence estimates might be over- or underestimated in FinnGen compared with population values (for example, asthma is 10.4% in FinnGen, 7.7% in FinRegistry, and type 2 diabetes is 14.5% in FinnGen, 8.2% in FinRegistry (https://www.finregistry.fi/)).
The recruitment strategies for FinnGen are not anticipated to cause significant biases to the GWAS results presented here, but would be an aspect to consider, for example, when studying disease progression or building predictive models. We further explored the benefit of the FinnGen approach and showed that data from FinnGen has greater discovery power than data from the UKBB in a matched sample size scenario for 14 common diseases (Supplementary Fig. 11).

In conclusion, FinnGen as a large-scale biobank resource with specific features of the Nordic healthcare system and population structure provides opportunities for a wide range of genetic discoveries. These include identification of disease-associated coding variants, identification of variant pleiotropy and longitudinal analyses of disease trajectories. Combining results with other large-scale biobank projects can further improve our understanding of the role of genetic variation in health and disease, especially in genetically understudied diseases.

Methods

Biobank samples

The FinnGen study (https://www.finngen.fi/en) is an ongoing research project that utilizes samples from a nationwide network of Finnish biobanks and digital healthcare data from national health registers. FinnGen aims to produce genomic data with linkage to health register data of 500,000 biobank participants. Samples in the FinnGen study include legacy samples (prospected number 200,000) from previous research cohorts (often disease-specific) that have been transferred to the Finnish biobanks, and prospective samples (prospected number 300,000) collected by biobanks across Finland. Prospective samples from six regional hospital biobanks represent a wide variety of patients enrolled in specialized health care, samples from a private healthcare biobank enable enrichment of the FinnGen cohort with patients underrepresented in specialized health care, whereas participants recruited through the Blood Service Biobank enrich the cohort with healthier individuals. Samples have not specifically been collected for FinnGen, but the study has incorporated all that have been available in the biobanks (see Supplementary Methods for details). In the current study, we included samples from 224,737 biobank participants.

Phenotyping

Registry data on all FinnGen participants were collected and processed from the following different national health registers: hospital and outpatient visits in HILMO, a care register for health care (in-patient and outpatient primary and secondary diagnoses: ICD-8, ICD-9 and ICD-10; operations: NOMESCO Classification of Surgical Procedures and Hospital League surgical procedure codes); AvoHILMO, a register of primary health care (main and secondary diagnosis using ICD-10 and ICPC2 codes, operations and procedures using NOMESCO and national SPAT codes); Cause of Death (immediate, underlying and contributing causes of death on the death certificate with ICD-8, ICD-9 and ICD-10 codes); reimbursed medication entitlements and prescribed medicine purchases (specific Social Insurance Institution of Finland reimbursement codes and ATC codes, respectively); and the Finnish Cancer Registry (using ICD-O-3 codes). Pseudonymized register data were combined with the minimum phenotype dataset from the Finnish biobanks (age, sex, year of sampling, height, weight and smoking status). Clinical end points were constructed from the register codes using the Finnish version of the International Classification of Diseases, 10th revision (ICD-10) diagnosis codes and harmonizing those with definitions from ICD-8 and ICD-9. The Finnish ICD version is mostly identical to the international ICD classification, but has minor modifications. For example, there are additions to certain disease classifications in the fourth and fifth character level to add specificity. When relevant, the information on reimbursed medication and/or prescription medicine purchases and operations augmented the end point data. Cancer end points were constructed on the basis of the Finnish Cancer Registry and Cause of Death data. 
The definitions of FinnGen disease end points and their respective controls for each release are available at https://www.finngen.fi/en/researchers/clinical-endpoints, and FinnGen end points can also be browsed at https://r5.risteys.finngen.fi/. See Supplementary Methods, section 1 for further details.

Some of the end points have a high number of overlapping cases. Therefore, to avoid reporting highly repetitive end points, we clustered all end points if there was an overlap of >50% of cases between them and chose the one with the most genome-wide significant hits. On a few occasions, a manual choice was made to select the most representative end point among the correlating end points. After clustering, we had 1,932 end points for the main GWAS analysis.
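The clustering rule above can be sketched as follows. This is a minimal illustration and not the FinnGen implementation: the text does not state how the overlap fraction is normalized or how ties are broken, so the sketch assumes that overlap is measured relative to the smaller end point, that clusters are formed by single linkage, and that ties go to the alphabetically first name. The function name, end point names and case identifiers are hypothetical.

```python
from itertools import combinations


def cluster_endpoints(cases, n_hits, min_overlap=0.5):
    """Group end points with overlapping cases; return one representative per group.

    cases:  dict mapping end point name -> set of case identifiers
    n_hits: dict mapping end point name -> number of genome-wide significant hits
    """
    parent = {name: name for name in cases}

    def find(name):
        # Follow parent links to the cluster root, shortening the path on the way.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(cases, 2):
        smaller = min(len(cases[a]), len(cases[b]))
        if smaller == 0:
            continue
        # Assumption: overlap is measured relative to the smaller end point.
        overlap = len(cases[a] & cases[b]) / smaller
        if overlap > min_overlap:
            parent[find(a)] = find(b)  # single-linkage merge

    clusters = {}
    for name in cases:
        clusters.setdefault(find(name), []).append(name)

    # Keep the end point with the most hits in each cluster.
    return sorted(
        min(members, key=lambda m: (-n_hits[m], m)) for members in clusters.values()
    )


# Hypothetical end points and case identifiers, for illustration only.
example_cases = {
    "T2D": {1, 2, 3, 4},
    "T2D_WIDE": {1, 2, 3, 4, 5, 6},
    "ASTHMA": {7, 8, 9},
}
example_hits = {"T2D": 10, "T2D_WIDE": 12, "ASTHMA": 3}
representatives = cluster_endpoints(example_cases, example_hits)
```

In this toy input the two diabetes end points share all cases of the smaller one, so they collapse into a single cluster represented by the end point with more hits.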

Genotyping and QC

Samples were genotyped with Illumina (Illumina) and Affymetrix arrays (Thermo Fisher Scientific). Genotype calls were made with GenCall and zCall algorithms for Illumina and the AxiomGT1 algorithm for Affymetrix data. Chip genotyping data produced with previous chip platforms and reference genome builds were lifted over to build v.38 (GRCh38/hg38) following a previously described protocol⁵². In sample-wise QC, individuals with genetically inferred sex not matching the reported sex in registries, high genotype missingness (>5%) or excess heterozygosity (±4 standard deviations) were removed. In variant-wise QC, variants with high missingness (>2%), deviation from Hardy–Weinberg equilibrium (P < 1 × 10⁻⁶) or minor allele count < 3 were removed. Chip-genotyped samples were pre-phased with Eagle v.2.3.5 (https://data.broadinstitute.org/alkesgroup/Eagle/) using default parameters, except the number of conditioning haplotypes was set to 20,000.
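The sample-wise and variant-wise thresholds above can be expressed as simple filters. The sketch below is illustrative only: the record fields (`missing_rate`, `het_rate`, `hwe_p`, `mac` and so on) are hypothetical names, and it assumes that the heterozygosity mean and standard deviation are computed over all samples before any other filter is applied, which the text does not specify.

```python
from statistics import mean, stdev


def passing_samples(samples, max_missing=0.05, het_sd=4.0):
    """Return IDs of samples that pass the three sample-wise QC filters."""
    het_rates = [s["het_rate"] for s in samples]
    # Assumption: heterozygosity outliers are defined against all input samples.
    mu, sd = mean(het_rates), stdev(het_rates)
    keep = []
    for s in samples:
        if s["inferred_sex"] != s["reported_sex"]:
            continue  # genetic sex does not match registry sex
        if s["missing_rate"] > max_missing:
            continue  # genotype missingness above 5%
        if abs(s["het_rate"] - mu) > het_sd * sd:
            continue  # heterozygosity beyond +/- 4 standard deviations
        keep.append(s["id"])
    return keep


def passing_variants(variants, max_missing=0.02, min_hwe_p=1e-6, min_mac=3):
    """Return IDs of variants that pass the three variant-wise QC filters."""
    return [
        v["id"]
        for v in variants
        if v["missing_rate"] <= max_missing
        and v["hwe_p"] >= min_hwe_p
        and v["mac"] >= min_mac
    ]


# Hypothetical records, for illustration only.
example_samples = [
    {"id": "S1", "inferred_sex": "F", "reported_sex": "F", "missing_rate": 0.01, "het_rate": 0.30},
    {"id": "S2", "inferred_sex": "M", "reported_sex": "F", "missing_rate": 0.01, "het_rate": 0.31},
    {"id": "S3", "inferred_sex": "M", "reported_sex": "M", "missing_rate": 0.08, "het_rate": 0.29},
    {"id": "S4", "inferred_sex": "F", "reported_sex": "F", "missing_rate": 0.02, "het_rate": 0.30},
]
example_variants = [
    {"id": "V1", "missing_rate": 0.01, "hwe_p": 0.5, "mac": 100},
    {"id": "V2", "missing_rate": 0.03, "hwe_p": 0.5, "mac": 100},
    {"id": "V3", "missing_rate": 0.01, "hwe_p": 1e-8, "mac": 100},
    {"id": "V4", "missing_rate": 0.01, "hwe_p": 0.5, "mac": 2},
]
kept_samples = passing_samples(example_samples)
kept_variants = passing_variants(example_variants)
```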

Genotype imputation with a population-specific reference panel

The population-specific Sequencing Initiative Suomi (SISu) v.3 imputation reference panel was developed by using high-coverage (25–30 times) whole-genome sequencing data for 3,775 Finnish individuals. In brief, the variant call set was produced using the GATK HaplotypeCaller algorithm by following GATK best practices for variant calling. Genotype-wise, sample-wise and variant-wise QC was performed using the Hail framework (https://github.com/hail-is/hail) v.0.1, and the resulting high-quality whole-genome sequencing data were phased (Supplementary Methods). Genotype imputation was carried out using the SISu v.3 reference panel with Beagle 4.1 (v.08Jun17.d8b, https://faculty.washington.edu/browning/beagle/b4_1.html) as described in a previous protocol⁵³. Post-imputation QC involved non-reference concordance analyses, checking expected conformity of the imputation INFO values distribution, MAF differences between the target dataset and the imputation reference panel, and checking chromosomal continuity of the imputed genotype calls. After these steps, variants with imputation INFO scores of <0.6 or MAF values of <0.0001 were excluded.

Association analysis and fine-mapping

The mixed-model logistic regression method SAIGE (v.0.35.8.8)⁵⁴ was used for association analysis. We used sex, age, genotyping batch and ten PCs as covariates (see Supplementary Methods for details). We used SuSiE⁵⁵ for fine-mapping. We fine-mapped all regions with variants that had values of P < 1 × 10⁻⁶ and extended regions 1.5 Mb upstream and downstream from each lead variant. Finally, overlapping regions were merged and subjected to fine-mapping. The major histocompatibility complex region (chromosome 6: 25–36 Mb) was excluded owing to its complex LD structure. We allowed up to ten independent signals per region, and SuSiE reports a 95% credible set for each independent signal. As LD, we used in-sample dosages (that is, cases and controls used for each phenotype) computed with LDStore2. The FinnGen fine-mapping pipeline is available in GitHub (https://github.com/FINNGEN/finemapping-pipeline).
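The construction of fine-mapping regions can be sketched as an interval-merging step. This is a simplified illustration rather than the code in the FinnGen pipeline: it centres a window on every variant below the P value threshold instead of first selecting lead variants, and it assumes that hits falling inside the excluded major histocompatibility complex interval are simply dropped. The function and constant names are hypothetical.

```python
MHC = ("6", 25_000_000, 36_000_000)  # excluded region, chromosome 6: 25-36 Mb


def finemapping_regions(hits, p_threshold=1e-6, window=1_500_000):
    """Build merged fine-mapping regions from association hits.

    hits: iterable of (chrom, pos, p) tuples, with chrom given as a string.
    Returns a sorted list of (chrom, start, end) tuples.
    """
    intervals = []
    for chrom, pos, p in hits:
        if p >= p_threshold:
            continue
        if chrom == MHC[0] and MHC[1] <= pos <= MHC[2]:
            continue  # assumption: hits inside the MHC are skipped entirely
        intervals.append((chrom, max(0, pos - window), pos + window))

    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps the previous region on the same chromosome: extend it.
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged


# Hypothetical hits, for illustration only.
example_hits = [
    ("1", 10_000_000, 1e-9),
    ("1", 12_000_000, 5e-7),
    ("1", 50_000_000, 1e-7),
    ("1", 60_000_000, 1e-3),
    ("6", 30_000_000, 1e-20),
]
regions = finemapping_regions(example_hits)
```

The first two hits lie within 3 Mb of each other, so their windows overlap and are merged into one region before fine-mapping.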

To define independent signals within a locus, we utilized fine-mapping results. For each locus, we report the credible set as an independent hit if it represents a primary strongest signal with lead P < 5 × 10⁻⁸. For secondary hits, we required genome-wide significance and log Bayes factor (BF) > 2. The BF filtering was necessary because SuSiE sometimes reports multiple credible sets for a single strong signal but this is indicated in SuSiE as a low BF (the model does not improve by adding another signal in the region that is not an independent signal).
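The selection rule for independent hits within one locus can be written as a short filter. The sketch assumes that each credible set is summarized by its lead P value and its log Bayes factor, and the dictionary keys and function name are hypothetical; the base of the logarithm is taken as reported by the fine-mapping output.

```python
GENOME_WIDE_P = 5e-8


def independent_hits(credible_sets, min_log_bf=2.0):
    """Select credible sets reported as independent hits for one locus.

    credible_sets: list of dicts with keys 'id', 'lead_p' and 'log_bf'.
    """
    if not credible_sets:
        return []
    ranked = sorted(credible_sets, key=lambda cs: cs["lead_p"])
    primary, secondary = ranked[0], ranked[1:]
    if primary["lead_p"] >= GENOME_WIDE_P:
        return []  # no genome-wide significant primary signal in this locus
    hits = [primary["id"]]
    # Secondary signals need genome-wide significance and a log BF above 2.
    hits += [
        cs["id"]
        for cs in secondary
        if cs["lead_p"] < GENOME_WIDE_P and cs["log_bf"] > min_log_bf
    ]
    return hits


# Hypothetical credible sets, for illustration only.
example_locus = [
    {"id": "cs1", "lead_p": 1e-30, "log_bf": 50.0},
    {"id": "cs2", "lead_p": 1e-12, "log_bf": 0.8},  # significant, but low BF
    {"id": "cs3", "lead_p": 3e-9, "log_bf": 6.1},
    {"id": "cs4", "lead_p": 2e-7, "log_bf": 4.0},  # not genome-wide significant
]
selected = independent_hits(example_locus)
```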

Browser development

The https://r5.finngen.fi browser was developed based on the PheWeb⁵⁶ codebase.

Estimation of expected number of enriched variant associations

We aimed to estimate whether, in the lower frequency range (NFSEE MAF < 5%), we observed more variant associations that were enriched by more than twofold in the Finnish population than would be expected by chance. To this end, we sampled a subset of variants (NFSEE MAF < 5%) that were not associated with any end point in FinnGen (P > 0.001). We drew 1 million random samples of the number of independent hits (143) observed in a GWAS from the set of non-associated variants. To closely follow the observed frequency distribution, we further matched the random samples to contain the same number of variants in each frequency bin ((0,0.001], (0.001,0.005], (0.005,0.01] and then in 0.01 bins up to 0.05). We computed the mean and standard deviations of per cent twofold enriched variants from the random samples and calculated P values from the normal distribution using the randomized mean and standard deviation.
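The frequency-matched resampling can be sketched as below. This is a small-scale illustration under stated assumptions: the input format (pairs of NFSEE MAF and an enrichment flag), the function names and the toy data are hypothetical, the number of draws is reduced from 1 million, and the P value is taken as the one-sided upper tail of the normal distribution, which the text does not state explicitly.

```python
import random
from bisect import bisect_left
from collections import Counter
from math import erfc, sqrt
from statistics import mean, stdev

# Upper edges of the bins (0,0.001], (0.001,0.005], (0.005,0.01], (0.01,0.02], ...
BIN_UPPER_EDGES = [0.001, 0.005, 0.01, 0.02, 0.03, 0.04, 0.05]


def freq_bin(maf):
    # bisect_left assigns a value equal to an edge to the bin that edge closes.
    return bisect_left(BIN_UPPER_EDGES, maf)


def enrichment_test(hits, null_variants, n_draws=2000, seed=0):
    """Compare the enriched fraction among hits with frequency-matched random draws.

    hits, null_variants: lists of (nfsee_maf, is_enriched) tuples.
    Returns (observed fraction, mean fraction under the null, P value).
    """
    rng = random.Random(seed)
    observed = sum(enriched for _, enriched in hits) / len(hits)
    per_bin = Counter(freq_bin(maf) for maf, _ in hits)

    pool = {}
    for maf, enriched in null_variants:
        pool.setdefault(freq_bin(maf), []).append(enriched)

    fractions = []
    for _ in range(n_draws):
        # Draw, without replacement, as many null variants per bin as there are hits.
        n_enriched = sum(sum(rng.sample(pool[b], k)) for b, k in per_bin.items())
        fractions.append(n_enriched / len(hits))

    mu, sd = mean(fractions), stdev(fractions)
    z = (observed - mu) / sd
    p_value = 0.5 * erfc(z / sqrt(2))  # assumption: one-sided upper-tail test
    return observed, mu, p_value


# Toy data: 10 hits (8 enriched) against a null set in which 30% are enriched.
toy_hits = [(0.003, True)] * 8 + [(0.003, False)] * 2
toy_null = [(0.003, True)] * 60 + [(0.003, False)] * 140
obs_frac, null_frac, p_val = enrichment_test(toy_hits, toy_null)
```

Using the normal approximation rather than the empirical rank of the observed value allows P values smaller than one over the number of draws to be reported.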

EstBB and UKBB replication

The EstBB is a population-based biobank at the Institute of Genomics, University of Tartu. The current cohort size is 200,000 individuals (aged ≥18 years), reflecting the age, sex and geographical distribution of the adult Estonian population. Overall, 83% of the samples are from Estonian individuals, 14% from Russian people and 3% from other ethnicities. All participants were recruited by general practitioners, physicians in hospitals and during promotional events. After recruitment, all participants completed a questionnaire about their health status, lifestyle and diet. Specifically, the questionnaire included personal data (place of birth, place(s) of living, nationality, among others), genealogical data (family history of medical conditions spanning four generations), educational and occupational history, and lifestyle data (physical activity, dietary habits (food frequency questionnaires), smoking status, alcohol consumption, women’s health and quality of life). The EstBB database is linked with national registries (such as the Cancer Registry and Causes of Death Registry), hospital databases and the database of the national health insurance fund, which holds treatment and procedure service bills. Diseases and health problems are recorded as ICD-10 codes and prescribed medicine according to the ATC classification. These health data are continuously updated through periodical linking to national electronic databases and registries. All participants were genotyped with genome-wide chip arrays and further imputed with a population-specific imputation panel consisting of 2,244 high-coverage (30 times) whole-genome sequence data from individuals and 16,271,975 high-quality variants⁵⁷. Researchers at the EstBB ran an association analysis of the 15 phenotypes (Supplementary Table 8) used in this study in 136,724 individuals. The association analysis was conducted with SAIGE⁵⁴ mixed models with age, sex and ten PCs used as covariates.

We used the Pan UKBB (https://pan.ukbb.broadinstitute.org/) project European subset association analysis summary statistics in the UKBB replication⁵⁸ (Supplementary Table 7).

As both the EstBB and the UKBB are on human genome build 37, we lifted over the coordinates to build 38 to match FinnGen. Variants were then matched on the basis of chromosome, position, reference and alternative alleles.

Inverse variance weighted meta-analysis was used to perform a meta-analysis on the three cohorts (code available at https://github.com/FINNGEN/META_ANALYSIS).
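The matching and meta-analysis steps can be illustrated with a short sketch. It is not the code in the linked repository: it assumes that coordinates have already been lifted over to build 38, that effect sizes refer to the same alternative allele in every cohort, and that only variants present in every cohort are combined. All function names and the toy effect sizes are hypothetical.

```python
from math import erfc, sqrt


def variant_key(chrom, pos, ref, alt):
    """Normalize a variant to a (chromosome, position, ref, alt) matching key."""
    chrom = str(chrom)
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]
    return (chrom, int(pos), ref.upper(), alt.upper())


def ivw_meta(estimates):
    """Fixed-effect inverse variance weighted meta-analysis of (beta, se) pairs."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    se = sqrt(1.0 / total)
    p_value = erfc(abs(beta / se) / sqrt(2))  # two-sided normal P value
    return beta, se, p_value


def meta_analyse(*cohorts):
    """Meta-analyse variants shared by all cohorts.

    cohorts: dicts mapping variant_key(...) -> (beta, se).
    """
    shared = set(cohorts[0]).intersection(*cohorts[1:])
    return {key: ivw_meta([cohort[key] for cohort in cohorts]) for key in shared}


# Toy summary statistics, for illustration only.
cohort_a = {
    variant_key("chr19", 19269704, "A", "G"): (0.2, 0.1),
    variant_key("1", 100, "C", "T"): (0.1, 0.05),
}
cohort_b = {variant_key("19", "19269704", "a", "g"): (0.4, 0.1)}
combined = meta_analyse(cohort_a, cohort_b)
```

With equal standard errors the combined effect is the simple average of the two estimates, and the combined standard error shrinks by a factor of the square root of two.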

Variant annotation

We utilized Variant Effect Predictor (https://www.ensembl.org/info/docs/tools/vep/index.html) for annotating imputation panel variants. For coding variants, we chose a single most-severe consequence and corresponding gene among canonical transcripts. We considered stop gained, frameshift variant, splice donor, splice acceptor, missense variant, start lost, stop lost, inframe insertion and inframe deletion as coding variants. We executed the variant annotation using Hail⁵⁹.
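The selection of a single most-severe coding consequence can be sketched as follows. This is a plain-Python illustration, not the Hail code used in the study. The severity ranking is an assumption that follows the relative order in which Ensembl lists these consequence terms, and the `TranscriptConsequence` record and `most_severe_coding` function are hypothetical.

```python
from typing import Iterable, NamedTuple, Optional

# Assumed ranking (most severe first), restricted to the coding categories
# listed in the text.
SEVERITY_ORDER = [
    "splice_acceptor_variant",
    "splice_donor_variant",
    "stop_gained",
    "frameshift_variant",
    "stop_lost",
    "start_lost",
    "inframe_insertion",
    "inframe_deletion",
    "missense_variant",
]
RANK = {term: i for i, term in enumerate(SEVERITY_ORDER)}


class TranscriptConsequence(NamedTuple):
    gene: str
    consequence: str
    canonical: bool


def most_severe_coding(
    annotations: Iterable[TranscriptConsequence],
) -> Optional[TranscriptConsequence]:
    """Pick the most severe coding consequence among canonical transcripts.

    Returns None when no canonical transcript carries a coding consequence.
    """
    candidates = [
        a for a in annotations if a.canonical and a.consequence in RANK
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda a: RANK[a.consequence])
```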

Colocalization

We applied colocalization to all fine-mapped regions. As a colocalization approach, we used the probabilistic model for integrating GWAS and eQTL data presented in eCAVIAR⁶⁰. Given the PIP values of each phenotype in a region of interest, we calculated the colocalization posterior probability (CLPP). In contrast to eCAVIAR, we used SuSiE⁵⁵ to estimate the posterior inclusion probabilities.

For a pair of phenotypes, we searched for variants shared between their credible sets CS_k, k = 1, …, K, and computed the CLPP as follows:

$${\rm{CLPP}}_{k}=\sum_{i\in {\rm{CS}}_{k}}{\rm{p1}}_{i}\times {\rm{p2}}_{i},$$

where p1 and p2 are the PIP values from phenotypes 1 and 2, respectively.
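The CLPP calculation can be written as a short sketch. It assumes each credible set is available as a mapping from a variant identifier to its PIP. The function name `clpp` and the variant identifiers are illustrative only.

```python
from typing import Dict


def clpp(pip1: Dict[str, float], pip2: Dict[str, float]) -> float:
    """Colocalization posterior probability for one pair of credible sets.

    Sums the product of the two PIPs over variants present in both
    credible sets; variants found in only one set contribute nothing.
    """
    shared = pip1.keys() & pip2.keys()
    return sum(pip1[v] * pip2[v] for v in shared)


# Toy credible sets for two phenotypes (made-up PIP values).
cs_pheno1 = {"v1": 0.6, "v2": 0.3, "v3": 0.05}
cs_pheno2 = {"v1": 0.5, "v3": 0.4, "v4": 0.1}
example_clpp = clpp(cs_pheno1, cs_pheno2)  # 0.6 * 0.5 + 0.05 * 0.4
```

Credible sets with no shared variants give a CLPP of zero in this formulation.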

We performed colocalization between FinnGen end points, the eQTL Catalogue⁶¹, and 36 selected continuous end points and 57 biomarkers from the UKBB¹⁰. The eQTL Catalogue and UKBB traits had been processed with fine-mapping pipelines that are functionally equivalent to the FinnGen pipeline (ref. ⁶¹ and ref. ¹⁰, respectively), and the credible sets provided by those studies were used in colocalization.

Annotating putatively new associations

For each association lead variant, we used the Open Targets⁶² Platform API (https://api.platform.opentargets.org/) to search whether any genome-wide significant hits (P < 5 × 10⁻⁸) have been reported for the variant (or for variants tagging it in LD, r² > 0.2) in the GWAS Catalog or the UKBB as harmonized by Open Targets (annotated 19 May 2022). We also searched whether the variant was reported as pathogenic or likely pathogenic in ClinVar⁶³ (ClinVar release date 7 May 2022).

Automatic annotation of known GWAS hits

To identify new hits from the GWAS results, we compared the fine-mapped results against genome-wide significant hits (P < 5 × 10⁻⁸) in the GWAS Catalog association database⁶⁴ and manually curated genome-wide significant hits from large GWASs (Table 1). We checked and separately reported matches with credible set variants and matches with any variant in LD with a lead variant (the variant with the highest PIP) after fine-mapping. LD lookup variants were chosen using the following criteria: (1) they were less than 1,500 kb away from the lead variant; (2) they had P < 0.01; and (3) their LD (squared Pearson’s correlation, r²) with the lead variant was higher than a dynamic threshold based on the P value of the lead variant, chosen so that the expected P value of the linked variant would be nominally significant (r² = 5/inverse chi-squared survival function (P value)).
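The three LD lookup criteria can be sketched as below. Several details are assumptions not stated in the text: the chi-squared distribution is taken to have one degree of freedom, the threshold is capped at 1, and all function names are hypothetical. The inverse survival function is computed from the standard normal quantile, because a 1-d.f. chi-squared statistic is the square of a standard normal deviate.

```python
from statistics import NormalDist


def chi2_isf_1df(p: float) -> float:
    """Inverse survival function of a chi-squared distribution with 1 d.f.

    Uses the identity chi2 = z^2, where z is the two-sided normal quantile.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf(p / 2.0)  # lower-tail quantile, negative
    return z * z


def dynamic_r2_threshold(lead_p: float, numerator: float = 5.0) -> float:
    """r2 threshold = 5 / chi2_isf(lead P value), capped at 1 (assumed cap)."""
    return min(1.0, numerator / chi2_isf_1df(lead_p))


def is_ld_lookup_variant(
    distance_bp: int, p_value: float, r2: float, lead_p: float
) -> bool:
    """Apply the three criteria: distance, P value and dynamic LD threshold."""
    return (
        abs(distance_bp) < 1_500_000
        and p_value < 0.01
        and r2 > dynamic_r2_threshold(lead_p)
    )
```

Under these assumptions, a lead variant at P = 5 × 10⁻⁸ gives a threshold of roughly r² > 0.17, and more significant lead variants give lower thresholds.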

A variant was considered to be already associated if its chromosome and position were identical to those of a GWAS Catalog association and if its reference and alternative alleles matched the strand-aligned and effect-aligned association alleles. Because the GWAS Catalog associations do not have complete allele information, the allele information for associations was retrieved from dbSNP (build 153, human genome assembly 38). The GWAS Catalog version used was released on 21 April 2021.
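One way to read the allele-matching rule is sketched below. It is an interpretation and not the authors' implementation: the allele pair of a variant is compared with the catalogue allele pair in either order (effect alignment) and on either strand (strand alignment). The function names are hypothetical, and palindromic A/T and C/G variants, which are ambiguous under strand flips, receive no special handling.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(allele: str) -> str:
    """Return the allele as read on the opposite strand."""
    return allele.upper().translate(_COMPLEMENT)[::-1]


def alleles_match(ref: str, alt: str, cat_a1: str, cat_a2: str) -> bool:
    """Check whether (ref, alt) agrees with the catalogue alleles.

    Accepts the catalogue alleles in either order and on either strand.
    """
    query = (ref.upper(), alt.upper())
    forward = (cat_a1.upper(), cat_a2.upper())
    flipped = (reverse_complement(cat_a1), reverse_complement(cat_a2))
    candidates = {forward, forward[::-1], flipped, flipped[::-1]}
    return query in candidates


def is_known_association(variant: tuple, catalogue_entry: tuple) -> bool:
    """Both arguments are (chrom, pos, allele1, allele2) tuples."""
    same_site = variant[0] == catalogue_entry[0] and variant[1] == catalogue_entry[1]
    return same_site and alleles_match(variant[2], variant[3], *catalogue_entry[2:])
```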

Ethics statement

Participants in FinnGen provided informed consent for biobank research on the basis of the Finnish Biobank Act. Alternatively, separate research cohorts that predate the Finnish Biobank Act (which came into effect in September 2013) and the start of FinnGen (August 2017) were collected on the basis of study-specific consent and later transferred to the Finnish biobanks after approval by Fimea, the National Supervisory Authority for Welfare and Health. Recruitment protocols followed the biobank protocols approved by Fimea. The Coordinating Ethics Committee of the Hospital District of Helsinki and Uusimaa (HUS) approved the FinnGen study protocol (number HUS/990/2017).

The FinnGen study is approved by the THL (approval number THL/2031/6.02.00/2017, amendments THL/1101/5.05.00/2017, THL/341/6.02.00/2018, THL/2222/6.02.00/2018, THL/283/6.02.00/2019 and THL/1721/5.05.00/2019), the Digital and Population Data Service Agency (VRK43431/2017-3, VRK/6909/2018-3 and VRK/4415/2019-3), the Social Insurance Institution (KELA) (KELA 58/522/2017, KELA 131/522/2018, KELA 70/522/2019 and KELA 98/522/2019) and Statistics Finland (TK-53-1041-17).

The Biobank Access Decisions for FinnGen samples and data utilized in FinnGen Data Freeze 5 include the following datasets: THL Biobank BB2017_55, BB2017_111, BB2018_19, BB_2018_34, BB_2018_67, BB2018_71, BB2019_7, BB2019_8 and BB2019_26; Finnish Red Cross Blood Service Biobank 7.12.2017; Helsinki Biobank HUS/359/2017; Auria Biobank AB17-5154; Biobank Borealis of Northern Finland_2017_1013; Biobank of Eastern Finland 1186/2018; Finnish Clinical Biobank Tampere MH0004; Central Finland Biobank 1-2017; and Terveystalo Biobank STB 2018001.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

Under national and European regulations (GDPR), access to individual-level sensitive health data must be approved by national authorities for specific research projects and for specifically listed and approved researchers. The health data described here were generated and provided by the National Health Register Authorities (Finnish Institute of Health and Welfare, Statistics Finland, KELA, Digital and Population Data Services Agency) and approved, either by the individual authorities or by the Finnish Data Authority, Findata, for use in the FinnGen project. Therefore, we, the authors of this paper, are not in a position to grant access to individual-level data to others. However, any researcher can apply for the health register data from the Finnish Data Authority Findata (https://findata.fi/en/permits/) and for individual-level genotype data from Finnish biobanks via the Fingenious portal (https://site.fingenious.fi/en/) hosted by the Finnish Biobank Cooperative FINBB (https://finbb.fi/en/). All Finnish biobanks can provide access for research projects within the scope regulated by the Finnish Biobank Act, which is research utilizing the biobank samples or data for the purposes of promoting health, understanding the mechanisms of disease or developing products and treatment practices used in health and medical care. The genotype data for the FinnGen release 5 used in this study were returned to the biobanks at the same time as the FinnGen release 5 summary results were publicly released. All summary statistics described in this manuscript can be found in the Supplementary Information. All information regarding data download of summary statistics of additive GWAS of FinnGen release 5 can be found through the following link: https://finngen.gitbook.io/documentation/v/r5/data-download. More information about accessing other FinnGen data is available at https://www.finngen.fi/en/access_results. A full list of FinnGen end points for release 5 is available at: https://www.finngen.fi/en/researchers/clinical-endpoints. A full list of gene variants captured by the FinnGen-specific Axiom array can be found at: https://www.finngen.fi/en/researchers/genotyping and https://www.dropbox.com/s/n8srnyy547resrq/finngen2_proposal_5_5_2019.tsv?dl=0.

Code availability

Central data analysis and processing pipelines used are freely available: fine-mapping pipeline (https://github.com/FINNGEN/finemapping-pipeline); meta-analysis (https://github.com/FINNGEN/META_ANALYSIS); genetic ancestry and PCA pipeline (https://github.com/FINNGEN/pca_kinship); and GWAS SAIGE pipeline (https://github.com/FINNGEN/saige-pipelines). Please see https://finngen.gitbook.io/documentation/ for a detailed description of data production and analysis including code used to run analyses. Please see https://github.com/FINNGEN/ for further code repositories used to run analyses in FinnGen.

R v4.0.3 (https://www.r-project.org/) was used to create plots and analyse data. The R code used to reproduce figures is available upon request.

Change history

24 February 2023
A Correction to this paper has been published: https://doi.org/10.1038/s41586-023-05837-8

References

Lim, E. T. et al. Distribution and medical impact of loss-of-function variants in the Finnish founder population. PLoS Genet. 10, e1004494 (2014).
Xue, Y. et al. Enrichment of low-frequency functional variants revealed by whole-genome sequencing of multiple isolated European populations. Nat. Commun. 8, 15927 (2017).
Zuk, O. et al. Searching for missing heritability: designing rare variant association studies. Proc. Natl Acad. Sci. USA https://doi.org/10.1073/pnas.1322563111 (2014).
Norio, R. The Finnish Disease Heritage III: the individual diseases. Hum. Genet. 112, 470–526 (2003).
Auton, A. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
Kerminen, S. et al. Fine-scale genetic structure in Finland. G3 7, 3459–3468 (2017).
Ritari, J., Koskela, S., Hyvärinen, K., FinnGen & Partanen, J. HLA-disease association and pleiotropy landscape in over 235,000 Finns. Hum. Immunol. 83, 391–398 (2022).
Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
Kanai, M. et al. Insights from complex trait fine-mapping across diverse populations. Preprint at medRxiv https://doi.org/10.1101/2021.09.03.21262975 (2021).
Sakaue, S. et al. A cross-population atlas of genetic associations for 220 human phenotypes. Nat. Genet. 53, 1415–1424 (2021).
Sun, B. B. et al. Genetic associations of protein-coding variants in human disease. Nature 603, 95–102 (2022).
Heyne, H. O. et al. Mono- and biallelic effects of on disease at biobank scale. Nature https://doi.org/10.1038/s41586-022-05420-7 (2022).
Patel, K. A. et al. Heterozygous RFX6 protein truncating variants are associated with MODY with reduced penetrance. Nat. Commun. 8, 888 (2017).
Norberg, A. et al. Novel variants in Nordic patients referred for genetic testing of telomere-related disorders. Eur. J. Hum. Genet. 26, 858–867 (2018).
Löf, C. et al. Detection of novel gene variants associated with congenital hypothyroidism in a Finnish patient cohort. Thyroid 26, 1215–1224 (2016).
Jiang, X.-C. & Yu, Y. The role of phospholipid transfer protein in the development of atherosclerosis. Curr. Atheroscler. Rep. 23, 9 (2021).
Teslovich, T. M. et al. Biological, clinical, and population relevance of 95 loci for blood lipids. Nature 466, 707–713 (2010).
van der Harst, P. & Verweij, N. Identification of 64 novel genetic loci provides an expanded view on the genetic architecture of coronary artery disease. Circ. Res. 122, 433–443 (2018).
Shadrina, A. S. et al. Prioritization of causal genes for coronary artery disease based on cumulative evidence from experimental and in silico studies. Sci. Rep. 10, 10486 (2020).
Dimitra, A. et al. HHIPL1, a gene at the 14q32 coronary artery disease locus, positively regulates hedgehog signaling and promotes atherosclerosis. Circulation 140, 500–513 (2019).
Bindesbøll, C. et al. NBEAL1 controls SREBP2 processing and cholesterol metabolism and is a susceptibility locus for coronary artery disease. Sci. Rep. 10, 4528 (2020).
Graydon, C. G., Mohideen, S. & Fowke, K. R. LAG3’s enigmatic mechanism of action. Front. Immunol. https://doi.org/10.3389/fimmu.2020.615317 (2021).
FDA approves anti-LAG3 checkpoint. Nat. Biotechnol. 40, 625 (2022).
Boer, C. G. et al. Deciphering osteoarthritis genetics across 826,690 individuals from 9 populations. Cell 184, 4784–4818.e17 (2021).
Sliz, E. et al. Uniting biobank resources reveals novel genetic pathways modulating susceptibility for atopic dermatitis. J. Allergy Clin. Immunol. 149, 1105–1112.e9 (2022).
Li, J. et al. Utility of basophil activation test for predicting the outcome of wheezing in children: a pilot study. BMC Immunol. 22, 4 (2021).
Chakraborti, S., Natarajan, K., Curiel, J., Janke, C. & Liu, J. The emerging role of the tubulin code: from the tubulin molecule to neuronal function and disease. Cytoskeleton 73, 521–550 (2016).
Sharifinejad, N. et al. Clinical, immunological, and genetic features in 49 patients with ZAP-70 deficiency: a systematic review. Front. Immunol. 11, 831 (2020).
Mulford, A. J., Wing, C., Dolan, M. E. & Wheeler, H. E. Genetically regulated expression underlies cellular sensitivity to chemotherapy in diverse populations. Hum. Mol. Genet. 30, 305–317 (2021).
Mafra, F. et al. Copy number variation analysis reveals additional variants contributing to endometriosis development. J. Assist. Reprod. Genet. 34, 117–124 (2017).
Koskela, J. T. et al. Genetic variant in SPDL1 reveals novel mechanism linking pulmonary fibrosis risk and cancer protection. Preprint at medRxiv https://doi.org/10.1101/2021.05.07.21255988 (2021).
Ruotsalainen, S. E. et al. Inframe insertion and splice site variants in MFGE8 associate with protection against coronary atherosclerosis. Commun. Biol. 5, 802 (2022).
Rämö, J. T. et al. Genome-wide screen of otosclerosis in population biobanks: 27 loci and shared associations with skeletal structure. Nat. Commun. https://doi.org/10.1038/s41467-022-32936-3 (2023).
Tanigawa, Y. et al. Rare protein-altering variants in ANGPTL7 lower intraocular pressure and protect against glaucoma. PLoS Genet. 16, e1008682 (2020).
Surakka, I. et al. The impact of low-frequency and rare variants on lipid levels. Nat. Genet. 47, 589–597 (2015).
Ehrhardt, N. et al. Hepatic Tm6sf2 overexpression affects cellular ApoB-trafficking, plasma lipid levels, hepatic steatosis and atherosclerosis. Hum. Mol. Genet. 26, 2719–2731 (2017).
Prill, S. et al. The TM6SF2 E167K genetic variant induces lipid biosynthesis and reduces apolipoprotein B secretion in human hepatic 3D spheroids. Sci. Rep. 9, 11585 (2019).
Pirola, C. J. & Sookoian, S. The dual and opposite role of the TM6SF2-rs58542926 variant in protecting against cardiovascular disease and conferring risk for nonalcoholic fatty liver: a meta-analysis. Hepatology 62, 1742–1756 (2015).
Buch, S. et al. A genome-wide association study confirms PNPLA3 and identifies TM6SF2 and MBOAT7 as risk loci for alcohol-related cirrhosis. Nat. Genet. 47, 1443–1448 (2015).
Tang, S. et al. Association of TM6SF2 rs58542926 T/C gene polymorphism with hepatocellular carcinoma: a meta-analysis. BMC Cancer 19, 1128 (2019).
Kim, D. S. et al. Novel association of TM6SF2 rs58542926 genotype with increased serum tyrosine levels and decreased apoB-100 particles in Finns. J. Lipid Res. 58, 1471–1481 (2017).
Mars, N. et al. Polygenic and clinical risk scores and their impact on age at onset and prediction of cardiometabolic diseases and common cancers. Nat. Med. 26, 549–557 (2020).
Kiiskinen, T. et al. Genomic prediction of alcohol-related morbidity and mortality. Transl Psychiatry 10, 23 (2020).
Strausz, S. et al. Genetic analysis of obstructive sleep apnoea discovers a strong association with cardiometabolic health. Eur. Respir. J. 57, 2003091 (2021).
Helkkula, P. et al. ANGPTL8 protein-truncating variant associated with lower serum triglycerides and risk of coronary disease. PLoS Genet. 17, e1009501 (2021).
Rahimov, F. et al. High incidence and regional distribution of cleft palate in Finns are associated with a functional variant in an IRF6 enhancer. Preprint at Research Square https://doi.org/10.21203/rs.3.rs-941741/v1 (2021).
Niemi, M. E. K. et al. Mapping the human genetic architecture of COVID-19. Nature https://doi.org/10.1038/s41586-021-03767-x (2021).
Zhou, W. et al. Global Biobank Meta-analysis Initiative: powering genetic discovery across human disease. Cell Genom. 2, 100192 (2022).
Broekema, R. V., Bakker, O. B. & Jonkers, I. H. A practical view of fine-mapping and gene prioritization in the post-genome-wide association era. Open Biol. 10, 190221 (2020).
Fry, A. et al. Comparison of sociodemographic and health-related characteristics of UK Biobank participants with those of the general population. Am. J. Epidemiol. 186, 1026–1034 (2017).
Pärn, K. et al. Genotyping chip data lift-over to reference genome build GRCh38/hg38 V.2. protocols.io https://doi.org/10.17504/protocols.io.nqtddwn (2019).
Palta, P. Genotype imputation workflow v3.0 V.1. protocols.io https://doi.org/10.17504/protocols.io.nmndc5e (2018).
Zhou, W. et al. Efficiently controlling for case–control imbalance and sample relatedness in large-scale genetic association studies. Nat. Genet. 50, 1335–1341 (2018).
Wang, G., Sarkar, A., Carbonetto, P. & Stephens, M. A simple new approach to variable selection in regression, with application to genetic fine mapping. J. R. Stat. Soc. Ser. B Stat. Methodol. 82, 1273–1300 (2020).
Gagliano Taliun, S. A. Exploring and visualizing large-scale genetic associations by using PheWeb. Nat. Genet. 52, 550–552 (2020).
Mitt, M. et al. Improved imputation accuracy of rare and low-frequency variants using population-specific high-coverage WGS-based imputation reference panel. Eur. J. Hum. Genet. 25, 869–876 (2017).
Pan-UK Biobank (Pan UK Biobank Team, 2020); https://pan.ukbb.broadinstitute.org.
Hail v.0.2 (Hail Team, 2019); https://github.com/hail-is/hail.
Hormozdiari, F. et al. Colocalization of GWAS and eQTL signals detects target genes. Am. J. Hum. Genet. 99, 1245–1260 (2016).
Kerimov, N. et al. A compendium of uniformly processed human gene expression and splicing quantitative trait loci. Nat. Genet. 53, 1290–1299 (2021).
Ochoa, D. et al. Open Targets Platform: supporting systematic drug–target identification and prioritisation. Nucleic Acids Res. 49, D1302–D1310 (2021).
Landrum, M. J. et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 46, D1062–D1067 (2018).
Buniello, A. et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 47, D1005–D1012 (2019).


Acknowledgements

We thank all those who contributed samples and data for the FinnGen scientific project; and P. VandeHaar for technical consultation on PheWeb. The FinnGen project is funded by two grants from Business Finland (HUS 4685/31/2016 and UH 4386/31/2016) and the following industry partners: AbbVie, AstraZeneca UK, Biogen, Bristol Myers Squibb (and Celgene Corporation & Celgene International II), Genentech, Merck Sharp & Dohme LLC, a subsidiary of Merck & Co., Inc., Rahway, NJ, USA, Pfizer, GlaxoSmithKline Intellectual Property Development, Sanofi US Services, Maze Therapeutics, Janssen Biotech, Novartis, and Boehringer Ingelheim. The following biobanks are acknowledged for delivering samples to FinnGen: Auria Biobank (https://www.auria.fi/biopankki/), THL Biobank (https://www.thl.fi/biobank), Helsinki Biobank (https://www.helsinginbiopankki.fi), Biobank Borealis of Northern Finland (https://www.ppshp.fi/Tutkimus-ja-opetus/Biopankki/Pages/Biobank-Borealis-briefly-in-English.aspx), Finnish Clinical Biobank Tampere (https://www.tays.fi/en-US/Research_and_development/Finnish_Clinical_Biobank_Tampere), Biobank of Eastern Finland (https://www.ita-suomenbiopankki.fi/en), Central Finland Biobank (https://www.ksshp.fi/fi-FI/Potilaalle/Biopankki), Finnish Red Cross Blood Service Biobank (www.veripalvelu.fi/verenluovutus/biopankkitoiminta) and Terveystalo Biobank (https://www.terveystalo.com/fi/Yritystietoa/Terveystalo-Biopankki/Biopankki/). All Finnish biobanks are members of the BBMRI.fi infrastructure (https://www.bbmri.fi). The FINBB (https://finbb.fi/) is the coordinator of BBMRI-ERIC operations in Finland. The Finnish biobank data can be accessed through the Fingenious services (https://site.fingenious.fi/en/) managed by FINBB. We acknowledge the BBMRI.fi for creating the original biobank map used in Fig. 1a and providing us their permission to use it.

Funding

Open Access funding provided by University of Helsinki including Helsinki University Central Hospital.

Author information

These authors jointly supervised this work: Mark J. Daly, Aarno Palotie

Authors and Affiliations

Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLIFE), University of Helsinki, Helsinki, Finland
Mitja I. Kurki, Juha Karjalainen, Priit Palta, Timo P. Sipilä, Kati M. Donner, Mary P. Reeve, Hannele Laivuori, Mervi Aavikko, Mari A. Kaunisto, Elisa Lahtela, Pietro Della Briotta Parolo, Arto A. Lehisto, Masahiro Kanai, Nina Mars, Joel Rämö, Tuomo Kiiskinen, Henrike O. Heyne, Kumar Veerapen, Sina Rüeger, Susanna Lemmelä, Sanni Ruotsalainen, Kalle Pärn, Vincent Llorens, Oluwaseun A. Dada, Andrea Ganna, Awaisa Ghazal, Risto Kajanne, Jaakko Kaprio, Elina Kilpeläinen, Aoxing Liu, Mari E. K. Niemi, Hanna M. Ollila, Rodosthenis Rodosthenous, Elmo Saarentaus, Huei-yi Shen, Taru Tukiainen, Tiinamaija Tuomi, Aki S. Havulinna, Samuli Ripatti, Anu Jalanko, Mark J. Daly & Aarno Palotie
Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
Mitja I. Kurki, Juha Karjalainen, Masahiro Kanai, Henrike O. Heyne, Kumar Veerapen, Wei Zhou, Hilary Finucane, Andrea Ganna, Jacob C. Ulirsch, Mark J. Daly & Aarno Palotie
Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA
Mitja I. Kurki, Juha Karjalainen, Masahiro Kanai, Henrike O. Heyne, Kumar Veerapen, Wei Zhou, Hilary Finucane, Andrea Ganna, Jacob C. Ulirsch, Mark J. Daly & Aarno Palotie
Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
Mitja I. Kurki, Juha Karjalainen, Masahiro Kanai, Kumar Veerapen, Wei Zhou, Hilary Finucane, Andrea Ganna, Mark J. Daly & Aarno Palotie
Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia
Priit Palta, Kadri Reis, Abdelrahman G. Elnahas, Triin Laisk, Reedik Mägi, Anu Reigo, Andres Metspalu, Lili Milani & Tõnu Esko
Finnish Institute for Health and Welfare (THL), Helsinki, Finland
Kati Kristiansson, Hannele Mattsson, Päivi Laiho, Susanna Lemmelä, Tero Hiekkalinna, Sami Koskelainen, Teemu Paajanen, Johannes Kettunen, Terhi Kilpi, Teemu Niiranen, Veikko Salomaa, Kaisa Silander, Sirpa Soini, Felix Vaura, Aki S. Havulinna & Markus Perola
Medical and Clinical Genetics, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
Hannele Laivuori
Department of Obstetrics and Gynecology, Tampere University Hospital, Tampere, Finland
Hannele Laivuori
Faculty of Medicine and Health Technology, Center for Child, Adolescent and Maternal Health, University of Tampere, Tampere, Finland
Hannele Laivuori
Helsinki Biobank, University of Helsinki and Hospital District of Helsinki and Uusimaa, Helsinki, Finland
Anu Loukola, Olli Carpen & Eero Punkka
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
Masahiro Kanai
Digital Health Center, Hasso Plattner Institute for Digital Engineering, University of Potsdam, Potsdam, Germany
Henrike O. Heyne
Hasso Plattner Institute for Digital Health at Mount Sinai, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Henrike O. Heyne
TAUCHI Research Center, Faculty of Information Technology and Communication Sciences, Tampere University, Tampere, Finland
Javier Gracia-Tabuenca & Harri Siirtola
Translational Biology, Research and Development, Biogen, Cambridge, MA, USA
Benjamin Sun
BHF Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
Benjamin Sun
Optima Partners, Edinburgh, UK
Christopher N. Foley
MRC Biostatistics Unit, School of Clinical Medicine, University of Cambridge, Cambridge, UK
Christopher N. Foley
Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
Katriina Aalto-Setälä, Mika Kähönen & Marianna Niemi
Institute of Computer Science, University of Tartu, Tartu, Estonia
Kaur Alasoo & Nurlan Kerimov
Finnish Red Cross Blood Service, Helsinki, Finland
Mikko Arvas & Jukka Partanen
GlaxoSmithKline, Espoo, Finland
Kirsi Auro
Bristol Myers Squibb, New York, NY, USA
Shameek Biswas, Joseph Maranville & Robert Plenge
Tampere University Hospital and Tampere University, Tampere, Finland
Argyro Bizaki-Vallaskangas
Biogen, Cambridge, MA, USA
Chia-Yen Chen, Sally John & Heiko Runz
Boehringer Ingelheim, Ingelheim am Rhein, Germany
Zhihao Ding, Jan-Nygaard Jensen, Marc Jung, Nathan Lawless & George Okafo
GlaxoSmithKline, Collegeville, PA, USA
Margaret G. Ehm
Division of Rheumatology, Department of Medicine, Helsinki University Central Hospital, Helsinki, Finland
Kari Eklund
Orton Orthopedic Hospital, Helsinki, Finland
Kari Eklund
Abdominal Center, Helsinki University Hospital, Helsinki University, Helsinki, Finland
Martti Färkkilä
Maze Therapeutics, South San Francisco, CA, USA
Robert R. Graham, Eric M. Green & Sahar V. Mozaffari
Auria Biobank, University of Turku and Turku University Hospital, Turku, Finland
Antti Hakanen, Lila Kallio, Johanna Schleutker & Petri Virolainen
FINBB, Finnish Biobank Cooperative, Helsinki, Finland
Marco Hautalahti
Pfizer, New York, NY, USA
Åsa K. Hedman & Xinli Hu
Department of Medicine, Karolinska Institute, Solna, Sweden
Åsa K. Hedman
Clinical Biobank Tampere, Tampere University and Tampere University Hospital, Tampere, Finland
Mikko Hiltunen
Medical Research Center Oulu and PEDEGO Research Unit, University of Oulu, Oulu, Finland
Reetta Hinttala
Biocenter Oulu, University of Oulu, Oulu, Finland
Reetta Hinttala & Johannes Kettunen
Oulu University Hospital, Oulu, Finland
Reetta Hinttala
Department of Psychology and Logopedics, Faculty of Medicine, University of Helsinki, Helsinki, Finland
Iiris Hovatta
SleepWell Research Program, Faculty of Medicine, University of Helsinki, Helsinki, Finland
Iiris Hovatta
Merck & Co, Kenilworth, NJ, USA
Adriana Huertas-Vazquez, Simonne Longerich, Neha Raghavan, Jae Hoon Sul & Caroline Fox
PEDEGO Research Unit, University of Oulu, Oulu, Finland
Laura Huilaja & Kaisa Tasanen
Department of Dermatology and Medical Research Center Oulu, Oulu University Hospital, Oulu, Finland
Laura Huilaja & Kaisa Tasanen
Genentech, San Francisco, CA, USA
Julie Hunkapiller, Rion K. Pendergrass & Mark McCarthy
AbbVie, Chicago, IL, USA
Howard Jacob, Fedik Rahimov, Bridget Riley-Gillis & Jeffrey Waring
Helsinki University Hospital and University of Helsinki, Helsinki, Finland
Heikki Joensuu, Tuomo Meretoja & Joni A. Turunen
Neuro Center, Neurology, Kuopio University Hospital, Kuopio, Finland
Valtteri Julkunen
Institute of Clinical Medicine–Neurology, University of Eastern Finland, Kuopio, Finland
Valtteri Julkunen
Northern Finland Biobank Borealis, University of Oulu, Northern Ostrobothnia Hospital District, Oulu, Finland
Juhani Junttila & Raisa Serpi
Department of Ophthalmology, Kuopio University Hospital, Kuopio, Finland
Kai Kaarniranta
Department of Ophthalmology, Institute of Clinical Medicine, University of Eastern Finland, Kuopio, Finland
Kai Kaarniranta
Department of Clinical Physiology, Tampere University Hospital, Tampere, Finland
Mika Kähönen
Epilepsy Center, Kuopio University Hospital, Kuopio, Finland
Reetta Kälviäinen
Department of Neurology, University of Eastern Finland, Kuopio, Finland
Reetta Kälviäinen
Department of Public Health, University of Helsinki, Helsinki, Finland
Jaakko Kaprio
Computational Medicine, Center for Life Course Health Research, Faculty of Medicine, University of Oulu, Oulu, Finland
Johannes Kettunen
Translational Sciences, Sanofi R&D, Framingham, MA, USA
Katherine Klinger & Deepak Rajpal
Biobank of Eastern Finland, University of Eastern Finland, Kuopio, Finland
Veli-Matti Kosma & Arto Mannermaa
Kuopio University Hospital, Kuopio, Finland
Veli-Matti Kosma & Arto Mannermaa
Central Finland Biobank, Central Finland Health Care District, Jyväskylä, Finland
Teijo Kuopio & Jari Laukkanen
Department of Clinical Genetics, Tampere University Hospital, Tampere, Finland
Venla Kurra
Department of Clinical Genetics, Faculty of Medicine and Health Technology, Tampere, Finland
Venla Kurra
Department of Medicine, Institute of Clinical Medicine, University of Eastern Finland, Kuopio, Finland
Jari Laukkanen
FINBB, Finnish Biobank Cooperative, Turku, Finland
Johanna Mäkelä
Department of Otorhinolaryngology–Head and Neck Surgery, University of Helsinki, Helsinki, Finland
Antti Mäkitie
Helsinki University Hospital, Helsinki, Finland
Antti Mäkitie & Eveliina Salminen
Pfizer, Cambridge, MA, USA
Anders Malarstig
Department of Medical Epidemiology and Biostatistics, Karolinska Institute, Solna, Sweden
Anders Malarstig
Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK
Athena Matakidou, Dirk S. Paul, Slavé Petrovski & Ioanna Tachmazidou
TAUCHI Research Center & Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
Marianna Niemi
Turku University Hospital and University of Turku, Turku, Finland
Teemu Niiranen & Antti Palomäki
Novartis Institutes for BioMedical Research, Cambridge, MA, USA
Christopher J. O´Donnell, Ma´en Obeidat & Nicole A. Renaud
Anesthesia, Critical Care, and Pain Medicine, Massachusetts General Hospital, Boston, MA, USA
Hanna M. Ollila
Department of Oral and Maxillofacial Diseases, Helsinki University Hospital, Helsinki, Finland
Tuula Palotie
Department of Oral and Maxillofacial Diseases, University of Helsinki, Helsinki, Finland
Tuula Palotie, Pirkko Pussinen & Aino Salminen
Finnish Hematological Biobank, Helsinki, Finland
Jukka Partanen
Department of Pulmonary Diseases, Kuopio University Hospital, Kuopio, Finland
Margit Pelkonen
Department of Otorhinolaryngology, Helsinki University Hospital and University of Helsinki, Helsinki, Finland
Anne Pitkäranta
Translational Science and Experimental Medicine, Research and Early Development, Respiratory and Immunology, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK
Adam Platt
GlaxoSmithKline, Stevenage, UK
David Pulford
Department of Clinical Genetics, HUSLAB, HUS Diagnostic Center, University of Helsinki, Helsinki, Finland
Eveliina Salminen
Novartis Institutes for BioMedical Research, Basel, Switzerland
Richard Siegel
Finnish Clinical Biobank Tampere, Tampere University and Tampere University Hospital, Tampere, Finland
Sanna Siltanen & Tarja Laitinen
Department of Neurology, Institute of Clinical Medicine, University of Eastern Finland, Kuopio, Finland
Hilkka Soininen
Department of Neurology, Helsinki University Hospital, Helsinki, Finland
Pentti Tienari
Translational Immunology, Research Programs Unit, University of Helsinki, Helsinki, Finland
Pentti Tienari
Department of Allergy, Helsinki University Hospital and University of Helsinki, Helsinki, Finland
Sanna Toppila-Salmi
Abdominal Center, Endocrinology, Helsinki University Hospital, Helsinki, Finland
Tiinamaija Tuomi
Folkhälsan Research Center, Helsinki, Finland
Tiinamaija Tuomi
Research Program of Clinical and Molecular Metabolism, University of Helsinki, Helsinki, Finland
Tiinamaija Tuomi
Eye Genetics Group, Folkhälsan Research Center, Helsinki, Finland
Joni A. Turunen
University of Turku, Turku, Finland
Felix Vaura
Janssen Research & Development, Spring House, PA, USA
Dawn Waterworth
Janssen Biotech, Beerse, Belgium
Robert Yang
Genomics Core Facility, Institute of Genomics, University of Tartu, Tartu, Estonia
Mari Nelis
Helsinki Institute of Life Science (HiLIFE), University of Helsinki, Helsinki, Finland
Tomi P. Mäkelä

Authors

Mitja I. Kurki
Juha Karjalainen
Priit Palta
Timo P. Sipilä
Kati Kristiansson
Kati M. Donner
Mary P. Reeve
Hannele Laivuori
Mervi Aavikko
Mari A. Kaunisto
Anu Loukola
Elisa Lahtela
Hannele Mattsson
Päivi Laiho
Pietro Della Briotta Parolo
Arto A. Lehisto
Masahiro Kanai
Nina Mars
Joel Rämö
Tuomo Kiiskinen
Henrike O. Heyne
Kumar Veerapen
Sina Rüeger
Susanna Lemmelä
Wei Zhou
Sanni Ruotsalainen
Kalle Pärn
Tero Hiekkalinna
Sami Koskelainen
Teemu Paajanen
Vincent Llorens
Javier Gracia-Tabuenca
Harri Siirtola
Kadri Reis
Abdelrahman G. Elnahas
Benjamin Sun
Christopher N. Foley
Katriina Aalto-Setälä
Kaur Alasoo
Mikko Arvas
Kirsi Auro
Shameek Biswas
Argyro Bizaki-Vallaskangas
Olli Carpen
Chia-Yen Chen
Oluwaseun A. Dada
Zhihao Ding
Margaret G. Ehm
Kari Eklund
Martti Färkkilä
Hilary Finucane
Andrea Ganna
Awaisa Ghazal
Robert R. Graham
Eric M. Green
Antti Hakanen
Marco Hautalahti
Åsa K. Hedman
Mikko Hiltunen
Reetta Hinttala
Iiris Hovatta
Xinli Hu
Adriana Huertas-Vazquez
Laura Huilaja
Julie Hunkapiller
Howard Jacob
Jan-Nygaard Jensen
Heikki Joensuu
Sally John
Valtteri Julkunen
Marc Jung
Juhani Junttila
Kai Kaarniranta
Mika Kähönen
Risto Kajanne
Lila Kallio
Reetta Kälviäinen
Jaakko Kaprio
Nurlan Kerimov
Johannes Kettunen
Elina Kilpeläinen
Terhi Kilpi
Katherine Klinger
Veli-Matti Kosma
Teijo Kuopio
Venla Kurra
Triin Laisk
Jari Laukkanen
Nathan Lawless
Aoxing Liu
Simonne Longerich
Reedik Mägi
Johanna Mäkelä
Antti Mäkitie
Anders Malarstig
Arto Mannermaa
Joseph Maranville
Athena Matakidou
Tuomo Meretoja
Sahar V. Mozaffari
Mari E. K. Niemi
Marianna Niemi
Teemu Niiranen
Christopher J. O´Donnell
Ma´en Obeidat
George Okafo
Hanna M. Ollila
Antti Palomäki
Tuula Palotie
Jukka Partanen
Dirk S. Paul
Margit Pelkonen
Rion K. Pendergrass
Slavé Petrovski
Anne Pitkäranta
Adam Platt
David Pulford
Eero Punkka
Pirkko Pussinen
Neha Raghavan
Fedik Rahimov
Deepak Rajpal
Nicole A. Renaud
Bridget Riley-Gillis
Rodosthenis Rodosthenous
Elmo Saarentaus
Aino Salminen
Eveliina Salminen
Veikko Salomaa
Johanna Schleutker
Raisa Serpi
Huei-yi Shen
Richard Siegel
Kaisa Silander
Sanna Siltanen
Sirpa Soini
Hilkka Soininen
Jae Hoon Sul
Ioanna Tachmazidou
Kaisa Tasanen
Pentti Tienari
Sanna Toppila-Salmi
Taru Tukiainen
Tiinamaija Tuomi
Joni A. Turunen
Jacob C. Ulirsch
Felix Vaura
Petri Virolainen
Jeffrey Waring
Dawn Waterworth
Robert Yang
Mari Nelis
Anu Reigo
Andres Metspalu
Lili Milani
Tõnu Esko
Caroline Fox
Aki S. Havulinna
Markus Perola
Samuli Ripatti
Anu Jalanko
Tarja Laitinen
Tomi P. Mäkelä
Robert Plenge
Mark McCarthy
Heiko Runz
Mark J. Daly
Aarno Palotie

Consortia

FinnGen

Mitja I. Kurki, Juha Karjalainen, Priit Palta, Timo P. Sipilä, Kati Kristiansson, Kati M. Donner, Mary P. Reeve, Hannele Laivuori, Mervi Aavikko, Mari A. Kaunisto, Anu Loukola, Elisa Lahtela, Hannele Mattsson, Päivi Laiho, Pietro Della Briotta Parolo, Arto A. Lehisto, Masahiro Kanai, Nina Mars, Joel Rämö, Tuomo Kiiskinen, Henrike O. Heyne, Kumar Veerapen, Sina Rüeger, Susanna Lemmelä, Wei Zhou, Sanni Ruotsalainen, Kalle Pärn, Tero Hiekkalinna, Sami Koskelainen, Teemu Paajanen, Vincent Llorens, Javier Gracia-Tabuenca, Harri Siirtola, Benjamin Sun, Katriina Aalto-Setälä, Mikko Arvas, Kirsi Auro, Shameek Biswas, Argyro Bizaki-Vallaskangas, Olli Carpen, Chia-Yen Chen, Oluwaseun A. Dada, Zhihao Ding, Margaret G. Ehm, Kari Eklund, Martti Färkkilä, Hilary Finucane, Andrea Ganna, Awaisa Ghazal, Robert R. Graham, Eric M. Green, Antti Hakanen, Marco Hautalahti, Åsa K. Hedman, Mikko Hiltunen, Reetta Hinttala, Iiris Hovatta, Xinli Hu, Adriana Huertas-Vazquez, Laura Huilaja, Julie Hunkapiller, Howard Jacob, Jan-Nygaard Jensen, Heikki Joensuu, Sally John, Valtteri Julkunen, Marc Jung, Juhani Junttila, Kai Kaarniranta, Mika Kähönen, Risto Kajanne, Lila Kallio, Reetta Kälviäinen, Jaakko Kaprio, Nurlan Kerimov, Johannes Kettunen, Elina Kilpeläinen, Terhi Kilpi, Katherine Klinger, Veli-Matti Kosma, Teijo Kuopio, Venla Kurra, Jari Laukkanen, Nathan Lawless, Aoxing Liu, Simonne Longerich, Johanna Mäkelä, Antti Mäkitie, Anders Malarstig, Arto Mannermaa, Joseph Maranville, Athena Matakidou, Tuomo Meretoja, Sahar V. Mozaffari, Mari E. K. Niemi, Marianna Niemi, Teemu Niiranen, Christopher J. O´Donnell, Ma´en Obeidat, George Okafo, Hanna M. Ollila, Antti Palomäki, Tuula Palotie, Jukka Partanen, Dirk S. Paul, Margit Pelkonen, Rion K. Pendergrass, Slavé Petrovski, Anne Pitkäranta, Adam Platt, David Pulford, Eero Punkka, Pirkko Pussinen, Neha Raghavan, Fedik Rahimov, Deepak Rajpal, Nicole A. Renaud, Bridget Riley-Gillis, Rodosthenis Rodosthenous, Elmo Saarentaus, Aino Salminen, Eveliina Salminen, Veikko Salomaa, Johanna Schleutker, Raisa Serpi, Huei-yi Shen, Richard Siegel, Kaisa Silander, Sanna Siltanen, Sirpa Soini, Hilkka Soininen, Jae Hoon Sul, Ioanna Tachmazidou, Kaisa Tasanen, Pentti Tienari, Sanna Toppila-Salmi, Taru Tukiainen, Tiinamaija Tuomi, Joni A. Turunen, Jacob C. Ulirsch, Felix Vaura, Petri Virolainen, Jeffrey Waring, Dawn Waterworth, Robert Yang, Caroline Fox, Aki S. Havulinna, Markus Perola, Samuli Ripatti, Anu Jalanko, Tarja Laitinen, Tomi P. Mäkelä, Robert Plenge, Mark McCarthy, Heiko Runz, Mark J. Daly & Aarno Palotie

Contributions

M.I.K.: FinnGen analysis team leader, data analysis, browser development and wrote the first draft of the manuscript. J. Karjalainen: analysis, browser development and paper editing. P. Palta: paper writing and FinnGen genotype data development. T.P.S.: team leader. K. Kristiansson: paper editing, FinnGen phenotype and register data development. K.M.D.: paper editing and data QC. M.P.R.: paper editing and visualizations. H.L.: paper editing, study design and FinnGen phenotype and register data development. M. Aavikko: paper editing, study design and team coordination. M.A.K.: paper editing, visualizations and variant novelty annotation. A. Loukola: paper editing, methods, biobank samples and Supplementary Methods, section 2. E.L.: paper editing, FinnGen phenotype and register data development and clinical team coordination. H.M.: FinnGen phenotype and register data development. P.L.: overseeing and organizing sample preparation and logistics. P.D.B.P.: computation tool development and population structure analysis. A.A.L.: computation tool development and fine-mapping. M. Kanai: provided fine-mapped GTEx v.8 data, UKBB quantitative phenotype fine-mapping and contributed the original fine-mapping pipeline. N.M.: paper editing. J.R.: paper editing, and ear, nose and throat end point interpretation. T. Kiiskinen: paper editing, FinnGen phenotype and register data development and data analysis. H.O.H.: paper editing and annotating and interpreting ClinVar variants. K.V.: paper editing and finalization, and initial phenotype summary statistics curation. S. Rüeger: tool development and colocalization analysis. S. Lemmelä: paper editing, and creating all phenotype plots and tables for the paper. W.Z.: GWAS analysis contribution. S. Ruotsalainen: association analysis. K.P.: genotype data development. T.H.: end point development and register data processing. S.K.: register data processing. T. Paajanen: data processing. 
V.L.: end point development, tool development and construction of RISTEYS. J.G.-T.: paper editing and longitudinal data analysis. H. Siirtola: paper editing and visualization of registry data in FinnGen end points. K.R.: replication analysis and paper editing. A.G.E.: replication analysis. B.S.: simulation analyses for matched sample size comparison to the UKBB and the EstBB data and paper editing. C.N.F.: simulation analyses for matched sample size comparison to the UKBB and the EstBB data and paper editing. K.A.-S.: paper editing. K. Alasoo: paper editing and fine-mapping of the eQTL Catalogue data. M. Arvas: phenotype and HLA analysis and paper editing. K. Auro: study design, overseeing the project and paper editing. S.B.: paper editing. A.B.-V.: paper editing and end point development. O.C.: paper editing, study design and end point development. C.-Y.C.: data analysis and paper editing. O.A.D.: data processing and cloud administration. Z.D.: study design, data analysis and paper editing. M.G.E.: study design, overseeing the project and paper editing. K.E.: end point development and paper editing. M.F.: end point development, study design and paper editing. H.F.: provided fine-mapped GTEx v.8 data and UKBB quantitative phenotypes and paper editing. A. Ganna: study design, overseeing tool development and paper editing. A. Ghazal: participated in R4 imputation, full responsibility of running R5 production runs and paper editing. R.R.G.: study design, overseeing the project and paper editing. E.M.G.: study design, overseeing the project and paper editing. A.H.: study design, overseeing the project and paper editing. M. Hautalahti: biobank coordination. Å.K.H.: data analysis and paper editing. M. Hiltunen: cohort collection, clinical expertise and paper editing. R. Hinttala: data analysis and paper editing. I.H.: end point development and paper editing. X.H.: data analysis and paper editing. A.H.-V.: data analysis and paper editing. 
L.H.: end point development and paper editing. J.H.: study design, overseeing the project and paper editing. H. Jacob: study design, overseeing the project and paper editing. J.-N.J.: study design and paper editing. H.J.: end point development and paper editing. S.J.: study design, overseeing the project and paper editing. V.J.: end point development, study design and paper editing. M.J.: data analysis and paper editing. J.J.: study design, overseeing the project and paper editing. K. Kaarniranta: end point development and paper editing. M. Kähönen: end point development, overseeing the project and paper editing. R. Kajanne: project coordination and paper editing. L.K.: sample collection and coordination. R. Kälviäinen: end point development and paper editing. J. Kaprio: study design, register expertise and paper editing. N.K.: performed fine-mapping on the eQTL Catalogue data and paper editing. J. Kettunen: data analysis and paper editing. E.K.: imputation and QC of R1–R5 datasets and paper editing. T. Kilpi: study design, overseeing the project and paper editing. K. Klinger: study design, overseeing the project and paper editing. V.-M.K.: study design, overseeing the project and paper editing. T. Kuopio: sample collection and coordination. V.K.: end point development and paper editing. T. Laisk: data analysis and paper editing. J.L.: study design, overseeing the project and paper editing. N.L.: study design, overseeing the project and paper editing. A. Liu: data analysis and paper editing. S. Longerich: study design, overseeing the project and paper editing. R.M.: genotype data preparation and paper editing. J. Mäkelä: sample collection and coordination. A. Mäkitie: end point development and paper editing. A. Malarstig: study design, overseeing the project and paper editing. A. Mannermaa: study design, overseeing the project and paper editing. J. Maranville: study design, overseeing the project and paper editing. A.
Matakidou: study design, overseeing the project and paper editing. T.M.: end point development and paper editing. S.V.M.: data analysis and paper editing. M.E.K.N.: study design, study coordination and paper editing. M. Niemi: tool and end point development and paper editing. T.N.: end point development and paper editing. C.J.O.: study design, overseeing the project and paper editing. M.O.: data analysis and paper editing. G.O.: study design, overseeing the project and paper editing. H.M.O.: data analysis and paper editing. A.P.: end point development and validation, data analysis, study design and paper editing. T. Palotie: clinical definitions and end point development and paper editing. J.P.: study design, overseeing the project, HLA imputation and paper editing. D.S.P.: study design, overseeing the project and paper editing. M. Pelkonen: end point development and paper editing. R.K.P.: data analysis and paper editing. S.P.: study design and paper editing. A. Pitkäranta: study design, overseeing the project and paper editing. A. Platt: study design, overseeing the project and paper editing. D.P.: data analysis and paper editing. E.P.: sample collection and coordination. P. Pussinen: end point definition and paper editing. N.R.: data analysis and paper editing. F.R.: data analysis and paper editing. D.R.: data analysis and paper editing. N.A.R.: data analysis and paper editing. B.R.-G.: study design, overseeing the project and paper editing. R.R.: study design, study coordination and paper editing. E. Saarentaus: paper editing, and ear, nose and throat end point interpretation. A.S.: end point development and paper editing. E. Salminen: end point development, study design and paper editing. V.S.: end point development, study design and paper editing. J.S.: end point development and paper editing. R. Serpi: sample collection and coordination. H.-y.S.: project coordination and paper editing. R. Siegel: study design and paper editing. 
K.S.: biobank expertise and paper editing. S. Siltanen: study design, overseeing the project and paper editing. S. Soini: biobank and legal expertise, and paper editing. H. Soininen: end point development, clinical expertise and paper editing. J.H.S.: data analysis and paper editing. I.T.: data analysis and paper editing. K.T.: end point development, clinical expertise and paper editing. P.T.: neurology end point definitions, paper review and paper editing. S.T.-S.: paper editing. T. Tukiainen: study design, data analysis and paper editing. T. Tuomi: end point development, study design and paper editing. J.A.T.: ophthalmology end point definitions, paper review and paper editing. J.C.U.: fine-mapping and paper editing. F.V.: end point development and paper editing. P.V.: study design, overseeing the project and paper editing. J.W.: study design, data analysis and paper editing. D.W.: study design and paper editing. R.Y.: study design and paper editing. M. Nelis: study design and paper editing. A.R.: study design and paper editing. A. Metspalu: study design, Head of the Estonian Biobank and paper editing. L.M.: study design and paper editing. T.E.: study design and paper editing. C.F.: study design, overseeing the project and paper editing. A.S.H.: paper editing, FinnGen end point concept and implementation and register data development. M. Perola: paper editing, study design, ethics expert and THL science co-ordinator. S. Ripatti: paper editing, study design and data analysis expertise. A.J.: paper editing and participation in strategic planning of the project. T. Laitinen: paper editing, study design and overseeing the project. T.P.M.: paper editing, study design and overseeing the project. R.P.: paper editing, study design and overseeing the project. M.M.: paper writing and editing, study design and overseeing the project. H.R.: paper writing and editing, study design and overseeing the project. 
M.J.D.: paper writing and editing, study design and overseeing the project. A. Palotie: paper writing and editing, study design and overseeing the project.

Corresponding author

Correspondence to Aarno Palotie.

Ethics declarations

Competing interests

J.C.U.: has received compensation for consulting from Goldfinch Bio and is an employee of Patch Biosciences. K.E.: consultation fees from Sobi and Orion Corporation. A. Palomäki: consulting fee from AbbVie, Amgen and Pfizer, lecture fee from Pfizer and Sanofi. H. Jacob: employee of AbbVie. F.R.: employee of AbbVie. B.R.-G.: employee of AbbVie. J.W.: employee of AbbVie. A. Matakidou: employee of AstraZeneca. D.S.P.: employee of AstraZeneca. S.P.: employee of AstraZeneca. A. Platt: employee of AstraZeneca. I.T.: employee of AstraZeneca. B.S.: employee of Biogen. C.-Y.C.: employee of Biogen. S.J.: employee of Biogen. H.R.: employee of Biogen. Z.D.: employee of Boehringer Ingelheim. J.-N.J.: employee of Boehringer Ingelheim. M.J.: employee of Boehringer Ingelheim. N.L.: employee of Boehringer Ingelheim. G.O.: employee of Boehringer Ingelheim. S.B.: employee of Bristol Myers Squibb. J. Maranville: employee of Bristol Myers Squibb. R.P.: employee of Bristol Myers Squibb. J.H.: employee of Genentech. R.K.P.: employee of Genentech. M.M.: employee of Genentech. K. Auro: employee of GlaxoSmithKline. M.G.E.: employee of GlaxoSmithKline. D.P.: employee of GlaxoSmithKline. R.Y.: employee of Janssen Biotech. D.W.: employee of Janssen Research & Development. R.R.G.: employee of Maze Therapeutics. E.M.G.: employee of Maze Therapeutics. S.V.M.: employee of Maze Therapeutics. A.H.-V.: employee of Merck Sharp & Dohme LLC, a subsidiary of Merck & Co., Inc., Rahway, NJ, USA. S. Longerich: employee of Merck Sharp & Dohme LLC, a subsidiary of Merck & Co., Inc., Rahway, NJ, USA. N.R.: employee of Merck Sharp & Dohme LLC, a subsidiary of Merck & Co., Inc., Rahway, NJ, USA. J.H.S.: employee of Merck Sharp & Dohme LLC, a subsidiary of Merck & Co., Inc., Rahway, NJ, USA. C.F.: employee of Merck Sharp & Dohme LLC, a subsidiary of Merck & Co., Inc., Rahway, NJ, USA. M.E.K.N.: employee of Novartis. C.J.O.: employee of Novartis. M.O.: employee of Novartis. N.R.: employee of Novartis. R. Siegel: employee of Novartis. Å.K.H.: employee of Pfizer. X.H.: employee of Pfizer. A. Malarstig: employee of Pfizer. K. Klinger: employee of Sanofi. D.R.: employee of Sanofi. J.P.: employee of the Finnish Red Cross Blood Service. A.G.E.: employee of University of Tartu. M.J.D.: founder of Maze Therapeutics. T. Kuopio: lecture fee from Amgen, Roche and MSD. J.A.T.: lecture fee from Blueprint Genetics Finland, and on the advisory board of Novartis Finland (unrelated to this work). P.T.: lecture or consulting fee from Alexion, Roche, Merck, Janssen-Cilag, Novartis and Biogen (unrelated to this work). M. Aavikko: lecturing fee and congress trip reimbursement from Ipsen and Novartis (unrelated to this work). A. Palotie: member of the Pfizer Genetics Scientific Advisory Panel. V.S.: received an honorarium from Sanofi for consulting, and ongoing research collaboration with Bayer (all unrelated to the present study). H. Joensuu: senior consultant for Orion Pharma Orion, Chairman of the Scientific Advisory Board in Neutron Therapeutics, received a fee for acting as the Chair of an ESMO meeting from Deciphera Pharmaceuticals, stock ownership in Orion Pharma and Sartar Therapeutics. The other authors declare no competing interests.

Peer review

Peer review information

Nature thanks Francesco Cucca, Lea Davis and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 FinnGen Age Distribution and Registers.

A) Distribution of the current age (age at the end of the follow-up) and age of death for FinnGen participants. B) Follow-up time and main coding used in each register among FinnGen participants in FinnGen release 5. Abbreviations: CANCER = The Finnish Cancer Registry; DEATH = Cause of death register; INPATIENT = HILMO - Care Register for Health Care: inpatient hospital visits; OUTPATIENT = HILMO - Care Register for Health Care: specialty outpatient visits and day surgeries; PURCHASE = Drug Purchases: all prescription drug purchases; REIMBURSEMENT = Drug Reimbursement: entitlements for prescription drug reimbursement for certain chronic diseases.

Extended Data Fig. 2 PCA classification of 224,737 FinnGen participants combined with 1000 Genomes samples (AFR, AMR, EAST, EUR, FIN, SAS).

FinnGen outlier samples were removed as deviating from the bulk of the FinnGen samples.

Extended Data Fig. 3 Comparison of effect sizes between biobanks.

A,B) Effect size (log(OR), beta) comparison of 275 genome-wide significant lead variants identified in FinnGen among 15 analysed diseases in Estonia and UKBB. The sign of beta is aligned to be positive in Estonia and UKBB. C,D) Beta comparison of variants only in known loci. E,F) Beta comparison of novel loci. Dashed lines indicate the identity line and solid lines are the regression lines (red line and text weighted by pooled standard error of betas).

Extended Data Fig. 4 Enrichment of 493 unique phenome-wide significant associations binned by NFSEE MAF and split by whether 95% credible sets contain a coding variant.

The p-values of the test of difference in average enrichment are shown on the right side of each MAF bin. Lines indicate the 95% confidence interval of the mean enrichment. Number of coding/non-coding variants in each bin: 21/27, 12/35, 11/22, 3/10, 7/24 and 19/277, given in the same order as on the figure x-axis.

Supplementary information

Supplementary Information

Comparison of effect sizes in known genome-wide significant loci between FinnGen and large published reference GWASs (Table 1). The y and x axes represent FinnGen and reference GWAS effect sizes, respectively. Beta values are aligned to be positive in reference studies. Lines extending from points indicate standard errors in the respective studies. Regression lines omit the intercept, and two types of regression are provided: unweighted and weighted by pooled standard errors from the two studies. The solid line indicates the identity line, and the dotted and dashed lines indicate unweighted and weighted regression, respectively. Only variants with P < 1 × 10⁻¹⁰ in the reference study were included to mitigate the effect of the winner’s curse of inflated beta values in the reference studies.
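
As a rough illustration of the two regression types described above, the following Python sketch fits a slope with no intercept, once unweighted and once weighted. The exact weighting used in the study is not specified here, so the function `pooled_se_weights` (inverse of the summed squared standard errors) is an assumption made for this example, and all numeric values are invented placeholders rather than study data.

```python
def slope_through_origin(x, y, weights=None):
    """Least-squares slope b of y = b * x, fitted without an intercept."""
    if weights is None:
        weights = [1.0] * len(x)  # unweighted case
    numerator = sum(w * xi * yi for w, xi, yi in zip(weights, x, y))
    denominator = sum(w * xi * xi for w, xi in zip(weights, x))
    return numerator / denominator


def pooled_se_weights(se_x, se_y):
    """Hypothetical weighting: 1 / (se_x^2 + se_y^2) for each variant."""
    return [1.0 / (sx * sx + sy * sy) for sx, sy in zip(se_x, se_y)]


# Invented placeholder values, not data from the study.
reference_beta = [0.10, 0.20, 0.40]   # x axis: reference GWAS effect sizes
finngen_beta = [0.12, 0.18, 0.44]     # y axis: FinnGen effect sizes
reference_se = [0.01, 0.02, 0.05]
finngen_se = [0.02, 0.02, 0.05]

unweighted_slope = slope_through_origin(reference_beta, finngen_beta)
weighted_slope = slope_through_origin(
    reference_beta,
    finngen_beta,
    pooled_se_weights(reference_se, finngen_se),
)
print(unweighted_slope, weighted_slope)
```

Under this sketch, a slope near 1 would mean that effect sizes agree on average between the two studies, which is what the identity line represents.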

Reporting Summary

Supplementary Data 1

List of variants, summary statistics and references for figures in the Supplementary Information file.

Supplementary Tables 1–12

Supplementary Tables 1–12 and table legends.

Supplementary Methods

The file contains the following sections: phenotyping from nationwide population-based health registers; FinnGen participant recruitment and legacy cohorts, genotyping and genotype data QC; population structure and cryptic relatedness; GWAS and PheWAS analysis; and data access and dissemination. All supplementary figures are inlined in the appropriate sections.

Supplementary Note 1

Discussion of noteworthy findings from 15 previously well-studied benchmark diseases.

Supplementary Note 2

A list of all FinnGen working-group members and their affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Kurki, M.I., Karjalainen, J., Palta, P. et al. FinnGen provides genetic insights from a well-phenotyped isolated population. Nature 613, 508–518 (2023). https://doi.org/10.1038/s41586-022-05473-8

Received: 10 January 2022
Accepted: 21 October 2022
Published: 18 January 2023
Issue Date: 19 January 2023
DOI: https://doi.org/10.1038/s41586-022-05473-8