Genomic prediction of alcohol-related morbidity and mortality

While polygenic risk scores (PRS) have been shown to predict many diseases and risk factors, the potential of genomic prediction in harm caused by alcohol use has not yet been extensively studied. Here, we built a novel polygenic risk score of 1.1 million variants for alcohol consumption and studied its predictive capacity in 96,499 participants from the FinnGen study and 39,695 participants from prospective cohorts with detailed baseline data and up to 25 years of follow-up time. A 1 SD increase in the PRS was associated with 11.2 g (=0.93 drinks) higher weekly alcohol consumption (CI = 9.85–12.58 g, p = 2.3 × 10^-58). The PRS was associated with alcohol-related morbidity (4785 incident events) and the risk estimate between the highest and lowest quintiles of the PRS was 1.83 (95% CI = 1.66–2.01, p = 1.6 × 10^-36). When adjusted for self-reported alcohol consumption, education, marital status, and gamma-glutamyl transferase blood levels in 28,639 participants with comprehensive baseline data from prospective cohorts, the risk estimate between the highest and lowest quintiles of the PRS was 1.58 (CI = 1.26–1.99, p = 8.2 × 10^-5). The PRS was also associated with all-cause mortality with a risk estimate of 1.33 between the highest and lowest quintiles (CI = 1.20–1.47, p = 4.5 × 10^-8) in the adjusted model. In conclusion, the PRS for alcohol consumption is independently associated with both alcohol-related morbidity and all-cause mortality. Together, these findings underline the importance of heritable factors in alcohol-related health burden while highlighting how measured genetic risk for an important behavioral risk factor can be used to predict related health outcomes.


Introduction
Alcohol drinking is a major dose-dependent contributor to morbidity and mortality. Globally, 3 million annual deaths (5% of all deaths) result from alcohol consumption, which is also linked to more than 200 disease and injury outcomes 1 . As ethanol is a psychoactive substance with addictive properties 2 , alcohol consumption can lead to the development of alcohol use disorders (AUDs), globally prevalent mental disorders of pathological addictive or abusive drinking patterns, which are linked to worse health outcomes, negative socioeconomic effects, and increased mortality 3 . There is a strong connection between the health burden and the level of alcohol consumed 4 , and in total, alcohol has been estimated to be the most damaging of all substances of abuse, in terms of harm caused to self and others 5 .
Alcohol-related behaviors are also affected by genetic factors and the estimated heritability of alcohol consumption in twin studies has ranged between 35% and 65% (weighted average 37%) 6 and its single nucleotide polymorphism-based heritability has been estimated to be 10% 7 . Recent large-scale genome-wide association studies (GWAS) have identified multiple loci associated with alcohol consumption, underlining the importance of large study populations for unraveling the genetic architecture underlying alcohol-related traits 7,8 . Similarly, GWAS of alcohol dependence, AUD, and the Alcohol Use Disorders Identification Test (AUDIT) scores have shown the traits to be genetically distinct but positively correlated [9][10][11] .
Polygenic risk scores (PRSs) derived from GWAS summary statistics have showcased improved performance in disease prediction 12 . PRSs for known risk factors have also been shown to associate with the related disease 13 , and recently associations between multiple risk factor PRSs and related traits were confirmed and reported 14,15 . However, the link between PRSs for behavioral traits and associated health outcomes remains poorly understood.
The assessment of potential health risks related to alcohol has so far relied on traditional risk factors, including family history, without explicit measurement of genetic risk. Here we developed a highly polygenic risk score for alcohol consumption and studied whether alcohol-related polygenic burden predicts alcohol-use disorders and other alcohol-related morbidity and mortality in Finnish biobank cohorts (n = 96,499) linked to electronic health records. Furthermore, we studied whether the PRS for alcohol consumption predicts alcohol-related outcomes beyond self-reported alcohol consumption and other related risk factors, thus providing more objective information independent of individual reporting bias or temporal fluctuations.

Study sample and definition of alcohol-related morbidity
The data comprise 96,499 Finnish individuals from FinnGen Data Freeze 2 (https://www.finngen.fi/), which includes prospective epidemiological and disease-based cohorts as well as hospital biobank samples (Contributors S1, Table S1). The data were linked by the unique national personal identification numbers to national hospital discharge, death, and medication reimbursement registries. Additional details and information on the genotyping and imputation are provided in the online-only Supplementary Information.
Alcohol-related baseline measures were available for a subset of the FinnGen dataset consisting of national population survey cohorts: FINRISK, collected in 1992, 1997, 2002, 2007, and 2012, and Health 2000, collected in 2000. The baseline data included self-reported information assessed by questionnaires, anthropometric measures, and blood samples. More detailed descriptions of the FINRISK and Health 2000 studies have been published previously 16,17 .
Additionally, three Finnish twin cohorts, FinnTwin12, NAG-FIN, and Old Twin, were pooled and analyzed as one dataset. For these datasets, cohort baseline data were available, but the cohorts were not linked to electronic health records. For details regarding the twin datasets, see the online descriptions (https://wiki.helsinki.fi/display/twineng/Twinstudy) 18,19 .

Genotyping and imputation
FinnGen, FINRISK, Health 2000, and Finnish Twin Cohort samples were genotyped with Illumina and Affymetrix genome-wide SNP arrays. Individuals with non-European ancestry or uncertain sex were excluded. Within each cohort, every genotyping batch was first imputed separately and then merged together for association analyses. The details about the genotype calling, quality controls, and imputation are provided in the Supplementary material (Methods S1).

Polygenic risk scores
Summary statistics from the largest existing GWAS meta-analysis on alcohol consumption 8 were used for constructing the PRS. To avoid overfitting, a separate ad hoc meta-analysis was performed by GSCAN (Contributors S2), excluding all Finnish and 23andMe samples (n = 527,282 after exclusions). The LDpred method 20 was used to account for linkage disequilibrium (LD) among loci, with whole-genome sequencing data on 2690 Finns serving as the external LD reference panel. We compared the PRSs generated with different LDpred parameters and their predictive ability in FINRISK (Fig S1). All thresholds above 0.003 performed practically similarly, and for simplicity we chose to use the LDpred-inf PRS in all the analyses. The final scores were generated with PLINK2 (ref. 21
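As a rough illustration of the final scoring step, a polygenic score can be viewed as a sum of effect-allele dosages weighted by per-variant effect sizes (here these would be the LD-adjusted LDpred weights). The sketch below is a simplification and not the PLINK2 implementation; the variant identifiers, weights, and dosages are hypothetical.

```python
def polygenic_score(weights, dosages):
    """Weighted sum of effect-allele dosages.

    weights: dict variant_id -> per-variant effect size (assumed LD-adjusted)
    dosages: dict variant_id -> effect-allele dosage in [0, 2]
    Variants without a dosage are skipped (a simplifying assumption).
    """
    return sum(w * dosages[v] for v, w in weights.items() if v in dosages)


# Hypothetical weights and one hypothetical individual.
weights = {"rs_a": 0.02, "rs_b": -0.01, "rs_c": 0.005}
person = {"rs_a": 2.0, "rs_b": 1.0, "rs_c": 0.0}

score = polygenic_score(weights, person)
print(round(score, 4))
```

With roughly 1.1 million variants the same sum is simply taken over many more terms, which is why dedicated tools are used in practice.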

Statistical analysis
The Cox proportional hazards model was used to estimate survival curves, hazard ratios (HRs), and 95% confidence intervals (95% CIs) in the survival analyses where age was used as the time scale. R's cox.zph function was used to test whether the proportional hazards assumption held in our models. Linear regression in FINRISK and Health 2000 and a linear mixed model in the Twin Cohort were used for estimating the relationship between the PRS and alcohol consumption. Logistic regression in the FINRISK and Health 2000 cohorts and a linear mixed model in the Twin Cohort were used to estimate the relationship between alcohol abstinence and the PRS.
All the cohorts (FinnGen, FINRISK, Health 2000, and the Twin Cohort) were analyzed independently as single datasets where age, sex, genotyping array, and the first ten principal components of ancestry were used as core covariates. Additionally, body mass was used as a covariate in the model estimating the PRS-alcohol consumption relationship. Self-reported weekly average alcohol consumption from the past year (when unavailable, the past week's consumption) was used as the estimate for alcohol consumption. In the fully adjusted survival model analyses, the log(x + 1)-transformed alcohol consumption estimate, current smoking status, binary higher education status, binary marital/cohabitation status, and gamma-glutamyl transferase (GGT) blood levels at baseline served as covariates. The GGT levels were measured following uniform recommendations of the European Committee for Clinical Laboratory Standards (ECCLS) 22 enabling comparability between the cohorts.
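To make the "age as the time scale" idea concrete, the following is a minimal sketch of a Cox partial likelihood with delayed entry for a single covariate. It is not the R analysis used in the study: the toy data, the function names, and the ternary-search optimizer are assumptions chosen for illustration, and ties are handled in the simple Breslow manner.

```python
import math


def cox_partial_loglik(beta, entry_age, exit_age, event, x):
    """Log partial likelihood for one covariate with age as the time scale.

    At each event age t, the risk set holds everyone who has already entered
    the study and is still under follow-up: entry_age < t <= exit_age.
    """
    ll = 0.0
    for i in range(len(x)):
        if not event[i]:
            continue
        t = exit_age[i]
        risk_set = [j for j in range(len(x)) if entry_age[j] < t <= exit_age[j]]
        ll += beta * x[i] - math.log(sum(math.exp(beta * x[j]) for j in risk_set))
    return ll


def fit_beta(entry_age, exit_age, event, x, lo=-5.0, hi=5.0, iters=100):
    """Maximize the concave log partial likelihood with a ternary search."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if (cox_partial_loglik(m1, entry_age, exit_age, event, x)
                < cox_partial_loglik(m2, entry_age, exit_age, event, x)):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


# Hypothetical toy data: baseline age, age at event or censoring,
# event indicator, and a standardized PRS.
entry_age = [40, 45, 50, 42, 55, 48]
exit_age = [60, 58, 70, 65, 62, 75]
event = [1, 1, 0, 1, 1, 0]
prs = [1.2, 0.8, -0.5, 0.1, 0.9, -1.0]

beta_hat = fit_beta(entry_age, exit_age, event, prs)
hr_per_sd = math.exp(beta_hat)  # hazard ratio per 1 SD of the PRS
print(round(beta_hat, 3), round(hr_per_sd, 3))
```

Using attained age rather than time since baseline means that individuals are compared with others of the same age, so age does not need a separate parametric term in the model.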
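The log(x + 1) transformation is a common way to compress a right-skewed consumption distribution while keeping abstainers (x = 0) defined, since log(0) is undefined but log(0 + 1) = 0. A minimal sketch, with hypothetical weekly intakes in grams:

```python
import math


def log_transform(grams_per_week):
    """log(x + 1) transform; math.log1p is accurate for small x."""
    return math.log1p(grams_per_week)


# Hypothetical weekly intakes in grams of pure alcohol.
weekly_grams = [0.0, 12.0, 84.0, 300.0]
transformed = [log_transform(g) for g in weekly_grams]
print([round(t, 3) for t in transformed])
```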
In the survival analyses, all prevalent cases (in FINRISK and Health 2000) and individuals with covariate missingness were excluded. The PRS was normalized and included as a continuous variable in the models. In the survival analysis, the highest and lowest genetic risk for alcohol consumption were compared using PRS quintiles.
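The two uses of the PRS described above (a standardized continuous variable and a quintile grouping) can be sketched as follows. The helper names and the example scores are hypothetical, and the use of the population standard deviation and rank-based quintile cut-offs are assumptions, as the text does not specify them.

```python
from statistics import mean, pstdev


def standardize(values):
    """Scale scores to mean 0 and SD 1 so effects are expressed per 1 SD."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def quintile_labels(values):
    """Assign labels 1..5 by rank so the groups have roughly equal size."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * 5 // len(values) + 1
    return labels


# Hypothetical raw scores for ten individuals.
raw_scores = [0.8, -1.2, 0.1, 2.3, -0.4, 1.5, -2.0, 0.6, -0.9, 0.3]
z_scores = standardize(raw_scores)
quintiles = quintile_labels(raw_scores)

# Individuals to contrast in a highest-versus-lowest quintile comparison.
lowest = [i for i, q in enumerate(quintiles) if q == 1]
highest = [i for i, q in enumerate(quintiles) if q == 5]
print(lowest, highest)
```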
In analyses using baseline consumption data, the analyses were performed separately in the Health 2000, FINRISK Study, and Twin Cohorts and then meta-analyzed using a fixed-effects model.
In risk prediction, FINRISK cohorts with at least 10 years of follow-up (from 1992 to 2002) were used to train the model, and the predictive performance was tested in the Health 2000 cohort. The maximal follow-up window was restricted to 10 years. The change in the predictive performance was assessed by comparing models with and without the PRS using the correlated C-index approach 23 along with calculating the continuous net reclassification improvement (NRI) 24 and integrated discrimination improvement (IDI) 25 . The Hosmer-Lemeshow goodness-of-fit test was used to test model calibration.
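A fixed-effects meta-analysis is commonly carried out with inverse-variance weighting; the text does not state the exact weighting scheme, so that choice is an assumption here. The sketch pools hypothetical cohort-level hazard ratios on the log scale, recovering each standard error from its 95% CI.

```python
import math


def log_hr_and_se(hr, ci_low, ci_high, z=1.96):
    """Convert an HR and its 95% CI to a log HR and its standard error."""
    return math.log(hr), (math.log(ci_high) - math.log(ci_low)) / (2 * z)


def fixed_effect_meta(estimates, std_errors):
    """Inverse-variance weighted pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


# Hypothetical cohort-level results (not values from the study).
cohort_a = log_hr_and_se(1.20, 1.05, 1.37)
cohort_b = log_hr_and_se(1.10, 0.95, 1.27)

pooled, pooled_se = fixed_effect_meta(
    [cohort_a[0], cohort_b[0]], [cohort_a[1], cohort_b[1]]
)
pooled_hr = math.exp(pooled)
ci = (math.exp(pooled - 1.96 * pooled_se), math.exp(pooled + 1.96 * pooled_se))
print(round(pooled_hr, 3), tuple(round(c, 3) for c in ci))
```

The pooled estimate always lies between the cohort estimates, and its standard error is smaller than that of any single cohort.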
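For orientation, the C-index measures how often, among comparable pairs, the individual with the earlier event also had the higher predicted risk. The sketch below computes Harrell's C for two hypothetical sets of risk scores (without and with the PRS). It does not implement the correlated C-index test, the NRI, or the IDI used in the study, and all data are invented.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when i had an event strictly before
    j's follow-up time. Tied risk scores count as half concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable


# Hypothetical follow-up times (years), event indicators, and risk scores.
times = [2, 4, 5, 7, 9]
events = [1, 1, 0, 1, 0]
risk_without_prs = [0.3, 0.5, 0.2, 0.4, 0.1]
risk_with_prs = [0.9, 0.6, 0.3, 0.5, 0.1]

c_without = harrell_c(times, events, risk_without_prs)
c_with = harrell_c(times, events, risk_with_prs)
print(c_without, c_with, c_with - c_without)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering; whether a difference between two models is statistically meaningful requires a dedicated test such as the correlated C-index approach cited above.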

Ethical approval
The study was conducted in accordance with the principles of the Declaration of Helsinki. Written informed consent was obtained from all the study participants. For the Finnish Institute for Health and Welfare (THL)-driven FinnGen preparatory project and FinnGen project, all patients and control subjects had provided informed consent for biobank research, based on the Finnish Biobank Act. Alternatively, FINRISK and Health 2000 cohorts were based on study-specific consents and later transferred to the THL Biobank after approval by Valvira, the National Supervisory Authority for Welfare and Health. Recruitment protocols followed the biobank protocols approved by Valvira.

Cohorts
Our primary dataset (FinnGen) comprises 96,499 unrelated individuals (54,262 women) with a total of 55,484,114 person-years of registry-based follow-up and 4785 first-observed alcohol-related major health events. Alcohol consumption estimates were available for a total of 39,695 individuals from the prospective cohorts (FINRISK, Health 2000, and Twin Cohort, Fig. S2). Two cohorts, FINRISK and Health 2000, have full registry data and information on self-reported alcohol consumption and related baseline data, and consist of 28,639 individuals (94.5% of the participants after excluding 964 prevalent alcohol-related morbidity cases), with 424,053 person-years of registry-based follow-up and 988 first-ever alcohol-related events (Table 1). The interview-based DSM-IV AUD-status was available in a subset of the Twin cohort for 713 cases and 1460 controls.

Alcohol consumption
In a meta-analysis of the three cohorts with alcohol consumption estimates available (n = 39,695), the PRS for alcohol consumption was strongly associated with self-reported alcohol consumption. A 1 SD increase in the PRS was associated with an 11.2 g (=0.93 drinks of 12 g each) increase in weekly pure alcohol intake (beta = 11.2 [9.85-12.6 g], p = 2.3 × 10^-58) (Fig. 1, cohort-specific figures: Fig. S3). Adding the PRS to the model improved r^2 by ~0.6 percentage points (from 9.17% to 9.80%). In addition, the PRS was negatively associated with alcohol abstinence (reported alcohol consumption 0). In FINRISK and Health2000, a 1 SD increase in the PRS for alcohol consumption was associated with a 13.

Alcohol-related morbidity
The PRS for alcohol consumption was strongly associated with increased risk for lifelong major alcohol-related events derived from electronic health records in the FinnGen dataset (n = 96,499, cases = 4785) (Fig. 2). The difference in the risk for alcohol-related morbidity events between the lowest and highest risk quintiles in the PRS was 83% (HR = 1.83 [1.66-2.01], p = 1.6 × 10^-36). In the cohorts where alcohol consumption estimates and other related baseline data were available at the cohort entry time, the PRS was associated with an increased risk of incident major alcohol-related events and the association was maintained also in the fully adjusted model (n = 28,639, cases = 911). In a meta-analysis of the two cohorts, 1 PRS SD was associated with a 26% increased risk of incident alcohol-related events when the consumption estimate was not in the model (HR = 1.26 [1.18-1.34], p = 1.1 × 10^-12) and with a 15% increase when alcohol consumption was in the model (HR = 1.15 [1.08-1.22], p = 2.1 × 10^-5). In a fully adjusted model, including marital status, education, smoking status, and GGT, the estimate was unchanged (HR = 1.15 [1.08-1.22], p = 2.0 × 10^-5) (Table 2). The risk estimate between the highest and lowest quintiles of the PRS in the fully adjusted model was 1.58 (HR = 1.58 [1.26-1.99], p = 8.2 × 10^-5).

Mortality
We observed a similar increase in the risk of alcohol-related and all-cause mortality. In FinnGen with 7249 deaths, a 1 SD increase in the PRS for alcohol consumption was associated with an 8% increase in the risk of death (HR  Table 2). Similarly, the PRS was associated with a higher risk of death from other than alcohol-related causes (n = 3790) when fully adjusted for all covariates (HR = 1.08 [1.05-1.12], p = 1.4 × 10^-6).

DSM-IV alcohol-use disorder
The PRS was also associated with an interview-based DSM-IV AUD diagnosis in the Nicotine Addiction Genetics Family cohort (440 cases, 1140 controls) and a subset of FinnTwin16 cohort (273 cases, 320 controls). A meta-analysis of the two cohorts (713 cases) resulted in a combined 20% increase in the prevalence of AUD

Prediction
The predictive performance of the PRS was evaluated in the Health 2000 cohort (5732 complete cases, 110 events) with a follow-up window of 10 years based on the Cox model trained in the FINRISK cohort (18,427 complete cases with ≥ 10 years of follow-up, 628 events). In a model not including the alcohol consumption estimate, adding

Table 2 Cohort-specific and meta-analyzed associations between the alcohol consumption PRS and alcohol-related (a) morbidity and (b) mortality.

Discussion
We developed a highly polygenic risk score for alcohol consumption by obtaining weights from a recently published large-scale discovery sample and showed that the PRS was strongly associated with alcohol consumption in independent biobank cohort samples. An increased polygenic burden for alcohol consumption was associated with higher incidence of major alcohol-induced health events. The associations remained significant when we accounted for self-reported alcohol consumption and other relevant covariates; in a fully adjusted model, the relative risk estimate between the highest and lowest quintiles of the PRS was 1.6. Furthermore, the PRS was also associated with alcohol-related, non-alcohol-related, and all-cause mortality.
Our PRS shows the utility of genetic information for prediction of alcohol-related harm. The PRS, developed from a genetic analysis of cross-sectional self-reported alcohol consumption, was associated with future risk of major alcohol-related health events. While a large number of PRSs have already been established for various traits and diseases 12 , the development of PRSs for behavioral traits, such as substance use, has until now been limited [26][27][28][29] and the studies have not assessed their impact on future major health events.
Our results show that using a large sample size with long follow-up, we were able to build a PRS of alcohol consumption that is associated not only with alcohol consumption in independent samples, but also with future incident alcohol-related health events. In line with the knowledge that alcohol consumption is a major contributor to the worldwide burden of death, especially among working-age adults 1 , we found the PRS to be associated also with all-cause mortality, further highlighting the importance of alcohol drinking as a cause of premature death.
Our score provides a genetic basis for potentially identifying a subset of high-risk individuals even early on in life, with potential for more targeted prevention of AUDs and other alcohol-related morbidity. Prevention is a cost-effective and efficient strategy to reduce alcohol-related harms 30 and it is listed among the United Nations' main health-related worldwide strategies of sustainable development (https://sustainabledevelopment.un.org/sdg3). A higher genetic predisposition for alcohol-related harms was detected both in the presence and absence of alcohol consumption data, as our PRS predicted alcohol-related harms beyond self-reported alcohol consumption. Health services are encouraged to support initiatives for screening and brief interventions for harmful drinking 31 as an effective strategy for tackling alcohol-related harm 32 . Thus, genetic information could potentially be used to improve the arsenal of possible strategies to detect high-risk individuals for targets of brief interventions. The fact that individuals in the highest PRS quintile showed an elevated risk for alcohol-related health events even in fully adjusted models could justify the use of genetic information even in clinical settings where a detailed history of alcohol consumption estimates, AUDIT-scores, or similar information is attainable. Communicating the information of higher risk for alcohol-related harm to patients could serve as a motivator for reducing drinking or committing to abstinence. However, the true effects of informing patients about their alcohol-related genomic risk warrant further research.
Self-reported alcohol consumption is known to be biased and problematic in terms of reliability and validity for predicting alcohol-related risks 33,34 . Also, GGT is known to be a less-than-ideal biochemical measure of drinking 35 . Some inaccuracy derives from true measurement error, but another source is the lifelong temporal fluctuation of alcohol-drinking patterns not captured by a measure at one single timepoint. Our PRS was associated with alcohol-related harms even when adjusting for the self-reported alcohol consumption estimate. One potential reason for this is that the PRS contains information from the latent genetic predisposition for alcohol consumption, thus overriding both the true measurement error and temporal fluctuations in alcohol drinking volume.
Furthermore, it has been hypothesized that alcohol consumption-based genetic discovery might inform more about low-level drinking than about problematic drinking and AUDs 36 . However, we built a PRS for alcohol consumption and successfully used it to predict alcohol-related harms. Due to the robustness of a self-reported single timepoint alcohol consumption estimate and the fact that different alcohol-related traits are to some degree genetically distinct [9][10][11] , it is expected that a PRS developed directly for alcohol-related morbidity will outperform our PRS in predicting alcohol-related health burden. Supporting this assumption, the general pattern is that PRSs are more strongly associated with their respective diseases than with related phenotypes 14,15 . Unfortunately, no high-quality summary statistics for alcohol-related harms including both somatic and psychiatric outcomes yet exist; the performed GWAS have only covered AUD and alcohol dependence 10,11 and have been smaller in size than our discovery sample of choice, thus making future efforts for large-scale GWAS discovery based on alcohol-related harms more than necessary.
Our PRS was derived using European ancestry discovery samples and tested in the Finnish population. Its applicability in other populations therefore needs further evaluation as the alcohol-related genetic mechanisms may vary between populations. However, it has to be noted that the PRS derived from a non-Finnish sample performed well in the Finnish dataset, even though Finns are somewhat genetically different from the rest of the Europeans 37 .
Our design allowed us to study outcomes prospectively. Our registry-based follow-up captures alcohol-related outpatient and inpatient visits, withdrawal treatment prescription for alcoholism, and deaths, thus covering major alcohol-related health events over several decades. Nonetheless, some of the milder cases of alcohol-related health problems could have gone undetected.
In conclusion, a PRS for alcohol consumption was associated with elevated risk for incident alcohol-related health events and all-cause mortality. These findings underline the importance of heritable factors driving alcohol-related behavior. A successful attempt to predict alcohol-related health outcomes with a PRS shows promise in possible future utilization of genetic information in risk estimation and prediction of alcohol-related harms.