Alcohol consumption and mate choice in UK Biobank: comparing observational and Mendelian randomization estimates

Alcohol use is correlated within spouse-pairs, but it is difficult to disentangle the effects of alcohol consumption on mate selection from social factors or cohabitation leading to spouses becoming more similar over time. We hypothesised that genetic variants related to alcohol consumption may, via their effect on alcohol behaviour, influence mate selection. Therefore, in a sample of over 47,000 spouse-pairs in the UK Biobank we utilised a well-characterised alcohol-related variant, rs1229984 in ADH1B, as a genetic proxy for alcohol use. We compared the phenotypic correlation between spouses for self-reported alcohol use with the association between an individual’s self-reported alcohol use and their partner’s rs1229984 genotype using Mendelian randomization. This was followed up by an exploration of the spousal genotypic concordance for the variant. We found strong evidence that both an individual’s self-reported alcohol consumption and rs1229984 genotype are associated with their partner’s self-reported alcohol use. The Mendelian randomization analysis found that each unit increase in an individual’s weekly alcohol consumption increased their partner’s alcohol consumption by 0.29 units (95% C.I. 0.20, 0.38; P=2.15×10⁻⁹). Furthermore, the rs1229984 genotype correlated within spouse-pairs, suggesting that some spousal correlation existed prior to cohabitation. Although the SNP is strongly associated with ancestry, our results suggest that this concordance is unlikely to be explained by population stratification. Overall, our findings suggest that alcohol behaviour directly influences mate selection.


Introduction
Human mate choice is highly non-random; spouse-pairs are generally more phenotypically similar than would be expected by chance [1][2][3][4][5][6] . Previous studies suggest that alcohol related phenotypes, ranging from consumption to alcohol dependence, are highly correlated within spouse-pairs [7][8][9][10][11][12][13] . However, the extent to which the spousal correlation is due to the effect of alcohol behaviour on mate selection (assortative mating) is currently unclear. Indeed, the spousal correlation may be related to assortment on other social and environmental factors (social homogamy) or a consequence of an individual's partner influencing their alcohol behaviour after the individuals have paired up (partner interaction effects) [11][12][13] . The mechanism explaining spousal concordance for alcohol consumption could have important implications. For example, partner interactions over time explaining the spousal concordance would suggest that public health policy may benefit from focusing on couples rather than individuals to reduce population level alcohol intake. One biological mechanism that partially explains the phenotypic concordance between spouse-pairs is that they are on average more genetically similar across the genome than non-spouse-pairs 14 . Genotypes implicated in the aetiology of height, education, blood pressure and several chronic diseases have been shown to be correlated within spouse-pairs [15][16][17][18] . It is not known whether genetic variants implicated in alcohol metabolism, via their effect on alcohol behaviour, contribute to mate selection.
Notably, genetic variants in the Alcohol Dehydrogenase (ADH) and Aldehyde Dehydrogenase (ALDH) gene families are associated with differences in alcohol consumption. For example, ADH1B is involved in the production of enzymes that oxidise alcohol and so individuals with certain alleles may find alcohol consumption unpleasant, resulting in lower intake. Similarly, a genetic variant in ALDH2, rare in non-east Asian populations, is associated with a "flush reaction" to alcohol 28 29 .
Alcohol consumption-related genetic variants can be useful to determine the most likely explanation for the spousal phenotypic correlation for alcohol use, by analogy with Mendelian randomization studies [30]. Genetic variants for alcohol consumption are in theory less susceptible to confounding from socioeconomic and behavioural factors than measured alcohol consumption, so they can be used to rule out the possibility that social homogamy is driving the spousal phenotypic correlation [30][31]. The timing of the effects of alcohol consumption can be discerned by evaluating the spousal genotypic correlation for alcohol use-related variants. Genotypic correlation would imply that an effect exists prior to pairing, suggesting that some degree of the spousal phenotypic correlation is attributable to assortative mating (Figure 2).
In this study we aimed to explore spousal similarities for alcohol consumption using observational and genetic data. First, we estimated the association of an individual's self-reported alcohol use with the self-reported alcohol use of their partner. Second, we used a Mendelian randomization framework to estimate the effect of an individual's alcohol use on their spouse's alcohol use. Here, we used their partner's rs1229984 genotype, a missense mutation in ADH1B strongly associated with alcohol consumption, as an instrumental variable for self-reported alcohol consumption. Third, we estimated the association of rs1229984 genotype between spouses, to evaluate the timing of possible causal effects, and to investigate the possibility of bias from population stratification. As a positive control, to demonstrate the validity of derived spouse-pairs and the usage of a Mendelian randomization framework, we also analysed height, known to be correlated between spouses, using similar methods.

UK Biobank
UK Biobank is a large-scale cohort study, including 502,655 participants aged 40 to 69 years. Study participants were recruited from 22 recruitment centres across the United Kingdom between 2006 and 2010 [32].

European sub-sample and spouse pairs
Spouse information is not explicitly available, therefore we used similar methods to previous studies [15][16][17] to identify spouse-pairs in the UK Biobank. Firstly, the data-set was restricted to a subset of 463,827 individuals of recent European descent. Individuals of non-European descent were removed based on a k-means cluster analysis on the first 4 genetic principal components 33 .
Next, household sharing information was used to extract pairs of individuals who (a) report living with their spouse, (b) report the same length of time living in the house, (c) report the same number of occupants in the household, (d) report the same number of vehicles, (e) report the same accommodation type and rental status, (f) have identical home coordinates (rounded to the nearest km), (g) are registered with the same UK Biobank recruitment centre and (h) both have available genotype data. If more than two individuals shared identical information across all variables, these individuals were excluded from analysis. At this stage, we identified 52,471 potential spouse-pairs. We excluded 4,866 potential couples who were the same sex (9.

Height
At baseline, the height (cm) of UK Biobank participants was measured using a Seca 202 device at the assessment centre. Measured height was used as a positive control for the application of a Mendelian randomization framework in the context of assortative mating.

Self-reported alcohol variables
At baseline, study participants completed a questionnaire. Participants were asked to describe their current drinking status (never, previous, current, prefer not to say) and estimate their current alcohol intake frequency (daily or almost daily, three or four times a week, once or twice a week, one to three times a month, special occasions only, never, prefer not to say). Individuals reporting a current intake frequency of at least "once or twice a week" were asked to estimate their average weekly intake of a range of different alcoholic beverages (red wine, white wine, champagne, beer, cider, spirits, fortified wine).
From these variables, we derived three measures: ever or never consumed alcohol (current or former against never), a binary measure of current drinking for self-reported current drinkers (three or more times a week against less than three times a week) and an average intake of alcoholic units per week, derived by combining the self-reported estimated intakes across the five drink types, as in a previous study [24]. The questionnaire used the following measurement units for each of the five alcoholic drink types: measures for spirits, glasses for wines and pints for beer/cider, which were estimated to be equivalent to 1, 2 and 2.5 units respectively. Individuals reporting current intake frequency of "one to three times a month", "special occasions only" or "never" (for whom this phenotype was not collected) were assumed to have a weekly alcohol consumption volume of 0.
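The derivation of weekly units can be illustrated with a minimal sketch. The dictionary keys and function name are hypothetical, and treating fortified wine as a wine (2 units per glass) is an assumption; only the conversion factors and the zero-volume rule come from the text.

```python
# Units per reported drink, from the text: spirits measure = 1, wine glass = 2,
# pint of beer/cider = 2.5. Treating fortified wine as a "wine" is an assumption.
UNITS_PER_DRINK = {
    "red_wine": 2.0,
    "white_wine_champagne": 2.0,
    "fortified_wine": 2.0,
    "beer_cider": 2.5,
    "spirits": 1.0,
}

# Intake frequencies for which weekly volume was not collected and is set to 0.
ZERO_VOLUME_FREQUENCIES = {
    "one to three times a month",
    "special occasions only",
    "never",
}


def weekly_units(frequency, drinks_per_week):
    """Return weekly alcohol units, or None when required data are missing."""
    if frequency in ZERO_VOLUME_FREQUENCIES:
        return 0.0
    if drinks_per_week is None:
        return None
    total = 0.0
    for drink, units in UNITS_PER_DRINK.items():
        count = drinks_per_week.get(drink)
        if count is None:  # any missing drink type makes the total undefined
            return None
        total += units * count
    return total


# Hypothetical participant: 3 glasses of red wine, 4 pints of beer, 2 spirit measures.
example = weekly_units(
    "once or twice a week",
    {"red_wine": 3, "white_wine_champagne": 0, "fortified_wine": 0,
     "beer_cider": 4, "spirits": 2},
)
```

In this illustrative case the total is 3×2 + 4×2.5 + 2×1 units.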

Genotyping
488,377 UK Biobank study participants were assayed using two similar genotyping arrays, the UK BiLEVE Axiom™ Array by Affymetrix (N= 49,950) and the closely-related UK Biobank Axiom™ Array (N= 438,427). Directly genotyped variants were pre-phased using SHAPEIT3 [36] and then imputed using Impute4 with the UK10K [37], Haplotype Reference Consortium [38] and 1000 Genomes Phase 3 [35] reference panels. Post-imputation, data were available on approximately 96 million genetic variants [39][40].

Phenotypic spousal correlation for height
To verify the validity of the derived spouse-pair sample, we evaluated the spousal phenotypic correlation for height. Previous studies have found strong evidence of spousal correlation for height, so comparable results would be consistent with derived spouses being genuine. The spousal phenotypic correlation was estimated using a linear regression of an individual's height against the height of their partner, adjusting for sex. With one unique phenotype pairing within couples, each individual in the data-set was included only once as either the reference individual or their partner.

Phenotypic spousal correlation for self-reported alcohol behaviour
To evaluate the phenotypic correlation for alcohol use, we compared self-reported alcohol behaviour between spouses. We estimated the spousal correlation for the two binary measures (ever or never consumed alcohol, three or more times a week) using a logistic regression of the relevant variable for an individual against the relevant variable for their partner, adjusting for sex. Similarly, linear regression was used to estimate the spousal correlation for continuous weekly alcohol consumption volume, adjusting for sex. Spouse-pairs with any missing phenotype data, or where one or more spouses reported their weekly alcohol consumption volume to be more than five standard deviations away from the mean (calculated using the sample of individuals with non-zero weekly drinking), were removed from relevant analyses.
With one unique phenotype pairing within couples, each individual in the data-set was included only once as either the reference individual or their partner.
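The exclusion rule for missing and outlying values can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function name is hypothetical, and computing the mean and standard deviation from non-zero drinkers within the spouse-pair sample itself (rather than a wider sample) is an assumption.

```python
from statistics import mean, stdev


def remove_outlier_pairs(pairs, n_sd=5.0):
    """Drop pairs with missing data or with a member > n_sd SDs from the mean.

    `pairs` is a list of (units_a, units_b) tuples; None marks missing data.
    The mean and SD are computed among individuals with non-zero intake.
    """
    complete = [p for p in pairs if None not in p]
    nonzero = [u for p in complete for u in p if u > 0]
    centre, spread = mean(nonzero), stdev(nonzero)
    return [
        p for p in complete
        if all(abs(u - centre) <= n_sd * spread for u in p)
    ]


# Hypothetical data: 30 typical couples, one extreme value, one missing value.
pairs = [(10, 12)] * 30 + [(10, 400), (None, 5)]
kept = remove_outlier_pairs(pairs)
```

Note that an extreme value inflates the standard deviation it is judged against, so the rule only flags values in reasonably large samples.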

Mendelian randomization: Genetically influenced height and measured height of partner
We validated the application of a Mendelian randomization approach to assortative mating using height as a positive control; genotypes influencing height have previously been demonstrated to be highly correlated between spouse-pairs [15]. As a measure of genetically influenced height, we started with 382 independent SNPs, generated using LD clumping (r² < 0.001) in MR-Base [41], from a recent Genome-wide Association Study (GWAS) of adult height in Europeans [42].
For the purposes of the Mendelian randomization analysis, we restricted analyses to spouse-pairs with complete measured height data and genotype data.
First, we estimated the association between the 382 SNPs and height in the same individual, using the spouse-pair sample with sex included as a covariate. We removed 23 SNPs that were not strongly associated with height (P > 0.05) or that had inconsistent directions of effect between our sample and the GWAS summary statistics. Second, we estimated the association between the 359 remaining SNPs and spousal height. PLINK [43] was used to estimate the SNP-phenotype associations, also including sex as a covariate. We then estimated the effect of a 1 cm increase in an individual's height on their partner's height using the TwoSampleMR R package [41] and the internally derived weights described above. The fixed-effects Inverse-Variance Weighted (IVW) method was used as the primary analysis. Cochran's Q test and the I² statistic were used to test for heterogeneity in the fixed-effects IVW [44].
MR Egger [45] was used to test for directional pleiotropy. The weighted median [46] and mode [47] were used to test the consistency of the effect estimate. With two unique pairings between genotype and phenotype in each couple, each individual in the data-set was included twice as both the reference individual and as the partner.
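For readers unfamiliar with the primary analysis, the fixed-effects IVW estimate and its heterogeneity statistics can be sketched from per-SNP summary statistics. This is a generic illustration using the standard first-order weights, not the TwoSampleMR implementation; the function name and input values are hypothetical.

```python
import math


def ivw_fixed_effects(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effects IVW estimate with Cochran's Q and I-squared.

    Each SNP contributes a ratio estimate beta_outcome / beta_exposure,
    weighted by (beta_exposure / se_outcome) ** 2.
    """
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    weights = [(bx / se) ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    total_weight = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    std_error = math.sqrt(1.0 / total_weight)
    # Cochran's Q: weighted squared deviations of the ratios from the estimate.
    q_stat = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    df = len(ratios) - 1
    # I-squared: share of variation beyond that expected by chance (floored at 0).
    i_squared = (q_stat - df) / q_stat if q_stat > df else 0.0
    return {"estimate": estimate, "se": std_error, "q": q_stat, "i2": i_squared}


# Hypothetical summary statistics for three SNPs with identical ratio estimates.
homogeneous = ivw_fixed_effects([0.1, 0.2, 0.4], [0.02, 0.04, 0.08], [0.01] * 3)
# Two hypothetical SNPs whose ratio estimates disagree.
heterogeneous = ivw_fixed_effects([0.1, 0.1], [0.01, 0.03], [0.01, 0.01])
```

When all SNPs imply the same ratio, Q is close to zero and I² is zero; disagreement between SNPs raises both.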

Mendelian randomization: Genetically influenced alcohol consumption volume and self-reported alcohol consumption of partner
We then applied the Mendelian randomization framework to investigate if an individual's genotype at rs1229984 in ADH1B affects the self-reported alcohol consumption volume of their partner. Given the rarity of individuals homozygous for the minor allele in European populations (the MAF is 2.9% in the 1000 Genomes CEU population [35]), we assumed a dominant model, consistent with previous studies [48][49]. We restricted analysis to spouse-pairs where both members had genotype data, and one or more members had self-reported alcohol consumption volume.
First, we estimated the association of the rs1229984 genotype with alcohol consumption in the same individual after adjusting for sex. Second, we estimated the association between rs1229984 and spousal alcohol consumption after adjusting for sex. PLINK [43] was used to estimate the SNP-phenotype associations. We then estimated the effect of a 1 unit increase in an individual's weekly alcohol consumption volume on the same variable in their partner. The Wald ratio estimate was obtained using the mr_wald_ratio function in the TwoSampleMR R package [41] using internally derived weights. Sensitivity analyses were limited due to the use of a single genetic instrument. With two unique pairings between genotype and phenotype in each couple, each individual in the data-set was included twice as both the reference individual and as the partner.
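The Wald ratio with a single instrument is the SNP-outcome association divided by the SNP-exposure association. A minimal sketch follows, using the common first-order standard error that ignores uncertainty in the SNP-exposure estimate; the function name and the input values are hypothetical and are not results from this study.

```python
def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Wald ratio estimate with a first-order standard error and 95% CI.

    beta_exposure: SNP association with the individual's own alcohol intake.
    beta_outcome:  SNP association with the partner's alcohol intake.
    se_outcome:    standard error of beta_outcome.
    """
    estimate = beta_outcome / beta_exposure
    std_error = se_outcome / abs(beta_exposure)
    ci = (estimate - 1.96 * std_error, estimate + 1.96 * std_error)
    return estimate, std_error, ci


# Hypothetical inputs: carriers drink 4 units/week less, and their partners
# drink 1.2 units/week less (SE 0.2).
estimate, std_error, ci = wald_ratio(-4.0, -1.2, 0.2)
```

Because both associations share the same sign here, the ratio is positive: a lower intake in the individual corresponds to a lower intake in the partner.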

Spousal genotypic correlation for rs1229984 genotype
We then investigated properties of the rs1229984 variant in the UK Biobank that may be relevant to assortative mating. Starting with the UK Biobank subset of 463,827 individuals of recent European descent, we removed 78,540 related individuals (relevant methodology has been described previously [33]) and tested Hardy-Weinberg Equilibrium (HWE) in the resulting sample of 385,287 individuals.
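As an illustration of what an HWE test checks, the sketch below compares observed genotype counts with those expected from the allele frequency using a one-degree-of-freedom chi-square test. The text does not state which HWE test was used (exact tests are a common alternative), so the choice of test, the function name and the example counts are all assumptions.

```python
import math


def hwe_chi_square(n_major_hom, n_het, n_minor_hom):
    """Chi-square test (1 df) of Hardy-Weinberg Equilibrium from genotype counts."""
    n = n_major_hom + n_het + n_minor_hom
    p = (2 * n_major_hom + n_het) / (2 * n)  # major allele frequency
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_major_hom, n_het, n_minor_hom)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts with a deficit of heterozygotes relative to HWE.
chi2, p_value = hwe_chi_square(30, 40, 30)
```

A deficit of heterozygotes, as in this toy example, is the pattern expected under population substructure or assortative mating.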
We then investigated the association of the SNP with genetic principal components and birth coordinates. As a sensitivity analysis we also restricted the sample to a more homogeneous sample of white British individuals, provided by the UK Biobank, and repeated analyses. With one unique genotype pairing within couples, each individual in the data-set was included only once as either the reference individual or their partner.
We then estimated the genotypic concordance between derived spouse-pairs for rs1229984 genotype using logistic regression, again assuming a dominant model.
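Under a dominant model each genotype is reduced to carrier status, and for an unadjusted logistic regression with one binary predictor the odds ratio equals the cross-product ratio of the 2×2 table. The sketch below uses that equivalence. Whether the authors included covariates is not stated, so the unadjusted form is an assumption, and the function names and counts are hypothetical.

```python
import math


def carrier(minor_allele_count):
    """Dominant coding: 1 if at least one minor allele, else 0."""
    return 1 if minor_allele_count >= 1 else 0


def spousal_carrier_odds_ratio(pairs):
    """Odds ratio (with Woolf 95% CI) for spousal concordance in carrier status.

    `pairs` is a list of (index_genotype, partner_genotype) minor allele counts.
    """
    table = {(1, 1): 0, (1, 0): 0, (0, 1): 0, (0, 0): 0}
    for g_index, g_partner in pairs:
        table[(carrier(g_index), carrier(g_partner))] += 1
    a, b, c, d = table[(1, 1)], table[(1, 0)], table[(0, 1)], table[(0, 0)]
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci = (
        math.exp(math.log(odds_ratio) - 1.96 * se_log_or),
        math.exp(math.log(odds_ratio) + 1.96 * se_log_or),
    )
    return odds_ratio, ci


# Hypothetical couples: 10 concordant carriers, 40 discordant, 400 non-carriers.
pairs = [(1, 1)] * 9 + [(2, 1)] + [(1, 0)] * 20 + [(0, 1)] * 20 + [(0, 0)] * 400
odds_ratio, ci = spousal_carrier_odds_ratio(pairs)
```

The sketch assumes every cell of the table is non-empty; with a rare variant and a small sample this would need checking first.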
As a sensitivity analysis, we then investigated the possibility that spousal concordance for rs1229984 was driven by fine-scale assortative mating due to geography, which is itself associated with genetic variation within the UK [50][51].
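The Discussion describes this sensitivity analysis as excluding spouse-pairs born more than 100 miles apart. A minimal sketch of such a filter is given below, assuming birth coordinates are planar grid eastings and northings in metres; the coordinate format, function names and example values are assumptions, not details given in the text.

```python
import math

METRES_PER_MILE = 1609.344


def born_within(pair_coords, max_miles=100.0):
    """True if both spouses' birthplaces lie within max_miles of each other.

    `pair_coords` is ((easting_a, northing_a), (easting_b, northing_b)) in metres.
    """
    (e1, n1), (e2, n2) = pair_coords
    return math.hypot(e1 - e2, n1 - n2) <= max_miles * METRES_PER_MILE


def restrict_by_birthplace(pairs, max_miles=100.0):
    """Keep only spouse-pairs born within max_miles of each other."""
    return [p for p in pairs if born_within(p, max_miles)]


# Hypothetical couples born 50 km apart (kept) and 400 km apart (excluded).
couples = [
    ((400000, 300000), (450000, 300000)),
    ((400000, 300000), (400000, 700000)),
]
kept = restrict_by_birthplace(couples)
```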

Phenotypic spousal correlation for self-reported alcohol behaviour
The majority of derived spouse-pairs had complete data for relevant self-reported alcohol behaviour phenotypes. Strong evidence was found for phenotypic correlation between spouse-pairs for all self-reported alcohol variables. Amongst 47,510 spouse-pairs, an individual self-reporting as a never-drinker was associated with increased odds (OR 14.06; 95% C.I. 11.95, 16.50; P<10⁻¹⁶) of their partner self-reporting as a never-drinker. Similarly, when restricting to 42,844 pairs who both reported being current-drinkers, an individual drinking three or more times a week had increased odds (OR 6.64; 95% C.I. 6.34, 6.94; P<10⁻¹⁶) of their partner also drinking three or more times a week.
For self-reported alcohol consumption volume, 47,510 spouse-pairs had either complete phenotype data or reported their consumption frequency as less than weekly (in which case their weekly volume was assumed to be 0). After removing 189 pairs with outlying values from one or more members, the final sample included 47,321 spouse-pairs. In this sample, each unit increase in an individual's weekly alcohol consumption volume was associated with a 0.38-unit increase (95% C.I. 0.37, 0.38; P<10⁻¹⁶) in the same variable in their partner.

Genetically influenced height and height of partner
The application of Mendelian randomization to spousal height was consistent with the previous evidence for assortative mating on height. Across

Mendelian randomization framework: Genetically influenced alcohol consumption and self-reported alcohol behaviour of partner
To evaluate the degree to which an individual's alcohol consumption is affected by their partner's genetically influenced alcohol consumption, we used the same sample of 47,321 spouse-pairs from the previous phenotypic correlation analysis. In this sample, individuals with two copies of the ADH1B major allele

Characteristics of rs1229984 in the UK Biobank
In suggesting that population substructure differences may explain the HWE results.
The SNP was found to be strongly associated with both genetic principal components and birth coordinates in both samples. In the less restrictive European  Table 2).

Discussion
In this study, we used a large sample of derived spouse-pairs in a UK-based cohort to demonstrate that an individual's self-reported alcohol use and their genotype for an alcohol-implicated variant, rs1229984 in ADH1B, are associated with their partner's self-reported alcohol use. Furthermore, we showed that the genotype of a variant influencing alcohol metabolism, rs1229984, is correlated within spouse-pairs. There are three possible explanations for our findings. First, that rs1229984 influences alcohol behaviour, which has a downstream effect on mate selection.
Second, that a participant's alcohol use is influenced by their partner's alcohol use.
Third, that given the strong association of the SNP with both genetic principal components and birth coordinates, the spousal concordance is related to factors influencing social homogamy, independent of alcohol behaviour, such as place of birth, ancestry or socio-economic status. Indeed, the allele frequency of rs1229984 was found to deviate between European and white British subsets of the UK Biobank.
However, we presented evidence suggesting that a substantial proportion of the spousal concordance is likely to be explained by the biological effects of the variant on alcohol consumption. Firstly, we have tested the association with a causal SNP for alcohol consumption, and not the measured consumption itself, thereby avoiding any post-birth confounding factors; this suggests that alcohol use has a direct effect on spousal alcohol use. Secondly, because rs1229984 is correlated between spouses, there must be some degree of assortment on alcohol consumption prior to cohabitation. This suggests that the spousal correlation cannot be entirely due to the effect of the individual's alcohol consumption behaviour on their spouse's behaviour. Thirdly, in a sensitivity analysis, we controlled for shared ancestry, which could have induced confounding, by excluding spouse-pairs born more than 100 miles apart, and the within sub-population effect estimates remained consistent.
The strong evidence for spousal-correlation on the variant has implications for conventional Mendelian randomization studies (i.e. estimating the causal effect of an exposure on an outcome) 30 which use the SNP as a genetic proxy for alcohol intake 48 . Assortative mating could lead to a violation of the Mendelian randomization assumption, that the genetic instrument for the exposure is not strongly associated Interestingly, the minor allele of rs1229984 (i.e. associated with lower alcohol consumption) has been previously found to be positively associated with years in education 48 and socio-economic related variables, such as the Townsend deprivation index and number of vehicles in household 54 55 . These associations may be down-stream causal effects of alcohol consumption, which implies that some of the spousal concordance for alcohol consumption could be explained by assortative mating on educational attainment 15 or alternatively these associations may reflect maternal genotype and intrauterine effects 56 . Over time, assortative mating on alcohol consumption may further strengthen the associations between rs1229984 and socio-economic related variables 53 . Of further interest is that the variant has previously been shown to be under selection 57  populations. This is because of potential contextual influences; for example, in East Asian populations, males are much more likely to consume alcohol than females 59 60 . Additionally, there is some evidence that the effect of genetic contributors to alcohol varies across different populations 27 .
To conclude, our results suggest that there is non-random mating on rs1229984 in ADH1B, likely related to the effect of the variant on alcohol behaviour.
These results suggest that alcohol use influences mate selection and argue for a more nuanced approach to considering social and cultural factors when examining causality in epidemiological studies. Further research investigating other alcohol-implicated variants, and other societies and ethnicities, would strengthen these conclusions.

(2) Mendelian randomization framework. An association between an individual's alcohol-influencing genotype and their spouse's alcohol use would suggest that the spousal correlation is explained by either assortative mating or partner interaction effects. Genetic variants are unlikely to be associated with socio-economic confounders, suggesting that social homogamy is unlikely.
(3) Genotypic correlation. Genotypic correlation for alcohol related genetic variants would suggest that some degree of the spousal correlation is explained by assortative mating. Partner interaction effects cannot lead to genotypic correlation because genotypes are fixed from birth.