Introduction

Smoking is a risk factor for many serious diseases and a leading cause of preventable deaths worldwide.1 Genome-wide association studies relying on common markers have yielded loci associating with smoking behavior, including genomic regions containing genes encoding various subunits of nicotinic acetylcholine receptors (nAChRs) on chromosomes 15q25 (CHRNA5/CHRNA3/CHRNB4),2, 3 8p11 (CHRNB3/CHRNA6)3 and recently 20q11 (CHRNA4).4 Sequence variants within the CHRNA5/CHRNA3/CHRNB4 cluster on chromosome 15q25 have been shown to associate with the number of cigarettes smoked per day (CPD),2, 5 nicotine dependence (ND)2, 6 and the smoking-related diseases lung cancer (LC),2, 7, 8, 9 peripheral arterial disease (PAD),2 chronic obstructive pulmonary disease (COPD)10 and upper aerodigestive tract cancer.11 The role of the CHRNA5/CHRNA3/CHRNB4 locus for smoking, ND and the consequences of smoking was established through human genetics approaches, and these initial findings have paved the way for basic research aimed at elucidating the underlying mechanisms.12

nAChR channels are pentameric and composed of various types of subunits. Twelve α subunits and three β subunits have been cloned, and various isoforms containing both α4 and β2 subunits are the most abundant in mammalian brain.13 nAChRs have a large dynamic range with respect to agonist activation, which is partially explained by the presence of heteromers that differ in subunit stoichiometry and in sensitivity. Measurements indicate that (α4)3(β2)2 is a low-sensitivity subtype (LS), whereas (α4)2(β2)3 is of high sensitivity (HS).14 Chronic nicotine exposure upregulates nAChRs in a complex process involving molecular chaperoning,15, 16 and upregulation of α4β2 receptors by nicotine favors the HS form.17 Lowered sensitivity to psychoactive drugs is generally believed to correlate with increased risk of substance dependence, and it has been suggested that increased expression of LS relative to HS nAChRs in the central nervous system would increase risk of ND.18

Results of animal studies indicate that both α4 and β2 subunits influence key addiction-related processes.13 CHRNA4 and CHRNB2 have long been considered among the strongest functional candidate genes in the genome when it comes to nicotine addiction and related phenotypes, and an efficacious smoking cessation aid (varenicline) was developed as a partial agonist of the α4β2 isoforms.19 A large number of candidate gene studies have targeted CHRNA4 and CHRNB2, but the results have been inconclusive. In a companion paper4 we show that common markers within CHRNA4 exhibit genome-wide significant association with score on the Fagerström Test of Nicotine Dependence (FTND),20 the most widely used measure of nicotine addiction. The genome-wide association study of FTND was limited to the study of common variants,4 and here we study rare variants within CHRNA4, by testing eight missense variants in CHRNA4 for association with FTND.

Materials and Methods

Study populations

The Icelandic studies include a number of ongoing projects in Iceland. All Icelandic studies were approved by the Icelandic Data Protection Authority and the National Bioethics Committee. All subjects who donated samples also signed informed consent. Personal identifiers of the patients and biological samples were encrypted by a third-party system provided by the Icelandic Data Protection Authority. In addition, replication studies were performed by genotyping samples from a number of studies conducted elsewhere. All subjects in these studies gave written informed consent and are of European origin. The association studies of the variants were performed using genotypes from whole-genome sequence data and imputation approaches previously outlined.21 Further details of the various studies involved are provided below (also see Results and Tables).

Smoking phenotypes

The discovery sample utilized FTND data from questionnaires answered by participants in deCODE’s study of ND.2, 3 Responses to FTND questions generate a score of 0–10, with higher scores representing greater ND.20 Information on smoking quantity (SQ) was also utilized; SQ was obtained from a standardized smoking questionnaire used in deCODE’s studies that asks: ‘How many cigarettes per day do/did you smoke on average (on most days)?’ This means that current smokers report their current consumption and former smokers refer to their consumption in the past. In cases where multiple records were available we recorded the maximum. The SQ was categorized into four levels (1–10, 11–20, 21–30 and 31+ CPD). A total of 40 573 subjects (34 200 chip-typed) were included in the R336C analysis. Ever smokers were recruited in the years 1997–2014 as part of various Icelandic studies, and the FTND questionnaires were administered to participants in the study of ND (2004–2014). Analysis of subjects with complete FTND data showed that of those reporting 1–10 CPD, 85% had a total score of 3 or less on the FTND, and <3% scored above 6 on the FTND. For the meta-analysis of common marker association with FTND,4 we therefore added 4313 chip-typed low-quantity smokers (CPD<11) who had not answered the additional FTND items to the FTND 0–3 group. In our studies of R336C we included additional low-quantity smokers in this category, bringing the total number in the category to 18 184, with 15 527 chip-typed. The numbers of subjects per category for the R336C analysis were:

FTND 0–3 (mild ND): 18 184 (15 527 chip-typed)

FTND 4–6 (moderate ND): 2008 (1980 chip-typed)

FTND 7–10 (severe ND): 1106 (1085 chip-typed).
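The SQ levels and FTND severity groups described above amount to simple binning rules. The following Python sketch is illustrative only: the function names are hypothetical, and the fallback that places low-quantity smokers (CPD<11) without an FTND score in the mild group mirrors the description in the text rather than any published code.

```python
def max_reported_cpd(records):
    """Use the maximum when several CPD records exist for one subject."""
    return max(records)


def sq_level(cpd):
    """Map cigarettes per day to the four SQ levels: 1-10, 11-20, 21-30, 31+."""
    if cpd < 1:
        raise ValueError("SQ levels are defined for smokers (CPD >= 1)")
    if cpd <= 10:
        return 1
    if cpd <= 20:
        return 2
    if cpd <= 30:
        return 3
    return 4


def ftnd_group(ftnd_score=None, cpd=None):
    """Return 'mild' (0-3), 'moderate' (4-6) or 'severe' (7-10) ND.

    If no FTND score is available, low-quantity smokers (CPD < 11) are
    assigned to the mild group; everyone else is left unclassified (None).
    """
    if ftnd_score is not None:
        if not 0 <= ftnd_score <= 10:
            raise ValueError("FTND score must lie between 0 and 10")
        if ftnd_score <= 3:
            return "mild"
        if ftnd_score <= 6:
            return "moderate"
        return "severe"
    if cpd is not None and 1 <= cpd < 11:
        return "mild"
    return None
```

For example, a subject with CPD records of 5, 15 and 12 would be assigned SQ level 2, because the maximum (15) falls in the 11–20 category.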

Smoking-related diseases

Lung cancer

Iceland The Icelandic LC study has been described previously.2 The primary source of information on the Icelandic LC cases is the Icelandic Cancer Registry, which covers the entire population of Iceland (http://www.cancerregistry.is). Briefly, according to the Icelandic Cancer Registry, a total of 4560 LC patients were diagnosed from 1 January 1955 to 31 December 2012. Recruitment of both prevalent and incident cases was initiated in 1998 and is ongoing, and DNA samples from LC cases are subjected to whole-genome genotyping as they are collected. The LC associations were based on 1437 chip-typed LC cases (median age 70 years, range 16–97, 48% males) and 84 086 chip-typed controls in addition to 2572 LC cases and 137 443 controls who had at least one of their first- or second-degree relatives chip-typed. To study early-onset LC we considered subjects diagnosed before reaching an age of 58 years.

Nijmegen, The Netherlands The Dutch study population was previously described.22 The 634 patients with LC were identified through the population-based cancer registry of the Comprehensive Cancer Center IKO, Nijmegen, the Netherlands, and recruited through several independent studies.22 The 4670 cancer-free controls were selected from participants of the ‘Nijmegen Biomedical Study’.23 All controls are of self-reported European descent. The study protocols of the Nijmegen Biomedical Study were approved by the Institutional Review Board of the Radboud University Nijmegen Medical Centre.

Zaragoza, Spain The Spanish study population was previously described.22 The 455 LC cases were recruited from the Oncology Department of Zaragoza Hospital in Zaragoza, from June 2006 to June 2009. The 1444 Spanish controls were patients at the University Hospital in Zaragoza for diseases other than cancer, between November 2001 and May 2007. Study protocols were approved by the Institutional Review Board of Zaragoza University Hospital.

Denver, Colorado DNA samples from blood and clinical data were provided by the University of Colorado Cancer Center under COMIRB protocol 08–0380. Blood samples were collected from 1217 patients diagnosed with different diseases enrolled in any of 20 clinical research trials carried out at Colorado SPORE protocols between 1993 and 2008. Of these 1217 patients, 246 were LC cases and 971 had never had LC at the time of sample shipment. LC cases were identified either from data matches with the Colorado Central Cancer Registry or by having malignant lung tissue collected via enrollment in a surgical protocol. Work in this study was limited to cases and controls of self-reported European ancestry (195 cases and 798 controls).

Norway Comprehensive details of cases and controls have been previously published.24 Briefly, a systematic series of 426 patients with histologically proven non-small cell LC treated with surgery were ascertained through university hospitals in Oslo or Bergen between 1986 and 2001. All of the cases were either smokers at the time of surgery or ex-smokers. Healthy controls were recruited from a general health survey of individuals conducted by the National Health Surveys in the Oslo area (HUBRO) between 2000 and 2001, or in western Norway. Among them, smokers without any known history of cancer were randomly selected and frequency matched to the cases on age, smoking dose (pack-years) and gender. Cases and controls were interviewed using questionnaires that allowed collection of the same demographic and lifestyle information. Both cases and controls were Norwegians of European descent. The study was approved by the regional ethical committee in accordance with the Helsinki Declaration.

Chronic obstructive pulmonary disease

Iceland COPD was defined broadly and included diagnoses of chronic bronchitis or emphysema made between 1980 and 2011 at the Landspitali University Hospital. A total of 8246 subjects born between 1883 and 1964 were included, with most (5871) of the confirmed COPD diagnoses made during the years 1998–2011. The actual age at onset of COPD is difficult to assess, as patients have usually had signs and symptoms for a long time before admission to hospital. To define an early-onset group we used year of birth as a surrogate, using the group born between 1940 and 1964 (N=1470) representing the youngest 19.6% of patients. Association analyses were based on 3857 chip-typed cases, and 3639 patients with at least one first- or second-degree relative chip-typed.

United Kingdom A total of 206 subjects with COPD defined by spirometry (forced expiratory volume (FEV)/forced vital capacity (FVC) <0.7, FEV1% <80%), over 40 years of age, and 338 controls were drawn from the Nottingham Smokers Study.25

Sweden Participants were drawn from a population study on COPD in Norrbotten (Olin study), Sweden.26 Those who answered the questionnaire were followed up with spirometry and a blood sample; the sample set comprised 581 COPD patients (defined by spirometry (FEV1/FVC<0.7 and FEV1<80% of predicted values) or (FEV1/FVC<0.7)) and 918 controls.

Denmark Among participants in an LC screening study of 4104 current or previous smokers (2004–2006),27 708 were diagnosed with COPD based on spirometry (FEV1/FVC<0.7) and emphysema based on high-resolution computed tomography scan.

Peripheral arterial disease

Iceland The study population has been described previously.2 Briefly, Icelandic individuals with PAD were recruited from a registry of individuals diagnosed with PAD at the Landspitali University Hospital, during the years 1983–2011. The PAD diagnosis was confirmed by vascular imaging or segmental pressure measurements. To study early-onset PAD we considered subjects diagnosed with PAD before reaching age 65.

Abdominal aortic aneurysm

Iceland Icelanders with abdominal aortic aneurysms (AAAs) were recruited from a registry of individuals who were admitted either for emergency repair of symptomatic or ruptured AAA, or for an elective surgery to the Landspitali University Hospital, in Reykjavik, Iceland, in 1980–2011. Individuals (with an unruptured aneurysm ≥30 mm) were also recruited through a high-risk screening of first-degree relatives of AAA patients. To study early-onset AAA we considered subjects diagnosed with AAA before reaching age 65.

Single-track assay single-nucleotide polymorphism genotyping Single single-nucleotide polymorphism (SNP) genotyping was carried out by deCODE Genetics in Reykjavik, Iceland, applying the Centaurus (Nanogen) platform.7 The same assay was used to obtain genotypes for rs56175056 on three COPD sample sets, from the United Kingdom, Sweden and Denmark, as well as LC cases and controls from the Netherlands, Spain, the United States and Norway.

Illumina SNP chip genotyping The Icelandic chip-typed samples were assayed with the Illumina HumanHap300, HumanCNV370, HumanHap610, HumanHap1M, HumanHap660, Omni-1, Omni 2.5 or Omni Express bead chips (Illumina, San Diego, CA, USA) at deCODE genetics. SNPs were excluded if they had (i) yield lower than 95%, (ii) minor allele frequency <1% in the population, (iii) significant deviation from Hardy–Weinberg equilibrium in the controls (P<0.001), (iv) an excessive inheritance error rate (over 0.001) or (v) a substantial difference in allele frequency between chip types (in which case the SNP was removed from just a single chip if that resolved all differences, but from all chips otherwise). All samples with a call rate below 97% were excluded from the analysis. For the HumanHap series of chips, 308 840 SNPs were used for long-range phasing, whereas for the Omni series of chips 642 079 SNPs were included.
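The exclusion criteria (i) to (iv) and the sample call-rate cutoff can be written as a pair of predicates. This is a minimal Python sketch under stated assumptions: the `SnpStats` container and function names are hypothetical, and criterion (v), the allele-frequency difference between chip types, is omitted because the text gives no numeric threshold for it.

```python
from dataclasses import dataclass


@dataclass
class SnpStats:
    yield_rate: float         # fraction of samples with a genotype call
    maf: float                # minor allele frequency in the population
    hwe_p: float              # Hardy-Weinberg P-value in controls
    inheritance_error: float  # inheritance error rate


def snp_passes_qc(s):
    """Apply criteria (i)-(iv); a SNP failing any one of them is excluded."""
    return (
        s.yield_rate >= 0.95             # (i) yield not lower than 95%
        and s.maf >= 0.01                # (ii) MAF not below 1%
        and s.hwe_p >= 0.001             # (iii) no HWE deviation at P<0.001
        and s.inheritance_error <= 0.001  # (iv) error rate not over 0.001
    )


def sample_passes_qc(call_rate):
    """Samples with a call rate below 97% are excluded."""
    return call_rate >= 0.97
```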

Whole-genome sequencing and SNP calling SNPs were imputed based on whole-genome sequence data from about 2636 Icelanders, selected for various neoplastic, cardiovascular, metabolic and psychiatric conditions. The sample preparation, DNA-sequencing methodology, alignment and BAM file generation have been described previously.21

Genotype imputation methods

Long-range phasing and genotype imputation Long-range phasing of all chip-genotyped individuals was performed with methods described previously,21 and the sequence variants were imputed into chip-typed individuals and their close relatives using methods21 based on IMPUTE.28 In brief, phasing is achieved using an iterative algorithm, which phases a single proband at a time given the available phasing information about everyone else that shares a long haplotype identically by state with the proband. Given the large fraction of the Icelandic population that has been chip-typed, accurate long-range phasing is available genome wide for all chip-typed Icelanders. The informativeness of genotype imputation was estimated by the ratio of the variance of imputed expected allele counts and the variance of the actual allele counts:

I=Var[E(θ|chipdata)]/Var(θ)

where θ ∈ {0,1} is the allele count. Var[E(θ|chipdata)] was estimated by the observed variance of the imputed expected counts and Var(θ) was estimated by p(1−p), where p is the allele frequency. For further information on the imputation methodology see previous descriptions of our methods.21
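The information measure I can be computed directly from a list of imputed expected allele counts, one per haplotype. The sketch below is a hypothetical illustration, not the authors' software; in particular, estimating the allele frequency p as the mean of the expected counts is an assumption made here for self-containment.

```python
def imputation_info(expected_counts):
    """I = Var[E(theta | chip data)] / (p * (1 - p)) for per-haplotype counts in [0, 1].

    Assumption: p is estimated as the mean of the imputed expected counts.
    """
    n = len(expected_counts)
    p = sum(expected_counts) / n
    if not 0.0 < p < 1.0:
        raise ValueError("allele frequency must lie strictly between 0 and 1")
    # Observed (population) variance of the imputed expected counts.
    var_imputed = sum((x - p) ** 2 for x in expected_counts) / n
    return var_imputed / (p * (1.0 - p))
```

With perfectly resolved counts (every value 0 or 1) the ratio equals 1, and when every haplotype receives the same expected count (no information beyond the frequency) it equals 0.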

Association analysis

The details of our methodology for association analysis have been described,21 but a brief description is provided below.

Case:control association testing Logistic regression was used to test for association between SNPs and disease, treating disease status as the response and expected genotype counts from imputation or allele counts from direct genotyping as predictor. Testing was performed using the likelihood ratio statistic. When testing for association using the imputed genotypes, controls were matched to cases based on the informativeness of the imputed genotypes, such that for each case controls of matching informativeness were chosen. Failing to match cases and controls will lead to a highly inflated genomic control factor, and in some cases may lead to spurious false-positive findings.
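As an illustration of the test described above, the following self-contained Python sketch fits a logistic model with a single predictor (expected or observed allele count) by Newton-Raphson and computes the likelihood ratio statistic against the intercept-only model. It is a simplified sketch, not the authors' implementation: function names are hypothetical, the matching of controls to cases on imputation informativeness is not shown, and the P-value assumes a chi-square distribution with one degree of freedom.

```python
import math


def log_likelihood(b0, b1, x, y):
    """Bernoulli log-likelihood of a logistic model with intercept b0, slope b1."""
    total = 0.0
    for xi, yi in zip(x, y):
        eta = b0 + b1 * xi
        # Numerically stable evaluation of log(1 + exp(eta)).
        total += yi * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return total


def fit_logistic(x, y, with_slope=True, max_iter=100, tol=1e-10):
    """Newton-Raphson fit; with_slope=False fits the intercept-only null model."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            mu = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu           # score for the intercept
            g1 += (yi - mu) * xi    # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        if with_slope:
            det = h00 * h11 - h01 * h01
            d0 = (h11 * g0 - h01 * g1) / det
            d1 = (h00 * g1 - h01 * g0) / det
        else:
            d0, d1 = g0 / h00, 0.0
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def likelihood_ratio_test(x, y):
    """Return (log odds ratio per allele, LR statistic, P-value on 1 df)."""
    b0_null, _ = fit_logistic(x, y, with_slope=False)
    b0, b1 = fit_logistic(x, y)
    stat = 2.0 * (log_likelihood(b0, b1, x, y) - log_likelihood(b0_null, 0.0, x, y))
    stat = max(0.0, stat)  # guard against tiny negative rounding error
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square tail, 1 df
    return b1, stat, p_value
```

Here `x` holds the allele counts (fractional values are allowed for imputed genotypes) and `y` the disease status coded 0 or 1; the odds ratio is `exp(b1)`.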

Quantitative traits A generalized form of linear regression was used to test the correlation between the variants tested and quantitative traits (FTND and CPD) in Iceland. The generalized form assumes that the smoking behavior of related individuals is correlated proportional to the kinship between them rather than assuming that the smoking phenotypes of all individuals are independent.

Inflation factor adjustment In order to account for the relatedness and stratification within the case and control sample sets we applied the method of genomic control based on chip-typed markers using a subset of about 300 000 common variants. Quoted P-values were adjusted accordingly. The inflation factors for the various analyses are listed in the Supplementary Material.
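A common form of genomic control estimates the inflation factor from the median of the chi-square statistics of the reference markers and divides each test statistic by it. The Python sketch below uses that median-based estimator as an assumption; the text does not state which estimator was used, and the function names and the convention of never deflating (λ ≥ 1) are likewise illustrative choices.

```python
import math
from statistics import median

# Median of the chi-square distribution with one degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def inflation_factor(chi2_stats):
    """Median-based genomic control factor (assumed estimator), floored at 1."""
    return max(1.0, median(chi2_stats) / CHI2_1DF_MEDIAN)


def adjusted_p_value(chi2_stat, lam):
    """P-value on 1 df after dividing the test statistic by the inflation factor."""
    return math.erfc(math.sqrt(chi2_stat / lam / 2.0))
```

In use, `inflation_factor` would be applied to the statistics of the roughly 300 000 common chip-typed variants, and `adjusted_p_value` to the statistic of the variant of interest.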

Analysis of replication data Association analysis for each replication study group was carried out using Fisher’s exact test, and results for different study groups were combined using the Mantel–Haenszel exact test.
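For a single study group, Fisher's exact test on a 2×2 table of carrier status by case status can be sketched in a few lines of Python. The two-sided definition used here (summing the probabilities of all tables no more likely than the observed one) is one standard convention and is an assumption about the analysis; the function name is hypothetical, and the Mantel–Haenszel step that combines study groups is not shown.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P-value for the table [[a, b], [c, d]].

    Rows could be cases/controls and columns carriers/non-carriers.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k carriers among the first row.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    # Sum all tables at most as probable as the observed one
    # (small relative tolerance for floating-point ties).
    total = sum(
        prob(k) for k in range(k_min, k_max + 1) if prob(k) <= p_obs * (1 + 1e-9)
    )
    return min(1.0, total)
```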

Results

To search for rare variants likely to influence function in CHRNA4 we analyzed whole-genome sequence data from 2636 Icelandic subjects, identifying eight non-synonymous variants present in high enough frequency to allow imputation into our long-range phased data set21 (usually approximately five carriers in our sequence data, currently corresponding to an allele frequency of about 0.1%), generating genotypes for 104 220 chip-typed Icelanders and 294 212 close relatives. Eight variants met our criteria, and following imputation they were tested for association with the FTND score (Table 1) with the significance threshold set at P=6.25 × 10−3 (0.05/8). The mutation encoding R336C associates with FTND score (P=1.2 × 10−4), whereas none of the other mutations tested showed significant association (Table 1). To improve imputation accuracy, we typed 1441 subjects for the R336C variant using a single-marker assay (Nanogen) and repeated the imputation process. As a result of this effort the imputation information increased from 0.939 to 0.999, and all results reported here were based on re-imputed data. On the basis of 1408 (original imputation) or 1441 individuals (re-imputation) with both typed and imputed genotypes the imputation accuracy for this variant as estimated by the correlation with typed genotypes changed from 0.946 (original imputation) to 0.997 (re-imputation).
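The significance threshold quoted above is a Bonferroni correction for eight tests. As a small worked check in Python (variable and function names are illustrative):

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold controlling the family-wise error rate."""
    return alpha / n_tests


threshold = bonferroni_threshold(0.05, 8)  # 0.05/8 = 6.25e-3
r336c_p = 1.2e-4                           # reported P-value for R336C
is_significant = r336c_p < threshold
```

The reported P-value for R336C lies well below the corrected threshold, which is why it is the only one of the eight variants declared significant.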

Table 1 Association of eight rare missense variants in CHRNA4 with FTND

There is some evidence for a founder effect as the frequency of R336C (0.24%) is higher in the Icelandic population than in any other population with a reported frequency. In the ExAC Browser (www.exac.broadinstitute.org), 31 carriers of R336C are reported in non-Finnish European populations, indicating an allele frequency of 0.046%. The variant was not detected in Asian or Latino populations, and 2 carriers from African populations (f=0.019%) are reported.

To explore this association further we also examined association of the variant coding for R336C substitution with several sub-phenotypes related to nicotine addiction and the health consequences of smoking (Table 2). In selecting the smoking phenotypes and smoking-related diseases to study we were guided by previously observed associations between the key variant in CHRNA5, rs1051730/rs16969968 and CPD, LC, PAD, COPD and upper aerodigestive tract cancers.12 In addition, we have observed association of rs1051730-A with AAAs in Iceland (odds ratio (OR)=1.19, P=0.009). As a subtype of ND we also include the heavy smoking index,29 a measure that combines two of the key questions on the FTND,20 SQ and the time to first cigarette after waking up, with a score of 4 or higher being considered high and comparable to an FTND score of ~6.30 For the smoking-related diseases we also considered the effect of the variant in early-onset cases separately (Materials and Methods).

Table 2 CHRNA4 R336C confers risk of heavy smoking and smoking-related diseases

All the smoking-related diseases with observed associations with rs1051730/rs16969968 show some evidence of association with R336C (Table 2), and the effects are larger for early-onset groups. Many of the phenotypes tested are correlated, such that the individual associations with the different smoking-related diseases cannot be considered independent. To assess the overall significance of the effect that the variant has on the smoking-related diseases we combined the lists and assessed the whole group for association with R336C. This is justified by the fact that the difference between the smoking-related diseases can be looked upon as variation in expressivity of one environmental factor, namely tobacco smoke. The overall P-values obtained from this analysis are 6.8 × 10−5 and 2.1 × 10−7 for the total and early-onset smoking-related diseases, respectively, providing an estimate of the combined strength of the evidence for association with the health consequences of smoking (Table 2).

Although the associations with smoking-related diseases do not represent a fully independent replication of the FTND result, we note that because we often lack detailed information on smoking for subjects with smoking-related diseases there is very little overlap between the group of subjects with FTND data and those with smoking-related diseases. For example, only 140 subjects are both on the early-onset smoking-related disorders list and among the subjects scoring 4 or higher on the heavy smoking index, representing ~8% of the heavy smoking index cases and 4.7% of the subjects with early-onset smoking-related diseases. The overlap between these lists is so small because only a few of the patients with early-onset smoking-related disorders have filled out the FTND questionnaire. Hence, despite the correlation between smoking behavior and the smoking-related diseases our results for FTND and the smoking-related diseases are not due to direct confounding as there is very little overlap between the study groups. Furthermore, as was the case for CHRNA5 (refs 2, 31, 32) the observed effect on FTND and CPD alone is not expected to fully explain the observed OR for smoking-related diseases.

The associations between the R336C variant and the various smoking-related phenotypes are not explained by associations with common markers nearby, nor does the variant explain the association with rs2273500, the SNP most strongly associated in the meta-analysis of FTND,4 as the rare missense variant is fixed on the protective background of rs2273500. For the results of detailed conditional analyses see Supplementary Material.

We attempted replication studies of the findings for the R336C variant, focusing on LC (1710 cases and 7670 controls) and COPD (1495 cases and 2180 controls), using a single SNP assay for genotyping (Table 3). Combining both phenotypes, the OR was 2.3 (P=0.25). The low allele frequencies in these populations mean that we lack power to obtain meaningful results with the sample sizes studied, with only 10 carriers observed in 13 055 samples (average allele frequency of 0.038%).
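The average allele frequency quoted above follows from the carrier count if every carrier is assumed to be heterozygous, so that each of the 13 055 diploid samples contributes two alleles. A short Python check under that assumption (the function name is illustrative):

```python
def allele_frequency_from_carriers(n_carriers, n_samples):
    """Allele frequency assuming diploid samples and heterozygous carriers."""
    return n_carriers / (2 * n_samples)


# LC and COPD replication cases and controls combined.
n_samples = 1710 + 7670 + 1495 + 2180
freq_percent = 100 * allele_frequency_from_carriers(10, n_samples)
```

Here `n_samples` evaluates to 13 055 and `freq_percent` to roughly 0.038, matching the figures in the text.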

Table 3 Results of replication studies for R336C

Discussion

A recent study of rare variants in CHRNA4 based on a comparison of sequence data for exon 5 for 1209 cases and 1183 controls observed that rare non-synonymous variants were underrepresented in nicotine-dependent subjects, and the authors concluded that such variants might be protective against ND.33 In the case of R336C, our results clearly show the opposite, namely that the variant encoding this mutation confers risk of ND and its consequences on health, and comparison of the results for the various variants such as P451L and R336C reveals a range of effect sizes (Table 1), suggesting that treating all rare missense variants in CHRNA4 alike is not a good approach. We note that only one carrier of R336C was observed by Xie et al.,33 a case with ND.

In a follow-up to the aforementioned study,33 three variants of human CHRNA4 (encoding the substitutions R336C, P451L and R487Q) were subjected to detailed functional studies in Xenopus laevis oocytes.18 Electrophysiological studies found changes in the activation by nAChR agonists in channels containing these α4 rare variants, including changes following incubation with nicotine. The authors concluded that sequence variation at CHRNA4 alters the assembly and expression of human α4β2 nAChRs, resulting in receptors that are influenced to a greater extent by nicotine exposure than channels containing the common α4 variant. Specifically, incubation of oocytes transfected with the α4R336C variant with nicotine for 24 h had the same effect on nicotine-evoked currents as observed with ACh, revealing a shift in agonist effect that produced a large LS response. Both α4R336C and α4R487Q showed a shift from a monophasic, HS relationship to biphasic activation, with both variants gaining LS activation components. The size of the LS component was particularly large for α4R336C (see Figure 4 in McClure-Begley et al.18). Such an effect was not seen for α4P451L,18 consistent with the lack of association with FTND for the P451L variant (Table 1).

We have presented several lines of evidence for the role of a variant encoding R336C in smoking behavior and risk of smoking-related diseases. The variant is rare, but there appears to be a founder effect in Iceland, allowing us to perform association studies with adequate power. First, there is independent and genome-wide significant association between common markers and FTND in the region.4 Second, testing a small number of missense variants in CHRNA4, we find significant association of the R336C variant with FTND and other smoking behavior phenotypes. Third, guided by previous results for CHRNA5 and smoking-related diseases,12 we find significant association with four smoking-related diseases, and highly significant association with the early-onset forms. Fourth, functional data show that the R336C substitution influences function, shifting the equilibrium following nicotine exposure in favor of a lower-sensitivity form.18 When taken together, these results allow us to conclude that carriers of this variant have increased risk of nicotine addiction and heavy smoking, which in turn confers considerable risk and lowered age of onset for a number of serious smoking-related diseases through gene–environment correlations34 akin to those previously observed for the D398N mutation in CHRNA5.2, 12