Genetic scores of smoking behaviour in a Chinese population

This study sought to construct a genetic score for smoking behaviour in a Chinese population. Single-nucleotide polymorphisms (SNPs) reported in genome-wide association studies (GWASs) were evaluated in a community-representative sample of Beijing, China; a total of 3,553 participants (1,477 males and 2,076 females) completed the survey. The candidate SNPs were tested under four genetic models (dominant, recessive, heterogeneous codominant and additive), and 7 SNPs were selected to construct the genetic score. Using the unweighted score, we found that participants with a high genetic score had 34% higher odds of being a smoker and 43% higher odds of smoking initiation (SI) at ≤ 18 years of age after adjusting for age, gender, education, occupation, ethnicity, body mass index (BMI) and sports activity time. Because the weighted and unweighted scores performed similarly, the unweighted score was preferred as the easiest to generalize and interpret. Overall, the genetic score was significantly associated with smoking behaviour (smoking status and SI at ≤ 18 years of age). These results could inform targeted health education for individuals with high genetic scores and support tobacco control efforts to improve population health.

participants (848 males and 1,254 females) in 2010. After excluding 818 participants who took part in both surveys and 8 participants whose genotyping failed, a total of 3,553 participants (1,477 males and 2,076 females) were included in the study sample (Fig. 1). Trained interviewers administered a standardized questionnaire face-to-face, covering demographic factors, medical history and health-related behaviours (particularly smoking exposure status).
Measurement of smoking behaviour. A smoker was defined as a person who had ever smoked a tobacco product daily for at least 6 months 30 . A heavy smoker was defined as a person who had ever smoked more than 20 cigarettes per day 31 . In addition, SI at ≤ 18 years of age was used as a measure of smoking behaviour 1 , because previous studies have shown that, compared with SI during adulthood, tobacco use before 18 years of age leads to adverse behavioural consequences in adulthood (such as drug abuse) as well as more serious mental and physical health consequences 32 .
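The three outcome definitions above can be expressed as a small classification function. This is a minimal sketch under stated assumptions: the record layout (`SmokingHistory` and its field names) is hypothetical, since the questionnaire's actual variables are not given, and each definition is applied independently exactly as worded in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SmokingHistory:
    # Hypothetical questionnaire fields; the study's actual variable names are not given.
    months_daily_smoking: int        # longest spell of daily tobacco use, in months
    max_cigarettes_per_day: int      # highest daily cigarette consumption ever reported
    initiation_age: Optional[int]    # age at smoking initiation (SI); None if never smoked


def classify(h: SmokingHistory) -> dict:
    """Apply the outcome definitions from the text to one participant."""
    return {
        # Smoker: ever smoked a tobacco product daily for at least 6 months.
        "smoker": h.months_daily_smoking >= 6,
        # Heavy smoker: ever smoked more than 20 cigarettes per day.
        "heavy_smoker": h.max_cigarettes_per_day > 20,
        # Early SI: started smoking at or before 18 years of age.
        "si_le_18": h.initiation_age is not None and h.initiation_age <= 18,
    }


print(classify(SmokingHistory(months_daily_smoking=24, max_cigarettes_per_day=25, initiation_age=16)))
```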

Measurement of covariates.
The categories of educational attainment included 0-6 years (primary school or less), 6-12 years (middle school to high school or the equivalent) and ≥ 13 years (completed a university or other tertiary education). The occupation types were classified into the following three categories: white collar (professional, government), light physical labour (skilled worker, service, merchant) and hard physical labour (farmer, factory worker, manufacturing and transportation worker). Ethnicity was classified into the following two categories: Han and minority. Body mass index (BMI) was classified into the following three categories: normal (< 24.00), overweight (24.00-27.99) and obese (≥ 28.00) 33 . Sports activity time was classified into the following three categories: < 1 hour/week, 1-4 hours/week, and > 4 hours/week.
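The covariate categories above can be sketched as simple binning functions. The cut-offs are those stated in the text; two boundary choices are our assumptions: the text lists both "0-6" and "6-12" years of education, so exactly 6 years is placed in the first band, and exactly 1 or 4 hours of sport per week is placed in the middle band.

```python
def education_category(years: int) -> str:
    # Assumption: exactly 6 years of schooling falls in the first band.
    if years <= 6:
        return "primary school or less"
    if years <= 12:
        return "middle school to high school"
    return "university or other tertiary"


def bmi_value(weight_kg: float, height_m: float) -> float:
    # BMI = weight (kg) / height (m) squared
    return weight_kg / height_m ** 2


def bmi_category(bmi: float) -> str:
    # Cut-offs from the text: normal < 24.00, overweight 24.00-27.99, obese >= 28.00
    if bmi < 24.0:
        return "normal"
    if bmi < 28.0:
        return "overweight"
    return "obese"


def sports_category(hours_per_week: float) -> str:
    # Assumption: exactly 1 and exactly 4 hours/week belong to "1-4 hours/week".
    if hours_per_week < 1:
        return "< 1 hour/week"
    if hours_per_week <= 4:
        return "1-4 hours/week"
    return "> 4 hours/week"


print(bmi_category(bmi_value(70, 1.75)), education_category(9), sports_category(2.5))
```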
Genotyping. The standard proteinase K-phenol-chloroform method was used to extract DNA from whole peripheral blood samples. The laboratory staff was blinded to the identities of the subjects and their smoking status.
Among the 21 previously reported SNPs, we excluded rs1051730, rs879048, rs2036527, rs8034191, rs11638372 and rs16969968 because their minor allele frequencies (MAFs) were < 0.1 in the HapMap CHB (Han Chinese in Beijing) population (Supplementary Table S1); the remaining 15 candidate SNPs were included in our analysis (Fig. 2). The candidate SNPs were genotyped using the MassARRAY system.
Genetic score. Linkage disequilibrium (LD) analysis of the 15 genotyped SNPs identified an LD block (Supplementary Fig. S1); using Tagger, we chose rs6474412 to represent this block (Supplementary Table S2). To evaluate the effects of these SNPs on smoking behaviour, we examined each SNP under four genetic models (dominant, recessive, heterogeneous codominant and additive 34 ), separately in males and females. SNPs with no significant effect on smoking behaviour in our population were then excluded, and the final genetic score was built on 7 SNPs (Supplementary Tables S3-9).
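The MAF filter and the genetic models described above can be sketched as follows. This is an illustration, not the study's pipeline: the SNP identifiers and frequencies in the usage lines are hypothetical, and we assume the "heterogeneous codominant" model corresponds to the usual codominant (genotypic) coding with separate indicators for one and two risk alleles.

```python
from typing import Dict, List, Tuple


def minor_allele_frequency(genotypes: List[int]) -> float:
    """MAF from per-person allele counts (0, 1 or 2 copies of one allele)."""
    freq = sum(genotypes) / (2 * len(genotypes))
    return min(freq, 1.0 - freq)


def passes_maf_filter(mafs: Dict[str, float], threshold: float = 0.1) -> List[str]:
    """Keep SNPs whose reference-panel MAF is at least `threshold`."""
    return [snp for snp, maf in mafs.items() if maf >= threshold]


def encode(g: int, model: str) -> Tuple[int, ...]:
    """Code a risk-allele count g (0, 1 or 2) under one genetic model."""
    if g not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1 or 2 risk alleles")
    if model == "additive":      # per-allele trend: 0, 1, 2
        return (g,)
    if model == "dominant":      # at least one risk allele vs none
        return (int(g >= 1),)
    if model == "recessive":     # two risk alleles vs fewer
        return (int(g == 2),)
    if model == "codominant":    # indicators for 1 and 2 copies; 0 copies is the reference
        return (int(g == 1), int(g == 2))
    raise ValueError(f"unknown model: {model}")


panel = {"snp_a": 0.05, "snp_b": 0.32}  # hypothetical IDs and reference-panel MAFs
print(passes_maf_filter(panel))
for m in ("additive", "dominant", "recessive", "codominant"):
    print(m, [encode(g, m) for g in (0, 1, 2)])
```

Each coded genotype would then enter a logistic regression as the exposure term, so the model comparison amounts to fitting the same regression with different codings.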
Similar to previous studies that evaluated genetic scores for smoking behaviour 35 and obesity 36 , we constructed the genetic score using three methods. In the first two methods, each SNP was weighted by the size of its effect (β coefficient), using one of two types of β coefficient: β 1 was derived from our population, adjusted for demographic characteristics (age, gender, education, occupation and ethnicity), BMI and sports activity time; β 2 was derived from the results of GWASs and meta-analyses (Table 1) [19][20][21][22][23][24][25][26] . The third method used the unweighted count of risk alleles.
Statistical analysis. Haploview software version 4.2 (http://www.broadinstitute.org/haploview) was used to analyse Hardy-Weinberg equilibrium (HWE) and LD and to run Tagger. SPSS version 19.0 (serial No. 5076595) was used for the data analysis. All tests were two-tailed, with the significance level set at α = 0.05. Differences in means and proportions were tested using t-tests and chi-squared tests, respectively. Logistic regression models were used to estimate the odds ratios (ORs) of the genetic score for smoking behaviour.
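The logistic regression step can be illustrated with a from-scratch maximum-likelihood fit. This is a minimal sketch, not SPSS's implementation: it uses Newton-Raphson, omits confidence intervals, and the 2 × 2 data in the usage lines are hypothetical. In a real analysis each row of `X` would hold the intercept, the genetic score group indicator and the adjustment covariates; the OR for a coefficient is exp(β), and an OR of, say, 1.34 corresponds to 34% higher odds.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(X: List[List[float]], y: List[int], max_iter: int = 50, tol: float = 1e-10) -> List[float]:
    """Maximum-likelihood logistic regression via Newton-Raphson.
    Each row of X starts with 1.0 (intercept), followed by exposure and covariates."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        grad = [0.0] * p                      # score vector X'(y - mu)
        info = [[0.0] * p for _ in range(p)]  # information matrix X'WX
        for xi, yi in zip(X, y):
            mu = sigmoid(sum(b * v for b, v in zip(beta, xi)))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += w * xi[j] * xi[k]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Hypothetical data: 30/100 high-score vs 20/100 low-score participants are smokers.
X = [[1.0, 1.0]] * 100 + [[1.0, 0.0]] * 100
y = [1] * 30 + [0] * 70 + [1] * 20 + [0] * 80
beta = fit_logistic(X, y)
print(round(math.exp(beta[1]), 4))
```

With a single binary exposure the fitted OR equals the cross-product ratio (30 × 80)/(70 × 20), which is a convenient check on the fitting routine.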

Results
Participant characteristics. A total of 3,553 participants (1,477 males and 2,076 females) were included in our study. The average age was 70.29 ± 6.43 years. There were 1,067 smokers and 2,486 never smokers in our sample population: the two groups differed in gender (P < 0.001) and education (P = 0.007), but no significant differences were detected in age, ethnicity, occupation, BMI or sports activity time (P > 0.05) (Table 2). Table 3 depicts the genotype frequencies of the 7 SNPs.
Table 1. The 7 SNPs used to calculate the genetic score for smoking behaviour. * β 1 was derived from our population and adjusted for demographic characteristics (age, gender, education, occupation and ethnicity), BMI and sports activity time. # β 2 was derived from GWASs and meta-analyses.
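The group comparisons of proportions above rely on chi-squared tests. A minimal sketch of the Pearson test of independence (without continuity correction) is shown below; the counts are hypothetical, not the study's Table 2 data, and the closed-form p-value is implemented only for 1 or 2 degrees of freedom, which covers the two- and three-category covariates described in the text.

```python
import math
from typing import List, Tuple


def chi_squared_test(table: List[List[float]]) -> Tuple[float, int, float]:
    """Pearson chi-squared test of independence for an r x c contingency table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for i, r in enumerate(table):
        for j, obs in enumerate(r):
            expected = rows[i] * cols[j] / n
            stat += (obs - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(cols) - 1)
    if df == 1:
        p = math.erfc(math.sqrt(stat / 2))   # P(chi2_1 > x) = P(|Z| > sqrt(x))
    elif df == 2:
        p = math.exp(-stat / 2)              # chi2 with 2 df is exponential with mean 2
    else:
        raise NotImplementedError("closed-form p-value implemented only for df = 1 or 2")
    return stat, df, p


# Hypothetical smoker / never-smoker counts by sex (not the study's data).
stat, df, p = chi_squared_test([[10, 20], [30, 40]])
print(round(stat, 3), df, round(p, 3))
```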
Effect of genetic score on smoking behaviour.
Genetic score type 1. Risk alleles from the imputed data (0, 1 or 2) for each SNP were weighted by their β coefficients (β 1, Table 1), which were estimated from our data after adjusting for demographic characteristics (age, gender, education, occupation and ethnicity), BMI and sports activity time. The weighted risk alleles were summed for each individual to generate the type 1 genetic score (range 0.06-0.88; mean 0.42 ± 0.14). Participants were divided into three groups by tertile cut points (0.36 and 0.48): group 1, score < 0.36; group 2, score 0.36-0.48; and group 3, score > 0.48. In logistic regression analysis adjusted for age, gender, education, occupation, ethnicity, BMI and sports activity time, participants with a high genetic score (group 3) had 26% higher odds of being a smoker and 29% higher odds of SI at ≤ 18 years of age. Among males, the ORs were higher (1.37 and 1.37, respectively), whereas among females, the association was not significant (Table 4).
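The score construction and tertile grouping can be sketched in a few lines. The three score types described in the text differ only in their weights: β 1 (type 1), β 2 (type 2) or no weights (type 3, a plain risk-allele count). The genotypes and β values in the usage lines are hypothetical, and `statistics.quantiles` (default "exclusive" method) may place tertile cut points slightly differently from SPSS.

```python
from statistics import quantiles
from typing import Optional, Sequence, Tuple


def genetic_score(risk_alleles: Sequence[int], weights: Optional[Sequence[float]] = None) -> float:
    """Sum of beta-weighted risk-allele counts; with weights=None, the unweighted count."""
    if weights is None:
        return float(sum(risk_alleles))
    if len(weights) != len(risk_alleles):
        raise ValueError("one weight per SNP is required")
    return sum(w * g for w, g in zip(weights, risk_alleles))


def tertile_cutoffs(scores: Sequence[float]) -> Tuple[float, float]:
    """Two cut points that split the sample into thirds."""
    low, high = quantiles(scores, n=3)
    return low, high


def tertile_group(score: float, low: float, high: float) -> int:
    # Group 1: < low; group 2: low to high inclusive; group 3: > high (as in the text).
    if score < low:
        return 1
    if score <= high:
        return 2
    return 3


# Hypothetical risk-allele counts for 7 SNPs and hypothetical beta weights.
genotypes = [0, 1, 2, 1, 0, 1, 1]
betas = [0.05, 0.03, 0.04, 0.02, 0.06, 0.03, 0.05]
print(genetic_score(genotypes, betas), genetic_score(genotypes))
print(tertile_group(0.42, 0.36, 0.48))  # using the type 1 cut points reported in the text
```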
Genetic score type 2. Risk alleles from the imputed data (0, 1 or 2) for each SNP were weighted by their β coefficients (β 2, Table 1), which were taken from previously reported GWASs and meta-analyses. The weighted risk alleles were summed for each individual to generate the type 2 genetic score (range 0.14-3.53; mean 1.54 ± 0.70). Participants were divided into three groups by tertile cut points (1.05 and 1.95): group 1, score < 1.05; group 2, score 1.05-1.95; and group 3, score > 1.95.
For the type 2 genetic score, participants with a high genetic score (group 3) had 24% higher odds of being a smoker and 28% higher odds of SI at ≤ 18 years of age after adjusting for age, gender, education, occupation, ethnicity, BMI and sports activity time. Among males, the ORs were higher (1.37 and 1.42, respectively), whereas among females, the association was not significant (Table 5).
Genetic score type 3. Risk alleles from the imputed data (0, 1 or 2) for each SNP were summed without weighting for each individual to generate the type 3 genetic score (range 2-14; mean 7.47 ± 1.80). Participants were divided into three groups by tertile cut points (7 and 9): group 1, score < 7; group 2, score 7-9; and group 3, score > 9.
For the type 3 genetic score, participants with a high genetic score (group 3) had 34% higher odds of being a smoker and 43% higher odds of SI at ≤ 18 years of age after adjusting for age, gender, education, occupation, ethnicity, BMI and sports activity time. Among males, the ORs were higher (1.42 and 1.46, respectively), whereas among females, the association was not significant (Table 6).
Receiver-operating characteristic (ROC) curves. ROC curves were constructed for models including age, gender, education, occupation, ethnicity, BMI and sports activity time plus genetic score type 1, 2 or 3 (Fig. 3). For predicting smoking status, the areas under the curve (AUCs) of the models with the three score types were 0.832, 0.832 and 0.832, respectively, in the total population; 0.673, 0.673 and 0.674 in males; and 0.724, 0.724 and 0.723 in females (Fig. 3). Thus, the three types of genetic score performed similarly. Because it does not depend on population-specific effect estimates, the unweighted genetic score is the easiest to generalize to other populations and to interpret, and it was therefore preferred.
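The AUC values above summarize how well each model's predicted probabilities separate smokers from never smokers. A minimal sketch of the AUC as the Mann-Whitney probability is shown below; the scores and labels in the usage line are hypothetical, and the pairwise loop is quadratic in sample size, which is still fast enough for a few thousand participants.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a randomly chosen case (label 1) has a higher
    predicted score than a randomly chosen non-case (label 0); ties count one half."""
    cases = [s for s, lab in zip(scores, labels) if lab == 1]
    controls = [s for s, lab in zip(scores, labels) if lab == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical predicted probabilities from a fitted model and observed smoking status.
print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```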
Next, we compared the AUCs of the model including age, gender, education, occupation, ethnicity, BMI and sports activity time with and without the unweighted genetic score. The AUCs were 0.832 and 0.817, respectively, in the total population; 0.674 and 0.613 in males; and 0.723 and 0.707 in females. The difference was significant in males (P < 0.05) (Fig. 3).
Furthermore, in males and in the total population, the average genetic scores of smokers, heavy smokers and participants with SI at ≤ 18 years of age were significantly higher than those of never smokers (Table 7).
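Comparisons of mean scores between groups such as those above are made with t-tests. The sketch below uses Welch's unequal-variance variant, which is our choice rather than necessarily the test the authors ran in SPSS, and its two-sided p-value uses a normal approximation that is reasonable only for large groups. The example values are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance
from typing import Sequence, Tuple


def welch_t_test(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float, float]:
    """Welch's two-sample t statistic, Welch-Satterthwaite degrees of freedom and
    an approximate two-sided p-value from the standard normal distribution."""
    va = variance(a) / len(a)   # squared standard error of mean(a)
    vb = variance(b) / len(b)   # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, df, p


# Hypothetical unweighted scores of never smokers and smokers.
t, df, p = welch_t_test([1, 2, 3, 4], [2, 3, 4, 5])
print(round(t, 4), round(df, 2), round(p, 3))
```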

Discussion
In this study, we retested all 18 significant SNPs (P < 5 × 10⁻⁸) from GWASs of smoking behaviour (cigarettes smoked per day (CPD) and SI) in a Chinese population and selected 7 of them to derive genetic scores. We derived three types of genetic score to evaluate the genetic risk of smoking behaviour (smoking, heavy smoking and SI at ≤ 18 years of age) and found that their performance was approximately the same. Furthermore, we linked genetic risk to smoking behaviour (smoking, heavy smoking and SI at ≤ 18 years of age) in a Chinese population.
Previous genetic score studies have used two methods to construct the score: 1) summing unweighted risk alleles 35 and 2) summing risk alleles weighted by their effect sizes 36 . To our knowledge, this study is the first to compare different methods of constructing genetic scores, and we found that the three types of genetic score showed similar associations with smoking behaviour. Furthermore, the genetic score was significantly associated with smoking behaviour (smoking status and SI at ≤ 18 years of age) in this Chinese population. This result is consistent with that of a study performed in New Zealand 35 , in which individuals with elevated genetic risk were more likely to progress to daily smoking as teenagers and progressed more rapidly from SI to heavy smoking.
However, the present study has several limitations. First, the candidate SNPs were chosen from GWASs conducted mainly in European or African American populations; only a few such studies have been reported in Chinese populations. This may reduce the reliability of our findings regarding SNPs related to smoking behaviour in the Chinese population, and SNPs identified in US and Northern European populations may not be suitable or sufficient for constructing a genetic score in Chinese populations. Thus, additional large GWASs in Chinese populations are needed to develop a more suitable genetic score for this population. Second, the small number of female smokers in our study may have reduced the stability of the results in women. Third, the genetic score developed in our study requires validation in a larger Chinese sample.
To conclude, we tested GWAS-significant SNPs associated with smoking behaviour in a Chinese population and constructed three types of genetic score. The three score types performed similarly; because it is the easiest to generalize and interpret, the unweighted score is preferred. Furthermore, the genetic score was significantly associated with smoking behaviour (smoking status and SI at ≤ 18 years of age). These results may inform targeted health education for individuals with high genetic scores and support tobacco control to improve population health.
Table 7. Average genetic scores of never smokers, smokers, heavy smokers and participants with SI at ≤ 18 years of age. P values are for comparisons with never smokers.