The eQTL-missense polymorphisms of APOBEC3H are associated with lung cancer risk in a Han Chinese population

APOBEC (apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like) enzymes may be involved in mutagenic processes in multiple cancer types, including lung cancer. The APOBEC family of cytidine deaminases induces base substitutions with a stringent TCW motif, a pattern that is widespread in multiple human cancers. We hypothesized that common missense variants in coding regions of APOBEC genes might damage the structure of the encoded proteins and modify lung cancer risk. To test this hypothesis, we systematically screened predicted deleterious polymorphisms in the exon regions of 10 APOBEC core genes (APOBEC1, APOBEC2, APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D, APOBEC3F, APOBEC3G, APOBEC3H, and APOBEC4) and evaluated them in a case-control study including 1,200 cases and 1,253 controls. We found that the T allele of rs139293 in exon 2 of APOBEC3H was significantly associated with a decreased risk of lung cancer (odds ratio = 0.76, 95% confidence interval: 0.63–0.91). A similar inverse association was observed for this variant in subgroup analyses. Further study showed that the T allele of rs139293 was associated with altered expression of APOBEC3H and APOBEC3C and that the two genes were co-expressed in both tumor and adjacent normal tissues. These results indicate that genetic variants in APOBEC3H may contribute to lung cancer susceptibility in the Chinese population.

cancer genomes, named "kataegis" 16,17. Analysis of APOBEC signature mutations based on whole-exome sequence data from The Cancer Genome Atlas (TCGA) suggests that cytosine deamination catalyzed by APOBEC enzymes is a mutagenic mechanism in multiple cancer types, including lung cancer 18,19. Results from next-generation sequencing further showed that APOBECs induce base substitutions in tumor genomes with a stringent TCW motif (where W corresponds to either A or T), and that this pattern is widespread in multiple human cancers 20. Moreover, higher expression of APOBEC3B was found to be significantly associated with increased APOBEC signature mutations in lung cancer 21,22.
However, although a role for APOBEC enzymes in the lung cancer genome has been identified, the association between genetic variants of APOBEC genes and susceptibility to lung cancer is still unknown. Here, we systematically screened common missense variants with predicted damaging effects in the exon regions of 10 APOBEC core genes (APOBEC1, APOBEC2, APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D, APOBEC3F, APOBEC3G, APOBEC3H, and APOBEC4) and conducted a case-control study including 1,200 cases and 1,253 controls to investigate the associations between these variants and lung cancer risk.

Results
As shown in Table 1, the distributions of age and sex were comparable between the two groups. The proportion of smokers was significantly higher in lung cancer cases than in controls (61.34% vs. 48.36%).
Genotyping rates for rs10911390, rs139293 and rs139299 were 100%, 98.12% and 99.96%, respectively. The observed genotype frequencies for these SNPs were in agreement with Hardy-Weinberg equilibrium in controls (Table 2). The genotype distributions of these three variants in cases and controls are shown in Table 2. The T allele of rs139293 was significantly associated with a decreased risk of lung cancer, with a per-allele adjusted odds ratio (OR) of 0.76 (95% confidence interval (95% CI): 0.63-0.91, P = 0.002). However, no significant associations were observed for the remaining two SNPs (rs10911390 and rs139299). To further characterize the association between rs139293 and lung cancer risk, stratified analysis was performed by age, sex, smoking status, smoking level and histological type. The association remained significant in older subjects, never smokers, patients with squamous cell carcinoma, and both males and females. However, no significant heterogeneity was found between any subgroups (Table 3).
As stated in the SNP selection strategy, rs139293 was predicted to be "probably damaging" by PolyPhen-2. The SNP may contribute to the development of lung cancer by affecting the structure and function of APOBEC3H, or it may act as a proxy for multiple rare variants. Dense fine-mapping of known regions usually identifies novel functional rare variants 23,24. To further evaluate rare variants in the identified region, we annotated all rare SNPs located in the exons of APOBEC3H and in linkage disequilibrium with rs139293 (D' = 1). Three rare variants were identified, and all were predicted to be deleterious by SIFT or PolyPhen-2 (Supplementary Table S2).
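The Hardy-Weinberg check on control genotypes reported above can be illustrated with a minimal sketch. The text does not state which test was used, so the one-degree-of-freedom χ² goodness-of-fit test below is an assumption, and the function name and genotype counts are hypothetical (they are not taken from Table 2).

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium.

    Takes genotype counts (AA, AB, BB) and returns (statistic, P value).
    Assumption: a 1-df chi-square test; the paper does not name its HWE test.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    if p == 0.0 or q == 0.0:
        # Monomorphic sample: no deviation from HWE can be measured.
        return 0.0, 1.0
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical control genotype counts, for illustration only.
statistic, p_value = hwe_chi_square(900, 320, 30)
print(f"chi2 = {statistic:.3f}, P = {p_value:.3f}")
```

A P value above the chosen threshold (commonly 0.05) would be read as no evidence of departure from Hardy-Weinberg equilibrium in controls.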
In addition to directly affecting protein structure or function, the identified SNP rs139293, based on the ENCODE and UCSC databases, is located in a regulatory element marked by H3K27Ac, a histone modification characteristic of active enhancers. The region is also a DNase I hypersensitive site and can bind multiple transcription factors (Supplementary Fig. S1). This indicated that rs139293 was probably also involved in the regulation of APOBEC3H expression. To validate this, we retrieved the genotypes of rs139293 and the expression of APOBEC3H in blood from the public GTEx Portal database 25. As Fig. 1 shows, the T allele of rs139293 was correlated with reduced expression of APOBEC3H (P = 0.008). Considering the distal effects that enhancers can exert through chromatin interaction, we also analyzed the other 6 APOBEC3 genes on the same chromosome (APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D, APOBEC3F and APOBEC3G). As Fig. 1 shows, rs139293 was also an expression quantitative trait locus (eQTL) SNP for APOBEC3C (P = 0.03). In support of these results, we found that APOBEC3C and APOBEC3H were co-expressed in both lung tumor and adjacent normal tissues based on the TCGA database (ρ = 0.27, P = 5.51 × 10^−18 in tumor tissues; ρ = 0.59, P = 1.60 × 10^−11 in adjacent normal tissues) (Supplementary Fig. S2).

Discussion
In this study, we systematically evaluated the association between common missense variants in exons of core APOBEC genes and lung cancer risk in a case-control study including 1,200 cases and 1,253 healthy controls. We found that the T allele of rs139293 was significantly associated with a decreased risk of lung cancer in a Chinese population. The SNP was predicted to be "probably damaging" by PolyPhen-2. Further study showed that rs139293 was located in a regulatory region and was correlated with the expression of APOBEC3H and APOBEC3C.
Previous studies have shown that a germline copy number polymorphism involving APOBEC3A and APOBEC3B (A3B del) is associated with a modestly increased risk of breast cancer 26. Breast cancers in carriers of the deletion show more mutations of the putative APOBEC-dependent genome-wide signatures than cancers in non-carriers 27. A further mechanistic study showed that the A3B del was associated with immune activation, rather than cell proliferation, which contributes to hypermutation 28. Few studies have addressed the role of polymorphisms in APOBEC3H in cancer risk. However, natural polymorphisms in human APOBEC3H have been associated with the stability and activity of the APOBEC3H protein in the context of resistance to HIV-1 infection 29.
The rs139293 variant is located in exon 2 of APOBEC3H and results in an amino acid substitution from arginine (Arg) to leucine (Leu) at codon 18. The variant was predicted to have a deleterious effect on the structure and function of the protein. Rare SNPs are more likely to have dramatic functional consequences but are usually challenging to find 30. Systematic annotation of rare SNPs in exons of APOBEC3H identified three other missense SNPs in linkage disequilibrium with rs139293, all of which were predicted to be deleterious. Based on the GTEx database, we found that subjects carrying the T allele of rs139293 had lower expression levels of APOBEC3C and APOBEC3H, and this was further supported by the co-expression of these two genes in lung tumor and adjacent normal tissues. Taken together, these results suggest that rs139293 is probably associated with reduced lung cancer risk by disrupting the structure of APOBEC3H and regulating the expression of APOBEC3C and APOBEC3H.
To explore the possible mechanisms by which rs139293 regulates the expression of APOBEC3C and APOBEC3H, we further annotated the SNP using the ENCODE database (https://www.encodeproject.org/). We found that the identified region can bind many transcription factors, including CTCF, EBF, MXI1, PAX5, RFX5, RUNX3, SIN3A, SMC3, TAF1 and WRNIP1 31. Variants in transcription factor binding sites have been reported to be associated with cancer susceptibility 32. This indicated that the region around rs139293 may play an important role in transcriptional regulation. More interestingly, we observed a chromatin interaction between chr22:39408338-39410119 and chr22:39493984-39496740 (hg19) in the K562 cell line. The former is located at the promoter of APOBEC3C, and the latter includes the enhancer element around rs139293. A similar pattern, in which DNase I hypersensitive sites in exons colocalize with promoters, has been reported previously 33. These results suggest that the region around rs139293 is a transcriptional regulatory element that is probably implicated in the regulation of APOBEC3H expression directly and of APOBEC3C expression through chromatin interaction.
In summary, this study investigated the association between common missense variants in APOBEC genes and lung cancer risk in a Chinese population. We found that the missense SNP rs139293 of the APOBEC3H gene may modify the risk of lung cancer. However, our results are preliminary, and the sample size is only moderate. Further independent studies incorporating functional evaluations are warranted to confirm the association and clarify the potential biological mechanisms of these polymorphisms in lung cancer risk.

Materials and Methods
Ethics statement. The study was performed in accordance with the guidelines outlined in the Declaration of Helsinki and was approved by the institutional review board of Nanjing Medical University (FWA00001501). The design and procedures of this study involving human participants were described in a research protocol. Written informed consent was obtained from every participant before the commencement of the study.
Study population. All patients were histopathologically or cytologically confirmed as having lung cancer by at least two pathologists. Those with a history of other cancers or who had previously received radiotherapy or chemotherapy were excluded from this study. A total of 1,200 cases recruited from the Cancer Hospital of Jiangsu Province and the First Affiliated Hospital of Nanjing Medical University between 2003 and 2009 were included in this study. All controls were randomly selected from individuals participating in a community-based noncommunicable disease screening program in Jiangsu Province during the same time period. Finally, 1,253 cancer-free controls frequency-matched to the cases by age and sex were enrolled in this study. After written informed consent was obtained, a 5 ml venous blood sample was drawn from each participant and a face-to-face interview was conducted to collect demographic data (e.g. age and sex) and exposure information (e.g. smoking status). Current smokers were defined as those who had smoked one cigarette per day for > 1 year; smokers who had quit smoking for > 1 year were defined as former smokers; all others were classified as never smokers. Pack-years of smoking [(cigarettes per day/20) × smoking years] were calculated to measure smoking dose. In addition, smokers were divided into light and heavy smokers according to a threshold of 25 pack-years.
SNP selection and genotyping.
In this study, we mainly evaluated common missense variants in the coding region and focused on those predicted to have a damaging effect on protein structure. SNPs in the exon regions of APOBEC1, APOBEC2, APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D, APOBEC3F, APOBEC3G, APOBEC3H, and APOBEC4 were extracted from the 1000 Genomes database (the Phase I integrated variant set release V3, http://browser.1000genomes.org/index.html) and annotated by ANNOVAR (http://www.openbioinformatics.org/annovar/). SIFT (http://sift.jcvi.org/) and PolyPhen-2 (http://genetics.bwh.harvard.edu/pph2/) were further used to annotate the predicted function of these SNPs. SNPs meeting the following criteria were included in our study: (i) having a minor allele frequency (MAF) ≥ 5% in the Chinese population; (ii) being a missense mutation in an exon region; (iii) being predicted to have deleterious effects by SIFT and/or PolyPhen-2; and (iv) keeping only one SNP when multiple SNPs were in strong linkage disequilibrium (r² ≥ 0.8). As a result, 278 SNPs with MAF ≥ 5% were further annotated by ANNOVAR, and among them 24 were located in exon regions (2 SNPs in APOBEC1, 2 SNPs in APOBEC2, 1 SNP in APOBEC3A, 3 SNPs in APOBEC3B, 1 SNP in APOBEC3F, 2 SNPs in APOBEC3G, 6 SNPs in APOBEC3H, 7 SNPs in APOBEC4). Fifteen of the 24 SNPs were missense and could result in an amino acid change (Supplementary Table S1). However, only four were predicted to have a deleterious phenotypic effect: rs139293 (c.53G>T) and rs139299 (c.363G>C) in APOBEC3H, and rs10911390 (c.1033G>A) and rs16861394 (c.224C>T) in APOBEC4. Because rs10911390 was in strong linkage disequilibrium with rs16861394 (r² = 1), only rs139293, rs139299 and rs10911390 were genotyped in our study (Table 4).
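The four selection criteria above can be expressed as a simple filter followed by linkage-disequilibrium pruning. The sketch below is illustrative only: the `Snp` record, the `select_snps` function, the greedy pruning order (the first SNP encountered in a correlated group is kept) and all SNP identifiers and values are assumptions, not the authors' pipeline or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snp:
    """Hypothetical annotation record for one SNP."""
    snp_id: str
    maf: float                  # minor allele frequency in the reference panel
    is_missense: bool           # e.g. from ANNOVAR-style annotation
    sift_deleterious: bool      # predicted deleterious by SIFT
    polyphen_deleterious: bool  # predicted damaging by PolyPhen-2


def select_snps(snps, r2, maf_min=0.05, r2_threshold=0.8):
    """Apply criteria (i)-(iv): MAF, missense, predicted deleterious, LD pruning.

    `r2` maps frozenset({id_a, id_b}) to the pairwise r-squared; pairs that are
    absent are treated as unlinked (r-squared = 0).
    """
    candidates = [
        s for s in snps
        if s.maf >= maf_min
        and s.is_missense
        and (s.sift_deleterious or s.polyphen_deleterious)
    ]
    kept = []
    for snp in candidates:
        # Greedy pruning: drop a SNP if it is in strong LD with one already kept.
        linked = any(
            r2.get(frozenset((snp.snp_id, other.snp_id)), 0.0) >= r2_threshold
            for other in kept
        )
        if not linked:
            kept.append(snp)
    return kept


# Hypothetical input, for illustration only.
example_snps = [
    Snp("snp1", 0.20, True, True, False),
    Snp("snp2", 0.01, True, True, True),    # fails the MAF criterion
    Snp("snp3", 0.30, False, False, False), # not missense
    Snp("snp4", 0.15, True, False, False),  # not predicted deleterious
    Snp("snp5", 0.30, True, False, True),   # in strong LD with snp1
    Snp("snp6", 0.10, True, True, True),
]
example_r2 = {frozenset(("snp1", "snp5")): 1.0}
selected = select_snps(example_snps, example_r2)
print([s.snp_id for s in selected])
```

In this sketch the choice of which SNP represents a linked group depends on input order; a real pipeline might instead prefer the SNP with the higher genotyping feasibility or the stronger functional prediction.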
Genomic DNA was isolated from a leukocyte pellet by proteinase K digestion, followed by phenol-chloroform extraction and ethanol precipitation. Genotyping was performed using the TaqMan allelic discrimination assay on the ABI 7900 system (Applied Biosystems, Foster City, CA, USA), and genotypes were called using the SDS 2.3 Allelic Discrimination Software (Applied Biosystems). The primers and probes are available upon request. Several measures were used to control the quality of genotyping: (i) genotyping was carried out without knowledge of case or control status; (ii) two water controls were used in each plate as blank controls; (iii) case and control samples were mixed on each plate; (iv) five percent of the samples were randomly selected for replicate genotyping.
Public database. GTEx Portal was used to calculate the association between SNPs and gene expression in blood (http://www.gtexportal.org/home/). The RNA-Seq by Expectation-Maximization (RSEM) normalized read counts of APOBEC genes in lung tumors and adjacent normal tissues were downloaded from TCGA on 07/08/2014. Transcriptional regulation was annotated using the ENCODE database (https://www.encodeproject.org/) and the UCSC genome browser (https://genome.ucsc.edu/cgi-bin/hgGateway).
categorical variables were applied for analyzing distribution differences of demographic characteristics and genotypes between cases and controls. Correlation analysis was used to evaluate the co-expression of genes after log transformation. The association between SNPs and lung cancer risk was measured by odds ratios (ORs) and 95% confidence intervals (95% CI) using logistic regression with adjustment for age, sex and pack-years of smoking when appropriate. The tests above were two-sided, and results were considered significant when P < 0.05. We used the χ²-based Q-test to test for heterogeneity between corresponding subgroups, and heterogeneity was considered significant when P < 0.10. All analyses were performed using R software (version 3.1.1).
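The subgroup heterogeneity test can be sketched as follows. The text names only a "χ²-based Q-test", so the inverse-variance-weighted Cochran's Q used here is an assumption about its form; the analyses themselves were run in R, and this Python sketch, its function names and its subgroup estimates are hypothetical. Standard errors are recovered from the reported 95% CIs on the log-odds scale.

```python
import math
from statistics import NormalDist


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: two-sided normal tail plus a finite series in sqrt(x).
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    density = math.exp(-half) / math.sqrt(2.0 * math.pi)
    return math.erfc(chi / math.sqrt(2.0)) + 2.0 * density * total


def q_test(estimates):
    """Cochran's Q for a list of (OR, CI lower, CI upper) subgroup estimates.

    Returns (Q statistic, degrees of freedom, P value).
    """
    z = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% CI
    log_ors, weights = [], []
    for odds_ratio, lower, upper in estimates:
        se = (math.log(upper) - math.log(lower)) / (2.0 * z)
        log_ors.append(math.log(odds_ratio))
        weights.append(1.0 / se ** 2)  # inverse-variance weight
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    df = len(estimates) - 1
    return q, df, chi2_sf(q, df)


# Hypothetical subgroup estimates (OR, lower, upper), for illustration only.
q_stat, q_df, q_p = q_test([(0.70, 0.52, 0.94), (0.82, 0.64, 1.05)])
print(f"Q = {q_stat:.3f}, df = {q_df}, P = {q_p:.3f}")
```

Under the threshold stated above, a P value below 0.10 from this test would be read as significant heterogeneity between subgroups.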