Genetic variants of PTPN2 are associated with lung cancer risk: a re-analysis of eight GWASs in the TRICL-ILCCO consortium

The T-cell protein tyrosine phosphatase (TCPTP) pathway consists of signaling events mediated by TCPTP. Mutations and genetic variants of some genes in the TCPTP pathway are associated with lung cancer risk and survival. In the present study, we first investigated associations of 5,162 single nucleotide polymorphisms (SNPs) in 43 genes of this TCPTP pathway with lung cancer risk by using summary data of six published genome-wide association studies (GWASs) of 12,160 cases and 16,838 controls. We identified 11 independent SNPs in eight genes after correction for multiple comparisons by a false discovery rate <0.20. Then, we performed in silico functional analyses for these 11 SNPs by eQTL analysis, and two of them, PTPN2 SNPs rs2847297 and rs2847282, were chosen as tagSNPs. We further included two additional GWAS datasets of Harvard University (984 cases and 970 controls) and deCODE (1,319 cases and 26,380 controls), and the overall effects of these two SNPs among all eight GWASs remained significant (OR = 0.95, 95% CI = 0.92–0.98, and P = 0.004 for rs2847297; OR = 0.95, 95% CI = 0.92–0.99, and P = 0.009 for rs2847282). In conclusion, PTPN2 rs2847297 and rs2847282 may be potential susceptibility loci for lung cancer.

shown to be functional. Other approaches to GWAS analysis, including pathway-based analyses that reduce dimensionality and the multiple-testing burden, have emerged to identify possible functional SNPs associated with lung cancer risk.
The T-cell protein tyrosine phosphatase (TCPTP/PTPN2) is an important member of the protein-tyrosine phosphatase (PTP) family. Activating and deactivating mutations in PTP genes often result in enzymes that can either promote or suppress oncogenesis. The TCPTP pathway consists of signaling events mediated by TCPTP through negative regulation of several receptor tyrosine kinases and downstream signaling proteins, such as epidermal growth factor receptor (EGFR) 16, vascular endothelial growth factor receptor-2 (VEGFR2) 17, platelet-derived growth factor receptor beta (PDGFRβ) 18, signal transducer and activator of transcription subtypes 1 (STAT1) 19, 3 (STAT3) 20, and 6 (STAT6) 21, and the insulin receptor 22.
Studies have shown that mutations and genetic variants of some genes in the TCPTP pathway are associated with lung cancer risk and survival 23,24. However, SNPs in many candidate genes in the pathway have not been studied or reported. In the present study, we systematically investigated all potentially functional SNPs in TCPTP pathway genes by assessing their associations with lung cancer risk using eight published lung cancer GWAS datasets.

Figure 1 (caption, partial). In B, the left-hand y-axis shows the association P value of each SNP, plotted as −log10(P) against chromosomal base-pair position; the right-hand y-axis shows the recombination rate estimated from the hg19/1000 Genomes European population.

Analysis of six GWAS datasets.
Overall, 5,162 SNPs from 43 TCPTP pathway genes were identified in the six GWAS datasets from the Transdisciplinary Research in Cancer of the Lung and The International Lung Cancer Consortium (TRICL-ILCCO), and their associations with lung cancer risk are shown in the Manhattan plot (Fig. 1A). After multiple-testing correction, 112 SNPs in eight genes (ATR, EGFR, MET, PIK3R1, PIK3R3, PTPN2, STAT3, and STAT5A) remained significantly associated with lung cancer risk at FDR <0.20. The association results are summarized in Supplementary Table S2. Based on LD analysis (r² > 0.30) and online functional prediction using SNPinfo, RegulomeDB, and HaploReg, we selected 11 SNPs for further analysis: rs11707731 in ATR; rs845553, rs1140762 and rs17172432 in EGFR; rs34280975 in MET; rs706714 in PIK3R1; rs7538978 in PIK3R3; rs2847297 and rs2847282 in PTPN2; rs3744483 in STAT3; and rs1135669 in STAT5A (Supplementary Figure S1 and Supplementary Table S3).
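The multiple-testing step described above can be illustrated with a minimal sketch. It assumes the Benjamini-Hochberg step-up procedure, which the text does not name explicitly; the function name `benjamini_hochberg` and the toy P values are hypothetical and are not taken from the study data.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy P values (hypothetical), filtered at the FDR < 0.20 threshold.
p = [0.001, 0.008, 0.039, 0.041, 0.60]
q = benjamini_hochberg(p)
selected = [i for i, value in enumerate(q) if value < 0.20]
print(selected)
```

With thousands of correlated SNPs per gene, a threshold of 0.20 is permissive by design: it keeps candidates for the LD pruning and functional annotation that follow rather than serving as the final significance filter.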

Functional validation by eQTL analysis.
We assessed associations between the 11 SNPs and mRNA expression levels by using the genotyping and expression data available from lymphoblastoid cell lines derived from 373 individuals of European descent (http://www.1000genomes.org/), and we found that only rs2847297 and rs2847282 were associated with expression levels of PTPN2 in additive, dominant and recessive models (Table 1). Regional association plots for rs2847297 and rs2847282, covering the 500-kb up- and downstream regions, are shown in Fig. 1B. The SNP rs2847297 was in low LD with rs2847282 (Fig. 1C). PTPN2 mRNA expression levels decreased significantly with an increasing number of rs2847297 G alleles in the additive (P = 0.002) (Fig. 2A), dominant (P = 0.017) (Fig. 2B) and recessive (P = 0.005) (Fig. 2C) models. The eQTL analysis results for rs2847282 were also significant (Fig. 2D,E,F). In addition, we compared mRNA expression levels of PTPN2 in 109 paired lung tumor and adjacent normal tissue samples from The Cancer Genome Atlas (TCGA) and found that PTPN2 mRNA expression levels were significantly higher in tumor tissues than in normal tissues (P = 3.01E-05) (Supplementary Figure S2). The two SNPs rs2847297 and rs2847282 were chosen as tagSNPs because they were significantly associated with lung cancer risk in the overall association analysis and had potential functions according to the eQTL analysis.

Table 1. Summary of the in silico functional prediction and eQTL analysis results of the 11 selected SNPs in the TCPTP pathway. a Reference allele/effect allele. b P value of eQTL analysis results. TFBS = transcription factor binding site.
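The additive, dominant and recessive eQTL models reported above differ only in how the genotype is coded before expression is regressed on it. The sketch below shows the three codings and an ordinary least-squares slope for each; the helper names `encode` and `ols_slope` and the toy expression values are hypothetical, and the significance test for the slope is omitted.

```python
def encode(dosage, model):
    """Map an effect-allele count (0, 1 or 2) to a model-specific code."""
    if model == "additive":
        return dosage                    # 0, 1, 2 copies of the effect allele
    if model == "dominant":
        return 1 if dosage >= 1 else 0   # carriers versus non-carriers
    if model == "recessive":
        return 1 if dosage == 2 else 0   # effect-allele homozygotes versus rest
    raise ValueError(f"unknown model: {model}")


def ols_slope(x, y):
    """Least-squares slope of y on x (expression change per coded unit)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Toy data (hypothetical): effect-allele counts and expression levels.
dosage = [0, 0, 1, 1, 2, 2]
expression = [10.0, 10.2, 9.6, 9.4, 9.0, 8.8]

slopes = {
    model: ols_slope([encode(d, model) for d in dosage], expression)
    for model in ("additive", "dominant", "recessive")
}
print(slopes)
```

A negative slope in all three codings corresponds to the pattern described for the rs2847297 G allele, where expression falls as the allele count rises.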

Discussion
In the present study, we investigated associations between genetic variants in the TCPTP pathway genes and lung cancer risk using eight published GWASs of 14,463 cases and 44,188 controls. The principal findings were two novel, potentially functional SNPs, rs2847297 and rs2847282 of PTPN2, that were both associated with a decreased lung cancer risk and a decreased mRNA expression level of PTPN2, particularly in the subgroups of ever smokers and squamous cell lung carcinoma. Four articles on pathway-based analysis and lung cancer risk (centrosome, DNA repair, lncRNA and RNA degradation) from our laboratory have been accepted or published. The two PTPN2 loci identified here differ from those reported in previous studies from our laboratory and in published GWASs. PTPN2 plays a dual role in the development and progression of cancer. Proliferation and cell cycle assays demonstrated that overexpression of PTPN2 decreased the serum requirement, increased the formation of larger colonies in soft agar, altered morphology, and accelerated progression through the G1 and S phases and the rate of cell division 25,26. Another study showed that the proliferation rate was reduced in TCPTP (−/−) lymphocytes compared with TCPTP (+/+) lymphocytes 27. We found that PTPN2 mRNA expression levels in lung cancer tissues from the TCGA database were increased compared with matched adjacent normal tissues; other studies have also demonstrated that PTPN2 expression levels were higher in lung AD 28,29 and SQ 30,31 than in normal lung tissues. These findings support an oncogenic role for PTPN2 and are consistent with our results that the two PTPN2 susceptibility loci were associated with a decreased lung cancer risk, possibly as a result of decreased mRNA expression of the gene. In addition, we found that the eQTL analysis result of rs2847297 in lung tissue was also significant in the GTEx analysis (P = 4.0E-07) (http://www.gtexportal.org/home/eqtls/bySnp?snpId=rs2847297&tissueName=All).
This result is also consistent with the eQTL analysis of the lymphoblastoid cell lines in the present study. However, it has been reported that overexpression of PTPN2 induces apoptosis in p53+ A549 and MCF-7 cells but not in p53− HeLa cells, which is consistent with features of a tumor suppressor 32. Another study demonstrated that PTPN2 was absent in a large proportion of "triple-negative" primary human breast cancers and that PTPN2 overexpression suppressed tumor growth 33.
In the subgroup analysis, we found that the two SNPs were more likely to be associated with SQ risk, and that the risk associated with the rs2847297 G allele was more likely to be observed among ever smokers. Cigarette smoke is the major risk factor for lung cancer, especially for SQ. One study showed that smoking led to increased expression of Nkx2 34, which is a transcription factor (TF) for PTPN2. Therefore, the locus may influence lung cancer risk in ever smokers by altering the expression of PTPN2.
Our study has some limitations. First, genes in the TCPTP pathway were identified mainly from the Molecular Signatures Database and Genecards. Although we searched relevant articles to complete the list of genes in the pathway, some newly discovered genes in the pathway might have been missed. Second, although we demonstrated the association of the two novel, potentially functional loci in PTPN2 with lung cancer risk with functional evidence from eQTL analyses, the exact biochemical and molecular mechanisms are still unclear. Third, our eQTL analyses were limited to publicly available data from lymphoblastoid cell lines rather than target tissues, which could provide more direct evidence of correlation between the two SNPs and PTPN2 expression.
Taken together, the present study revealed two novel, potentially functional susceptibility loci in PTPN2 associated with lung cancer risk in European populations, particularly among ever smokers and for squamous cell carcinoma. Further validation and functional evaluation of these genetic variants are warranted to verify our findings.

37 and Genecards (http://www.genecards.org/). Overall, 43 genes located on autosomal chromosomes were selected (detailed in Supplementary Table S1). The final meta-analysis contained 5,162 SNPs that met the following inclusion criteria: genotyping rate >95%, minor allele frequency (MAF) ≥ 5%, and Hardy-Weinberg Equilibrium (HWE) exact P value ≥ 10^-5. The detailed workflow is shown in Fig. 4.

In silico functional prediction and validation.
We used three in silico tools, SNPinfo (http://snpinfo.niehs.nih.gov/snpinfo/snpfunc.htm) 38, RegulomeDB (http://regulomedb.org/) 39 and HaploReg (broadinstitute.org/mammals/haploreg/haploreg.php) 40, to predict potential functions. The expression quantitative trait loci (eQTL) analysis was performed using data from the 1000 Genomes Project 41. The mRNA expression analysis of lung cancer tissue samples was performed using data from TCGA 42.
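The SNP inclusion criteria stated above (genotyping rate >95%, MAF ≥ 5%, HWE exact P ≥ 10^-5) can be expressed as a simple filter. This is a minimal sketch, not the pipeline used in the study: the `SnpStats` record, the SNP identifiers and the genotype counts are hypothetical, and the HWE exact-test P value is assumed to be precomputed by a tool such as PLINK.

```python
from dataclasses import dataclass


@dataclass
class SnpStats:
    snp_id: str      # hypothetical identifier
    n_aa: int        # homozygotes for allele A
    n_ab: int        # heterozygotes
    n_bb: int        # homozygotes for allele B
    n_missing: int   # samples without a genotype call
    hwe_p: float     # precomputed HWE exact-test P value (assumed input)


def passes_qc(s, min_call=0.95, min_maf=0.05, min_hwe_p=1e-5):
    """Apply the three inclusion criteria to one SNP."""
    n_called = s.n_aa + s.n_ab + s.n_bb
    call_rate = n_called / (n_called + s.n_missing)
    freq_b = (s.n_ab + 2 * s.n_bb) / (2 * n_called)
    maf = min(freq_b, 1.0 - freq_b)
    return call_rate > min_call and maf >= min_maf and s.hwe_p >= min_hwe_p


# Toy records (hypothetical), one per failure mode plus one passing SNP.
snps = [
    SnpStats("snp_ok", 640, 320, 40, 0, 1.0),
    SnpStats("snp_low_maf", 950, 45, 5, 0, 0.5),
    SnpStats("snp_low_call", 500, 300, 100, 100, 0.5),
    SnpStats("snp_hwe_fail", 640, 320, 40, 0, 1e-8),
]
kept = [s.snp_id for s in snps if passes_qc(s)]
print(kept)
```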

Statistical analysis.
Odds ratios (ORs) and their 95% confidence intervals (CIs) were calculated using Stata (v10, College Station, Texas, USA) and PLINK (v1.06) software. A meta-analysis with the inverse variance method was performed on the 5,162 SNPs. We used Cochran's Q statistic to test for heterogeneity and the I² statistic to estimate the proportion of the total variation due to heterogeneity 43. The fixed-effects model was used when there was no heterogeneity among the GWASs (Q-test P > 0.100 and I² < 25%); otherwise, the random-effects model was used. The false discovery rate (FDR) method was used to control for multiple testing with a threshold of <0.20 44. The mRNA expression levels of genes in lung cancer and adjacent tissues from the TCGA database were compared by paired t-test. Regional association plots were generated with LocusZoom 45. Haploview v4.2 was used to generate the Manhattan plot and LD plots 46. All other analyses were conducted with SAS (Version 9.3; SAS Institute, Cary, NC, USA).
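The model-selection rule described above can be sketched as follows, under stated assumptions. Per-study log odds ratios are pooled with inverse-variance weights, Cochran's Q and I² are computed, and the fixed-effects estimate is kept only when Q-test P > 0.100 and I² < 25%. The random-effects branch uses the DerSimonian-Laird estimator, which is an assumption because the text does not name the estimator. The function names and toy inputs are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    # Recurrence for the regularized upper incomplete gamma function:
    # Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def meta_analyze(log_ors, ses):
    """Pool per-study log ORs (two or more studies); choose model by Q and I^2."""
    w = [1.0 / se ** 2 for se in ses]            # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * b for wi, b in zip(w, log_ors)) / sw
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, log_ors))
    df = len(log_ors) - 1
    q_p = chi2_sf(q, df)
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    if q_p > 0.100 and i2 < 25.0:
        model, est, se = "fixed", fixed, math.sqrt(1.0 / sw)
    else:
        # DerSimonian-Laird between-study variance (assumed estimator).
        c = sw - sum(wi ** 2 for wi in w) / sw
        tau2 = max(0.0, (q - df) / c)
        w_re = [1.0 / (s ** 2 + tau2) for s in ses]
        est = sum(wi * b for wi, b in zip(w_re, log_ors)) / sum(w_re)
        se = math.sqrt(1.0 / sum(w_re))
        model = "random"
    return {
        "model": model,
        "odds_ratio": math.exp(est),
        "ci_low": math.exp(est - 1.96 * se),
        "ci_high": math.exp(est + 1.96 * se),
        "q_p": q_p,
        "i2": i2,
    }


# Toy inputs (hypothetical): three homogeneous studies, two discordant ones.
homogeneous = meta_analyze(
    [math.log(0.94), math.log(0.96), math.log(0.95)], [0.03, 0.04, 0.05]
)
heterogeneous = meta_analyze([math.log(0.8), math.log(1.2)], [0.05, 0.05])
print(homogeneous["model"], heterogeneous["model"])
```

Requiring both conditions makes the rule conservative: a non-significant Q test alone can reflect low power when only a few studies are pooled, so the I² bound adds a second check before the narrower fixed-effects interval is reported.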