Role of a genetic variation in the microRNA-4421 binding site of ERP29 regarding risk of oropharynx cancer and prognosis

We conducted a two-stage association study on patients with oropharynx (OP) squamous cell carcinoma (SCC) and healthy controls to identify single nucleotide variants (SNVs) located at the microRNA (miR)-binding sites of carcinogenesis genes associated with risk and prognosis of the disease. In stage 1, 49 patients and 49 controls were analyzed using Genome-Wide Human SNV Arrays to identify variants in the 3′-untranslated region (3′-UTR) of carcinogenesis-related genes, and one SNV was selected for data validation in stage 2 by TaqMan assays in 250 OPSCC patients and 250 controls. The ERP29 c.*293A > G (rs7114) SNV located at miR-4421 binding site was selected for data validation among 46 SNVs. The ERp29 and miR-4421 levels were evaluated by quantitative-PCR and Western blotting. Interaction between miR-4421 with 3′-UTR of ERP29 was evaluated by luciferase reporter assay. Event-free survival (EFS) was calculated by Kaplan–Meier and Cox methods. ERP29 GG variant genotype was more common in OPSCC patients than in controls (6.4% vs 3.6%, p = 0.02; odds ratio: 5.67; 95% confidence interval (CI) 1.27–25.26). Shorter EFS were seen in the base of tongue (BT) SCC patients with GG genotype (0.0% vs 36.2%, p = 0.01; hazard ratio: 2.31; 95% CI: 1.03–5.15). Individuals with ERP29 AG or GG genotypes featured lower levels of ERP29 mRNA (p = 0.005), ERp29 protein (p < 0.001) and higher levels of miR-4421 (p = 0.02). The miR-4421 showed more efficient binding with 3′-UTR of the variant G allele when compared with wild-type allele A (p = 0.001). Our data suggest that ERP29 rs7114 SNV may alter the risk and prognosis of OPSCC due to variation in the ERp29 production possibly modulated by miR-4421.

Smoking habit and alcohol consumption consist in the classical risk factors for developing oropharynx (OP) squamous cell carcinoma (SCC) 1 . Sexual behavior is also established as a risk factor for human papillomavirus (HPV)-related OPSCC 2 . Most of OPSCC patients are diagnosed with measurable locally advanced disease 3 , and only about half of these patients achieve complete or partial responses after 5-years survival 4 .
Endoplasmic reticulum (ER) protein 29 (p29) is a chaperone protein that functions in unfolding and facilitating transport of synthesized secretory proteins from the ER to Golgi 5 . Its function in cancer has been actively addressed, but its role in cancer development and progression is still unclear 6 . The ERp29 expression was found to be inversely associated with tumor development in the lung 7 , breast 8 , and gallbladder 9 ; moreover, it inhibited breast tumor formation in mice 10 . In contrast, ERp29 overexpression was observed in breast, melanoma, lung, cervical 11 , liver 12 , and metastatic colorectal 13 cancer cell line.
Additionally, ERp29 was found to regulate breast cancer cell growth arrest through p38 activation and upregulation of the ER stress protein p58 IPK14 , and cancer cell survival against genotoxic stress induced by doxorubicin 15,16 , cisplatin (CDDP) 17 , gemcitabine 18 , and radiation 19,20 . ERp29 overexpression was associated with unfavorable prognosis of colorectal cancer through activation of chloride intracellular channel 4 and second

Material and methods
Study population. This study was conducted in two stages. In stage 1, 49 patients and 49 controls were analyzed with the purpose of identifying SNVs on miR-binding sites of carcinogenesis genes with importance in OPSCC risk, and in stage 2, one SNV was selected for data validation in 250 OPSCC patients and 250 controls.
All OPSCC patients were diagnosed at the Clinical Oncology Service of the University of Campinas Teaching Hospital between June 2000 and April 2016. The control group comprised 250 blood donors of the same sex and ethnicity from the same Teaching Hospital. All subjects were classified as either smokers and non-smokers, and drinkers or abstemious, as previously reported 31 . The Institutional Research Committee of University of Campinas approved the study (numbers: 424/2016 and 1.438.601). All procedures were carried out according to the Helsinki Declaration, and appropriate informed consent form was obtained.
OPSCC was diagnosed according to World Health Organization criteria 32 . Histologically, the OPSCC was classified as well, moderately, poorly differentiated, or undifferentiated 33 . In addition, the OPSCC was staged according to the TNM system of the 7th American Joint Committee of Cancer Staging 34 .
HPV status was performed in available tumor fragments embedded in paraffin of 98 OPSCC patients. We could not obtain the HPV status from 152 patients due to unavailable tumor fragments. P16 immunohistochemistry and in situ hybridization were performed in tumor fragments, aiming to test the presence of human papillomavirus type 16 (HPV 16) 35 .
For survival analysis, we selected 226 patients; 24 out of 250 patients were sent to other services for treatment and follow-up, and no consistent clinical information could be obtained. Patients were treated according to the institutional protocol, based on conventional procedures 31 . Patients with locoregional advanced resectable tumors received neoadjuvant treatment (n = 19) or adjuvant treatment (n = 36) with 35 sessions of radiation, 2 Gy per session, with concurrent intravenous CDDP at a dose of 80-100 mg/m 2 or carboplatin area under the curve of 5 on days 1, 22, and 43, before or after surgery, respectively. For 168 patients, definite treatment with 35 sessions of radiation, 2 Gy per session, and concurrent intravenous CDDP or carboplatin at aforementioned doses, was administered, to whom surgical treatment was not performed because of locoregional unresectable tumors, low Karnofsky performance scale score, refusal of surgery due to expected functional or anatomic sequels, or an organ preservation protocol. We could not obtain treatment information from 3 patients enrolled in survival analysis due to lack of consistent clinical information. Patients' follow-up was performed at 3-month intervals. The end of the follow-up period was January 2019.

Stage 1: screening of SNVs, candidate gene choice and SNV selection. Ninety-eight individuals
were analyzed in the first stage. Genomic DNA from peripheral blood of 49 base of tongue (BT) SCC patients and 49 controls was genotyped for a total of 500,568 SNVs using the Affymetrix Genome-Wide Human SNV Array 5.0 (AFFYMETRIX, USA) according to the manufacturer's recommended protocols. The intensities resulting from the arrays scanning process were made available via CEL files, one per DNA sample with total quality control higher than 90% (AFFYMETRIX, USA). Tools from the Bioconductor (www.bioco nduct or.org) were used to process the CEL files. The genotyping was performed applying the corrected robust linear mixture model (crlmm) algorithm 36 .
After association analysis (patients vs controls), SNVs located in the 3′-UTR of genes previously reported to be associated with, or known to be involved in carcinogenesis pathways, were selected. The analysis of carcinogenesis pathways was performed using The Database for Annotation, Visualization and Integrated Discovery 37 and Kyoto Encyclopedia of Genes and Genomes pathway maps 38 .
SNVs showing significant deviation from Hardy-Weinberg (HW) equilibrium in controls, and those with minor allele frequency less than 10%, were excluded from the selection. SNV selection across each of carcinogenesis-related genes was carried out calculating their sample size based on the genotypic frequencies observed in healthy individuals from different ethnic populations 39 and also using MicroSNiPer 40 and MirSNPscore 41 algorithms to select variations in miRNA binding sites sequences of 3′-UTR. We select the SNVs that presented a sample size less or equal than 250 individuals, which is the total patients' sample available in our biorepository.

Scientific Reports
| (2020) 10:17039 | https://doi.org/10.1038/s41598-020-73675-z www.nature.com/scientificreports/ We selected the miRNAs that match six, seven, or eight nucleotides in seed region (position 2-8 of miRNA). The RNAhybrid software 42 was used for finding the energetically most favorable hybridization sites using the minimum free energy (MFE) of hybridization of − 20 kcal/mol or less. As a result, in this study we selected the ERP29 rs7114 for further validation.  Statistical analysis. Association between disease statuses, BTSCC patients vs controls, and genotypes for stage 1 was performed using the logistic regression model. These analyses were adjusted by age at diagnosis, sex and skin color. SNVs that presented raw p-values below the 0.01 thresholds were selected for further inspection. These analyses were implemented in R software (www.r-proje ct.org). The HW equilibrium was tested using chi-square (χ 2 ) statistics for the goodness-of-fit test. Differences between groups were analyzed by χ 2 or Fisher's exact test. Multivariate analysis using logistic regression model served to obtain age-and tobacco status-adjusted crude odds ratios (ORs) with 95% confidence intervals (CI), and to assess associations between genotypes and OPSCC in stage 2. Power of analysis (PA) was used to calculate the minimum effect size that is likely to be detected in a study using a given sample size (DSS Research: https ://bit. ly/2Fe79 sl). χ 2 and Fisher's exact tests were used to evaluate possible associations between clinical characteristics, tumor aspects, and the genotypes of the selected SNV. Considering continuous variables, data sets were probed for normality using Shapiro-Wilk's test. For the ERP29 gene expression, the data set assumed normal distribution and t test was used for analysis. For ERp29 protein content, miR-4421 expression and luciferase assay, data sets did not assume normal distribution, thus, we used the Mann-Whitney test to compare the groups.
For survival analysis, the event-free survival (EFS) was calculated from the date of diagnosis until the date of progression of disease, the first relapse, death by disease, or the last follow-up. Overall survival (OS) was calculated from the date of diagnosis until the date of death, resulting from any cause, or the date of last follow-up. EFS and OS times were calculated using Kaplan-Meier (K-M) estimate probabilities, and differences between survival curves were analyzed by the log-rank test 44 . The prognostic impact of age at diagnosis, sex, histological grade, TNM stage and ERP29 genotypes in survival of OPSCC patients was examined using Cox proportional hazard ratio (HR) regression. In a second step, all variables with p < 0.15 were included in a multivariate Cox regression (backward conditional step wise selection) 44 . For all statistical tests, significance is two-sided and achieved when p-values were ≤ 0.05. Tests were done using the SPSS 21.0 software (SPSS INCORPORATION, USA).

Results
Study population. We present demographic data and smoking and alcohol habits of 250 OPSCC patients and 250 controls in Table S1 Supplement. Control individuals were younger than OPSCC patients (median age: 44 vs 56 years, p < 0.001), and the number of tobacco and alcohol users was higher among patients than in controls (89.6% vs 13.2%, p < 0.001; 78.8% vs 49.2%, p < 0.001; respectively). Differences in age and pattern of tobacco and alcohol habits of individuals of each group were corrected in all comparisons of genotype frequencies by pertinent statistical analyses. Patients were treated with CDDP, radiotherapy (RT), and surgery. One hundred and sixty eight patients (68.6%) were submitted to chemotherapy (CT) and RT combined treatment; 47 patients (19.2%) received CT, RT, and were submitted to surgery; 12 patients (4.9%) received RT and surgery; 7 patients (2.9%) only received CT; 6 patients (2.4%), only RT; and 5 patients (2.0%) were only submitted to surgery. We could not obtain consistent information about the therapeutics of 5 patients.  (Table S4 Supplement). Only six of the 16 SNVs presented the appropriate sample size (n = 250): JMJD6 rs2240774, SLCO2A1 rs2370512, SLC7A11 rs7674870, MYO6 rs6914716, TUSC1 rs1128957 and ERP29 rs7114 (Table S4 Supplement). Among them, we select the SNV ERP29 rs7114 for further validation in stage 2. We found that miR-4421 (MFE: − 27.9 kcal/mol) (Fig. 1A) matched 6mer site complementary in seed sequence of 3′-UTR of variant allele G of ERP29 (rs7114), while the wild-type allele A disrupted these target sites.

Survival analysis of OPSCC patients.
We obtained consistent survival data from 226 OPSCC patients.
The median follow-up of patients enrolled in survival analysis was 33.3 months (range 1.5-162.2 months). The final status of patients was established on January 2019. At this time, 60 patients were alive without disease, 6 patients were alive with disease, 123 patients died due to disease, and 37 patients died due to unrelated causes. The 5-year EFS and OS rates were 42.8% and 38.7%, respectively. At 60 months of follow-up, lower EFS was observed in patients with large tumors (T3 or T4) (37.1% vs 61.6%, p = 0.002), with advanced nodal stage (N2 or N3) (31.2% vs 57.8%, p < 0.001), and with tumors in the BT (33.6% vs 50.2%, p = 0.01) (K-M estimates) compared with others. The significance of differences between groups remained the same in univariate Cox analysis. After multivariate Cox analysis, large tumors (T3 or T4) (HR 1.84, 95% CI 1.16-2.91, p = 0.009) and advanced nodal stage (N2 or N3) (HR 1.88, 95% CI 1.30-2.71, p = 0.001) were found to be predictors of poor EFS (Table S5 Supplement). Table 2. Association of ERP29 rs7114 genotypes, clinical and tumor characteristics of 250 oropharynx squamous cell carcinoma patients. n number of patients. *The number of patients differed from the total quoted in the study (n = 250), because it was not possible to obtain consistent information about histological grade and tumor stage in some cases. **The number of patients differed from the total quoted in the study (n = 250), because it was considered for analysis only the most frequent tumor localization.  (Table 3).
At the same time of follow-up, a shorter OS was perceived only in patients with large tumors (T3 or T4) (20.2% vs 71.4%, p = 0.02) (K-M estimates) compared with others. The significance of differences between groups remained the same in univariate Cox analysis. After multivariate Cox analysis, large tumors (T3 or T4) (HR 2.36, 95% CI 1.13-4.92, p = 0.02) were found to be predictors of poor OS (Table 3). No association was found between ERP29 rs7114 genotypes and OS.

Discussion
We investigated whether the ERP29 SNV rs7114 (c.*293A > G) alters the risk of OPSCC and prognosis of patients with the disease. Besides, we also investigated the role of the distinct alleles (A and G) of the referred SNV in ERP29 expression, ERp29 protein content, miR-4421 expression and its interaction with miR-4421 in pharynx SCC cell line. Table 3. Association of age, tumor characteristics and ERP29 rs7114 genotypes with survival of 102 base of tongue squamous cell carcinoma patients in Cox analysis. n number of patients, OR odds ratio, CI confidence interval. *The number of patients differed from the total quoted in the survival analysis (n = 102), because it was not possible to obtain consistent information in some cases. The significant values are indicated by bold letters. www.nature.com/scientificreports/ ERp29 is an ER chaperone protein and plays a role in protein maturation, secretion, and intercellular communication 5 . Although the controversial role of ERp29 in tumor development and progression 6 , ERp29 is a potential tumor suppressor in cancer 7-10 . Bambang et al. observed that ERp29 acts upregulating a group of genes with tumorous suppressive function such as E-cadherin number 1 (CDH1), cyclin-dependent kinase inhibitor 2B (CDKN2B), and spleen tyrosine kinase (SYK) 10 . Moreover, ERp29 downregulated a group of genes involved in cell proliferation such as epidermal growth factor receptor (EGFR), plasminogen activator receptor (uPAR), cyclin D2 (CCND2), and serine/threonine kinase 1 (AKT) 10 .
Initially, we observed that the variant ERP29 GG genotype was more common in OPSCC patients than in controls, and that individuals with the referred genotype were under 5.67-fold increased risk of OPSCC than others. There are no studies focusing on the role of the referred SNV in the risk of OPSCC or any disease.
The ERP29 SNV rs7114 determines the exchange of adenine (A) by guanine (G) at the 293 positions of the ERP29 3′-UTR, and the variant allele G creates a functional binding site for miR-4421 40,41 . miRNAs inhibit mRNA translation by directly binding to the 3′-UTR of target mRNAs, often accompanied by mRNA degradation 45 . In fact, it was already described that miRNAs are important regulatory molecules in OPSCC 46 . Overexpression of miR-4421 was associated with esophageal carcinoma development 47 . However, the role of miR-4421 in the regulation of gene expression is still unknown.
Actually, we observed that individuals carrying ERP29 AG or GG genotypes presented lower ERP29 expression and ERp29 protein content than individuals carrying wild-type AA genotype. Besides, individuals with AG or GG genotypes also presented higher levels of miR-4421 than individuals with the AA genotype.
Additionally, we cloned the 3′-UTR of ERP29 into a miRNA expression vector co-transfected with miR-4421 in pharynx SCC cells (FaDu) to verify whether ERP29 is a target gene of miR-4421, and to determine the affinity of the different alleles of the SNV rs7114 with 3′-UTR. We observed that cells co-transfected with the variant GG genotype and miR-4421 presented lower luciferase activity when compared with those co-transfected with wild-type AA genotype and miR-4421. This result indicated that miR-4421 downregulated ERP29 expression by targeting the SNV region and, indeed, miR-4421 had more affinity with the variant G allele than with the wildtype A allele. It's worth noting that miR-4421 is physiologically expressed in FaDu cells 48 .
All in all, our results support the fact that individuals with ERP29 AG or GG genotypes were under increased risk of OPSCC due to ERP29 downexpression, possibly modulated by miR-4421, and consequent loss of the tumorous suppressor function [7][8][9][10] .
We also observed that tumors located in BT and those in advanced stages were found to be predictors of poor survival, as reported in previous studies 49,50 . In fact, the OPSCC consists of a group of heterogeneous tumors with a variety of clinical characteristics 1 , thus, we performed the survival analysis only in BTSCC patients.
We observed that BTSCC patients with ERP29 GG variant genotype had worst EFS. None of the genotypes of the studied SNV have influenced the prognosis of patients in previous studies.
Besides the increase in cell proliferation 10 , downexpression of ERp29 was associated with higher cancer cell motility and invasion 10,23 , and worst prognosis of pancreatic ductal adenocarcinoma 21 and gastric cancer 22 patients. In fact, ERp29 can drive MET in breast cancer cells 10 and it may have a critical role in promoting distant metastasis during cancer progression. Furthermore, ERp29 downexpression was associated with decreased apoptosis in curcumin-treated breast cancer cells 51 and in fibroblasts and thyrocytes from null ERp29 mice 52 .
In contrast, ERP29 downexpression was associated with decreased RT resistance in nasopharyngeal carcinoma cells 19,20 , increased CDDP efficacy in lung cancer cell line with null p53 17 , and better prognosis of colorectal cancer patients 13 . Clearly, understanding the association of ERp29 with disease recurrence and distant metastasis is noteworthy for assessing its prognostic value in clinical applications.
Therefore, BTSCC patients carrying the ERP29 GG variant genotype may present worst EFS due to lower ERp29 levels, leading to activation of cell proliferation, loss of cell adhesion, and MET deregulation 10 .
In summary, we identified that inheritable abnormality in ERP29 modulates OPSCC occurrence and acts as an independent prognostic factor for EFS of BTSCC patients. We identified that ERP29 rs7114 SNV is capable of modulating ERp29 levels, possible due to miR-4421 affinity. These findings, once validated by studies with functional protein analyses and large sample sizes, will assist in individualizing the medical care provided to patients, in which high-risk patients should receive a closer follow-up.