Main

MicroRNAs (miRs) are small non-coding RNAs that bind to the 3′-UTR of their target transcripts and can inhibit protein translation and destabilise the target mRNA (Baek et al, 2008). miRs are predicted to regulate about 30% of human genes (Lewis et al, 2005), including genes in stress resistance, fat metabolism, cell proliferation and apoptosis pathways (Ambros, 2003). Polymorphisms in miR genes or in genes involved in miR biogenesis may affect miR-mediated cell regulation (Mishra and Bertino, 2009; Clague et al, 2010). miR biogenesis involves transcription of a primary transcript (pri-miR) by RNA polymerase II (POLR2A); excision of a stem-loop structure by the nuclear RNase III enzyme Drosha to generate the precursor (pre-miR); export of the pre-miR to the cytoplasm; and processing by a second RNase III enzyme, Dicer, into a mature duplex RNA of about 22 nucleotides (Bartel, 2004). An alteration at any step of this maturation process could affect miR production, and impaired miR processing and maturation have been shown to enhance cellular transformation and tumourigenesis (Kumar et al, 2008). Given the mounting evidence implicating miRs in lung cancer development and progression (Yanaihara et al, 2006; Kumar et al, 2008; Landi et al, 2010), we investigated the role in lung cancer of single-nucleotide polymorphisms (SNPs) in genes of the RNA-mediated interference machinery involved in miR maturation.

Materials and methods

We performed SNP genotyping and miR expression profiling using blood and tumour tissue samples, respectively, from the Environment And Genetics in Lung cancer Etiology (EAGLE) study (Landi et al, 2008), which enrolled 2100 primary lung cancer cases and 2120 population controls, frequency matched on age, sex and residence, all Caucasian, in the Lombardy region of Italy in 2002–2005. The institutional review boards of the enrolling hospitals and of the National Cancer Institute approved the study, and all participating subjects signed informed consent.

Genomic DNA was isolated from blood samples of 1984 cases and 2073 controls and used to genotype 12 SNPs (Table 1) covering different haplotype blocks in POLR2A, RNASEN (Drosha) and DICER1 (Figure 1). Genotyping was performed at the Core Genotyping Facility, NCI, using TaqMan assays (http://snp500cancer.nci.nih.gov). Duplicate quality-control samples (2%) showed 100% agreement in all assays. Subjects with a genotype call rate of at least 90% (1946 cases, 1982 controls) were included in the final analyses. All SNPs were consistent with Hardy–Weinberg equilibrium among controls (P≥0.26).
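The quality-control filters above can be illustrated with a short Python sketch. The report does not state which Hardy–Weinberg test was used; this assumes a Pearson chi-square goodness-of-fit test with 1 degree of freedom on control genotype counts. The 90% call-rate filter is written as a hypothetical helper, `passes_call_rate`, in which a missing call is represented by `None`.

```python
import math
from typing import Optional, Sequence, Tuple


def hwe_chisq(n_hom_ref: int, n_het: int, n_hom_var: int) -> Tuple[float, float]:
    """Pearson chi-square (1 df) test of Hardy-Weinberg equilibrium.

    Takes genotype counts among controls and returns (chi2, p_value).
    Assumption: a chi-square test; the study's exact HWE test is not stated.
    """
    n = n_hom_ref + n_het + n_hom_var
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_var)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def passes_call_rate(calls: Sequence[Optional[str]], threshold: float = 0.90) -> bool:
    """Hypothetical helper: True if the share of non-missing calls reaches the threshold."""
    called = sum(1 for g in calls if g is not None)
    return called / len(calls) >= threshold


# Counts in exact HWE proportions give chi2 = 0 and P = 1
print(hwe_chisq(25, 50, 25))
# With 12 SNPs, 11 successful calls (91.7%) pass the 90% threshold; 10 (83.3%) do not
print(passes_call_rate(["CC"] * 11 + [None]), passes_call_rate(["CC"] * 10 + [None] * 2))
```

An exact HWE test is often preferred for rare genotypes; with control groups of about 2000 subjects and common SNPs, the chi-square approximation is usually adequate.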

Table 1 SNPs in the RNA-mediated interference machinery and correlation with miR expression
Figure 1

SNP coverage of the Drosha gene. SNP data from the HapMap v3 database for the chromosomal region corresponding to the Drosha gene (RNASEN). The six SNPs studied in this report are shown in the insets, where linkage disequilibrium (r²) estimates from HapMap are compared with those from controls in the EAGLE population; the two datasets show very similar patterns.
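The r² values compared in the figure insets measure pairwise linkage disequilibrium between SNPs. As a sketch (not the HapMap or EAGLE pipeline), r² can be computed from a two-locus haplotype frequency and the two allele frequencies, assuming those frequencies have already been estimated, for example from phased haplotypes:

```python
def ld_r2(p_ab: float, p_a: float, p_b: float) -> float:
    """Squared allelic correlation r^2 between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele a at SNP 1 and allele b at SNP 2
    p_a, p_b: frequencies of allele a at SNP 1 and allele b at SNP 2
    """
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Alleles that always co-occur (complete LD) give r^2 = 1 (up to rounding)
print(ld_r2(0.3, 0.3, 0.3))
# Independent loci (p_ab = p_a * p_b) give r^2 = 0
print(ld_r2(0.3 * 0.4, 0.3, 0.4))
```

Because r² depends on allele frequencies, similar patterns in HapMap and EAGLE controls also suggest similar allele frequencies in the two populations.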

The miR expression data were derived from formalin-fixed paraffin-embedded (FFPE) tissue samples from 165 lung adenocarcinoma (AD) and 125 squamous cell carcinoma (SQ) cases in EAGLE who had not undergone chemotherapy or radiation therapy before tissue collection. Lung cancer histology and the presence of malignant cells in the FFPE tissue blocks were ascertained by the EAGLE local pathologists and reviewed by a pathologist from the NCI. We excluded tissue blocks with mixed histologies or a low proportion of malignant cells. miR expression profiles were obtained using a custom-made two-channel oligonucleotide array. The miRs represented on the array, data pre-processing, quality control procedures and the selection of the 199 human miRs analysed were described previously (Landi et al, 2010). Array results were also confirmed by qRT–PCR using TaqMan miRNA assays (Applied Biosystems, Foster City, CA, USA) in 49 EAGLE samples that had sufficient tumour RNA remaining after the array analysis (Landi et al, 2010). In addition, we analysed the association between RNASEN/rs640831 and RNASEN gene expression in non-involved lung tissue from 45 AD patients from EAGLE, using data from the Affymetrix HG-U133A chip (Affymetrix Inc., Santa Clara, CA, USA).

We tested single-SNP and haplotype associations with lung cancer risk in all 3928 subjects and with survival in the 1946 lung cancer cases. In single-SNP analyses, homozygotes for the allele more frequent among controls formed the reference group, and both additive and dominant models were considered. In the haplotype analyses, the most common haplotype was the reference group. The associations between variant genotypes and lung cancer risk were estimated as odds ratios (ORs) with 95% confidence intervals (CIs) using unconditional logistic regression, adjusted for categories of age, sex, residence, cumulative smoking dose (pack-years), smoking intensity (cigarettes per day) and years since quitting smoking. We also performed subgroup analyses by smoking status (never/ever) and major histology type, as well as analyses restricted to patients with resectable tumours (stages I, II and IIIA). Lung cancer-specific survival was defined as the time from diagnosis of lung cancer to death due to lung cancer. Patients who were still alive at the last follow-up (n=439) or who died of causes unrelated to lung cancer (n=98) were censored. The associations of SNPs and haplotypes with survival, adjusted for age, sex, stage and smoking status, were estimated by fitting Cox proportional hazards models (Cox, 1972) in all patients and separately in AD and SQ patients.
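The additive and dominant models differ only in how genotypes are coded before entering the regression. A minimal sketch, assuming genotypes are stored as two-letter strings and that the reference allele is the one more frequent among controls (for RNASEN/rs640831, C); the function names are hypothetical. It also shows how an OR and its 95% CI follow from a fitted log-odds coefficient and its standard error, assuming Wald-type intervals:

```python
import math
from typing import Optional, Tuple


def code_genotype(genotype: Optional[str], ref_allele: str, model: str) -> Optional[int]:
    """Code a genotype such as 'CA' as a regression covariate.

    additive: number of non-reference alleles (0, 1 or 2)
    dominant: 1 if at least one non-reference allele is present, else 0
    Missing genotypes (None) remain missing.
    """
    if genotype is None:
        return None
    n_variant = sum(1 for allele in genotype if allele != ref_allele)
    if model == "additive":
        return n_variant
    if model == "dominant":
        return int(n_variant > 0)
    raise ValueError(f"unknown genetic model: {model!r}")


def odds_ratio_ci(beta: float, se: float, z: float = 1.959964) -> Tuple[float, float, float]:
    """OR and Wald 95% CI from a logistic-regression coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


for g in ("CC", "CA", "AA"):
    print(g, code_genotype(g, "C", "additive"), code_genotype(g, "C", "dominant"))
```

The same exponentiation of a coefficient and its Wald limits gives an HR and 95% CI from a Cox model.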

We evaluated the association between miR expression and each SNP under a dominant model in the 290 cases, using a t-test for each miR–SNP combination. For each SNP we counted the number of significant (P≤0.05) miR–SNP associations (ns) and then computed a global permutation P-value to evaluate the significance of the association between the SNP and the overall miR expression profile. For each SNPi (i=1, 2, …, 12), the permutation test repeated the 199 miRj–SNPip association t-tests (j=1, 2, …, 199) 9999 times; in the p-th repetition (p=1, 2, …, 9999), SNPip was a random permutation of the original SNPi genotypes across subjects, and the number of significant miRj–SNPip associations (nsp) was recorded. The global P-value was then defined as one plus the number of permutations in which nsp was at least as large as ns, divided by the total number of permutations plus one (i.e., a denominator of 10 000). Finally, we evaluated the association between RNASEN mRNA expression and the rs640831 SNP under a dominant model in 45 AD cases using a t-test.

Because tests were conducted for 12 SNPs and 3 haplotypes, we used 0.01 as the P-value threshold for statistical significance. All analyses were performed in R (v2.10; http://www.r-project.org/index.html), except the haplotype analyses, which were conducted with the THESIAS program (Tregouet and Garelle, 2007).

Results

None of the investigated polymorphisms in POLR2A, RNASEN and DICER1 was significantly associated with lung cancer risk or lung cancer survival, either overall or in subgroups defined by histology or smoking status. Analyses based on additive and dominant models gave similar results (Supplementary Materials 1–8). However, the RNASEN haplotype GTAATC (frequency=2%) was significantly associated with reduced lung cancer-specific survival compared with the most common haplotype, GTACCT (frequency=30%), among all cases (hazard ratio (HR)=1.86, 95% CI=1.19–2.92, P=0.007). Similar results were obtained in the AD and SQ histological subtypes (HRs=2.33 and 3.27; 95% CIs=1.04–5.25 and 1.44–7.43; P=0.041 and 0.005, respectively) and among ever smokers (HR=1.81; 95% CI=1.16–2.82; P=0.009). This haplotype–survival association was not observed among the 133 never smokers (HR=0.82; 95% CI=0.33–2.06; P=0.679). When we repeated the analysis among the 821 patients with early-stage resectable tumours (stages I, II and IIIA), the association between RNASEN haplotype GTAATC and reduced lung cancer-specific survival was even stronger (HR=2.36, 95% CI=1.36–4.09, P=0.002). Associations were also observed in patients with resectable AD and resectable SQ considered separately (HRs=1.95 and 3.99; 95% CIs=1.00–3.81 and 1.75–9.11; P=0.05 and 0.001, respectively).
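As a consistency check on reported estimates, the P-value of a ratio estimate can be approximately recovered from its 95% CI, assuming a Wald-type interval computed on the log scale (the usual construction for Cox model HRs, although the report does not state how its CIs were computed). A small Python sketch; `wald_p_from_ci` is a hypothetical helper name:

```python
import math


def wald_p_from_ci(estimate: float, lower: float, upper: float,
                   z_crit: float = 1.959964) -> float:
    """Approximate two-sided Wald P-value for a ratio estimate (HR or OR) from its 95% CI.

    Assumes the interval is exp(log(estimate) +/- z_crit * SE), so that
    SE = (log(upper) - log(lower)) / (2 * z_crit).
    """
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(estimate) / se
    return math.erfc(abs(z) / math.sqrt(2.0))  # two-sided standard normal tail


print(round(wald_p_from_ci(1.86, 1.19, 2.92), 4))  # all cases
print(round(wald_p_from_ci(2.36, 1.36, 4.09), 4))  # resectable tumours
```

Applied to HR=1.86 (95% CI=1.19–2.92), this gives about 0.007, and HR=2.36 (95% CI=1.36–4.09) gives about 0.002, in line with the reported P-values. Rounding of the published HRs and CIs limits the precision of such back-calculations.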

We further studied whether the 12 SNPs were associated with expression of mature miRs in lung cancer tissue, in all samples and in AD and SQ separately (Table 1). In AD patients, RNASEN/rs640831, one of the six SNPs defining the RNASEN haplotypes, was associated with the expression of 56 miRs (global P=0.007, Table 2). On average, 37 miRs were upregulated and 19 were downregulated in carriers of the rs640831 minor allele compared with subjects homozygous for the common allele. miRs with tumour suppressor potential (e.g., the let-7 family, miR-126 (Crawford et al, 2008) and miR-15a (Cimmino et al, 2005)) and miRs with oncogenic or metastatic potential (e.g., miR-21 (Zhu et al, 2008)) were among those with altered expression in carriers.

Table 2 miRs significantly correlated with RNASEN SNP rs640831 in AD patients

We validated the microarray results by qRT–PCR for 4 of the 56 miRs significantly associated with RNASEN/rs640831, in 49 EAGLE lung tumour samples. As shown in Supplementary Figure 1, the correlation between the two platforms was highly significant (P=0.001, <0.0001, <0.0001 and 0.002 for let-7g, let-7f, miR-26a and miR-107, respectively). As expected, the correlation between microarray and qRT–PCR data was inverse, because qRT–PCR results are expressed as the number of amplification cycles needed for the signal to reach a fixed threshold: the fewer the cycles, the higher the expression level. In addition, the association between RNASEN/rs640831 and miR expression measured by qRT–PCR was qualitatively concordant with the microarray-based results (an inverse association in the 24 AD but not in the 23 SQ cases).
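Why the two platforms correlate inversely can be shown with a toy Python example. It assumes idealised qRT–PCR with 100% amplification efficiency, so that each doubling of starting template lowers the cycle count by one; the array values and the offset of 35 cycles are hypothetical:

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical log2 expression levels of one miR in five tumours (array scale)
array_log2 = [6.0, 7.5, 8.0, 9.2, 10.1]
# Idealised cycle counts: with 100% efficiency each doubling of template saves one cycle;
# 35.0 is an arbitrary, hypothetical offset
ct = [35.0 - v for v in array_log2]
print(round(pearson_r(array_log2, ct), 6))  # a perfect inverse relation: r = -1
```

With measurement noise and imperfect amplification efficiency, an observed correlation would be expected to remain negative but smaller in magnitude.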

Finally, to further explore the correlation between RNASEN/rs640831 and the miR expression profile among AD cases, we tested the association between RNASEN gene expression and rs640831 in non-involved lung tissue of 45 AD patients from EAGLE. The 25 AD patients carrying one or two copies of the rs640831 minor allele showed significantly lower RNASEN mRNA expression than the 20 AD patients homozygous for the more frequent allele (fold change=0.87, P=0.013).
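The reported fold change can be read on the log scale. Assuming the Affymetrix expression values are log2-transformed (as is common, for example after RMA normalisation, although the report does not say so), a fold change is 2 raised to the difference in group means. A sketch with hypothetical values, not study data:

```python
import math
from statistics import mean
from typing import Sequence


def fold_change(carriers: Sequence[float], non_carriers: Sequence[float]) -> float:
    """Fold change of carriers relative to non-carriers from log2-scale expression."""
    return 2 ** (mean(carriers) - mean(non_carriers))


# Hypothetical log2 RNASEN expression values (not study data)
carriers = [7.9, 8.1, 8.0]
non_carriers = [8.2, 8.2, 8.2]
print(round(fold_change(carriers, non_carriers), 3))
# Conversely, FC = 0.87 corresponds to a log2 difference of about -0.2
print(round(math.log2(0.87), 3))
```

On this reading, carriers would have roughly 13% lower RNASEN mRNA levels than non-carriers on average.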

Discussion

We observed (i) an association between lung cancer survival and a haplotype in RNASEN, particularly among early-stage patients, and (ii) differences in the miR expression profile and in RNASEN gene expression in lung tissue according to RNASEN/rs640831 status. Carrying the minor allele A rather than the common allele C at RNASEN/rs640831 contributed to the survival association of the RNASEN haplotype GTAATC compared with haplotype GTACCT. Our results are consistent with the view that the combined effect of multiple genetic markers within a haplotype represents the impact of a genetic locus on disease progression better than individual markers do (Johnson et al, 2001; Crawford and Nickerson, 2005). This is the first evidence that inherited variation in the miR-processing machinery, specifically in RNASEN, might affect survival from lung cancer. Previous studies have shown that RNASEN expression levels were associated with survival in oesophageal cancer patients (Sugito et al, 2006) and that low RNASEN expression was suggestively associated with reduced survival in non-small-cell lung carcinoma patients (Karube et al, 2005). Our findings provide a possible genetic basis for these previous reports. The most common variant studied in RNASEN (rs640831) was associated with changes in miR expression and with lower RNASEN mRNA expression in AD. Several of these miRs have previously been associated with lung cancer survival in the EAGLE study (Landi et al, 2010) and in other lung cancer studies (Markou et al, 2008; Yu et al, 2008; Raponi et al, 2009). An analogous global modification of the miR profile due to changes in Drosha transcript levels has been observed in cervical cancer (Muralidhar et al, 2007). No similar finding emerged in our analysis restricted to cases with SQ histology, suggesting that the mechanism by which RNASEN/rs640831 affects miR expression may be specific to AD lung tissue. This observation is consistent with previous studies showing that protein levels of components of the endogenous miR machinery differ between the AD and SQ histological subtypes of lung cancer (Chiosea et al, 2007). It is also possible that the relatively small size of the SQ group affected the results.

Whereas the survival analysis was based on a large sample (n≈2000), allowing several models, including haplotypes, to be explored, the miR expression analysis was based on a much smaller sample (n=290, of which 125 were SQ). Consequently, only the most informative SNP (MAF=34%) could be adequately tested against the miR profile. We could not test possible effects of the full haplotype on the miR profile, because such a test would have required a considerably larger number of samples with miR expression data. Further functional studies and/or larger miR profiling studies are necessary to confirm our findings.
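The sample-size argument can be made concrete with a back-of-the-envelope calculation. Assuming Hardy–Weinberg proportions, the expected fraction of subjects carrying at least one copy of an allele (or haplotype) with frequency f is 1 − (1 − f)². The sketch below applies this to the 165 AD cases with expression data, using the MAF of 34% and the GTAATC haplotype frequency of 2% reported above; treating that haplotype frequency as applying to this subgroup is an assumption.

```python
def expected_carriers(freq: float, n: int) -> float:
    """Expected number of subjects carrying >=1 copy of a variant with frequency freq,
    assuming Hardy-Weinberg proportions."""
    return n * (1.0 - (1.0 - freq) ** 2)


n_ad = 165  # AD cases with miR expression data
for label, freq in (("rs640831 minor allele, MAF=0.34", 0.34),
                    ("GTAATC haplotype, f=0.02", 0.02)):
    print(f"{label}: about {expected_carriers(freq, n_ad):.0f} expected carriers")
```

About 93 expected rs640831 carriers versus about 7 GTAATC carriers among 165 AD cases illustrates why only the single SNP could be tested against the miR profile. The expected carrier fraction for MAF=0.34, about 56%, is close to the 25 of 45 (56%) carriers observed in the RNASEN mRNA analysis.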

In conclusion, in the largest study to date of SNPs in genes involved in miR biogenesis and lung cancer, an inherited variant in the RNASEN gene, which codes for the Drosha enzyme, was associated with reduced expression of the Drosha gene and with changes in the expression of miRs involved in many cancer-related processes. A Drosha haplotype that included this RNASEN variant and five additional variants was associated with poor lung cancer survival, particularly among early-stage patients. Drosha gene variants may affect the miR-processing machinery, including miR maturation and expression levels, which may in turn affect the function of miRs in the regulation of target transcripts and protein translation. These variants in Drosha may have prognostic implications in lung cancer.