An integrated genome-wide approach to discover deregulated microRNAs in non-small cell lung cancer: Clinical significance of miR-23b-3p deregulation

Despite significant technical advances, the genesis and progression of non-small cell lung cancer (NSCLC) remain poorly understood. We undertook an integrated genetic approach to discover novel microRNAs that are deregulated in NSCLC. A total of 119 primary NSCLCs with matched normal samples were analyzed for genome-wide copy number changes. We also tested a subset of matched samples by microRNA expression array and integrated the two data sets to identify microRNAs positioned in areas of allelic imbalance. Our findings indicate that most of the identified deregulated microRNAs (miR-21, miR-23b, miR-31, miR-126, miR-150, and miR-205) were positioned in allelic imbalance areas. Among the microRNAs tested in an independent set of 114 NSCLCs, overexpression of miR-23b was a significant poor prognostic factor for recurrence-free survival (HR = 2.40, P = 0.005, 95% CI: 1.32–4.29) and overall survival (HR = 2.35, P = 0.005, 95% CI: 1.30–4.19) in multivariable analysis. In addition, overexpression of miR-23b in the H1838 cell line significantly increased cell proliferation, while inhibition of miR-23b in the H1437 and H1944 cell lines significantly decreased cell viability. In summary, integration of genomic analysis and microRNA expression profiling can identify novel cancer-related microRNAs, and miR-23b could be a potential prognostic marker for early stage NSCLC. Further biological studies of miR-23b are warranted for the potential development of targeted therapy.

Surgical resection remains the major therapy for stage I and II non-small cell lung cancer (NSCLC) and is considered the first-line treatment for better survival 1,2. A recent report 3 indicated that the 5-year survival of 289 stage I NSCLC cases after surgical treatment was 63%. Many types of combination chemotherapy and targeted therapy have been investigated 4. However, to date only limited response and survival benefit have been shown with any regimen 5. These disappointing results stem from a lack of understanding of the basic biological mechanisms involved in the carcinogenesis of NSCLC.

Results
We employed an integrated genomic approach to identify altered genomic regions and deregulated microRNAs in NSCLC. Genome-wide copy number analysis in a well-characterized cohort of NSCLC samples (Supplementary Figure S1) allowed us to identify de novo allelic imbalance areas in tumors that may underlie NSCLC tumorigenesis. Moreover, integration of these allelic imbalance areas with microRNA expression array data enabled us to select microRNAs of interest for further studies.
Detection of allelic imbalance areas and associated genes from copy number analysis. An overview of genome-wide copy numbers in the 66 adenocarcinoma (ADC) cases is shown in Fig. 1A. Amplified regions are shown in red and deleted regions in blue. For each chromosome, copy number changes were analyzed in more detail and correlated with the approximate locus (a representative example for Chr. 1 is shown in Fig. 1B). As is evident from Fig. 1B, amplified regions are more frequent in chromosomal arm 1q (red) than in chromosomal arm 1p (blue). An overview of the whole-genome copy number pattern is shown in Fig. 1C. In our analysis, copy numbers over 2.50 were defined as amplified (upper red bar) and those under 1.70 were considered deleted (lower blue bar). Based on these empirical copy number cut-offs, amplified and deleted regions for ADC are shown in Fig. 1B for chromosome 1. The same procedure was also performed for the 53 squamous cell carcinoma (SCC) cases (data not shown). A summary of the loci with allelic imbalance that showed over 15% amplification and/or deletion in ADC and SCC is available in Supplementary Table S2. Briefly, notable amplified areas identified in our analysis were: 1q21.1-1q23.3 (ADC 22.7%, SCC 16.9%), which contained the potential oncogene S100A9; 8q21.3 [...] poorly characterized TSG USP29.
Figure 1 (caption fragment). More details of copy number alterations in chromosomal arms 1p and 1q are displayed. The heat plot at the bottom depicts the log2 ratio of copy number intensities, with red for amplifications and blue for deletions. (C) Genome-wide copy number frequencies are plotted according to their chromosomal locations. The light grey line within each chromosome denotes the centromere, separating the p and q arms. Red denotes copy number amplification, blue denotes copy number deletion and black denotes copy-neutral LOH regions (somatic uniparental disomy).
In summary, in addition to confirming previously reported allelic imbalance loci, such as LOH and copy number changes on chromosomes 2q, 4q, 5q, 8p, 8q, 9p22 and 19q13, we identified novel LOH on 1q42 and copy number amplification on 1q21.1 and 1q32-q42, regions that contain cancer-associated genes that may have a role in NSCLC tumorigenesis.
Deregulated microRNAs determined by microRNA expression array. Primary NSCLC and corresponding normal tissues were analyzed in 8 pairs of samples. Supplementary Table S3 summarizes the clinicopathological information of all 8 cases. The microRNA array used here contained 688 mature microRNAs. Two types of analysis were carried out to identify probes that showed differential expression between tumor and adjacent normal tissues. First, the expression ratios of individual microRNAs between tumor and adjacent normal tissue were compared. Representative scatter plots are shown in Fig. 2A for 4 cases. Second, the mean tumor expression profile across all patients was compared to the mean adjacent normal expression profile, and the differentially expressed microRNAs are shown in Fig. 2B a. In both analyses, differentially expressed microRNAs were expected to deviate from the bulk population. Four overexpressed (miR-21, miR-193b, miR-205, miR-296) and 4 underexpressed microRNAs (miR-126, miR-23b, miR-145 and let-7b) in tumor samples were identified by comparing mean expression values between tumors and normal tissues (Fig. 2B a). Differential expression patterns were also observed in an ADC- or SCC-specific manner (Fig. 2B b,c). Both ADCs and SCCs showed up-regulation of miR-205 and down-regulation of miR-126 and miR-145 in tumors. Up-regulation of miR-21, miR-150 and miR-296 was found only in ADCs, whereas up-regulation of miR-31 and down-regulation of let-7b were seen only in SCCs.
In summary, considering mean expression values across all 8 NSCLC samples and within the ADC and SCC groups, 5 overexpressed (miR-21, miR-31, miR-150, miR-205, miR-296) and 3 underexpressed microRNAs (miR-126, miR-23b, miR-145) in tumor samples were selected for technical validation.

Association of microRNA expression with clinicopathological factors. We analyzed whether the expression of each microRNA was associated with any of the available clinicopathological parameters (Supplementary Table S8). In the independent, early stage dominant cohort (n = 114), positive smoking history was significantly associated with low miR-126 levels (P = 0.030, Student's t-test), N0 stage with high miR-23b levels (P = 0.047) and TNM stage I with high miR-205 levels (P = 0.034).
Figure 2. (A) MicroRNA expression array results of 4 representative tumor-normal paired samples. Significantly differential expression between tumors and adjacent normal samples was found for some microRNAs (red dots). (B) Scatter plots of promising microRNAs that were differentially expressed between tumors and normal tissues by array analysis: (a) average microRNA expression ratios between tumor and adjacent normal tissue of all 8 cases (y axis) plotted against the mean expression (x axis); differentially expressed microRNAs are expected to deviate from the bulk population (red dots); (b) subgroup scatter plot of mean tumor expression levels (y axis) and adjacent normal expression levels (x axis) of individual microRNAs for the 3 adenocarcinoma cases; and (c) the same for the 5 squamous cell carcinoma cases. Probes with a large differential expression are identified as red dots.
Among the 8 candidate microRNAs, a significant expression difference between ADC and SCC was seen only for miR-205. SCC cell type was associated with miR-205 overexpression (P < 0.001), as previously reported 17,18. Although the microRNA profile has been reported to differ between African-Americans and Caucasians 19, in our study no significant difference in the expression of any candidate microRNA was found between these two ethnic groups.
All 114 patients of the independent cohort were available for follow-up. The mean follow-up period was 46.3 months (range 1.0-204.0 months). Table 2 [...]. We also examined these factors in TNM stage I cases (n = 63) (Table 3). Although many of the factors turned out to be non-significant, histological SCC and high miR-23b expression remained significant prognostic factors in multivariable analysis of both recurrence free survival (RFS) and overall survival (OS). Notably, miR-23b was the only significant miR factor for RFS (HR = 2.46, P = 0.041, 95% CI: 1.04-5.62) and OS (HR = 2.64, P = 0.021, 95% CI: 1.16-5.85) in this cohort of NSCLCs. As miR-23b overexpression correlated significantly with poor RFS and OS, we further performed Kaplan-Meier analysis for miR-23b and associated clinical factors. Kaplan-Meier curves of RFS and OS in relation to miR-23b expression are shown in Fig. 5A. In the log-rank test, overexpression of miR-23b correlated significantly with poor RFS (P = 0.020) and poor OS (P = 0.013) (Fig. 5A a). These findings remained statistically significant in subgroup analysis adjusting for T1 stage (RFS: P = 0.021, [...]
MicroRNA expression and allelic imbalance. We integrated the allelic imbalance areas from the copy number analysis cohort (n = 119) (Fig. 3) with the loci of all 8 microRNAs tested in this study. We considered a SNP locus to be an allelic imbalance locus if it was amplified or deleted in at least 15% of our tested samples.
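The calling rule described above can be illustrated with a minimal Python sketch. It assumes that per-sample copy number estimates are already available for a locus. The cut-offs (2.50 and 1.70) and the 15% frequency criterion come from the text, whereas the function names, the example values, and the choice to apply the 15% criterion to amplifications and deletions separately are illustrative assumptions, not the authors' implementation.

```python
from typing import Dict, List

AMP_THRESHOLD = 2.50  # copy number above this is called amplified (from the text)
DEL_THRESHOLD = 1.70  # copy number below this is called deleted (from the text)
MIN_FREQUENCY = 0.15  # fraction of samples needed to call a locus imbalanced


def call_copy_number(copy_number: float) -> str:
    """Classify one copy number estimate as amplified, deleted or neutral."""
    if copy_number > AMP_THRESHOLD:
        return "amplified"
    if copy_number < DEL_THRESHOLD:
        return "deleted"
    return "neutral"


def imbalance_frequencies(copy_numbers: List[float]) -> Dict[str, float]:
    """Return the fraction of samples that are amplified or deleted at a locus."""
    calls = [call_copy_number(cn) for cn in copy_numbers]
    n = len(calls)
    return {
        "amplified": calls.count("amplified") / n,
        "deleted": calls.count("deleted") / n,
    }


def is_imbalanced_locus(copy_numbers: List[float]) -> bool:
    """Assumption: the 15% criterion is applied to each alteration type separately."""
    freq = imbalance_frequencies(copy_numbers)
    return freq["amplified"] >= MIN_FREQUENCY or freq["deleted"] >= MIN_FREQUENCY


# Hypothetical copy number estimates for one SNP locus across 10 tumors.
example_locus = [2.0, 2.7, 2.1, 1.9, 3.1, 2.0, 2.2, 1.6, 2.0, 2.0]
print(imbalance_frequencies(example_locus))
print(is_imbalanced_locus(example_locus))
```

In this hypothetical example, two of ten tumors exceed 2.50 and one falls below 1.70, so the locus passes the frequency criterion through amplification alone.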
The integrated data on copy number variation (CNV), determined by SNP array analysis, and the deregulated candidate microRNAs are shown in Table 4. Four out of 8 deregulated microRNAs were located in gene loci with over 15% allelic imbalance (amplification or deletion [...]
Overexpression and inhibition of miR-23b in lung cancer cell lines. To determine the biological consequences of miR-23b deregulation, we first analyzed the expression level of miR-23b in a panel of NSCLC cell lines and one immortalized normal broncho-epithelial cell line (Fig. 5B a). We decided to perform functional analysis using cell lines derived from ADCs because of the wide variability of their expression levels. We selected the H1838 cell line for miR-23b overexpression (mimic), and the H1437 and H1944 cell lines for inhibition of miR-23b. We performed the [3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyl tetrazolium bromide] (MTT) assay using these three cell lines, either overexpressing or inhibiting miR-23b. In the mimic study, cell viability was significantly higher in miR-23b transfected H1838 cells than in controls at 72 hours after transfection (P < 0.001, Student's t-test). In the inhibition study, miR-23b siRNA transfected H1437 and H1944 cells showed a significant decrease in cell viability compared to controls at 72 hours after transfection (P < 0.001, Student's t-test) (Fig. 5B b). These phenotypic changes caused by forced alteration of miR-23b were consistent with the primary NSCLC data suggesting that miR-23b is a potential oncogene.

Discussion
Generally, recurrent losses and gains of broad chromosomal regions in cancer suggest that multiple genes located in the same chromosomal region may concurrently function as tumor suppressor genes (TSGs) and oncogenes. In addition to identifying novel allelic imbalance areas, our data confirmed previously reported allelic imbalance areas that may contain novel NSCLC-related cancer genes. Furthermore, the integrated approach we employed in this study supports the previous assumption that almost half of microRNAs are located in allelic imbalance areas. NSCLC is an extensively heterogeneous disease at the molecular level, which may in part be related to the ready exposure of the lung to different kinds of environmental stimuli. Various environmental stimuli and individual responses to a particular stimulus lead to differential molecular alterations that include mutations, translocations and copy number alterations. All these alterations may have an impact on NSCLC initiation and/or progression and may be targeted for therapy and the development of preventive strategies. Comprehensive investigation of focal copy number alterations in NSCLC has led to the identification of multiple cancer-driving genes with potential therapeutic implications 20-22. However, all these approaches for identifying genetic alterations have limitations, and the development of novel analytical approaches may therefore generate new, previously undiscovered targets for further studies. In our analysis, we identified novel regions of allelic imbalance that contain several cancer-related genes, such as CHD1L and S100A9 at locus 1q21.1, and CENPF and ESRRG at locus 1q32-q42. However, since the main focus of this study was to identify NSCLC-related microRNAs, we have not yet validated or functionally characterized any of these genes for potential clinical implications.
Table 3. Correlation of clinicopathological and miR factors with clinical outcome in TNM stage I cases (n = 63). AA: African-American, ADC: adenocarcinoma, SCC: squamous cell carcinoma, N.S.: not significant. Hazard ratios, 95% confidence intervals and P values were obtained using Cox proportional hazards models for RFS and OS. Significant values are displayed in bold.
Figure 5 (caption fragment). MiR-23b mimic was transfected into H1838, which showed a low level of miR-23b expression among the cell lines, while miR-23b inhibitor was transfected into the H1437 and H1944 cell lines, which showed high levels of miR-23b expression among the cell lines. The average cell proliferation ratio ± standard error is shown at each time point. P values are shown where there was a significant difference between transfected cells and controls (Student's t-test, two-tailed).
We identified several microRNAs that were differentially expressed between cancer and matched normal samples of NSCLC. Some of our findings are consistent with previously reported data while others are not. For example, miR-31 was reported to be overexpressed in colorectal cancer 23, which is consistent with our finding; however, it was also reported to be underexpressed due to promoter hypermethylation in prostate cancer 24. These contradictions could be due to tumor context, the technology used for analysis, and different endogenous and exogenous stimuli for the genesis of a given tumor in a given population. Other microRNAs that were overexpressed in our study, such as miR-21, miR-205 and miR-296, were also reported to be deregulated in other cancer types. For example, miR-205 was reported to be underexpressed in non-muscle-invasive urothelial cancer but overexpressed in muscle-invasive urothelial cancer 25. Consistent with our results, miR-126 and miR-145 were also reported to be underexpressed in NSCLC 26,27, and functionally miR-145 was shown to inhibit NSCLC cell proliferation by targeting a well-known oncogene, c-Myc 28. However, contradictory findings were also reported for miR-145 29. In the latter study, although miR-145 was underexpressed in lung cancer, overexpression of miR-145 was related to poor survival. Thus, detailed biological studies are necessary to understand the exact role of miR-145 in the genesis of NSCLC.
Our array analysis data indicate that miR-23b is underexpressed in NSCLC (Fig. 2B a). However, in our independent cohort study by Q-RT-PCR (Fig. 4), we found that miR-23b was overexpressed in NSCLC. This difference may derive from the fact that the array result only showed the overall average of 8 samples. An overall average does not necessarily indicate whether overexpressed or underexpressed cases are dominant. In fact, neither subgroup analysis (3 ADCs and 5 SCCs) showed clear overexpression or underexpression of miR-23b (Supplementary Table S5), which is also supported by the Q-RT-PCR results of the training set (n = 18) (Supplementary Table S6). Another explanation is a dual role of this microRNA, as described above for miR-205. If the function of a given microRNA differs between tumor stages, the expression pattern will depend on the characteristics of the cohort. Indeed, miR-23b expression was significantly different between pathological N0 stage and N1/N2 stages (P = 0.047, Student's t-test, Supplementary Table S8). Both overexpression and underexpression of miR-23b have been reported previously in various solid tumors 30. Our independent tumor cohort (n = 114) mainly consisted of TNM stage I tumors (63/114, 55.2%), which is slightly different from the array cohort (3/8, 37.5%) (Supplementary Table S7).
The integration of allelic imbalance determined by SNP array with our limited microRNA expression data supports the view that half of the microRNAs are positioned in allelic imbalance areas. Although the locus imbalance status of miR-126 and miR-150 (Table 4) correlated well with microRNA expression by Q-RT-PCR (Supplementary Tables S5, S6), the expression patterns of miR-31 and miR-205 did not match their locus imbalance status. Possible explanations for these inconsistencies are the existence of another deregulating mechanism, inadequate threshold setting or array noise. Meanwhile, in order to find low-level allelic imbalance, we manually analyzed SNP data at the loci of miR-21 and miR-23b using the same thresholds (amplification > 2.5, deletion < 1.7). Both of these microRNAs were overexpressed in tumors by Q-RT-PCR analysis (Fig. 4). We then extended our analysis to the whole SNP data set (n = 119) and, as expected, observed that the locus of miR-21, a previously well-characterized microRNA, showed significant amplification in tumors (Supplementary Table S9); the locus of our focus, miR-23b, also showed significant amplification in tumors (Supplementary Table S10). Overexpression of miR-21 and miR-23b in NSCLCs therefore seemed to be partially due to allelic amplification.
Table 4. Association of microRNA expression and allelic imbalance.
Although our initial functional analysis and testing of NSCLC samples indicate that miR-23b is a potential oncogene in the genesis and progression of NSCLC, miR-23b has been found to have a dual role in carcinogenesis, both as a tumor suppressor and as an oncomir. Detailed reviews of the functional consequences of miR-23b have been published recently 30,31. As a tumor suppressor, Majid et al. 31 reported that miR-23b was frequently silenced in prostate cancer by methylation, and that it had anti-proliferative and anti-invasive properties through repression of the Src kinase/Akt pathway. Promoter hypermethylation of miR-23b was also found in gliomas 32. Another recent report suggested that radio-resistant pancreatic cancer cell lines showed reduced levels of miR-23b, and that overexpression of miR-23b sensitized the cells to radiation by targeting ATG12, a known autophagy-related protein 33. In contrast, Jin et al. indicated the oncogenic character of miR-23b in breast cancer, showing that ERBB2, EGF and TNF-α promote its expression through the AKT/NF-κB pathway 34. Chen et al. revealed that down-regulation of miR-23b was followed by inhibition of the β-catenin/Tcf-4 and HIF-1α/VEGF signaling pathways in glioma 35. Recently, Zaman et al. found that PTEN was overexpressed after miR-23b-3p knock-down in renal cancer cells and also found an inverse correlation of miR-23b with PTEN expression in human samples 36. PTEN inactivation was reported as a poor prognostic factor 37,38 or a chemotherapy resistance factor 39,40 for NSCLC.
In future studies, we need to elucidate the gene targets of miR-23b in NSCLC, including PTEN 36, and to determine the mechanism by which miR-23b is involved in lung carcinogenesis.
There are several reports on the prognostic significance of miRNAs in early stage NSCLC 41-45. Yanaihara et al. tested a panel of miRNAs in 104 NSCLCs including 65 stage I tumors (62.5%) 46. They found that miR-205, miR-21 and miR-150 were overexpressed, while miR-126 and miR-145 were underexpressed in NSCLCs. Other groups reported overexpression of miR-31 in NSCLCs 47,48. Patnaik et al. examined miRNA expression profiles that might predict recurrence of localized stage I NSCLC after surgery 49. They compared 37 recurrent and 40 non-recurrent cases. Although they did not focus on miR-23b in their study, it was listed as one of the significantly overexpressed microRNAs in recurrent cases (P = 0.003, fold change: 2.51). All these reports may partially support our results. In this study, we evaluated the prognostic significance of miRNA expression in early stage NSCLC using formalin-fixed paraffin-embedded (FFPE) samples. If reliable prognostic markers were available for early stage NSCLC, patients could be managed more appropriately to increase survival time. In addition, miRNA markers also have potential for disease monitoring, as miRNAs are more stable than messenger RNAs due to their small size 50 and can be tested in bodily fluids such as sputum 51 and serum 52. Several studies have already been published using miRNA microarray data from FFPE samples 53-55. As FFPE samples are often the only available tissue source with comprehensive clinical data and long-term follow-up, it is meaningful to establish their quality. Hall et al. showed that microRNAs were not subject to the same deterioration seen in other RNA types 53. In line with these reports, this study also showed that microarray results from FFPE samples were largely consistent with Q-RT-PCR results.
In conclusion, we have identified novel allelic imbalance regions that could harbor potential NSCLC-related genes. Our integrated analysis revealed that a substantial number of microRNAs were located in allelic imbalance areas. More interestingly, from the clinical perspective, increased miR-23b expression in the tumor is a novel candidate biomarker of poor survival in NSCLC patients. However, further validation of miR-23b in a multi-center prospective study is needed before any potential clinical implementation, and the detailed mechanisms of miR-23b deregulation, including downstream pathways, should be evaluated not only for its potential as a prognostic marker but also for its suitability as an early detection marker and as a target for therapy of NSCLC.

Methods
Clinical samples. Different sets of NSCLC clinical samples were tested in this study. An overview of the study design is shown in Fig. 3. A total of 119 patients undergoing surgical resection of a primary NSCLC at The Johns Hopkins Hospital (Baltimore, MD, USA), the Johns Hopkins Bayview Medical Center (Baltimore, MD, USA) or the Medical College of Wisconsin, Froedtert Memorial Hospital (Milwaukee, WI, USA) were included for SNP array analysis. Among these specimens, 66 were adenocarcinoma (ADC) (including those with bronchoalveolar components), and 53 were squamous cell carcinoma (SCC). Detailed criteria for these samples are available in Supplementary Table S1. A subset (8 pairs) of the cohort (Supplementary Table S3) was analyzed by microRNA expression array, and 8 candidate microRNAs were selected for further validation based on bioinformatics analysis of the microRNA array data. Ten additional sample pairs were added to these 8 samples and tested by Q-RT-PCR as a training set (n = 18) (Fig. 3). As an independent set, 114 NSCLC tumors were used for testing the 8 candidate microRNAs. The clinicopathological characteristics of this independent cohort of 114 NSCLC samples are summarized in Supplementary Table S7.
DNA extraction. Hematoxylin-eosin stained sections were histologically examined at every 20 sections for the presence or absence of tumor cells, as well as for tumor density. Only sections that showed more than 70% tumor cells were used for DNA extraction. Microdissected tissues and lymphocytes were digested with 1% SDS and 50 μg/ml proteinase K (Boehringer, Mannheim, Germany) at 48 °C overnight, followed by phenol/chloroform extraction and ethanol precipitation of DNA as previously described 56.
SNP microarray analysis. Genomic DNA from microdissected frozen tumor tissues and corresponding lymphocytes was analyzed in parallel. Briefly, 250 ng of DNA was digested with XbaI (New England Biolabs Inc., Ipswich, MA, USA), ligated to the adaptor, and amplified by polymerase chain reaction (PCR) using a single primer. After purification of PCR products with the MinElute 96 UF PCR purification kit (Qiagen, Valencia, CA, USA), amplicons were quantified, fragmented, labeled and subsequently hybridized on Affymetrix GeneChip Mapping 10K 2.0 SNP microarrays following the manufacturer's instructions (Affymetrix Inc., Santa Clara, CA, USA). After washing and staining, the arrays were scanned for data analysis.
RNA extraction for microRNA expression analysis. Total RNA was extracted from 10-μm-thick FFPE tissue sections as previously described 57 using the Ambion RecoverAll Total Nucleic Acid Isolation Kit for FFPE tissues (Applied Biosystems/Ambion, Austin, TX, USA) according to the manufacturer's instructions. Only sections that showed more than 70% tumor cells were used for RNA extraction.
The quantity and quality of the total RNA were verified with the NanoDrop spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA). Twelve pairs of tumor-adjacent samples were hybridized. Of these, 8 pairs were considered to be of the highest quality and were used for the following experiments.
miRdicator™ array platform. Custom microarrays were produced by printing DNA oligonucleotide probes representing 688 microRNAs (Sanger database, version 9, and additional Rosetta validated and predicted microRNAs). Each probe, printed in triplicate, carried a linker of up to 22 nt at the 3′ end of the microRNA's complement sequence, in addition to an amine group used to couple the probes to coated glass slides. 10/20 μM of each probe was dissolved in 2× SSC + 0.0035% SDS and spotted in triplicate on Schott Nexterion® Slide E (Applied Microarrays Inc., Tempe, AZ, USA) coated microarray slides using a Genomic Solutions® BioRobotics MicroGrid II (Genomic Solutions, Beverly, MA, USA) according to the manufacturer's directions. Sixty-four negative control probes were designed using the sense sequences of different microRNAs. Two groups of positive control probes were designed to hybridize to miRdicator™ array (Rosetta Genomics Inc., Philadelphia, PA, USA) synthetic spikes. Small RNA was added to the RNA before labeling to verify the labeling efficiency. Probes for abundant small RNAs [e.g. small nuclear RNAs (U43, U49, U24, Z30, U6, U48, U44), 5.8S and 5S ribosomal RNA] were spotted on the array to verify RNA quality. The slides were blocked in a solution containing 50 mM ethanolamine, 1 M Tris (pH 9.0) and 0.1% SDS for 20 min at 50 °C, then thoroughly rinsed with water and spun dry.
Cy-dye labeling of microRNA for miRdicator™ array. Total RNA (3-5 μg) was labeled by ligation of an RNA-linker, p-rCrU-Cy/dye (Dharmacon, Lafayette, CO, USA) 58, to the 3′ end with Cy3 or Cy5. The labeling reaction contained total RNA, spikes (20-0.1 fmoles), 300 ng RNA-linker-dye (Dharmacon), 15% DMSO, 1× ligase buffer and 20 units of T4 RNA ligase (New England Biolabs Inc.) and proceeded at 40 °C for 1 hr followed by 1 hr at 37 °C. The labeled RNA was mixed with 3× hybridization buffer (Ambion, Austin, TX, USA), heated to 95 °C for 3 min and then added on top of the miRdicator™ array. Slides were hybridized for 12-16 hr at 42 °C, followed by two washes at room temperature with 1× SSC and 0.2% SDS and a final wash with 0.1× SSC. The array was scanned using an Agilent Microarray Scanner Bundle G2565BA (resolution of 10 μm at 100% power) (Agilent Technologies, Santa Clara, CA, USA). The data were analyzed using SpotReader software (Niles Scientific, Seattle, WA). Standard bioinformatics and statistical analyses were performed.

Real Time reverse transcriptase (RT) polymerase chain reaction (Q-RT-PCR) for Quantification of microRNAs.
Briefly, a total of 10 ng of RNA isolated from primary tissues was reverse transcribed using the TaqMan reverse transcription kit (Applied Biosystems, Foster City, CA, USA) and microRNA-specific primers provided with TaqMan microRNA assays (Applied Biosystems) in a 15 μL reaction volume that contained 3 μL of RT Primer Mix, 0.15 μL of 100 mM dNTPs, 1 μL of reverse transcriptase enzyme (50 U/μL), 0.19 μL of RNase inhibitor (20 U/μL), 4.16 μL of nuclease-free water and 5 μL of RNA (10 ng). The RT reaction was carried out with annealing at 16 °C for 30 min followed by extension at 42 °C for 30 min. Then, 1.3 μL of the RT reaction was used with 1 μL of specific primers for each microRNA (Applied Biosystems) in triplicate wells for 45-cycles PCR on a 7900HT thermocycler (Applied Biosystems). The thermal cycling parameters were as follows: 50 °C for 2 min, 95 °C for 10 min, followed by a third step of denaturation at 95 °C for 15 s and annealing/extension at 60 °C for 1 min, repeated for 40 cycles. SDS v2.4 software (Applied Biosystems) was used to determine the raw cycle threshold (Ct) values of the fluorescence measured during PCR. All experiments were done in triplicate. Two normalization steps were considered: loading the same quantity of template RNA in each well and normalizing the data against endogenous genes (hsa-miR-16, RNU6). As expression of hsa-miR-16 was evenly distributed across the samples, we decided to use hsa-miR-16 for normalization in this study (data not shown). Relative quantification of microRNA expression was calculated with the 2^(−ΔΔCt) method (Applied Biosystems User Bulletin N 2) (P/N 10303859).
Lung cancer cell lines H23, H226, H522, H838, H1437, H1650, H1703, H1838, H1944, H1975 and H2170 and the SV40-immortalized normal human bronchial epithelium cell line BEAS-2B were obtained from the American Type Culture Collection (ATCC) and propagated according to its recommendations.
Media and antibiotics were purchased from Mediatech (Manassas, VA, USA) and supplemented with fetal bovine serum (10%) (Hyclone, Logan, UT, USA), 100 μg/ml streptomycin and 100 I.U./ml penicillin (both from Life Technologies). Cells were grown at 37 °C in a humidified atmosphere composed of 95% air and 5% CO2 in a monolayer culture. All cancer cell lines were maintained in RPMI 1640, and BEAS-2B was grown in BEGM (Lonza, Walkersville, MD, USA) medium.
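As a worked illustration of the relative quantification used for Q-RT-PCR above (the 2^(−ΔΔCt) method with hsa-miR-16 as the endogenous reference), the following Python sketch computes a fold change from Ct values. The function names and Ct values are hypothetical, and the use of the paired normal tissue as the calibrator is an assumption for illustration; the text does not state which calibrator was used.

```python
from statistics import mean
from typing import List


def mean_ct(triplicate_cts: List[float]) -> float:
    """Average the Ct values of replicate wells."""
    return mean(triplicate_cts)


def relative_expression(
    ct_target_tumor: float,
    ct_ref_tumor: float,
    ct_target_normal: float,
    ct_ref_normal: float,
) -> float:
    """Fold change of the target microRNA in tumor versus calibrator (2^-ddCt)."""
    # Step 1: normalize the target to the endogenous reference (e.g. hsa-miR-16).
    delta_ct_tumor = ct_target_tumor - ct_ref_tumor
    delta_ct_normal = ct_target_normal - ct_ref_normal
    # Step 2: compare the tumor with the calibrator (assumed: paired normal tissue).
    delta_delta_ct = delta_ct_tumor - delta_ct_normal
    return 2.0 ** (-delta_delta_ct)


# Hypothetical mean Ct values: target microRNA and reference in tumor and normal.
fold_change = relative_expression(
    ct_target_tumor=24.0,
    ct_ref_tumor=20.0,
    ct_target_normal=25.0,
    ct_ref_normal=20.0,
)
print(f"Fold change (tumor vs. normal): {fold_change:.2f}")
```

With these hypothetical values, the target reaches threshold one cycle earlier in the tumor after normalization, which corresponds to a twofold higher relative expression.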
Cell proliferation assay (MTT assay). Transfected lung cancer cell lines were plated on 96-well plates at a density of 5 × 10³ to 1 × 10⁴ cells per well. Cellular viability was measured with the MTT proliferation assay kit (ATCC, Manassas, VA, USA) according to the manufacturer's instructions, as described in a previous paper from our group 59. Each assay was performed in triplicate, and each experiment was repeated at least three times. The extent of cellular survival was expressed as a percentage of the measurement on the first day.
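The normalization described above (survival as a percentage of the first measurement day, summarized across replicates) can be sketched as follows. This is a minimal illustration with hypothetical absorbance readings and hypothetical function names, not the analysis code used in the study.

```python
from math import sqrt
from statistics import mean, stdev
from typing import List, Tuple


def percent_of_first_day(absorbances: List[float]) -> List[float]:
    """Express each time point as a percentage of the first measurement."""
    baseline = absorbances[0]
    return [100.0 * value / baseline for value in absorbances]


def mean_and_standard_error(values: List[float]) -> Tuple[float, float]:
    """Mean and standard error of the mean across replicate measurements."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical MTT absorbance readings for one well on three measurement days.
readings = [0.25, 0.5, 1.0]
print(percent_of_first_day(readings))

# Hypothetical percentages from three replicate wells at one time point.
replicates = [100.0, 110.0, 120.0]
average, standard_error = mean_and_standard_error(replicates)
print(f"{average:.1f} ± {standard_error:.1f}")
```

Expressing each well relative to its own first reading removes differences in seeding density before transfected cells and controls are compared.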

Data Analysis and Statistical Consideration.
Allelic calls for tumor DNA and corresponding normal genomic DNA were obtained from normalized SNP array data using the GDAS genotyping software supplied by the array manufacturer (Affymetrix). A Hidden Markov Model was applied to infer the probability of allelic imbalance for each SNP in tumor DNA compared to corresponding normal DNA using the dChip software (Cheng Li Lab). Further details of identified allelic imbalance area are described in our previous study 60 .
We determined an optimal cut-off value for each tested microRNA using a training set that consisted of 18 tumors with paired normal samples. A ROC curve was generated for each microRNA, and the empirical cut-off value was selected by maximizing sensitivity and specificity. Based on this cut-off value, we divided another set of 114 tumor samples into overexpressed and underexpressed groups, and clinical outcomes were compared between the two groups. For clinical outcomes, RFS was defined as the time from surgery to the first documentation of any disease recurrence. OS was defined as the time from surgery to death from disease. Those who remained alive were censored at the last date the subject was known to be alive. Associations of microRNA expression with RFS and OS were evaluated using the Cox proportional hazards model, with hazard ratios and 95% confidence intervals estimated in multivariable analysis. For other statistics, continuous variables were analyzed by Student's t-test and categorical variables by Fisher's exact test. All statistical analyses were performed using JMP 9 software (SAS Institute, Cary, NC, USA). The level of statistical significance was set at a two-tailed P < 0.05.
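The cut-off selection described above can be sketched in Python. The sketch assumes that the ROC analysis distinguishes tumor from paired normal samples, that higher expression indicates tumor, and that "maximizing sensitivity and specificity" means maximizing their sum. All three points are assumptions, as are the function names and the expression values; the study itself used JMP 9.

```python
from typing import Dict, List


def best_cutoff(tumor_values: List[float], normal_values: List[float]) -> Dict[str, float]:
    """Scan observed values and keep the cut-off maximizing sensitivity + specificity."""
    candidates = sorted(set(tumor_values) | set(normal_values))
    best: Dict[str, float] = {}
    for cutoff in candidates:
        # Assumption: values at or above the cut-off are called "overexpressed".
        sensitivity = sum(v >= cutoff for v in tumor_values) / len(tumor_values)
        specificity = sum(v < cutoff for v in normal_values) / len(normal_values)
        score = sensitivity + specificity
        if not best or score > best["score"]:
            best = {
                "cutoff": cutoff,
                "sensitivity": sensitivity,
                "specificity": specificity,
                "score": score,
            }
    return best


def split_by_cutoff(values: List[float], cutoff: float) -> List[str]:
    """Assign each sample of an independent set to a high or low expression group."""
    return ["high" if v >= cutoff else "low" for v in values]


# Hypothetical relative expression values for a small training set.
training_tumor = [1.4, 2.5, 3.1, 0.9, 2.8]
training_normal = [0.8, 1.0, 1.1, 1.3, 0.7]
selected = best_cutoff(training_tumor, training_normal)
print(selected)

# Hypothetical independent samples grouped with the training-set cut-off.
print(split_by_cutoff([0.5, 1.4, 2.0], selected["cutoff"]))
```

For a microRNA that is underexpressed in tumors, the two comparisons would be reversed. Fixing the cut-off on the training set before grouping the independent samples keeps the outcome comparison free of threshold tuning on the validation data.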