Original Paper | Published:

MALAT-1, a novel noncoding RNA, and thymosin β4 predict metastasis and survival in early-stage non-small cell lung cancer


Early-stage non-small cell lung cancer (NSCLC) can be cured by surgical resection, but a substantial fraction of patients ultimately dies due to distant metastasis. In this study, we used subtractive hybridization to identify gene expression differences in stage I NSCLC tumors that either did or did not metastasize in the course of disease. Individual clones (n=225) were sequenced and quantitative RT–PCR verified overexpression in metastasizing samples. Several of the identified genes (eIF4A1, thymosin β4 and a novel transcript named MALAT-1) were demonstrated to be significantly associated with metastasis in NSCLC patients (n=70). The genes’ association with metastasis was stage- and histology specific. The Kaplan–Meier analyses identified MALAT-1 and thymosin β4 as prognostic parameters for patient survival in stage I NSCLC. The novel MALAT-1 transcript is a noncoding RNA of more than 8000 nt expressed from chromosome 11q13. It is highly expressed in lung, pancreas and other healthy organs as well as in NSCLC. MALAT-1 expressed sequences are conserved across several species indicating its potentially important function. Taken together, these data contribute to the identification of early-stage NSCLC patients that are at high risk to develop metastasis. The identification of MALAT-1 emphasizes the potential role of noncoding RNAs in human cancer.


Lung cancer is one of the most frequent cancers in the world (Parker et al., 1997), and non-small cell lung cancer (NSCLC) represents 75–80% of all lung carcinomas (Shieh et al., 1999). The overall 5-year survival rate of these patients is only 15% (Nagamachi et al., 1998). In early-stage NSCLC, the prognosis after complete tumor resection is much better. However, a significant fraction of early-stage NSCLC patients develops distant metastases, which remain incurable. There are currently no reliable markers that accurately predict the development of metastasis in early-stage NSCLC patients.

Metastatic spread of tumor cells is the main cause of cancer death (Fidler, 1990). The genetic events leading to acquisition of metastatic potential are at least partially distinct from the primary mechanisms that induce malignant transformation. The process of cancer metastasis consists of linked sequential steps, including invasion, detachment, intravasation, circulation, adhesion, extravasation and growth in distant organs (Fidler, 1990; Lu et al., 2001). Necessary functional attributes of tumor cells include production of angiogenic factors that lead to generation of fragile capillary structures, secretion of metalloproteinases and other compounds capable of degrading extracellular matrices (Ura et al., 1989; Horak et al., 1992). Two main categories determine the chances of a tumor cell to become metastatic: first, intracellular signaling pathways that regulate proliferation, apoptosis and the synthesis of cytokines, and second, changes in cell surface proteins and secreted proteins involved in cell adhesion, cellular migration and proteolysis (Hesketh, 1997). The expression of a considerable number of genes in a variety of experimental systems has been shown to affect metastatic capacity and several model systems have been established to detect and study mechanisms altering the ability of tumor cells to metastasize (Hibi et al., 1998; Clark et al., 2000; Ridley, 2000; Shimizu et al., 2000).

However, despite progress in the knowledge about metastatic pathways, estimating the likelihood that an individual tumor will metastasize remains an unsolved problem. Strategies to improve the outcome of patients with early-stage NSCLC by multimodal treatment, including adjuvant chemotherapy after surgery, have failed to show a major benefit (Xu et al., 2000; Ukena, 2001). One likely reason for the lack of benefit is that it is currently impossible to identify the subgroup of patients with early-stage NSCLC that is not cured by surgery alone. The identification of high-risk patients is of utmost importance not only to predict individual patients’ prognosis, but also to select patients that could benefit from additional therapy. Many strategies to identify these patients suffer from shortcomings. Analysis of gene expression differences could help, but several investigators used animal model systems that might be of limited value for human patients. In addition, whereas differences in gene expression have been analysed between primary tumors and their metastases, data on the differences in gene expression between primary tumors with distinct metastatic potential are missing. In our study, we used a subtractive hybridization approach to determine differences in gene expression between NSCLC tumors that were cured by surgery and tumors that subsequently metastasized. The reliability of our data was verified by quantitative real-time RT–PCR in the subtracted cDNAs. In addition, we demonstrate that mRNA expression of the identified genes was associated with metastasis in a larger cohort of NSCLC patients. Finally, we provide strong evidence that expression levels of some of the identified genes predict patients’ survival in early-stage NSCLC.


We utilized a subtractive hybridization method to identify differences in gene expression between primary non-small cell lung tumors that either did or did not subsequently metastasize. For subtraction analysis, RNA samples from four primary tumors derived from patients with subsequent metastasis were pooled and compared to a pool of RNA from five primary tumors without subsequent metastasis. RNA samples were chosen from patients with stage I disease for whom complete tumor resection was confirmed by pathological analysis of all specimens. No evidence of metastasis was present at the time of diagnosis. To exclude differences in gene expression based on sex and histology, all samples were chosen from male patients and were histologically confirmed to be adenocarcinoma. The median follow-up time was longer than 5 years to ensure that development of metastasis would be highly unlikely in the metastasis-free patients. The experimental outline is shown in Figure 1.

Figure 1

Experimental overview and example of subtraction procedure. RNA samples from tumors that subsequently metastasized (M) or did not metastasize (N) were pooled and subjected to subtractive hybridization. The resulting library was cloned and sequenced. The reliability of the subtractive hybridization was confirmed by real-time quantitative RT–PCR in the pooled samples and in the subtracted library. Then, the identified genes were analysed in 70 NSCLC patients

Overview of differentially expressed genes

Following subtraction, the resulting cDNA library was cloned into a standard cloning plasmid vector and individual clones were sequenced. In total, 225 independent sequences were obtained. A total of 26 transcripts were found more than once (Table 1). Interestingly, more than 20% of transcripts that were enriched in metastasis-associated tumors were associated with immunological functions. Several interferon-γ-induced genes were found. Also, about 15% of the identified transcripts were involved in protein biosynthesis and an equal percentage coded for proteins previously identified to be associated with cancer and/or metastasis. However, a significant fraction of transcripts could not be attributed to known genes. The most frequently identified transcript was a so far uncharacterized mRNA derived from chromosome 11q13. We have named this RNA MALAT-1 (Metastasis Associated in Lung Adenocarcinoma Transcript) for reasons indicated below.

Table 1 Metastasis-associated transcripts in NSCLC identified by subtractive hybridization

Confirmation of differential expression by quantitative real-time RT–PCR in a cohort of 70 NSCLC patients

To confirm the reliability of the subtractive hybridization, individual transcripts were analysed in the pooled samples as well as in the subtracted libraries (metastasis minus nonmetastasis and nonmetastasis minus metastasis). For MALAT-1, these analyses demonstrated about threefold higher expression in the pooled sample from metastasizing tumors compared to the nonmetastasizing tumors. After subtraction, strong enrichment of the identified genes was found in the metastasis minus nonmetastasis library (Figure 1). Next, we analysed gene expression by real-time RT–PCR in the original nine samples used for subtraction. In total, 11 independent transcripts from our screening procedure were analysed. Higher expression levels in the metastasis group compared to the nonmetastasis group were confirmed in 10 of 11 transcripts analysed by real-time RT–PCR (Figure 2a). These analyses demonstrated the reliability and the validity of the procedure. One of the transcripts, thymosin β4, has recently been described as a regulator of motility and metastasis in murine fibrosarcoma cells (Kobayashi et al., 2002) and its expression was associated with metastasis in a melanoma mouse model (Clark et al., 2000). Therefore, we also analysed the two other main genes (RhoC and fibronectin) identified by Clark et al. Interestingly, neither RhoC nor fibronectin expression differed between patients with or without subsequent metastasis (Figure 2a).

Figure 2

Association of gene expression with metastasis. (a) Specific PCR primers were designed for 11 genes identified in our screening procedure and for two genes (RhoC and fibronectin) that were recently described to be associated with metastasis. Quantitative RT–PCR was carried out on the samples used for subtraction (n=9). Average gene expression in the non-metastasis samples was set as 1 to allow easier comparisons between different genes. Expression of almost all genes (10 of 11) identified in our screen was higher in the samples that subsequently metastasized than in the other ones. For RhoC and fibronectin, expression was lower in metastatic compared to nonmetastatic tumors. (b) Gene expression was analysed in a second cohort of patients (n=70) with early-stage NSCLC. Statistical analyses were carried out using the Mann–Whitney U-test. Analysis of stage I adenocarcinoma or squamous cell carcinoma patients revealed significantly higher expression of MALAT-1, eIF4A1 and thymosin β4 in metastasizing tumors. (c) This figure indicates the histology dependence of the genes’ association with metastasis. For each histological subtype and gene, we divided the average expression values of all tumors that metastasized by the average expression of all tumors that did not metastasize. This analysis was carried out for the entire cohort of stage I, II and IIIA NSCLC patients (n=70)

Expression level of five of the identified genes (NPCRP, thymosin β4, MALAT-1, eIF4A1 and MDM2) plus RhoC and fibronectin were analysed in 70 patients with stage I, II and IIIA non-small cell lung cancer. This patient group has been described previously (Müller-Tidow et al., 2001).

Relevance of thymosin β4, eIF4A1 and MALAT-1 for metastasis prediction

First, we focused on the expression of these genes in metastatic tumors compared to nonmetastatic tumors. Thymosin β4, eIF4A1 and MALAT-1 were expressed at higher levels in tumors that subsequently metastasized than in others (n=70) (Figure 2b). However, expression of only one gene, eIF4A1, differed statistically significantly for the entire patient population (P=0.04). Three genes, NPCRP, RhoC and fibronectin, were expressed at higher levels in tumors that did not metastasize (data not shown). The tumors of two adenocarcinoma patients showed NPCRP expression levels that were up to 100-fold higher than the levels detected in other patients. Since RNA from one of these tumors was included in the subtraction procedure, the association of NPCRP with metastasis was very strong in our pooled cDNA, but NPCRP was not generally associated with metastasis in the entire patient population.

We next analysed whether the association of these genes with metastasis was specific for certain histological subtypes or stages of disease. When only stage I patients suffering from adenocarcinoma or squamous cell carcinoma (n=31) were included in the analyses, expression levels of MALAT-1 (P=0.03), eIF4A1 (P=0.04) and thymosin β4 (P=0.05) were significantly higher in metastasizing tumors compared to nonmetastasizing tumors (Figure 2b). Further analyses indicated that the association of these genes with metastasis was specific for histological subtypes (Figure 2c). In all, 26 tumors were histologically classified as adenocarcinomas. For three out of five genes (MALAT-1, eIF4A1, NPCRP) that we identified by our screening method, expression in metastatic adenocarcinoma was severalfold higher than in nonmetastatic adenocarcinoma. Interestingly, no significant differences in gene expression were found for squamous cell carcinomas (n=34). For large cell carcinomas (n=10), mixed results were obtained: whereas some of the genes were clearly enriched in metastasizing tumors (MALAT-1, eIF4A1, MDM2), others were clearly not (NPCRP). These data provided evidence that the association of the identified genes with metastasis depended on the tumor's histology.
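The group comparisons above rely on the Mann–Whitney U-test (see the legend of Figure 2). As a minimal sketch of what that test ranks, the following computes the U statistic for two groups of expression values. The function name and the example numbers are hypothetical and are not taken from the patient data; computing a P-value from U is omitted.

```python
from itertools import chain


def mann_whitney_u(group_a, group_b):
    """Return the Mann-Whitney U statistic for group_a (ties receive midranks)."""
    pooled = sorted(chain(group_a, group_b))
    # Give each distinct value the average of the 1-based ranks it occupies.
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    rank_sum_a = sum(rank_of[x] for x in group_a)
    n_a = len(group_a)
    return rank_sum_a - n_a * (n_a + 1) / 2


# Hypothetical relative expression values, for illustration only.
metastasizing = [5.0, 6.0, 7.0]
nonmetastasizing = [1.0, 2.0, 3.0]
print(mann_whitney_u(metastasizing, nonmetastasizing))
```

U equals the number of (metastasizing, nonmetastasizing) pairs in which the metastasizing sample has the higher value, with ties counted as one half, so it ranges from 0 to the product of the two group sizes.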

Prognostic relevance of thymosin β4 and MALAT-1

Metastasis is the main cause of death from early-stage NSCLC (Keller et al., 2000). Therefore, metastasis-associated genes potentially predict patients’ survival. To analyse the association between the identified metastasis-associated transcripts and survival, samples were divided into two groups and classified for each gene as high (above the median expression level, n=70) or low (below the median expression level, n=70) expressers. The Kaplan–Meier plots and log-rank tests for stage I patients suffering from adenocarcinoma or squamous cell carcinoma indicated that high expression levels of thymosin β4 (P=0.01) and MALAT-1 (P=0.04) were associated with significantly worse survival of patients with stage I disease (Figure 3). None of the low expressers of thymosin β4 died from stage I NSCLC. High expression of MALAT-1 was highly predictive for a poor prognosis in early disease. Indeed, only 2/22 (9%) of patients with low MALAT-1 levels died during a 5-year follow-up period. On the other hand, more than 40% (12/28) of patients with high levels of MALAT-1 ultimately died.
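The median split and the Kaplan–Meier estimate described above can be sketched as follows. This is an illustration under stated assumptions, not the analysis code used in the study: the function names and the example follow-up data are hypothetical, samples exactly at the median are assigned to the low group (the text does not say how they were handled), and the log-rank test is not implemented.

```python
from statistics import median


def split_by_median(expression):
    """Label each sample 'high' if above the cohort median, else 'low'."""
    cutoff = median(expression)
    return ["high" if value > cutoff else "low" for value in expression]


def kaplan_meier(times, died):
    """Return [(time, survival probability)] at each time with at least one death.

    times: follow-up time per patient; died: True for death, False if censored.
    """
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, d in zip(times, died) if x == t and d)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
    return curve


# Hypothetical follow-up (years) and outcomes, for illustration only.
print(split_by_median([1.0, 2.0, 3.0, 4.0]))
print(kaplan_meier([1, 2, 3, 4], [True, False, True, False]))
```

In practice one curve would be estimated per expression group and the two curves compared with a log-rank test, as in Figure 3.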

Figure 3

Identification of genes that predict prognosis in stage I NSCLC. Patients were grouped for each gene in high vs low expressing tumors based on its gene expression in comparison to the median expression of all tumors. The Kaplan–Meier survival plots are shown for patients with adenocarcinoma or squamous cell carcinoma and stage I disease (n=31). The log-rank test was used to calculate statistical significance

MALAT-1 – a noncoding RNA

In the subtractive hybridization, the most frequently identified transcript was a so far uncharacterized RNA. After uncovering its significant prognostic value for metastasis and survival in early-stage NSCLC, we further analysed this new transcript. To obtain longer fragments of the sequence, we used 5′ and 3′ rapid amplification of cDNA ends (RACE) and cloned a fragment of 940 nucleotides (nt). Database searches revealed that this sequence has been described as part of different expressed sequence tags (ESTs). Two separate ESTs were mapped to chromosome 11q13 in a radiation hybrid map (James et al., 1994) and later identified to belong to a single transcript of 8.5 kb (Guru et al., 1997), preliminarily named α gene (GenBank Acc AF203815). This transcript was isolated again in searches for translocation breakpoints at 11q13 (van Asseldonk et al., 2000). In detailed database searches, we found hundreds of human ESTs with significant homology to the α gene sequence (significant=Blast E-value<1e−30). This analysis also uncovered two splice sites within the published sequence leading to two splice variants of 8110 or 8352 nt (Figure 4a). We renamed the transcript according to our results as MALAT-1 (metastasis associated in lung adenocarcinoma transcript 1). We provide a nonredundant map of human ESTs covering the whole sequence of MALAT-1 (Figure 4a) and a list of the ESTs (Supplementary Table 3). The sequences of MALAT-1 have been deposited into GenBank (long isoform: Acc# BK001418, short isoform: Acc# BK001411).

Figure 4

MALAT-1 is a novel noncoding RNA. (a) The genomic and transcribed sequences of MALAT-1 are depicted. Human ESTs cover the full length of the MALAT-1 sequence. In addition, homologous ESTs can be found in several other species (chimpanzee (P.t.), rhesus monkey (Ma.m.), mouse (Mu.m.)). (b) Northern blot analysis of total RNA from several NSCLC cell lines revealed a single band of the predicted size when probed with antisense sequences. The control blot probed with sense RNA did not show any band. (c) In vitro translation in the presence of [35S]methionine of a MALAT-1 clone did not give rise to any peptide or protein. Two different cDNAs were used as positive controls. (d) Real-time quantitative RT–PCR was used to analyse MALAT-1 expression in a cDNA panel derived from normal human organs. Gene expression levels were standardized to GAPDH expression

Supplementary Table 3 Primer List for Subtractive Hybridization and RT-PCR
Supplementary Table 2 Clones identified in subtractive hybridization assay

We also screened EST databases of other organisms for expression of MALAT-1 sequences. ESTs of Pan troglodytes (chimpanzee), Macaca mulatta (rhesus monkey) and Mus musculus (mouse) were found and mapped to the human MALAT-1 sequence at several sites (Figure 4a and Supplementary Table 3). Homologs were also found for Bos taurus, Sus scrofa and Rattus norvegicus (data not shown). Our own finding of MALAT-1 in a subtractive hybridization and its abundance in EST databases of various species convincingly demonstrated expression of this transcript in vivo. Intensive homology searches did not reveal any significant similarity to any other known mRNA. Interestingly, a sequence stretch of 7924 nt was 69.3% identical between the human and mouse genome, indicating conservation of the MALAT-1 gene. Stretches of up to 1 kb showed more than 80% identity and a sequence of 300 bp is 90% identical between humans and mice. This degree of conservation hints at a potentially important function.

To demonstrate expression of MALAT-1 in vivo, we analysed RNA expression in lung carcinoma cell lines. In Northern blot analysis using the sense sequence of MALAT-1 as probe, no RNA band was detected. Using the antisense sequence of MALAT-1 for probing, we found a single band at a size consistent with the full-length MALAT-1 transcript (Figure 4b). Thereby, we could verify the existence of the full-length transcript of MALAT-1 in lung cancer cell lines. Further expression analyses were carried out by real-time quantitative RT–PCR (see Figure 5). Neither the ESTs nor the full-length sequence of MALAT-1 contained open reading frames of significant length. The longest open reading frames within the whole sequence of more than 8000 bp would encode peptides of only 52 or 55 amino acids (aa). This is shorter than expected by chance in a nucleotide sequence of this length. In addition, none of the putative translational start sites matches the Kozak consensus sequence. In vitro transcription and translation of the MALAT-1 sequence isolated by RACE (containing the first 52 aa ORF) did not lead to the synthesis of any peptide or protein, whereas positive controls gave rise to protein synthesis (Figure 4c). Taken together, these data demonstrated that MALAT-1 is a novel noncoding RNA.
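The open reading frame argument above rests on finding the longest stretch between a start codon and the next in-frame stop codon. A minimal sketch of such a scan is given below. It is an illustration only: the function name and the toy sequence are hypothetical, only the given strand is scanned, only ATG is treated as a start codon, and frames that run off the end without a stop codon are ignored. The tool actually used in the study is not stated in the text.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_aa(sequence):
    """Length in amino acids of the longest ATG..stop ORF on the given strand."""
    seq = sequence.upper().replace("U", "T")  # accept RNA or DNA letters
    best = 0
    # The zero-width lookahead finds every ATG, including overlapping ones,
    # so all three reading frames are covered.
    for match in re.finditer("(?=ATG)", seq):
        start = match.start()
        for pos in range(start, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                best = max(best, (pos - start) // 3)  # codons before the stop
                break
    return best


# Toy sequence (not MALAT-1): ATG AAA TTT TAG encodes three amino acids.
print(longest_orf_aa("CCATGAAATTTTAGGG"))
```

Applied to a full-length transcript, a result in the range of a few dozen amino acids, as reported above for MALAT-1, would argue against a conventional protein-coding function.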

Figure 5

MALAT-1 in non-small cell lung cancer. (a) For in-depth analysis of expression of the full-length MALAT-1 RNA, we designed three different primer pairs for MALAT-1 for real-time quantitative RT–PCR. The three amplicons covered the 5′ terminal part, the middle part and the 3′ terminal part of MALAT-1 and were named accordingly. The 5′ terminal primers were located within the region identified by subtractive hybridization and were used in the previous experiments. (b) Expression of MALAT-1 was detectable in NSCLC cell lines, and expression levels of all three amplicons correlated significantly (P=0.01–0.005) with each other in five NSCLC cell lines (A549, HTB-53, HTB-56, HTB-57, HTB-58). (c) Comparative genomic hybridization of 23 patients suffering from early-stage NSCLC (Adeno, SCC) revealed a loss of 11q in one and a gain of 11q in five patients. Gain of 11q was associated with higher expression of MALAT-1, whereas loss of 11q led to decreased MALAT-1 expression. (d) MALAT-1 expression was analysed by real-time quantitative RT–PCR in stage I NSCLC patients (only Adeno and squamous cell carcinoma, n=30). Boxplots show the range of MALAT-1 expression levels in samples from nonmetastasizing (n=22) and metastasizing (n=8) patients. Analyses of all three amplicons led to comparable results with higher MALAT-1 expression in the metastasizing samples (P=0.002). (e) Comparison of the mean expression levels from the analysis described in panel (d) clearly showed higher MALAT-1 expression in the metastasizing group than in the nonmetastasizing group

MALAT-1 is widely expressed in normal tissue

To identify the expression pattern of MALAT-1, we analysed MALAT-1 expression by real-time quantitative RT–PCR using TaqMan technology in a panel of 23 pooled cDNAs from various healthy human organs. Highest levels of MALAT-1 expression were detected in pancreas and lung. Intermediate expression levels were found in prostate, ovary, colon, placenta, spleen, small intestine, kidney, heart, liver, testis and brain. MALAT-1 was absent in skin, stomach, bone marrow and uterus (Figure 4d).

Expression of MALAT-1 in NSCLC

To further characterize the expression pattern of the novel transcript MALAT-1, we designed primers for three amplicons dispersed near the 5′ end, in the middle and near the 3′ end of the MALAT-1 sequence for real-time quantitative RT–PCR (Figure 5a).

MALAT-1 expression was analysed by all three amplicons in five human lung cancer cell lines (A549, HTB-53, HTB-56, HTB-57, HTB-58). Expression levels detected by the different amplicons correlated significantly with each other (P=0.01, P=0.01, P=0.005, respectively) (Figure 5b). The close correlation of expression levels indicated that MALAT-1 consisted of the predicted full-length sequence of about 8 kb (Figure 4a).

To analyse whether the mRNA expression differences of MALAT-1 in NSCLC patient samples were associated with chromosomal changes at the 11q region, we performed comparative genomic hybridization (CGH) studies for 23 patients (adenocarcinoma or squamous cell carcinoma) with stage I or II disease. One patient showed a loss of the 11q region, whereas five patients had a gain of 11q and 17 did not show gross 11q changes. As expected, loss of 11q was associated with loss of MALAT-1 expression. A gain at 11q resulted in a broad range of MALAT-1 expression levels with low expression in some and high expression in other patients (Figure 5c).

In a subset of patients suffering from stage I NSCLC (adenocarcinoma or squamous cell carcinoma; n=30), MALAT-1 expression levels were analysed using three different amplicons within the MALAT-1 sequence (Figure 5d, e). The expression levels strongly correlated between the three amplicons used for detection (P<0.001, data not shown). Expression levels of MALAT-1 in metastasizing and nonmetastasizing patients are depicted in boxplots (Figure 5d). Overall analyses of all three amplicons led to similar results with evidence for higher MALAT-1 expression in the metastasizing samples (P=0.002). Comparison of mean expression levels revealed ninefold induction of MALAT-1 expression in metastasizing patient samples on average compared to nonmetastasizing patients (Figure 5e).
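The fold induction quoted above is a ratio of group means. The sketch below shows that calculation on hypothetical values (the function name and the numbers are illustrative and are not patient data).

```python
from statistics import mean


def fold_induction(metastasizing, nonmetastasizing):
    """Ratio of mean expression in metastasizing vs nonmetastasizing samples."""
    return mean(metastasizing) / mean(nonmetastasizing)


# Hypothetical relative expression values, for illustration only.
print(fold_induction([9, 27], [1, 3]))
```

Because a mean is sensitive to individual extreme samples, as seen for NPCRP in two adenocarcinoma patients, a ratio of this kind is best read alongside the boxplots and the rank-based test rather than on its own.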


In this study, we identified genes that are associated with NSCLC metastasis. To our knowledge, this is the first study in NSCLC that directly compared gene expression differences between primary tumors cured by surgery and those that subsequently metastasized. Our early-stage NSCLC samples provided an excellent starting point to study the association of gene expression with the development of distant metastasis. All patients with stage I and II disease were treated solely by surgery, whereas stage IIIA patients received additional radiation therapy. All tumors could be correctly classified as metastasis-inducing or noninducing tumors because of the very long follow-up time (>5 years) in our study. Only occult metastasis present at the time of surgery could subsequently lead to tumor recurrence in other organs of the body. In our study, only two of 50 patients in stage I and II had local tumor recurrence verifying that distant metastasis is the main reason for tumor-related death in early-stage NSCLC.

The direct comparison of subsequently metastasizing primary tumors with nonmetastasizing primary tumors offers important advantages. First, changes in gene expression due to the different environments of the tumor and metastasis can be avoided (Brem et al., 2001). Second, for clinical patient management, the knowledge of differences between metastasizing and nonmetastasizing primary tumors can lead to better prognosis prediction and ultimately to risk adapted therapeutic strategies. Cluster analyses of microarray hybridization experiments showed that the gene expression profile of a primary tumor correlates more closely with the gene expression profile of its metastasis than with gene expression in other primary tumors (Clark et al., 2000). Thus, the metastatic potential of a primary tumor is obviously related to the intrinsic gene expression profile. Our data support the view that the comparison between different primary tumors can help to identify metastasis-associated genes.

Several genes identified in our analyses are already known to be associated with cancer and metastasis, thus indicating the reliability of our results. For 10 out of 11 transcripts, higher expression levels on average were verified in metastasizing tumors vs nonmetastasizing tumors (Figure 2a). Thus, most of the genes we identified are likely to be truly overexpressed in metastasizing tumors in vivo. However, several transcripts were not found in high quantities in each metastasizing sample. Therefore, it was necessary to verify the significance of our findings in a larger group of NSCLC tumors with documented clinical follow-up. Interestingly, the association of high expression with metastasis was significantly evident for three (MALAT-1, thymosin β4, eIF4A1) genes in early-stage NSCLC patients (Figure 2b). The results for the group of 31 stage I patients remained unchanged when the nine samples that were initially used for the screening procedure were excluded from the analyses. Further analyses in the entire patient group (n=70) indicated that differences in gene expression between metastatic and nonmetastatic tumors were most prominent for adenocarcinoma (Figure 2c). These findings indicated that metastasis in NSCLC is a cell-type-specific phenomenon. Keeping this in mind, it is not surprising that the strongest differences for the identified genes were found where the initial selection process took place: in early-stage adenocarcinoma.

Recently, one group (Clark et al., 2000) published microarray data derived from a melanoma metastasis model. One of the three main genes identified by these authors, thymosin β4, was also isolated by our strategy. Whereas thymosin β4 proved to be associated with metastasis and prognosis in our patient group, no significant differences were found for either RhoC or fibronectin, the two other main genes identified by these authors. Therefore, mechanisms of metastasis seem to differ between tumors derived from different tissues. This point of view is also supported by our finding that metastasis prediction for most of the genes identified by us was specific for adenocarcinoma. Thymosin β4 has also been found to be downregulated compared to normal lung tissue in squamous cell carcinoma but not in adenocarcinoma (McDoniels-Silvers et al., 2002). Also, murine thymosin β4 has been found to modulate actin polymerization that fits well with a potential role in metastasis (Li et al., 1996). Association with melanoma metastasis has also been described for Sec61β and eIF4G (Clark et al., 2000), corroborating the quality of our data.

Association with metastasis was also previously unknown for the eIF4A1 protein that takes part in the eIF4 complex to recruit mRNA to ribosomes. eIF4A1 localizes an RNA helicase to the 5′ region of mRNA (Gingras et al., 1999). eIF4A1 has not been linked to malignancy and metastasis before, but eIF4E (Zimmer et al., 2000) and eIF4G2 (Kikuchi et al., 2003) have been described to be associated with cancer. eIF4A1 was associated with metastasis in our patient group, but differences in survival between patients expressing low and high levels of eIF4A1 were not statistically significant. More detailed analyses indicated that patients with very high eIF4A1 (top 20%) had a very poor prognosis, whereas eIF4A1 expression slightly above the median did not affect patient survival (data not shown).

NPCRP/PLUNC was initially identified as a gene expressed in airway epithelia and in adult lung (Weston et al., 1999). We found this gene to be expressed at very high levels in two metastasizing tumors from patients with lung adenocarcinoma. The other metastasizing tumors showed low expression levels. The reason for the high expression of NPCRP/PLUNC in the two tumors is unknown. This finding might reflect a general phenomenon: the identification of genes differentiating metastasizing from nonmetastasizing tumors might represent differences between as yet unidentified histological subtypes that possess differing propensities to metastasize. Several hints already indicate that different subtypes of adenocarcinoma exist and differ in their gene expression pattern (Bhattacharjee et al., 2001). Nevertheless, the identification of differentially expressed genes will help to improve prognosis prediction, regardless of whether the differences reflect as yet uncovered histological subtypes or a direct involvement of the identified gene in metastasis.

Several so far unknown genes were identified in our study. One of these, a novel transcript on chromosome 11q13, was the transcript found most frequently. The region 11q13 has long been known to be relevant for tumorigenesis and metastasis (Rasio et al., 1995; Bekri et al., 1997; Chakrabarti et al., 1998). To determine sequences upstream and downstream of the identified clones, we used 5′ and 3′ RACE techniques and obtained an RNA sequence of 940 nt. This transcript was named MALAT-1. The full-length noncoding RNA contains more than 8000 nt as verified by Northern blotting. We have found MALAT-1 in normal tissues, NSCLC cell lines and in NSCLC patient samples. To our knowledge, this is the first time that a noncoding RNA has been assigned to a distinct pathological event. We provide evidence that the noncoding transcript MALAT-1 is associated with metastasis in lung adenocarcinoma. Recently, at least 4280 putative noncoding RNAs were identified in the murine transcriptome (FANTOM and RIKEN, 2002). Functional noncoding RNAs might therefore play a much more crucial role in physiological and pathological processes than currently anticipated.

What is the relevance of these findings? First, the identified genes can now be further analysed for their direct involvement in metastatic processes. Second, it is currently still controversial whether adjuvant chemotherapy is beneficial for NSCLC patients with stage I and II disease. However, in stage IIIA and IIIB NSCLC patients, neo-adjuvant and adjuvant chemo- and radiotherapy is clearly effective as recently shown in multicenter therapy trials (Sorenson et al., 2001; Souquet and Geriniere, 2001). Since adjuvant therapy is effective in stage III disease, it is very likely that a subgroup of patients with stage I and II NSCLC would benefit from adjuvant treatment. The definition of a suitable subgroup of stage I and II patients that might benefit from adjuvant therapy is crucial for the design of appropriate new trials. As shown in our current study, expression analysis of metastasis-associated genes can identify patients with high and low risk of subsequent metastasis. The risk of subsequent metastasis for high expressing tumors of MALAT-1 increased almost fivefold compared to low expressing tumors. While patients at low risk for distant metastasis could be spared, adjuvant treatment could be offered for patients at high risk for development of distant metastasis.
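The "almost fivefold" figure is consistent with the outcome proportions reported in the Results for MALAT-1 (12 of 28 high expressers versus 2 of 22 low expressers). Assuming those counts are the basis of the estimate, which the text does not state explicitly, the ratio can be reproduced as follows (the function name is hypothetical).

```python
from fractions import Fraction


def risk_ratio(events_high, n_high, events_low, n_low):
    """Ratio of the event proportion in the high group to that in the low group."""
    return Fraction(events_high, n_high) / Fraction(events_low, n_low)


ratio = risk_ratio(12, 28, 2, 22)
print(ratio, round(float(ratio), 2))  # 33/7, about 4.71
```

Note that the counts in the Results refer to deaths during follow-up, so this is a crude ratio of proportions and not a time-to-event hazard ratio.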

In conclusion, we provide evidence that metastasis-associated gene expression is highly specific for entity, histology and stage of the disease. Several genes that were closely associated with subsequent metastasis were identified in our study. Finally, expression levels of two of these genes, MALAT-1 (a novel transcript on 11q13) and thymosin β4, were demonstrated to be independent prognostic parameters for survival in early-stage NSCLC.

Materials and methods

Tumor specimens and survival data

Primary tumor specimens were obtained at the time of initial surgery for NSCLC at a university hospital in Germany. Samples were immediately shock-frozen and stored at −70°C. RNA was isolated using TRIzol reagent (Invitrogen, Carlsbad, CA, USA). A total of 1 μg of RNA from each sample was reverse-transcribed using an oligo-d(T) primer and MMLV reverse transcriptase according to the manufacturer's protocol (Clontech, Palo Alto, CA, USA). The clinical characteristics of the NSCLC patients have been described (Müller-Tidow et al., 2001).

cDNA subtractive hybridization

RNA samples for cDNA subtraction were chosen based on the following criteria to exclude differences in gene expression independent of the metastatic potential: male patient, follow-up time longer than 5 years, stage I disease, histologically confirmed adenocarcinoma, and tumors completely resected by surgery (R0 resection). RNA from four tumors that subsequently metastasized was pooled, as was RNA from five tumors that subsequently did not metastasize. In all, 1 μg from each sample was used for cDNA synthesis. Owing to the limited amounts of available RNA, cDNA was synthesized and amplified according to the SMART™ PCR cDNA synthesis protocol (Clontech), which led to reliable amplification (Chenchik et al., 1998). All primer sequences are given in Supplementary Table 2. Double-stranded cDNA fragments were digested with RsaI and purified by ethanol precipitation. The diluted RsaI-digested cDNA fragments from metastasizing tumor RNA were divided into two portions, and each was ligated to a cDNA adaptor as tester cDNA at 16°C overnight. An excess of driver cDNA from nonmetastasizing tumors was added to each tester cDNA, and the samples were heat denatured and allowed to anneal during the first hybridization at 68°C for 8 h. The two samples from the first hybridization were mixed together, denatured driver DNA was added to further enrich differentially expressed sequences, and the mixture was incubated at 68°C overnight to complete the second hybridization. Differentially expressed cDNA fragments were diluted and amplified in a two-step PCR. The PCR conditions were 27 cycles of 30 s at 94°C, 30 s at 66°C and 90 s at 72°C. The primary PCR product was reamplified for 10 cycles with a 2°C higher annealing temperature.

Supplementary Table 4 MALAT-1 ESTs from different species

cDNA cloning and sequencing

The nested PCR products were cloned into a standard vector (TOPO-TA Cloning, Invitrogen). Individual clones were sequenced by standard procedures using dye terminator chemistry on a 3700 Sequencer (Applied Biosystems). Sequences were compared to sequences in GenBank using the BLAST program at the NCBI.

Analyses of gene expression by real-time quantitative RT–PCR

Real-time quantitative RT–PCR was carried out to determine gene expression in the samples from NSCLC patients as described (Gibson et al., 1996). In brief, cDNA was prepared as described above and amplified using real-time RT–PCR in the ABI Prism 7700 sequence detector (PE Biosystems, Foster City, CA, USA). The relative amounts of gene expression were calculated by using the expression of GAPDH as an internal standard. At least two independent analyses were performed for each sample and each gene without knowledge of patient data.
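The normalization to GAPDH described above can be illustrated with a minimal sketch. It assumes the common comparative threshold-cycle (2^−ΔCt) calculation with roughly 100% amplification efficiency for both assays; the text does not state which calculation was actually applied, and the Ct values below are hypothetical, not data from this study.

```python
from statistics import mean


def relative_expression(ct_target, ct_reference):
    """Return target expression relative to the reference gene as 2^-dCt.

    Assumption (not stated in the paper): both assays amplify with ~100%
    efficiency, so one cycle corresponds to a twofold difference.
    Replicate Ct values are averaged before the difference is taken.
    """
    delta_ct = mean(ct_target) - mean(ct_reference)
    return 2.0 ** (-delta_ct)


# Hypothetical duplicate Ct measurements for one tumor sample.
sample = {"MALAT-1": [24.0, 24.2], "GAPDH": [20.1, 20.1]}
rel = relative_expression(sample["MALAT-1"], sample["GAPDH"])
print(f"MALAT-1 relative to GAPDH: {rel:.4f}")
```

A standard-curve approach, in which both genes are quantified against dilution series, would replace the fixed base of 2 with assay-specific efficiencies.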

Statistical analyses

Statistical analyses were performed using SPSS 10.0. Differences in gene expression between metastasizing and nonmetastasizing tumors were compared using the nonparametric Mann–Whitney U-test. For the Kaplan–Meier plots, tumors were categorized as high or low expression relative to the median expression of all patients (n=70). The Kaplan–Meier plots were tested for statistical significance using the log-rank test. Correlation analyses were carried out using the Pearson correlation coefficient. All tests were two-sided, and an α level of 5% was used as the threshold for significance.
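The following sketch illustrates the median split, the Mann–Whitney U-test and the two-group log-rank test in plain Python. It is not the SPSS procedure used in the study: the U-test P-value uses a normal approximation without tie or continuity correction, samples exactly at the median are assigned to the low group (an assumption), and all values are hypothetical.

```python
import math
from statistics import median


def median_split(expression):
    """Label each sample True (high) if its expression exceeds the median."""
    cutoff = median(expression)
    return [value > cutoff for value in expression]


def mann_whitney_u(x, y):
    """Return (U, two-sided P); P uses a normal approximation, no tie correction."""
    u = sum((a > b) + 0.5 * (a == b) for a in x for b in y)
    n1, n2 = len(x), len(y)
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, math.erfc(abs(z) / math.sqrt(2))


def logrank(times, events, high):
    """Two-group log-rank test; returns (chi-square, P) for 1 degree of freedom."""
    obs_minus_exp, var = 0.0, 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = [h for ti, h in zip(times, high) if ti >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for ti, e in zip(times, events) if e and ti == t)
        d1 = sum(1 for ti, e, h in zip(times, events, high) if e and ti == t and h)
        obs_minus_exp += d1 - d * n1 / n  # observed minus expected deaths, high group
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical data: relative expression, follow-up in months, death observed.
expression = [0.2, 0.5, 0.9, 1.4, 2.1, 3.0, 4.2, 5.5]
months = [60, 58, 61, 40, 30, 22, 15, 12]
died = [False, False, False, True, True, True, True, True]

high = median_split(expression)
u, p_u = mann_whitney_u(expression[4:], expression[:4])
chi2, p_lr = logrank(months, died, high)
print(f"U = {u}, P = {p_u:.3f}; log-rank chi2 = {chi2:.2f}, P = {p_lr:.4f}")
```

With groups as small as in this toy example, an exact U-test would be preferable to the normal approximation; the approximation is used here only to keep the sketch short.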

Northern blot

Total RNA was isolated as described above, and 10 μg was subjected to electrophoresis and Northern blotting. MALAT-1 was detected by hybridization with a digoxigenin-labeled RNA probe of 800 nt cloned from the RACE reaction for MALAT-1 (DIG-RNA labeling kit, Enzo Life Sciences, Farmingdale, NY, USA).

In vitro transcription and translation

In vitro transcription/translation was carried out using the TnT T7 Quick Coupled Transcription/Translation System (Promega) according to the manufacturer's recommendations. Proteins were labeled with [35S]methionine and subjected to SDS–PAGE and autoradiography. Reactions without DNA or with different plasmids served as controls.


  1. Bekri S, Adelaide J, Merscher S, Grosgeorge J, Caroli-Bosc F, Perucca-Lostanlen D, Kelley PM, Pebusque MJ, Theillet C, Birnbaum D and Gaudray P . (1997). Cytogenet. Cell Genet., 79, 125–131.

  2. Bhattacharjee A, Richards WG, Staunton J, Li C, Monti S, Vasa P, Ladd C, Beheshti J, Bueno R, Gillette M, Loda M, Weber G, Mark EJ, Lander ES, Wong W, Johnson BE, Golub TR, Sugarbaker DJ and Meyerson M . (2001). Proc. Natl. Acad. Sci. USA, 98, 13790–13795.

  3. Brem R, Hildebrandt T, Jarsch M, Van Muijen GN and Weidle UH . (2001). Anticancer Res., 21, 1731–1740.

  4. Chakrabarti R, Srivatsan ES, Wood TF, Eubanks PJ, Ebrahimi SA, Gatti RA, Passaro Jr E and Sawicki MP . (1998). Genes Chromosomes Cancer, 22, 130–137.

  5. Chenchik A, Zhu YY, Diatchenko L, Li R, Hill J and Siebert PD . (1998). Gene Cloning and Analysis by RT–PCR. BioTechniques Books: Natick, MA, pp. 305–319.

  6. Clark EA, Golub TR, Lander ES and Hynes RO . (2000). Nature, 406, 532–535.

  7. FANTOM and RIKEN . (2002). Nature, 420, 563–573.

  8. Fidler IJ . (1990). Cancer Res., 50, 6130–6138.

  9. Gibson UE, Heid CA and Williams PM . (1996). Genome Res., 6, 995–1001.

  10. Gingras AC, Raught B and Sonenberg N . (1999). Annu. Rev. Biochem., 68, 913–963.

  11. Guru SC, Agarwal SK, Manickam P, Olufemi SE, Crabtree JS, Weisemann JM, Kester MB, Kim YS, Wang Y, Emmert-Buck MR, Liotta LA, Spiegel AM, Boguski MS, Roe BA, Collins FS, Marx SJ, Burns L and Chandrasekharappa SC . (1997). Genome Res., 7, 725–735.

  12. Hesketh R . (1997). The Oncogene and Tumour Suppressor Gene FactsBook, 2nd edn. Academic Press, Harcourt Brace & Company: New York.

  13. Hibi K, Liu Q, Beaudry GA, Madden SL, Westra WH, Wehage SL, Yang SC, Heitmiller RF, Bertelsen AH, Sidransky D and Jen J . (1998). Cancer Res., 58, 5690–5694.

  14. Horak ER, Leek R, Klenk N, LeJeune S, Smith K, Stuart N, Greenall M, Stepniewska K and Harris AL . (1992). Lancet, 340, 1120–1124.

  15. James MR, Richard III CW, Schott JJ, Yousry C, Clark K, Bell J, Terwilliger JD, Hazan J, Dubay C, Vignal A et al . (1994). Nat. Genet., 8, 70–76.

  16. Keller SM, Adak S, Wagner H, Herskovic A, Komaki R, Brooks BJ, Perry MC, Livingston RB and Johnson DH . (2000). N. Engl. J. Med., 343, 1217–1222.

  17. Kikuchi T, Daigo Y, Katagiri T, Tsunoda T, Okada K, Kakiuchi S, Zembutsu H, Furukawa Y, Kawamura M, Kobayashi K, Imai K and Nakamura Y . (2003). Oncogene, 22, 2192–2205.

  18. Kobayashi T, Okada F, Fujii N, Tomita N, Ito S, Tazawa H, Aoyama T, Choi SK, Shibata T, Fujita H and Hosokawa M . (2002). Am. J. Pathol., 160, 869–882.

  19. Li X, Zimmerman A, Copeland NG, Gilbert DJ, Jenkins NA and Yin HL . (1996). Genomics, 32, 388–394.

  20. Lu Z, Jiang G, Blume-Jensen P and Hunter T . (2001). Mol. Cell. Biol., 21, 4016–4031.

  21. McDoniels-Silvers AL, Nimri CF, Stoner GD, Lubet RA and You M . (2002). Clin. Cancer Res., 8, 1127–1138.

  22. Müller-Tidow C, Metzger R, Kügler K, Diederichs S, Idos G, Thomas M, Dockhorn-Dworniczak B, Schneider PM, Koeffler HP, Berdel WE and Serve H . (2001). Cancer Res., 61, 647–653.

  23. Nagamachi Y, Tani M, Shimizu K, Tsuda H, Niitsu Y and Yokota J . (1998). Cancer Lett., 127, 203–209.

  24. Parker SL, Tong T, Bolden S and Wingo PA . (1997). CA Cancer J. Clin., 47, 5–27.

  25. Rasio D, Negrini M, Manenti G, Dragani TA and Croce CM . (1995). Cancer Res., 55, 3988–3991.

  26. Ridley A . (2000). Nature, 406, 466–467.

  27. Shieh DB, Godleski J, Herndon II JE, Azuma T, Mercer H, Sugarbaker DJ and Kwiatkowski DJ . (1999). Cancer, 85, 47–57.

  28. Shimizu K, Nagamachi Y, Tani M, Kimura K, Shiroishi T, Wakana S and Yokota J . (2000). Genomics, 65, 113–120.

  29. Sorenson S, Glimelius B and Nygren P . (2001). Acta Oncol., 40, 327–339.

  30. Souquet PJ and Geriniere L . (2001). Lung Cancer, 34 (Suppl 2), S155–S158.

  31. Ukena D . (2001). Lung Cancer, 33 (Suppl 1), S25–S28.

  32. Ura H, Bonfil RD, Reich R, Reddel R, Pfeifer A, Harris CC and Klein-Szanto AJ . (1989). Cancer Res., 49, 4615–4621.

  33. van Asseldonk M, Schepens M, de Bruijn D, Janssen B, Merkx G and Geurts van Kessel A . (2000). Genomics, 66, 35–42.

  34. Weston WM, LeClair EE, Trzyna W, McHugh KM, Nugent P, Lafferty CM, Ma L, Tuan RS and Greene RM . (1999). J. Biol. Chem., 274, 13698–13703.

  35. Xu G, Rong T and Lin P . (2000). Chin. Med. J. (Engl), 113, 617–620.

  36. Zimmer SG, DeBenedetti A and Graff JR . (2000). Anticancer Res., 20, 1343–1351.



We thank Sarah Pierschalski for excellent technical assistance. The sequences of MALAT-1 have been deposited in GenBank (long isoform: Acc# BK001418; short isoform: Acc# BK001411).

Author information

Correspondence to Hubert Serve or Carsten Müller-Tidow.

Additional information

This work is supported by a grant from the Wilhelm Sander-Stiftung (2001.086.1).

About this article


  • metastasis
  • non-small cell lung cancer
  • subtractive hybridization
  • prognostic parameter
  • thymosin β4
  • MALAT-1
