β2microglobulin mRNA expression levels are prognostic for lymph node metastasis in colorectal cancer patients

Colorectal cancer (CRC) is the fourth most common non-cutaneous malignancy in the United States and the second most frequent cause of cancer-related death. One of the most important determinants of CRC survival is lymph node metastasis. To determine whether molecular markers might be prognostic for lymph node metastases, we measured by quantitative real-time RT–PCR the expression levels of 15 cancer-associated genes in formalin-fixed paraffin-embedded primary tissues derived from stage I–IV CRC patients with (n=20) and without (n=18) nodal metastases. Using the mean of the 15 genes as an internal reference control, we observed that low expression of β2microglobulin (B2M) was a strong prognostic indicator of lymph node metastases (area under the curve (AUC)=0.85; 95% confidence interval (CI)=0.69–0.94). We also observed that the expression ratio of B2M/Spint2 had the highest prognostic accuracy (AUC=0.87; 95% CI=0.71–0.96) of all potential two-gene combinations. Expression values of Spint2 correlated with the mean of the entire gene set at an R2 value of 0.97, providing evidence that Spint2 serves not as an independent prognostic gene, but rather as a reliable reference control gene. These studies are the first to demonstrate a prognostic role of B2M at the mRNA level and suggest that low B2M expression levels might be useful for identifying patients with lymph node metastasis and/or poor survival.

Colorectal cancer (CRC) is the fourth most common noncutaneous malignancy in the United States and the second most frequent cause of cancer-related death. In 2007, an estimated 153 760 cases of CRC will be diagnosed, and 52 180 people will die from the disease (Jemal et al, 2006). The most important determinant of colon cancer survival is stage. The tumour–node–metastasis system, as defined by the American Joint Committee on Cancer, is the most commonly used cancer staging system and classifies colon cancer into four stages based on the depth of invasion of the bowel wall (T), extent of regional lymph node involvement (N), and presence of distant sites of metastatic disease (M) (Greene et al, 2002). Stage I includes T1 and T2 tumours without nodal or distant metastases, and most patients with this disease will be cured with segmental colectomy alone. The overall 5-year survival (OS) of this stage is 93.2%. Stage II is subdivided into two classes, IIA and IIB (OS = 84.7 and 72.2%, respectively), which include T3 and T4 tumours, respectively. As in stage I, nodal and distant metastases are absent in stage II disease. Although many patients with stage II disease will be cured by surgical resection alone, many patients with completely resected stage II disease will ultimately die from colon cancer (Jemal et al, 2006). Stage III disease includes tumours with nodal disease but without distant metastases. After complete surgical resection, these patients face a 50–60% chance of developing recurrent disease. A survival benefit from adjuvant 5-fluorouracil-based chemotherapy has been firmly established in these patients, and recent data have shown further efficacy through the inclusion of oxaliplatin in adjuvant treatment programmes (Chung and Saltz, 2007; Wolpin et al, 2007).
At present, the standard procedure for determining the spread of metastatic disease to lymph nodes is pathological examination of ~20 resected lymph nodes stained with haematoxylin and eosin (H&E). We reasoned that an assay able to identify patients with metastatic disease by measuring RNA expression levels of select genes would be helpful for making clinical decisions. In the current study, we investigated whether expression levels of 14 cancer-associated genes in the primary tumour were correlated with lymph node metastases. These 14 genes are derived from a set of 22 that our laboratory has previously identified as being overexpressed in various cancers (Reed et al, 2007).
In addition to the set of 14 genes, we also chose to examine the expression of β2microglobulin (B2M). Our rationale for the inclusion of B2M was based on the following observations: (1) 32% of B2M null × IL-2 null mice develop adenocarcinoma in the proximal half of the colon between 6 and 12 months (Shah et al, 1998), (2) out of 19K genes screened from 25 matched CRC tissue and normal mucosa, B2M was the most highly downregulated gene in CRC (Bianchini et al, 2006), (3) oncogenic K-ras mutations (which are present in the majority of CRC) inhibit the expression of B2M and other interferon (IFN)-responsive genes (Klampfer et al, 2003), (4) downregulation of B2M in CRC has been confirmed by real-time RT–PCR (Bianchini et al, 2006), and (5) the level of expression of B2M is very high and can be reliably measured in formalin-fixed paraffin-embedded (FFPE) tissues (Chen et al, 2007).
Revised 14 February 2008; accepted 7 April 2008; published online 27 May 2008

Patients and tissues
This study was approved by the Institutional Review Boards at the Medical University of South Carolina and by the Mayo Clinic College of Medicine in Jacksonville.
Metastatic and benign lymph nodes from colon cancer patients. Medical records at the Mayo Clinic were searched for patients who underwent surgical resection and who did (n = 7) or did not (n = 7) have associated lymph node metastases. A 50-μm-thick section was cut for real-time RT–PCR studies and a 5-μm-thick section was used for H&E staining. The presence of metastatic disease in lymph nodes identified as positive was confirmed by the study pathologist.
Primary tumour specimens. Medical records were searched for colon cancer patients who had at least one metastatic lymph node (n = 20) or no metastatic lymph nodes (n = 18). Duplicate 50-μm-thick sections were cut for real-time RT–PCR studies and a 5-μm-thick section was used for H&E staining.
RNA isolation from paraffin sections
mRNA extraction followed the method of Specht et al (2001). Briefly, paraffin-embedded tissue sections were deparaffinised twice with 1 ml of xylene at 37°C or room temperature for 10 min. The pellet was subsequently washed with 1 ml of 100, 90, and 70% ethanol and air-dried at room temperature for 2 h. The pellet was resuspended in 200 μl of RNA lysis buffer (2% lauryl sulphate, 10 mmol l−1 Tris-HCl (pH 8.0), and 0.1 mmol l−1 EDTA) and 100 μg of proteinase K and incubated at 60°C for 16 h. RNA was extracted using 1 ml of phenol/chloroform (5 : 1) solution (Sigma, St Louis, MO, USA). The aqueous layer containing RNA was transferred to a new 1.5 ml tube. Phenol/chloroform extraction was performed a total of three times. RNA was precipitated with an equal volume of isopropanol, 0.1 volume of 3 mol l−1 sodium acetate, and 100 μg of glycogen at −20°C for 16 h. After centrifugation at 12 000 r.p.m. for 15 min (4°C), the RNA pellet was washed with 70% ethanol and air-dried at room temperature for 2 h. Finally, the pellet was dissolved in 12 μl of DEPC water and treated with DNase before complementary DNA (cDNA) synthesis as described in the text.

Complementary DNA synthesis and real-time RT–PCR
Complementary DNA was made from 6 μl of the RNA described above, 200 U of M-MLV reverse transcriptase (Promega, Madison, WI, USA), and a panel of truncated gene-specific primers (Table 1). Real-time RT–PCR was performed on a PE Biosystems GeneAmp 7300 or 7500 Sequence Detection System (Foster City, CA, USA). With the exception of the SYBR Green I master mix (purchased from Qiagen, Valencia, CA, USA), all reaction components were purchased from PE Biosystems. The standard reaction volume was 10 μl and contained 1× SYBR RT–PCR buffer, 3 mM MgCl2, 0.2 mM each of dATP, dCTP, and dGTP, 0.4 mM dUTP, 0.1 U UNG Erase enzyme, 0.25 U AmpliTaq Gold, 0.35 μl cDNA template, and 50 nM of oligonucleotide primer. Initial steps of RT–PCR were 2 min at 50°C for UNG Erase activation, followed by a 10 min hold at 95°C. Cycles (n = 40) consisted of a 15 s melt at 95°C, followed by a 1 min annealing/extension at 60°C. The final step was a 60°C incubation for 1 min. All reactions were performed in triplicate. Before cDNA synthesis, RNA was treated with or without DNase as described in the text.

Gene expression and statistical analysis
To quantitate gene expression, the ΔCt method was used. As an internal reference, we used either the mean Ct value of all genes or the median value as described in the text. A primary tumour sample was considered to have sufficient mRNA if its mean Ct value was <35.2 (38 out of 38 samples; mean ± s.d. of all samples = 27.5 ± 4.06). Area under the curve (AUC) measurements were performed for single gene analysis using MedCalc software (Mariakerke, Belgium); patients were dichotomised according to lymph node metastasis status. For AUC analysis of B2M/gene X expression ratios, ΔCt values of 14 different gene combinations were obtained by subtracting the Ct value of B2M from that of each of the other genes. Area under the curve analysis was then performed using MedCalc software. Associations between categorical values were assessed using Fisher's exact test. For ordinal variables (e.g. T-stage, pathologic stage), we also used the t-test to compare mean levels across lymph node and expression categories due to concern over sparseness. Correlation coefficient analysis of potential reference genes was performed using Microsoft Excel software.
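The ΔCt calculation against a panel-wide reference can be sketched as follows. This is a minimal illustration and not the authors' analysis code: the function name, the dictionary layout, and the example Ct values are hypothetical, and the substitution of a Ct of 40 for undetected genes is borrowed from the Figure 3 legend on the assumption that it also applies here.

```python
from statistics import mean, median

def delta_ct(ct_by_gene, target="B2M", reference="mean", no_signal_ct=40.0):
    """Return (reference Ct - target Ct) for one tumour sample.

    ct_by_gene maps gene name -> mean Ct of the triplicate reactions,
    or None where no signal was detected (hypothetical data layout).
    """
    # Assumption: undetected genes are assigned a ceiling Ct of 40.
    cts = {g: (no_signal_ct if ct is None else ct) for g, ct in ct_by_gene.items()}
    # The internal reference is the mean (or median) Ct of the whole panel.
    ref_fn = mean if reference == "mean" else median
    ref_ct = ref_fn(cts.values())
    # A larger value means the target is more abundant than the panel reference.
    return ref_ct - cts[target]

# Hypothetical three-gene sample, for illustration only.
sample = {"B2M": 20.0, "Spint2": 26.0, "GPX2": 29.0}
print(delta_ct(sample))                      # reference = panel mean
print(delta_ct(sample, reference="median"))  # reference = panel median
```

Because a lower Ct corresponds to more template, a sample with low B2M expression yields a small ΔCt under this sign convention.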

RESULTS
Using a novel microarray/bioinformatics approach, we previously identified a set of 22 genes that were predicted to be overexpressed in multiple cancers (Reed et al, 2007). To investigate whether these genes were overexpressed in metastatic CRC, we selected 14 genes and measured their level of expression in lymph nodes obtained from CRC patients who were positive (n = 7) and negative (n = 7) for metastatic disease by H&E staining. We next performed AUC analysis, the most commonly used statistical method for determining accuracies of diagnostic tests (Henderson, 1993). Receiver-operator characteristic (ROC) curve analysis is based on a plot of sensitivity as a function of 1 − specificity. The area under the ROC curve (W) is a measure of diagnostic (or prognostic; see below) accuracy such that values between 0.5 and 0.7 indicate low accuracy, values between 0.7 and 0.9 indicate moderate accuracy, and values greater than 0.9 indicate high accuracy (Swets, 1988). We observed that the AUC values for detection of metastatic disease of 11 out of 14 (79%) genes were greater than 0.80 (Table 2 and Figure 1). We conclude from this experiment that the set of 14 genes is highly overexpressed in metastatic CRC and hypothesise that one or more genes in this set may be prognostic for lymph node metastases.
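For readers unfamiliar with the AUC, the sketch below computes it directly as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, which is equivalent to the area under the empirical ROC curve. It is not the MedCalc procedure used in this study; the function names and the example scores are hypothetical, and the treatment of the exact boundaries 0.7 and 0.9 (and of values at or below 0.5) is our assumption, as the text does not specify them.

```python
def roc_auc(scores_pos, scores_neg):
    """Empirical AUC: fraction of positive/negative pairs in which the
    positive case has the higher score (ties count as half)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

def accuracy_category(auc):
    """Map an AUC to the accuracy bands quoted from Swets (1988)."""
    if auc > 0.9:
        return "high"
    if auc > 0.7:
        return "moderate"
    if auc > 0.5:
        return "low"
    return "no better than chance"  # assumption: not defined in the text

# Hypothetical marker scores, for illustration only.
auc = roc_auc([3, 4, 5], [1, 2, 3])
print(round(auc, 3), accuracy_category(auc))
```

When low expression marks the positive group, as reported here for B2M, the scores can be negated (or the groups swapped) before calling the function.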
To determine whether expression levels of molecular markers might correlate with lymph node metastases, RNA was isolated from FFPE primary tumour sections as described in Materials and Methods and analysed for the expression of the 14 cancer-associated genes listed in Table 1. For reasons stated in the introduction section, we also included in our marker panel the B2M gene. Characteristics of the patients with (n = 20) and without (n = 18) lymph node metastases are shown in Table 3. As anticipated, we observed a significant association between lymph node metastases and T-stage, pathologic stage, and tissue differentiation (continuous).
To evaluate potential prognostic values of the genes, we simply used as an internal reference the mean Ct value of all 15 genes (Figure 2). Our rationale for this approach was two-fold. First, an ideal internal reference gene in cancer prognostics is one that provides an accurate measure of the amount of tumour and not tissue. As our genes were selected on the basis of overexpression in tumour and/or metastatic disease, we reasoned that their mean Ct value should be a reliable measure of tumour content. Second, the number of genes we used for reference was 15, a number sufficiently high to avoid potential problems caused by outliers. In addition to the analysis using the mean Ct value of the 15-gene set as an internal reference, a separate analysis was also performed using the median Ct value.
Using the mean of the entire 15-gene set as an internal reference control, we calculated ΔCt values for all genes and performed AUC analysis. We observed that the only gene whose AUC value was higher than 0.80 for prognosis of nodal metastases was B2M (AUC = 0.85, 95% confidence interval (CI) = 0.69–0.94; Table 4), such that low expression of B2M was associated with nodal metastases. A similar AUC value (0.83, 95% CI = 0.67–0.93) for B2M was obtained when the median value was substituted for the mean (not shown). The gene with the second highest prognostic accuracy was GPX2 (Table 4); low expression of this gene was also associated with lymph node metastases. Using a ΔCt of 4.5 as the threshold for marker positivity (where ΔCt = Ct(mean of 15-gene set) − Ct(B2M)), we observed that the sensitivity for nodal disease detection was 85% (17 out of 20 stage III and stage IV patients were correctly classified; see Table 3), whereas the specificity was 83% (15 out of 18 stage I and stage II patients were correctly classified; see Table 3). Interestingly, all apparent 'false positives' in the node-negative groups were derived from stage I patients. The relevance of this finding is discussed in further detail below.
Figure 1 Real-time RT–PCR analysis of metastatic and benign lymph nodes from colon cancer patients. Panel labels: EpCAM1, EpCAM2, Mal2, AGR2, CDH1, CEA6, Elf3, S100P, Claudin7, Spint2, GPX2, CDH3, CK19. Real-time PCR analyses of seven benign lymph nodes (left side of each matched data set; open triangles) and seven metastatic lymph nodes (right side of each matched data set; closed diamonds) were performed as described in Materials and Methods using primer pairs for the indicated genes. Ct values for each gene were determined from triplicate reactions. ΔCt values were obtained by subtracting the mean Ct value of B2M (which is highly expressed in normal lymph node tissue) from the mean Ct value for each respective gene. Note: The mean B2M Ct value of the metastatic lymph nodes was slightly lower (i.e. B2M gene expression was slightly higher) but not significantly different from that of benign tissue (17.8 ± 2.1 vs 20.2 ± 2.1, respectively).
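The sensitivity and specificity figures quoted for the ΔCt threshold of 4.5 can be reproduced with a short sketch. The per-patient ΔCt values below are synthetic placeholders chosen only to match the reported counts (17 of 20 and 15 of 18); they are not the study data. The rule that a tumour is called marker-positive when its ΔCt falls below the threshold (i.e. low B2M) is our reading of the text and should be treated as an assumption.

```python
def sensitivity_specificity(delta_cts, node_positive, threshold=4.5):
    """Return (sensitivity, specificity) for a low-expression marker.

    Assumption: a tumour is marker-positive when its delta Ct is below
    the threshold, i.e. when B2M expression is low.
    """
    tp = fn = tn = fp = 0
    for d_ct, has_nodal_disease in zip(delta_cts, node_positive):
        marker_positive = d_ct < threshold
        if has_nodal_disease:
            if marker_positive:
                tp += 1
            else:
                fn += 1
        else:
            if marker_positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)

# Synthetic values matching the reported counts, not the measured data.
delta_cts = [3.0] * 17 + [6.0] * 3 + [6.0] * 15 + [3.0] * 3
node_positive = [True] * 20 + [False] * 18
sens, spec = sensitivity_specificity(delta_cts, node_positive)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
# prints: sensitivity=0.85, specificity=0.83
```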
To investigate whether one or two genes from the panel could substitute for the entire set, we analysed our expression results in two ways. First, we calculated the expression ratios of all B2M/gene X pairs to determine whether a particular pair exhibited high prognostic accuracy. These calculations revealed that the expression ratio of B2M/Spint2 had the highest prognostic accuracy of all possible pairs (AUC = 0.87; 95% CI = 0.71–0.96). Second, we performed a correlation coefficient analysis and determined that the mean Ct value of the 15-gene set was most highly correlated with Spint2 (R2 = 0.97; Figure 3). The results of these analyses suggest that Spint2 can be used as an internal reference control gene.
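The search for a single reference gene that tracks the panel mean can be sketched as below. The analysis in this study was performed in Microsoft Excel; the pure-Python version here is only an illustration, and the function names, the table layout, and the three-gene example data are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

def panel_r_squared(ct_table):
    """For each gene, correlate its Ct values with the per-sample panel mean.

    ct_table maps gene name -> list of Ct values, one per sample
    (hypothetical layout; all lists must have the same length).
    """
    genes = list(ct_table)
    n_samples = len(ct_table[genes[0]])
    # Per-sample mean Ct across every gene in the panel.
    panel_mean = [sum(ct_table[g][i] for g in genes) / len(genes)
                  for i in range(n_samples)]
    return {g: r_squared(ct_table[g], panel_mean) for g in genes}

# Hypothetical Ct values for three genes across three samples.
table = {"A": [20, 22, 24], "B": [25, 27, 29], "C": [30, 29, 31]}
scores = panel_r_squared(table)
print(max(scores, key=scores.get), scores)
```

The gene with the highest R2 is the best single stand-in for the panel mean under this criterion, which is how Spint2 was identified in the analysis described above.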

DISCUSSION
In this study, we observed that low expression of B2M was a strong prognostic indicator of lymph node metastases regardless of whether the mean expression value of the 15-gene set was used as an internal reference control (AUC = 0.85; Table 4), the median expression value of the 15-gene set was used (AUC = 0.83), Spint2 was used as a single internal reference gene (AUC = 0.87), or classic reference genes such as TBP or UBP were used (AUC = 0.79). Based on the results described in this study, the mean (± s.d.) accuracy of B2M for prognosis of lymph node metastases was 0.83 ± 0.04, a value that is sufficiently high to warrant further investigation into this biomarker. Although a previous study has shown that B2M was the most highly downregulated gene in CRC compared to adjacent normal tissue (Bianchini et al, 2006), this is the first study to show a relationship between B2M mRNA levels and a clinical parameter related to outcome of CRC patients. B2M is known as a classic IFN-responsive gene (Yan et al, 2004; Gottenberg et al, 2006; Gray et al, 2006; Joyce et al, 2007; Scherbik et al, 2007; Urosevic et al, 2007). Interferons were originally discovered as antiviral proteins that inhibit virus replication (Isaacs and Lindenmann, 1987). Upon virus infection, IFNs are induced in mammalian cells and thus mediate cellular homeostatic responses to virus infection. In addition to their antiviral properties, IFNs are involved in many other physiological processes including cell growth and proliferation, cell death, the immune response, and other cellular defence mechanisms (Sen, 2001). In colon cancer, oncogenic K-ras inhibits the expression of IFN-responsive genes through inhibition of STAT1 and STAT2 expression (Klampfer et al, 2003). Under normal circumstances, IFN induces phosphorylation of STAT1, which is released from the IFNγ receptor and forms STAT1 homodimers. The homodimers then translocate to the nucleus to activate target genes (Darnell et al, 1994; Ihle and Kerr, 1995) such as B2M.
In addition to inactivation by K-ras mutations in colon cancer, B2M gene expression can also be impaired by mutations in the coding and non-coding (promoter) regions. For example, in HCT-15 colon cancer cells, both B2M alleles are mutated, one by an 11 base-pair deletion and the other by a point mutation, resulting in the loss of HLA class I surface antigens (Gattoni-Celli et al, 1992). Loss of B2M mRNA expression has important implications at the protein level. B2M is a chaperone of major histocompatibility complex (MHC) class I (-like) molecules that play a central role in antigen presentation, immunoglobulin transport, and iron metabolism. In the tumour host immune response, HLA-A,B,C assembles with B2M in the endoplasmic reticulum (Momburg and Koch, 1989). Loss of these class I antigens is associated with decreased histological differentiation in colon cancer (Gattoni-Celli et al, 1992), as well as increased malignancy in a number of neoplasms, including B-cell lymphoma (Moller et al, 1987) and melanoma (Paschen et al, 2003). Interestingly, loss of the native HLA-A,B,C/B2M complex appears to be sporadic in nature; in some cases, the loss is localised to certain portions of the tumour, whereas in others, loss of B2M is evident across the entire tumour (Momburg and Koch, 1989). As MHC class I antigens are required for the host to mount a tumour response, the loss of these antigens allows a tumour to escape recognition by the immune system.
Despite the known biological properties of B2M, this gene has been inadvertently used as an internal reference control in studies involving IFN signalling (Einav et al, 2005; Tamassia et al, 2007), as well as real-time PCR studies of colon cancer prognosis (Takeuchi et al, 2003). Interestingly, in the study performed by Takeuchi et al, the investigators examined 36 tumours and observed that the mRNA copy numbers of B2M in T3/T4 cases (mean 1.78 × 10^5 copies) had a tendency to be lower than those in T1/T2 cases (mean 4.44 × 10^5 copies; P = 0.16), but were not highly correlated with another reference gene. Owing to the latter observation, the authors correctly concluded that B2M should not be used as an internal reference control. However, the investigators failed to recognise that the B2M gene itself might serve as a prognostic marker.
In this study, we observed that Spint2 was a reliable internal reference control gene for CRC. This result is consistent with the study of Kataoka et al (2000), who found no relationship between Spint2 mRNA and tumour stages in CRC. Further, during the course of acetic acid-induced experimental colitis in an in vivo mouse model, Spint1 but not Spint2 was upregulated in the recovery phase, a process that requires cellular regeneration. This result further supports the concept that mRNA expression levels of Spint2 may remain constant during tumour progression in CRC.
Spint1 and Spint2 are Kunitz-type serine protease inhibitors that regulate hepatocyte growth factor (HGF) activity through inhibition of HGF activator (HGFA), matriptase and hepsin (Parr and Jiang, 2006). Matriptase, urokinase-type activator, HGFA, and hepsin are the main factors responsible for converting inactive pro-HGF into active HGF, which is mainly secreted by stromal fibroblasts. Hepatocyte growth factor is known to play a number of roles in cancer metastasis and tumour growth. Thus, because Spint1 and Spint2 serve to inhibit the activity of HGF, these genes have been characterised as tumour suppressors (Morris et al, 2005). Further, in a study of 41 ovarian cancer patients, low expression of Spint2 was determined to be an independent factor of poor prognosis (P = 0.013; hazard ratio, 2.30; Tanaka et al, 2003). Interestingly, the suppression of metastatic behaviour (e.g. cell motility) by Spint2 can be abrogated in vitro by treatment with extracellular signal-regulated kinase/mitogen-activated protein kinase and phospholipase C-γ inhibitors (Morris et al, 2005), suggesting that the suppressive effects of this gene can be bypassed.
Of the 14 cancer-associated genes used in the current study, none proved to be prognostic for lymph node metastases at an AUC value >0.80. This was rather surprising, as several of these genes have been previously shown to be prognostic for various cancers. For example, immunohistochemical studies have shown that low expression of E-Cadherin is associated with poor survival in a number of cancers including pancreatic (Shimamura et al, 2003), thyroid (Scheumman et al, 1995), gall bladder (Hirata et al, 2006), breast (Park et al, 2007), lung (Nozawa et al, 2006), hepatic (Wu et al, 2006), endometrial (Scholten et al, 2006), and colon (Pena et al, 2005) cancer. Further, we were also surprised to find that EpCAM2/TROP2 was not a prognostic factor, as overexpression of this gene has been shown to be associated with liver metastases and survival in CRC patients (Ohmachi et al, 2006). It is not clear why we failed to observe a prognostic role for EpCAM2. In contrast to all other genes used in this study, EpCAM2 arose from retrotransposition (of EpCAM1) and does not contain introns. Hence, in the absence of DNase treatment, amplification of this gene is prone to genomic contamination.
Figure 3 Spint2 expression levels are highly correlated with the mean of the 15-gene set. Mean Ct values of the 15 genes were obtained by dividing the sum of the Ct values by 15. In cases where no signal was detected, a Ct value of 40 was used. Linear regression analysis of the indicated data points was performed using Microsoft Excel software.
A long-range goal of our research group is to develop a prognostic assay that can be used to predict which stage II colon cancer patients might benefit from adjuvant chemotherapy. In this study, we observed that 3 out of 18 patients who were node negative by pathological assessment had low levels of B2M. The clinical significance of this finding is not known. However, in a subsequent study performed on untreated stage II tissues obtained from a separate institute, we discovered that low B2M expression was an independent prognostic marker for poor patient survival (Blum et al, 2008). Thus, the results described here may ultimately prove to be of benefit to a significant number of CRC patients.