The current systems of risk grouping in pediatric acute lymphoblastic leukemia (ALL) fail to predict therapeutic success in 10–35% of patients. To identify better predictive markers of clinical behavior in ALL, we have developed an integrated approach for gene expression profiling that couples suppression subtractive hybridization, concatenated cDNA sequencing, and reverse transcriptase real-time quantitative PCR. Using this approach, a total of 600 differentially expressed genes were identified between t(4;11) ALL and pre-B ALL with no determinant chromosomal translocation. The expression of 67 genes was analyzed in different cytogenetic ALL subgroups and B lymphocytes isolated from healthy donors. Three genes, BACH1, TP53BPL, and H2B/S, were consistently expressed as a significant cluster associated with the low-risk ALL subgroups. A total of 42 genes were differentially expressed in ALL vs normal B lymphocytes, with no specific association with any particular ALL subgroup. The remaining 22 genes were part of a specific expression profile associated with the hyperdiploid, t(12;21), or t(4;11) subgroups. In an unsupervised hierarchical cluster analysis, the discriminating power of these specific expression profiles allowed the clustering of patients according to their subgroups. These genes could help explain the difference in treatment response and become therapeutic targets to improve ALL clinical outcomes.
At least six important chromosomal changes have been identified in pediatric B-cell precursor acute lymphoblastic leukemia (ALL). They correspond to hyperdiploid (>50) and hypodiploid (<45) chromosomal status, or one of the following chromosomal translocations: TEL-AML t(12;21), E2A-PBX1 t(1;19), BCR-ABL t(9;22), and MLL-AF4 t(4;11).1 These chromosomal modifications and other clinical findings such as age and initial white blood cell count (WBC) define pediatric ALL subgroups and are used as diagnostic and prognostic markers to assign specific risk-adjusted therapies. For instance, 1.0 to 9.9-year-old patients with none of the determinant chromosomal translocation (NDCT) mentioned above but with a WBC higher than 50 000 cells/μl are associated with higher risk group.2 Hyperdiploid and t(12;21) ALL patients are considered low-risk ALL, are treated with less intensive therapy, and have a better event-free survival (EFS) after therapy (>80% at 5 years).1,3 Conversely, hypodiploid, t(1;19), t(9;22), and t(4;11) ALL patients are considered high-risk ALL, and are treated with more intensive regimens. These patients, with the exception of t(1;19), are at a higher risk of relapse and have a much lower EFS after therapy (<40% after 3–5 years).1,3 The t(1;19) patients have an EFS of 70–80% after 5 years that is more similar to low- than high-risk ALL.1,3
The present risk grouping based on clinical, cytogenetic, and immunophenotypic criteria fails to predict the 10–35% patients who will relapse on current therapies. Such inadequacy might be overcome with the identification of molecular prognostic markers. To identify such markers, it is necessary to establish ALL gene expression profiles. Currently, specific gene expression profiles have been established with the use of microarrays for B-cell precursor ALL subgroups, T-cell ALL, minimal residual disease, and to distinguish between acute myeloid leukemia (AML) and ALL.4,5,6,7,8 A specific gene expression profile that can accurately predict relapse is beginning to emerge from these studies.5 However, specific markers that can identify low-risk ALL and be indicative of an effective response to less intensive or toxic therapy have yet to be found.
Gene expression profile characterization can be realized with different techniques that can be grouped under two broad descriptions: sequencing technologies developed for the analysis of global expression pattern and technologies developed for the differential profiling of gene expression.9,10,11,12,13,14,15 However, each of these techniques has several limitations. Many of the global sequencing approaches have the disadvantage of unacceptable amount of redundant sequencing in order to complete the characterization. Any of the techniques that require plasmid cloning introduce insert size bias. Microarray approaches are limited by issues related to gene preselection (it is estimated that only 20% of the entire cell transcriptome is presently represented even on the most comprehensive microarray), difficulties with quantification, detection of rare transcripts, and issues of amplification bias especially with small clinical specimens. Therefore, there is a need for the development of high-throughput techniques that overcome the above disadvantages.
We have developed a novel approach to facilitate gene expression profiling that combines the selective and normalization power of suppression subtractive hybridization (SSH), the high-throughput sequencing capability of concatenated cDNA sequencing (CCS), and the quantitative analytical power of reverse transcriptase real-time quantitative PCR (RT-RQ-PCR). The SSH–CCS–RT-RQ-PCR approach was developed using pediatric t(4;11) and NDCT ALL subgroups as an experimental model. We hypothesized that we could characterize rapidly, with low RNA requirement, a large set of differentially expressed genes and identify unknown markers and expression profiles of ALL subgroups. In addition, the potential for discovery was not limited by preselection of specific target genes. RT-RQ-PCR was used to assess the specificity of the SSH–CCS-generated gene expression profiles, and to expand the study to normal B lymphocytes and other ALL subgroups. This approach allowed us to determine specific gene expression profile for t(4;11), t(12;21), hyperdiploid ALL subgroups, and identify general and specific low-risk ALL markers.
Materials and methods
Human patient samples were collected under a protocol approved by the Institutional Review Board for Human Subject Research (IRB) for Baylor College of Medicine and Affiliated Hospitals. Two panels of newly diagnosed patients were used in the gene expression study: a smaller panel of 19 patients for the general analysis and a broader panel of 38 patients for the specific low- vs high-risk study (BACH1, TP53BPL, and H2B/S expression analysis). The patients were categorized into one of the following pre-B ALL subgroups (numbers are given for the smaller and the broader panel, respectively): four and five patients with hyperdiploidy (55–58 chromosomes), six and 15 patients with the translocation t(12;21), four and six patients with the translocation t(4;11), one and five patients with the translocation t(9;22), one and four patients with the translocation t(1;19), and three NDCT patients in both panels with high WBC at presentation (82 000 to >200 000 cells/μl). Leukapheresis, bone marrow, or blood exchange samples of the patients were fractionated by density centrifugation on a polysucrose gradient (Histopaque, Sigma, St Louis, MO, USA) and cells were collected at the interface. The percentage of blasts was between 84 and 95% in most of the biological samples and over 90% after Ficoll separation.
Freshly prepared buffy coats of three normal individuals were obtained from a blood bank (Gulf Coast Regional Blood Center, Houston, TX, USA) and the mononuclear cells were separated on polysucrose gradient as mentioned above. CD19+ B lymphocytes were then purified by sequential magnetic separation using MACS microbeads and separation columns as per the manufacturer's protocol (Miltenyi Biotec, Auburn, CA, USA). In brief, the cells were first blocked with an FcR blocking reagent, then incubated with a mouse anti-human CD3+ (T lymphocytes) antibody conjugated to super-paramagnetic MACS MicroBeads. The cells were applied to a LD depletion column in the magnetic field of the MACS separator to allow the CD3− cells to elute out of the column. The CD3− cells were then incubated with a mouse anti-human CD14+ (monocytes) antibody conjugated to super-paramagnetic MACS MicroBeads and applied on a positive selection column-type LS. The eluted CD3−CD14− cells were finally incubated with a CD19+ antibody conjugated to super-paramagnetic MACS MicroBeads and applied twice on an LS column. The purity of the CD19+ cell isolation was over 95% as confirmed by FACS analysis.
The total cellular RNA was extracted utilizing Ultraspec II (Biotecx Laboratories, Houston, TX, USA), a commercial version of the acid-phenol method.16 RNA integrity was checked on a formaldehyde agarose gel. mRNA was extracted with Oligotex (Qiagen, Santa Clarita, CA, USA). The total and mRNA extractions were performed as per the manufacturer's instructions.
Suppression subtractive hybridization
The Diatchenko and co-workers14,15 PCR-based cDNA subtraction method was performed using the SMART PCR cDNA synthesis and PCR-Select subtraction kits (Clontech Laboratories, Palo Alto, CA, USA) as described previously.17 The protocol was initiated with 240 ng of mRNA from two t(4;11) ALL patients (patients A and B) and two NDCT ALL patients (patients C and D). The cDNA was then cut into smaller fragments with RsaI to optimize the hybridization process, and a certain proportion of the resulting fragments (tester) were linked to a set of adaptors as per the manufacturer's instructions17 (Clontech). One set of hybridizations was performed between patients A and C cDNA, and between patients B and D cDNA. In this hybridization set, patients C and D cDNA was used in excess (drivers), resulting in two differentially expressed subtracted t(4;11) cDNA pools. A second set of hybridization was performed using patients A and B cDNA in excess (drivers), resulting in two differentially expressed subtracted NDCT cDNA pools. The differentially expressed SSH cDNA fragments from each pool were then amplified twice.17
The concatenation strategy is illustrated in Figure 1. The subtracted cDNA pools were electrophoresed on a 1.25% agarose gel. The parts of the gel containing 500–700 bp cDNA were excised and pooled in two groups (t(4;11) and NDCT). The cDNAs were extracted using the QIAquick gel extraction kit (Qiagen) and concentrated with Microcon YM-50 (Millipore, Bedford, MA, USA). The SSH adaptors were then removed by restriction digest at 37°C overnight with EagI and NotI (New England Biolabs, Beverly, MA, USA) in the presence of 1% BSA. The digested products were gel purified a second time and concentrated as above. The cDNAs were ligated using T4 DNA ligase at 16°C overnight to form large concatenated products. The concatenated products were then sheared into 1–2 kb fragments with an Aero-Mist Nebulizer (Cis-Us Inc., Bedford, MA, USA), shotgun cloned into pUC18-based sequencing vector, and sequenced in the Baylor College of Medicine Human Genome Sequencing Center (BCM-HGSC) pipeline. Individual cDNA sequences were computationally resolved by electronic digests into contiguous sequences (contigs) representing the original SSH cDNA fragments. The processed reads were assembled using the PHRED/PHRAP suite of programs and analyzed through Consed Autofinish to identify the contigs/SSH cDNA fragments present at high copy numbers.18,19,20 The contigs/SSH cDNA fragments were subjected to rapid batch analysis by querying the latest genome and transcript databases using a suite of informatic tools developed for genomic sequence analysis but adapted for cDNA by the BCM-HGSC.
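The electronic digest described above can be pictured as splitting each shotgun read at the ligation junctions between SSH cDNA fragments. The following is a minimal sketch, not the BCM-HGSC pipeline: it assumes that fragments ligated through EagI-compatible ends are separated by a regenerated EagI recognition sequence (CGGCCG), and the function name, the example read, and the minimum-length cutoff are all hypothetical.

```python
import re

# Assumption: fragments joined through EagI-compatible ends are separated by a
# regenerated EagI site. The junction rule used by the real pipeline may differ.
JUNCTION = "CGGCCG"  # palindromic, so it reads the same on either strand


def electronic_digest(read, junction=JUNCTION, min_len=20):
    """Split one concatenated read into candidate SSH cDNA fragments."""
    pieces = re.split(junction, read.upper())
    # Drop very short pieces, which are more likely shearing remnants than
    # informative fragments (min_len is an illustrative cutoff only).
    return [p for p in pieces if len(p) >= min_len]


# Hypothetical read: two fragments plus a 2-base remnant left by shearing.
read = "ACGT" * 10 + "CGGCCG" + "TTGA" * 8 + "CGGCCG" + "AC"
fragments = electronic_digest(read)
print([len(f) for f in fragments])
```

Each piece returned this way would then be assembled with overlapping pieces from other reads, which is the role played by PHRED/PHRAP in the text.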
Optimization of the concatenation process for SSH product sequencing
One of the main goals in the development of the SSH–CCS approach was to obtain maximum gene expression profile information with minimum sequencing. A stepwise strategy was adopted to determine the optimal number of sequence reads that would identify the majority of SSH cDNA fragments in a subtracted library and make the sequencing cost-effective. For this purpose, multiple assemblies from the t(4;11) ALL SSH–CCS library were run with incremental increases in the number of sequence reads and analyzed as a function of their diversity in SSH cDNA fragments. A total of 413 different SSH cDNA fragments were identified from 650 sequence reads. The first 405 sequence reads identified 348 different SSH cDNA fragments, whereas only 65 new SSH cDNA fragments were identified from the following 245 sequence reads, indicating that a plateau effect was already beginning to take place between 405 and 650 sequence reads (Figure 2, □).
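The plateau can be made explicit by comparing the discovery rate (new SSH cDNA fragments per additional sequence read) before and after the 405-read checkpoint. The short sketch below uses only the counts reported above; the origin point (0 reads, 0 fragments) and the function name are added for illustration.

```python
# Cumulative counts reported for the t(4;11) SSH-CCS library:
# (sequence reads, distinct SSH cDNA fragments identified so far).
# The (0, 0) origin is added here so the first interval can be computed.
checkpoints = [(0, 0), (405, 348), (650, 413)]


def discovery_rates(points):
    """Return new fragments per additional read for each interval."""
    rates = []
    for (r0, f0), (r1, f1) in zip(points, points[1:]):
        rates.append((f1 - f0) / (r1 - r0))
    return rates


rates = discovery_rates(checkpoints)
for (r0, _), (r1, _), rate in zip(checkpoints, checkpoints[1:], rates):
    print(f"reads {r0}-{r1}: {rate:.2f} new fragments per read")
```

The rate falls from about 0.86 to about 0.27 new fragments per read, which is the diminishing return the paragraph describes.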
To associate these 413 different SSH cDNA fragments from the t(4;11) ALL SSH–CCS library with individual transcripts, each sequence assembly was then analyzed by BLAST analysis against the NCBI nr database. Approximately 35% of the sequence reads were inter-related since several SSH cDNA fragments corresponded to different RsaI fragments of a transcript full-length sequence. Consequently, the 413 different SSH cDNA fragments corresponded in fact to 316 differentially expressed transcripts (Figure 2, •). Although further increases in sequence reads would identify more unique transcripts, the cost–benefit ratio of this procedure would clearly be compromised by the fact that the majority of these reads would fall in the previously identified SSH cDNA fragment category.
The BLAST analysis also revealed that about 12% of the SSH cDNA fragment sequences did not show any significant alignments (Figure 2, ◊). About half of these appear to be due to poor sequence quality. The other half identify transcripts from areas of the genome that have yet to be sequenced to high quality. This number is consistent with the fraction of the genome that has not been easily amenable to sequence analysis.
cDNA synthesis and real-time PCR
Total RNA was suspended in RNAsecure and treated with DNA-free to remove any DNA contamination (Ambion). Total RNA (0.5–1 μg) was then subjected to RT using TaqMan Reverse Transcription Reagents (Applied Biosystems, Foster City, CA, USA).
PCR primers were designed using Primer Express Software (Applied Biosystems) with the following parameters: an amplicon length between 50 and 150 bp, a primer Tm between 58 and 60°C, and an amplicon that spans an intron. Real-time quantitative PCR (RQ-PCR) was performed using the SYBR Green PCR Core Reagents kit (Applied Biosystems) as per the manufacturer's instructions. Each reaction contained a 1:10 cDNA dilution to amplify the gene of interest or a 1:500 cDNA dilution to amplify the endogenous 18S control. The PCRs were performed on an ABI Prism 7700 or 7900HT Sequence Detection System (96- and 384-well plate, respectively) under the universal thermal cycling parameters (50°C for 2 min and 95°C for 10 min followed by 40 cycles of 95°C for 15 s and 60°C for 1 min). Fluorescence was measured during the extension period of each cycle to monitor the amplification.
The specificity of the amplification was monitored by dissociation curves. The dissociation of SYBR Green I labeled cDNA was performed after the completion of the PCR by heating the PCR products for 15 s at 95°C, 20 s at 60°C, and increasing the temperature slowly up to 95°C over a 20 min interval. The dynamic range of several genes was also tested using serial RNA dilution to generate a relative standard curve. Each of the genes tested were detected over a 4–5 logarithm dilution range. The PCR efficiencies of the target gene and endogenous control were predicted by comparing the slope value of the relative standard curve.
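A common way to turn the slope of a relative standard curve into an amplification efficiency is E = 10^(−1/slope) − 1, where the slope comes from regressing CT on log10 of the input amount; a slope near −3.32 corresponds to roughly 100% efficiency. The sketch below illustrates that calculation under stated assumptions: the dilution series and CT values are hypothetical, and the text does not specify which formula or software was used.

```python
def slope_intercept(xs, ys):
    """Ordinary least-squares fit of ys on xs; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amplification_efficiency(slope):
    """Efficiency from a CT vs log10(input) slope; 1.0 means perfect doubling."""
    return 10 ** (-1.0 / slope) - 1.0


# Hypothetical 10-fold serial dilution covering four logs of input RNA.
log10_input = [0, -1, -2, -3, -4]
ct_values = [18.0, 21.3, 24.7, 28.0, 31.3]

slope, intercept = slope_intercept(log10_input, ct_values)
efficiency = amplification_efficiency(slope)
print(f"slope = {slope:.2f}, efficiency = {efficiency:.1%}")
```

Comparing the slopes obtained for a target gene and for the endogenous control indicates whether the two amplify with similar efficiency, which the comparative quantification method assumes.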
The RQ-PCR data were analyzed using a comparative relative quantification method.21 In this method, the expression of a gene in a target sample is compared with its expression in a reference sample called the calibrator. The fold change in expression is calculated using the formula $2^{-\Delta\Delta C_T}$, where $\Delta\Delta C_T = (C_T$ of the gene in the target sample $- \ C_T$ of the endogenous control in the target sample$) - (C_T$ of the gene in the calibrator $- \ C_T$ of the endogenous control in the calibrator$)$.21 In this study, the fold change in gene expression was normalized with the $C_T$ value of 18S rRNA used as endogenous control and was calculated in relation to the $C_T$ value of normal B lymphocytes used as calibrator. In cases where a gene was not expressed in any of the normal B lymphocytes or its $C_T$ value was >36.5, one of the NDCT patient samples was used consistently as the calibrator.
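The comparative calculation can be written out in a few lines. The sketch below follows the formula given above; the CT values are hypothetical, and the helper that checks the 36.5 cutoff is only an illustration of the calibrator rule stated in the text.

```python
CT_CUTOFF = 36.5  # above this CT the normal B lymphocytes are not used as calibrator


def fold_change(ct_gene_target, ct_control_target, ct_gene_cal, ct_control_cal):
    """Relative expression of a gene in a target sample vs the calibrator."""
    delta_target = ct_gene_target - ct_control_target  # normalize to 18S rRNA
    delta_calibrator = ct_gene_cal - ct_control_cal
    delta_delta_ct = delta_target - delta_calibrator
    return 2 ** (-delta_delta_ct)


def usable_as_calibrator(ct_gene, cutoff=CT_CUTOFF):
    """True when the gene is detected early enough to serve as a baseline."""
    return ct_gene is not None and ct_gene <= cutoff


# Hypothetical CT values: gene of interest and 18S in a patient sample,
# then the same two measurements in normal B lymphocytes.
fc = fold_change(24.0, 10.0, 27.0, 10.0)
print(fc)
```

Because the gene of interest and the 18S control are measured at the same two cDNA dilutions in every sample, the constant offset introduced by the different dilutions cancels when the calibrator difference is subtracted.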
Every reaction was performed in duplicate on each plate. Initially, triplicates were used but were found to be unnecessary due to the high reproducibility of the results. DNA contamination was monitored for each sample with an RT minus control, and no DNA contamination was ever detected. In all, 25 genes were analyzed on a complete panel of 19 ALL patients and three healthy individuals, and amplified a second time from cDNA originating from different RT reactions, or distinct cDNA dilutions. This resulted in the same expression patterns seen in the initial amplification.
Cluster and general discriminant analyses
Unsupervised hierarchical cluster analysis (HCA), a multivariate statistical method, was performed with the CLUSFAVOR computer algorithm developed by Dr Leif Peterson22,23 (available at the web site: http://mbcr.bcm.tmc.edu/genepi). HCA identifies ‘natural’ groupings of objects considered in an analysis. One minus correlation (1−r) was used as the distance function for the fold variation value of the gene expression.
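To make the distance function concrete, the sketch below clusters a few samples using one minus the Pearson correlation of their fold-change profiles. It is not CLUSFAVOR: the average-linkage rule, the sample names, and the expression values are assumptions chosen for illustration, and the linkage method actually used is not stated here.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length profiles (non-constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def distance(a, b):
    """One minus correlation, the distance function named in the text."""
    return 1.0 - pearson(a, b)


def average_linkage(profiles):
    """Agglomerate samples; return merges as (cluster_a, cluster_b, distance)."""
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(p, q) for p in clusters[i] for q in clusters[j]]
                d = sum(distance(profiles[p], profiles[q]) for p, q in pairs)
                d /= len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical fold-change profiles (four genes per sample).
profiles = {
    "patient_1": [12.0, 0.5, 3.0, 8.0],
    "patient_2": [10.0, 0.8, 2.5, 9.0],
    "patient_3": [0.6, 14.0, 7.0, 1.0],
    "patient_4": [0.9, 11.0, 9.0, 1.5],
}
merges = average_linkage(profiles)
for left, right, d in merges:
    print(left, right, round(d, 3))
```

With a correlation-based distance, samples group by the shape of their expression profile rather than by absolute fold change, which suits data where the overall expression level varies widely between patients.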
To investigate the ability of transcripts to discriminate diagnostic category, we employed a heuristic search method called stepwise general discriminant analysis (GDA) (SPSS Version 11, Chicago, IL, USA). GDA builds a linear discriminant function model based on single continuous degree-of-freedom predictor variables. The 67 genes identified in this study were used as predictor variables. During the GDA run, model building was based on the Wilks lambda method using an F to enter of 3.84 ($F = t^2 = 1.96^2$) and an F to remove of 2.71.
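For a single candidate gene at the first step of a stepwise procedure, Wilks lambda reduces to the ratio of within-group to total sum of squares, and the corresponding F statistic is the one-way ANOVA F. The sketch below illustrates only that first-step, univariate case with hypothetical expression values; it does not reproduce the SPSS procedure, which uses partial F statistics once other variables are in the model.

```python
F_TO_ENTER = 3.84  # threshold quoted in the text, approximately 1.96 ** 2


def wilks_lambda_and_f(groups):
    """Univariate Wilks lambda and F for one gene across diagnostic groups.

    groups maps a group label to the list of expression values in that group.
    """
    values = [v for vals in groups.values() for v in vals]
    n, g = len(values), len(groups)
    grand_mean = sum(values) / n
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_within = 0.0
    for vals in groups.values():
        group_mean = sum(vals) / len(vals)
        ss_within += sum((v - group_mean) ** 2 for v in vals)
    lam = ss_within / ss_total
    f_stat = ((1.0 - lam) / lam) * ((n - g) / (g - 1))
    return lam, f_stat


# Hypothetical fold-change values for one gene in three subgroups.
gene = {
    "t(4;11)": [20.0, 22.0, 19.0, 21.0],
    "t(12;21)": [4.0, 6.0, 5.0, 3.0],
    "hyperdiploid": [24.0, 27.0, 25.0, 26.0],
}
lam, f_stat = wilks_lambda_and_f(gene)
print(f"Wilks lambda = {lam:.4f}, F = {f_stat:.1f}, enters: {f_stat > F_TO_ENTER}")
```

A lambda close to 0 means the group means are well separated relative to the within-group spread, so the gene is a strong candidate for entry.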
Differential gene expression in the different ALL subgroups
Over 600 different genes were identified as differentially expressed (over 300 in each of the two subgroups) in the present SSH–CCS study. In all, 67 genes were selected from the two libraries for further characterization on the basis that they were unknown or had been previously implicated in cancers. Their expression was studied in an expanded ALL panel of 19 patients and three healthy individuals. According to their expression, the genes could be grouped as general ALL markers, low-risk markers, or markers associated with a specific subgroup.
In all, 42 genes were found to be ALL markers. Each of these genes showed a broad range of expression between the different ALL patients without any specific expression pattern associated to the different subgroups. A total of 39 genes, among which four were unknown, were expressed in every malignant ALL case studied, in a range between 2.5–300-fold over the expression detected in normal B lymphocytes (Table 1). Three genes, UN-30, CHC1L, and UN-161, had little or no detectable expression in the normal B lymphocytes resulting in very high fold increase of expression.
BACH1, TP53BPL, and H2B/S were revealed to be specific markers for all low-risk patients with a cluster pattern of expression including the hyperdiploid and t(12;21) subgroups when analyzed on the ALL panel of 19 patients. The same correlation was maintained when the study was expanded to a panel of 38 patients. They had an average of 2.5-, 2.1-, and 4.6-fold increase, respectively, above the combined high-risk t(4;11), t(1;19), t(9;22), and NDCT subgroups (Table 2, Figure 3).
t(12;21) and hyperdiploid gene expression profiles
The t(12;21) subgroup was characterized by an increase in the hypothetical protein DKFZP566J091, p68, and UN-18 gene expression as well as a decrease in FLT3 expression (Table 2, Figure 4). FLT3 has been previously reported as highly expressed in AML and ALL.24 In this study, FLT3 expression was significantly lower in the t(12;21) subgroup (average of 4.7±2.4) than in the other ALL subgroups, most specifically when compared to the hyperdiploid and t(4;11) ALL (average of 25.1±5 and 15.4±2.2, respectively).
The hyperdiploid subgroup was characterized by an average 2.3-, 4.9-, and 16.1-fold increase in the expression level of RTVP-1, IFITM2, and IFIT1, respectively, and a 22.4- and 76.8-fold decrease in the expression of BAALC and UN-197 relative to the other subgroups (Table 2, Figure 4). None of the gene expression increases in the hyperdiploid ALL subgroup directly reflected the chromosomal abnormalities of the patients, who all had three copies of chromosomes 4, 10, and 18, and three to four copies of chromosomes 14 and 21. Genes located on multiple-copy chromosomes were not proportionally (IFIT1) or selectively highly expressed in hyperdiploid ALL. Moreover, IFITM2 expression was elevated in all hyperdiploid ALL patients, whereas only two of the patients had an additional copy of chromosome 11.
t(4;11) and NDCT gene expression profiles
In all, 11 differentially expressed genes were associated with the t(4;11) ALL subgroup (Table 2, Figure 4). DAD1, HOXA9, MEIS1, hypothetical protein KIAA1576, and three unknown genes showed an increase in expression. DAD1 and MEIS1 expression was consistently elevated in the t(4;11) subgroup but HOXA9 expression was inconsistent. The expressions of some of these genes were low or absent in patients from the other ALL subgroups mimicking the expression or lack of expression found in the normal B lymphocytes.
Five genes were selectively expressed at lower levels in the t(4;11) subgroup in comparison to the other ALL subgroups: ERG, CD10, FBXW7, and two unknown genes. ERG, a transcriptional activator with mitogenic and transforming activity,25 was not expressed in the normal B lymphocytes and was minimally expressed in the t(4;11) subgroup. FBXW7 average expression was at least 20-fold lower in t(4;11) ALL than in the other ALL subgroups.
Only one gene characterized specifically the NDCT group, AC133, a marker of hematopoietic stem and progenitor cells (Figure 4).26 AC133 was consistently expressed at a low level in the NDCT ALL. High levels of AC133 expression were detected in the hyperdiploid and t(4;11) ALL.
Cluster and general discriminant analyses
An unsupervised hierarchical cluster analysis was performed with the CLUSFAVOR algorithm to group the genes based on similarity in their expression pattern and to segregate the patient cases based on global similarities in their gene expression patterns. The analysis was carried out on the fold variation value of the gene expression (RT-RQ-PCR data) for the normal B lymphocytes, the hyperdiploid, t(12;21), t(4;11), and NDCT ALL patients (Figure 5). Genes that were defined as t(4;11) (genes 1–4, 14–16) and ALL markers (such as genes 21–30, 48–55) clustered together. However, the most interesting data were related to the patient clustering. Five distinct clusters were formed by the t(4;11) ALL, the normal B lymphocytes, the hyperdiploid ALL, two of the NDCT, and the t(12;21) ALL. The third NDCT clustered with the t(12;21) ALL. Notably, at the time of the clustering, the karyotype analysis of one patient (number 40) had not yet been obtained. That patient nevertheless clustered appropriately with the low-risk hyperdiploid ALL, and the cytogenetic result later confirmed the hyperdiploid karyotype of patient number 40.
A GDA was performed to investigate the ability of transcripts to discriminate the different diagnostic categories. GDA identified 13 genes (ANAX1, BAALC, CSRP2, ECT2, HOXA9, IFITM2, MEIS1, TPTC, DKFZP566J091, KIAA1576, UN-30, UN-58, and UN-141) with a statistical significance in terms of Wilks lambda that was less than $10^{-6}$ for each of the 13 genes. The expression values for these 13 transcripts could simultaneously discriminate the 20 cases into their respective diagnostic categories (normal B lymphocytes, hyperdiploid, t(12;21), t(4;11), and NDCT ALL) with 100% classification probability. If performed in the absence of the normal B lymphocytes, the GDA identified nine genes (ABI1, CD10, H2B/S, HOXA9, IFITM2, TP53BPL, KIAA1576, UN-54, and UN-355) that simultaneously discriminated diagnostic category with 100% correct classification probability.
Analysis of the SSH–CCS–RT-RQ-PCR approach
Sequences from both libraries (t(4;11) and NDCT) were batch-searched with BLAST against the NCBI nr database. About 20% of the transcripts in each pool corresponded to genes previously shown to be related to leukemia and other cancers. This is four times higher than in two randomly selected unigene sets from NCBI (result not shown). Moreover, between 27 and 37% of the transcripts in each library are novel transcripts. A detailed analysis of the BLAST results showed that 38% of these novel transcripts corresponded to partial and full-length cDNA sequences previously identified through various screens with no function assigned, 47% identified new unpredicted genes, 11% matched DNA regions where a gene had been predicted but not identified, and 4% were associated with sequences homologous to ESTs. This explains why nine out of the 11 unknown transcripts presented in this study are not represented on recent commercial microarrays such as the Affymetrix U133 microarray.
To confirm the differential expression of the genes identified in the subtractions more accurately, 90 SSH cDNA fragments were selected from both libraries and their expression was analyzed by RT-RQ-PCR in the four patients used in the SSH experiment. The expression of 87% of these SSH cDNA fragments tested displayed the expected differential pattern of expression (result not shown). False-positive genes originated only from the NDCT subtracted library, suggesting a less efficient subtraction for this group.
Several of the SSH cDNA fragments associated with the sex, immunophenotype, and differentiation status of the cells. Genes located on the Y chromosome such as the eucaryotic translation initiation factor 1A (EIF1AY), the ribosomal protein S4 (RPS4Y), and the DEAD/H box polypeptide (DBY) were obtained in the male subtracted library (SSH performed between a male and a female patient). When analyzed by RT-RQ-PCR, DBY expression was not only detected selectively in the males but it was increased in the 10 ALL males from 4–30-fold (average of 14.28±3) over the expression detected in the two healthy males (average of 1.11±0.11) (result not shown). Genes previously associated with stem cells, CD34+ hematopoietic progenitors, and early immature stages of lymphoid development (CD10, CD20, FLT3, BAALC, and AC133)26,27,28,29 had various levels of expression in the different ALL subgroups and reflected the diverse stages of development of the leukemia tested (Figure 4). CD10 RNA expression correlated well with the FACS immunophenotype analysis (Figure 4 and results not shown). CD10 protein cell surface expression was not detected or observed in only 0.1–1% of the cells in the t(4;11) ALL subgroup and was coupled with a very weak RNA expression level. High RNA expression level in other ALL subgroups corresponded to CD10 detection in 68–99% of the cells.
This study shows that ALL subgroup gene expression profiles can be characterized with the SSH–CCS–RT-RQ-PCR approach. Our data suggest that BACH1, TP53BPL, and H2B/S are markers of low-risk ALL. Several transcripts that were not previously associated with pediatric ALL were also identified. Although the study was carried out with a relatively small number of patients and using only about 1/10 of the identified genes, the clusters of expression were clear and statistically significant, even allowing the association of an uncharacterized patient to the proper karyotype subgroup.
The association of genes, most specifically BACH1, TP53BPL, and H2B/S, with a clinical risk group has never been observed before since published studies have only assigned gene expression profile to specific karyotypically defined subgroups.4,5 These genes as well as some of the genes expressed specifically by the low-risk subgroups might be part of the mechanism involved in the favorable responsiveness to drug therapy and the difference in clinical outcome between low- and high-risk ALL. However, their functional role in relation to clinical risk is not clear for the moment. BACH1 associates with MafK to generate DNA binding complexes that recognize NK-E2/Maf recognition elements (MARE) and act as transcriptional regulators.30 These complexes might regulate hematopoietic differentiation through interaction with other transcription factors and might also be implicated in chromatin structure modification.31 H2B/S belongs to the cell-cycle-dependent histone gene group that is closely associated with DNA synthesis and is abundant in rapidly dividing cells.32 Rapidly dividing cells should be more susceptible to most chemotherapeutic agents. TP53BPL, also identified as Topors, has been reported to be involved in the cellular response to the chemotherapeutic drug camptothecin used in the treatment of solid tumors.33 It might also be involved in the positive response to the drugs used in ALL therapy. TP53BPL might not only regulate drug responsiveness, but might also be implicated in other mechanisms such as the activation of apoptosis.34 We also detected an increase in expression of other genes interacting or targeted by p53 such as RTVP and p53R2.35,36 Moreover, p53 expression was elevated in the malignant blasts (Table 1) and it is likely that it is functional since p53 mutations rarely occurred in ALL.37
The expression of three genes was consistently more elevated in two t(4;11) patients who relapsed and died when compared with the long-term survivors of the same subgroup. They were ARC21, BAALC, and DAD1 with an average fold increase of 1.7, 2.8, and 6.7, respectively. BAALC overexpression had been previously linked to poor prognosis in AML patients28 but has not been studied in relation to pediatric ALL subgroups. The fatal association was strictly limited to t(4;11) patients, since similar BAALC expression was also detected in the t(9;22) patient who appears to be in long-term remission (>3 years). Moreover, none of these genes showed any increased expression in the two NDCT patients who also relapsed and died. The association of these genes with a poor prognosis has never been reported for t(4;11) before and needs to be confirmed on an extended panel of t(4;11) patients with various outcomes. However, the specificity of the phenomenon where the overexpression of a gene correlates with relapse for one particular subgroup only is not unique. The overexpression of other genes was predictive of future relapse for t(4;11) as well as for other subgroups.5 DBY was one of these genes for the hyperdiploid subgroup.5 We did not analyze its expression in any relapsed hyperdiploid patient. However, DBY was overexpressed in every male ALL patient we studied when compared to the normal male individuals. Previous clinical data have indicated a difference in risk of relapse between male and female children, male children of any subgroup being at higher risk of relapse.38,39 Such a discrepancy might be, in fact, related to the overexpression of DBY and possibly other genes located on the Y chromosome.
In this study, a comparison of the gene expression was not only performed between different ALL subgroups, but with the normal peripheral B lymphocytes. Such comparison might help in the identification of genes associated with the state of maturation of the blasts. In every known case where a gene was a marker of the stem or precursor hematopoietic cells, such as AC133, FLT3, and BAALC,26,27,28 no expression was detected in the normal peripheral B lymphocytes. Moreover, these genes were differentially expressed in one of the subgroups reflecting the fact that the subgroups are at different stages of differentiation. Specific studies on the function of the genes identified in this study, such as UN-18, KIAA1576, UN-251, UN-210, and UN-58, will have to be carried out to elucidate their role(s) in the malignant vs maturation status of the blasts.
Normal peripheral B lymphocytes were also used in this study as the calibrator. The choice of normal B lymphocytes as the calibrator does not influence the result, since the calibrator serves only as a baseline, and the differences in gene expression between patients would be maintained whatever sample is used as calibrator. This choice was also based on the fact that it would be difficult to compare the expression of the malignant blasts with that of normal immature pro- and pre-B lymphocytes. It is impossible for ethical reasons to perform an invasive procedure to draw bone marrow from healthy children. In addition, a large amount of bone marrow would have to be acquired to sort minute amounts of precursor cells. It would also be very difficult to match the different states of maturation of the malignant blasts from each patient to their normal counterparts, since each patient would be at a slightly different maturation state and blast populations are often heterogeneous.
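The independence of between-patient differences from the calibrator can be illustrated with a minimal sketch, assuming relative expression is computed with the 2^−ΔΔCT method.22 The Ct values, the sample names, and the helper functions below are hypothetical and serve only as an illustration; they are not data or code from this study.

```python
# Minimal sketch of the 2^-ddCt calculation with two different calibrators.
# All Ct values are hypothetical and chosen for illustration only.

def relative_expression(ct_target, ct_reference, cal_target, cal_reference):
    """Fold expression of a sample relative to a calibrator (2^-ddCt)."""
    delta_ct_sample = ct_target - ct_reference          # normalize sample to reference gene
    delta_ct_calibrator = cal_target - cal_reference    # normalize calibrator the same way
    return 2.0 ** -(delta_ct_sample - delta_ct_calibrator)

# Each tuple is (Ct of target gene, Ct of reference gene).
patient_a = (24.0, 18.0)
patient_b = (26.5, 18.5)
calibrators = {
    "normal_b_lymphocytes": (29.0, 19.0),
    "alternative_sample": (25.0, 17.5),
}

def patient_ratio(calibrator):
    """Ratio of patient A to patient B expression for a given calibrator."""
    fold_a = relative_expression(*patient_a, *calibrator)
    fold_b = relative_expression(*patient_b, *calibrator)
    return fold_a / fold_b

ratios = {name: patient_ratio(cal) for name, cal in calibrators.items()}
for name, ratio in ratios.items():
    print(f"{name}: patient A / patient B = {ratio:.2f}")
```

In this sketch the individual fold values change with the calibrator, but the calibrator term cancels in the ratio between two patients, so the ratio depends only on the patients' own ΔCT values.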
The results presented in this paper highlight the significant advantages of SSH–CCS–RT-RQ-PCR as a high-throughput approach for the rapid analysis of specific gene expression. The high-throughput advantage of this approach is associated not only with the automation of the cloning and sequencing process and the computerization of the BLAST analysis, which shorten the SSH analysis by several weeks, but also with the following aspects. The subtraction process (SSH) results in the direct identification of the differentially expressed genes, minimizing and simplifying the processing of the information obtained. The normalization process (SSH) eliminates redundant mRNA/cDNA copies, allowing the identification of rare transcripts and minimizing the sequencing effort. The production of short cDNA fragments (RsaI digest in preparation for SSH) coupled with the concatenation and shearing process (CCS) permits the identification of several transcripts per sequence read, ensuring maximal transcript identification with minimal sequencing. The CCS process also eliminates the uneven representation of transcripts due to the cloning bias associated with the original SSH analysis, ensuring the identification of every differentially expressed gene. Furthermore, the identification of the transcripts simplifies the validation process. Limitations of the original SSH validation protocol, such as transcript misrepresentation due to uneven labeling of the entire subtracted library (dot blot) and the repetitive hybridization of unidentified probes that in fact originate from the same transcript (cDNA Southern), are eliminated.
Multiple unknown genes were detected by the SSH–CCS–RT-RQ-PCR approach. The identification of these genes, which are still absent from the most recent Affymetrix U133 microarray, gives the approach a serious advantage in the discovery of potential markers and of genes that might influence the behavior of the malignant blasts and their response to treatment. Several of the known genes associated with the t(4;11) and hyperdiploid subgroups (DAD, HOXA9, MEIS, ERG, CD10, and IFIT1) were also reported as being specific to these subgroups using a microarray approach.4,5 However, some of the known genes (FBXW7, p68, RTVP, IFITM2, and BAALC) detected by our approach were not mentioned by Yeoh et al.5 They selected, by informatics assessment, only the 40 genes that were the most representative of each of the karyotype subgroups.5 Each of these genes had a large difference in expression level between the karyotype subgroup with which it is associated and the other ALL subgroups. The SSH–CCS approach was able to identify small differences in transcription levels, down to two-fold. Mammalian cells are extremely sensitive to variations in the expression of certain genes, and small changes have been documented to result in severe disease.40 Thus, it is important to consider genes that are minimally differentially expressed, since they could be key genes in the clinical behavior of the leukemia subgroups.
The combined SSH–CCS strategy also allowed the detection of transcriptional differences between the cell types being profiled, such as alternate exon splicing events and alternate start sites, which are commonly seen in cancers and a number of other diseases. Intron retention in different genes has been reported in several diseases as well as in solid tumors. For example, CD44 transcripts retaining intron 9 have been found in multiple cancer types.41 In this study, several SSH cDNA fragments were in fact introns of different genes, indicating the occurrence of a defective splicing process. The expression of some of these SSH cDNA fragments was studied in the full ALL panel. The results indicated that the defect was not related to one subgroup but was present in all the subgroups, based on the expression pattern of the related gene (data not shown). Such a defect can be due to the absence or mutation of some key splicing proteins. Recent studies have also pointed out that hyper- and hypophosphorylation of SR proteins, a family of pre-mRNA splicing factors, influences the splicing process.42 The expression of CLK4, one of the kinases that phosphorylate the SR proteins, was upregulated more than seven-fold in ALL relative to the level found in normal B lymphocytes. TP53BPL, which contains the RS domain specific to SR proteins, and p68, which functions as an RNA helicase and is part of the spliceosome complex, were also overexpressed 6–19-fold in all the ALL samples compared with the nonmalignant lymphocytes.33,34,43,44 The defective splicing process cannot be explained by the upregulation of these genes, and more studies are needed to understand this mechanism.
However, it is becoming evident that unspliced or partially spliced mRNA is more common in several cell types than first believed, and these molecules might play a role that is yet to be determined (Clamp M et al. Platform talk: Ensembl: analysis and comparison of multiple genomes. Advances in Genome Biology and Technology (AGBT) Meeting, February 2003).
The expression of several of the genes analyzed in this study has never been associated with ALL or with different ALL subgroups. Several of these genes were previously reported to have increased,46,47,48 unmodified,36 or decreased37,49,50 levels of expression, to be mutated,51 or to inhibit the growth48,52 of other tumor types. Our study brings to light a potential role for these genes in the acquisition, maintenance, or regression of the malignant status of the ALL blasts and points out important leads for further investigation. Moreover, our approach might contribute greatly to expanding our limited knowledge of the transcriptome and benefit the design of future arrays that contain the entire genome complement.
1. Harrison CJ. The detection and significance of chromosomal abnormalities in childhood acute lymphoblastic leukaemia. Blood Rev 2001; 15: 49–59.
2. Smith M, Arthur D, Camitta B, Carroll AJ, Crist W, Gaynon P et al. Uniform approach to risk classification and treatment assignment for children with acute lymphoblastic leukemia. J Clin Oncol 1996; 14: 18–24.
3. Pui CH, Evans WE. Acute lymphoblastic leukemia. N Engl J Med 1998; 339: 605–615.
4. Armstrong SA, Staunton JE, Silverman LB, Pieters R, den Boer ML, Minden MD et al. MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nat Genet 2002; 30: 41–47.
5. Yeoh E-J, Ross ME, Shurtleff SA, Williams WK, Patel D, Mahfouz R et al. Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling. Cancer Cell 2002; 1: 133–143.
6. Ferrando AA, Neuberg DS, Staunton J, Loh ML, Huard C, Raimondi SC et al. Gene expression signatures define novel oncogenic pathways in T cell acute lymphoblastic leukemia. Cancer Cell 2002; 1: 75–87.
7. Chen J-S, Coustan-Smith E, Suzuki T, Neale GA, Mihara K, Pui C-H et al. Identification of novel markers for monitoring minimal residual disease in acute lymphoblastic leukemia. Blood 2001; 97: 2115–2120.
8. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 1999; 286: 531–537.
9. Kozian DH, Kirschbaum BJ. Comparative gene-expression analysis. Trends Biotech 1999; 17: 73–78.
10. Stanton LW. Methods to profile gene expression. Trends Cardiovasc Med 2001; 11: 49–54.
11. Wang A, Pierce A, Judson-Kremer K, Gaddis S, Aldaz M, Johnson DG et al. Rapid analysis of gene expression (RAGE) facilitates universal expression profiling. Nucleic Acids Res 1999; 27: 4609–4618.
12. Andersson B, Lu J, Shen Y, Wentland MA, Gibbs RA. Simultaneous shotgun sequencing of multiple cDNA clones. DNA Seq 1997; 7: 63–70.
13. Yu W, Andersson B, Worley KC, Muzny DM, Ding Y, Liu W et al. Large-scale concatenation cDNA sequencing. Genome Res 1997; 7: 353–358.
14. Diatchenko L, Lau YF, Campbell AP, Chenchik A, Moqadam F, Huang B et al. Suppression subtractive hybridization: a method for generating differentially regulated or tissue-specific cDNA probes and libraries. Proc Natl Acad Sci USA 1996; 93: 6025–6030.
15. Gurskaya NG, Diatchenko L, Chenchik A, Siebert PD, Khaspekov GL, Lukyanov KA et al. Equalizing cDNA subtraction based on selective suppression of polymerase chain reaction: cloning of Jurkat cell transcripts induced by phytohemagglutinin and phorbol 12-myristate 13-acetate. Anal Biochem 1996; 240: 90–97.
16. Chomczynski P, Sacchi N. Single-step method of RNA isolation by acid guanidinium thiocyanate–phenol–chloroform extraction. Anal Biochem 1987; 162: 156–159.
17. Gingras MC, Margolin JF. Differential expression of multiple unexpected genes during U937 cell and macrophage differentiation detected by suppressive subtractive hybridization. Exp Hematol 2000; 28: 65–76.
18. Ewing B, Hillier L, Wendl MC, Green P. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res 1998; 8: 175–185.
19. Ewing B, Green P. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res 1998; 8: 186–194.
20. Gordon D, Desmarais C, Green P. Automated finishing with autofinish. Genome Res 2001; 11: 614–625.
21. Livak KJ, Schmittgen TD. Analysis of relative gene expression data using real-time quantitative PCR and the 2^−ΔΔCT method. Methods 2001; 25: 402–408.
22. Peterson LE. Factor analysis of cluster-specific gene expression levels from cDNA microarrays. Comp Meth Prog Biomed 2002; 69: 179–188.
23. Peterson LE. Software Report: CLUSFAVOR 5.0: hierarchical cluster and principal component analysis of microarray-based transcriptional profiles. Genome Biol 2002; 3: software 0002.1–0002.8.
24. Birg F, Courcoul M, Rosnet O, Bardin F, Pebusque MJ, Marchetto S et al. Expression of the FMS/KIT-like gene FLT3 in human acute leukemias of the myeloid and lymphoid lineages. Blood 1992; 80: 2584–2593.
25. Hart AH, Corrick CM, Tymms MJ, Hertzog PJ, Kola I. Human ERG is a proto-oncogene with mitogenic and transforming activity. Oncogene 1995; 10: 1423–1430.
26. Yin AH, Miraglia S, Zanjani ED, Almeida-Porada G, Ogawa M, Leary AG et al. AC133, a novel marker for human hematopoietic stem and progenitor cells. Blood 1997; 90: 5002–5012.
27. Small D, Levenstein M, Kim E, Carow C, Amin S, Rockwell P et al. STK-1, the human homolog of Flk-2/Flt-3, is selectively expressed in CD34+ human bone marrow cells and is involved in the proliferation of early progenitor/stem cells. Proc Natl Acad Sci USA 1994; 91: 459–463.
28. Tanner SM, Austin JL, Leone G, Rush LJ, Plass C, Heinonen K et al. BAALC, the human member of a novel mammalian neuroectoderm gene lineage, is implicated in hematopoiesis and acute leukemia. Proc Natl Acad Sci USA 2001; 98: 13901–13906.
29. Loken MR, Shah VO, Hollander Z, Civin CI. Flow cytometric analysis of normal B lymphoid development. Pathol Immunopathol Res 1988; 7: 357–370.
30. Oyake T, Itoh K, Motohashi H, Hayashi N, Hoshino H, Nishizawa M et al. Bach proteins belong to a novel family of BTB-basic leucine zipper transcription factors that interact with MafK and regulate transcription through the NF-E2 site. Mol Cell Biol 1996; 16: 6083–6095.
31. Motohashi H, Shavit JA, Igarashi K, Yamamoto M, Engel JD. The world according to Maf. Nucleic Acids Res 1997; 25: 2953–2959.
32. Ahn J, Gruen JR. The genomic organization of the histone clusters on human 6p21.3. Mammalian Genome 1999; 10: 768–770.
33. Haluska P, Saleem A, Rasheed Z, Ahmed F, Su EW, Liu LF, Rubin EH. Interaction between human topoisomerase I and a novel RING finger/arginine–serine protein. Nucleic Acids Res 1999; 27: 2538–2544.
34. Zhou R, Wen H, Ao SZ. Identification of a novel gene encoding a p53-associated protein. Gene 1999; 235: 93–101.
35. Ren C, Li L, Goltsov AA, Timme TL, Tahir SA, Wang J, Garza L et al. mRTVP-1, a novel p53 target gene with proapoptotic activities. Mol Cell Biol 2002; 22: 3345–3357.
36. Byun DS, Chae KS, Ryu BK, Lee MG, Chi SG. Expression and mutation analyses of P53R2, a newly identified p53 target for DNA repair in gastric carcinoma. Int J Cancer 2002; 98: 718–723.
37. Krug U, Ganser A, Koeffler HP. Tumor suppressor genes in normal and malignant hematopoiesis. Oncogene 2002; 21: 3475–3495.
38. Pui CH, Boyett JM, Rivera GK, Hancock ML, Sandlund JT, Ribeiro RC et al. Long-term results of total therapy studies 11, 12, and 13A for childhood acute lymphoblastic leukemia at St Jude Children's Research Hospital. Leukemia 2000; 14: 2286–2294.
39. Ishii E, Eguchi H, Matsuzaki A, Koga H, Yanai F, Kuroda H et al. Outcome of acute lymphoblastic leukemia in children with AL90 regimen: impact of response to treatment and sex difference on prognostic factors. Med Pediatr Oncol 2001; 37: 10–19.
40. Yan H, Dobbie Z, Gruber SB, Markowitz S, Romans K, Giardiello FM, Kinzler KW et al. Small changes in expression affect predisposition to tumorigenesis. Nat Genet 2002; 30: 25–26.
41. Caballero OL, de Souza SJ, Brentani RR, Simpson AJ. Alternative spliced transcripts as cancer markers. Dis Markers 2001; 17: 67–75.
42. Prasad J, Colwill K, Pawson T, Manley JL. The protein kinase Clk/Sty directly modulates SR protein activity: both hyper- and hypophosphorylation inhibit splicing. Mol Cell Biol 1999; 19: 6991–7000.
43. Iggo RD, Jamieson DJ, MacNeill SA, Southgate J, McPheat J, Lane DP. P68 RNA helicase: identification of a nucleolar form and cloning of related genes containing a conserved intron in yeasts. Mol Cell Biol 1991; 11: 1326–1333.
44. Neubauer G, King A, Rappsilber J, Calvio C, Watson M, Ajuh P et al. Mass spectrometry and EST-database searching allows characterization of the multi-protein spliceosome complex. Nat Genet 1998; 20: 46–50.
45. Kurian KM, Watson CJ, Wyllie AH. DNA chip technology. J Pathol 1999; 187: 267–271.
46. Chung S, Kim M, Choi WJ, Chung JK, Lee K. Expression of translationally controlled tumor protein mRNA in human colon cancer. Cancer Lett 2000; 156: 185–190.
47. Yokota S, Yamamoto Y, Shimizu K, Momoi H, Kamikawa T, Yamaoka Y et al. Increased expression of cytosolic chaperonin CCT in human hepatocellular and colonic carcinoma. Cell Stress Chaperones 2001; 6: 345–350.
48. Macmillan JC, Hudson JW, Bull S, Dennis JW, Swallow CJ. Comparative expression of the mitotic regulators SAK and PLK in colorectal cancer. Ann Surg Oncol 2001; 8: 729–740.
49. Ray ME, Wistow G, Su YA, Meltzer PS, Trent JM. AIM1, a novel non-lens member of the βγ-crystallin superfamily, is associated with the control of tumorigenicity in human malignant melanoma. Proc Natl Acad Sci USA 1997; 94: 3229–3234.
50. Vogt T, Kroiss M, McClelland M, Gruss C, Becker B, Bosserhoff AK et al. Deficiency of a novel retinoblastoma binding protein 2-homolog is a consistent feature of sporadic human melanoma skin cancer. Lab Invest 1999; 79: 1615–1627.
51. Strohmaier H, Spruck CH, Kaiser P, Won K-A, Sangfelt O, Reed SI. Human F-box protein hCdc4 targets cyclin E for proteolysis and is mutated in a breast cancer cell line. Nature 2001; 413: 316–322.
52. Liu J, Yuan Y, Huan J, Shen Z. Inhibition of breast and brain cancer cell growth by BCCIPα, an evolutionarily conserved nuclear protein that interacts with BRCA2. Oncogene 2001; 20: 336–345.
We thank Dr C Philip Steuber and the TCCC Leukemia team for their collaboration, and Mary-Ann A Mastangelo for the technical support and advice with the RT-RQ-PCR technique. We acknowledge the contributions of the BCM-HGSC production team headed by Donna Muzny, the library team headed by Erica Sodergren, and the sequence instrumentation team headed by Graham Scott. We also thank Keelan Hamilton for providing the oligonucleotides used for primer walking and acknowledge the contribution of the cDNA group. We are also grateful to several members of the bio-informatics department including David Wheeler, Kim Worley, David Stefan, and Paul Havlak for their contributions toward this study.
Qiu, J., Gunaratne, P., Peterson, L. et al. Novel potential ALL low-risk markers revealed by gene expression profiling with new high-throughput SSH–CCS–PCR. Leukemia 17, 1891–1900 (2003). https://doi.org/10.1038/sj.leu.2403073
- acute lymphoblastic leukemia
- gene expression profiling
- suppression subtractive hybridization/concatenated cDNA sequencing/reverse transcriptase real-time quantitative PCR