Abstract
It is now well established that gene expression profiling using DNA microarrays can provide novel information about various types of hematological malignancies, which may lead to the identification of novel diagnostic markers. However, to use microarrays successfully for this purpose, the quality and reproducibility of the procedure need to be guaranteed. The quality of microarray analyses may be severely reduced if variable frequencies of nontarget cells are present in the starting material. To systematically investigate the influence of different types of impurity, we determined gene expression profiles of leukemic samples containing different percentages of nonleukemic leukocytes. Furthermore, we used computer simulations to study the effect of different kinds of impurity, as an alternative to conducting hundreds of microarray experiments on samples with various levels of purity.
As expected, the percentage of erroneously identified genes rose as the proportion of contaminating nontarget cells in the samples increased. The simulations demonstrated that a tumor load of less than 75% can lead to up to 25% erroneously identified genes, whereas a tumor load of at least 90% leads to identification of at most 5% false-positive genes. We therefore propose that, in order to draw well-founded conclusions, the percentage of target cells in microarray experiment samples should be at least 90%.
References
Schena M, Shalon D, Heller R, Chai A, Brown PO, Davis RW. Parallel human genome analysis: microarray-based expression monitoring of 1000 genes. Proc Natl Acad Sci USA 1996; 93: 10614–10619.
DeRisi J, Penland L, Brown PO, Bittner ML, Meltzer PS, Ray M et al. Use of a cDNA microarray to analyse gene expression patterns in human cancer. Nat Genet 1996; 14: 457–460.
Schena M, Shalon D, Davis RW, Brown PO. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 1995; 270: 467–470.
Wurmbach E, Gonzalez-Maeso J, Yuen T, Ebersole BJ, Mastaitis JW, Mobbs CV et al. Validated genomic approach to study differentially expressed genes in complex tissues. Neurochem Res 2002; 27: 1027–1033.
Smith JL, Freebern WJ, Collins I, De Siervi A, Montano I, Haggerty CM et al. Kinetic profiles of p300 occupancy in vivo predict common features of promoter structure and coactivator recruitment. Proc Natl Acad Sci USA 2004; 101: 11554–11559.
Southern E, Mir K, Shchepinov M. Molecular interactions on microarrays. Nat Genet 1999; 21 (1 Suppl): 5–9.
Brown PO, Botstein D. Exploring the new world of the genome with DNA microarrays. Nat Genet 1999; 21 (1 Suppl): 33–37.
Janoueix-Lerosey I, Novikov E, Monteiro M, Gruel N, Schleiermacher G, Loriod B et al. Gene expression profiling of 1p35–36 genes in neuroblastoma. Oncogene 2004; 23: 5912–5922.
Guipaud O, Deriano L, Salin H, Vallat L, Sabatier L, Merle-Beral H et al. B-cell chronic lymphocytic leukaemia: a polymorphic family unified by genomic features. Lancet Oncol 2003; 4: 505–514.
Hoefnagel JJ, Dijkman R, Basso K, Jansen PM, Hallermann C, Willemze R et al. Distinct types of primary cutaneous large B-cell lymphoma identified by gene expression profiling. Blood, prepublished online August 12, 2004; doi 10.1182/blood-2004-04-1594.
Lossos IS, Czerwinski DK, Alizadeh AA, Wechser MA, Tibshirani R, Botstein D et al. Prediction of survival in diffuse large-B-cell lymphoma based on the expression of six genes. N Engl J Med 2004; 350: 1828–1837.
Bittner M, Meltzer P, Chen Y, Jiang Y, Seftor E, Hendrix M et al. Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature 2000; 406: 536–540.
Finley DJ, Zhu B, Barden CB, Fahey III TJ. Discrimination of benign and malignant thyroid nodules by molecular profiling. Ann Surg 2004; 240: 425–436, discussion 427–436.
Ferrando AA, Neuberg DS, Staunton J, Loh ML, Huard C, Raimondi SC et al. Gene expression signatures define novel oncogenic pathways in T cell acute lymphoblastic leukemia. Cancer Cell 2002; 1: 75–87.
Ando T, Suguro M, Kobayashi T, Seto M, Honda H. Multiple fuzzy neural network system for outcome prediction and classification of 220 lymphoma patients on the basis of molecular profiling. Cancer Sci 2003; 94: 906–913.
Holleman A, Cheok MH, den Boer ML, Yang W, Veerman AJ, Kazemier KM et al. Gene-expression patterns in drug-resistant acute lymphoblastic leukemia cells and response to treatment. N Engl J Med 2004; 351: 533–542.
Valk PJ, Verhaak RG, Beijen MA, Erpelinck CA, Barjesteh van Waalwijk van Doorn-Khosrovani S, Boer JM et al. Prognostically useful gene-expression profiles in acute myeloid leukemia. N Engl J Med 2004; 350: 1617–1628.
Holloway AJ, van Laar RK, Tothill RW, Bowtell DD. Options available – from start to finish – for obtaining data from DNA microarrays II. Nat Genet 2002; 32 (Suppl): 481–489.
Li Y, Li T, Liu S, Qiu M, Han Z, Jiang Z et al. Systematic comparison of the fidelity of aRNA, mRNA and T-RNA on gene expression profiling using cDNA microarray. J Biotechnol 2004; 107: 19–28.
Ojaniemi H, Evengard B, Lee DR, Unger ER, Vernon SD. Impact of RNA extraction from limited samples on microarray results. Biotechniques 2003; 35: 968–973.
Mikulowska-Mennis A, Taylor TB, Vishnu P, Michie SA, Raja R, Horner N et al. High-quality RNA from cells isolated by laser capture microdissection. Biotechniques 2002; 33: 176–179.
Nakamura T, Furukawa Y, Nakagawa H, Tsunoda T, Ohigashi H, Murata K et al. Genome-wide cDNA microarray analysis of gene expression profiles in pancreatic cancers using populations of tumor cells, normal ductal epithelial cells selected for purity by laser microdissection. Oncogene 2004; 23: 2385–2400.
Zhu G, Reynolds L, Crnogorac-Jurcevic T, Gillett CE, Dublin EA, Marshall JF et al. Combination of microdissection and microarray analysis to identify gene expression changes between differentially located tumour cells in breast cancer. Oncogene 2003; 22: 3742–3748.
Staal FJ, van der Burg M, Wessels LF, Barendregt BH, Baert MR, van den Burg CM et al. DNA microarrays for comparison of gene expression profiles between diagnosis and relapse in precursor-B acute lymphoblastic leukemia: choice of technique and purification influence the identification of potential diagnostic markers. Leukemia 2003; 17: 1324–1332.
Bolstad BM, Irizarry RA, Astrand M, Speed TP. A comparison of normalization methods for high density oligonucleotide array data based on bias and variance. Bioinformatics 2003; 19: 185–193.
Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, Speed TP. Exploration, normalization and summaries of high density nucleotide array probe level data. Biostatistics 2003; 4: 249–264.
Ge Y, Dudoit S, Speed TP. Resampling-based Multiple Testing for Microarray Data Analysis. Department of Statistics, University of California: Berkeley, 2003.
Tusher VG, Tibshirani R, Chu G. Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA 2001; 98: 5116–5121.
Storey JD, Tibshirani R. SAM thresholding and false discovery rates for detecting differential gene expression in DNA microarrays. In: Parmigiani G, Garrett ES, Irizarry RA, Zeger SL (eds). The Analysis of Gene Expression Data: Methods and Software. New York: Springer, 2003.
Tibshirani R, Hastie T, Narasimhan B, Chu G. Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc Natl Acad Sci USA 2002; 99: 6567–6572.
Storey JD. A direct approach to false discovery rates. J Roy Statist Soc 2002; Series B: 479–498.
Potter JD. Epidemiology, cancer genetics and microarrays: making correct inferences, using appropriate designs. Trends Genet 2003; 19: 690–695.
Hrusak O, Porwit-MacDonald A. Antigen expression patterns reflecting genotype of acute leukemias. Leukemia 2002; 16: 1233–1258.
Pui CH, Behm FG, Crist WM. Clinical and biologic relevance of immunologic marker studies in childhood acute lymphoblastic leukemia. Blood 1993; 82: 343–362.
Allsup DJ, Cawley JC. The diagnosis and treatment of hairy-cell leukaemia. Blood Rev 2002; 16: 255–262.
Yaziji H, Gown AM. Immunohistochemical analysis of gynecologic tumors. Int J Gynecol Pathol 2001; 20: 64–78.
Llewellyn H. Observer variation, dysplasia grading, and HPV typing: a review. Am J Clin Pathol 2000; 114 (Suppl): S21–S35.
Schlemper RJ, Kato Y, Stolte M. Review of histological classifications of gastrointestinal epithelial neoplasia: differences in diagnosis of early carcinomas between Japanese and Western pathologists. J Gastroenterol 2001; 36: 445–456.
de Bree E, Koops W, Kroger R, van Ruth S, Witkamp AJ, Zoetmulder FA. Peritoneal carcinomatosis from colorectal or appendiceal origin: correlation of preoperative CT with intraoperative findings and evaluation of interobserver agreement. J Surg Oncol 2004; 86: 64–73.
Elgamal AA, Holmes EH, Su SL, Tino WT, Simmons SJ, Peterson M et al. Prostate-specific membrane antigen (PSMA): current benefits and future value. Semin Surg Oncol 2000; 18: 10–16.
Coindre JM. Immunohistochemistry in the diagnosis of soft tissue tumours. Histopathology 2003; 43: 1–16.
Baker M, Gillanders WE, Mikhitarian K, Mitas M, Cole DJ. The molecular detection of micrometastatic breast cancer. Am J Surg 2003; 186: 351–358.
Weber T, Klar E. Minimal residual disease in thyroid carcinoma. Semin Surg Oncol 2001; 20: 272–277.
Hood JD, Cheresh DA. Role of integrins in cell invasion and migration. Nat Rev Cancer 2002; 2: 91–100.
Orr FW, Wang HH, Lafrenie RM, Scherbarth S, Nance DM. Interactions between cancer cells and the endothelium in metastasis. J Pathol 2000; 190: 310–329.
Malinda KM, Kleinman HK. The laminins. Int J Biochem Cell Biol 1996; 28: 957–959.
Tureci O, Ding J, Hilton H, Bian H, Ohkawa H, Braxenthaler M et al. Computational dissection of tissue contamination for identification of colon cancer-specific expression profiles. FASEB J 2003; 17: 376–385.
Lu P, Nakorchevskiy A, Marcotte EM. Expression deconvolution: a reinterpretation of DNA microarray data reveals dynamic changes in cell populations. Proc Natl Acad Sci USA 2003; 100: 10370–10375.
Stuart RO, Wachsman W, Berry CC, Wang-Rodriguez J, Wasserman L, Klacansky I et al. In silico dissection of cell-type-associated patterns of gene expression in prostate cancer. Proc Natl Acad Sci USA 2004; 101: 615–620.
Mansmann U. Issues in planning and analysing microarray data studies. Proc Int Symp on Bioinformatics for Agricultural Biotechnology, Suwan, Korea, 2003.
Acknowledgements
We thank Dr E van Wering for providing T-ALL samples.
Supplementary Information
Supplementary Information accompanies the paper on the Leukemia website (http://www.nature.com/leu).
Appendix A
Microarray data were simulated (ref. 50) as follows. For n ∈ {10, 20, 40, 100} microarrays, two groups of n/2 arrays each, A and B, were simulated. Each array contained 1000 genes, of which 50 were set to be truly differentially expressed (in group B only). The base log2-expression of a gene g in array a was simulated as follows:
- calculate an average expression over all arrays: $m_g = \log_2(m'_g)$ with $(m'_g)^{-1} \sim \Gamma(1, 1)$
- true differential expression: $d_g \in \{0, 1\}$ with $P(d_g = 1) = 0.05$
- sign of expression difference: $s_g \in \{-1, 1\}$ with $P(s_g = 1) = 0.5$
- amplitude of expression: $c_g \sim U(1.4, 1.5)$
- expression:
  - $\log_2(e_{a,g}) \sim N(m_g, 1) \quad \forall a \in A$
  - $\log_2(e_{a,g}) \sim N(m_g + d_g s_g c_g, 1) \quad \forall a \in B$

where the subscript $g$ indicates values for gene $g$ over all arrays, the subscript $a,g$ denotes values for gene $g$ in array $a$, $P(x)$ indicates the probability of $x$ occurring, $\Gamma(\alpha, \theta)$ is the Gamma distribution, $U(l, r)$ is the uniform distribution on $[l, r]$, and $N(\mu, \sigma)$ is the Gaussian distribution with mean $\mu$ and standard deviation $\sigma$.
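The base simulation described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the authors' implementation. The function name `simulate_base`, its parameters, and the nested-list layout (arrays by genes, with group A occupying the first half of the arrays) are assumptions made for this example.

```python
import math
import random


def simulate_base(n_arrays=10, n_genes=1000, p_diff=0.05, rng=None):
    """Simulate base log2-expression values.

    Returns (log2_expr, truth), where log2_expr[a][g] is the value for
    gene g in array a and truth[g] is the indicator d_g. Arrays
    0 .. n/2-1 form group A, the remaining arrays form group B
    (a layout assumed for this sketch).
    """
    rng = rng or random.Random()
    half = n_arrays // 2
    log2_expr = [[0.0] * n_genes for _ in range(n_arrays)]
    truth = []
    for g in range(n_genes):
        # (m'_g)^-1 ~ Gamma(1, 1), so m_g = log2(m'_g) = -log2(1 / m'_g)
        inv = rng.gammavariate(1.0, 1.0)
        while inv <= 0.0:  # guard against a (vanishingly rare) zero draw
            inv = rng.gammavariate(1.0, 1.0)
        m_g = -math.log2(inv)
        d_g = 1 if rng.random() < p_diff else 0   # true differential expression
        s_g = 1 if rng.random() < 0.5 else -1     # sign of the difference
        c_g = rng.uniform(1.4, 1.5)               # amplitude
        truth.append(d_g)
        for a in range(n_arrays):
            # Group A is centred on m_g; group B is shifted by d_g * s_g * c_g
            mean = m_g if a < half else m_g + d_g * s_g * c_g
            log2_expr[a][g] = rng.gauss(mean, 1.0)
    return log2_expr, truth


expr, truth = simulate_base(n_arrays=10, rng=random.Random(42))
print(len(expr), len(expr[0]), sum(truth))
```

Because each $d_g$ is drawn with probability 0.05, the number of truly differentially expressed genes in this sketch is 50 only on average; fixing it at exactly 50 would require sampling 50 gene indices without replacement instead.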
In addition to the 50 truly expressed genes, either 50 random impurity genes or 50 group-specific impurity genes were set to be differentially expressed in the arrays, at a fraction f of the level of a truly expressed gene. Random impurity genes g were added as follows:
- presence of differential expression: $d_{a,g} \in \{0, 1\}$ with $P(d_{a,g} = 1) = 0.05$
- sign of expression difference: $s_{a,g} \in \{-1, 1\}$ with $P(s_{a,g} = 1) = 0.5$
- amplitude of expression difference: $c_{a,g} \sim U(1.4, 1.5)$
- differential expression: $\log_2(e_{a,g}) \leftarrow \log_2(e_{a,g}) + f\, d_{a,g}\, s_{a,g}\, c_{a,g} \quad \forall a \in B$
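The random impurity step can be illustrated as follows. This is a hedged sketch rather than the original code: the function name `add_random_impurity` and its arguments are hypothetical, and it reads the description as drawing presence, sign, and amplitude independently for every gene in every group B array.

```python
import random


def add_random_impurity(log2_expr, group_b, f, p_present=0.05, rng=None):
    """Return a copy of log2_expr with random impurity added to group B.

    log2_expr[a][g] holds log2-expression for gene g in array a, and
    group_b lists the array indices belonging to group B. Presence,
    sign, and amplitude are drawn separately per array and per gene,
    so the perturbation is not consistent across the group.
    """
    rng = rng or random.Random()
    out = [row[:] for row in log2_expr]  # leave the input untouched
    for a in group_b:
        for g in range(len(out[a])):
            d = 1 if rng.random() < p_present else 0  # d_{a,g}
            s = 1 if rng.random() < 0.5 else -1       # s_{a,g}
            c = rng.uniform(1.4, 1.5)                 # c_{a,g}
            out[a][g] += f * d * s * c
    return out


flat = [[0.0] * 1000 for _ in range(10)]
mixed = add_random_impurity(flat, group_b=range(5, 10), f=0.5,
                            rng=random.Random(1))
n_hit = sum(1 for a in range(5, 10) for v in mixed[a] if v != 0.0)
print("perturbed entries in group B:", n_hit)
```

Starting from an all-zero matrix makes the added shifts directly visible; in practice the function would be applied to simulated base expression values.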
Group-specific impurity genes g were added as follows:
- presence of differential expression: $d_g \in \{0, 1\}$, with $P(d_g = 1) = 0.05$
- sign of expression difference: $s_g \in \{-1, 1\}$, with $P(s_g = 1) = 0.5$
- amplitude of expression difference: $c_{a,g} \sim U(1.4, 1.5)$
- differential expression: $\log_2(e_{a,g}) \leftarrow \log_2(e_{a,g}) + f\, d_g\, s_g\, c_{a,g} \quad \forall a \in B$
In the simulations, the impurity fraction f was varied between 0.0 and 1.0.
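A comparable sketch for group-specific impurity, with a small sweep over the impurity fraction f, is given below. The function name `add_group_impurity` is hypothetical, and the code assumes the sign takes values in {−1, 1}, matching the sign variables used elsewhere in this appendix. Presence and sign are drawn once per gene, so every group B array is shifted in the same direction, while the amplitude is drawn per array.

```python
import random


def add_group_impurity(log2_expr, group_b, f, p_present=0.05, rng=None):
    """Return a copy of log2_expr with group-specific impurity in group B.

    d_g and s_g are drawn once per gene and shared by all group B
    arrays; the amplitude c_{a,g} is drawn separately for each array.
    """
    rng = rng or random.Random()
    out = [row[:] for row in log2_expr]  # leave the input untouched
    n_genes = len(out[0])
    for g in range(n_genes):
        d = 1 if rng.random() < p_present else 0  # d_g, shared across B
        s = 1 if rng.random() < 0.5 else -1       # s_g, assumed in {-1, 1}
        for a in group_b:
            c = rng.uniform(1.4, 1.5)             # c_{a,g}
            out[a][g] += f * d * s * c
    return out


# Sweep the impurity fraction over a few values in [0, 1]
flat = [[0.0] * 1000 for _ in range(10)]
group_b = range(5, 10)
for f in (0.0, 0.25, 0.5, 0.75, 1.0):
    mixed = add_group_impurity(flat, group_b, f, rng=random.Random(0))
    largest = max(abs(v) for a in group_b for v in mixed[a])
    print(f"f={f:.2f}  largest |log2 shift| in group B: {largest:.3f}")
```

Since the shift is consistent across group B, a two-group comparison will tend to flag these genes as differentially expressed once f is large enough, which is the mechanism by which this kind of impurity produces false positives.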
Cite this article
de Ridder, D., van der Linden, C., Schonewille, T. et al. Purity for clarity: the need for purification of tumor cells in DNA microarray studies. Leukemia 19, 618–627 (2005). https://doi.org/10.1038/sj.leu.2403685
DOI: https://doi.org/10.1038/sj.leu.2403685