  • Original Manuscript
Gene Profiling and Microarrays

Purity for clarity: the need for purification of tumor cells in DNA microarray studies

Abstract

It is now well established that gene expression profiling using DNA microarrays can provide novel information about various types of hematological malignancies, which may lead to the identification of novel diagnostic markers. However, to use microarrays successfully for this purpose, the quality and reproducibility of the procedure need to be guaranteed. The quality of microarray analyses may be severely reduced if variable frequencies of nontarget cells are present in the starting material. To systematically investigate the influence of different types of impurity, we determined gene expression profiles of leukemic samples containing different percentages of nonleukemic leukocytes. Furthermore, we used computer simulations to study the effect of different kinds of impurity as an alternative to conducting hundreds of microarray experiments on samples with various levels of purity.

As expected, the percentage of erroneously identified genes rose as the proportion of contaminating nontarget cells in the samples increased. The simulations demonstrated that a tumor load of less than 75% can lead to up to 25% erroneously identified genes, whereas a tumor load of at least 90% leads to identification of at most 5% false-positive genes. We therefore propose that, in order to draw well-founded conclusions, the percentage of target cells in microarray experiment samples should be at least 90%.

References

  1. Schena M, Shalon D, Heller R, Chai A, Brown PO, Davis RW . Parallel human genome analysis: microarray-based expression monitoring of 1000 genes. Proc Natl Acad Sci USA 1996; 93: 10614–10619.

  2. DeRisi J, Penland L, Brown PO, Bittner ML, Meltzer PS, Ray M et al. Use of a cDNA microarray to analyse gene expression patterns in human cancer. Nat Genet 1996; 14: 457–460.

  3. Schena M, Shalon D, Davis RW, Brown PO . Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 1995; 270: 467–470.

  4. Wurmbach E, Gonzalez-Maeso J, Yuen T, Ebersole BJ, Mastaitis JW, Mobbs CV et al. Validated genomic approach to study differentially expressed genes in complex tissues. Neurochem Res 2002; 27: 1027–1033.

  5. Smith JL, Freebern WJ, Collins I, De Siervi A, Montano I, Haggerty CM et al. Kinetic profiles of p300 occupancy in vivo predict common features of promoter structure and coactivator recruitment. Proc Natl Acad Sci USA 2004; 101: 11554–11559.

  6. Southern E, Mir K, Shchepinov M . Molecular interactions on microarrays. Nat Genet 1999; 21 (1 Suppl): 5–9.

  7. Brown PO, Botstein D . Exploring the new world of the genome with DNA microarrays. Nat Genet 1999; 21 (1 Suppl): 33–37.

  8. Janoueix-Lerosey I, Novikov E, Monteiro M, Gruel N, Schleiermacher G, Loriod B et al. Gene expression profiling of 1p35–36 genes in neuroblastoma. Oncogene 2004; 23: 5912–5922.

  9. Guipaud O, Deriano L, Salin H, Vallat L, Sabatier L, Merle-Beral H et al. B-cell chronic lymphocytic leukaemia: a polymorphic family unified by genomic features. Lancet Oncol 2003; 4: 505–514.

  10. Hoefnagel JJ, Dijkman R, Basso K, Jansen PM, Hallermann C, Willemze R et al. Distinct types of primary cutaneous large B-cell lymphoma identified by gene expression profiling. Blood, prepublished online August 12, 2004; doi 10.1182/blood-2004-04-1594.

  11. Lossos IS, Czerwinski DK, Alizadeh AA, Wechser MA, Tibshirani R, Botstein D et al. Prediction of survival in diffuse large-B-cell lymphoma based on the expression of six genes. N Engl J Med 2004; 350: 1828–1837.

  12. Bittner M, Meltzer P, Chen Y, Jiang Y, Seftor E, Hendrix M et al. Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature 2000; 406: 536–540.

  13. Finley DJ, Zhu B, Barden CB, Fahey III TJ . Discrimination of benign and malignant thyroid nodules by molecular profiling. Ann Surg 2004; 240: 425–436, discussion 427–436.

  14. Ferrando AA, Neuberg DS, Staunton J, Loh ML, Huard C, Raimondi SC et al. Gene expression signatures define novel oncogenic pathways in T cell acute lymphoblastic leukemia. Cancer Cell 2002; 1: 75–87.

  15. Ando T, Suguro M, Kobayashi T, Seto M, Honda H . Multiple fuzzy neural network system for outcome prediction and classification of 220 lymphoma patients on the basis of molecular profiling. Cancer Sci 2003; 94: 906–913.

  16. Holleman A, Cheok MH, den Boer ML, Yang W, Veerman AJ, Kazemier KM et al. Gene-expression patterns in drug-resistant acute lymphoblastic leukemia cells and response to treatment. N Engl J Med 2004; 351: 533–542.

  17. Valk PJ, Verhaak RG, Beijen MA, Erpelinck CA, Barjesteh van Waalwijk van Doorn-Khosrovani S, Boer JM et al. Prognostically useful gene-expression profiles in acute myeloid leukemia. N Engl J Med 2004; 350: 1617–1628.

  18. Holloway AJ, van Laar RK, Tothill RW, Bowtell DD . Options available – from start to finish – for obtaining data from DNA microarrays II. Nat Genet 2002; 32 (Suppl): 481–489.

  19. Li Y, Li T, Liu S, Qiu M, Han Z, Jiang Z et al. Systematic comparison of the fidelity of aRNA, mRNA and T-RNA on gene expression profiling using cDNA microarray. J Biotechnol 2004; 107: 19–28.

  20. Ojaniemi H, Evengard B, Lee DR, Unger ER, Vernon SD . Impact of RNA extraction from limited samples on microarray results. Biotechniques 2003; 35: 968–973.

  21. Mikulowska-Mennis A, Taylor TB, Vishnu P, Michie SA, Raja R, Horner N et al. High-quality RNA from cells isolated by laser capture microdissection. Biotechniques 2002; 33: 176–179.

  22. Nakamura T, Furukawa Y, Nakagawa H, Tsunoda T, Ohigashi H, Murata K et al. Genome-wide cDNA microarray analysis of gene expression profiles in pancreatic cancers using populations of tumor cells, normal ductal epithelial cells selected for purity by laser microdissection. Oncogene 2004; 23: 2385–2400.

  23. Zhu G, Reynolds L, Crnogorac-Jurcevic T, Gillett CE, Dublin EA, Marshall JF et al. Combination of microdissection and microarray analysis to identify gene expression changes between differentially located tumour cells in breast cancer. Oncogene 2003; 22: 3742–3748.

  24. Staal FJ, van der Burg M, Wessels LF, Barendregt BH, Baert MR, van den Burg CM et al. DNA microarrays for comparison of gene expression profiles between diagnosis and relapse in precursor-B acute lymphoblastic leukemia: choice of technique and purification influence the identification of potential diagnostic markers. Leukemia 2003; 17: 1324–1332.

  25. Bolstad BM, Irizarry RA, Astrand M, Speed TP . A comparison of normalization methods for high density oligonucleotide array data based on bias and variance. Bioinformatics 2003; 19: 185–193.

  26. Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, Speed TP . Exploration, normalization and summaries of high density nucleotide array probe level data. Biostatistics 2003; 4: 249–264.

  27. Ge Y, Dudoit S, Speed TP . Resampling-based Multiple Testing for Microarray Data Analysis. Department of Statistics, University of California: Berkeley, 2003.

  28. Tusher VG, Tibshirani R, Chu G . Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA 2001; 98: 5116–5121.

  29. Storey JD, Tibshirani R . SAM thresholding and false discovery rates for detecting differential gene expression in DNA microarrays. In: Parmigiani G, Garrett ES, Irizarry RA, Zeger SL (eds). The Analysis of Gene Expression Data: Methods and Software. New York: Springer, 2003.

  30. Tibshirani R, Hastie T, Narasimhan B, Chu G . Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc Natl Acad Sci USA 2002; 99: 6567–6572.

  31. Storey JD . A direct approach to false discovery rates. J Roy Statist Soc Ser B 2002; 64: 479–498.

  32. Potter JD . Epidemiology, cancer genetics and microarrays: making correct inferences, using appropriate designs. Trends Genet 2003; 19: 690–695.

  33. Hrusak O, Porwit-MacDonald A . Antigen expression patterns reflecting genotype of acute leukemias. Leukemia 2002; 16: 1233–1258.

  34. Pui CH, Behm FG, Crist WM . Clinical and biologic relevance of immunologic marker studies in childhood acute lymphoblastic leukemia. Blood 1993; 82: 343–362.

  35. Allsup DJ, Cawley JC . The diagnosis and treatment of hairy-cell leukaemia. Blood Rev 2002; 16: 255–262.

  36. Yaziji H, Gown AM . Immunohistochemical analysis of gynecologic tumors. Int J Gynecol Pathol 2001; 20: 64–78.

  37. Llewellyn H . Observer variation, dysplasia grading, and HPV typing: a review. Am J Clin Pathol 2000; 114 (Suppl): S21–S35.

  38. Schlemper RJ, Kato Y, Stolte M . Review of histological classifications of gastrointestinal epithelial neoplasia: differences in diagnosis of early carcinomas between Japanese and Western pathologists. J Gastroenterol 2001; 36: 445–456.

  39. de Bree E, Koops W, Kroger R, van Ruth S, Witkamp AJ, Zoetmulder FA . Peritoneal carcinomatosis from colorectal or appendiceal origin: correlation of preoperative CT with intraoperative findings and evaluation of interobserver agreement. J Surg Oncol 2004; 86: 64–73.

  40. Elgamal AA, Holmes EH, Su SL, Tino WT, Simmons SJ, Peterson M et al. Prostate-specific membrane antigen (PSMA): current benefits and future value. Semin Surg Oncol 2000; 18: 10–16.

  41. Coindre JM . Immunohistochemistry in the diagnosis of soft tissue tumours. Histopathology 2003; 43: 1–16.

  42. Baker M, Gillanders WE, Mikhitarian K, Mitas M, Cole DJ . The molecular detection of micrometastatic breast cancer. Am J Surg 2003; 186: 351–358.

  43. Weber T, Klar E . Minimal residual disease in thyroid carcinoma. Semin Surg Oncol 2001; 20: 272–277.

  44. Hood JD, Cheresh DA . Role of integrins in cell invasion and migration. Nat Rev Cancer 2002; 2: 91–100.

  45. Orr FW, Wang HH, Lafrenie RM, Scherbarth S, Nance DM . Interactions between cancer cells and the endothelium in metastasis. J Pathol 2000; 190: 310–329.

  46. Malinda KM, Kleinman HK . The laminins. Int J Biochem Cell Biol 1996; 28: 957–959.

  47. Tureci O, Ding J, Hilton H, Bian H, Ohkawa H, Braxenthaler M et al. Computational dissection of tissue contamination for identification of colon cancer-specific expression profiles. FASEB J 2003; 17: 376–385.

  48. Lu P, Nakorchevskiy A, Marcotte EM . Expression deconvolution: a reinterpretation of DNA microarray data reveals dynamic changes in cell populations. Proc Natl Acad Sci USA 2003; 100: 10370–10375.

  49. Stuart RO, Wachsman W, Berry CC, Wang-Rodriguez J, Wasserman L, Klacansky I et al. In silico dissection of cell-type-associated patterns of gene expression in prostate cancer. Proc Natl Acad Sci USA 2004; 101: 615–620.

  50. Mansmann U . Issues in planning and analysing microarray data studies, Proc Int Symp on Bioinformatics for Agricultural Biotechnology, Suwan, Korea, 2003.

Acknowledgements

We thank Dr E van Wering for providing T-ALL samples.

Author information

Corresponding author

Correspondence to F J T Staal.

Additional information

Supplementary Information

Supplementary Information accompanies the paper on the Leukemia website (http://www.nature.com/leu).

Appendix A

Microarray data were simulated (ref. 50) as follows. For each $n \in \{10, 20, 40, 100\}$ microarrays, two groups of $n/2$ arrays each, A and B, were simulated. Each array contained 1000 genes, of which 50 were set to be truly differentially expressed (in group B only). The base $\log_2$-expression of a gene $g$ in array $a$ was simulated as follows:

  • calculate an average expression over all arrays: $m_g = \log_2(m'_g)$ with $(m'_g)^{-1} \sim \Gamma(1,1)$

  • true differential expression: $d_g \in \{0,1\}$ with $P(d_g = 1) = 0.05$

  • sign of expression difference: $s_g \in \{-1,1\}$ with $P(s_g = 1) = 0.5$

  • amplitude of expression: $c_g \sim U(1.4, 1.5)$

  • expression:

      ○ $\log_2(e_{a,g}) \sim N(m_g, 1) \quad \forall a \in A$

      ○ $\log_2(e_{a,g}) \sim N(m_g + d_g s_g c_g, 1) \quad \forall a \in B$

where the subscript $g$ indicates values for gene $g$ over all arrays, the subscript $a,g$ denotes values for gene $g$ in array $a$, $P(x)$ indicates the probability of $x$ occurring, $\Gamma(\alpha,\theta)$ is the Gamma distribution, $U(l,r)$ is the uniform distribution on $[l,r]$, and $N(\mu,\sigma)$ is the Gaussian distribution with mean $\mu$ and standard deviation $\sigma$.
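The base model can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch, not the authors' code: the function name `simulate_base`, its parameters, and the seed handling are assumptions introduced here. It follows the listed distributions literally, so the number of truly differentially expressed genes is 50 only in expectation (each gene is drawn with probability 0.05).

```python
import math
import random

def simulate_base(n_arrays, n_genes=1000, p_diff=0.05, seed=0):
    """Simulate log2-expression for groups A and B (n_arrays // 2 arrays each).

    Hypothetical helper: returns (group_a, group_b, truly_diff), where
    group_x[a][g] is the log2-expression of gene g in array a.
    """
    rng = random.Random(seed)
    half = n_arrays // 2
    group_a = [[0.0] * n_genes for _ in range(half)]
    group_b = [[0.0] * n_genes for _ in range(half)]
    truly_diff = []
    for g in range(n_genes):
        # (m'_g)^-1 ~ Gamma(1, 1), so m_g = log2(m'_g) = -log2(x)
        x = rng.gammavariate(1.0, 1.0)
        while x == 0.0:  # guard against log2(0); practically never triggered
            x = rng.gammavariate(1.0, 1.0)
        m_g = -math.log2(x)
        d_g = 1 if rng.random() < p_diff else 0   # true differential expression
        s_g = 1 if rng.random() < 0.5 else -1     # sign of the difference
        c_g = rng.uniform(1.4, 1.5)               # amplitude
        shift = d_g * s_g * c_g
        for a in range(half):
            group_a[a][g] = rng.gauss(m_g, 1.0)          # arrays in A
            group_b[a][g] = rng.gauss(m_g + shift, 1.0)  # arrays in B
        truly_diff.append(d_g == 1)
    return group_a, group_b, truly_diff

group_a, group_b, truly_diff = simulate_base(n_arrays=10)
print(sum(truly_diff), "genes drawn as truly differentially expressed")
```

Fixing the count at exactly 50 would require sampling 50 gene indices without replacement instead of the per-gene Bernoulli draw used here.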

In addition to the 50 truly differentially expressed genes, either 50 random impurity genes or 50 group-specific impurity genes were set to be differentially expressed in arrays, at a fraction $f$ of the level of a truly differentially expressed gene. Random impurity genes $g$ were added as follows:

  • presence of differential expression: $d_{a,g} \in \{0,1\}$ with $P(d_{a,g} = 1) = 0.05$

  • sign of expression difference: $s_{a,g} \in \{-1,1\}$ with $P(s_{a,g} = 1) = 0.5$

  • amplitude of expression difference: $c_{a,g} \sim U(1.4, 1.5)$

  • differential expression: $\log_2(e_{a,g}) \leftarrow \log_2(e_{a,g}) + f\, d_{a,g}\, s_{a,g}\, c_{a,g} \quad \forall a \in B$
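A minimal sketch of the random-impurity step, assuming group B is held as a list of arrays, each a list of log2-expression values. The function name `add_random_impurity` and the `p_present` parameter are hypothetical names chosen for illustration. The point it shows is that presence, sign, and amplitude are all redrawn for every array and gene, so the perturbation is not consistent within the group.

```python
import random

def add_random_impurity(group_b, impurity_genes, f, rng, p_present=0.05):
    """Perturb impurity genes in group B in place, independently per array."""
    for row in group_b:
        for g in impurity_genes:
            d_ag = 1 if rng.random() < p_present else 0  # presence, per array
            s_ag = 1 if rng.random() < 0.5 else -1       # sign, per array
            c_ag = rng.uniform(1.4, 1.5)                 # amplitude, per array
            row[g] += f * d_ag * s_ag * c_ag
    return group_b

# Toy usage: 4 arrays, 20 genes, the first 5 treated as impurity genes.
rng = random.Random(42)
toy_b = [[0.0] * 20 for _ in range(4)]
add_random_impurity(toy_b, range(5), f=0.5, rng=rng)
```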

Group-specific impurity genes g were added as follows:

  • presence of differential expression: $d_g \in \{0,1\}$, with $P(d_g = 1) = 0.05$

  • sign of expression difference: $s_g \in \{-1,1\}$, with $P(s_g = 1) = 0.5$

  • amplitude of expression difference: $c_{a,g} \sim U(1.4, 1.5)$

  • differential expression: $\log_2(e_{a,g}) \leftarrow \log_2(e_{a,g}) + f\, d_g\, s_g\, c_{a,g} \quad \forall a \in B$

In the simulations, the impurity fraction f was varied between 0.0 and 1.0.
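The group-specific variant differs from the random one in that presence and sign are drawn once per gene and shared by all arrays in B, while only the amplitude varies per array. The sketch below illustrates this together with a coarse sweep over $f$. The function name `add_group_specific_impurity` and its parameters are hypothetical, and the sign set $\{-1, 1\}$ is an assumption chosen to match the other sign definitions in this appendix; it is exposed as the `signs` parameter so that a different set can be supplied.

```python
import random

def add_group_specific_impurity(group_b, impurity_genes, f, rng,
                                p_present=0.05, signs=(-1, 1)):
    """Shift impurity genes in group B in place; return genes with d_g = 1."""
    affected = []
    for g in impurity_genes:
        d_g = 1 if rng.random() < p_present else 0  # drawn once per gene
        s_g = rng.choice(signs)                     # drawn once per gene
        if d_g:
            affected.append(g)
        for row in group_b:
            c_ag = rng.uniform(1.4, 1.5)            # drawn per array and gene
            row[g] += f * d_g * s_g * c_ag
    return affected

# Sweep the impurity fraction f over 0.0, 0.5 and 1.0 on a toy group B.
rng = random.Random(0)
for step in range(0, 11, 5):
    f = step / 10
    toy_b = [[0.0] * 50 for _ in range(5)]
    hit = add_group_specific_impurity(toy_b, range(50), f, rng)
    largest = max(abs(v) for row in toy_b for v in row)
    print(f"f={f:.1f}: {len(hit)} affected genes, largest shift {largest:.2f}")
```

Because the shift has the same sign in every array of B, a group-specific impurity gene looks like a consistent difference between A and B as $f$ grows, which is what makes it liable to be picked up as a false positive.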

About this article

Cite this article

de Ridder, D., van der Linden, C., Schonewille, T. et al. Purity for clarity: the need for purification of tumor cells in DNA microarray studies. Leukemia 19, 618–627 (2005). https://doi.org/10.1038/sj.leu.2403685

  • DOI: https://doi.org/10.1038/sj.leu.2403685
