Imputing gene expression from selectively reduced probe sets

Abstract

Measuring complete gene expression profiles for a large number of experiments is costly. We propose an approach in which a small subset of probes is selected based on a preliminary set of full expression profiles. In subsequent experiments, only the subset is measured, and the missing values are imputed. We developed several algorithms to simultaneously select probes and impute missing values, and we demonstrate that these 'probe selection for imputation' (PSI) algorithms can successfully reconstruct missing gene expression values in a wide variety of applications, as evaluated using multiple metrics of biological importance. We analyze the performance of PSI methods under varying conditions, provide guidelines for choosing the optimal method based on the experimental setting, and indicate how to estimate imputation accuracy. Finally, we apply our approach to a large-scale study of immune system variation.
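The select-then-impute workflow described in the abstract can be illustrated with a minimal sketch. This is not the authors' PSI software or any of their algorithms; it assumes a simple stand-in technique (greedy forward selection of probes, with ridge-regularized linear regression for imputation) and runs on synthetic data. The function names `fit_imputer`, `impute`, and `greedy_select`, the `ridge` parameter, and the data dimensions are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_imputer(train, selected, ridge=1e-3):
    """Fit a ridge-regularized linear map from the selected probes to all probes.

    train: array of shape (n_samples, n_probes) holding full expression profiles.
    selected: list of column indices of the probes that will be measured.
    """
    mean = train.mean(axis=0)
    centered = train - mean
    x = centered[:, selected]
    gram = x.T @ x + ridge * np.eye(len(selected))
    weights = np.linalg.solve(gram, x.T @ centered)  # shape (k, n_probes)
    return mean, weights


def impute(measured, selected, mean, weights):
    """Reconstruct full profiles from measurements of the selected probes only."""
    return mean + (measured - mean[selected]) @ weights


def greedy_select(train, k, ridge=1e-3):
    """Forward selection: repeatedly add the probe that most lowers the
    mean squared reconstruction error on the preliminary (training) profiles."""
    selected = []
    for _ in range(k):
        best_probe, best_err = None, np.inf
        for probe in range(train.shape[1]):
            if probe in selected:
                continue
            candidate = selected + [probe]
            mean, weights = fit_imputer(train, candidate, ridge)
            recon = impute(train[:, candidate], candidate, mean, weights)
            err = np.mean((recon - train) ** 2)
            if err < best_err:
                best_probe, best_err = probe, err
        selected.append(best_probe)
    return selected


# Synthetic data (an assumption for this sketch): 20 probes driven by 3 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(60, 3))
loadings = rng.normal(size=(3, 20))
data = latent @ loadings + 0.05 * rng.normal(size=(60, 20))

train, test = data[:40], data[40:]  # preliminary full profiles vs. later experiments

selected = greedy_select(train, k=4)
mean, weights = fit_imputer(train, selected)
imputed = impute(test[:, selected], selected, mean, weights)

# Compare against the naive baseline of predicting every probe by its training mean.
err_imputed = np.mean((imputed - test) ** 2)
err_mean = np.mean((train.mean(axis=0) - test) ** 2)
print("selected probes:", selected)
print("imputation MSE:", err_imputed, "mean-baseline MSE:", err_mean)
```

In this sketch, selection and imputation are coupled in the sense that each candidate probe is scored by the reconstruction error of the imputation model itself, which mirrors the idea of choosing probes specifically for how well they support imputation. The held-out comparison against a mean baseline is one simple way to estimate imputation accuracy before committing to a reduced probe set.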

Figure 1: An integrated approach to probe selection and imputation.
Figure 2: Relative and absolute imputation accuracy.
Figure 3: Additional evaluation metrics of biological relevance.
Figure 4: Cost-benefit analysis to determine the optimal number of selected probes and modular decomposition subsets.
Figure 5: The samples, probe ratio, linearity (SPRL) predictor and ImmVar results.

Accession codes

Accessions

Gene Expression Omnibus

Acknowledgements

This work was supported by grant RC2 GM093080 (funded through the American Recovery and Reinvestment Act) from the US National Institutes of Health–National Institute of General Medical Sciences to C.B. and D.K. We thank I. Amit and J. Ye for useful comments on this manuscript.

Author information

Contributions

Y.D. and D.K. designed the methods; Y.D. implemented the methods, wrote the code, performed the experiments and analyzed the data; T.F. and C.B. provided data and gave feedback on the results; Y.D. and D.K. wrote the manuscript; C.B. reviewed and commented on the manuscript.

Corresponding author

Correspondence to Daphne Koller.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–8, Supplementary Table 1, Supplementary Results and Supplementary Note (PDF 1946 kb)

Supplementary Data

The data sets used in the experiments described in the paper, in the file format used by the PSI software. (ZIP 11389 kb)

Supplementary Software

PSI software. (ZIP 16 kb)

About this article

Cite this article

Donner, Y., Feng, T., Benoist, C. et al. Imputing gene expression from selectively reduced probe sets. Nat Methods 9, 1120–1125 (2012). https://doi.org/10.1038/nmeth.2207

  • DOI: https://doi.org/10.1038/nmeth.2207
