  • Technical Report

Bayesian method to predict individual SNP genotypes from gene expression data

Abstract

RNA profiling can be used to capture the expression patterns of many genes that are associated with expression quantitative trait loci (eQTLs). Employing published putative cis eQTLs, we developed a Bayesian approach to predict SNP genotypes that is based only on RNA expression data. We show that predicted genotypes can accurately and uniquely identify individuals in large populations. When inferring genotypes from an expression data set using eQTLs of the same tissue type (but from an independent cohort), we were able to resolve 99% of the identities of individuals in the cohort at adjusted P ≤ 1 × 10⁻⁵. When eQTLs derived from one tissue were used to predict genotypes using expression data from a different tissue, the identities of 90% of the study subjects could be resolved at adjusted P ≤ 1 × 10⁻⁵. We discuss the implications of deriving genotypic information from RNA data deposited in the public domain.
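
The abstract does not spell out the model, so the following is only a minimal sketch of how a Bayesian genotype call from a single cis eQTL might look, not the authors' method. It assumes (hypothetically) that expression of the eQTL gene is Gaussian within each genotype class, that the class means and a shared standard deviation are known from a published eQTL study, and that the prior over genotypes follows Hardy-Weinberg proportions. All function and parameter names are illustrative.

```python
import math


def hwe_prior(maf):
    """Hardy-Weinberg prior over genotypes 0, 1, 2 (minor-allele counts).

    Assumption: the minor allele frequency `maf` is known from a reference panel.
    """
    q = maf
    p = 1.0 - q
    return [p * p, 2.0 * p * q, q * q]


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def genotype_posterior(expr, means, sd, maf):
    """Posterior P(genotype | expression) via Bayes' rule.

    Assumption: `means[g]` is the mean expression for genotype g and `sd`
    is a shared within-genotype standard deviation (both hypothetical inputs).
    """
    prior = hwe_prior(maf)
    unnorm = [prior[g] * normal_pdf(expr, means[g], sd) for g in range(3)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def call_genotype(posterior):
    """Return the genotype (0, 1 or 2) with the highest posterior probability."""
    return max(range(3), key=lambda g: posterior[g])


# Illustrative values only; they do not come from the paper.
post = genotype_posterior(expr=2.0, means=[0.0, 1.0, 2.0], sd=0.3, maf=0.3)
print(call_genotype(post), [round(p, 4) for p in post])
```

In a setting like the one the abstract describes, calls of this kind across many eQTLs would be compared against known genotypes to match an expression profile to an individual; that matching step is not shown here.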

Figure 1: Accuracy of RNA-derived genotypes.

Author information

Contributions

E.E.S., S.W. and K.H. designed the methods and experiments, and jointly analyzed the data sets. E.E.S. and K.H. wrote the manuscript.

Corresponding authors

Correspondence to Eric E Schadt or Ke Hao.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Tables 1 and 2, Supplementary Figures 1–3 and Supplementary Note (PDF 274 kb)

About this article

Cite this article

Schadt, E., Woo, S. & Hao, K. Bayesian method to predict individual SNP genotypes from gene expression data. Nat Genet 44, 603–608 (2012). https://doi.org/10.1038/ng.2248

  • DOI: https://doi.org/10.1038/ng.2248
