Abstract
We present PEER (probabilistic estimation of expression residuals), a software package implementing statistical models that improve the sensitivity and interpretability of genetic associations in population-scale expression data. This approach builds on factor analysis methods that infer broad variance components in the measurements. PEER takes as input transcript profiles and covariates from a set of individuals, and then outputs hidden factors that explain much of the expression variability. Optionally, these factors can be interpreted as pathway or transcription factor activations by providing prior information about which genes are involved in the pathway or targeted by the factor. The inferred factors are used in genetic association analyses. First, they are treated as additional covariates, and are included in the model to increase detection power for mapping expression traits. Second, they are analyzed as phenotypes themselves to understand the causes of global expression variability. PEER extends previous related surrogate variable models and can be implemented within hours on a desktop computer.
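The two uses of inferred factors described above can be illustrated with a minimal sketch on simulated data. It does not use the PEER software or its Bayesian model: a plain SVD of the covariate-adjusted expression matrix stands in for factor inference. All names (`infer_factors`, `assoc_t`), dimensions, and effect sizes are hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_genes, n_hidden = 300, 500, 3

# Simulated inputs; every value here is an illustrative assumption.
genotype = rng.binomial(2, 0.5, size=n_samples).astype(float)  # one SNP, coded 0/1/2
known_cov = rng.normal(size=(n_samples, 1))                    # one measured covariate
hidden = rng.normal(size=(n_samples, n_hidden))                # unobserved confounders
weights = rng.normal(size=(n_hidden, n_genes))
weights[:, 0] = [1.5, -1.0, 1.0]                               # gene 0 loads on every hidden factor
expr = (hidden @ weights
        + known_cov @ rng.normal(size=(1, n_genes))
        + rng.normal(size=(n_samples, n_genes)))
expr[:, 0] += 1.0 * genotype                                   # true genetic effect on gene 0


def infer_factors(expr, covariates, k):
    """SVD stand-in for factor inference (NOT the PEER model).

    Regresses out an intercept and the known covariates, then returns
    the top-k principal component scores of the residual matrix.
    """
    design = np.column_stack([np.ones(expr.shape[0]), covariates])
    beta, *_ = np.linalg.lstsq(design, expr, rcond=None)
    resid = expr - design @ beta
    u, s, _ = np.linalg.svd(resid, full_matrices=False)
    return u[:, :k] * s[:k]


def assoc_t(genotype, phenotype, covars):
    """t-statistic of the genotype term in an ordinary least squares fit
    of phenotype ~ intercept + genotype + covars."""
    n = len(genotype)
    X = np.column_stack([np.ones(n), genotype, covars])
    beta, *_ = np.linalg.lstsq(X, phenotype, rcond=None)
    resid = phenotype - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov_beta[1, 1])


factors = infer_factors(expr, known_cov, n_hidden)

# Use 1: factors as additional covariates when mapping an expression trait.
t_naive = assoc_t(genotype, expr[:, 0], known_cov)
t_corrected = assoc_t(genotype, expr[:, 0], np.column_stack([known_cov, factors]))

# Use 2: a factor analyzed as a phenotype in its own right.
t_factor = assoc_t(genotype, factors[:, 0], known_cov)

print(f"t (known covariates only): {t_naive:.2f}")
print(f"t (with inferred factors): {t_corrected:.2f}")
print(f"t (factor 1 as phenotype): {t_factor:.2f}")
```

In this simulation the hidden factors are independent of the genotype, so including them removes residual variance without absorbing the genetic signal. With real data that assumption may not hold, which is one reason a factor can itself be worth testing as a phenotype.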
Acknowledgements
We thank R. Brem and L. Kruglyak for providing genotype and expression phenotype data to be included alongside this protocol. This work received financial support from the Wellcome Trust (grant no. WT077192/Z/05/Z) and the Technical Computing Initiative (Microsoft Research). O.S. received funding from the Volkswagen Foundation.
Author information
Contributions
O.S., L.P., J.W. and R.D. designed the probabilistic models underlying the protocol. O.S., L.P. and M.P. developed the PEER software suite. O.S. and L.P. wrote the paper with input from all authors.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Data 1
Example dataset used to illustrate the protocol steps. (ZIP 2526 kb)
About this article
Cite this article
Stegle, O., Parts, L., Piipari, M. et al. Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses. Nat Protoc 7, 500–507 (2012). https://doi.org/10.1038/nprot.2011.457
DOI: https://doi.org/10.1038/nprot.2011.457