Protocol

Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses

Nature Protocols volume 7, pages 500–507 (2012)

Abstract

We present PEER (probabilistic estimation of expression residuals), a software package implementing statistical models that improve the sensitivity and interpretability of genetic associations in population-scale expression data. This approach builds on factor analysis methods that infer broad variance components in the measurements. PEER takes as input transcript profiles and covariates from a set of individuals, and then outputs hidden factors that explain much of the expression variability. Optionally, these factors can be interpreted as pathway or transcription factor activations by providing prior information about which genes are involved in the pathway or targeted by the factor. The inferred factors are used in genetic association analyses. First, they are treated as additional covariates, and are included in the model to increase detection power for mapping expression traits. Second, they are analyzed as phenotypes themselves to understand the causes of global expression variability. PEER extends previous related surrogate variable models and can be implemented within hours on a desktop computer.
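The two uses of inferred factors described above can be illustrated with a small simulation. The sketch below is not PEER itself: it uses only the Python standard library and substitutes the first principal component (found by power iteration) for PEER's Bayesian factor inference. The simulated data, effect sizes, and all function names (`simulate`, `first_factor`, `regress_out`) are assumptions made for illustration.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def simulate(n_ind=300, n_genes=50, seed=0):
    """Toy data (assumed): one hidden factor affects every gene,
    and one SNP has an additive effect on gene 0 only."""
    rng = random.Random(seed)
    hidden = [rng.gauss(0, 1) for _ in range(n_ind)]
    genotype = [rng.choice([0, 1, 2]) for _ in range(n_ind)]
    weights = [3.0] + [rng.gauss(0, 1) for _ in range(n_genes - 1)]
    expr = [[h * w + rng.gauss(0, 1) for w in weights] for h in hidden]
    for i in range(n_ind):
        expr[i][0] += 1.0 * genotype[i]  # genetic effect on gene 0
    return expr, genotype, hidden


def first_factor(expr, n_iter=30, seed=1):
    """Per-individual scores on the first principal component,
    a simple stand-in for an inferred hidden factor."""
    n, g = len(expr), len(expr[0])
    means = [sum(row[j] for row in expr) / n for j in range(g)]
    y = [[v - m for v, m in zip(row, means)] for row in expr]
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(g)]
    for _ in range(n_iter):
        # Power iteration on Y^T Y: v <- Y^T (Y v), then normalise.
        scores = [sum(a * b for a, b in zip(row, v)) for row in y]
        v = [sum(scores[i] * y[i][j] for i in range(n)) for j in range(g)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return [sum(a * b for a, b in zip(row, v)) for row in y]


def regress_out(y, covariate):
    """Residuals of y after a least-squares fit on one covariate."""
    n = len(y)
    mean_y, mean_c = sum(y) / n, sum(covariate) / n
    sxy = sum((a - mean_y) * (c - mean_c) for a, c in zip(y, covariate))
    sxx = sum((c - mean_c) ** 2 for c in covariate)
    slope = sxy / sxx
    return [(a - mean_y) - slope * (c - mean_c) for a, c in zip(y, covariate)]


expr, genotype, hidden = simulate()
factor = first_factor(expr)
gene0 = [row[0] for row in expr]

# Use 1: treat the factor as a covariate when mapping an expression trait.
r_before = pearson(gene0, genotype)
r_after = pearson(regress_out(gene0, factor), genotype)
print("gene 0 vs SNP, raw expression:   r =", round(r_before, 3))
print("gene 0 vs SNP, factor removed:   r =", round(r_after, 3))

# Use 2: treat the factor itself as a phenotype.
print("factor vs SNP:                   r =", round(pearson(factor, genotype), 3))
```

In this toy setting the hidden factor adds variance to gene 0 that is unrelated to the SNP, so removing the estimated factor should strengthen the genotype correlation. On real data, the PEER package would supply the factors in place of `first_factor`.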

Acknowledgements

We thank R. Brem and L. Kruglyak for providing genotype and expression phenotype data to be included alongside this protocol. This work received financial support from the Wellcome Trust (grant no. WT077192/Z/05/Z) and the Technical Computing Initiative (Microsoft Research). O.S. received funding from the Volkswagen Foundation.

Author information

Author notes

    • Oliver Stegle
    • Leopold Parts

    These authors contributed equally to this work.

Affiliations

  1. Max Planck Institute for Intelligent Systems, Tübingen, Germany.

    • Oliver Stegle

  2. Max Planck Institute for Developmental Biology, Tübingen, Germany.

    • Oliver Stegle

  3. Wellcome Trust Sanger Institute, Cambridge, UK.

    • Leopold Parts
    • Richard Durbin

  4. Pear Computer LLP, London, UK.

    • Matias Piipari

  5. Microsoft Research, Cambridge, UK.

    • John Winn

Authors

  1. Oliver Stegle

  2. Leopold Parts

  3. Matias Piipari

  4. John Winn

  5. Richard Durbin

Contributions

O.S., L.P., J.W. and R.D. designed the probabilistic models underlying the protocol. O.S., L.P. and M.P. developed the PEER software suite. O.S. and L.P. wrote the paper with input from all authors.

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Oliver Stegle.

Supplementary information

Zip files

  1. Supplementary Data 1

    Example dataset used to illustrate the protocol steps.

About this article

DOI

https://doi.org/10.1038/nprot.2011.457
