Article

Defining the consequences of genetic variation on a proteome-wide scale

Nature volume 534, pages 500–505 (23 June 2016)

Abstract

Genetic variation modulates protein expression through both transcriptional and post-transcriptional mechanisms. To characterize the consequences of natural genetic diversity on the proteome, here we combine a multiplexed, mass spectrometry-based method for protein quantification with an emerging outbred mouse model containing extensive genetic variation from eight inbred founder strains. By measuring genome-wide transcript and protein expression in livers from 192 Diversity Outbred mice, we identify 2,866 protein quantitative trait loci (pQTL) with twice as many local as distant genetic variants. These data support distinct transcriptional and post-transcriptional models underlying the observed pQTL effects. Using a sensitive approach to mediation analysis, we often identify a second protein or transcript as the causal mediator of distant pQTL. Our analysis reveals an extensive network of direct protein–protein interactions. Finally, we show that local genotype can provide accurate predictions of protein abundance in an independent cohort of Collaborative Cross mice.
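A pQTL is a locus whose genotype is associated with protein abundance. The sketch below shows how that association can be scored as a LOD statistic. It makes several simplifying assumptions. It uses a single biallelic marker coded as allele dosage (0, 1 or 2) and fits ordinary least squares. It does not use the eight-founder haplotype model or the kinship correction that Diversity Outbred mapping normally relies on (refs 27, 47). The function names and the 10 Mb window that separates local from distant pQTL are hypothetical choices for illustration and are not taken from the paper.

```python
import math


def lod_score(dosage, abundance):
    """Single-marker LOD score for an additive genotype effect.

    Compares an intercept-only model with y = a + b * dosage, both fitted by
    ordinary least squares: LOD = (n / 2) * log10(RSS_null / RSS_alt).
    Coding genotype as a 0/1/2 allele dosage is a simplifying assumption.
    """
    n = len(abundance)
    if n != len(dosage) or n < 3:
        raise ValueError("need paired genotype and abundance values, n >= 3")
    mean_x = sum(dosage) / n
    mean_y = sum(abundance) / n
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, abundance))
    rss_null = sum((y - mean_y) ** 2 for y in abundance)
    if sxx == 0 or rss_null == 0:
        return 0.0  # monomorphic marker or constant abundance: no signal
    rss_alt = rss_null - sxy * sxy / sxx  # residual SS after fitting the slope
    if rss_alt <= 0:
        return math.inf  # genotype explains abundance perfectly
    return (n / 2) * math.log10(rss_null / rss_alt)


def classify_pqtl(peak_chrom, peak_mb, gene_chrom, gene_mb, window_mb=10.0):
    """Call a pQTL 'local' if its peak is on the same chromosome as the gene
    encoding the protein and within window_mb of it; otherwise 'distant'.
    The 10 Mb default window is a hypothetical choice."""
    if peak_chrom == gene_chrom and abs(peak_mb - gene_mb) <= window_mb:
        return "local"
    return "distant"
```

As an example, take dosage `[0, 0, 1, 1, 2, 2, 0, 1, 2]` and abundances `[1.0, 1.2, 2.1, 1.9, 3.0, 3.2, 0.9, 2.0, 3.1]`. For these inputs `lod_score` returns roughly 8. Shuffling the genotypes to `[2, 0, 1, 0, 1, 2, 0, 2, 1]` drops the score below 1. A genome scan repeats this calculation at every marker and keeps the peak.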

Accessions

Data deposits

The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD002801 (http://www.proteomexchange.org/). Raw RNA-seq FASTQ files and processed gene-level data are archived at Gene Expression Omnibus (GEO) under accession number GSE72759. We implemented our mediation method as the R package intermediate, which can be freely downloaded from http://github.com/churchill-lab/intermediate. The Genotyping by RNA-seq (GBRS) software is available for download from https://github.com/churchill-lab/gbrs.
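The mediation method mentioned above is distributed as the `intermediate` R package. The Python sketch below does not call that package, and its algorithm may differ from the package's. It assumes one common formulation of mediation scoring. Each candidate mediator (another protein or transcript) is scored by how much the target protein's pQTL LOD drops when the candidate's abundance is added as a covariate. A true mediator absorbs most of the genotype signal, so its drop is large. The sketch uses a simplified single-marker, least-squares model with no kinship term. `mediation_scan` and its helper functions are hypothetical names.

```python
import math


def _ols_residuals(y, x):
    """Residuals of y after an ordinary least-squares fit on x with an intercept."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    return [yi - mean_y - slope * (xi - mean_x) for xi, yi in zip(x, y)]


def lod(y, x):
    """LOD = (n / 2) * log10(RSS without x / RSS with x); intercept in both models."""
    n = len(y)
    mean_y = sum(y) / n
    rss_null = sum((yi - mean_y) ** 2 for yi in y)
    rss_alt = sum(r * r for r in _ols_residuals(y, x))
    if rss_alt <= 0.0:
        return math.inf
    return (n / 2) * math.log10(rss_null / rss_alt)


def conditional_lod(y, x, m):
    """LOD for genotype x after adjusting both y and x for covariate m.

    By the Frisch-Waugh-Lovell theorem, regressing the m-adjusted y on the
    m-adjusted x leaves the same residual sum of squares as fitting y ~ m + x.
    """
    return lod(_ols_residuals(y, m), _ols_residuals(x, m))


def mediation_scan(target, dosage, candidates):
    """Return {name: LOD drop} for each candidate mediator, largest drop first."""
    base = lod(target, dosage)
    drops = {
        name: base - conditional_lod(target, dosage, values)
        for name, values in candidates.items()
    }
    return dict(sorted(drops.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    # Toy data: mediator = genotype + noise, target = mediator + noise.
    dosage = [0, 0, 1, 1, 2, 2, 0, 1, 2, 0, 1, 2]
    mediator = [0.1, -0.1, 1.2, 0.8, 2.1, 1.9, -0.1, 1.1, 1.8, 0.2, 1.0, 2.0]
    target = [0.15, -0.15, 1.15, 0.85, 2.15, 1.85, -0.05, 1.05, 1.85, 0.15, 1.05, 1.95]
    unrelated = [1, -1, 1, -1, 1, -1, 1, 1, 1, -1, -1, -1]  # orthogonal to dosage
    print(mediation_scan(target, dosage, {"mediator": mediator, "unrelated": unrelated}))
```

On this toy data the genuine mediator shows a LOD drop of about 9. The unrelated candidate shows a slightly negative drop. This is expected: adjusting for a covariate that is uncorrelated with genotype can only remove noise, so the pQTL LOD rises rather than falls.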

References

  1. Central dogma of molecular biology. Nature 227, 561–563 (1970)
  2. Correlation between protein and mRNA abundance in yeast. Mol. Cell. Biol. 19, 1720–1730 (1999)
  3. Global quantification of mammalian gene expression control. Nature 473, 337–342 (2011)
  4. Comparative analysis of proteome and transcriptome variation in mouse. PLoS Genet. 7, e1001393 (2011)
  5. Integrative phenomics reveals insight into the structure of phenotypic diversity in budding yeast. Genome Res. 23, 1496–1504 (2013)
  6. Deep proteomics of the Xenopus laevis egg using an mRNA-derived reference database. Curr. Biol. 24, 1467–1475 (2014)
  7. System-wide molecular evidence for phenotypic buffering in Arabidopsis. Nat. Genet. 41, 166–167 (2009)
  8. Genetics of global gene expression. Nat. Rev. Genet. 7, 862–872 (2006)
  9. Genetic dissection of transcriptional regulation in budding yeast. Science 296, 752–755 (2002)
  10. Genetic analysis of genome-wide variation in human gene expression. Nature 430, 743–747 (2004)
  11. Genetics of gene expression surveyed in maize, mouse and man. Nature 422, 297–302 (2003)
  12. Genetical genomics: the added value from segregation. Trends Genet. 17, 388–391 (2001)
  13. Complex trait analysis of gene expression uncovers polygenic and pleiotropic networks that modulate nervous system function. Nat. Genet. 37, 233–242 (2005)
  14. Genetic variation shapes protein networks mainly through non-transcriptional mechanisms. PLoS Biol. 9, e1001144 (2011)
  15. Genetic basis of proteome variation in yeast. Nat. Genet. 39, 1369–1375 (2007)
  16. Protein quantification across hundreds of experimental conditions. Proc. Natl Acad. Sci. USA 106, 15544–15548 (2009)
  17. Multilayered genetic and omics dissection of mitochondrial activity in a mouse reference population. Cell 158, 1415–1430 (2014)
  18. Variation and genetic control of protein abundance in humans. Nature 499, 79–82 (2013)
  19. Quantitative trait loci underlying gene product variation: a novel perspective for analyzing regulation of genome expression. Genetics 137, 289–301 (1994)
  20. Genetics of single-cell protein abundance variation in large yeast populations. Nature 506, 494–497 (2014)
  21. MS3 eliminates ratio distortion in isobaric multiplexed quantitative proteomics. Nat. Methods 8, 937–940 (2011)
  22. MultiNotch MS3 enables accurate, sensitive, and multiplexed detection of differential expression across cancer cell line proteomes. Anal. Chem. 86, 7150–7158 (2014)
  23. The Collaborative Cross, a community resource for the genetic analysis of complex traits. Nat. Genet. 36, 1133–1137 (2004)
  24. The Diversity Outbred mouse population. Mamm. Genome 23, 713–718 (2012)
  25. Ten years of the collaborative cross. Genetics 190, 291–294 (2012)
  26. Mouse genomic variation and its effect on phenotypes and gene regulation. Nature 477, 289–294 (2011)
  27. Quantitative trait locus mapping methods for diversity outbred mice. G3 (Bethesda) 4, 1623–1633 (2014)
  28. A genetic and physiological study of impaired glucose homeostasis control in C57BL/6J mice. Diabetologia 48, 675–686 (2005)
  29. A spontaneous mutation in the nicotinamide nucleotide transhydrogenase gene of C57BL/6J mice results in mitochondrial redox abnormalities. Free Radic. Biol. Med. 63, 446–456 (2013)
  30. Deletion of nicotinamide nucleotide transhydrogenase: a new quantitive trait locus accounting for glucose intolerance in C57BL/6J mice. Diabetes 55, 2153–2156 (2006)
  31. The BioPlex Network: a systematic exploration of the human interactome. Cell 162, 425–440 (2015)
  32. Molecular basis for SNX-BAR-mediated assembly of distinct endosomal sorting tubules. EMBO J. 31, 4466–4480 (2012)
  33. Identification of the binding partners for flightless I, A novel protein bridging the leucine-rich repeat and the gelsolin superfamilies. J. Biol. Chem. 273, 7920–7927 (1998)
  34. A tissue-specific atlas of mouse protein phosphorylation and expression. Cell 143, 1174–1189 (2010)
  35. Genomic variation. Impact of regulatory variation from RNA to protein. Science 347, 664–667 (2015)
  36. Protein abundances are more conserved than mRNA abundances across diverse taxa. Proteomics 10, 4209–4212 (2010)
  37. Insights into the regulation of protein abundance from proteomic and transcriptomic analyses. Nat. Rev. Genet. 13, 227–232 (2012)
  38. Status and access to the Collaborative Cross population. Mamm. Genome 23, 706–712 (2012)
  39. The Collaborative Cross at Oak Ridge National Laboratory: developing a powerful resource for systems genetics. Mamm. Genome 19, 382–389 (2008)
  40. The Collaborative Cross, developing a resource for mammalian systems genetics: a status report of the Wellcome Trust cohort. Mamm. Genome 19, 379–381 (2008)
  41. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 5, 976–989 (1994)
  42. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat. Methods 4, 207–214 (2007)
  43. Open source clustering software. Bioinformatics 20, 1453–1454 (2004)
  44. Accelerating the inbreeding of multi-parental recombinant inbred lines generated by sibling matings. G3 (Bethesda) 2, 191–198 (2012)
  45. Haplotype probabilities in advanced intercross populations. G3 (Bethesda) 2, 199–202 (2012)
  46. RNA-Seq alignment to individualized genomes improves transcript abundance estimates in multiparent populations. Genetics 198, 59–73 (2014)
  47. QTLRel: an R package for genome-wide association studies in which relatedness is a concern. BMC Genet. 12, 66 (2011)
  48. Efficient computation of significance levels for multiple associations in large studies of correlated data, including genomewide association studies. Am. J. Hum. Genet. 75, 424–435 (2004)
  49. Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach. J. R. Stat. Soc. B 66, 187–205 (2004)
  50. The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. J. Pers. Soc. Psychol. 51, 1173–1182 (1986)
  51. Required sample size to detect the mediated effect. Psychol. Sci. 18, 233–239 (2007)
  52. Trans-acting regulatory variation in Saccharomyces cerevisiae and the role of transcription factors. Nat. Genet. 35, 57–64 (2003)
  53. Genic intolerance to functional variation and the interpretation of personal genomes. PLoS Genet. 9, e1003709 (2013)
  54. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 30, 1575–1584 (2002)
  55. Pfam: the protein families database. Nucleic Acids Res. 42, D222–D230 (2014)
  56. UniProt Consortium. UniProt Knowledgebase: a hub of integrated protein data. Database (Oxford) 2011, bar009 (2011)
  57. Gene ontology: tool for the unification of biology. Nat. Genet. 25, 25–29 (2000)
  58. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57, 289–300 (1995)

Acknowledgements

The authors thank L. Somes and M. Strobel for breeding the mice; S. Ciciotte, S. Daigle, J. Pereira, C. Snow, R. Lynch and H. Munger for extracting RNA and performing RNA-seq experiments; and A. Manichaikul for discussion on mediation analysis. Collaborative Cross (CC) strains used in this study were imported to The Jackson Laboratory (JAX) from the Systems Genetics Core Facility at the University of North Carolina (UNC), USA (ref. 38). Prior to their relocation to UNC, CC lines CC001, CC003 and CC017 were generated and bred at Oak Ridge National Laboratory, USA (ref. 39); CC line CC004 was generated and bred at Tel Aviv University, Israel (ref. 40). Research reported here was supported by Harvard Medical School, The Jackson Laboratory, and National Institutes of Health (NIH) grants under awards P50GM076468 (to G.A.C.), F32HD074299 (to S.C.M.), GM67945 (to S.P.G.) and U41HG006673 (to S.P.G. and E.L.H.).

Author information

Author notes

    • Joel M. Chick
    • Steven C. Munger

    These authors contributed equally to this work.

    • Gary A. Churchill
    • Steven P. Gygi

    These authors jointly supervised this work.

Affiliations

  1. Harvard Medical School, Boston, Massachusetts 02115, USA

    • Joel M. Chick
    • Edward L. Huttlin
    • Steven P. Gygi
  2. The Jackson Laboratory, Bar Harbor, Maine 04609, USA

    • Steven C. Munger
    • Petr Simecek
    • Kwangbom Choi
    • Daniel M. Gatti
    • Narayanan Raghupathy
    • Karen L. Svenson
    • Gary A. Churchill

Authors

  1. Joel M. Chick
  2. Steven C. Munger
  3. Petr Simecek
  4. Edward L. Huttlin
  5. Kwangbom Choi
  6. Daniel M. Gatti
  7. Narayanan Raghupathy
  8. Karen L. Svenson
  9. Gary A. Churchill
  10. Steven P. Gygi

Competing interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to Gary A. Churchill or Steven P. Gygi.

Supplementary information

Zip files

  1. Supplementary Tables: this zipped file contains Supplementary Tables 1–9.

About this article

DOI

https://doi.org/10.1038/nature18270
