Large-scale gene function analysis with the PANTHER classification system

Abstract

The PANTHER (protein annotation through evolutionary relationship) classification system (http://www.pantherdb.org/) is a comprehensive system that combines gene function, ontology, pathways and statistical analysis tools, enabling biologists to analyze large-scale, genome-wide data from sequencing, proteomics or gene expression experiments. The system is built from 82 complete genomes organized into gene families and subfamilies; their evolutionary relationships are captured in phylogenetic trees, multiple sequence alignments and statistical models (hidden Markov models, or HMMs). Genes are classified according to their function in several different ways: families and subfamilies are annotated with ontology terms (Gene Ontology (GO) and PANTHER protein class), and sequences are assigned to PANTHER pathways. The PANTHER website includes a suite of tools that enable users to browse and query gene functions, and to analyze large-scale experimental data with a number of statistical tests. It is widely used by bench scientists, bioinformaticians, computer scientists and systems biologists. In the 2013 release of PANTHER (v.8.0), in addition to updating the data content, we redesigned the website interface to improve both the user experience and the system's analytical capability. This protocol provides a detailed description of how to analyze genome-wide experimental data with the PANTHER classification system.

Figure 1: Overview of PANTHER infrastructure.
Figure 2: Sample phylogenetic tree from PANTHER.
Figure 3: Example of a PANTHER pathway diagram.
Figure 4
Figure 5: Results of functional classification displayed as a gene list page.
Figure 6: PANTHER pie chart results using Supplementary Data 1 as the input gene list file.
Figure 7
Figure 8: Summary of the results from the statistical overrepresentation test displayed in a table.
Figure 9: Results from the statistical enrichment test.
Figure 10: Graph view of results from the statistical enrichment test.
Figure 11: Results from the statistical enrichment test as visualized in the pathway diagram.
Figure 12: Results of the statistical overrepresentation test viewed in the PANTHER pathway, 'Heterotrimeric G protein signaling pathway – Gi-α and Gs-α–mediated pathway (PANTHER accession code P00026)'.

Acknowledgements

We thank the Reference Proteome team, especially C. McAnulla and M. Martin, for their support in providing the up-to-date Reference Proteome data set, and we thank Y. Matsuoka and K. Manami from the Systems Biology Institute Japan for their support with CellDesigner and pathway file updates. This work is supported by the US National Institutes of Health (NIH)/National Institute of General Medical Sciences (NIGMS) grant no. GM081084 to P.D.T. Funding for open access was provided by the University of Southern California.

Author information

Contributions

A.M. developed the software code for the website. J.T.C. maintained the database and web servers. H.M. generated the content of the system and supervised the project. P.D.T. provided the funding and supervised the project. H.M. wrote the manuscript with contributions from all the authors.

Corresponding author

Correspondence to Huaiyu Mi.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Data 1

File containing 555 gene IDs, 500 of which can be classified in PANTHER. It can be used as a sample upload file to test the functional classification tools and the overrepresentation tool. (TXT 10 kb)
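As a rough illustration of what an overrepresentation test computes for a gene list like this one, the sketch below uses a one-sided binomial test. The choice of statistic is an assumption (the text here does not specify it), the function names are hypothetical, and the category counts in the example are invented for illustration rather than taken from PANTHER.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p).

    Exact summation; suitable for lists of a few hundred genes.
    Very large n can overflow when the coefficient is converted to float.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def overrepresentation_p(hits_in_list: int, list_size: int,
                         hits_in_reference: int, reference_size: int) -> float:
    """One-sided p-value that a category is overrepresented in the list.

    The expected frequency of the category is estimated from the reference set.
    """
    p_expected = hits_in_reference / reference_size
    return binomial_upper_tail(hits_in_list, list_size, p_expected)


# Hypothetical counts: 500 classified genes in the uploaded list, 30 of them in
# some category; 400 of 20,000 reference genes fall in the same category.
expected_hits = 500 * 400 / 20000
p_value = overrepresentation_p(30, 500, 400, 20000)
print(f"expected {expected_hits:.0f} hits, observed 30, p = {p_value:.2e}")
```

When many categories are tested at once, the resulting p-values would normally need a multiple-testing correction, which this sketch omits.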

Supplementary Data 2

File containing 19,911 genes, with IDs in the first column and numeric experimental values in the second column. This file can be used as a sample upload file for all tools. (TXT 381 kb)
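A file in this two-column layout (ID, then numeric value) can be read and passed to a rank-based enrichment test along the following lines. This is a minimal sketch under stated assumptions: it uses a Mann-Whitney U (Wilcoxon rank-sum) comparison with a normal approximation and no tie correction, and the gene IDs, values, category membership and function names are hypothetical, not drawn from the supplementary file or from PANTHER.

```python
import io
from math import erfc, sqrt


def read_gene_values(handle):
    """Parse lines of 'gene_id <whitespace> value' into a dict of floats."""
    values = {}
    for line in handle:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        try:
            values[fields[0]] = float(fields[1])
        except ValueError:
            continue  # skip header rows or non-numeric values
    return values


def average_ranks(xs):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(group, rest):
    """Return (U for `group`, two-sided p-value by normal approximation)."""
    n1, n2 = len(group), len(rest)
    ranks = average_ranks(list(group) + list(rest))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    return u, erfc(abs(z) / sqrt(2))


# Hypothetical input standing in for an uploaded file.
sample = io.StringIO(
    "GENE_A\t2.5\nGENE_B\t1.9\nGENE_C\t-0.3\nGENE_D\t0.1\nGENE_E\t-1.2\nGENE_F\t0.4\n"
)
values = read_gene_values(sample)
category = {"GENE_A", "GENE_B"}  # hypothetical category membership
in_category = [v for g, v in values.items() if g in category]
others = [v for g, v in values.items() if g not in category]
u_stat, p_value = mann_whitney_u(in_category, others)
print(f"U = {u_stat}, p = {p_value:.3f}")
```

The normal approximation is only reasonable for larger groups; for a real analysis, a library routine with exact or tie-corrected p-values would be the safer choice.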

About this article

Cite this article

Mi, H., Muruganujan, A., Casagrande, J. et al. Large-scale gene function analysis with the PANTHER classification system. Nat Protoc 8, 1551–1566 (2013). https://doi.org/10.1038/nprot.2013.092
