The HUPO PSI's Molecular Interaction format—a community standard for the representation of protein interaction data

A major goal of proteomics is the complete description of the protein interaction network underlying cell physiology. A large number of small-scale and, more recently, large-scale experiments have contributed to expanding our understanding of the nature of the interaction network. However, the necessary data integration across experiments is currently hampered by the fragmentation of publicly available protein interaction data, which exists in different formats in databases, on authors' websites or sometimes only in print publications. Here, we propose a community standard data model for the representation and exchange of protein interaction data. This data model has been jointly developed by members of the Proteomics Standards Initiative (PSI), a work group of the Human Proteome Organization (HUPO), and is supported by major protein interaction data providers, in particular the Biomolecular Interaction Network Database (BIND), Cellzome (Heidelberg, Germany), the Database of Interacting Proteins (DIP), Dana Farber Cancer Institute (Boston, MA, USA), the Human Protein Reference Database (HPRD), Hybrigenics (Paris, France), the European Bioinformatics Institute's (EMBL-EBI, Hinxton, UK) IntAct, the Molecular Interactions (MINT, Rome, Italy) database, the Protein-Protein Interaction Database (PPID, Edinburgh, UK) and the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING, EMBL, Heidelberg, Germany).


Figure 5: Graphical representation of XML document structure.
Figure 1: Graphical representation of the PSI MI format.
Figure 2: PSI MI example file.
Figure 3: 'Interaction detection' controlled vocabulary.
Figure 4: The PIMWalker network visualization tool.




This work was supported in part by EU grant number QLRI-CT-2001-00015 under the Research and Technological Development program 'Quality of Life and Management of Living Resources'. The PSI meetings were supported by the Human Proteome Organization. The work at the University of Rome 'Tor Vergata' was supported by grants from Associazione Italiana per la Ricerca sul Cancro and grant GTF02011 from Telethon. M.L. is supported by the European Molecular Biology Laboratory International PhD program and Biotechnology and Biological Sciences Research Council grant 8/C19399. Y.L. and R.Z. are supported by grants 2001AA233031, 2002CB512801, 110CB510209. M.V.'s laboratory is supported by grants from the US National Cancer Institute and National Human Genome Research Institute. L.M.-P. would like to thank Jens Pedersen, Claudia Bagni, Benedetta Mattei, Elena Santonico, Federico Demasi and Michael Ashburner for contributions to the controlled vocabularies. Emmanuel Cézanne, Sébastien Cros, Claire Even, Nicolas Jolibert, Sandrine Marquès, Christophe Roumegous, Patrick Sablayrolles and René Thomas-Nelson contributed to the development of the PSI XSLT utilities. The collaborative development process has been facilitated by the infrastructure provided by Source Forge.

Author information

Correspondence to Henning Hermjakob.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

Institute of Bioinformatics, International Tech Park, Whitefield Road, 560 066 Bangalore, India.
