PMD: A Resource for Archiving and Analyzing Protein Microarray Data

Protein microarray is a powerful technology for both basic research and clinical study. However, because there is no database specifically tailored for protein microarrays, the majority of the valuable original protein microarray data is still not publicly accessible. To address this issue, we constructed the Protein Microarray Database (PMD), which is specifically designed for archiving and analyzing protein microarray data. In PMD, users can easily browse and search the entire database by experimental name, protein microarray type, and sample information. Additionally, PMD integrates several data analysis tools and provides an automated data analysis pipeline for users. With just one click, users can obtain a comprehensive analysis report for their protein microarray data. The report includes preliminary data analysis, such as data normalization and candidate identification, and an in-depth bioinformatics analysis of the candidates, which includes functional annotation, pathway analysis, and protein-protein interaction network analysis. PMD is now freely available at www.proteinmicroarray.cn.

technology are also actively collected in PMD and freely available to all users. We strongly believe that this database could be a valuable resource for the research community. With the addition of the bioinformatics tools and the latest publications, PMD could serve as a unique portal for protein microarray technology.

Results
PMD web interface. The home page of PMD is a web-browser-based interface for database administration, data submission and storage, and query processing (Fig. 1A). Users can access the entire database by browsing from the home page or by submitting a search query. To browse PMD, users can select the "Experiment" or "Array" option on the home page, which lists the data by experiment name (or title, as shown in Fig. 1B) or by protein microarray (Fig. 1C), respectively.
In addition, we are collecting protein microarray data from other databases, namely GEO and ArrayExpress, and from publications. Researchers who are developing their own protein microarrays or applying protein microarrays in their own research are highly encouraged to submit their original data to PMD. Following the archiving standards in PMD, users can submit their data either by microarray experiment (Fig. 1B) or by microarray (Fig. 1C). PMD began to accept data in May 2014 and now holds 137 experimental projects and 156 protein microarrays.
Analysis tools implemented in PMD database. PMD is not only a dedicated resource for archiving protein microarray data, but also a unique platform for integrated analysis. As with DNA microarrays, the raw data of protein microarrays are usually stored in two major formats: gpr files (GenePix) and txt files (Agilent). In PMD, we encourage users to provide their raw data as gpr files. Because raw data have to be processed before further analysis, PMD provides a standard data processing and normalization protocol for new users. PMD uses specifically designed R scripts for raw data normalization and for identification of "differentially expressed proteins". Here, "differentially expressed proteins" refers to proteins that show statistically significant differences between control microarrays and experimental microarrays. Additionally, PMD provides bioinformatics tools for protein annotation and pathway analysis by combining the Database for Annotation, Visualization and Integrated Discovery (DAVID) 17, the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) 18, and Protein ANalysis THrough Evolutionary Relationships (PANTHER) 19. All of these analyses are performed automatically once the raw data are uploaded.
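As an illustration of the kind of processing described above, the following Python sketch normalizes a set of arrays and flags candidate proteins. It is not PMD's pipeline: PMD uses its own R scripts, and the text does not state which normalization or statistical test they apply. The median scaling, Welch's t statistic, both cutoffs, and all names and signal values below are assumptions chosen only for illustration.

```python
import math
import statistics


def median_normalize(arrays):
    """Rescale each array so that all arrays share the same median signal.

    `arrays` maps array name -> {protein ID: positive foreground signal}.
    Median scaling is an assumed choice, not PMD's documented method.
    """
    medians = {name: statistics.median(list(spots.values()))
               for name, spots in arrays.items()}
    target = statistics.mean(list(medians.values()))
    return {name: {pid: signal * target / medians[name]
                   for pid, signal in spots.items()}
            for name, spots in arrays.items()}


def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    diff = statistics.mean(a) - statistics.mean(b)
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    if se == 0:  # identical replicates: avoid dividing by zero
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def find_candidates(arrays, control, experiment,
                    min_abs_log2_fc=1.0, min_abs_t=3.0):
    """Return (protein ID, log2 fold change) pairs passing both cutoffs.

    Both cutoffs are hypothetical defaults. Each group needs at least
    two arrays so that a variance can be computed.
    """
    norm = median_normalize(arrays)
    hits = []
    for pid in norm[control[0]]:
        ctrl = [math.log2(norm[name][pid]) for name in control]
        expt = [math.log2(norm[name][pid]) for name in experiment]
        log2_fc = statistics.mean(expt) - statistics.mean(ctrl)
        if (abs(log2_fc) >= min_abs_log2_fc
                and abs(welch_t(expt, ctrl)) >= min_abs_t):
            hits.append((pid, round(log2_fc, 2)))
    return hits


# Invented toy signals: array "e2" was scanned about twice as bright overall,
# and protein "P5" is strongly elevated in the experimental group.
arrays = {
    "c1": {"P1": 100, "P2": 200, "P3": 300, "P4": 400, "P5": 500},
    "c2": {"P1": 110, "P2": 190, "P3": 300, "P4": 410, "P5": 490},
    "c3": {"P1": 90, "P2": 210, "P3": 300, "P4": 390, "P5": 510},
    "e1": {"P1": 105, "P2": 195, "P3": 300, "P4": 405, "P5": 4000},
    "e2": {"P1": 190, "P2": 410, "P3": 600, "P4": 790, "P5": 8400},
    "e3": {"P1": 100, "P2": 200, "P3": 300, "P4": 400, "P5": 3800},
}
hits = find_candidates(arrays,
                       control=["c1", "c2", "c3"],
                       experiment=["e1", "e2", "e3"])
print(hits)
```

In this toy example the overall brightness difference of array "e2" is removed by the scaling step, so only the protein that differs consistently between the two groups is reported. A real pipeline would also need background correction, handling of replicate spots, and multiple-testing correction, none of which are shown here.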
To show clearly how to use these analysis tools, we use a set of Homo sapiens proteome microarray data 20 (PMD ID: PMDE78) as an example (Fig. 2A). After the data are submitted to PMD and the experimental and control groups are indicated, PMD automatically performs the analysis and generates the list of "differentially expressed proteins". The list contains basic annotation, such as UniProt ID, Pfam information, Protein Data Bank (PDB) ID, and post-translational modifications (Fig. 2B). Going one step further, PMD automatically performs in-depth bioinformatics analysis based on this list. One can easily identify significantly enriched pathways with PANTHER (Fig. 2C), enriched gene ontology (GO) terms with DAVID (Fig. 2D), and the protein-protein interaction (PPI) network with STRING (Fig. 2E). These results are included in a complete report, which is automatically sent to the user.
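The idea behind pathway or GO enrichment of a candidate list can be sketched as a simple over-representation test. The Python example below computes a hypergeometric tail probability. It is only a generic illustration: PANTHER, DAVID, and STRING each apply their own statistics and corrections, and the function name and the counts used here are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(hits_in_set, hits_total, set_size, background_size):
    """Probability of seeing at least `hits_in_set` pathway members among
    `hits_total` candidates drawn without replacement from a background of
    `background_size` proteins, `set_size` of which belong to the pathway.
    """
    upper = min(hits_total, set_size)
    tail = sum(
        comb(set_size, k) * comb(background_size - set_size, hits_total - k)
        for k in range(hits_in_set, upper + 1)
    )
    return tail / comb(background_size, hits_total)


# Hypothetical counts: 3 candidates, all 3 in a 4-protein pathway,
# on an array carrying 10 proteins in total.
p_value = hypergeom_enrichment_p(hits_in_set=3, hits_total=3,
                                 set_size=4, background_size=10)
print(p_value)
```

A small probability suggests that the pathway is over-represented among the candidates. When many pathways or GO terms are tested at once, the resulting values would normally be corrected for multiple testing.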

Discussion
Compared with experiments using DNA microarrays, protein microarray experiments employ more diverse types of arrays and are designed for a wider range of applications in both basic research and clinical studies. In this study, we report a database specifically designed for protein microarrays, named PMD. PMD has the following features: (I) it is a unique platform specifically designed for archiving original protein microarray data, and so it can promote data sharing among the proteomics community; (II) it provides standards and guidelines specifically tailored for the archiving and storage of protein microarray data; (III) multiple software components have been combined to construct an automated data analysis pipeline (Fig. 3). This pipeline is specific for protein microarrays, in contrast to the data analysis part of the GEO database, which is more generally designed for DNA microarrays. In addition, the latest research publications about protein microarray technology development and application are also actively collected in PMD. With PMD, one can access all of the related information and the original protein microarray data in a "one-stop" fashion, with the capability of "one-click" data analysis. We strongly believe that PMD is a valuable resource for the research community, promoting protein microarray data sharing and facilitating data analysis.

Methods
Data acquisition and storage. The protein microarray data in PMD are obtained from 3 sources: the GEO/ArrayExpress databases, the scientific literature, and user contributions. PMD integrates GEO/ArrayExpress protein microarray data based on publications. Accordingly, several related datasets that are cited in a single publication are stored as one experiment project in PMD. PMD is also devoted to collecting protein microarray data that are associated with publications but are not publicly available. To manage and share the protein microarray data conveniently, we implemented archiving standards for protein microarrays in PMD with specific modifications. These standards contain 6 critical elements: experiment name, provider, array type, sample type, microarray annotation, and raw data. Among these elements, array type and sample type are specifically designed for protein microarrays, corresponding to the diverse types and applications of protein microarrays.
Figure 3. Work flow for PMD analysis tools. The PMD analysis tools form an automated data analysis pipeline for protein microarrays. After protein microarray data are submitted to the database, with one click PMD automatically stores the experimental and array information, normalizes the raw data, and runs the implemented analysis tools. In the end, users receive a complete report containing a list of "differentially expressed proteins" and the results of all the in-depth bioinformatics analyses.
Scientific Reports | 6:19956 | DOI: 10.1038/srep19956
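The six archiving elements listed under "Data acquisition and storage" can be pictured as a single submission record. The Python sketch below is one possible representation with a basic completeness check. The class name, field names, field types, and example values are all hypothetical. The text names the six elements but does not describe how PMD encodes or validates them.

```python
from dataclasses import dataclass, fields


@dataclass
class ArchiveRecord:
    """One submission carrying the six archiving elements (assumed layout)."""
    experiment_name: str
    provider: str
    array_type: str             # specific to protein microarrays
    sample_type: str            # specific to protein microarrays
    microarray_annotation: str  # e.g. name of an annotation file
    raw_data: str               # e.g. name of a gpr file


def missing_elements(record):
    """Return the names of elements that were left empty."""
    return [f.name for f in fields(record)
            if not str(getattr(record, f.name)).strip()]


# Invented example values; sample_type is deliberately left blank.
record = ArchiveRecord(
    experiment_name="Example serum profiling",
    provider="Example Lab",
    array_type="proteome microarray",
    sample_type="",
    microarray_annotation="annotation.txt",
    raw_data="array_01.gpr",
)
print(missing_elements(record))
```

A check of this kind would let a submission form reject incomplete records before any raw data are stored.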
Database architecture and web interface. The collected protein microarray data are stored in a MySQL relational database. The information and raw data stored in PMD can be easily queried and downloaded through a user-friendly web interface. The front-end of PMD was constructed using Hypertext Preprocessor (PHP), while its back-end was built on the Joomla framework, running on an Nginx web server. The PMD architecture contains 3 major components: experimental management, metadata, and analysis tools.
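To make the relational layout and the search by experiment name, array type, and sample type more concrete, the following Python sketch builds a two-table schema and a parameterized query. It is an assumption-laden illustration, not PMD's schema: the table names, column names, and rows are hypothetical, and SQLite (through Python's standard `sqlite3` module) stands in for MySQL so that the example runs without a database server.

```python
import sqlite3

# Hypothetical schema: one experiment project groups several microarrays.
SCHEMA = """
CREATE TABLE experiment (
    id       INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    provider TEXT NOT NULL
);
CREATE TABLE microarray (
    id            INTEGER PRIMARY KEY,
    experiment_id INTEGER NOT NULL REFERENCES experiment(id),
    array_type    TEXT NOT NULL,
    sample_type   TEXT NOT NULL,
    raw_data_file TEXT NOT NULL
);
"""


def demo_connection():
    """Create an in-memory database filled with invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO experiment (id, name, provider) VALUES (?, ?, ?)",
        [(1, "Serum autoantibody profiling", "Lab A"),
         (2, "Kinase substrate screen", "Lab B")],
    )
    conn.executemany(
        "INSERT INTO microarray VALUES (?, ?, ?, ?, ?)",
        [(1, 1, "proteome microarray", "serum", "exp1_a.gpr"),
         (2, 1, "proteome microarray", "serum", "exp1_b.gpr"),
         (3, 2, "peptide microarray", "cell lysate", "exp2_a.gpr")],
    )
    return conn


def search(conn, name_contains=None, array_type=None, sample_type=None):
    """Search by any combination of the three criteria, using bound parameters."""
    clauses, params = [], []
    if name_contains is not None:
        clauses.append("e.name LIKE ?")
        params.append(f"%{name_contains}%")
    if array_type is not None:
        clauses.append("a.array_type = ?")
        params.append(array_type)
    if sample_type is not None:
        clauses.append("a.sample_type = ?")
        params.append(sample_type)
    sql = ("SELECT e.name, a.array_type, a.sample_type, a.raw_data_file "
           "FROM experiment e JOIN microarray a ON a.experiment_id = e.id")
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY e.name, a.id"
    return conn.execute(sql, params).fetchall()


conn = demo_connection()
for row in search(conn, array_type="peptide microarray"):
    print(row)
```

Keeping experiments and microarrays in separate tables mirrors the two browsing modes of the web interface (by experiment and by array), and bound parameters keep user-supplied search terms out of the SQL text.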