AOD: the antioxidant protein database

An antioxidant is a molecule that can prevent free radicals from causing damage in organisms. The growing number of studies on antioxidants calls for a specialized database, which is not yet readily available. To this end, in the present study, the Antioxidant Database (AOD) was developed to help researchers understand and reveal the biological functions of antioxidant proteins. AOD is freely available at http://lin.uestc.edu.cn/AODdatabase/index.aspx. The current release of AOD consists of 710 antioxidant proteins. Information including taxonomy, source organism, subcellular location, gene ontology, catalytic activity and function of antioxidant proteins is extracted from UniProtKB/Swiss-Prot and captured in AOD. In addition, two web-based tools for sequence similarity search and computational identification of antioxidants are integrated in AOD. We believe that AOD will greatly facilitate research on antioxidants.


Results and Discussions
Data statistics. Among the 710 antioxidant proteins in AOD, 458 are from eukaryota, 221 from bacteria, 28 from archaea and 3 from viruses (Fig. 1a). In AOD, 94% of the antioxidant proteins contain more than 100 amino acids, and the average length is 227. The longest antioxidant protein (A1IGV8) has 1,463 amino acids and the shortest one (P83233) has only 8 amino acids. The length distribution of antioxidant proteins in AOD is shown in Fig. 1b.
The ratio of the frequency of amino acids in AOD to the frequency of amino acids in UniProtKB/Swiss-Prot protein knowledgebase is shown in Fig. 1c. We found that antioxidant proteins are enriched in residues Cys, Gly, His and Val.
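The frequency ratio described above can be illustrated with a minimal sketch. It assumes both sequence sets are available as plain strings of one-letter residue codes. The function names and the toy sequences are hypothetical and are not taken from AOD or UniProtKB/Swiss-Prot.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequences):
    """Return the relative frequency of each standard residue in a sequence set."""
    counts = Counter()
    for seq in sequences:
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard residues found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def enrichment(foreground, background):
    """Ratio of foreground to background frequency; values above 1 mean enrichment."""
    fg = composition(foreground)
    bg = composition(background)
    # Residues absent from the background are skipped to avoid division by zero.
    return {aa: fg[aa] / bg[aa] for aa in AMINO_ACIDS if bg[aa] > 0}


# Toy data only: two made-up "antioxidant" sequences and a uniform background.
toy_foreground = ["CCGH", "GVCH"]
toy_background = ["ACDEFGHIKLMNPQRSTVWY"]
ratios = enrichment(toy_foreground, toy_background)

for aa, ratio in sorted(ratios.items(), key=lambda item: -item[1])[:4]:
    print(f"{aa}: {ratio:.2f}")
```

In practice the foreground would be the 710 AOD sequences and the background the full UniProtKB/Swiss-Prot set; a ratio above 1 for a residue corresponds to the enrichment reported in Fig. 1c.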
Scientific Reports | 7: 7449 | DOI:10.1038/s41598-017-08115-6
The subcellular location of an antioxidant protein is closely correlated with its biological activities 17. Therefore, information about the subcellular location of antioxidant proteins is also provided in AOD. Four hundred and eighty-seven antioxidant proteins in AOD have annotated subcellular location information; 76% of them reside in a single subcellular location and the other 24% in multiple subcellular locations (Fig. 1d).
Quality assessment. To evaluate the quality of AOD, the 710 antioxidant proteins in the database were manually checked. We found that 609 of them have been reported in previous studies, as indicated in UniProtKB/Swiss-Prot. The UniProt IDs of these 609 proteins are listed in Supplementary Table S1. As indicated in UniProtKB/Swiss-Prot, the remaining 101 antioxidant proteins were evaluated by experimental methods at the transcript level, which supports the existence of these proteins.
Browse. Clicking the 'Browse' button displays the antioxidant proteins in AOD page by page. Antioxidant proteins can be selected by clicking the buttons on their left. For each entry, the protein name, taxonomy, organism, subcellular location, gene ontology, catalytic activity, function, amino acid sequence and links to external databases such as Swiss-Model Repository, UniProt, Gene Ontology and NCBI can be viewed by clicking on the sequence ID or the 'Show' button in the top right corner. The selected antioxidant proteins can also be downloaded to a local machine in FASTA format by clicking the 'Download' button.
Search. AOD can be searched in multiple ways. For a basic search, users restrict the search field to UniProt ID, Taxonomy, Organism, Subcellular locations or Protein names and enter the related query keyword. For a conditional search, users can add (or remove) search fields using the Add (or Remove) button and then join the multiple query keywords with the logical operators AND/OR.
Tools. AOD integrates two web-based tools for further analyses, i.e. sequence similarity search and identification of antioxidants. We have integrated BLAST 18 in AOD so that users can run a BLAST search against the sequences deposited in AOD, which helps to find sequences in the database that are highly similar to the query sequence.
To the best of our knowledge, AodPred 19 is the most advanced computational tool for identifying antioxidants at present. For the convenience of experimental scientists, the AodPred predictor was also integrated in AOD. Users can employ it to predict whether a query protein sequence is an antioxidant or not. After the query sequences are entered in FASTA format and the 'Submit' button is clicked, the predicted probability of each sequence being an antioxidant or a non-antioxidant is shown on a new page.

Conclusions
In the present study, a comprehensive database called AOD was built to provide useful insights into the study of antioxidant proteins and to help researchers understand how the different properties of antioxidants contribute to their antioxidative activities. AOD is the first database providing information on antioxidants from multiple perspectives. We hope that AOD will better serve research on antioxidant proteins. To build an encyclopedia-like database for antioxidants, we will continue to collect information on new antioxidant proteins and add it to AOD.

Methods and Materials
Data collection. Sequences of antioxidant proteins, together with their taxonomy, source organism, subcellular location, gene ontology, catalytic activity and function, were obtained from the UniProtKB/Swiss-Prot database (release 2016_11) 20, 21 by searching for the keyword "antioxidant". To obtain high-quality data, the following steps were performed: (i) only proteins with experimentally confirmed antioxidative activities were included; (ii) proteins containing illegal letters, i.e., "B", "X" or "Z", were excluded. This procedure yielded 710 proteins with experimentally proven or confirmed antioxidative activity, which were included in AOD.
Database structure and interface. As a user-friendly database, AOD is freely available at http://lin.uestc.edu.cn/AODdatabase/index.aspx. The database main page contains the following interfaces: Home, Browse, Search, Tools, Statistics, Links and Contact. The protein name, taxonomy, source organism, subcellular location, sequence length, gene ontology, catalytic activity, protein sequence, function and links to external databases such as Swiss-Model Repository, Gene Ontology and NCBI are all provided in AOD.
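Step (ii) of the data collection, the exclusion of sequences containing "B", "X" or "Z", can be sketched as follows. This is only an illustration under the assumption that the candidate sequences are available as FASTA-formatted text. The function names and the two example records are hypothetical and are not the code or data used to build AOD. Step (i) depends on curated experimental evidence and is not covered here.

```python
# Letters treated as illegal in step (ii) of the data collection.
ILLEGAL_LETTERS = set("BXZ")


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def has_illegal_letters(sequence):
    """Return True if the sequence contains B, X or Z (case-insensitive)."""
    return any(ch in ILLEGAL_LETTERS for ch in sequence.upper())


def filter_records(text):
    """Keep only the records whose sequences are free of illegal letters."""
    return [(h, s) for h, s in parse_fasta(text) if not has_illegal_letters(s)]


# Made-up records for illustration only.
example = """>seq1 hypothetical record
MKCGHV
>seq2 hypothetical record
MKXGHV
"""

kept = filter_records(example)
for header, sequence in kept:
    print(header, sequence)
```

Here the second record is dropped because its sequence contains "X", while the first record is kept unchanged.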