PEPlife: A Repository of the Half-life of Peptides

Short half-life is one of the key challenges in the field of therapeutic peptides. Various studies have reported enhanced peptide stability achieved through methods such as chemical modification, D-amino acid substitution, cyclization and replacement of labile amino acids. Because these data are scattered across the literature, there is a pressing need for a repository dedicated to the half-life of peptides. To fill this gap, we have developed PEPlife (http://crdd.osdd.net/raghava/peplife), a manually curated resource of experimentally determined half-lives of peptides. PEPlife contains 2229 entries covering 1193 unique peptides. Each entry provides detailed information about the peptide, such as its name, sequence, half-life, modifications, the experimental assay used to determine its half-life, and its biological nature and activity. We also provide SMILES representations and structures of the peptides. Web-based modules offer user-friendly searching and browsing of the database. PEPlife integrates numerous analysis tools, such as BLAST, the Smith-Waterman algorithm, GGSEARCH, Jalview and MUSTANG. PEPlife should improve the understanding of the factors that affect the half-life of peptides, such as modifications, sequence, length and route of delivery. We anticipate that PEPlife will be useful for researchers working in the area of peptide-based therapeutics.


System and Methods
Data Collection. The data was manually collected and curated from published research articles and patents.
Only peptides whose half-life had been experimentally determined were included in the database. We queried PubMed for research articles and The Lens for patents. The query '(peptide[Title/Abstract] AND half-life[Title/Abstract])' was used to retrieve articles on the half-life of peptides from PubMed; it returned ~2280 articles as of November 2015. During initial screening, reviews and articles lacking relevant information were excluded. Around 900 potentially relevant papers were then scrutinized to extract the required fields, and data were finally curated from 335 articles. Similarly, the full texts of granted patents were obtained from The Lens and manually screened to identify patents with information relevant for curation. We also collected information about FDA-approved peptide drugs from DrugBank 35 and related literature.
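The PubMed query above can be reproduced programmatically through NCBI's public E-utilities ESearch endpoint. The sketch below only builds the request URL; the helper name `build_pubmed_search_url`, the lower date bound and the cut-off date `2015/11/30` are assumptions standing in for "as of November 2015", and the paper does not state that the authors used E-utilities. Because PubMed keeps indexing older records, running this query today will not exactly reproduce the ~2280 hits.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_pubmed_search_url(term: str, max_date: str, retmax: int = 10000) -> str:
    """Build an ESearch URL for `term`, limited to records published up to `max_date`.

    Hypothetical helper; the date bounds are assumptions, not taken from the paper.
    """
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",       # filter on publication date
        "mindate": "1800/01/01",  # hypothetical lower bound (mindate and maxdate go together)
        "maxdate": max_date,      # format YYYY/MM/DD
        "retmax": retmax,         # maximum number of PMIDs returned per request
        "retmode": "json",
    }
    return ESEARCH_URL + "?" + urlencode(params)

QUERY = "(peptide[Title/Abstract] AND half-life[Title/Abstract])"
url = build_pubmed_search_url(QUERY, max_date="2015/11/30")
print(url)
```

Fetching the URL (for example with `urllib.request.urlopen`) returns a JSON document whose `esearchresult` section holds the hit count and PMIDs, which would then go through the manual screening steps described above.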
In PEPlife, we have systematically compiled comprehensive information about each peptide, including its name, sequence, length, terminal and non-terminal modifications, biological property and the assay used to determine its half-life. To keep the information complete, we made separate entries for the same peptide when its bioactivity or half-life was tested under different concentrations, conditions, routes of administration, etc. These separate entries thus capture the influence of such subtle conditions on the half-life of peptides.
Database Architecture and Web Interface. PEPlife was built using the Apache HTTP server on a Linux platform. MySQL, a relational database management system (RDBMS), manages all the data in the backend and allows easy storage and retrieval. HTML, CSS, PHP and JavaScript were used to develop the front-end web interface. The architecture of PEPlife is shown in Fig. 1.
Database Content. The information in PEPlife can be categorized into two broad types: primary information and secondary information. The primary information was curated from the literature and consists of the following major fields: (i) PMID, (ii) the peptide sequence, (iii) the name of the peptide, (iv) the length of the peptide, (v) N-terminal modification, (vi) C-terminal modification, (vii) configuration (linear or cyclic), (viii) chirality of the amino acids, (ix) chemical modification, (x) origin of the peptide, (xi) biological activity of the peptide, (xii) half-life, (xiii) assay type, (xiv) sample in which the half-life was tested and (xv) patent ID.
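As an illustration of how the primary fields (i) to (xv) might be held in memory, here is a minimal record type. The class name, field names and validation rules are hypothetical; the actual PEPlife schema in MySQL is not described in the text. Either `pmid` or `patent_id` can be empty because entries come from articles or from patents.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PeptideEntry:
    """One curated record holding primary fields (i)-(xv); field names are hypothetical."""
    pmid: Optional[str]                      # (i) None for patent-derived entries
    sequence: str                            # (ii)
    name: str                                # (iii)
    length: int                              # (iv)
    n_terminal_modification: Optional[str]   # (v)
    c_terminal_modification: Optional[str]   # (vi)
    configuration: str                       # (vii) "linear" or "cyclic"
    chirality: str                           # (viii) e.g. "L", "D" or "mixed"
    chemical_modification: Optional[str]     # (ix)
    origin: Optional[str]                    # (x)
    biological_activity: Optional[str]       # (xi)
    half_life: str                           # (xii) kept as reported, with units
    assay: Optional[str]                     # (xiii)
    sample: Optional[str]                    # (xiv)
    patent_id: Optional[str]                 # (xv) None for article-derived entries

    def __post_init__(self) -> None:
        # Minimal sanity checks; the real curation rules are not described in the text.
        if self.configuration not in {"linear", "cyclic"}:
            raise ValueError(f"unknown configuration: {self.configuration!r}")
        if self.length <= 0:
            raise ValueError("length must be positive")

# Illustrative placeholder values, not a real PEPlife entry.
entry = PeptideEntry(
    pmid="00000000", sequence="ACDEFGHIK", name="example peptide", length=9,
    n_terminal_modification="Free", c_terminal_modification="Amidation",
    configuration="linear", chirality="L", chemical_modification=None,
    origin="Synthetic", biological_activity=None, half_life="30 min",
    assay="HPLC", sample="Human serum", patent_id=None,
)
```

Storing the half-life as reported text keeps the original units; a real analysis would normalize it to a single unit in a separate field.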
The secondary information, which was derived from the primary information, includes the tertiary structures of peptides. To obtain structural information, all peptide sequences in PEPlife were searched against the sequences in the Protein Data Bank (PDB) 36 . When an exact sequence match was found, the structure of the matching PDB entry was assigned to the query peptide; this approach yielded structures for 265 peptides. Where no identical sequence was available, we predicted the structures using PEPstrMOD 37 , an updated and extended version of PEPstr 38 that handles both natural and modified residues [38][39][40][41] . Because force-field libraries are unavailable for complex chemical modifications (e.g., PEGylation, penicillamine), the structures of peptides containing such modifications were not predicted. The structures of 36 peptides with fewer than five residues could not be predicted with this software. For these peptides, we built a linear conformation with the backbone dihedral angles φ and ψ set to 180°, subjected it to energy minimization and molecular dynamics simulation, and searched the whole trajectory for the minimum-energy conformation, which was taken as the final predicted structure. The structures of 132 peptides with more than 40 residues were predicted using the I-TASSER web service 42 . The tertiary structures of the peptide entries from DrugBank were predicted in the same way: of the 29 DrugBank entries, 6 were mapped to PDB, 9 were predicted using PEPstrMOD, and the remaining 14, which contain modified residues, were not predicted.
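The structure-assignment rules in the preceding paragraph can be summarized as a small decision function. This is a sketch under stated assumptions: the function name, the route labels and the order of the checks (PDB match first, then unsupported modifications, then length) are hypothetical, the modification list contains only the two examples named in the text, and `1XYZ` in the usage line is a placeholder PDB ID.

```python
from typing import Dict, Optional, Set

# Modifications named in the text as lacking force-field libraries; a real list would be longer.
UNSUPPORTED_MODIFICATIONS = {"pegylation", "penicillamine"}

def structure_route(sequence: str, modifications: Set[str],
                    pdb_index: Dict[str, str]) -> Optional[str]:
    """Return the route that would supply a 3D structure, or None if it is not predicted.

    pdb_index maps exact PDB sequences to PDB IDs. The order of the checks is an
    assumption of this sketch, not the authors' documented pipeline.
    """
    if sequence in pdb_index:                       # identical sequence already in PDB
        return "PDB:" + pdb_index[sequence]
    if {m.lower() for m in modifications} & UNSUPPORTED_MODIFICATIONS:
        return None                                 # no force-field parameters available
    if len(sequence) < 5:
        return "extended+MD"                        # phi = psi = 180 deg, minimization, MD
    if len(sequence) > 40:
        return "I-TASSER"
    return "PEPstrMOD"

print(structure_route("ACDEFGHIK", set(), {"ACDEFGHIK": "1XYZ"}))
```

Peptides of exactly five residues fall through to PEPstrMOD here, matching the text's statement that only peptides with fewer than five residues needed the extended-conformation route.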
The secondary structures of all peptides were assigned from their tertiary structures using the DSSP software 43 . DSSP assigns each residue to one of eight states (B: beta-bridge; C: loop; E: extended strand; G: 3/10 helix; H: alpha-helix; I: pi-helix; S: bend; T: turn). Secondary structure analysis of the predicted structures (excluding DrugBank peptides) revealed that peptide residues most frequently belong to loop regions (~32%), followed by helices (~30%), bends (~19%) and turns (~17%). Only a few residues were observed in strand regions (~2%). The predicted peptide structures were also converted into SMILES notation using the Open Babel software 44 .
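The residue-level percentages above come from pooling the DSSP state of every residue across peptides. The sketch below shows one way to compute such a composition from per-peptide DSSP strings. Grouping G and I with H as "helix" and B with E as "strand" is an assumption, since the text does not say how the eight states were collapsed; blank and `-` are treated as loop because DSSP output often uses them for coil.

```python
from collections import Counter
from typing import Dict, Iterable

# Assumed grouping of the eight DSSP states into the broad classes used in the text.
DSSP_CLASS = {
    "H": "helix", "G": "helix", "I": "helix",
    "E": "strand", "B": "strand",
    "T": "turn",
    "S": "bend",
    "C": "loop", " ": "loop", "-": "loop",
}

def class_composition(dssp_strings: Iterable[str]) -> Dict[str, float]:
    """Percentage of residues in each broad class, pooled over all peptides.

    Raises KeyError for states outside DSSP_CLASS (e.g. newer DSSP codes).
    """
    counts = Counter(DSSP_CLASS[state] for s in dssp_strings for state in s)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cls: 100.0 * n / total for cls, n in counts.items()}

print(class_composition(["HHHHCC", "EETS"]))
```

Pooling by residue means long peptides weigh more than short ones; averaging per-peptide compositions would give a different figure.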

Implementation of Web Tools.
Several tools have been integrated for data retrieval, similarity search and data analysis; the options available in PEPlife are briefly described below.
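Among the similarity-search tools listed in the abstract is the Smith-Waterman local alignment algorithm. The sketch below computes only the best local alignment score with a simple match/mismatch scheme and a linear gap penalty; the scoring values are assumptions for illustration. Production tools typically use a substitution matrix such as BLOSUM62 with affine gap penalties, and nothing here describes PEPlife's own configuration.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between sequences a and b (linear gap penalty).

    Keeps only two rows of the dynamic-programming matrix, so memory is O(len(b)).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero, so a new alignment can start anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

print(smith_waterman_score("KLAKLAK", "XXKLAKXX"))
```

Resetting negative scores to zero is what makes the alignment local, which suits peptide queries that match only part of a longer database sequence.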
Search Tools. Four modules are available under the Search option to facilitate data retrieval: Simple, Advanced, Peptide and SMILES. Simple Search allows users to search peptides by any field in the database and to select the fields displayed in the results. Advanced Search allows users to run complex, multi-condition queries to extract the desired entries. It supports the standard comparison operators (=, >, < and LIKE), and the outputs of different queries can be combined with the logical operators AND and OR. Peptide Search finds exact as well as substring matches of a query sequence among the peptide sequences in PEPlife. We also provide the structures of the peptides in SMILES format to help users understand peptide properties at the atom and bond level; SMILES Search allows users to search the database with a query peptide in SMILES format.
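The Advanced and Peptide Search behaviour described above can be sketched against a toy table. PEPlife itself uses MySQL behind PHP; Python's built-in `sqlite3` stands in here only to keep the example self-contained, and the table name, columns, helper names and rows are all hypothetical, not PEPlife data. Field names and operators are whitelisted while values are passed as bound parameters, which is a common way to keep such query builders safe from SQL injection.

```python
import sqlite3

# Hypothetical table layout with toy rows (not PEPlife data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, name TEXT, sequence TEXT, "
             "length INTEGER, nter TEXT, assay TEXT)")
conn.executemany(
    "INSERT INTO entries (name, sequence, length, nter, assay) VALUES (?, ?, ?, ?, ?)",
    [("pep-A", "ACDEFGHIK", 9, "Acetylation", "HPLC"),
     ("pep-B", "KLAKLAKKLAKLAK", 14, "Free", "ELISA"),
     ("pep-C", "GHIKLM", 6, "Acetylation", "HPLC")],
)

ALLOWED_FIELDS = {"name", "sequence", "length", "nter", "assay"}
ALLOWED_OPS = {"=", ">", "<", "LIKE"}

def advanced_search(conn, conditions, combine="AND"):
    """conditions: list of (field, operator, value); combine: 'AND' or 'OR'."""
    if combine not in {"AND", "OR"} or not conditions:
        raise ValueError("need at least one condition and combine in {AND, OR}")
    clauses, params = [], []
    for field, op, value in conditions:
        # Whitelist identifiers and operators; values go in as bound parameters.
        if field not in ALLOWED_FIELDS or op not in ALLOWED_OPS:
            raise ValueError(f"unsupported condition: {field} {op}")
        clauses.append(f"{field} {op} ?")
        params.append(value)
    sql = "SELECT name FROM entries WHERE " + f" {combine} ".join(clauses) + " ORDER BY id"
    return [row[0] for row in conn.execute(sql, params)]

def peptide_search(conn, query):
    """Return (exact matches, entries containing query as a proper substring)."""
    rows = conn.execute("SELECT name, sequence FROM entries ORDER BY id").fetchall()
    exact = [n for n, s in rows if s == query]
    partial = [n for n, s in rows if query in s and s != query]
    return exact, partial

print(advanced_search(conn, [("assay", "=", "HPLC"), ("length", ">", 6)]))
print(peptide_search(conn, "GHIK"))
```

Note that SQLite's `LIKE` is case-insensitive for ASCII text by default, whereas MySQL's behaviour depends on the column collation.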
Browsing Tools. PEPlife provides a simple yet thorough class-wise browsing facility in which all peptide entries are categorized into different classes. In this module, peptide information can be browsed by seven criteria.
Peptides in PEPlife have different conformations, amino acid configurations and lengths (Fig. 2A). Examples of the effects of modifications on the half-life of peptides are given in Fig. 3 and Supplementary Table S1. The peptides in 245 entries are cyclic, and the peptides in 213 entries have mixed amino acid configurations (i.e., they contain both L- and D-amino acids). Peptide lengths range from fewer than five to more than 35 amino acids. Peptides of six to ten amino acids account for the largest number of entries (571), followed by peptides longer than 35 amino acids (462 entries). Peptides of 21 to 25 amino acids account for the fewest entries (57) (Fig. 2B).
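The length statistics above are counts of entries (not unique peptides) per length bin. A minimal binning sketch follows; the bin edges (1-5, 6-10, ..., 31-35, >35 residues) are an assumption inferred from the ranges quoted in the text, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

# Assumed bin edges, inferred from the ranges quoted in the text.
BIN_LABELS = ["1-5", "6-10", "11-15", "16-20", "21-25", "26-30", "31-35", ">35"]

def length_bin(length: int) -> str:
    """Map a peptide length (in residues) to its bin label."""
    if length < 1:
        raise ValueError("length must be positive")
    if length > 35:
        return ">35"
    return BIN_LABELS[(length - 1) // 5]

def length_histogram(lengths: Iterable[int]) -> Dict[str, int]:
    """Number of entries per bin, with empty bins reported as zero, in bin order."""
    counts = Counter(length_bin(n) for n in lengths)
    return {label: counts.get(label, 0) for label in BIN_LABELS}

print(length_histogram([3, 6, 10, 22, 40, 8]))
```

Feeding one length per entry reproduces entry-based counts like those in Fig. 2B; deduplicating sequences first would give per-peptide counts instead.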
Various modifications have been incorporated into peptides to increase their half-life, most of them at the termini (Fig. 2C). The most frequent N-terminal modification is the addition of 2,4-dichlorophenoxyacetic acid with (CH2)n spacers, followed by acylation. PEGylation, glycosylation, succinylation, addition of human serum albumin (HSA) and hydroxylation have also been used as important N-terminal modifications to improve the half-life of peptides. Amidation is the most common C-terminal modification, followed by biotinylation with polyethylene glycol (PEG) spacers. Other C-terminal modifications include the addition of human serum albumin (3 entries).
PEPlife entries report half-lives determined by both in vivo (948 entries) and in vitro (1265 entries) methods, including mass spectrometry, immunoassays, radiolabeling, spectroscopy and various other assays. The most frequently used methods include HPLC (540 entries), radioimmunoassay (335 entries) and ELISA (91 entries) (Fig. 2D).
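Assays such as HPLC typically measure the amount of intact peptide at several time points. PEPlife records the half-lives reported by each study rather than how each was computed, but one common approach, assumed here for illustration, is to fit first-order (mono-exponential) decay, C(t) = C0 * exp(-k t), by linear least squares on ln C(t); the half-life then follows as t1/2 = ln 2 / k. The function name and the example time course below are hypothetical.

```python
import math
from typing import Sequence

def first_order_half_life(times: Sequence[float], remaining: Sequence[float]) -> float:
    """Half-life from a log-linear least-squares fit of remaining peptide vs time.

    Assumes first-order decay C(t) = C0 * exp(-k t), so t1/2 = ln 2 / k.
    `remaining` values must be positive; the result is in the units of `times`.
    """
    if len(times) != len(remaining) or len(times) < 2:
        raise ValueError("need at least two paired observations")
    ys = [math.log(c) for c in remaining]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be equal")
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys))
    k = -sxy / sxx                      # slope of ln C vs t is -k
    if k <= 0:
        raise ValueError("no decay observed; half-life undefined")
    return math.log(2) / k

# Illustrative time course (minutes, % intact peptide), not data from PEPlife.
times_min = [0, 30, 60, 90, 120]
percent_intact = [100.0, 70.7, 50.0, 35.4, 25.0]
print(first_order_half_life(times_min, percent_intact))
```

Many peptides in plasma show multi-phase kinetics, in which case a single exponential fit is only an approximation and reported half-lives may refer to a specific phase.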

Discussion
The half-life of a peptide influences its bioavailability in the organism; a peptide with therapeutic potential must also have adequate bioavailability to be used as a drug, and a short half-life can reduce the bioavailability of a therapeutic peptide. Despite the relevance of half-life to bioavailability, no platform has so far covered a broad range of information related to the half-life of peptides. A few bioinformatics platforms predict the half-life of specific peptides, but only in specific environments 31,45 , and they do not contain a wide range of information. There is therefore a need for a database with a broad scope that can support efforts to improve peptide half-life. In this report, we present a database that attempts to fill this gap by providing a repertoire of information related to the half-life of peptides with a variety of properties and modifications. The database also covers the variations in half-life observed across different environments, organisms and routes of administration.
The half-life of a peptide depends on both the organism and the peptide. A number of in vitro and in vivo studies have examined the relationship between the half-life of peptides and their sequences, structures and modifications, the host organisms, and the routes of drug administration in the host. Factors that affect the enzymatic degradation and pharmacokinetics of a peptide in an organism play crucial roles in determining its stability 46,47 . In general, factors that reduce the enzymatic degradation and metabolism of a peptide tend to stabilize it 48,49 . Different organisms differ in their pharmacokinetics and in the extent to which they proteolyze a peptide, leading to differences in its half-life 46,47 . Moreover, different individuals of the same species can show variable pharmacokinetics for the same peptide, resulting in variable half-lives 47 .
The major factors affecting half-life include the sequence of a peptide, its modifications, the route of administration and the dose. Sequence variants of a peptide are observed to have different half-lives, and chemical modifications also alter the half-life 49 . To improve half-life, chemical modifications such as the use of D-amino acids, non-proteinogenic amino acids (e.g., ornithine), PEGylation and N- and C-terminal modifications have been employed extensively. The same peptide administered via the same route can have different half-lives in different organisms 49 , and different administration routes also affect the half-life of a peptide 50 . All of these details are needed to improve the half-life of therapeutic peptides, so it is essential to store them on a single platform for easy access and use.
PEPlife is a repository of valuable information related to the stability of peptides. It provides an extensive, systematic catalog of data on the half-life of peptides and the factors that affect it. This information can be valuable for the rational design of therapeutic peptides. In addition, several tools are provided to facilitate the extraction and analysis of the compiled information. We anticipate that PEPlife will be helpful not only for answering half-life queries but also for understanding the properties of peptides that govern their half-lives.
In the future, various studies can build on the data in PEPlife, for example: (i) the structures available in PEPlife can be used for docking and membrane simulation studies, (ii) the PEPlife dataset can be used to develop prediction methods for peptide half-life, and (iii) the SMILES in PEPlife can be used to develop QSAR models. We hope that PEPlife will be a useful resource for researchers working on the design of therapeutic peptides.
Update of PEPlife. We will update PEPlife at regular intervals to widen its coverage of peptide half-lives reported in the literature. Users can also submit new peptide entries and their half-lives through an HTML form on the web interface. Our team will verify each new entry before incorporating it into PEPlife, to maintain a high level of quality.

Limitations.
We have attempted to cover as much information related to the half-life of peptides as possible by manual curation, but some relevant articles may have been missed because they were not retrieved by our search criteria. We provide structural information for most peptides, but because force-field libraries are unavailable for complex peptide modifications, the structures of a few peptides could not be predicted.