Essential proteins and possible therapeutic targets of Wolbachia endosymbiont and development of FiloBase-a comprehensive drug target database for Lymphatic filariasis

Lymphatic filariasis (Lf) is one of the oldest and most debilitating tropical diseases, and millions of people suffer from it. It is estimated to infect over 120 million people in at least 80 nations across the tropical and subtropical regions of the world. More than one billion people are at risk of being affected by this life-threatening disease. Several studies have reported the limitations of, and emerging resistance to, the available drugs and therapeutic targets for Lf. Therefore, better drugs and drug targets are in demand. We took the initiative to identify the essential proteins of the Wolbachia endosymbiont of Brugia malayi (wBm) that are indispensable for its survival and non-homologous to human host proteins. In this study, we used a proteome subtractive approach to screen for possible therapeutic targets in wBm. In addition, the literature was mined for potential drug targets, drugs, epitopes, crystal structures, and expressed sequence tag (EST) sequences of filariasis-causing nematodes. The data obtained from our study are presented in a user-friendly database named FiloBase. We hope that the information stored in this database may be used for further research and drug development against filariasis. URL: http://filobase.bicpu.edu.in.

Scientific Reports | 6:19842 | DOI: 10.1038/srep19842
In the past few years, several investigators have made extensive efforts to identify better drugs and therapeutic targets to improve the treatment of Lf [14][15][16][17][18][19][20][21][22]. Substantial work was done by the Division of Parasitology, New England Biolabs, in mining the essential genes of B. malayi 23, which opened a new gateway for developing and designing therapeutics for brugian filariasis. However, Wolbachia (Wol), which is an excellent drug target for Lf, was left untouched. Wol is an obligate intracellular symbiotic bacterium of filarial nematodes that plays a crucial role in their development, vitality and fertility 24,25. Targeting Wol would therefore be a very effective approach to monitor human filarial infections. In 2009, the first computational effort was made to identify the essential genes of the unculturable Wolbachia endosymbiont 26, in the absence of a complete Wol proteome. At present, the complete genome of Wolbachia from Brugia malayi (wBm) is available 27 and can be used to define the most suitable therapeutic targets in wBm through a hierarchical proteome subtractive approach. Several researchers have used this approach to identify possible therapeutic targets in various pathogens [28][29][30][31][32]. We therefore adopted it, with slight additions and modifications, to identify possible therapeutic targets. The complete workflow of our approach is shown in Fig. 1. In the current approach, we additionally evaluated the identified therapeutic targets through a druggability test and ensured that all identified drug targets are non-homologous to human proteins. The identified potential therapeutic targets were further modelled to gain insight into their molecular organization and protein conformations.
All therapeutic targets identified in this study were modelled and stored in an open-access database named FiloBase. In addition, we modelled forty drug targets of B. malayi, identified earlier by Kumar et al. from New England Biolabs 23, and incorporated them in our database. To make the database more informative, other significant filarial information such as EST sequences, potential epitopes, experimental drugs and experimental structures of nematode proteins was incorporated in FiloBase through an extensive literature survey; such a resource was in demand to facilitate the drug discovery process for Lf 33. At present, FiloBase contains 119 potential drug targets for Lf; we hope that it will help to expedite drug discovery for the better treatment of Lf.

Identification of pathogen metabolic pathways. Kyoto Encyclopedia of Genes and Genomes (KEGG) is a manually curated database that helps to understand the biological systems of organisms. The metabolic pathways of wBm and human were retrieved from the KEGG database and compared manually to find the pathways unique to the pathogen and those shared with the host. Pathways present in both human and wBm were considered common pathways, and those present only in wBm and not in the human host were identified as unique pathways. Protein sequences from both sets of pathways were retrieved from the UniProt database and passed through the subsequent proteome subtraction steps.
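The unique/common split described above amounts to a set difference and a set intersection over pathway identifiers. The following is a minimal sketch, not the procedure used in the study (which was a manual comparison). It assumes KEGG-style identifiers made of an organism prefix plus a five-digit pathway number; the prefixes `wbm` and `hsa` and the example identifiers are illustrative assumptions, and the function names are hypothetical.

```python
def pathway_number(pathway_id: str) -> str:
    """Drop the organism prefix, e.g. 'wbm00550' -> '00550' (assumed ID layout)."""
    return pathway_id[-5:]


def split_pathways(pathogen_ids, host_ids):
    """Return (unique, common) sets of pathway numbers for the pathogen."""
    pathogen = {pathway_number(p) for p in pathogen_ids}
    host = {pathway_number(h) for h in host_ids}
    unique = pathogen - host   # present only in the pathogen
    common = pathogen & host   # present in both organisms
    return unique, common


# Illustrative input only, not the pathway lists used in the study.
wbm_pathways = ["wbm00010", "wbm00550"]
hsa_pathways = ["hsa00010", "hsa00020"]
unique, common = split_pathways(wbm_pathways, hsa_pathways)
print("unique:", sorted(unique))
print("common:", sorted(common))
```

Comparing on the numeric part of the identifier is what makes the two organism-specific lists comparable; any pathway in the host list that the pathogen lacks is simply ignored.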
Mining of wBm proteins non-homologous to human proteins. The wBm protein sequences retrieved from both sets of pathways were subjected to a BLASTp sequence similarity search against the human proteome database. The main objective of this step was to define the wBm proteins that are non-homologous to human proteins, which is likely to prevent cross-reactivity of drug compounds with human host proteins 34. Here, we used an 'Expect' value (e-value) < 0.005 and a bit score > 100 as cut-offs to exclude homologous sequences. Proteins that showed no hits at these cut-off values were considered non-homologous [35][36][37] and carried forward for further screening, while the remaining sequences were excluded from the list.
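The cut-off filter can be sketched as a pass over BLAST tabular output. This is an illustrative sketch under stated assumptions, not the authors' script: it assumes the standard 12-column tabular format (query ID in column 1, e-value in column 11, bit score in column 12), it treats a hit as significant only when both cut-offs are met, and the function names and protein IDs are hypothetical.

```python
def queries_with_significant_hit(blast_lines, max_evalue=0.005, min_bitscore=100.0):
    """Collect query IDs that have at least one hit passing both cut-offs."""
    significant = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query_id = fields[0]
        evalue = float(fields[10])
        bitscore = float(fields[11])
        if evalue < max_evalue and bitscore > min_bitscore:
            significant.add(query_id)
    return significant


def non_homologous(all_query_ids, blast_lines, **cutoffs):
    """Queries without any significant hit are kept as non-homologous."""
    homologous = queries_with_significant_hit(blast_lines, **cutoffs)
    return [q for q in all_query_ids if q not in homologous]


# Made-up rows for illustration: protA has a strong human hit, protB a weak one,
# protC has no hit at all.
rows = [
    "protA\thuman1\t45.0\t300\t150\t5\t1\t300\t1\t300\t1e-50\t250",
    "protB\thuman2\t25.0\t80\t60\t2\t1\t80\t5\t84\t0.5\t30.1",
]
print(non_homologous(["protA", "protB", "protC"], rows))
```

The same helper could be reused for the DEG search by passing `max_evalue=0.0001` and keeping the queries that do have a significant hit.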
Essentiality assessment of wBm proteins. To identify the essential proteins of wBm, the resulting non-homologous protein sequences were searched with the protein BLAST tool against the essential bacterial protein sequences in the Database of Essential Genes (DEG) 38, with an e-value < 0.0001 and a bit score > 100 39. DEG contains genes from bacteria, archaea and eukaryotes that are indispensable for supporting cellular life. Currently, it holds 12,926 essential genes of bacteria. Based on the assumption that a protein essential in one bacterium may be essential in another, hits found in DEG at the above cut-off values were expected to represent the crucial proteins of wBm, while the remaining proteins were not and were therefore excluded from the list of probable drug targets.
Drug prioritization. All of these essential wBm proteins that are non-homologous to human proteins can be treated as potential therapeutic targets. However, being non-homologous to human proteins and indispensable for pathogen survival are not the only criteria for identifying the most suitable drug targets for a pathogen; other parameters such as low molecular weight, sub-cellular localization and the ability to interact with potential drugs (druggability) also play a significant role. Therefore, the potential therapeutic targets of wBm were screened by identifying their biological significance and sub-cellular localization using Gneg-mPLoc 40. Gneg-mPLoc is a powerful tool for identifying the sub-cellular location of Gram-negative bacterial proteins. It assigns proteins to the following eight locations: (1) cytoplasm, (2) extracellular, (3) fimbrium, (4) flagellum, (5) inner membrane, (6) nucleoid, (7) outer membrane and (8) periplasm 40.
The resulting Gneg-mPLoc data sets were further cross-checked with CELLO v.2.5 41 and PSORTb II 42, which are other web-based tools for predicting the sub-cellular localization of proteins. The main objective of this step was to classify the proteins as potential drug or vaccine targets. Cytoplasmic and inner membrane proteins were considered potential drug targets, whereas surface membrane, periplasmic and extracellular proteins were considered potential vaccine targets. Lf can be prevented through vaccine development, which relies on antigenic surfaces to trigger humoral immune responses. Therefore, membrane-associated protein candidates are likely to represent potential therapeutic targets for developing new vaccines and epitopes as well as powerful drug candidates.
Druggability analysis. The most reliable way to assess the druggability of a protein is to identify a similar protein that binds a drug-like compound 43,44. Therefore, all identified potential drug targets were further evaluated with a druggability test, in which they were searched against the DrugBank database 45 and the TTD database 46. For this study, we downloaded all FDA-approved drugs (1,478), FDA-approved biotech drugs (183), experimental drugs (2,718), approved enzymes (174) and approved transporters (79) from DrugBank, and the TTD database (2,134). Targets with hits in the DrugBank or TTD database were considered druggable, while the remaining targets were considered novel drug targets that need further experimental validation.
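The druggable/novel labelling in the druggability analysis reduces to a membership test against the union of the two hit lists. A minimal sketch follows, assuming the BLAST hits against DrugBank and TTD have already been reduced to sets of candidate IDs; the function name and the IDs are hypothetical.

```python
def label_druggability(candidates, drugbank_hits, ttd_hits):
    """Label each candidate 'druggable' if it has a hit in either database,
    otherwise 'novel' (to be validated experimentally)."""
    known = set(drugbank_hits) | set(ttd_hits)
    return {c: ("druggable" if c in known else "novel") for c in candidates}


# Illustrative IDs only.
labels = label_druggability(
    candidates=["t1", "t2", "t3"],
    drugbank_hits={"t1"},
    ttd_hits={"t1", "t2"},
)
for target, label in labels.items():
    print(target, label)
```

Taking the union means a candidate found in both databases is counted once.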
Tertiary structure identification. The tertiary structures of the identified possible therapeutic and vaccine targets were searched for in the PDB 47 database of experimental structures. Available structures were incorporated in our database, while for unavailable structures suitable templates for building the tertiary structure were searched using the NCBI protein BLAST tool. Targets whose templates showed sequence identity of more than 40% and query coverage of more than 90% were modelled using Modeller9v10 48, and sequences with lower identity to template proteins were modelled using the I-TASSER (Iterative Threading ASSEmbly Refinement) server 49 or the Phyre2 (Protein Homology/analogY Recognition Engine) server 50. I-TASSER builds a 3D model of a user-supplied protein sequence by multiple threading through LOMETS (Local Meta-Threading Server) and iterative template fragment assembly 51, while Phyre2 predicts protein structure using remote homology recognition techniques. All modelled structures were verified using the Structural Analysis and Verification Server (SAVeS): PROCHECK 52 was used to evaluate stereo-chemical quality by analyzing residue-by-residue geometry, and ERRAT 53 was used to evaluate overall structural quality. The PROVE program, which verifies a structure based on Z-score deviation, was also used to verify our modelled structures.
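The choice of modelling route above can be written as a small decision function. This is a sketch of the stated thresholds only. The function name and return labels are hypothetical, and routing a template with high identity but coverage of 90% or less to the threading servers is an assumption, since the text does not cover that case.

```python
def choose_modelling_route(has_pdb_structure, identity_pct=None, coverage_pct=None):
    """Pick a structure source from template statistics (percent values)."""
    if has_pdb_structure:
        return "experimental structure (PDB)"
    template_ok = (
        identity_pct is not None
        and coverage_pct is not None
        and identity_pct > 40
        and coverage_pct > 90
    )
    if template_ok:
        return "homology modelling (Modeller)"
    # Assumed fallback for low identity, low coverage, or no template at all.
    return "threading / remote homology (I-TASSER or Phyre2)"


print(choose_modelling_route(False, identity_pct=55.0, coverage_pct=97.0))
print(choose_modelling_route(False, identity_pct=28.0, coverage_pct=95.0))
```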
Database development and organization. All identified potential drug targets were stored in a user-friendly, open-access database named FiloBase. All data were stored in MySQL and hosted using an Apache server. The complete flow chart of the database development is illustrated in Fig. 2. The user interface of the database was written in PHP and JavaScript, whereas the back end was supported by PHP and BioPerl. The database contents were classified into five search categories, including (1) drug targets of B. malayi and W. bancrofti. Users can fetch the desired information by selecting the query field and entering their query keywords in the text box on the home page of the database. To make the retrieval system more user-friendly and convenient, we classified all the information and made it available with a click on the table menu on the home page (Fig. 3a).

Results and Discussion
Identification of metabolic pathways for wBm. Here, we report the first computational approach to identify potential therapeutic targets of wBm by protein subtraction. This approach has proved successful in identifying potential drug target proteins that are involved in various metabolic pathways of a pathogen, absent in the host organism and essential for the pathogen's survival 54. An overall summary of the project is given in Table 1. In the current study, we considered several vital parameters and a systematic approach with a drug prioritization method to arrive at superior drug and vaccine targets. The identified drug targets are thus expected to cause fewer side-effects and may serve as excellent drug or vaccine targets for the treatment of Lf.
Initially, metabolic pathway information for wBm and the human host was collected from the KEGG database. At present, KEGG contains 281 metabolic pathways for human and 65 metabolic pathways for wBm. Following our protocol, we identified 5 unique and 44 common metabolic pathways of wBm (Supplementary Table 1). Protein sequences from the unique (50) and common (460) metabolic pathways of wBm were retrieved from the UniProt database and submitted to NCBI protein BLAST against the human proteome. This resulted in 156 proteins that showed "NO HITS" against the human proteome; the remaining 311 proteins were excluded from the list.
Essential proteins of wBm. To assess the essentiality of the 156 non-homologous protein sequences of wBm, a sequence similarity search was performed against the DEG database using the protein BLAST tool. This resulted in 115 proteins, of which 24 belonged to the unique metabolic pathways and 91 to the common metabolic pathways (Supplementary Table 2). These 115 proteins are likely to play a significant role in the metabolic pathways of wBm, are non-homologous to human proteins and are indispensable for its survival. We further determined the length of the essential proteins. These initial target proteins can be used as potential drug targets; however, being non-homologous to human proteins, essential and involved in various metabolic pathways are not the only criteria for selecting potential drug targets. Therefore, to minimize drug side-effects and to identify the most promising drug targets of wBm, we further investigated these targets for low molecular weight, sub-cellular localization, experimental structure and druggability. Through the low molecular weight analysis we identified 10 protein sequences with a sequence length of less than 100 kD. As reported earlier by Dutta et al., short sequences have less chance of representing an attractive drug target 55, so we removed these short protein sequences from the potential drug target list, leaving 105 protein sequences.
Prioritization of essential non-homologous proteins of wBm. The sub-cellular localization of a protein plays a major role in understanding its function, which can be essential for the drug discovery and development process 56. To investigate the sub-cellular localization of these 105 protein sequences, Gneg-mPLoc was used and the resulting data were cross-checked with the CELLO server. The main objective of this step is to identify the location of each protein in the cell, which helps to differentiate between drug and vaccine targets 32. Proteins found in the cytoplasmic, periplasmic or inner-membrane regions were considered potential drug targets, while extracellular proteins were recognized as potential vaccine targets. We identified 101 potential drug targets (Supplementary Table 3) and 4 potential vaccine targets (Table 2).
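The localization rule used in this step can be sketched as a lookup on the predicted compartment, with a flag recording whether a second predictor agrees. This is an illustrative sketch, not the authors' code. It follows the grouping stated in this section (cytoplasmic, periplasmic and inner-membrane proteins as drug targets, extracellular proteins as vaccine targets). The Methods wording instead groups periplasmic proteins with vaccine targets, and the two sets can be edited to apply that variant. The function name and the "unassigned" label are hypothetical.

```python
DRUG_LOCATIONS = {"cytoplasm", "periplasm", "inner membrane"}
VACCINE_LOCATIONS = {"extracellular"}


def classify_target(primary_location, crosscheck_location=None):
    """Return (category, predictors_agree) for one protein.

    primary_location: compartment from the first predictor (e.g. Gneg-mPLoc).
    crosscheck_location: compartment from a second predictor, if available.
    """
    location = primary_location.strip().lower()
    if location in DRUG_LOCATIONS:
        category = "drug target"
    elif location in VACCINE_LOCATIONS:
        category = "vaccine target"
    else:
        category = "unassigned"  # compartments the rule above does not mention
    agrees = (
        crosscheck_location is None
        or crosscheck_location.strip().lower() == location
    )
    return category, agrees


print(classify_target("Cytoplasm", "cytoplasm"))
print(classify_target("extracellular", "outer membrane"))
```

Proteins whose predictors disagree are only flagged here; how such conflicts were resolved in the study is not stated.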

Druggability of essential non-homologous proteins of wBm. The druggability of the essential wBm proteins was evaluated on the assumption that a druggable protein target should interact with a drug-like compound 57,58. For this reason, each essential, non-homologous wBm protein that plays a key role in a metabolic pathway was assessed with the standalone BLASTp tool to find homologous drug targets in the DrugBank and TTD databases, with an e-value less than 0.005. Proteins that showed hits at the defined cut-off values were recognized as significant homologs 59, whereas the remaining proteins were not and were therefore excluded from further study (Supplementary Table 4). Out of 101, we identified 61 drug targets that showed similarity to drug targets available in the DrugBank and TTD databases: 56 showed similarity to DrugBank targets and 25 to TTD targets, and 20 of these were common to both databases. Ultimately, we identified 61 highly promising druggable targets (Table 3) and 4 potential vaccine targets (Table 2). These proposed therapeutic targets can be used in further experimental studies to develop anti-wBm therapeutics for the treatment of Lf.
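The counts reported across the screening steps can be checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the function and key names are hypothetical. The druggable total follows from inclusion-exclusion over the two databases.

```python
def screening_funnel():
    """Recompute the stage counts from the figures reported in the text."""
    essential_unique, essential_common = 24, 91
    essential = essential_unique + essential_common       # DEG hits

    short_sequences_removed = 10
    prioritized = essential - short_sequences_removed     # after length screen

    drug_targets, vaccine_targets = 101, 4                # localization split

    drugbank, ttd, in_both = 56, 25, 20
    druggable = drugbank + ttd - in_both                  # union of the two hit sets

    return {
        "essential": essential,
        "prioritized": prioritized,
        "localized": drug_targets + vaccine_targets,
        "druggable": druggable,
        "drug_targets_without_hit": drug_targets - druggable,
    }


for stage, count in screening_funnel().items():
    print(f"{stage}: {count}")
```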

Data compilation and database development.
The identified drug and vaccine targets were stored in a user-friendly database that we named FiloBase, a comprehensive drug target database for filariasis. Snapshots of the home page and the result page of FiloBase are shown in Fig. 3.
The inaugural release of FiloBase contains a total of 119 potential drug targets of filarial nematodes: 58 drug targets of B. malayi collected from the literature 23,33 and 61 drug targets of wBm from the subtractive proteome approach of the current study. These drug targets are stored in our database and can be fetched through the classified tab options on the left side of the database page (Fig. 3a). Users can also enter keywords in the query box on the home page to retrieve the drug target information related to their query (Fig. 3b). Because structural information plays an important role in drug and vaccine development 60, all identified drug targets were modelled using Modeller9v11, Phyre2.0 and the I-TASSER server. The reliability of the modelled drug targets was further evaluated using the PROCHECK, ERRAT and PROVE servers. The verification data for each modelled structure are stored in the model verification tab of the result page of the potential drug targets. Users can fetch the desired information from the given link and can further improve the quality of the modelled structure if needed.

Table 3. Identified potential therapeutic targets of wBm.

Unique Pathway Proteins (UPP) Analysis | Proteins In Numbers | Common Pathway Proteins (CPP) Analysis | Proteins In Numbers

All verified structures were incorporated in our database with additional information about the potential drug targets to make it more informative and useful. Possible binding site residues were identified based on the template proteins, and the structure is shown in three dimensions using the JSmol visualizer. In the sequence tab, solvent accessibility and secondary structure are shown for each potential drug target. Users can get a graphical view of the secondary structure through the link given on the result page. The alignment file of the target and template proteins is also given on the result page so that the quality of the modelled structure can be cross-checked (Fig. 3). This information may be useful for further development and for gaining insight into the structure of the potential therapeutic targets.
In the 3D Structure tab, the JSmol visualizer is used to display the modelled structure of the therapeutic targets. In addition, based on the template, we identified possible binding sites and listed them on the result page. Users can click on the active site pockets to explore the binding pockets and gain insight into their positions and contributing residues using the JSmol button (Fig. 3f). In the KEGG tab, we show the role of the corresponding drug target in the metabolic pathway (Fig. 3g).
To make our database more informative, we cross-linked it with related databases such as the KEGG database for pathway analysis, the sequence similarity database in KEGG (SSDB) for motif search, and the UniProt and EMBL databases for sequence information.
Various  (26), W. bancrofti (5103) and wBm (26)] and stored in our database. These EST sequences of filariasis causing nematodes and wBm bacteria with add-on information will be useful for various groups of researchers.
Advances in the biological sciences have generated an enormous amount of sequence data. BLAST has therefore become an essential tool for biologists to compare protein and nucleotide sequences against large data sets and investigate the similarities between them. For that reason, we developed a standalone BLAST tool for filariasis and named it FiloBLAST. Query sequences can be entered in single-letter amino acid or nucleotide code, and the algorithm can be selected according to the input sequence type. At present, users can BLAST their query sequences against local copies of the EST databases, protein databases and DrugBank drug targets. For the EST databases, we collected EST sequences from all the filariasis-causing nematodes and made a local copy for BLAST searches. We also downloaded EST sequences from NEMBASE4 64 and provided a link to the NEMBASE4 database for further information. Likewise, we collected data from the Protein Data Bank 65, Swiss-Prot 66 and DrugBank 67 databases and made local copies, so that users can query their nucleotide and protein sequences against these databases and fetch the desired information on a single platform.
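Selecting the algorithm from the input sequence type can be illustrated with the standard BLAST program names. The sketch below is not FiloBLAST's actual logic, which is not described here. It assumes a raw sequence without a FASTA header, uses a simple alphabet test to guess the query type, and the function names are hypothetical.

```python
NUCLEOTIDE_LETTERS = set("ACGTUN")


def guess_sequence_type(sequence):
    """Guess 'nucleotide' or 'protein' from the letters in a raw sequence."""
    residues = [ch for ch in sequence.upper() if ch.isalpha()]
    if not residues:
        raise ValueError("empty sequence")
    if all(ch in NUCLEOTIDE_LETTERS for ch in residues):
        return "nucleotide"
    return "protein"


def choose_blast_program(query_type, database_type):
    """Map query and database types to the matching BLAST program."""
    programs = {
        ("nucleotide", "nucleotide"): "blastn",
        ("protein", "protein"): "blastp",
        ("nucleotide", "protein"): "blastx",   # translated query vs protein db
        ("protein", "nucleotide"): "tblastn",  # protein query vs translated db
    }
    return programs[(query_type, database_type)]


query = "ATGGCTAAGCTTGGA"
query_type = guess_sequence_type(query)
# An EST collection is a nucleotide database; a drug target set is a protein database.
print(choose_blast_program(query_type, "nucleotide"))
print(choose_blast_program(query_type, "protein"))
```

An alphabet test can misclassify a very short peptide made only of A, C, G, T and N, so a real interface would normally let the user override the guess.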
Potential and experimentally proven epitope sequences of filariasis-causing nematodes were identified through an extensive literature survey using resources such as PubMed and Google Scholar. Various epitope databases (IEDB 68, BCIpep 69 and SEDB 58) were queried to recognize the experimental and potential epitopes. At present, FiloBase contains 62 epitopes: 27 from B. malayi and 35 from W. bancrofti. A quick link is provided on the home page of the database to fetch this information in a single click. We will update our database promptly as new filarial data are reported in the literature or in authentic web resources. We have also provided an online data submission tool for users to upload their data to our database. Submitted data will be verified and may be incorporated into our database. Users can send their suggestions for further improvement of FiloBase.

Conclusions
In summary, owing to the emergence of resistance and the limitations of the available drugs for Lf, current research interest is focused on identifying novel drug and vaccine targets to improve the treatment of Lf. Using a comparative proteome subtractive approach, we pruned those drug targets which may be less effective or may cause severe side-effects. Proteins that are indispensable to the survival of the pathogen and non-homologous to host proteins were considered potential therapeutic targets. A druggability analysis was then applied to the potential drug target data sets to reveal the more probable drug targets. Our investigation reveals 61 potential drug targets and 4 potential vaccine targets for wBm, which could be further validated experimentally through drug and vaccine design pipelines. The identified drug and vaccine targets were modelled and then verified for their structural quality. We subsequently enriched these data with essential information that may inform and support further research on wBm. All information was stored in a one-stop web-based platform, which we named FiloBase. We also collected filariasis-related information and records from the literature and authenticated websites to make them accessible to the filarial research community through a user-friendly database. In future, our database can be extended by identifying potential drug targets from other filariasis-causing nematodes.