Introduction

Von Hippel-Lindau (VHL) syndrome is a hereditary predisposition to develop several cancers resulting from pathological inactivation of the von Hippel-Lindau protein (pVHL)1,2,3. pVHL is encoded by the VHL gene, located on chromosome 3p25, which is constitutively transcribed in both fetal and adult tissues4. Two different isoforms were initially identified5. pVHL30 contains all 213 residues encoded by the VHL gene, whereas pVHL19 lacks the first 53 residues due to an alternative translation start site5. Both isoforms are biologically active, binding elongins B and C and cullin 2 to form a ubiquitin E3 ligase complex known as VCB6,7. The main function of pVHL is the ubiquitin-mediated degradation of hypoxia-inducible factor 1-alpha (HIF-1α)3, and pVHL activity is crucial in the oxygen sensing pathway. Under physiological oxygen concentrations, pVHL targets HIF-1α for proteasomal degradation. In hypoxic conditions, HIF-1α escapes ubiquitin-mediated proteolysis and translocates to the nucleus, where it activates many genes involved in angiogenesis, oxidative metabolism, cell survival and cancer progression3,8. Several other cellular functions not directly related to the pVHL/HIF-1α axis have also been reported9,10,11,12,13,14,15, e.g. extracellular matrix deposition, drawing a complex picture of pVHL in cells and tissues. Numerous studies have addressed specific pVHL molecular pathways16,17, describing pVHL as a molecular hub that mediates interactions with more than 200 different proteins18. Recently, a third pVHL isoform of unknown function was reported19, making the interpretation of the molecular role of pVHL even harder. pVHL has no significant sequence identity to other human proteins, but is well conserved within mammals20. Even among mammals, however, pVHL shows important differences. 
The main distinction resides in the N-terminus of pVHL30, which is disordered21 and contains multiple copies of an acidic pentamer in humans and other higher primates, whereas in lower mammals it is shorter and lacks the disordered N-terminal tail22. VHL syndrome is characterized by the development of several, generally benign, tumors affecting specific target organs, such as the retina, epididymis, adrenal glands, pancreas and kidneys1,23,24. It is considered a severe autosomal dominant genetic condition with an estimated incidence of one in over 35,000 individuals25. Defects in pVHL function are not limited to VHL syndrome. Loss of pVHL tumor suppressor function is thought to be present in ca. 75% of clear cell renal cell carcinomas (ccRCC) not directly related to VHL syndrome26. Recent studies also suggest a role for pVHL in regulating the p53 tumor suppressor27,28. Kidney-specific pVHL inactivation causes the development of kidney cysts in a mouse model29, while reintroduction of the wild-type gene interrupts malignant progression30. A large body of experimental and in silico data on proteins involved in pVHL-related tumorigenesis has been reported9,13 and is contained in large databases, such as IntAct18, STRING31 and BioGRID32. pVHL is thought to have at least four different protein-protein interaction interfaces (A to D)13. Several specific interactors have been found for each interface, and their correlation with functions other than oxygen sensing, such as DNA-damage repair33, microtubule dynamics34 and oxidative metabolism, reinforces the pivotal role of pVHL. As the amount of detail known about pVHL function rapidly increases, its multiple roles may confound our understanding of this complex protein. Knowledge is usually derived from freely accessible protein sequence and function databases. Although valuable, these universal resources are generalist by design, leading to strong fragmentation of the large amount of pVHL data. 
For a non-bioinformatician, scattered information is one of the biggest hurdles, slowing down a holistic understanding of the biological role of pVHL. Here we present VHLdb, a novel resource providing expert curation for the pVHL tumor suppressor. The database was primarily designed to be effective for non-experts, making information retrieval easier. Overall, VHLdb accounts for 478 unique interactors at two curation levels (manual and automatic), with data retrieved from different sources. Detailed information on pVHL interaction interfaces and post-translational modifications is also included. A feedback function allows experts in the field to contribute novel annotations on interactors or mutations. Finally, a download tool is provided for data sharing.

Database Description

Mutation data

Germline and somatic mutations have been collected from published sources35,36,37,38, integrated with additional data39,40,41 and annotated with predictions of protein stability. The final dataset comprises 1,074 mutations and, to the best of our knowledge, represents the largest publicly available repository of pathogenic pVHL variants. An example of mutation details is given in Table 1 and Fig. 1. Where possible, a pVHL interacting surface has been assigned to each mutation. Frameshift mutations, for example, cannot be assigned to any surface, since they alter the entire downstream sequence. Solvent accessibility has been computed for each mutated residue using DSSP42, and residues are defined as exposed when at least 20% of their surface is accessible to solvent. Bluues43 and NeEMO44 have been run on all possible mutations, using the pVHL 3D structure with PDB code 1LM8 as reference. Current pVHL 3D structures cover only the structured part of the protein (i.e. the alpha- and beta-domains), lacking the first 60 residues, which form an intrinsically disordered tail. Pathogenicity assessment for mutations in this segment was not included in VHLdb to avoid the risk of erroneous interpretation from low-confidence predictions. Bluues43 calculates the electrostatic properties of a protein and can predict how they change upon mutation of solvent-exposed residues. NeEMO44 uses a machine learning approach to evaluate, from structure, the stability changes caused by amino acid substitutions. It has been run on all point mutations of the crystallized protein, i.e. again excluding only the N-terminus.
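The exposure rule above (a residue counts as exposed when at least 20% of its surface is accessible) amounts to normalizing the absolute accessibility reported by DSSP (ACC, in square angstroms) by a per-residue maximum and applying a threshold. The Python sketch below illustrates the idea and is not VHLdb's code. The maximum-ASA table is an assumption: it uses the theoretical values published by Tien et al. (2013), because the text does not say which normalization was used. The function names and the "not resolved" label for residues missing from the structure are also hypothetical.

```python
# Minimal sketch (not VHLdb's implementation): label mutated residues as
# exposed or buried from DSSP absolute accessibility (ACC, square angstroms).
# ASSUMPTION: the maximum-ASA scale below (theoretical values, Tien et al. 2013)
# stands in for whatever normalization VHLdb actually used.
MAX_ASA = {
    "A": 129.0, "R": 274.0, "N": 195.0, "D": 193.0, "C": 167.0,
    "E": 223.0, "Q": 225.0, "G": 104.0, "H": 224.0, "I": 197.0,
    "L": 201.0, "K": 236.0, "M": 224.0, "F": 240.0, "P": 159.0,
    "S": 155.0, "T": 172.0, "W": 285.0, "Y": 263.0, "V": 174.0,
}

EXPOSED_THRESHOLD = 0.20  # "at least 20% of the surface accessible to solvent"


def relative_accessibility(residue: str, acc: float) -> float:
    """Relative solvent accessibility of a residue given its one-letter code."""
    return acc / MAX_ASA[residue.upper()]


def is_exposed(residue: str, acc: float, threshold: float = EXPOSED_THRESHOLD) -> bool:
    """True when the relative accessibility reaches the threshold."""
    return relative_accessibility(residue, acc) >= threshold


def classify_mutations(mutations, acc_by_position):
    """Label each (wild_type, position, mutant) tuple as exposed, buried or not resolved.

    acc_by_position maps residue numbers to DSSP ACC values; positions absent
    from the structure (e.g. the disordered N-terminal tail) are 'not resolved'.
    """
    labels = {}
    for wild_type, position, mutant in mutations:
        name = f"{wild_type}{position}{mutant}"
        if position not in acc_by_position:
            labels[name] = "not resolved"
        elif is_exposed(wild_type, acc_by_position[position]):
            labels[name] = "exposed"
        else:
            labels[name] = "buried"
    return labels


# Usage with placeholder residues and ACC values (not taken from 1LM8):
example = classify_mutations(
    [("A", 100, "V"), ("W", 150, "R"), ("P", 20, "L")],
    {100: 30.0, 150: 50.0},
)
print(example)  # {'A100V': 'exposed', 'W150R': 'buried', 'P20L': 'not resolved'}
```

The residues, positions and ACC values in the usage example are placeholders chosen to exercise each branch. They are not values computed from the 1LM8 structure.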

Table 1 Example of mutation data contained in VHLdb. For each mutation, the codon, effect on protein, NeEMO prediction, disease and PubMed ID are given.
Figure 1

Example of a mutation as displayed in VHLdb.

For each mutation, all available details are listed (i.e. coding variant, effect on protein, type of mutation, pVHL surface involved, solvent accessibility, phenotype, thermodynamic predictions and reference), and the mutated residue is visualized as a red sphere on the surface-colored pVHL structure.

pVHL interactome

The pVHL interactome has been defined starting from searches in publicly available databases. VHLdb contains two levels of annotation for interactors, automatic and manual (Fig. 2). Automatic annotations are denoted by an empty silver star and make up the overall pVHL interactome, albeit at a lower confidence level. Manually curated pVHL interactors, denoted by a gold star, have been annotated with the exact molecular details of the interaction and its functional meaning.

Figure 2

VHLdb home page and VHLdb manually curated interactors set.

(A) VHLdb home page. On the left, a column with version, statistics and useful links. On the right, a clickable image that redirects to the pVHL interacting proteins page. (B) Manually curated pVHL interacting proteins sorted by interacting surface. Proteins labeled “modification” bind pVHL upon post-translational modification. Proteins for which no interacting surface could be determined are labeled “unknown”.

The automatic pVHL interaction network has been generated by querying the STRING31, BioGRID45, iHOP46, MIPS47 and IMEx48 databases. STRING and BioGRID are two of the most popular protein-protein interaction databases. The IMEx Consortium is a long-term coordination project that currently comprises twelve interaction databases. MIPS is a database of mammalian interacting proteins, while iHOP is a text-mining resource that parses PubMed for statements on the interactions of a target protein. MIPS and iHOP both present their data in a human-readable format, without an associated confidence score. All interactions from IMEx and STRING are annotated with a confidence score, whereas BioGRID interactions are poorly annotated in this respect. When available, the score is reported in the interactor page so that the user can easily assess the interaction quality. The five resources have been queried through their standard user interfaces using the most general terms, i.e. “VHL” or “pVHL”. In all cases, only human interaction data were considered. The results from the different sources have been merged and processed to remove duplicates. Annotations from UniProt49, PDB50, Gene Ontology51, Pfam52 and MobiDB21 have been added. Searches in interaction databases allowed us to build the full network, which currently contains 478 proteins.
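The merge-and-deduplicate step described above can be sketched as follows. This is an illustration under stated assumptions, not VHLdb's pipeline. The record layout, the use of the UniProt accession as the deduplication key and the `merge_interactors` helper are all hypothetical. Because IMEx and STRING scores are not on a common scale, the sketch keeps one score per source instead of combining them, and sources without scores (such as MIPS or iHOP) contribute only their name.

```python
# Minimal sketch (hypothetical record layout): merge interactor lists returned by
# several databases, keep human entries only and collapse duplicates.
# ASSUMPTIONS: UniProt accession is the deduplication key; scores from different
# sources are kept separately because they are not on a common scale.
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set

HUMAN_TAXON = 9606  # NCBI taxonomy ID for Homo sapiens


@dataclass
class Interactor:
    accession: str
    gene: str
    sources: Set[str] = field(default_factory=set)
    scores: Dict[str, float] = field(default_factory=dict)  # source -> best score


def merge_interactors(records: Iterable[dict]) -> Dict[str, Interactor]:
    merged: Dict[str, Interactor] = {}
    for rec in records:
        if rec.get("taxon") != HUMAN_TAXON:
            continue  # only human interaction data is considered
        acc = rec["accession"]
        item = merged.setdefault(acc, Interactor(accession=acc, gene=rec["gene"]))
        item.sources.add(rec["source"])
        score: Optional[float] = rec.get("score")
        if score is not None:
            # keep the best score seen for this source; unscored sources add nothing
            item.scores[rec["source"]] = max(score, item.scores.get(rec["source"], score))
    return merged


# Illustrative records: scores and the mouse placeholder entry are made up.
records = [
    {"accession": "Q16665", "gene": "HIF1A", "source": "STRING", "score": 0.99, "taxon": 9606},
    {"accession": "Q16665", "gene": "HIF1A", "source": "BioGRID", "taxon": 9606},
    {"accession": "Q15370", "gene": "ELOB", "source": "IMEx", "score": 0.90, "taxon": 9606},
    {"accession": "PLACEHOLDER", "gene": "Vhl", "source": "STRING", "score": 0.80, "taxon": 10090},
]
network = merge_interactors(records)
print(sorted(network))  # ['Q15370', 'Q16665']
```

Keeping scores per source lets an interactor page show each score next to its origin, which matches the stated goal of letting the user assess interaction quality.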

Manually curated pVHL interactions

The manually curated, high-quality pVHL interaction network is currently composed of 117 proteins. Of these, 35 come from a previous publication13, while the remaining 82 have been annotated for this work (see Table 2). Data curation was performed by each expert following an in-house standardized protocol to guarantee reproducibility and correctness. In detail, the manual curation workflow starts with a preliminary search in PubMed53 and UniProt49 using pVHL-related keywords (e.g. “VHL syndrome”, “pVHL AND ccRCC”) adapted to the interactor under investigation. Keywords were manually selected by curators from the terms most commonly found in the VHL syndrome literature, e.g. angiogenesis, proteasomal degradation, oxygen sensing. For proteins with several synonyms (e.g. the EGLN protein family, also known as PHD), multiple searches were performed. Each VHLdb entry is named with its official HGNC name. Interaction details have been manually extracted from the literature. PubMed has been searched for papers describing structural details of the interaction (e.g. pVHL and target protein residues, sequence motifs and domains) and their functional implications. An example of structural details of an interaction is given in Fig. 3. Upon identification, each interactor has been analyzed with ConSurf54 to assess sequence conservation, and with PRISM55 and Crescendo56 to predict the spatial localization of the interaction at the residue level. The presence of linear sequence motifs known to be relevant for protein-protein interactions, post-translational modification or enzymatic cleavage was assessed with ELM57. The interaction surface was assigned following our classification13, as summarized in Table 3.
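The synonym-driven search step above, with one search per alternative protein name, can be sketched as a small query builder. The `build_queries` helper is hypothetical and not part of VHLdb. It assumes PubMed's standard boolean syntax, in which quoted text is searched as a phrase and AND/OR are written in upper case. Because each keyword is quoted, a composite keyword such as pVHL AND ccRCC would have to be split into its parts rather than passed as a single phrase.

```python
# Minimal sketch (hypothetical helper, not part of VHLdb): one PubMed query per
# synonym, each combining the synonym with pVHL-related keywords.
from typing import List


def quote(term: str) -> str:
    """Quote a term so PubMed treats it as a phrase."""
    return f'"{term}"'


def build_queries(synonyms: List[str], keywords: List[str]) -> List[str]:
    """Return one boolean query per synonym: synonym AND (kw1 OR kw2 OR ...)."""
    keyword_block = "(" + " OR ".join(quote(k) for k in keywords) + ")"
    return [f"{quote(s)} AND {keyword_block}" for s in synonyms]


queries = build_queries(["EGLN1", "PHD2"], ["VHL syndrome", "oxygen sensing", "angiogenesis"])
for q in queries:
    print(q)
# "EGLN1" AND ("VHL syndrome" OR "oxygen sensing" OR "angiogenesis")
# "PHD2" AND ("VHL syndrome" OR "oxygen sensing" OR "angiogenesis")
```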

Table 2 Example of manually curated interactors.
Table 3 Distribution of VHLdb interactors and mutations by pVHL interacting surface.
Figure 3

Example of manually curated interaction annotation.

For each manually curated pVHL interacting protein, information from the manual curation process is listed and, if the pVHL interacting residues are known, displayed in an interactive window.

Implementation

VHLdb uses separate modules for data management, processing and presentation. Figure 4 shows a schematic representation of the whole application. To eliminate the need for data conversion and to simplify development and maintenance, all modules exchange data in JSON (JavaScript Object Notation) format. The MongoDB database engine is used for storage and Node.js as middleware between data and presentation. VHLdb exposes its resources through a RESTful interface built with the Restify library for Node.js. At the time of writing, VHLdb supports one custom REST API, the search-route, as detailed in the Help page. The user interface is implemented using the Angular.js framework and the Bootstrap library. These libraries provide a mobile-ready interface, allowing VHLdb to be accessed from any kind of device. Structural annotations are displayed with the WebGL-based molecular viewer PV58, for which custom molecular views have been developed: an “interaction viewer” in the entry page, displaying interaction data, and a “mutation viewer” in the mutations page. The former allows the user to visualize the pVHL residues interacting with a manually curated interacting protein by highlighting the interacting region on the pVHL structure. The latter displays the location of any mutation on the pVHL structure as a sphere, allowing the user to visually assess its structural location. VHLdb allows direct download of all pVHL interactions and mutations. The database offers both a graphical web interface and RESTful web services at the URL: http://VHLdb.bio.unipd.it/.
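Because every module exchanges JSON, the download formats can be derived from the same in-memory records without intermediate conversion steps. The sketch below illustrates this with Python's standard library, standing in for the Node.js layer VHLdb actually uses. The field names, the `export` function and the XML element names are hypothetical and do not reflect VHLdb's schema. The four formats match those listed for the interaction download.

```python
# Minimal sketch in Python (VHLdb itself uses Node.js): serialize JSON-style
# interactor records to the four download formats listed for VHLdb.
# ASSUMPTION: the field names below are hypothetical, not VHLdb's schema.
import csv
import io
import json
import xml.etree.ElementTree as ET

FIELDS = ["accession", "gene", "surface", "curation"]


def export(records, fmt):
    """Return records serialized as 'json', 'xml', 'csv' or 'tab'."""
    if fmt == "json":
        return json.dumps(records, indent=2)
    if fmt in ("csv", "tab"):
        buf = io.StringIO()
        writer = csv.DictWriter(
            buf,
            fieldnames=FIELDS,
            delimiter="," if fmt == "csv" else "\t",
            extrasaction="ignore",  # drop keys outside FIELDS
            lineterminator="\n",
        )
        writer.writeheader()
        writer.writerows(records)
        return buf.getvalue()
    if fmt == "xml":
        root = ET.Element("interactors")
        for rec in records:
            node = ET.SubElement(root, "interactor")
            for key in FIELDS:
                ET.SubElement(node, key).text = str(rec.get(key, ""))
        return ET.tostring(root, encoding="unicode")
    raise ValueError(f"unsupported format: {fmt}")


# Illustrative record: HIF-1alpha binds interface B according to the text.
records = [{"accession": "Q16665", "gene": "HIF1A", "surface": "B", "curation": "manual"}]
print(export(records, "csv"))
```

Serializing from a single record list keeps the four formats consistent with one another. Adding a field then only requires updating `FIELDS`.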

Figure 4

Schematic representation of the VHLdb implementation.

Black arrows represent the data flow from curators to end users. The gray arrow represents the feedback function that VHLdb offers end users to report an entry, submit new data or simply contact the curators.

Results

Using VHLdb

VHLdb offers simple yet powerful ways to access its data. First, the navigation bar at the top of the home page allows the user to access the mutation or interaction page. The home page also features a clickable map redirecting the user to interface-specific pVHL interaction lists (Fig. 2). The mutation page lists all coding variants (sorted by codon) in a user-friendly searchable, filterable and downloadable table, alongside the mutation viewer described above. The interaction page features a graphical representation of the manually curated pVHL interaction network organized by interacting surface and a sortable, searchable and filterable table, similar to the mutation table, listing all protein-protein interactions. The third element of this page is a table showing Gene Ontology (GO) enrichment analysis results for each surface and GO tree. From this page, the complete pVHL interaction set can be downloaded in four different formats (JSON, XML, CSV and tab-separated). Details of any protein can be accessed from the interaction page. Each protein entry page shows all available annotations for a particular pVHL interacting protein, including general annotations from UniProt, manually curated interaction details (if available), sequence annotations from Pfam and MobiDB, structure annotations from PDB, functional annotations from GO and references from PubMed. All these data can be downloaded for each protein in the formats specified above. A feedback form is accessible from the entry page and can be used to report inconsistencies or suggest annotations for a specific pVHL interacting protein. Feedback and data submission requests can also be sent through the contact page, accessible from the navigation bar, which features two distinct forms: one for general feedback and one for specific data submission requests. These messages are manually reviewed by our curators and, after validation (i.e. confirmation of the user-suggested literature), the proposed data are added to VHLdb.
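The text does not state which statistical test underlies the per-surface GO enrichment table. A common choice is a one-sided hypergeometric test that compares the number of proteins on a surface carrying a GO term with the number expected from the whole background set. The sketch below is written under that assumption. The helper names, the placeholder term IDs and the toy protein sets are hypothetical, and no multiple-testing correction is applied.

```python
# Minimal sketch (assumed method, not necessarily VHLdb's): one-sided
# hypergeometric GO enrichment for the interactors of one pVHL surface.
from math import comb
from typing import Dict, List, Set, Tuple


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N=population, K=successes, n=draws)."""
    upper = min(successes, draws)
    total = comb(population, draws)
    return sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    ) / total


def go_enrichment(
    surface: Set[str], background: Set[str], annotations: Dict[str, Set[str]]
) -> List[Tuple[str, int, float, float]]:
    """Return (term, observed, expected, p_value) tuples sorted by p-value."""
    n_total = len(background)
    n_surface = len(surface & background)
    results = []
    for term, annotated in annotations.items():
        k_term = len(annotated & background)
        observed = len(annotated & surface & background)
        if observed == 0:
            continue  # nothing to test for terms absent from the surface
        expected = n_surface * k_term / n_total
        results.append((term, observed, expected, hypergeom_sf(observed, n_total, k_term, n_surface)))
    return sorted(results, key=lambda row: row[3])


# Toy example with placeholder proteins and GO terms:
background = set("ABCDEFGHIJ")
surface = {"A", "B", "C", "D"}
annotations = {"term_x": {"A", "B", "C"}, "term_y": {"J"}}
print(go_enrichment(surface, background, annotations))
```

In a real analysis, the background would be the interactome or the full proteome, and the p-values would be corrected for the number of terms tested.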

VHLdb statistics

VHLdb collects data on 478 pVHL interacting proteins and 1,074 pathogenic somatic or germline pVHL mutations. In total, 117 of the 478 pVHL interacting proteins were manually reviewed and constitute the core curated pVHL interaction network. The remaining 361 proteins constitute the automatic, lower-confidence pVHL interaction network. For 62 proteins of the core set it was possible to identify the interacting surface (see Table 3). For 55 proteins it was possible to identify the pVHL residues involved in the interaction, and for 10 also the residues of the interaction partner. For 51 proteins we also determined whether the interaction with pVHL is direct or indirect. Table 2 shows a more detailed listing of the manually curated VHLdb protein set. Statistical analysis shows that the interactor distribution differs among the four pVHL interfaces. Interface A, which is known to bind elongins B and C and cullin 2 to form the VCB complex6, has 9 exclusive interactors, distributed between sub-interfaces A1 and A2. Proteins interacting with this region compete with elongins B and C, highlighting pVHL functions beyond the well-known HIF-1α degradation. We also found that 190 mutations affect this area, yielding three different VHL phenotypes. For example, Guanine nucleotide-binding protein subunit beta-2-like 1 and E2F transcription factor 1 (UniProt accessions P63244 and Q01094, respectively) are both known to promote cell cycle progression under different stimuli. A simple database search shows that the two proteins rely on the same interaction interface, suggesting a correlated role, at least for pVHL binding. Their interaction with the same pVHL surface suggests a pivotal role for pVHL in controlling cell cycle progression across different stimuli and oxygen concentrations. Similar results were found for the remaining interaction interfaces. 
In detail, 39 interactors were found for interface B, 6 for interface C and one for interface D, for a total of 827 different mutations distributed among the interaction interfaces. Interface B is the HIF-1α binding region and has the largest number of interactors. As a further example, we found that proteins such as tubulin beta, collagen alpha-1(IV) and kinesin bind sub-interface B2, suggesting that the molecular details of functions related to endothelial matrix regulation15 may correspond to this specific interaction area.
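The per-interface counts and the "same surface" observations above come down to simple grouping over curated records. The sketch below illustrates this. The record layout and the helper names are hypothetical, and the surface labels are simplified: the two cell-cycle proteins are both placed on interface A without a sub-interface, following the text, and the placeholder entry stands for a protein whose surface is unknown. "Exclusive" is taken to mean binding exactly one of the interfaces A to D.

```python
# Minimal sketch (hypothetical data layout): tabulate curated interactors by
# pVHL interface and group partners that share the same surface label.
from collections import Counter, defaultdict
from typing import Dict, List, Set

INTERFACES = "ABCD"


def main_interface(label: str) -> str:
    """Map a (sub-)interface label such as 'A1' or 'B2' to its interface letter."""
    letter = label[:1].upper()
    return letter if letter and letter in INTERFACES else "unknown"


def exclusive_counts(interactors: Dict[str, List[str]]) -> Counter:
    """Count proteins that bind exactly one interface (A to D)."""
    counts: Counter = Counter()
    for labels in interactors.values():
        mains = {main_interface(label) for label in labels} - {"unknown"}
        if len(mains) == 1:
            counts[mains.pop()] += 1
    return counts


def partners_by_surface(interactors: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Group accessions by the exact surface label they bind."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for accession, labels in interactors.items():
        for label in labels:
            groups[label].add(accession)
    return dict(groups)


curated = {
    "P63244": ["A"],            # GNB2L1 (RACK1), same interface as E2F1 per the text
    "Q01094": ["A"],            # E2F1
    "Q16665": ["B"],            # HIF1A binds interface B
    "PLACEHOLDER": ["unknown"], # protein with no assigned surface
}
print(exclusive_counts(curated))  # Counter({'A': 2, 'B': 1})
print(sorted(partners_by_surface(curated)["A"]))  # ['P63244', 'Q01094']
```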

Conclusions

We have presented VHLdb, a novel database collecting curated information on pVHL interactors and mutation effects. It provides comprehensive information on pVHL interactors derived from different sources as a single structured resource. Detailed information about VHL disease is rapidly increasing, but this large amount of information is scattered across different generalist resources and is not readily reachable by non-expert users. We expect VHLdb to be useful both for experimentalists seeking to study pVHL biology in greater detail and for clinicians aiming to understand the effects of novel pVHL variants. An intuitive, pVHL-oriented user interface was designed, and four different output formats are provided to facilitate data retrieval. VHLdb is also effective for the qualitative study of pVHL pathogenic mutations and interacting proteins. Of a total of 478 different interactors, 62 were mapped onto the corresponding interaction interface. Moreover, 1,074 somatic and germline pathogenic mutations are reported, extending the previous set of pathogenic pVHL mutations35. This can be particularly helpful for future mutation-correlation studies. Information in VHLdb may help the scientific community decipher data derived from tumor genome sequencing projects59, as well as provide high-quality data for inclusion in predictive genomics studies60. Updates such as error reports and submissions of new data to VHLdb are highly encouraged from the community through the implemented feedback function. In the future, VHLdb is envisaged to include more annotations, such as distinct causal relationships between mutations and affected pathways.

Additional Information

How to cite this article: Tabaro, F. et al. VHLdb: A database of von Hippel-Lindau protein interactors and mutations. Sci. Rep. 6, 31128; doi: 10.1038/srep31128 (2016).