Exploring novel mechanistic insights in Alzheimer’s disease by assessing reliability of protein interactions

Protein interaction networks are widely used in computational biology as a graphical means of representing higher-level systemic functions in a computable form. Although many algorithms exist that seamlessly collect and measure protein interaction information in network models, they often do not provide novel mechanistic insights using quantitative criteria. Measuring the information content and knowledge representation of network models of disease mechanisms becomes crucial, particularly when exploring new target candidates in the well-defined functional context of a potential disease mechanism. To this end, we have developed a knowledge-based scoring approach that uses literature-derived protein interaction features to quantify protein interaction confidence. We thereby introduce the novel concept of knowledge cliffs: regions of the interaction network where a significant gap between high-scoring and low-scoring interactions is observed, representing a divide between established and emerging knowledge on a disease mechanism. To demonstrate the application of this approach, we constructed a protein-protein interaction model specific to Alzheimer’s disease (AD) and assessed its reliability, which led to the screening and prioritization of four novel protein candidates. Evaluation of the identified candidates showed that two of them are already being pursued in clinical trials of potential AD drugs.

was optimized to generate maximum recall. Figure 1 shows the output sentences generated by following a machine learning approach.

Figure 1: Sentences extracted from biological literature using a machine learning approach trained for relationship extraction.
As shown in Figure 1, an automated workflow can also generate false positives. Human curation is therefore necessary to construct an accurate disease-specific interaction network.

Task
You have been provided with sentences (the output of a machine learning approach) that are believed to represent protein-protein interactions (PPIs).
All such sentences should be assigned the label 3 in front of them, which means that they represent knowledge. For your information: sometimes it may be difficult to decide what to consider knowledge and what not. In this case, ask yourself whether the sentence gives you some insight into the disease mechanism, or whether the information could be useful for drug discovery purposes. Anything that helps us gain new insight into the disease counts as knowledge.
Task 2: Determine whether the information represented in the PPI really belongs to the disease assigned to you (contextual information).
Suppose you are annotating a corpus of Alzheimer's disease PPIs. Sometimes an abstract merely mentions the word "Alzheimer" while the abstract as a whole is about normal brain phenomena.
You therefore have to make sure that the sentence is definitely taken from an abstract that is about Alzheimer's disease.
Task 3: Add the PMID, date of publication, journal name, and journal impact factor to your annotation.
*The impact factor is one of the most controversial features included in the scoring, as scientists hold mixed views on using it as a parameter for judging quality. We nevertheless followed the consensus and included it in our scoring, based on the argument that this factor must be taken into account, especially when judging the reliability of literature-derived information.
Task 4: Classify whether the sentence is a fact or a hypothesis.
Examples of facts:
You are living in Bonn.
Nature is beautiful.
Angela Merkel is a German.
Examples of hypotheses/speculations:
Tomorrow I may go to Koln.
I don't know if nature will also be beautiful in the Sahara desert.
Angela Merkel might again become German chancellor.
All these sentences are speculations because we are not sure whether they will happen or not. Remember that words like may, might, could be, suggest, etc. are used to represent speculation in text. So for each PPI sentence you have to check whether it is a fact or a speculation.
"GRK2 may hyperphosphorylates tau in tauopathies" (Hypothesis)
"Presenilin-1 interacts with plakoglobin and enhances plakoglobin-Tcf-4 association" (Fact)
Task 5: Check whether the interaction sentence is supporting evidence or a contradiction.
If you are considering an interaction A-B (two proteins), then "A interacts with B" is supporting evidence:
"In mouse, we found a high avidity binding of Abeta peptides to ACHE."
"A does not interact with B" is a contradiction (the opposite of supporting evidence):
"Our protein interaction experiment argues against interaction between APP and ACHE"
Task 6: Classify whether a PPI is supported by in vivo or in vitro evidence. In this case you will have to read the full text to check whether the experiment conducted is in vivo or in vitro.
In vivo refers to a medical test, experiment, or procedure that is done on a living organism, such as a laboratory animal or a human.
In vitro refers to experiments done within glass or a culture medium, observable in a test tube, in an artificial environment outside the living organism; or to biological processes or reactions that would normally occur within an organism but are here made to occur in an artificial environment, i.e. a laboratory.
You must also annotate the "interaction detection method" used to confirm a particular PPI reported in the literature. In most cases a particular PPI is backed up by multiple interaction detection methods. Please mention all of them, separated by commas, along with their IDs as given in the PPIO ontology (http://bioportal.bioontology.org/ontologies/PPIO), to make sure you are annotating a relevant PPI detection method.
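The outcome of Tasks 2 to 6 for a single sentence can be pictured as one structured record. The following is a minimal sketch in Python, not the tool used by the curators: the field names, the `PPIAnnotation` class, and the `pre_label` helper are all assumptions made for illustration. The cue-word check only pre-flags sentences using the words named in Task 4; the final decision between fact and hypothesis remains with the human annotator.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Cue words named in Task 4 ("may, might, could be, suggest"); the exact
# pattern is an assumption made for this sketch.
SPECULATION_CUES = re.compile(
    r"\b(may|might|could|suggest(s|ed|ing)?)\b", re.IGNORECASE
)


def pre_label(sentence: str) -> str:
    """Return "Hypothesis" if a speculation cue word occurs, else "Fact"."""
    return "Hypothesis" if SPECULATION_CUES.search(sentence) else "Fact"


@dataclass
class PPIAnnotation:
    """One curated sentence; all field names are hypothetical."""
    sentence: str
    pmid: str                     # Task 3
    publication_date: str         # Task 3
    journal: str                  # Task 3
    impact_factor: float          # Task 3
    belongs_to_disease: bool      # Task 2: disease context confirmed
    statement_type: str           # Task 4: "Fact" or "Hypothesis"
    is_supporting: bool           # Task 5: False marks a contradiction
    evidence_type: str            # Task 6: "in vivo" or "in vitro"
    # Task 6: detection methods, named with their PPIO ontology IDs
    detection_methods: List[str] = field(default_factory=list)
```

Applied to the two example sentences of Task 4, `pre_label` flags the GRK2 sentence as a hypothesis (because of "may") and the Presenilin-1 sentence as a fact.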

Supplementary file 2
Guidelines for ranking of various knowledge bins from an expert point of view
The aim of this survey is to rank various parameter combinations that lend confidence to extracted knowledge on protein-protein interactions present in the literature.
Using different parameter combinations (bins), we want to create a scoring function that provides a rationale for assessing the confidence of a protein-protein interaction reported in the literature. This will allow established knowledge (supporting information and contradictions), emerging knowledge, and novel predictions to be differentiated.
Below is a list of 12 bins that you have to rank from maximum to minimum priority.
Each bin presented below represents a particular type of evidence supporting protein-protein interactions (PPIs) and is composed of the following entities:
• Fact: The sentence mentioned in the literature is a fact, e.g.
  o FGF-20 selectively activates tyrosine hydroxylase in calbindin-negative neurons.
• Hypo: The sentence mentioned in the literature is a speculation or hypothesis, e.g.
  o FGF-20 selectively might activate tyrosine hydroxylase in calbindin-negative neurons.
• Invivo: The experiment showing the PPI was conducted in human or animal (mouse or rat) as the model organism.
• Human: Experiment performed in humans.
• Animal: Experiment performed using mouse or rat as a model organism.
• Invitro: Experiment performed in a controlled environment outside of a living organism (http://mmbr.asm.org/content/59/1/94.full.pdf).
  o Physical methods: Physical methods used for protein-protein interaction detection include:
    - Protein affinity chromatography
    - Affinity blotting
    - Immunoprecipitation
    - Cross-linking
  o Library based methods: Library based methods used for protein-protein interaction detection include:
    - Protein probing
    - Phage display
    - Two-hybrid system
  o Genetic methods: Genetic methods used for protein-protein interaction detection include:
    - Extragenic suppressors
    - Synthetic lethal effect
    - Overproduction phenotypes
  o Others: All other methods (in silico etc.) apart from the ones listed above are mapped to this class.
Note: Although the techniques listed under the Invitro section (physical, library based, and genetic methods) are all performed in the laboratory, you are required to rank them based on the amount of trust you have in them.
A 'bin' is a combination of the entities described above. The bins represent the following:
• Fact|Invivo|Human: A piece of evidence or statement extracted from a given article is a fact supported by the study described in the same article, and the model organism for this in vivo experiment is human.
• Fact|Invivo|Animal: A piece of evidence or statement extracted from a given article is a fact supported by the study described in the same article, and the model organism for this in vivo experiment is an animal (mouse/rat).
• Hypo|Invivo|Human: A piece of evidence or statement extracted from a given article is a hypothesis supported by the study described in the same article, and the model organism is human.
• Hypo|Invivo|Animal: A piece of evidence or statement extracted from a given article is a hypothesis supported by the study described in the same article, and the model organism for this in vivo experiment is an animal (mouse/rat).
• Fact|Invitro|Physicochemical methods: A piece of evidence or statement extracted from a given article is a fact supported by the study described in the same article, and the study used physical techniques for PPI detection.
• Hypo|Invitro|Physicochemical methods: A piece of evidence or statement extracted from a given article is a hypothesis supported by the study described in the same article, and the study used physical techniques for PPI detection.
• Fact|Invitro|Library based methods: A piece of evidence or statement extracted from a given article is a fact supported by the study described in the same article, and the study used library based methods for PPI detection.
• Hypo|Invitro|Library based methods: A piece of evidence or statement extracted from a given article is a hypothesis supported by the study described in the same article, and the study used library based methods for PPI detection.
• Fact|Invitro|Genetic methods: A piece of evidence or statement extracted from a given article is a fact supported by the study described in the same article, and the study used genetic methods for PPI detection.
• Hypo|Invitro|Genetic methods: A piece of evidence or statement extracted from a given article is a hypothesis supported by the study described in the same article, and the study used genetic methods for PPI detection.
• Fact|Invitro|other: A piece of evidence or statement extracted from a given article is a fact supported by the study described in the same article, and the study used other methods (in silico or something else) apart from the ones mentioned above for PPI detection.
• Hypo|Invitro|other: A piece of evidence or statement extracted from a given article is a hypothesis supported by the study described in the same article, and the study used other methods (in silico or something else) apart from the ones mentioned above for PPI detection.
You are requested to rank all of these bins (which represent the confidence of statements derived from the literature together with their evidence) based on your expert opinion. Considering the details above, please assign a suitable rank to each bin. To show how you can rank these bins based on your expertise, the table below has been filled with example ranks:

S.No | Bin | Rank
Please assign a rank to each bin as described above, based on your experience and priority.
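To illustrate how the 12 bins and the expert rankings could feed a scoring function, the sketch below enumerates the bins and combines rankings from several experts. It is an illustration under stated assumptions, not the scoring function used in this work: averaging the ranks and mapping the mean rank linearly to a weight between 0 and 1 are choices made only for this example, and the names `aggregate_ranks` and `rank_to_weight` are hypothetical.

```python
from statistics import mean

# The six evidence classes described above; each is combined with
# "Fact" and "Hypo" to give the 12 bins.
EVIDENCE_CLASSES = [
    ("Invivo", "Human"),
    ("Invivo", "Animal"),
    ("Invitro", "Physicochemical methods"),
    ("Invitro", "Library based methods"),
    ("Invitro", "Genetic methods"),
    ("Invitro", "other"),
]
BINS = [
    f"{status}|{setting}|{detail}"
    for setting, detail in EVIDENCE_CLASSES
    for status in ("Fact", "Hypo")
]


def aggregate_ranks(expert_rankings):
    """Average the rank each expert gave to each bin (1 = highest priority).

    Assumption of this sketch: experts are combined by a plain mean.
    """
    return {b: mean(ranking[b] for ranking in expert_rankings) for b in BINS}


def rank_to_weight(mean_rank, n_bins=len(BINS)):
    """Map a mean rank linearly to [0, 1]: rank 1 -> 1.0, rank n_bins -> 0.0.

    Assumption of this sketch: the mapping is linear.
    """
    return (n_bins - mean_rank) / (n_bins - 1)
```

For example, if one expert ranks a bin first and another ranks it last, `aggregate_ranks` gives that bin a mean rank of 6.5, which `rank_to_weight` maps to 0.5.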

Supplementary file 3
Protein interaction network specific to Alzheimer's disease, consisting of 301 nodes (proteins) and 339 edges (protein interactions). Each edge in the network is directed and has been assigned a reliability score.
Figure 3: Protein interaction network specific to Alzheimer's disease.
To open the file with Cytoscape, please follow these steps:
1. Download and install the appropriate Cytoscape version.
2. Open Cytoscape.
3. Go to the File menu and select the "Import Network (multiple file types)" option.
4. Browse to the downloaded network file.
5. Select the network file; the network will then appear in your Cytoscape window.
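Because every edge carries a reliability score, the network can also be inspected programmatically for places where high-scoring and low-scoring interactions meet, which is the idea behind knowledge cliffs. The sketch below is one possible reading of that idea and not the procedure used in this work: it assumes edges are given as (source, target, score) tuples, and it flags a protein when the sorted scores of its incident edges contain a jump of at least `min_gap`. The function name, the threshold, and the protein names and scores in the usage example are all hypothetical.

```python
from collections import defaultdict


def find_cliff_nodes(edges, min_gap):
    """Return {node: largest_gap} for nodes whose incident edge scores
    show a jump of at least min_gap between consecutive sorted scores.

    edges: iterable of (source, target, score) tuples.
    The gap criterion is an assumption made for this sketch.
    """
    incident = defaultdict(list)
    for source, target, score in edges:
        incident[source].append(score)
        incident[target].append(score)

    cliffs = {}
    for node, scores in incident.items():
        ordered = sorted(scores)
        gaps = [high - low for low, high in zip(ordered, ordered[1:])]
        if gaps and max(gaps) >= min_gap:
            cliffs[node] = max(gaps)
    return cliffs


# Toy edges with placeholder protein names and made-up integer scores.
toy_edges = [("P1", "P2", 9), ("P1", "P3", 2), ("P3", "P4", 3)]
cliff_nodes = find_cliff_nodes(toy_edges, min_gap=5)
```

In the toy example only P1 is flagged, since it takes part in one interaction scored 9 and one scored 2, whereas the scores around P3 differ by only 1.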

Supplementary file 4
The random network constructed from the test corpus annotation is available for download at the following URL: http://www.scai.fraunhofer.de/de/geschaeftsfelder/bioinformatik/downloads.html
The network file (in .xgmml format) can be opened and visualized using Cytoscape version 2.8.3 onwards.
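Since XGMML is an XML format, the network can also be read without Cytoscape. The following minimal sketch uses Python's standard `xml.etree.ElementTree` module to list nodes and edges. It relies only on the `id` and `label` attributes of `node` elements and the `source` and `target` attributes of `edge` elements; the inline sample is a toy graph invented for illustration, and the function name `read_xgmml` is hypothetical. How the reliability score is stored in the downloadable file is not specified here, so the sketch does not attempt to read it.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace, e.g. '{http://...}node' -> 'node'."""
    return tag.rsplit("}", 1)[-1]


def read_xgmml(xml_text):
    """Return (nodes, edges) from XGMML text.

    nodes: dict mapping node id -> node label
    edges: list of (source id, target id) tuples
    """
    root = ET.fromstring(xml_text)
    nodes, edges = {}, []
    for elem in root.iter():
        name = local_name(elem.tag)
        if name == "node":
            nodes[elem.get("id")] = elem.get("label")
        elif name == "edge":
            edges.append((elem.get("source"), elem.get("target")))
    return nodes, edges


# Toy graph for illustration only; not taken from the supplementary file.
SAMPLE = """<graph label="toy" xmlns="http://www.cs.rpi.edu/XGMML">
  <node id="1" label="A"/>
  <node id="2" label="B"/>
  <edge source="1" target="2" label="A interacts with B"/>
</graph>"""

nodes, edges = read_xgmml(SAMPLE)
```

For a downloaded file, `ET.parse(path).getroot()` could take the place of `ET.fromstring`, with `path` pointing to the local copy of the network file.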
To open the file with Cytoscape, please follow these steps:
(also called apolipoprotein E receptor 2 or ApoER2), which is predominantly expressed in brain, might be associated with Alzheimer's disease. [PMID: 12399018]
• We also found that expression of LRP8 increased APP association with lipid rafts and increased gamma-secretase activity, both of which might contribute to increased Abeta production.

GRK5
• Recent studies have indicated the possible involvement of GRK, primarily altered GRK2 and GRK5, dysfunction in the pathogenesis of AD. [20730384]
• Altogether, these findings indicate that GRK5 deficiency accelerates β-amyloidogenic APP processing and Aβ accumulation in APPsw mice via impaired cholinergic activity and that presynaptic M2 hyperactivity is the specific target for eliminating the pathologic impact of GRK5 deficiency. [21041302]
• GRK5 alteration may further increase beta amyloid production in Alzheimer's disease and exaggerates brain inflammation, possibly even the basal forebrain cholinergenic degeneration