Neurodegeneration and Cancer: Where the Disorder Prevails

It has been reported that genes up-regulated in cancer are often down-regulated in neurodegenerative disorders and vice versa. The fact that apparently unrelated diseases share functional pathways suggests a link between their etiopathogenesis and the properties of molecules involved. Are there specific features that explain the exclusive association of proteins with either cancer or neurodegeneration? We performed a large-scale analysis of physico-chemical properties to understand what characteristics differentiate classes of diseases. We found that structural disorder significantly distinguishes proteins up-regulated in neurodegenerative diseases from those linked to cancer. We also observed high correlation between structural disorder and age of onset in Frontotemporal Dementia, Parkinson’s and Alzheimer’s diseases, which strongly supports the role of protein unfolding in neurodegenerative processes.


Results
In this work, we used the cleverMachine approach (available at http://www.tartaglialab.com/cs_multi/submission) 10 to analyse physico-chemical features of proteins associated with Schizophrenia, Alzheimer's and Parkinson's diseases as well as colorectal, lung and prostate cancers. Analysis carried out with the boxplotter algorithm (accessible at http://www.tartaglialab.com/boxplotter/submit; Table S1) reveals that genes up-regulated in central nervous system (CNS) disorders code for proteins that are poorly abundant under physiological conditions (human reference proteome) 11,12, indicating that expression is significantly increased in the disease state (down-regulated genes follow the opposite trend; Fig. 2A-C). By contrast, genes up-regulated in colorectal, lung and prostate cancers are associated with proteins that are already abundant in the reference proteome (down-regulated genes follow the opposite trend; Fig. 2D-F). The finding that genes associated with different diseases are constitutively expressed at specific levels suggests a link with the physico-chemical features of their protein products 8,13. Indeed, previous reports indicate that protein abundance is intrinsically constrained by solubility 8,9,14,15, that unfolded polypeptides are poorly expressed 16,17 and that nucleic-acid binding proteins are highly abundant 18,19 (Table S1).

Ibáñez et al. 2 focused on transcripts that are up-regulated in CNS diseases and down-regulated in cancer or vice versa (i.e., intersection between gene sets). Our study deals instead with sets of genes that are either up-regulated or down-regulated in cancer and CNS diseases (i.e., symmetric difference between gene sets). We found that structural disorder strongly differentiates cancer types and CNS diseases (p-values < 10^−5; http://www.tartaglialab.com/cs_multi/confirm/524/36563b35ee/). Evidence for this conclusion is presented in Fig. 3, where we compared 18000 genes (~75000 protein isoforms) using ten disorder predictors 10.
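The distinction between an intersection and a symmetric difference of gene sets can be illustrated with a minimal sketch. The gene identifiers below are hypothetical placeholders, and the way the two sets are paired is an assumption made only for illustration; it is not the actual construction of the gene sets used in the study.

```python
# Hypothetical gene identifiers, used only to illustrate the two set operations.
cns_up = {"GENE_A", "GENE_B", "GENE_C"}
cancer_down = {"GENE_B", "GENE_C", "GENE_D"}

# Intersection: genes that appear in both sets
# (e.g. up-regulated in a CNS disease AND down-regulated in a cancer).
shared = cns_up & cancer_down

# Symmetric difference: genes that appear in exactly one of the two sets.
exclusive = cns_up ^ cancer_down

print(sorted(shared))     # ['GENE_B', 'GENE_C']
print(sorted(exclusive))  # ['GENE_A', 'GENE_D']
```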
For each CNS disease, we found that up-regulated genes are significantly enriched in intrinsically unfolded proteins (17 out of 18 protein sets follow the trend, giving an overall signal strength of 17/18 = 0.94; p-values < 10^−5; Fisher's exact test; Fig. 3A), while down-regulated genes contain more structured polypeptides (signal strength = 18/18), in agreement with DisEMBL disorder predictions 20 (see Materials and Methods). Comparing genes up- and down-regulated in cancer types and CNS diseases, we observed that structural disorder propensity anti-correlates with order-promoting features such as alpha-helix (31 out of 36 predictors show opposite trends, resulting in a score of −31/36 = −0.86) and beta-sheet (−0.91) propensities. Increase in disorder is also significantly associated with depletions in burial (predictors agreement = −0.77), hydrophobicity (−0.55) and membrane propensities (−0.47) 21. By contrast, proteins up-regulated in colorectal and lung cancer are enriched in nucleic-acid binding ability (8 out of 12 sets follow the trend, while the remaining 4/12 do not show significant enrichments; Fig. 3B), which is in line with evidence showing that transcription factors such as p53 play a major role in oncogenesis 22. Interestingly, prostate cancer shows significant up-regulation of membrane proteins (e.g. NGEP-L), as previously reported in other studies (3 of 6 sets follow the same trend, while the remaining 3/6 do not show significant enrichments; Fig. 3C) 23.

Figure 3 (partial caption). (B) Proteins up-regulated in colorectal and lung cancer (vertical arrows; green dots) have increased nucleic acid propensity (down-regulation is associated with decrease). (C) Membrane propensity differentiates between CNS diseases and proteins up-regulated in prostate cancer. Genes up-regulated in prostate cancer show increased membrane propensity (vertical arrow; green dots; down-regulation is associated with opposite trend).
Red: a particular CNS disease is enriched with respect to a cancer type in structural disorder (A), nucleic-acid binding propensity (B) or membrane propensity (C); Green: a cancer type is enriched with respect to a particular CNS disease in structural disorder (A), nucleic-acid binding propensity (B) or membrane propensity (C); Yellow: non-significant enrichment. Each enrichment is associated with a p-value < 10^−5 calculated with Fisher's exact test. AD: Alzheimer's disease; PD: Parkinson's disease; SCZ: Schizophrenia; CRC: Colorectal cancer; LC: Lung cancer; PC: Prostate cancer; UP/DOWN: over/underexpression with respect to healthy control samples.
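The signal strengths quoted in this section are simple signed fractions of comparisons that follow a trend. The following sketch reproduces the two worked values from the text (17/18 and −31/36); the function name and its `opposite` flag are hypothetical and are not part of the cleverMachine interface.

```python
def signal_strength(n_following: int, n_total: int, opposite: bool = False) -> float:
    """Fraction of comparisons following a trend; negative when the trend is opposite."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_following <= n_total:
        raise ValueError("n_following must lie between 0 and n_total")
    fraction = n_following / n_total
    return -fraction if opposite else fraction


# Values taken from the text: disorder in up-regulated CNS sets, and
# alpha-helix propensity trending opposite to disorder.
disorder_up = signal_strength(17, 18)
alpha_helix = signal_strength(31, 36, opposite=True)

print(f"{disorder_up:.2f}")  # 0.94
print(f"{alpha_helix:.2f}")  # -0.86
```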
GO annotations suggest that proteins containing disordered regions are abundant in colorectal, lung and prostate cancers, although their enrichment is less significant than in Schizophrenia, Alzheimer's and Parkinson's diseases. To test this hypothesis, we generated random groups of human genes (same number of proteins as in the original sets) and compared their features with those of cancers and CNS diseases. We found that structural disorder is indeed enriched in both up-regulated and down-regulated cancer proteins (19 out of 36 down- and up-regulated sets follow the trend and 13/16 do not show significant enrichments; p-values < 10^−5; Figure S1A), although the signal is stronger for Schizophrenia, Alzheimer's and Parkinson's diseases (18/18 up-regulated gene sets are enriched in disorder and 16/18 down-regulated sets are depleted; Figure S1B), in agreement with our original findings (Fig. 3A). We also observed that nucleic acid propensities are enriched in cancers (15/18 sets show significant increase and three are non-significantly enriched) and CNS diseases (15/18 sets have significant increase and one is non-significantly enriched), but signal strength is higher for cancers (Fig. 3B).
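The comparison against size-matched random gene sets can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually run by the cleverMachine: the gene identifiers, the binary "disordered" annotation and the layout of the 2x2 contingency table are all hypothetical, and the one-sided Fisher's exact test is written out from the hypergeometric distribution using only the standard library.

```python
import random
from math import comb


def random_gene_set(background, size, seed=0):
    """Draw a random control set with the same number of genes as the disease set."""
    rng = random.Random(seed)
    return set(rng.sample(sorted(background), size))


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1 = a + b          # size of the first group
    col1 = a + c          # total number of "positive" items
    n = a + b + c + d     # grand total
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)


# Hypothetical data: 1000 background genes, the first 300 annotated as disordered.
background = {f"GENE_{i}" for i in range(1000)}
disordered = {f"GENE_{i}" for i in range(300)}
disease_set = {f"GENE_{i}" for i in range(200, 400)}  # assumed disease gene set

control = random_gene_set(background, len(disease_set))

a = len(disease_set & disordered)   # disordered, disease set
b = len(disease_set) - a            # not disordered, disease set
c = len(control & disordered)       # disordered, random set
d = len(control) - c                # not disordered, random set

print(fisher_exact_greater(a, b, c, d))
```

Drawing the control from a fixed seed keeps the sketch reproducible; in practice one would presumably repeat the draw many times to gauge how stable the enrichment is.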
To further investigate the connection between CNS diseases and structural disorder, we analysed 428 mutations of proteins involved in Frontotemporal Dementia, Alzheimer's and Parkinson's diseases (available at http://www.molgen.ua.ac.be/ADMutations/ and http://www.molgen.vib-ua.be/PDMutDB/). We observed a strong negative correlation (Pearson's correlation = −0.9; p-value < 10^−3) between age of onset and disorder 24, which, in agreement with GO analysis, indicates that reduction in folding efficiency is a key factor in neurodegeneration (Fig. 5). In line with this observation, previous reports indicate that intrinsically unfolded proteins such as α-synuclein (Parkinson's disease 25), Aβ42 (Alzheimer's disease 26) and DISC1 (Schizophrenia 27) cause neuronal damage by assembling into amyloid fibrils. As proteomic analyses indicate that amyloid-forming proteins have an intrinsic propensity to attract disordered proteins 26, it is possible that neurotoxicity arises from direct co-aggregation of proteins that have unfolded regions available for promiscuous interactions. Thus, up-regulation of disordered proteins might be the consequence of a cellular response to compensate for progressive sequestration in amyloid deposits. To investigate this hypothesis, we compared proteins sequestered by amyloid fibrils 26 and those deregulated in Alzheimer's disease. The cleverMachine analysis 10 indicates that proteins binding to amyloid aggregates are not physico-chemically dissimilar to those up-regulated in Alzheimer's disease (see http://www.tartaglialab.com/cs_multi/cc_runs/622/; Figure S3), which strengthens the link between misfolding and neurodegeneration. In line with these findings, very recent reports showed that an increase in protein insolubility is associated with massive accumulation of natively unfolded proteins 28.
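For reference, the Pearson correlation between a disorder score and age of onset can be computed as in the sketch below. The numbers are invented for illustration and are not the mutation data analysed here; how disorder was scored for each mutation is not specified in this section, so the `disorder_scores` values are purely hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical values: higher disorder paired with earlier onset.
disorder_scores = [0.20, 0.35, 0.50, 0.65, 0.80]
age_of_onset = [68.0, 61.0, 57.0, 49.0, 45.0]

print(round(pearson(disorder_scores, age_of_onset), 2))
```

A negative coefficient here means that higher disorder goes with earlier onset, which is the direction of the relationship reported in the text.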

Conclusions
It has been shown that structurally disordered proteins are tightly regulated by the cell 29,30 and that their uncontrolled over-expression triggers pathological conditions such as cardiovascular diseases and diabetes 31. In this study, we report that genes up-regulated in CNS diseases are more enriched in disordered protein products than cancer genes, which has important implications for the etiopathogenesis of neurodegenerative diseases. In fact, changes in the abundance of unfolded proteins induce re-wiring of protein networks and promote formation of aberrant interactions 32 leading to association with amyloid deposits 26. As genes up-regulated in prostate, colorectal and lung cancer code for proteins that are less disordered than those up-regulated in CNS diseases and more unfolded than those down-regulated in CNS diseases, we cannot exclude the possibility that structural disorder might play a role in cancer, although to a lesser extent. Indeed, unregulated promiscuity of unfolded proteins can trigger fatal events leading to cell death signalling 29. For instance, in the case of the Bcl-2 family of apoptosis regulators, aberrant expression of intrinsically disordered proteins can determine different cell fate decisions through alteration of interaction networks 33 (we note that Bcl-2 is up-regulated in CNS disorders and down-regulated in cancer 2).
Our results do not indicate that aggregation is uniquely linked to neurodegeneration. Indeed, although amyloid fibrils sequester natively unfolded proteins 26, which are particularly abundant in brain regions 34,35, some cancer types are associated with protein aggregation 36 and protein deposits influence cell survival in the context of several tumors, especially those that are metastatic. For example, co-aggregation of toxic amyloid-β peptide (Aβ) and TGF-β-induced antiapoptotic factor (TIAF1) is a hallmark of metastatic cancer cell mass 37,38. Expression levels of TIAF1 vary throughout the metastatic spread, being up-regulated in developing tumors and down-regulated in established metastatic cancer cells 37. In a number of cases, aggregation of specific proteins is associated with both CNS diseases and cancer types. For instance, aggregation of superoxide dismutase SOD1 causes cellular death in amyotrophic lateral sclerosis 39. Yet, SOD1 also has a role in breast cancer and an ability to augment estrogen-responsive gene expression 40. Similarly, the DNA-binding domain of p53 is conformationally unstable and the majority of disease mutants are known to increase structural disorder 41. Upon aggregation, mutant p53 induces misfolding and co-aggregation not only of wild-type p53, but also of its paralogues p63 and p73 into cellular inclusions, causing inefficient transcription of target genes, which, in turn, is crucial for cell growth control and apoptosis 42.
In conclusion, our analysis is one of the first attempts to illustrate how an epidemiological observation on inverse comorbidities 2 can be rationalized in terms of physico-chemical features of proteins encoded by deregulated genes. We cannot exclude that additional factors, including age of disease onset and drug treatment, could influence the expression patterns associated with disease. In fact, drugs used in the treatment of neurodegenerative diseases, such as thioridazine 43, have been shown to display anti-tumor effects, while anti-tumor drugs, such as cyclin-dependent kinase inhibitors 44 and mithramycin 45, are neuro-protective. Yet, these findings reinforce the existence of a link between cancer and CNS diseases and indicate that future studies will have to focus on specific molecular pathways 46.

Materials and Methods
Gene sets were taken from the paper by Ibáñez et al. 2: Alzheimer's disease (AD); Parkinson's disease (PD); Schizophrenia (SCZ); Colorectal cancer (CRC); Lung cancer (LC); Prostate cancer (PC). Results can be accessed at http://www.tartaglialab.com/cs_multi/confirm/524/36563b35ee/. Examples of our calculations are at http://www.tartaglialab.com/cs_multi/confirm/240/6be82069c3/. Comparison with random sets can be found at http://www.tartaglialab.com/cs_multi/confirm/576/ef217f98eb/ (CNS diseases) and http://www.tartaglialab.com/cs_multi/confirm/602/cfc3e02cdc/ (cancers). Classification of disordered proteins interacting with amyloid fibrils is available at http://www.tartaglialab.com/cs_multi/cc_runs/622/.

cleverMachine. The cleverMachine (CM) algorithm analyses physico-chemical properties of two protein datasets 10. The tool creates profiles, or physico-chemical signatures, for each protein, using a large set of features, both experimentally and statistically derived from other tools. In our analysis we used a number of physico-chemical properties (hydrophobicity, alpha-helix, beta-sheet, disorder, burial, aggregation, membrane and nucleic acid-binding propensities) and 10 propensity predictors per feature. Only differentially enriched properties were used in the calculations. Further information can be found at http://s.tartaglialab.com/page/clever_suite.

multiCleverMachine analysis. The multiCleverMachine (multiCM) extends the concept of binary comparisons used in CM by introducing more set groups. After submission of one or more inputs for signal and one or more inputs as negative group, the multiCM creates a CM run for each possible combination of elements from the signal and negative sets. The result is presented in an easy-to-read format, allowing at-a-glance interpretation of the CM submissions (Fig. 1). Each of the individual CM runs is linked on the multiCM page, allowing further in-depth analysis.
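The pairing of signal and negative inputs described above amounts to a Cartesian product. A minimal sketch, assuming six CNS sets as signal and six cancer sets as negative group: the labels combine the abbreviations defined in this paper, while the function name `plan_cm_runs` and the assignment of CNS sets to the signal group are hypothetical and do not reflect the multiCM interface.

```python
from itertools import product


def plan_cm_runs(signal_sets, negative_sets):
    """Return one (signal, negative) pair for every pairwise CM comparison."""
    return list(product(signal_sets, negative_sets))


# Labels built from the abbreviations used in the paper (illustrative only).
cns_sets = ["AD_UP", "AD_DOWN", "PD_UP", "PD_DOWN", "SCZ_UP", "SCZ_DOWN"]
cancer_sets = ["CRC_UP", "CRC_DOWN", "LC_UP", "LC_DOWN", "PC_UP", "PC_DOWN"]

runs = plan_cm_runs(cns_sets, cancer_sets)

print(len(runs))  # 36
print(runs[0])    # ('AD_UP', 'CRC_UP')
```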
The multiCM provides visualisation of enrichment strengths per group, making it easy to see for which groups the various properties (disorder, alpha-helical propensity, etc.) are enriched. Details about this new method are available at http://www.tartaglialab.com/cs_multi/submission.

DisEMBL analysis. In order to validate our CM analysis, we used DisEMBL 20 (http://dis.embl.de). As DisEMBL provides disorder profiles for each of the properties, the analysis was carried out as follows. For each of the profiles, we calculated the proportion of the sequence that was above the significance threshold defined by the authors, which yielded a strength score for each individual entry. The scores were then averaged to compare individual sets. To visualize strength comparisons, we used the same set of colors as described in Fig. 1 (see multiCleverMachine analysis): green if the set on the left (cancer) is enriched, and red otherwise. Our results are available at http://www.tartaglialab.com/static/2014/disembl_analysis.html.
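The DisEMBL scoring steps described above can be sketched as follows. The per-residue profiles and the threshold value of 0.5 are invented for illustration; DisEMBL defines its own significance thresholds, which are not reproduced here, and the function names are hypothetical.

```python
def fraction_above(profile, threshold):
    """Proportion of residues whose disorder score exceeds the threshold."""
    if not profile:
        raise ValueError("profile must contain at least one residue score")
    return sum(1 for score in profile if score > threshold) / len(profile)


def set_strength(profiles, threshold):
    """Average the per-protein strength scores over a protein set."""
    if not profiles:
        raise ValueError("protein set must not be empty")
    scores = [fraction_above(profile, threshold) for profile in profiles]
    return sum(scores) / len(scores)


def comparison_color(cancer_strength, cns_strength):
    """Green if the cancer set (left) is enriched, red otherwise."""
    return "green" if cancer_strength > cns_strength else "red"


THRESHOLD = 0.5  # assumed value, for illustration only

# Hypothetical per-residue disorder profiles for two small protein sets.
cancer_profiles = [[0.1, 0.6, 0.2, 0.3], [0.4, 0.2, 0.7, 0.1]]
cns_profiles = [[0.7, 0.8, 0.2, 0.9], [0.6, 0.6, 0.4, 0.3]]

cancer_strength = set_strength(cancer_profiles, THRESHOLD)
cns_strength = set_strength(cns_profiles, THRESHOLD)

print(cancer_strength, cns_strength)
print(comparison_color(cancer_strength, cns_strength))
```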