Vijay S. Pande is in the Departments of Chemistry and Structural Biology, Stanford University, Stanford, California 94305, USA. pande@stanford.edu
A new algorithm can predict the propensity of proteins to aggregate.
Many diseases have been linked to protein aggregation, including Creutzfeldt-Jakob, Alzheimer, Huntington and Parkinson diseases1. With the growing list of aggregation-related diseases, it is tempting to ask whether protein aggregation is a universal phenomenon. In other words, are there common aggregation-related aspects to these diseases? The strongest way to test a hypothesis is to test its predictions; this approach also has the important and eminently practical implication of creating a technology with numerous diagnostic applications. The work of Serrano and coworkers2 in this issue is a beautiful test of the hypothesis that protein aggregation-related diseases share universal aspects. By developing TANGO, a novel means for identifying β-sheet aggregation propensity, the authors have found that this propensity can be predicted. Additionally, they show that this propensity can predict which peptides and their mutants will aggregate in various amyloidogenic diseases.
Several other recent studies have provided tantalizing evidence that common aggregation-related characteristics exist among diseases. For example, Dobson and coworkers1 have shown that many different proteins that are normally well-behaved can be induced to aggregate. Moreover, Glabe and coworkers3 have reported that an antibody raised against aggregates involved in Alzheimer disease (Aβ oligomers) can also bind oligomers of peptides believed to be involved in several diseases, including Parkinson (α-synuclein) and Huntington (poly-glutamine) diseases and type II diabetes (islet amyloid polypeptide), but does not bind to the monomers or fibrils of any of these peptides3. If an antibody recognizes this broad range of protein aggregates, it may indicate that protein aggregate structures are similar in some way.
How does TANGO work? It is informative to first look at a precursor of TANGO, AGADIR4, which addresses the helical propensities of peptides from a statistical mechanics perspective. AGADIR uses statistical mechanics to calculate the relative probability of finding a helical versus coiled state for a given peptide sequence, with just a few empirical parameters. The sequence-specific elements are critical, as different side chains will lead to different helical propensities. In a sense, TANGO can be thought of as an AGADIR for β-sheet aggregation. Using a strategy similar to AGADIR, TANGO models proteins using a set of four discrete states (unfolded, helix, turn, aggregated) per residue and calculates the partition function for this system based on a few empirically derived parameters (see Fig. 1). With four states per residue, the number of possible configurations becomes computationally intractable even for relatively short proteins (a 20-mer peptide would have 4^20, or roughly a trillion, possible states). Therefore, a double-stretch approximation (see Fig. 1) is made, assuming that states with more than two ordered stretches would be rare. The key to TANGO is the choice of these four representative states and the ability to empirically obtain parameters from the Protein Data Bank (PDB).
Figure 1. Examples of TANGO model configurations for an eight-residue peptide.
Each residue can be in one of four states: random (R), aggregated (A), helix (H) or turn (T). TANGO runs through all possible configurations (with the exception of those with more than two stretches of aggregated residues, such as AARAARAA, which are assumed to be rare) and calculates the partition function for the system. The relative statistical weight for each residue is determined by a knowledge-based potential (Fold-X7), deriving weights from looking at probabilities of states in the PDB.
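The enumeration described in Figure 1 can be illustrated with a minimal sketch. It is not the TANGO implementation: the per-residue weights in `TOY_WEIGHTS` are invented placeholders (TANGO derives sequence-specific weights from a knowledge-based potential), the function names are hypothetical, and the filter follows the caption's wording by limiting only the number of aggregated stretches. The sketch enumerates every four-state configuration of a short peptide, discards those with more than two aggregated stretches, sums the remaining statistical weights into a partition function, and reports the fraction of that weight in which each residue is aggregated.

```python
from itertools import groupby, product

STATES = "RAHT"  # random, aggregated, helix, turn (labels from Figure 1)

# Hypothetical, sequence-independent weights for illustration only.
# These are NOT TANGO's parameters.
TOY_WEIGHTS = {"R": 1.0, "A": 2.0, "H": 0.5, "T": 0.5}


def count_aggregated_stretches(config):
    """Count maximal runs of consecutive 'A' residues in a configuration."""
    return sum(1 for state, _ in groupby(config) if state == "A")


def allowed_configs(n, max_stretches=2):
    """Yield all length-n configurations with at most max_stretches 'A' runs."""
    for config in product(STATES, repeat=n):
        if count_aggregated_stretches(config) <= max_stretches:
            yield config


def weight(config):
    """Statistical weight of a configuration: product of per-residue weights."""
    w = 1.0
    for state in config:
        w *= TOY_WEIGHTS[state]
    return w


def aggregation_profile(n, max_stretches=2):
    """Return (partition function, per-residue probability of being aggregated)."""
    z = 0.0
    aggregated = [0.0] * n
    for config in allowed_configs(n, max_stretches):
        w = weight(config)
        z += w
        for i, state in enumerate(config):
            if state == "A":
                aggregated[i] += w
    return z, [a / z for a in aggregated]


if __name__ == "__main__":
    print(count_aggregated_stretches("AARAARAA"))  # three stretches, so excluded
    z, profile = aggregation_profile(8)
    print(z, profile)
    print(len(STATES) ** 20)  # size of the unrestricted 20-mer state space
```

Brute-force enumeration is feasible for the eight-residue peptide of Figure 1 (4^8 = 65,536 configurations) but not for realistic chain lengths, which is why a restriction on the number of ordered stretches matters in practice.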
Although TANGO uses a simplified representation, its caveats are not onerous. As the extent of aggregation will depend both on the concentration of the peptide and on its association constant, Serrano and coworkers report that TANGO can only give relative probabilities; that is, it allows quantitative comparison inside the same polypeptide chain, or with mutants of the polypeptide chain, but only qualitative comparison between different polypeptide chains. Furthermore, the authors acknowledge, and present data suggesting, that the double-stretch approximation may break down with relatively long (e.g., 50-residue) peptides. A natural follow-up to TANGO could relax the double-stretch assumption, potentially allowing one to address longer chains.
It is important to stress that TANGO is not solely a β-sheet version of AGADIR, but is rather a model for aggregation (not just β-sheet formation). The parameters used in the aggregated state incorporate burial of chains in addition to formation of β-sheet. It is for this reason that TANGO can predict a peptide's role in aggregation-related disease, whereas β-sheet propensities cannot. However, it should also be emphasized that although TANGO predicts aggregation, it is not a predictor of amyloid formation. Although aggregation may be related to the formation of amyloids, TANGO does not address the amyloid state.
Nevertheless, TANGO has been remarkably successful in predicting aggregation properties of peptides, especially those related to disease. Serrano and coworkers show that TANGO predicts the aggregation of a data set of 179 peptides from 21 different proteins as well as of a new set of 71 peptides derived from disease-related human proteins measured in their laboratory. TANGO also correctly predicts pathogenic as well as protective mutations of the Alzheimer β-peptide, human lysozyme and transthyretin.
The ability of TANGO to predict protein aggregation related to disease, but not amyloid properties, suggests a potentially important implication of this work. Although amyloids have been proposed to be causative in diseases, such as Alzheimer, recent work1,5,6 suggests that the relevant states for neurotoxicity in this case are the oligomeric states and not amyloid fibrils. TANGO's ability to predict mutations relevant to Alzheimer disease strongly supports this interpretation. Finally, considering the implications of antibodies that appear to recognize a wide range of peptide aggregates3, perhaps the issues related to aggregation (captured by TANGO's elegant model) are indeed a universal property.