The search for knots in proteins has uncovered little that would cause Alexander the Great to reach for his sword. Excluding knots formed by post-translational crosslinking, the few proteins considered to be knotted form simple trefoil knots with one end of the chain extending through a loop by only a few residues (refs 1, 2), ten in the ‘best’ example (ref. 3). A knot in an open chain (as distinct from a closed circle) is not rigorously defined and many weak protein knots disappear if the structure is viewed from a different angle. Here I describe a computer algorithm to detect knots in open chains that is not sensitive to viewpoint and that can define the region of the chain giving rise to the knot. It characterizes knots in proteins by the number of residues that must be removed from each end to abolish the knot. I applied this algorithm to the protein structure database and discovered a deep, figure-of-eight knot in the plant protein acetohydroxy acid isomeroreductase (ref. 4). I propose a protein folding pathway that may explain how such a knot is formed.
Pulling the ends of a given piece of string will usually decide whether it is knotted or not. Because we hold the ends, the string and our body form a closed circle and there is no danger of untying the knot as it is pulled. Knots in circular strings are mathematically well defined (ref. 5), and one way to approach the problem in open strings is simply to join the ends (as we do when we pick up a string). This is fine for clear knots (where the ends of the string are remote from the knot site), but if the ends are tangled up together with the knot then any algorithm devised to ‘pick up’ the ends creates the risk that the external connections might either untie an existing knot or create a new one. Fortunately, the ends of protein chains (being charged) tend to lie on the surface of the structure and so can often be joined unambiguously by a wide loop. However, not all termini lie on the surface and, if we want to locate the knotted region, any deletion from the ends could drive the termini deeper.
An alternative approach is to invert the problem: rather than extending the termini outwards, these can be left fixed and the rest of the protein made to shrink around them. This was done by contracting the protein chain as if it were a rubber band: specifically, with residues represented by the coordinates of their α-carbon, each point was repeatedly replaced by the average of itself and its two neighbours (Fig. 1). This procedure, which has been used previously to visualize protein chains, quickly reduces the protein to a smooth curve but leaves the termini untouched (as they do not have two neighbours). If continued indefinitely, all the points will lie on the line connecting the termini. When investigating topological features of the chain, however, it is most important that two parts of the chain cannot pass through each other and this was checked with each move of a residue point. Any move that violated this check was not implemented, leaving the current residue in its original position. With this condition, unknotted strings will still be reduced to a straight line, but those containing knots will become blocked (Fig. 2).
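The contraction step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the original program: the function names (`segment_hits_triangle`, `move_is_blocked`, `smooth`) are hypothetical, and the check that the chain does not pass through itself is assumed here to be a test of every non-adjacent chain segment against the two triangles swept out when a point moves, using a standard Möller–Trumbore segment-triangle intersection.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def segment_hits_triangle(a, b, t0, t1, t2, eps=1e-12):
    """True if segment a-b passes through triangle (t0, t1, t2)."""
    d, e1, e2 = sub(b, a), sub(t1, t0), sub(t2, t0)
    h = cross(d, e2)
    det = dot(e1, h)
    if abs(det) < eps:           # segment parallel to the triangle plane,
        return False             # or the triangle is degenerate
    s = sub(a, t0)
    u = dot(s, h) / det
    if u < 0.0 or u > 1.0:
        return False
    q = cross(s, e1)
    v = dot(d, q) / det
    if v < 0.0 or u + v > 1.0:
        return False
    t = dot(e2, q) / det         # position of the hit along the segment
    return 0.0 <= t <= 1.0

def move_is_blocked(pts, i, new):
    """Would moving point i to `new` drag the chain through itself?"""
    tri_a = (pts[i - 1], pts[i], new)   # swept by segment (i-1, i)
    tri_b = (pts[i], new, pts[i + 1])   # swept by segment (i, i+1)
    for j in range(len(pts) - 1):
        if j in (i - 1, i):             # segments that contain point i
            continue
        a, b = pts[j], pts[j + 1]
        # A segment sharing a vertex with a triangle only touches it there.
        if j != i - 2 and segment_hits_triangle(a, b, *tri_a):
            return True
        if j != i + 1 and segment_hits_triangle(a, b, *tri_b):
            return True
    return False

def smooth(chain, iterations=500):
    """Replace each interior point by the mean of itself and its two
    neighbours, skipping any move that would cross another segment."""
    pts = [tuple(float(c) for c in p) for p in chain]
    for _ in range(iterations):
        for i in range(1, len(pts) - 1):    # termini are never moved
            new = tuple((pts[i - 1][k] + pts[i][k] + pts[i + 1][k]) / 3.0
                        for k in range(3))
            if not move_is_blocked(pts, i, new):
                pts[i] = new
    return pts
```

Applied to a list of α-carbon coordinates, `smooth` pulls an unknotted chain towards the straight line between its termini, whereas a blocked move leaves the point where it was. The sketch omits the tube and residue-removal refinements, so it gives no reduced chain to count and is not by itself a complete knot test.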
Although this simple algorithm is sufficient to detect knots, some pragmatic refinements were made. Because of a limit in the numerical accuracy of the computer calculations, each line between residues was represented by an impenetrable tube (1 Å diameter). This allowed residues to be progressively removed: specifically, the middle residue of three that were almost collinear or where the separation of the outer two fell within the tube diameter. This not only improved execution time but led to an even simpler test for knots, as any chain that can be reduced to just its two termini is not knotted. Those with more than two residues remaining are either knots or tangles in which a group of moves has become ‘grid-locked’ (like rush-hour traffic at a junction). This latter condition was eased (but not completely eliminated) by making a slight reduction in the tube diameter any time the chain became stuck.
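The residue-removal refinement might look something like the following Python sketch. The function name `prune` and the collinearity tolerance `collinear_tol` are hypothetical (the text says only "almost collinear"); the 1 Å tube diameter is taken from the text. For brevity the sketch does not repeat the chain-crossing check, which a full implementation would still need to respect.

```python
import math

def prune(chain, tube=1.0, collinear_tol=0.05):
    """Drop the middle point of any consecutive triple that is almost
    collinear, or whose outer two points are closer than the tube diameter.
    `collinear_tol` (in Å) is an assumed value, not one given in the text."""
    pts = [tuple(float(c) for c in p) for p in chain]
    i = 1
    while i < len(pts) - 1:
        a, m, b = pts[i - 1], pts[i], pts[i + 1]
        ab = tuple(b[k] - a[k] for k in range(3))
        am = tuple(m[k] - a[k] for k in range(3))
        span = math.sqrt(sum(c * c for c in ab))
        if span < tube:                  # outer points within the tube
            del pts[i]
            continue
        cx = (am[1] * ab[2] - am[2] * ab[1],
              am[2] * ab[0] - am[0] * ab[2],
              am[0] * ab[1] - am[1] * ab[0])
        # Perpendicular distance of the middle point from the line a-b.
        offset = math.sqrt(sum(c * c for c in cx)) / span
        along = sum(am[k] * ab[k] for k in range(3))
        if offset < collinear_tol and 0.0 <= along <= span * span:
            del pts[i]                   # middle point lies on the line
        else:
            i += 1
    return pts
```

With this in place, the simpler knot test described above amounts to checking whether repeated smoothing and pruning leaves exactly two points, for example `len(prune(chain)) == 2`.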
In practice, most chains of a few hundred residues are reduced to their termini in around 50 iterations. If by 500 iterations a chain was still not reduced to two points, then the resulting configuration was analysed in more detail. As the termini are now well separated from the knot-site, they can be unambiguously joined and analysed as a ‘proper’ circular knot. This can be done using one of the knot-invariant polynomials, such as the Alexander or Jones polynomials (refs 1, 5). However, the few knots encountered in proteins are so simple that they do not require any sophisticated analysis. Furthermore, from a theoretical perspective, not only are protein knots directional but they also have a unique break-point (between the termini), which is not taken into consideration by any of the polynomial forms. As a working tool, a simpler method was adopted to characterize these open knots based on the Dowker knot notation (ref. 5; see legend to Table 1).
Applying this method to a non-redundant selection of protein structures (see Table 1 for selection details) revealed a surprising number of knots. A few of these proved to be unresolved tangles (often forming slip-knots), and some others were caused by breaks in the chain creating an unnatural short-cut. The former were all eliminated by running the program with a smaller ‘tube’ diameter but the latter could only be removed through visual inspection. Of the eight remaining structures (Table 1), five were right-handed trefoils including related carbonic anhydrase structures (1zncA, 1kopA, 1hcb, 1dmxA) and the protein S-adenosylmethionine synthetase (1fugA), both of which had been identified previously. In addition, three new knots were found: a left-handed trefoil in ubiquitin (1cmxA), and two figure-of-eight knots in a viral core protein (2btvB) and acetohydroxy acid isomeroreductase (1yveI; Fig. 3b). These last two are of particular interest as they include an additional crossover above the trefoil and are therefore less likely to be formed by a wandering chain during folding. This was confirmed by simulation of random and semi-random compact protein-like chains in which the trefoil was by far the most common knot type (data not shown). The location of the two figure-of-eight knots was determined by a series of deletions from both termini of the protein chain. This revealed that the knot in 2btvB required just the last eight residues, which is similar to the deepest trefoil knot. By contrast, the knot in 1yveI, which is contained in the carboxy-terminal domain of the protein, remained until 70 residues were deleted from the carboxy terminus and 245 residues (including a complete domain) were removed from the amino terminus (Fig. 3a).
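The deletion procedure used to locate a knot can be illustrated with a short Python sketch. The function `knot_depth` and the predicate `is_knotted(start, end)` are hypothetical names: the predicate stands in for whatever knot test is applied to residues `start` to `end - 1` of the chain. Each terminus is assumed to be trimmed independently while the other is left intact, and a simple linear scan is used because the persistence of a knot need not vary smoothly with chain length (slip-knots being one example).

```python
from typing import Callable, Tuple

def knot_depth(n: int, is_knotted: Callable[[int, int], bool]) -> Tuple[int, int]:
    """Return (n_del, c_del): the largest number of residues that can be
    deleted from the N and C termini, one end at a time, with the knot
    still detected in what remains."""
    n_del = 0
    while n_del + 1 < n and is_knotted(n_del + 1, n):
        n_del += 1
    c_del = 0
    while n - (c_del + 1) > 0 and is_knotted(0, n - (c_del + 1)):
        c_del += 1
    return n_del, c_del

# Toy stand-in for a real knot test: the knot is taken to persist as long
# as residues 245..429 of a hypothetical 500-residue chain are all present.
def toy_is_knotted(start: int, end: int) -> bool:
    return start <= 245 and end >= 430

depths = knot_depth(500, toy_is_knotted)
```

In the toy case the chain length of 500 is invented purely for illustration; only the two depths (245 and 70) echo the values quoted for 1yveI.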
It is interesting to speculate how a protein with such a deep and complex knot might fold, as it is difficult to imagine over 50 residues being ‘fed’ through a loop in a reproducible way during folding. Clues to the folding of this protein can be found in a clear internal duplication within the domain, comprising 88 residue pairs with 2.0 Å r.m.s. deviation (as measured by the program SAP (ref. 6) over the α-carbon positions). If it is assumed that the two most deeply buried, symmetrically equivalent helices initially pack together (A1 and B1, Fig. 4), then the remaining parts of each repeat (A2, A3 and B2, B3) can wrap around this core, requiring only that the C-terminal segment can pass through the large loop between the repeats before this contacts and finally packs onto the core (Fig. 4). The symmetry in this arrangement suggests that the protein might have evolved from an exchange of structure or ‘swap’ (ref. 7) between the two duplicated domains, in which the first helix in the repeat has been transposed across the twofold axis of symmetry, so creating the knot (see ref. 8 for a wider review of related processes). Intriguingly, the best example of a trefoil knot (in 1fugA) appears to have arisen in a similar manner, in which a β-strand on the edge of a sheet has been transferred from one duplicated domain to another. Although it cannot be stated that significant knots in proteins will not arise by other means, it appears that the swapping of elements of secondary structure between duplicated domains can provide a source of knotted proteins.
Mansfield, M. L. Are there knots in proteins? Nature Struct. Biol. 1, 213–214 (1994).
Mansfield, M. L. Fit to be tied. Nature Struct. Biol. 4, 116–117 (1997).
Takusagawa, F. & Kamitori, K. A real knot in protein. J. Am. Chem. Soc. 118, 8945–8946 (1996).
Biou, V. et al. The crystal structure of plant acetohydroxy acid isomeroreductase complexed with NADPH, two magnesium ions and a herbicidal transition state analog at 1.65 Å resolution. EMBO J. 16, 3405–3415 (1997).
Adams, C. C. The Knot Book: An Elementary Introduction to the Mathematical Theory of Knots (Freeman, New York, 1994).
Taylor, W. R. Protein structure alignment using iterated double dynamic programming. Protein Sci. 8, 654–665 (1999).
Bennet, M. J., Schlunegger, M. P. & Eisenberg, D. 3D domain swapping: a mechanism for oligomer assembly. Protein Sci. 4, 2455–2468 (1995).
Heringa, J. & Taylor, W. R. Three-dimensional domain duplication, swapping and stealing. Curr. Opin. Struct. Biol. 7, 416–421 (1997).
Sayle, R. & Milner-White, E. J. RasMol: Biomolecular graphics for all. Trends Biochem. Sci. 20, 374 (1995).
Kraulis, P. J. MOLSCRIPT: A Program to Produce Both Detailed and Schematic Plots of Protein Structures. J. Appl. Crystallogr. 24, 946–950 (1991).
Taylor, W. R. Multiple sequence threading: an analysis of alignment quality and stability. J. Mol. Biol. 269, 902–943 (1997).
Taylor, W. A deeply knotted protein structure and how it might fold. Nature 406, 916–919 (2000). https://doi.org/10.1038/35022623