A deeply knotted protein structure and how it might fold


The search for knots in proteins has uncovered little that would cause Alexander the Great to reach for his sword. Excluding knots formed by post-translational crosslinking, the few proteins considered to be knotted form simple trefoil knots with one end of the chain extending through a loop by only a few residues1,2, ten in the ‘best’ example3. A knot in an open chain (as distinct from a closed circle) is not rigorously defined and many weak protein knots disappear if the structure is viewed from a different angle. Here I describe a computer algorithm to detect knots in open chains that is not sensitive to viewpoint and that can define the region of the chain giving rise to the knot. It characterizes knots in proteins by the number of residues that must be removed from each end to abolish the knot. I applied this algorithm to the protein structure database and discovered a deep, figure-of-eight knot in the plant protein acetohydroxy acid isomeroreductase4. I propose a protein folding pathway that may explain how such a knot is formed.


Pulling the ends of a given piece of string will usually decide whether it is knotted or not. Because we hold the ends, the string and our body form a closed circle and there is no danger of untying the knot as it is pulled. Knots in circular strings are mathematically well defined5, and one way to approach the problem in open strings is simply to join the ends (as we do when we pick up a string). This is fine for clear knots (where the ends of the string are remote from the knot site), but if the ends are tangled up together with the knot then any algorithm devised to ‘pick up’ the ends creates the risk that the external connections might either untie an existing knot or create a new one. Fortunately, the ends of protein chains (being charged) tend to lie on the surface of the structure and so can often be joined unambiguously by a wide loop. However, not all termini lie on the surface and if we want to locate the knotted region, any deletion from the ends could drive the termini deeper.

An alternative approach is to invert the problem: rather than extending the termini outwards, these can be left fixed and the rest of the protein made to shrink around them. This was done by contracting the protein chain as if it were a rubber band: specifically, with residues represented by the coordinates of their α-carbon, each point was repeatedly replaced by the average of itself and its two neighbours (Fig. 1). This procedure, which has been used previously to visualize protein chains, quickly reduces the protein to a smooth curve but leaves the termini untouched (as they do not have two neighbours). If continued indefinitely, all the points will lie on the line connecting the termini. When investigating topological features of the chain, however, it is most important that two parts of the chain cannot pass through each other and this was checked with each move of a residue point. Any move that violated this check was not implemented, leaving the current residue in its original position. With this condition, unknotted strings will still be reduced to a straight line, but those containing knots will become blocked (Fig. 2).
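The averaging step on its own can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the program used for the results: the names `smooth_pass` and `smooth` are hypothetical, points are updated sequentially (so point i sees the already-moved point i - 1, which the text does not specify), and the check that chain segments do not pass through each other is deliberately left out, so this version reduces every chain, knotted or not, towards the line joining the termini.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def smooth_pass(chain: List[Point]) -> List[Point]:
    """Run one smoothing pass over the interior points of the chain.

    Each interior point is replaced by the average of itself and its two
    neighbours. Updates are sequential (an assumption), and the two
    termini are never moved because they lack a second neighbour.
    No chain-crossing check is performed in this sketch.
    """
    pts = list(chain)
    for i in range(1, len(pts) - 1):
        prev_pt, here, next_pt = pts[i - 1], pts[i], pts[i + 1]
        pts[i] = tuple(
            (prev_pt[k] + here[k] + next_pt[k]) / 3.0 for k in range(3)
        )
    return pts


def smooth(chain: List[Point], passes: int) -> List[Point]:
    """Apply smooth_pass repeatedly and return the smoothed chain."""
    pts = list(chain)
    for _ in range(passes):
        pts = smooth_pass(pts)
    return pts


if __name__ == "__main__":
    # Toy zigzag of alpha-carbon-like points (made-up coordinates).
    zigzag = [(0.0, 0.0, 0.0), (1.0, 3.0, 0.0), (2.0, -3.0, 0.0),
              (3.0, 3.0, 0.0), (4.0, 0.0, 0.0)]
    for point in smooth(zigzag, 50):
        print(point)
```

Without the crossing check, the interior points simply relax onto the straight line between the fixed termini, which is the behaviour described for unknotted chains.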

Figure 1: The basic chain smoothing algorithm.

Protein chains are drawn as lines connecting the central carbon atom in the backbone of each residue unit running from the amino (N) terminus to the carboxy (C) terminus. Beginning at the second residue, for each residue point (i) in the starting conformation, the average coordinate of i, i - 1 and i + 1 was taken as the new position (i′) for the residue. This procedure was then repeated, and the results of this are progressively smoother chains, shown as a series of fainter lines. Note that the termini do not move. With each move, it was checked that the chains did not pass through each other. This was implemented by checking that the triangles {i′ - 1, i, i′} and {i, i′, i + 1} (dashed lines in the Figure) did not intersect any line segment {j′ - 1, j′} (j < i) before the move point or any line {j, j + 1} (j > i) following.
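The crossing check described in this legend can be sketched as a segment-against-triangle test. The code below is an illustration under stated assumptions rather than the original implementation: it uses the standard Möller–Trumbore intersection test (a swapped-in technique, not one named in the text), the function names are hypothetical, segments that share a chain vertex with a triangle are skipped, and segments parallel to the triangle plane are treated as non-intersecting.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def _sub(a: Point, b: Point) -> Point:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Point, b: Point) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Point, b: Point) -> Point:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def segment_hits_triangle(p: Point, q: Point, a: Point, b: Point, c: Point,
                          eps: float = 1e-12) -> bool:
    """Return True if segment p-q intersects triangle a-b-c."""
    d = _sub(q, p)
    e1, e2 = _sub(b, a), _sub(c, a)
    h = _cross(d, e2)
    det = _dot(e1, h)
    if abs(det) < eps:
        return False  # parallel or degenerate: treated as a miss (assumption)
    s = _sub(p, a)
    u = _dot(s, h) / det
    if u < 0.0 or u > 1.0:
        return False
    qv = _cross(s, e1)
    v = _dot(d, qv) / det
    if v < 0.0 or u + v > 1.0:
        return False
    t = _dot(e2, qv) / det
    return 0.0 <= t <= 1.0  # hit must lie within the segment


def move_is_allowed(chain: List[Point], i: int, new_pos: Point) -> bool:
    """Check that moving interior point i to new_pos sweeps through no segment.

    `chain` holds current positions, so points before i have already been
    moved in this pass. The two swept triangles follow the legend:
    {i-1, i, i'} and {i, i', i+1}.
    """
    triangles = [
        ((i - 1, i), (chain[i - 1], chain[i], new_pos)),
        ((i, i + 1), (chain[i], new_pos, chain[i + 1])),
    ]
    for j in range(len(chain) - 1):
        for corner_ids, (a, b, c) in triangles:
            if j in corner_ids or j + 1 in corner_ids:
                continue  # segment shares a vertex with this triangle
            if segment_hits_triangle(chain[j], chain[j + 1], a, b, c):
                return False
    return True
```

In a full smoothing loop, a move rejected by `move_is_allowed` would simply be skipped, leaving the residue at its current position.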

Figure 2: Smoothed protein structures.

Applying the smoothing algorithm described in the text (and Fig. 1) to protein structures produces a series of increasingly smoothed chains, coloured from blue to red. (For clarity, the native starting structure is not shown.) a, Applied to a protein that has no knots (triosephosphate isomerase, 1tph1), the algorithm yields a straight line (red) joining the termini. Reaching this stage took 52 smoothing iterations. b, Applied to the knotted protein (the carboxy-terminal domain of acetohydroxy acid isomeroreductase, 1yveI), a straight line is never attained and a small knot remains deep in the (red) core part of the protein. This is shown in Fig. 3b. Figure prepared using RasMol version 2.6 (ref. 9).

Although this simple algorithm is sufficient to detect knots, some pragmatic refinements were made. Because of a limit in the numerical accuracy of the computer calculations, each line between residues was represented by an impenetrable tube (1 Å diameter). This allowed residues to be progressively removed: specifically, the middle residue of three that were almost collinear or where the separation of the outer two fell within the tube diameter. This not only improved execution time but led to an even simpler test for knots, as any chain that can be reduced to just its two termini is not knotted. Those with more than two residues remaining are either knots or tangles in which a group of moves have become ‘grid-locked’ (like rush-hour traffic at a junction). This latter condition was eased (but not completely eliminated) by making a slight reduction in the tube diameter any time the chain became stuck.
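The residue-removal refinement might look like the following sketch. It is an illustration only: the function names and the `collinear_tol` threshold are hypothetical (the text says only "almost collinear"), the default tube diameter of 1 Å is taken from the paragraph above, and the sketch does not check that deleting a point leaves other parts of the chain uncrossed, which a complete implementation would need.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def point_segment_distance(p: Sequence[float], a: Sequence[float],
                           b: Sequence[float]) -> float:
    """Shortest distance from point p to the segment a-b."""
    ab = [b[k] - a[k] for k in range(3)]
    ap = [p[k] - a[k] for k in range(3)]
    denom = sum(x * x for x in ab)
    if denom == 0.0:
        return math.dist(p, a)
    t = max(0.0, min(1.0, sum(x * y for x, y in zip(ap, ab)) / denom))
    closest = [a[k] + t * ab[k] for k in range(3)]
    return math.dist(p, closest)


def prune(chain: List[Point], tube_diameter: float = 1.0,
          collinear_tol: float = 0.05) -> List[Point]:
    """Drop the middle point of any triple that is almost collinear, or
    whose outer two points lie within the tube diameter of each other.

    collinear_tol is an assumed threshold in the same units as the
    coordinates. The termini are never removed.
    """
    pts = list(chain)
    i = 1
    while i < len(pts) - 1:
        a, b, c = pts[i - 1], pts[i], pts[i + 1]
        too_close = math.dist(a, c) < tube_diameter
        collinear = point_segment_distance(b, a, c) < collinear_tol
        if too_close or collinear:
            del pts[i]  # stay at index i to re-examine the new triple
        else:
            i += 1
    return pts


def reduced_to_termini(chain: List[Point]) -> bool:
    """A chain reduced to just its two termini is not knotted."""
    return len(chain) == 2
```

Interleaving `prune` with the smoothing passes shortens the chain as it straightens, so an unknotted chain ends as two points while a knotted or grid-locked one retains more.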

In practice, most chains of a few hundred residues are reduced to their termini in around 50 iterations. If by 500 iterations a chain was still not reduced to two points, then the resulting configuration was analysed in more detail. As the termini are now well separated from the knot-site, they can be unambiguously joined and analysed as a ‘proper’ circular knot. This can be done using one of the knot-invariant polynomials, such as the Alexander or Jones polynomials1,5. However, the few knots encountered in proteins are so simple that they do not require any sophisticated analysis. Furthermore, from a theoretical perspective, not only are protein knots directional but also they have a unique break-point (between the termini) which is not taken into consideration by any of the polynomial forms. As a working tool, a simpler method was adopted to characterize these open knots based on the Dowker knot notation5 (see legend to Table 1).

Table 1 Knots found in proteins

Applying this method to a non-redundant selection of protein structures (see Table 1 for selection details) revealed a surprising number of knots. A few of these proved to be unresolved tangles (often forming slip-knots), and some others were caused by breaks in the chain creating an unnatural short-cut. The former were all eliminated by running the program with a smaller ‘tube’ diameter but the latter could only be removed through visual inspection. Of the eight remaining structures (Table 1), five were right-handed trefoils including related carbonic anhydrase structures (1zncA, 1kopA, 1hcb, 1dmxA) and the protein S-adenosylmethionine synthetase (1fugA), both of which had been identified previously. In addition, three new knots were found: a left-handed trefoil in ubiquitin (1cmxA), and two figure-of-eight knots in a viral core protein (2btvB) and acetohydroxy acid isomeroreductase (1yveI; Fig. 3b). These last two are of particular interest as they include an additional crossover above the trefoil and are therefore less likely to be formed by a wandering chain during folding. This was confirmed by simulation of random and semi-random compact protein-like chains in which the trefoil was by far the most common knot type (data not shown). The location of the two figure-of-eight knots was determined by a series of deletions from both termini of the protein chain. This revealed that the knot in 2btvB required just the last eight residues, which is similar to the deepest trefoil knot. By contrast, the knot in 1yveI, which is contained in the carboxy-terminal domain of the protein, remained until 70 residues were deleted from the carboxy terminus and 245 residues (including a complete domain) were removed from the amino terminus (Fig. 3a).
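The series of terminal deletions used to locate a knot can be sketched as a simple scan. This is a hedged illustration, not the original procedure: `knot_depth` is a hypothetical name, the knot detector is passed in as a caller-supplied predicate `is_knotted` (assumed to return False for very short or empty chains), and each terminus is trimmed independently while the other is left intact.

```python
from typing import Callable, Sequence, Tuple


def knot_depth(chain: Sequence,
               is_knotted: Callable[[Sequence], bool]) -> Tuple[int, int]:
    """Return (n_cut, c_cut): the smallest number of residues that must be
    deleted from the amino end, and from the carboxy end, to abolish the knot.

    is_knotted is any knot detector supplied by the caller (hypothetical
    here). A linear scan is used because nothing guarantees that the
    predicate changes only once as residues are removed.
    """
    if not is_knotted(chain):
        raise ValueError("chain is not knotted to begin with")
    n = len(chain)
    n_cut = next(k for k in range(1, n + 1) if not is_knotted(chain[k:]))
    c_cut = next(k for k in range(1, n + 1) if not is_knotted(chain[: n - k]))
    return n_cut, c_cut


if __name__ == "__main__":
    # Toy stand-in for a real detector: the "knot" survives only while
    # residues 5 to 14 of a 20-residue chain are all still present.
    def toy_detector(segment: Sequence[int]) -> bool:
        return bool(segment) and segment[0] <= 5 and segment[-1] >= 14

    print(knot_depth(list(range(20)), toy_detector))
```

A shallow knot gives a small value at one end, whereas a deep knot gives large values at both ends.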

Figure 3: The knot in 1yveI.

a, Backbone representation of the complete native protein structure (coloured from blue to red in the direction of the chain) with the core of the knotted domain drawn thickened. This region is preceded by a complete nucleotide binding domain (blue-green) and followed by a long loop (red) that wraps around the domain. b, The knotted core in the smoothed representation of 1yveI (Fig. 2b), coloured as in a. The figure-of-eight knot can be seen clearly. This form was attained after 50 cycles, and if continued an irreducible core consisting of eight points was attained. Figure prepared using RasMol version 2.6 (ref. 9).

It is interesting to speculate how a protein with such a deep and complex knot might fold, as it is difficult to imagine over 50 residues being ‘fed’ through a loop in a reproducible way during folding. Clues to the folding of this protein can be found in a clear internal duplication within the domain, comprising 88 residue pairs with 2.0 Å r.m.s. deviation (as measured by the program SAP (ref. 6) over the α-carbon positions). If it is assumed that the two most deeply buried, symmetrically equivalent helices initially pack together (A1 and B1, Fig. 4), then the remaining parts of each repeat (A2, A3 and B2, B3) can wrap around this core, requiring only that the C-terminal segment can pass through the large loop between the repeats before this contacts and finally packs onto the core (Fig. 4). The symmetry in this arrangement suggests that the protein might have evolved from an exchange of structure or ‘swap’7 between the two duplicated domains in which the first helix in the repeat has been transposed across the twofold axis of symmetry, so creating the knot (see ref. 8 for a wider review of related processes). Intriguingly, the best example of a trefoil knot (in 1fugA) appears to have arisen in a similar manner, in which a β-strand on the edge of a sheet has been transferred from one duplicated domain to another. Although it cannot be stated that significant knots in proteins will not arise by other means, it appears that the swapping of elements of secondary structure between duplicated domains can provide a source of knotted proteins.

Figure 4: The core of the C-terminal domain in 1yveI.

The duplicated portions of the chain (A1–A3 and B1–B3) are viewed down the packing axis of the two central α-helices. (The fine horizontal line between these is a least-squares fit to the pseudo-two-fold axis.) The following parts of the duplicated regions (A2, A3 and B2, B3) wrap around this central pair, creating the core of the knot. The final tying of the knot requires that the C-terminal part of the chain (including 45 residues following the point labelled c) passes through the long loop connecting the duplications (labelled L). The origin of this structure can be imagined as a gene duplication giving rise to a linked dimer in which the two central helices then exchanged positions. Figure prepared using MOLSCRIPT version 2.1.2 (ref. 10).


  1. Mansfield, M. L. Are there knots in proteins? Nature Struct. Biol. 1, 213–214 (1994).

  2. Mansfield, M. L. Fit to be tied. Nature Struct. Biol. 4, 116–117 (1997).

  3. Takusagawa, F. & Kamitori, K. A real knot in protein. J. Am. Chem. Soc. 118, 8945–8946 (1996).

  4. Biou, V. et al. The crystal structure of plant acetohydroxy acid isomeroreductase complexed with NADPH, two magnesium ions and a herbicidal transition state analog at 1.65 Å resolution. EMBO J. 16, 3405–3415 (1997).

  5. Adams, C. C. The Knot Book: An Elementary Introduction to the Mathematical Theory of Knots (Freeman, New York, 1994).

  6. Taylor, W. R. Protein structure alignment using iterated double dynamic programming. Protein Sci. 8, 654–665 (1999).

  7. Bennet, M. J., Schlunegger, M. P. & Eisenberg, D. 3D domain swapping: a mechanism for oligomer assembly. Protein Sci. 4, 2455–2468 (1995).

  8. Heringa, J. & Taylor, W. R. Three-dimensional domain duplication, swapping and stealing. Curr. Opin. Struct. Biol. 7, 416–421 (1995).

  9. Sayle, R. & Milner-White, E. J. RasMol: Biomolecular graphics for all. Trends Biochem. Sci. 20, 374 (1995).

  10. Kraulis, P. J. MOLSCRIPT: A Program to Produce Both Detailed and Schematic Plots of Protein Structures. J. Appl. Crystallogr. 24, 946–950 (1991).

  11. Taylor, W. R. Multiple sequence threading: an analysis of alignment quality and stability. J. Mol. Biol. 269, 902–943 (1997).

Author information



Corresponding author

Correspondence to William R. Taylor.

About this article

Cite this article

Taylor, W. A deeply knotted protein structure and how it might fold. Nature 406, 916–919 (2000).
