Proteins analysed as virtual knots

Long, flexible physical filaments are naturally tangled and knotted, from macroscopic string down to long-chain molecules. The existence of knotting in a filament naturally affects its configuration and properties, and may be very stable or disappear rapidly under manipulation and interaction. Knotting has been previously identified in protein backbone chains, for which these mechanical constraints are of fundamental importance to their molecular functionality, despite their being open curves in which the knots are not mathematically well defined; knotting can only be identified by closing the termini of the chain somehow. We introduce a new method for resolving knotting in open curves using virtual knots, which are a wider class of topological objects that do not require a classical closure and so naturally capture the topological ambiguity inherent in open curves. We describe the results of analysing proteins in the Protein Data Bank by this new scheme, recovering and extending previous knotting results, and identifying topological interest in some new cases. The statistics of virtual knots in protein chains are compared with those of open random walks and Hamiltonian subchains on cubic lattices, identifying a regime of open curves in which the virtual knotting description is likely to be important.

[...] This conformation could not arise from virtual closure of an open curve, as its virtual crossings do not lie sequentially along a single arc. An alternative presentation of this knot from the genus one table of [3] (labelled there as $4_8$) is shown in (f), and does have such a conformation, although it is difficult to see by eye that this is the same knot as $v4_{64}$. (b)-(e) show how (a) may be transformed to (f) by a combination of virtual Reidemeister moves and planar isotopies of the knot. In (e), the planar isotopy moving the green strand across the knot is not directly allowed by the virtual Reidemeister moves, but as the knot diagram is implicitly drawn on $S^2$ this represents the strand passing 'behind' the sphere (or on the plane, passing through infinity). In general it is difficult to test whether two (virtual) knot diagrams can be related this way, hence the calculation of knot invariants, which remove the need for diagrammatic manipulation.

[...] Fig. 2(a) and 2(b) of the main text. Included are ∆(t), the Alexander polynomial at the numerical values used, $\Delta_g(s,t)$, the generalised Alexander polynomial, both at the numerical values used and as the full symbolic expression, and $V(q)$, the Jones polynomial. Virtual knots have no Alexander polynomial and so these columns are omitted. In the cases where chiral mirrors give different knots, only one mirror is given.

Classical Knot Theory
Here we summarise some extended details of mathematical knot theory as used in deriving the results of the main text. Further details can be found in standard elementary texts [4][5][6]. Classical knot theory deals with embeddings of the circle, $S^1$ (i.e. closed, non-intersecting curves), in three-dimensional space $\mathbb{R}^3$. Any given embedding has a distinct knot type, which is invariant under ambient isotopies (it may change only when the curve passes through itself). It is usual to represent knots using a two-dimensional planar knot diagram, which can be thought of as a plane projection of the three-dimensional space curve, annotated with the extra information of which strand passes over the other at each self-intersection of the diagram (called a crossing). All the information about the three-dimensional knot type is contained in such a diagram, and smooth deformations (i.e. ambient isotopies) of the three-dimensional space curve lead to smooth isotopies of the knot diagram, which may change the configuration of the crossings. In two-dimensional knot diagrams, the changes are represented by combinations of Reidemeister moves as in Supplementary Fig. 1(a). Applying these local moves in conjunction with planar isotopies of the knot diagram can transform between any two diagrammatic representations of the same knot; equivalently, any ambient isotopy of a closed three-dimensional space curve corresponds to a combination of planar isotopies and Reidemeister moves in any projection of the knot.
The standard tabulations of knots, in knot tables as discussed in the main text, are ordered according to their minimal crossing number $n$, the smallest number of crossings a diagram of the knot can have. For instance, the trivial circle can be projected to a plane without self-intersection (i.e. no crossings), and so has minimal crossing number $n = 0$ and is labelled $0_1$. There are no knots with $n = 1$ or $n = 2$, and one with $n = 3$, the trefoil knot, denoted $3_1$. The labelling $n_m$ continues, where $m$ is an arbitrary index amongst knots with the same $n$. These labels are standard, following original tabulations up to $n = 10$ published over 100 years ago, with more recent extensions using consistent indices [5,7,8]. Some simple knots from these tabulations are shown in Fig. 2(a): $0_1$ (the unknot), then $3_1$, $4_1$, etc. The knots appearing in knot tables are prime knots; composite knots, made up of two or more prime knots tied in the same curve, are also possible and are tabulated according to the composition of their prime factors [4]. All the tools of knot theory apply equally to composite knots, but they do not occur significantly in any known protein chain, and are not considered further here.
It is natural to follow the curve of a knot, which endows the knot with an orientation (choosing an orientation is an arbitrary choice that does not affect the results of topological calculations). Observing the relative orientation of the strands at a crossing determines the sign of the crossing, either positive or negative. A crossing has the same sign even if the curve's orientation is reversed. The minimal diagram of a figure-8 knot $4_1$ has two positively signed crossings and two negatively signed, and in fact is isotopic to its mirror image. On the other hand, all three crossings of the minimal trefoil knot $3_1$ have the same sign, and are all reversed on its mirror image. Knots such as the trefoil are thus chiral knots, and this chirality is not directly represented in the tabulation (i.e. there are two enantiomeric trefoil knots which cannot be smoothly deformed into one another). Other chiral knots are $5_1$, $5_2$ and $6_1$ in Fig. 2(a) of the main text; the others are achiral. We do not distinguish between chiral knot pairs in our analysis, although knot invariants such as those used to distinguish knots below could be used to do so.
In practice the knot type of a space curve is determined as follows. First the curve is projected to a 2D knot diagram, which contains all the topological information in its ordered set of signed crossings along the curve. Several topological notations representing this information are standard [4,5]; we use below the Gauss code, constructed from an arbitrary starting point and orientation for the curve. Each new crossing is labelled 1, 2, . . . in the order in which it is first encountered along the curve. The Gauss code is the ordered list of these crossing numbers as they occur along the curve, together with whether the curve passes over or under the intersecting strand, represented by using a positive number in the former case and negative in the latter (this is not the same as the crossing sign); each crossing must be encountered exactly twice before reaching the original starting point, once positive and once negative. For instance, a Gauss code for a minimal diagram of the trefoil knot $3_1$ is 1, −2, 3, −1, 2, −3, and for a minimal figure-8 knot $4_1$ is 1, −2, 3, −1, 4, −3, 2, −4. Changing the starting point on the curve cyclically permutes the crossings encountered, but all the Gauss codes obtained this way, or by changing numeric labels (as long as each crossing retains a unique label), represent the same knot diagram. The Gauss code written in this way also does not specify the chirality of the original three-dimensional curve; this information is contained in the local twisting of the two strands around one another and is sometimes included in extended Gauss code notations. Crossings which can be removed by Reidemeister moves I and II can be easily identified in a Gauss code: if crossing $k$ occurs adjacent to itself, as $\pm k, \mp k$, then it can be removed by Reidemeister move I, and if the code contains $\pm k, \pm(k+1), \ldots, \mp k, \mp(k+1)$ (or with the second pair in the order $\mp(k+1), \mp k$), then crossings $k$ and $k+1$ can be removed by Reidemeister move II.
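The Gauss-code simplification described above can be sketched as follows. This is a minimal illustration, not the implementation used for the results: the function names are hypothetical, and it assumes an extended Gauss code in which the sign of each crossing is supplied separately. The signs are used because, for codes that may not be classical (such as 1, −2, −1, 2), the over/under pattern alone does not guarantee that a Reidemeister II move is valid; the two crossings removed by that move always have opposite signs.

```python
def _find_removable(code, signs):
    """Return the set of crossing labels removable by one RI or RII move."""
    n = len(code)
    # Cyclically adjacent entries (the code describes a closed curve).
    pairs = [(code[i], code[(i + 1) % n]) for i in range(n)]
    # Reidemeister I: a crossing visited twice in a row (k, -k or -k, k).
    for a, b in pairs:
        if a == -b:
            return {abs(a)}
    # Reidemeister II: two adjacent over (or under) visits to distinct
    # crossings of opposite sign, whose partner visits are also adjacent.
    for a, b in pairs:
        same_level = (a > 0) == (b > 0)
        if abs(a) != abs(b) and same_level and signs[abs(a)] != signs[abs(b)]:
            if any({c, d} == {-a, -b} for c, d in pairs):
                return {abs(a), abs(b)}
    return set()


def simplify(code, signs):
    """Apply crossing-removing RI and RII moves until none remain.

    code  -- cyclic Gauss code: +k where the curve passes over crossing k,
             -k where it passes under
    signs -- dict {k: +1 or -1} of crossing signs (assumed extra input,
             not part of the plain Gauss code)
    """
    code = list(code)
    while True:
        drop = _find_removable(code, signs)
        if not drop:
            return code
        code = [c for c in code if abs(c) not in drop]
```

For example, a trefoil code with one extra kink, `simplify([4, -4, 1, -2, 3, -1, 2, -3], {1: 1, 2: 1, 3: 1, 4: -1})`, reduces to the minimal trefoil code, while the minimal codes themselves are returned unchanged. Reidemeister move III is deliberately not attempted, so the output need not be a minimal diagram.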
All knot diagrams can be represented by Gauss codes, but not all Gauss code sequences represent knot diagrams. For instance, the sequence 1, −2, −1, 2 appears to be a consistent Gauss code of only two crossings which cannot be simplified by Reidemeister moves, yet no knot has $n = 2$. On attempting to draw a diagram with this code, one finds that one extra crossing would be necessary to allow the curve to return to its starting point. In fact, this is the Gauss code of the open diagram shown in Fig. 2(e) of the main text; Gauss codes for open diagrams, and their relation to virtual knots, are the subject of the next section.
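One quick way to see that a sequence such as 1, −2, −1, 2 cannot come from a closed planar diagram is Gauss's parity condition: in the code of any closed planar curve, an even number of entries lies between the two visits to each crossing. The condition is necessary but not sufficient, so the sketch below (with a hypothetical function name) can only rule codes out, never confirm that a code is classical.

```python
def fails_gauss_parity(code):
    """True if some crossing has an odd number of entries between its two visits.

    A code that fails this test cannot be drawn as a closed planar knot
    diagram without additional (virtual) crossings. Passing the test does
    not by itself prove the code is planar.
    """
    first_seen = {}
    for position, entry in enumerate(code):
        k = abs(entry)
        if k in first_seen:
            entries_between = position - first_seen[k] - 1
            if entries_between % 2 == 1:
                return True
        else:
            first_seen[k] = position
    return False
```

The trefoil and figure-8 codes given earlier pass this check, whereas 1, −2, −1, 2 fails it because a single entry separates the two visits to crossing 1.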
It can be practically difficult to calculate the knot type of a diagram coming from a projection of a complicated 3D space curve, which may have many more crossings than its minimal number $n$. These extra crossings represent local geometrical or biochemical features that do not affect the overall knot type; the knot diagrams found from closures of protein backbones often contain several hundred crossings. Our knot identification proceeds first by algorithmic simplification via removal of crossings, repeatedly applying Reidemeister moves I and II where they would remove crossings locally (Supplementary Fig. 1(a)), as discussed above. There is no known efficient method to produce minimal knot diagrams in this way, as Reidemeister move III may also be essential to simplify the diagram but does not directly reduce the crossing number. In the case of protein backbones, this simplification occasionally produces minimal diagrams, but in most cases tens to hundreds of crossings remain.
The knot types of the simplified diagrams are calculated using knot invariants: quantities that depend only on the knot type but are calculated from a particular presentation of the curve, i.e. they can be calculated using only the information in a Gauss code, and their value is invariant under Reidemeister moves. Much of mathematical knot theory is devoted to the study of knot invariants, and many types are known. For instance, the minimal crossing number discussed above is a knot invariant [4], but there is no simple algorithm to calculate it directly from a presentation of a knot. The minimal crossing number also demonstrates that most invariants do not perfectly distinguish knots [4], as multiple different knots can clearly have the same number of crossings in their minimal projections; for instance, both $5_1$ and $5_2$ in Fig. 2(a) have $n = 5$. More discriminatory invariants exist but are generally relatively difficult to calculate.
For knot identification we use knot invariants that can be calculated efficiently (ideally in low order polynomial time in the number of crossings), while still discriminating knots sufficiently well. In particular, we choose invariants which leave no ambiguity between the knots common on closure of proteins such as those in Fig. 2(a) of the main text. Some protein closures produce complex knots whose knot type cannot be uniquely identified using these efficient invariants, but these occur only rarely and do not impact our analysis. For classical knots, we employ only the Alexander polynomial ∆(t), which can be found as the determinant of a matrix whose rows and columns relate to the crossings of a projected diagram and can be easily constructed from a Gauss code [9]. Computing the determinant of a symbolic matrix is relatively slow, and we instead use the values of |∆(t)| evaluated at the roots of unity $t = -1$, $t = \exp(2\pi i/3)$ and $t = i$, such that the calculation can be performed using floating point arithmetic (this does not introduce appreciable error). Each of these is individually a weaker knot invariant, but together they have discriminatory power comparable to the full Alexander polynomial up to at least 11 minimal crossings (certainly sufficient for the relatively simple knots that appear in protein chains).
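As an illustration of the numerical evaluation described above, the following sketch computes |∆(t)| at a chosen value of t from a signed Gauss code. It is written under stated assumptions rather than as the procedure of [9]: the function name is hypothetical, crossing signs are assumed to be available as an extended Gauss code, NumPy is assumed to be installed, and the matrix is the standard arc-colouring (Wirtinger-type) presentation, in which each crossing contributes a row with entries 1 − t, t and −1. Because the determinant of this matrix is only defined up to a factor $\pm t^m$, only its modulus on the unit circle is returned.

```python
import cmath
import numpy as np


def alexander_abs(code, signs, t):
    """|Delta(t)| of a closed classical diagram, for t on the unit circle.

    code  -- cyclic Gauss code: +k over, -k under
    signs -- dict {k: +1 or -1} of crossing signs (assumed extra input)
    t     -- complex number with |t| = 1
    """
    crossings = sorted(signs)
    n = len(crossings)
    if n < 2:
        return 1.0  # no crossings, or a single kink: the unknot
    row = {k: i for i, k in enumerate(crossings)}
    over, arc_in, arc_out = {}, {}, {}
    arc = 0  # arcs run between consecutive under-passes; there are n of them
    for entry in code:
        k = abs(entry)
        if entry > 0:
            over[k] = arc
        else:
            arc_in[k] = arc
            arc = (arc + 1) % n
            arc_out[k] = arc
    m = np.zeros((n, n), dtype=complex)
    for k in crossings:
        # The roles of the incoming and outgoing under-arcs swap with the sign.
        w_in, w_out = (t, -1) if signs[k] > 0 else (-1, t)
        m[row[k], over[k]] += 1 - t
        m[row[k], arc_in[k]] += w_in
        m[row[k], arc_out[k]] += w_out
    # Any (n-1) x (n-1) minor gives the polynomial up to a unit factor.
    return abs(np.linalg.det(m[:-1, :-1]))


ROOTS = (-1, cmath.exp(2j * cmath.pi / 3), 1j)
trefoil = [1, -2, 3, -1, 2, -3]
trefoil_values = [alexander_abs(trefoil, {1: 1, 2: 1, 3: 1}, t) for t in ROOTS]
```

For the trefoil, whose Alexander polynomial is $t^2 - t + 1$, the three values are 3, 2 and 1 up to floating point error. Mirroring the knot flips every sign but leaves these moduli unchanged, consistent with not distinguishing chiral partners.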
Many knot invariants, including the Alexander polynomial, are available from standard online resources including the Knot Atlas [7] for all knots with up to 15 crossings, and KnotInfo [8] for a wider selection of invariants up to 12 crossings. Supplementary Table 1 shows values of ∆(t) at the roots of unity used above, for each of the simple knots that appear most commonly in protein chains.

Virtual Knots
Virtual knots are an extension of the theory of classical knots [10] that classifies all topological objects formed of ordered crossings, generalising the theory of knot diagrams while keeping a sense of isotopy through Reidemeister moves. In particular, this includes those orderings which cannot be realised as plane projections of (closed) space curves in $\mathbb{R}^3$. They can be thought of as the objects represented by the set of all Gauss codes, including sequences such as 1, −2, −1, 2, which does not correspond to any closed knot diagram, as discussed above. In this sense, they provide a natural framework to describe open diagrams whose endpoints cannot directly be joined, and which therefore do not correspond to classical knots but have knot-like structure in their sequence of ordered crossings.
Many concepts from classical knot theory naturally generalise to virtual knots, such as the distinction between prime and composite virtual knots (including composites with classical and virtual components). Virtual knots are tabulated according to their minimum classical crossing number $n$ [2], and they are denoted here as $vn_m$, following the tabulation of [2], as described in the main text. The simplest nontrivial virtual knot, $v2_1$, has $n = 2$ and Gauss code 1, −2, −1, 2. There are many more prime virtual knots for $n \geq 2$ than classical knots; complete tabulations only extend to virtual knots up to $n = 5$. There are also up to three distinct chiral symmetric partners of a given virtual knot (compared to at most one partner of opposite chirality for classical knots): a mirror reflection of the diagram preserving the classical crossing signs, an inversion where all classical crossing signs are flipped, and the combination of both. As with the classical knots, we identify all chiral partners of the same virtual knot type as equivalent.
[10] presents two further equivalent interpretations of virtual knots, both of which illustrate properties discussed in the main text. The first, convenient for diagrammatic representation, draws virtual knots as classical knot diagrams (without endpoints) but augmented with an additional crossing type at self intersection, the virtual crossing, denoted by a circle around the intersection (e.g. Fig. 2(b) of the main text). Virtual crossings do not have a sign and do not contribute to topological calculations, so the Gauss code follows only by considering the virtual diagram's classical crossings and ignoring virtual crossings entirely. In such virtual diagrams, virtual crossings can be manipulated by suitable generalisations of the classical Reidemeister moves, which can affect the configuration of virtual and classical crossings but do not change the virtual knot type; these moves are shown in Supplementary Fig. 1(b). In particular, virtual Reidemeister moves I and II can change the number of virtual crossings, and minimal virtual crossing number is an invariant of virtual knots; those with minimum virtual crossing number zero are the classical knots, which make up a subset of the generalised, virtual knots. We describe a knot here as virtual if the minimum number of virtual crossings is greater than zero.
The other interpretation of virtual knots is as closed knot diagrams drawn on surfaces with topology different to the standard plane of projection (equivalent to its one-point compactification, the 2-sphere), i.e. drawn on handlebodies with nonzero genus. Any virtual knot can be drawn as a knot diagram without virtual crossings on a surface of sufficiently high genus [10]. The virtual crossings previously described are then interpreted as a consequence of projection from the handlebody to a plane, in which case the virtual crossings are intersections of two strands from different bridges of the handlebody (likewise, a virtual knot diagram with virtual crossings can be made a knot diagram on a handlebody by replacing each virtual crossing with a handle which one strand passes 'along' and the other 'under' the handle). The minimum genus of any handlebody on which the virtual knot can be drawn defines the virtual genus (hereafter referred to as the genus, although this is distinctly different to the genus referred to in classical knot theory [4]) of the virtual knot, and is therefore 0 for classical knots while any virtual knot must have genus at least 1.
It is possible to consider the open diagram in these terms alone (i.e. the open diagram is subject only to the three classical Reidemeister moves, but the endpoints are forbidden to pass over or under a strand, which would create or remove crossings, Supplementary Fig. 1(d); otherwise the open diagram could be untangled to the trivial open curve). This would produce a classical knotoid [11], a topological object that encodes information about the topology of the open curve, but whose classes are not isomorphic to the virtual knots [1]. Representing knotoids by virtual knots loses some information; for instance, it may not be clear from a virtual diagram which arc at a virtual crossing is the virtual closure arc (i.e. multiple distinct knotoids give the same virtual knot). However, in our analysis, we opt to work with virtual knots since their tabulation, invariants and other properties are much better developed and understood than those of knotoids, and therefore are more convenient for application without new mathematics. Only a small amount of information is apparently lost through the ambiguity of knotoids as virtual knots, which does not appear to unduly limit topological analysis; this can be considered as a similar simplification to ignoring the chirality of knots.
Since all the virtual crossings resulting from virtual closure necessarily occur sequentially along the same arc, the genus of virtual knots obtained by closing open diagrams is at most one. That is, all the virtual crossings of the diagram may be removed by adding a single handle to the surface on which it is drawn, in between the endpoints of the open curve, and along which the closing arc runs. Not all genus one virtual knots can be represented in this way such that their virtual crossings occur sequentially; an example is shown in Supplementary Fig. 2(a), whose two virtual crossings can never be adjacent even under the application of (virtual) Reidemeister moves, although the knot can be drawn on a genus one surface such as the planar diagram shown in Supplementary Fig. 2(b). The class of virtual knots that can be obtained from closures of open knot diagrams is therefore a subset of genus one virtual knots, whose minimal presentations pass around the torus exactly once in one generator direction, and at least once in the other. This is related to the homology of the curve as drawn on a genus one handlebody: for any such diagram we can associate an index with the number of times a curve wraps around the torus in each direction, and for a virtual knot these homology indices must be of the form $(\pm 1, j)$ for $|j| \geq 1$ (although this condition is not on its own sufficient due to the presence of more complex topologies with the same overall homology). We therefore refer to the virtual knots appearing as virtual closures of open curves as minimally genus one virtual knots.
The virtual knots of genus one were studied and tabulated by [3]. Their description involves a virtual knot invariant that is a generalisation of the Kauffman bracket polynomial with two variables $a$ and $x$, calculated from the virtual knot diagram as drawn on the 2-torus. Each possible bracket smoothing $s$ of this diagram is associated with a factor of $x^{\delta(s)}$, where $\delta(s)$ is the number of circles of nontrivial homology in the smoothing. The polynomials for all minimally genus one virtual knots therefore have the form $x f(a)$, where $f(a)$ is a function of the knot which does not depend on $x$, and this property allows all minimally genus one knots to be readily identified. The minimally genus one virtual knots of up to $n = 4$ in the genus one table [3] are, in the notation of that work: $2_1$, $3_1$, $4_1$, $4_2$, $4_3$, $4_6$, $4_7$, $4_8$ and $4_9$. In the complete virtual knot table [2], the diagrams which are explicitly minimally genus one are: $v2_1$, $v3_2$, $v4_{12}$, $v4_{43}$, $v4_{65}$, $v4_{94}$ and $v4_{100}$. After comparing knot invariants between the two tabulations, we were unable to find a partner in [3] for the minimally genus one $v4_{12}$ (i.e. it appears to be an erroneous omission). Conversely, from [3] we could identify three minimally genus one virtual knots beyond those explicit in the complete table, with this property also confirmed via the Kauffman bracket method; these correspond to $v4_{36}$, $v4_{37}$ and $v4_{64}$ in [2] (up to chiral mirrors). This relationship would be difficult to see by direct inspection of the diagram, and Supplementary Fig. 3 demonstrates the equivalence of the different presentations for $v4_{64}$, via a combination of virtual Reidemeister moves and planar isotopies. All other minimally genus one examples agree in the two tables, and we believe that this completes the full set of minimally genus one virtual knots with up to four classical crossings.
Just as with classical knots, we identify virtual knot types by calculating virtual knot invariants (which are, in many cases, generalisations of classical invariants, such as the Kauffman bracket polynomial already discussed). Typically it is more computationally expensive to discriminate virtual knots than classical knots of the same minimum crossing number $n$. The basic procedure of invariant calculation is similar to that of classical knots, although now virtual crossings may also be algorithmically removed via virtual Reidemeister moves I and II. This does not directly affect the classical crossings, but may allow more of them to be removed. The Alexander polynomial has a number of extensions in virtual knot theory; we work with the two-variable generalised Alexander polynomial $\Delta_g(s,t)$ [12]. As with classical knots, the calculation is significantly faster when evaluated at constant values of $s$ and $t$, and we use the combinations $(s = -1, t = e^{2\pi i/3})$, $(s = -1, t = i)$ and $(s = e^{2\pi i/3}, t = i)$. However, in contrast to classical knots, the generalised Alexander polynomial is not enough to distinguish the two simplest virtual knots possible from open curves, $v2_1$ and $v3_2$, as well as some other simple virtual knots (the next are $v4_{36}$ and $v4_{65}$, but although they are relatively simple these do not contribute significantly to any of our analysis). When necessary (but primarily in the case of $v2_1$ and $v3_2$), we resolve this ambiguity using the Jones polynomial $V(q)$ [13], which is a classical knot invariant that extends to virtual knots without modification. Since computation of the Jones polynomial takes exponential time in the number of crossings [4,7], we compute it only at the constant $q = -1$ (sufficient to distinguish $v2_1$, $v3_2$, etc.), and only when our chosen values of $\Delta_g(s,t)$ are not sufficiently discriminatory to identify the virtual knot.
Virtual knot invariants for each of the virtual knots with up to four classical crossings can be found in the online knot table of [2] or, for the Kauffman bracket variant explained above, in [3]. Supplementary Table 1 further shows the values of $\Delta_g$ and $V$ for each of the minimally genus one virtual knots in these tables, which together are clearly sufficient to distinguish all relevant knot types.