## Abstract

We study here global and local entanglements of open protein chains by implementing the concept of knotoids. Knotoids have been introduced in 2012 by Vladimir Turaev as a generalization of knots in 3-dimensional space. More precisely, knotoids are diagrams representing projections of open curves in 3D space, in contrast to knot diagrams which represent projections of closed curves in 3D space. The intrinsic difference with classical knot theory is that the generalization provided by knotoids admits non-trivial topological entanglement of the open curves provided that their geometry is frozen as it is the case for crystallized proteins. Consequently, our approach doesn’t require the closure of chains into loops which implies that the geometry of analysed chains does not need to be changed by closure in order to characterize their topology. Our study revealed that the knotoid approach detects protein regions that were classified earlier as knotted and also new, topologically interesting regions that we classify as pre-knotted.

## Introduction

Since early observations of topological entanglements in biopolymers^{1,2,3}, biology became one of practical fields of application of knot theory. For instance, the appearance of particular knot types in unknotted circular DNA molecules that served as a substrate of various enzymes helped to determine the molecular mechanism of these enzymes^{4,5,6,7,8}. Likewise, the analysis of DNA knots formed inside bacteriophage capsids permitted to elucidate how the densely packed DNA is arranged within heads of bacteriophages^{9,10,11}. From a mathematical point of view, a knot is a closed curve in 3-dimensional Euclidean space that does not intersect itself anywhere and can be continuously deformed as if it was made out of rubber^{12}. For closed curves, any continuous deformation maintains the original topology. Thus, for example, a trefoil knot will always be a trefoil knot upon continuous deformation. In case of proteins, which are open linear chains with complex geometry, a continuous deformation can convert them into a straight open chain. Therefore from an orthodox topological point of view all proteins with open chains are unknotted. However, to analyse the topology of protein chains it makes sense to treat their configurations as rigid and thus not able to undergo any continuous deformation. In fact, proteins can undergo some internal motion but crystallized forms are essentially rigid. Here we work mostly with coordinates of such proteins. By introducing the condition that proteins are rigid, the question of protein knottedness becomes interesting and treatable. Early approaches required though closure of the protein polypeptide chain before the analysis of protein topology^{3}. Of course, how to close the protein is a nontrivial question and several methods were proposed^{13}. The problem is that different methods of closure may lead to different knot types being associated with a given protein. Closure was required in the past since available mathematical tools that can be used to determine the knot type could only analyse closed curves. Interesting advances opening the possibility to analyse topology of open curves came with the discovery of concept of knotoids and with the mathematical tools permitting their analysis^{14, 15}. Using the knotoid concept, the topology of open chains can be analyzed using just projections of these chains without the formal need to close them. We use here the knotoid approach to analyse knottedness of entire protein chains, as well as of their all possible subchains.

## A primer in knot theory

Mathematical knots are usually studied through their diagrams. A knot diagram is obtained by projecting the knot onto a plane in such a way so that only double points are allowed in the projection. A double point or a *crossing* of the knot diagram, is a point of the diagram where two arcs of the knot cross transversely one another (see Fig. 1). The crossings also carry the extra information that indicates the undercrossing arc. In this way, one can always reconstruct the knot in the 3D space^{16}. Two knots are equivalent, if they can be continuously deformed to one another without allowing cutting and regluing of the knot. On the level of diagrams, this equivalence is proven if the two diagrams can be transformed to one another by a finite sequence of three elementary diagram moves, known as the Reidemeister moves (see Fig. 2) and planar isotopy i.e. stretching, shrinking, bending or straightening of portions of the diagram in the plane so that the underlying structure is preserved (see for example^{12, 16, 17}). However, finding out whether there is a combination of Reidemeister moves and planar isotopy transformation that can convert two diagrams into each other may be a very difficult task, especially if diagrams have many crossings. In such cases, more sophisticated tools are required. Such tools are the knot invariants, which are functions defined on the set of all knots that assign the same value to equivalent knots. The most well-known knot invariants are the knot polynomials such as the Alexander polynomial^{18}, the Jones polynomial^{19} and its generalization the Homflypt polynomial^{20, 21}.

## Knots in proteins

Numerous studies conducted in the last twenty-five years have revealed the existence of a large number of proteins whose main chain fold into non-trivial topologies, a fact that implies the presence of knots in their conformation^{3, 22,23,24,25,26,27}. The precise nature of the structural and functional advantages created by the presence of knots in the protein backbone is a subject of high interest from both experimental and theoretical point of view. It has been conjectured that these non-trivial topologies provide a stabilizing function that can act by holding together certain protein domains^{23, 26, 28, 29}. However, the precise structural and functional advantages provided by the presence of knots is uncertain in the majority of the cases. To better understand this open problem, several efforts have been made towards the characterization and classification of the protein chains based on their knot type^{30}. This characterization required though an important departure or even apostasy from the orthodox knot theory that consists of accepting that linear chains can be knotted. This contradicted the central axiom of knot theory where any open arc, no matter the degree of entanglement, is topologically equivalent to a straight line since any open arc can be continuously deformed to a straight line^{12}. However, when the continuous deformation is not allowed, the determination of knottedness of a given open curve with frozen geometry starts to make sense and is mathematically challenging. In fact, proteins in their native folded structure are frequently quite rigid and show only limited internal motion. Therefore analysis of their knottedness is done for their open chains with fixed geometry.

Having established the above assumptions, a procedure to capture the knotting type has to be chosen. Until recently, characterization of knottedness of proteins required closure of protein chains since available knot invariants could only be calculated for closed curves. In this context, various methodologies of chain closure have been proposed, both deterministic (for example^{22, 23}) and probabilistic (e.g. refs 3, 30–33). All deterministic methods deal with the choice of a closure for the open chain, and setting the condition of choice is somewhat arbitrary. Depending on the choice of closure, different knot types can be associated to the same protein. To avoid this arbitrarity and a possible bias, probabilistic methods were introduced that use unbiased multiple closures to detect the most likely knot type of a linear chain with a given geometry^{3, 30,31,32,33}. An example of such a method is the following: the chain is placed near the center of a large sphere. Then, both ends of the chain are extended towards the direction of a randomly chosen point on the sphere where they are joined. This procedure, after multiple repetitions, produces a spectrum of knots that are associated to the given linear chain^{32}. After every closure, either by deterministic or probabilistic methods, formed knots’ identification is achieved via computation of a knot polynomial. Note that knot polynomials are not complete invariants, in the sense that there are pairs of knots that cannot be distinguished (e.g. mutant knots). However, all knots that are encountered within the backbone of a protein are relatively simple and thus they can be identified by polynomial invariants of knots such as the Jones polynomial.

All approaches that require protein chain closure, deterministic and probabilistic, suffer though from a formal problem. The configurations that are analyzed for knotting are not the original configurations of studied proteins but configurations that were changed i.e. deformed by the addition of the closing parts.

We describe here a method that allows us to characterize knottedness of unperturbed configurations of proteins using the concept of knotoids. That concept is explicitly used to analyze diagrams resulting from the projections of open chains.

## Knotoids as an extension of knot theory

Knotoids were first introduced by V. Turaev in ref. 14 and have been studied further by L. Kauffman and N. Gügümcü in ref. 15. In brief, they are equivalence classes of open ended knot diagrams that generalize the notion of a 1-1 tangle (or a long knot) since they allow the endpoints to be in different regions of the diagram, and thus they provide a rigorous definition for open knots. In this way, a new diagrammatic theory is formed which is an extension of the classical knot theory^{15}.

Knotoid diagrams were originally defined in the 2-sphere *S*
^{2} however, their definition can be extended to \({{\mathbb{R}}}^{2}\). A knotoid diagram is defined as the generic immersion of the closed unit interval in *S*
^{2} whose only singularities are transversal double points endowed with over/undercrossing data. The images of 0 and 1 under this immersion are called *the tail* and *the head* of the knotoid diagram respectively. These two points are distinct from each other and from the double points; they are also called *the end points* of the knotoid diagram. Every knotoid diagram comes with an orientation that goes from the tail to the head. The double points of the knotoid diagram are called *crossings*
^{14} (see Fig. 3).

Two knotoid diagrams are considered equivalent if and only if they differ by a finite sequence of the Reidemeister moves that modify the diagrams within a small disk but which do not utilize the endpoints. The corresponding equivalence classes are called *knotoids*. Since it is forbidden to move diagram portions over or under the end points, a non-trivial knotoid diagram cannot reduce to the trivial one (see Fig. 4). Any knotoid that has both endpoints in the same local region of *S*
^{2} is called *a knot-type* knotoid and they are denoted by *k*°, where *k* is the corresponding entry in the knotoid table. All other knotoids are called *pure* or *proper knotoids*. Many knot invariants, like the Kauffman bracket and the Jones polynomial extend to the case of knotoids in a natural way^{14, 15}.

## Knotoids, open curves and protein chains

In this section we shall discuss the connection between knotoid diagrams and oriented curves in the 3-dimensional space. A knotoid diagram represents an open oriented curve in \({{\mathbb{R}}}^{3}\) if it is in the equivalence class of the knotoid that corresponds to the generic projection of the curve to some plane. Consider now a smooth curve which lies inside a large ball in \({{\mathbb{R}}}^{3}\). Each point of that ball points towards a generic projection to a plane that lies outside the ball in \({{\mathbb{R}}}^{3}\). The two end points of the curve determine two parallel lines, each one passing through one endpoint. Both lines are perpendicular to the plane that corresponds to the generic projection. By considering the generic projection of the curve to the plane along the lines together with the information of the overpassing and underpassing arcs, one obtains a knotoid diagram in \({{\mathbb{R}}}^{2}\)
^{15} (see Fig. 5). The resulting knotoid diagrams and their knotoid types depend on the choice of a projection plane. However, to characterize knottedness of open curves such as represented by structures of proteins, we characterize the spectrum of knotoids observed when a given open curve is projected in all possible directions equisampling the sphere. The most frequently observed knotoid type is then associated with the given structure as its dominant knotoid type.

Although the knotoid approach allows us to study knottedness of protein chains without any deformations of the chains, the knotoid types formed by projections with many crossings are difficult to determine as the computation of Jones polynomials becomes too time demanding. In such cases we simplify knotoids diagrams by actually doing triangle elimination moves on 3D configurations of analysed proteins^{34}. Importantly, the knotoid type is not changed when one does not permit triangle elimination moves to pass through the two infinite lines that are perpendicular to the projection plane and which go through the ends of the linear chains (see Fig. 5a,b).

One can now define a measure for the entanglement of a smooth open curve in \({{\mathbb{R}}}^{3}\) by analyzing many directions of projections equisampling the sphere. More precisely, the set of all knotoid equivalence classes that are obtained from generic projections of the curve in question to different planes in \({{\mathbb{R}}}^{3}\) is the *measure of knottedness* of the curve. The knotoid with the highest number of occurrences in this measure shall be called *dominant* and in this study we are mainly interested in the determination of the dominant knotoid type for a given protein chain.

If the dominant knotoid that appears in the measure is the trivial one (or the *unknotoid*) then the chain is considered as topologically not entangled. Otherwise, the chains are considered as topologically entangled and we characterize them by determining the dominant knotoid type resulting from projections of a given chain. The concepts mentioned above can be adopted to the case of any orientable surface. For computational reasons, we shall consider in this paper that the resulting knotoids are then embedded in the 2-sphere *S*
^{2}.

## Results

### Global entanglement: Knotoids versus stochastic closure

Our first aim is to study the global entanglement of protein chains. The protein structure we analyze here is this of bacterial N-acetylornithine transcarbamoylase with the PDB entry 3KZN and it is known to form an open trefoil knot. Figure 6a shows what types of knotoids are obtained when the protein is projected along a large number (80,000) of directions. The results are presented as a map that identifies territories on the surface of the sphere, where each distinct territory (with a given color) encloses directions of projections resulting in a given knotoid. Depending on the original orientation of the protein, the map produced on the surface of the sphere can rotate but the fraction of projections that result in a given knotoid stays always the same for a given protein structure. The 3KZN protein forms a deep and relatively tight, open trefoil knot^{29}. As could be expected for that particular protein, the great majority of projections resulted in diagrams, which upon topology conserving moves can be simplified to a knot-type knotoid diagram with 3 crossings. The adopted here topological notation of that knotoid is k3.1°. The superscript ° indicates that it is a knot-type knotoid, which means that its diagram can be closed without crossing other parts of the diagram. In this case such a closure will result in 3_{1} knot. In fact, more than 75% of projections of 3KZN protein resulted in k3.1° knotoids. The territories enclosing directions of projections resulting in k3.1° knotoid are indicated with the blue color on Fig. 6a. Figure 6b shows the equirectangular projection of the spherical map shown in Fig. 6a. Although the equirectangular projections distort the relative sizes of distinct knotoid territories we do not use them here to evaluate the relative area of these territories but to show that antipodal directions of projections give the same knotoid types. Looking at the knotoid maps of 3KZN protein (Fig. 6a,b), we can see that in addition to projections resulting in knotoid k3.1° there are projections resulting in formation of many other types of knotoids, including some with more than 6 crossings in their minimal crossing diagrams (Figures S1 and S2 presents the types of knotoids resulting from the projection of studied proteins). Interestingly, there were no trivial knotoids k0.1. However, we did observe the simplest nontrivial knotoids with the notation k2.1. These knotoids have just two crossings in their diagrams (see Fig. 3a) but can’t be continuously deformed into trivial knotoid k0.1. Using knotoid approach to characterize topology of proteins, we associate the most frequent knotoid type to a given protein or its subchain. Table 1 lists the dominant knotoid types and the fraction of random direction that give rise to the corresponding knotoid type for 14 proteins that are known to form open knots^{30}.

It is interesting to compare the characterization of protein chain topology using knotoid approach, presented above (Fig. 6a,b), with the stochastic closure technique, described earlier (see Fig. 6c,d)^{30}. Stochastic closure technique is formally equivalent to the first phase of knotoid approach, i.e. the protein chain is placed in the centre of a big sphere and is projected along random directions on the surface of the enclosing sphere. From here on, the closure approach diverge from the knotoid approach. In the stochastic closure approach the projection-derived diagrams, containing the information which segments were above and which under, are closed with a straight segment, where the closing segment passes over all other segments it crosses with on the diagram (see Fig. 5c,d). The knot type of the diagram gets fixed upon closure and its type can be directly determined by the calculation of a knot invariant such as the Jones polynomial. To facilitate the computation of knot invariants the diagrams may be simplified by triangle elimination moves and by Reidemeister moves. For purpose of better comparison between knotoid and knotting approach we use the same directions of projections for both approaches. Looking at Fig. 6a,b and c, d it is not difficult to observe that the knotoid approach involves a wider spectrum of knotoid types when compared the spectrum of knots, even if we group all knotoids or knots with more than 6 crossings into one category. Additionally, we observe that in both cases the dominant knot type involves structures with three crossings; the trefoil knot for the case of the stochastic closure and the knot-type knotoid k3.1°. Further, one can see on the knotoid map that all territories corresponding to knot type knotoids (in this case k3.1°) are contained within corresponding knot type territories on the knot map. The situation is different though for proper types of knotoids as only one of their antipodal territories is carried through to the knot map with the appropriate change of topological notation.

One may recover the knotoid map from the knotting map by “twinning” of all territories of non-dominant knots, i.e. by adding the antipodal territories to territories of non-dominant knots. While the overall shapes of territories which indicate projection directions that generate diagrams of non-dominant knots and knotoids are maintained. The knotoid territories are divided into a larger number of different topological subclasses. This indicates that knotoid approach provides a finer topological distinction between various topological states than knotting approach. Comparing the knotoid approach to the knotting approach, we can see that one obtains similar but not identical information about the topology of analysed chains by using these two methods. It is comforting to see that the method that does not invoke closure of analysed configurations captures similar topological features as a method that always closes analysed trajectories.

Very recently Alexander *et al*., used virtual knot formalism to analyse configurations of knotted proteins. Although virtual knot formalism requires closure of the diagrams derived from projections of open knots, the virtual closure imposes very similar limitations on topology of the virtual knot diagrams as the end points of knotoid diagrams. Therefore the methods of studying the topology of a protein chain using the knotoids approach and the virtual knots approach^{35} produce very similar results.

### Local entanglement: Slipknotoids and knotted cores

Characterization of the global knottedness of proteins by knotoid or chain closure approach tells us only what type of open knot forms a given entire protein. This method does not inform us where the knotted portion is located within the protein structure, how large is the knotted core or whether a given polypeptide chain contains subknots, which are less complex knots located within more complex knots^{36}. This information can be provided though when one analyses knottedness of every possible subchain of a given protein^{37, 38}. The analysis of every subchain provides what is known as the *knotting fingerprint* of a given protein, which is usually presented in a form of triangular matrix where every entry in the matrix informs what is the dominant knot type for a given subchain^{30}. The dominant knotoid types are indicated in the provided coloured scheme (see Fig. 7a). We decided to compare knotting fingerprints of several knotted proteins with their knotoid fingertprints. Since for a given number of crossings in a minimal crossing diagram there are more types of knotoids than of knots, knotoid approach can provide a finer characterization of protein topology than the knotting approach necessitating chain closure.

Given a protein chain, we start studying its local entanglement behaviour by clipping the chain one alpha-Carbon atom at a time, starting from either of the termini of the chain. Each time we obtain a shorter chain, which we analyze in terms of the knotoid technique that is, by projecting the trimmed chain along 1000 random directions and then computing the Jones polynomial of each projection. Notice that as we trim the chain, we observe that the knotoids types change. The shortest subchains that form a given knotoid type are cores of that particular knotoid and their size and position is indicated by entries of a given colour that are closest to the diagonal of the matrix.

Our working example here is the protein DehI (*α*-haloacid dehalogenase) whose backbone forms a 6_{1} knot, the most complex protein knot known so far^{39}. In ref. 30 all subchains of DehI were analyzed using the stochastic closure technique and it was observed that, in addition to the 6_{1} dominant knot formed by the entire protein, smaller subchains formed 4_{1} and 3_{1} knots. Interestingly, after analyzing DehI using the knotoid approach and comparing the resulting knotoid fingerprint to its knot fingerprint (see Fig. 7a,b) we observe that the knotoid approach exhibits a much richer diversity in terms of the different topological forms.

Going into further detail, the dominant knotoid type of the entire chain is the knot-type knotoid 6.1. Recall that knot-type knotoids have both endpoints in the same region of the plane and so they always yield the same knot regardless of how one chooses to close the knotoid diagram but without introducing additional crossings. Returning to the comparison, of knotoid and knotting figerprints, we can see that the regions of the knotoid fingerprint that correspond to the trivial knotoid carry through to the matrix of the knot fingerprint as trivial knots. Furthermore, the regions of non-trivial knots of the knot fingerprint of DehI are contained within regions of the knotoid fingerprint. In addition, there are new regions in knotoid fingerprints that correspond to non-trivial proper knotoids that either border regions of knot types knotoids (e.g. the thick regions of of 2.1^{−} knotoids that encircle the 3.1°) or show up as islands within trivial knotoids (the small slices of 2.1^{−} and 3.2^{−}). These regions on knotoid fingerprints that are not visible on knotting fingerprints correspond to polypeptide portions that are not completely knotted and we propose to call them *pre-knots*.

Figure 8 explains how by progressively trimming one end of an entangled subchain that forms a trefoil slipknot one can pass from trivial knotoid to pure knotoid 2.1, then to knot type knotoid 3.1°, then to pure knotoid 2.1 and finally to trivial knotoid again. The starting diagram in Fig. 8 is based on ref. 30.

It is interesting to compare the size of the knotted core to the size of knotoid core. The size of knotted core in knotting fingerprint corresponds to the size of the knot type knotoid on knotoid fingerprint. Pure knotoids though have smaller cores than knot type knotoids. This is a trivial consequence of the fact that trimming the subchain that forms a knot type knotoid one usually passes first into a subchain that forms pure knotoid before passing to a subchain that forms trivial knotoid.

## Discussion

In order to study the topology of a protein chain there are two options so far. The first one is to study the protein as a closed chain, so one has to determine a way to close the chain and then study the resulting knot. The second option would be to study all possible projections of the open chain resulting in diagrams of knotoids.

The advantage of using the knotoids approach is the increased sensitivity of entanglement detection. In this paper we have shown that the immediate impact of this can be observed in the subchain analysis and local entanglement study of a protein chain. To be more precise, our study indicated that the knotoids approach produces more refined fingerprints of the protein chains. Additionally, the knotoids method detects in higher detail the minimal length of the chain that can remain tangled giving thus a more accurate overview of the entanglement pattern of each protein chain. Knotoid fingerprints detect also the protein regions with topological entanglement that is not detected by knotting fingerprints. We propose to call these protein portions as pre-knots. Figures S3–S5 show knotoid fingerprints of 12 other proteins that were analyzed with knotting fingerprint by Sułkowska *et al*.

## Methods

For our analysis, we import for each protein its structure from the PDB and convert it to *xyz*-coordinates. We reconstruct the protein chain using only *Cα* atoms and then choose its direction of projection. Once the direction is chosen we perform triangle simplification moves^{34} that do not pass through two lines that are parallel to the direction of projection and which go through the end points of the reconstructed protein chain. This procedure keeps the knotoid type while reducing the number of nugatory crossings, which in turn facilitates the computation of Jones polynomials. We then project the protein chain and evaluate the knotoid diagram. Repeated applications of the three Reidemeister moves allows us to reduce the number of crossings of the corresponding knotoid diagram.

For each projection, we compute its Jones polynomial for the case of knotoids^{14, 15} in the following way. We smooth each crossing of the knotoid diagram \(K\) using the following rules:

Equations 1–3 comprise the extension of the Kauffman bracket polynomial to the case of knotoids^{15}. The diagrams involved in (1) are identical except at one crossing. The second rule means that whenever we have a disjoint circle in a state, we can remove it and multiply by (*A*
^{2} − *A*
^{−2}). The Jones polynomial for a knotoid diagram *K* is then given by the following:

where 〈*K*〉 is the Kauffman bracket polynomial of *K* and \({\rm{w}}r(K)\) is the writhe of *K*. The Jones polynomial for knotoids is computed by a routine that follows^{40}. For the subchain analysis, we remove one bead at a time starting from the *C*-terminus of the chain and we apply to the resulting chain the above procedure. Once we identify the dominant knotoid for each subchain, we proceed and create the knotoid fingerprint using R.

## Additional information

**Publisher's note:** Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## References

- 1.
Dean, F. B., Stasiak, A., Koller, T. & Cozzarelli, N. R. Duplex DNA knots produced by

*Escherichia coli*topoisomerase I. Structure and requirements for formation.*J. Biol. Chem.***260**, 4975–4983 (1985). - 2.
Connolly, M. L., Kuntz, I. D. & Crippen, G. M. Linked and threaded loops in proteins.

*Biopolymers***19**, 167–1182 (1980). - 3.
Mansfield, M. L. Are there knots in proteins?

*Nat. Struct. Biol.***1**, 213–214 (1994). - 4.
Spengler, S. J., Stasiak, A. & Cozzarelli, N. R. The stereostructure of knots and catenanes produced by phage lambda integrative recombination: implications for mechanism and DNA structure.

*Cell***42**, 325–34 (1985). - 5.
Sumners, D. W., Ernst, C., Spengler, S. J. & Cozzarelli, N. R. Analysis of the mechanism of DNA recombination using tangles.

*Q. Rev. Biophys.***28**, 253–313 (1995). - 6.
Crisona, N. J., Weinberg, R. L., Peter, B. J., Sumners, D. W. & Cozzarelli, N. R. The topological mechanism of phage lambda integrase.

*J. Mol. Biol.***289**, 747–75 (1999). - 7.
Buck, D. & Flapan, E. Predicting knot or catenane type of site-specific recombination products.

*J. Mol. Biol.***374**, 1186–99 (2007). - 8.
Olorunniji, F. J.

*et al*. Gated rotation mechanism of site-specific recombination by*ϕ*C31 integrase.*Proc. Natl. Acad. Sci. U.S.A.***109**, 19661–6 (2012). - 9.
Arsuaga, J.

*et al*. DNA knots reveal a chiral organization of DNA in phage capsids.*Proc. Natl. Acad. Sci. U.S.A.***102**, 9165–9169 (2005). - 10.
Marenduzzo, D.

*et al*. DNA–DNA interactions in bacteriophage capsids are responsible for the observed DNA knotting.*Proc. Natl. Acad. Sci. U.S.A.***106**, 22269–22274 (2009). - 11.
Reith, D., Cifra, P., Stasiak, A. & Virnau, P. Effective stiffening of DNA due to nematic ordering causes DNA molecules packed in phage capsids to preferentially form torus knots.

*Nucleic Acids Res.***40**, 5129–5137 (2012). - 12.
Adams, C. C.

*The Knot Book*(New York: Freeman, 1994). - 13.
Millett, K. C., Rawdon, E. J., Stasiak, A. & Sułkowska, J. Identifying knots in proteins.

*Biochem. Soc. Trans.***41**, 533–7 (2004). - 14.
Turaev, V. Knotoids.

*Osaka J. Math.***49**, 195–223 (2012). - 15.
Gügümcü, N. & Kauffman, L. H. New Invariants of Knotoids. European J. Combin. (in press, 2017); arXiv:1602.03579.

- 16.
Kauffman, L. H.

*Knots and Physics*. Series on Knots and Everything Vol. 53 (World Scientific, 2013). - 17.
Kauffman, L. H. New invariants in the theory of knots.

*Am. Math. Monthly***95**, 195–242 (1998). - 18.
Alexander, J. W. A Lemma on System of Knotter Curves.

*Proc. Natl. Acad. Sci. U.S.A.***9**, 93–95 (1923). - 19.
Jones, V. F. R. Hecke algebra representations of braid groups and link polynomials.

*Ann. Math.***126**, 335–388 (1987). - 20.
Freyd, P.

*et al*. A new polynomial invariant of knots and links.*Bull. Am. Math. Soc.***12**, 239–246 (1985). - 21.
Przytycki, J. H. & Traczyk, P. Invariants of links of Conway type.

*Kobe J. Math.***4**, 115–139 (1987). - 22.
Taylor, W. R. A deeply knotted protein structure and how it might fold.

*Nature***406**, 916–919 (2000). - 23.
Virnau, P., Mirny, L. A. & Kardar, M. Intricate knots in proteins: Function and evolution.

*PLoS Comput. Biol.***2**, 1074–1079 (2006). - 24.
Mallam, A. L. & Jackson, S. E. Probing nature’s knots: the folding pathway of a knotted homodimeric protein

*. J. Mol. Biol.***359**, 1420–1436 (2006). - 25.
Mallam, A. L., Morris, E. R. & Jackson, S. E. Exploring knotting mechanisms in protein folding.

*Proc. Natl. Acad. Sci. U.S.A.***105**, 18740–18745 (2008). - 26.
Yeates, T. O., Norcross, T. S. & King, N. P. Knotted and topologically complex proteins as models for studying folding and stability.

*Curr. Opin. Chem. Biol.***11**, 595–603 (2007). - 27.
Sułkowska, J. I., Noel, J. K. & Onuchic, J. N. Energy landscape of knotted protein folding.

*Proc. Natl. Acad. Sci. U.S.A.***109**, 17783–17788 (2012). - 28.
Sułkowska, J. I., Sułkowski, P., Szymczak, P. & Cieplak, M. Stabilizing effect of knots on proteins.

*Proc. Natl. Acad. Sci. U.S.A.***105**, 19714–19719 (2008). - 29.
Dabrowski-Tumanski, P., Stasiak, A. & Sułkowska, J. I. In search of functional advantages of knots in proteins.

*PLoS One***11**, e0165986 (2016). - 30.
Sułkowska, J. I., Rawdon, E. J., Millett, K. C., Onuchic, J. N. & Stasiak, A. Conservation of complex knotting and slipknotting patterns in proteins.

*Proc. Natl. Acad. Sci. U.S.A.***109**, E1715 (2012). - 31.
Lua, R. C. & Grosberg, A. Y. Statistics of knots, geometry of conformations, and evolution of proteins.

*PLoS Comput. Biol.***2**, 350–357 (2006). - 32.
Millett, K. C., Dobay, A. & Stasiak, A. Linear random knots and their scaling behavior.

*Macromolecules***38**(2004). - 33.
Jamroz, M.

*et al*. KnotProt: a database of proteins with knots and slipknots.*Nucleic Acids Res.***11**, 1–9 (2014). - 34.
Koniaris, K. & Muthukumar, M. Self-entanglement in ring polymers.

*J. Chem. Phys.***95**, 2873–2881 (1991). - 35.
Alexander, K., Taylor, A. J. & Dennis, M. R. Proteins analysed as virtual knots.

*Sci. Rep.***7**, 42300 (2017). - 36.
Rawdon, E. J., Millett, K. C. & Stasiak, A. Subknots in ideal knots, random knots, and knotted proteins.

*Sci. Rep.***5**(2015). - 37.
King, N. P., Yeates, E. O. & Yeates, T. O. Identification of rare slipknots in proteins and their implications for stability and folding.

*J. Mol. Biol.***373**, 153–166 (2007). - 38.
Taylor, W. R. Protein folds, knots and tangles. In Calvo, J., Millet, K., Rawdon, E. & Stasiak, A. (eds) Physical and numerical models in knot theory, 171–202 (World Scientific, Singapore, 2005).

- 39.
Bölinger, D.

*et al*. A Stevedore’s Protein Knot.*PLoS Comput. Biol.***6**, e1000731 (2010). - 40.
Bar-Natan, D., Morrison, S.

*et al*. The Knot Atlas. http://katlas.org. - 41.
R Core Team, R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2016, https://www.R-project.org.

## Acknowledgements

This work was funded in part by the Leverhulme Trust (RP2013-K-017 to A.S.) and by the Swiss National Science Foundation (31003A-138267 to A.S.).

## Author information

### Author notes

Dimos Goundaroulis and Julien Dorier contributed equally to this work.

### Affiliations

#### Center for Integrative Genomics, University of Lausanne, 1015, Lausanne, Switzerland

- Dimos Goundaroulis
- , Julien Dorier
- , Fabrizio Benedetti
- & Andrzej Stasiak

#### Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland

- Dimos Goundaroulis
- & Andrzej Stasiak

#### Vital-IT, SIB Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland

- Julien Dorier
- & Fabrizio Benedetti

### Authors

### Search for Dimos Goundaroulis in:

### Search for Julien Dorier in:

### Search for Fabrizio Benedetti in:

### Search for Andrzej Stasiak in:

### Contributions

A.S. and D.G. designed research; D.G., J.D., F.B. performed the research; D.G., J.D., F.B. and A.S. analyzed the data; D.G. and A.S. wrote the paper. All authors reviewed the manuscript.

### Competing Interests

The authors declare that they have no competing interests.

### Corresponding author

Correspondence to Andrzej Stasiak.

## Electronic supplementary material

## Rights and permissions

**Open Access** This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

## About this article

## Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.