Structural genomic applied on the rust fungus Melampsora larici-populina reveals two candidate effector proteins adopting cystine-knot and nuclear transport factor 2-like protein folds

Rust fungi are plant pathogens that secrete an arsenal of effector proteins interfering with plant functions and promoting parasitic infection. Effectors are often species-specific, evolve rapidly, and display low sequence similarities with known proteins or domains. How rust fungal effectors function in host cells remains elusive, and biochemical and structural approaches have been scarcely used to tackle this question. In this study, we used a strategy based on recombinant protein production in Escherichia coli to study eleven candidate effectors of the leaf rust fungus Melampsora larici-populina. We successfully purified and solved the three-dimensional structure of two proteins, MLP124266 and MLP124017, using NMR spectroscopy. Although both proteins show no sequence similarity with known proteins, they exhibit structural similarities to knottin and nuclear transport factor 2-like proteins, respectively. Altogether, our findings show that sequence-unrelated effectors can adopt folds similar to known proteins, and encourage the use of biochemical and structural approaches to functionally characterize rust effector candidates.


INTRODUCTION
To infect their host, filamentous pathogens secrete effector proteins that interfere with plant physiology and immunity to promote parasitic growth (Win et al., 2012). Although progresses have been made in the past decade, how effectors act in host cells remains a central question in molecular plant pathology. Effectors of filamentous pathogens are secreted and either stay in the apoplast or penetrate inside the cell cavity through specialized infection structures such as haustoria (Petre and Kamoun, 2014). Effectors are detected by the host plant by two layers of immune receptors at the cell surface or inside the cell, which triggers plant defence response (Jones and Dangl, 2006).
To evade recognition by the host immune system, pathogen effector genes evolve rapidly, notably through the diversification of the amino acid sequence of the encoded proteins (Persoons et al., 2017). Such diversification impairs the identification of amino acid motifs or sequences similar to known proteins, which could give insights on effectors function inside the host cell (Franceschetti et al., 2017). However, effector protein structures may be more conserved than their primary sequences. Indeed, several superfamilies of effector proteins, such as the fungal MAX or the oomycete WY-domain families, have members showing similar fold but divergent primary sequences (de Guillen et al. 2015;Win et al. 2012) the fold being conserved probably due to the strong link between protein structure and function (Illergård et al., 2009). Thus, structure resolution of candidate effectors represents a powerful approach to gain insights into effector functions, but also to define and identify structural effector families. Research efforts have been recently set in this direction and applied in order to determine the structure of several effector proteins (de Guillen et al., 2015;Ve et al., 2013;Wang et al., 2007;Wirthmueller et al., 2013).
Rust fungi (Pucciniales, Basidiomycetes) constitute the largest group of obligate biotrophic pathogens, and they infect almost all plant families, causing serious damages to cultures (Dean et al., 2012;Lorrain et al., 2019). In the past decade, progresses made in rust fungal effector biology were mostly based on genomics and transcriptomics, and common but not exclusive features have been used to identify candidate effectors in predicted secretomes (Duplessis et al., 2014). Catalogues of hundreds to thousands of candidate effectors have been unravelled. Generally, criteria such as the protein size, the presence of a predicted secretion signal, the absence of functional information, the richness in cysteines, the transcriptional regulation during infection, and/or the presence of signatures of rapid evolution, have been used as a basis for identifying candidate effectors (Cantu et al., 2013;Hacquard et al., 2012;Nemri et al., 2014;Saunders et al., 2012). Due to the difficulty to genetically manipulate rust fungi and their host plants, only a handful of rust effectors have been characterized so far (Petre and Kamoun, 2014). Apart from their avirulence properties (i.e. recognition by plant immune receptors inside the cell), the functions of these rust effectors remain unknown (Lorrain et al., 2019). Effectoromic pipelines have been recently established to prioritize rust candidate effectors based on heterologous systems to get insights about their plant cellular and molecular targets (Germain et al., 2018;Lorrain et al., 2018;Petre et al., 2015;Petre et al., 2016;Qi et al., 2018). But so far, only one study has set up a small-scale effort using production of candidate effectors in bacterial system to unravel their structure and function (Zhang et al., 2017). The identification of plant targets of effectors associated with structure/function analyses of recombinant effectors can reveal how they interact with plant partners and how co-evolution with the plants promotes the diversification of surfaceexposed amino acids (Boutemy et al., 2011;Chou et al., 2011;Leonelli et al., 2011;Wang et al., 2007;Win et al., 2012;Yaeno et al., 2011). The avirulence proteins AvrL567, AvrM, and AvrP of the flax rust fungus Melampsora lini are the three effector structures described in rust fungi so far (Ve et al., 2013;Wang et al., 2007;Zhang et al., 2018).
The poplar rust fungus Melampsora larici-populina is the causal agent of the poplar leaf rust disease. It causes important damages in poplar plantations across Europe (Pinon and Frey, 2005). It is also a model pathosystem to study tree-microbe interaction. As such, recent research efforts have identified and initiated the characterization of M. larici-populina candidate effectors using transcriptomics and functional screens in heterologous plant systems such as Nicotiana benthamiana and Arabidopsis thaliana (Gaouar et al., 2016;Germain et al., 2018;Petre et al., 2015;Petre et al., 2016). These studies highlighted that M. larici-populina candidate effectors target multiple cell compartments and plant proteins. The same conclusions have been drawn from similar effectoromic screens in other rust fungi (Lorrain et al., 2018). It remains to make progress in the determination of the functions of unravelled priority effectors.
In this study, we combined biochemical and structural approaches to explore further M. larici-populina candidate effector proteins. To this end, we used Escherichia coli as an heterologous system to express eleven candidate effectors that were previously described to target particular cell compartments and/or to interact with specific plant proteins and/or that are homologues of known rust avirulence effectors (Gaouar et al., 2016;Germain et al., 2018;Petre et al., 2015;Petre et al., 2016). Among the eleven selected proteins, only three were successfully produced and purified from E. coli as recombinant proteins. We could determine the nuclear magnetic resonance (NMR) structures of two of them, highlighting structural similarities with Nuclear Transport Factor 2-like proteins and with Knottins.

Three out of eleven candidate effectors were successfully produced in E. coli and purified
To investigate the structural properties of the 11 selected candidate effectors (Figure 1), we first aimed at obtaining the recombinant proteins. To this end, we first cloned the sequence coding the mature proteins (i.e. without signal peptide) into pET-26b (for Mlp124111, Mlp124478, Mlp124530, Mlp124561, Mlp37347, Mlp109567, Mlp107772, and Mlp124202) or pET-28a (for Mlp124266, Mlp124499, and Mlp124017) expression vectors, in order to incorporate a C-terminal 6-histidine tag that allows protein purification by immobilized metal ion affinity chromatography (IMAC) (Table S1). We then inserted the expression vectors into the E. coli Rosetta2 (DE3) pLysS strain and induced the production of the His-tagged recombinant proteins. Small-scale expression assays indicated that nine out of the eleven proteins accumulated using a standard induction protocol (i.e. addition of 100 µM IPTG in mid-exponential growth phase and further growing for 3 to 4 hours at 37°C). Among those, five (MLP124111, MLP124561, MLP37347, MLP107772 and MLP124202) accumulated in the insoluble protein fraction, five (MLP124478, MLP124530, MLP124017, MLP124266 and MLP124499) were expressed as soluble proteins and one (MLP109567) is not expressed (Figure 2). Despite the use of other E. coli expression strains (SoluBL21 (DE3), Origami2 (DE3) pLysS, as well as Rosetta-Gami2 (DE3) pLysS) which are known to increase the solubility of recombinant proteins or to improve the maturation of disulfide bridges, and despite the modification of the induction conditions (induction time, temperature, and osmolarity), we were neither able to enhance the expression of MLP109567 nor the solubility of MLP124111, MLP124561, MLP37347, MLP107772 and MLP124202. As MLP124017, MLP124266, and MLP124499 were expressed in the soluble protein fraction of E. coli and enough stable along the purification procedure, we decided to retain these three proteins for further analyses. We thus purified the His-tagged recombinant proteins in native conditions using a two-step protocol including first affinity chromatography, then exclusion size chromatography ( Figure 3). The purified proteins, yielding respectively 50 mg/L (cell culture), 0.5 mg/L and 0.5 mg/L for MLP124017, MLP124266 and MLP124499, respectively, eluted in size exclusion chromatography as a single peak corresponding to an estimated apparent molecular mass compatible with a monomeric organization.

MLP124266 is a thermosoluble protein that exhibits a cystine-knot
From a previous study, we reported that the Mlp124266 and Mlp124499 genes are strongly expressed and induced during poplar leaf colonization by M. larici-populina, and belong to large multigene families of 13 and 31 members, respectively, in M. larici-populina (Hacquard et al., 2012). Mlp124266 and Mlp124499 encode mature proteins of 69 and 50 amino acids, respectively ( Figure S1A). MLP124266 has an N-terminal part enriched in charged residues and a C-terminal region that possesses six conserved cysteines predicted to form a cystine-knot structure. This typical protein organization is shared by all members of the family as well as by alleles of M. lini AvrP4 (Barrett et al., 2009;Persoons et al., 2014). In MLP124499, several acid residues are found in the N-terminal part whereas the C-terminal part contains three conserved cysteines ( Figure S1B). Prediction programs indicate that all members of both protein families exhibit highly conserved N-terminal signal peptides for protein secretion. Following the production and the purification of both MLP124266 and MLP124499, we undertook a structural characterization of each recombinant protein using a NMR spectroscopy approach. Standard homonuclear 2D experiments and 15 N-edited TOCSY-HSQC and NOESY-HSQC experiments carried out on MLP124499 (Table S2) allowed the assignment of 1 H and 15 N resonances except for the four N-terminal residues ( Figure S2B). However, several minor peaks were observed, especially for Ala14-Glu16, Gly25-Gln26, Glu30, Asp49 residues, suggesting the presence of multiple forms or conformations ( Figure S2A). Changing the temperature and the ionic strength, or adding dithiothreitol failed to improve the quality of the MLP124499 NMR spectra. Therefore, the structural models obtained were not well enough defined to characterize a specific fold for this protein.
For MLP124266, the assignment of 1 H, 15 N and 13 C resonances has been obtained for all residues except the five N-terminal amino acids ( Figure S2A) and its 3D structure could further be modelled by the NMR derived constraints (Figure 4; Figure S3). This analysis showed that the Cys36-Leu69 C-terminal region exhibits a typical cystine-knot structure involving three disulfide bonds (Cys39-Cys55, Cys44-Cys58, and Cys50-Cys64), a β-sheet composed of anti-parallel strands between Thr42-Cys44, Gly57-Ser59 and Val63-Val65, and a short α-helix formed by the Gln49-Ala52 segment. In contrast, the Met1-Asp35 N-terminal region displays large structural disorder, as shown by the superposition of the 20 NMR models ( Figure  S3). Very few NOE correlations were indeed observed for residues 1 to 35. A few sequential and medium-range NOE correlations characteristic of transient helical conformations can however be noticed ( Figure S3) and explain the presence of short secondary structures in some of these 20 models. Indeed, residues 8-17 and 28-30 exhibit helical structures in 30 to 50 % and around 25 % of the models, respectively. Moreover, backbone dynamic properties of MLP124266 have been investigated by 15 N relaxation measurements. Heteronuclear 1 H-15 N NOE values showed a contrasted profile with low values for N-terminal residues (indicative of a flexible structure) and high values for C-terminal residues (indicative of a rigid structure). Indeed, amino acids Asp6 to Gly38 and Cys39 to Leu69 presented heteronuclear NOE averaged values of 0.26 ± 0.05 and 0.66 ± 0.10, respectively. Interestingly 6 out of the 8 cysteines are gathered in the C-terminal region between Cys39 and Cys69, following a spacing (Cys-X2-7-Cys-X3-10-Cys-X0-7-Cys-X1-17-Cys-X4-19-Cys) typical of cystine-knot structures, (i.e., three intricate disulfide bridges that confer very high stability to proteins; Figure 4; (Daly et al., 2011)). Hence, it is likely that rigidity originates from the structure formed by these cysteines that are highly conserved in the protein family, as indicated by a Consurf analysis ( Figure S1, Figure S4). Thus, we sought to determine whether these disulfides are formed and whether they influence the stability and/or the oligomerization state of the protein by covalent bonds. A single peak corresponding to the theoretical mass of MLP124266 monomer was obtained by mass spectrometry (data not shown). The titration of free thiol groups in an untreated recombinant MLP124266 gave an averaged value of 1 mole SH per mole of protein. Considering the presence of 8 cysteines in the protein, these results are consistent with the existence of three intramolecular disulfide bridges ( Figure S3). The thermostability of recombinant protein was estimated by heating the protein for 10 min at 95°C. The fact that the protein remained in solution (i.e. no precipitation was observed) led us to conclude that it is thermosoluble. In order to investigate the role of the disulfides for such property, we should compare the results obtained with an oxidized and a reduced protein. However, as assessed by thiol titration experiments, we failed to obtain a complete reduction of these disulfides despite extensive incubation of the protein at high temperatures, in denaturing and reducing environments. Altogether, these results indicate that recombinant MLP124266 is properly folded by E. coli, and that the disulfide bridges, which are somehow resistant to reduction, confer a high rigidity and stability to the protein.

MLP124017 is part of the Nuclear Transport Factor 2-like protein superfamily
MLP124017 is a small-secreted protein (167 amino acids with its signal peptide; 150 amino acids in its mature form, with a molecular mass of 18 kDa) of unknown function, highly expressed during infection of poplar leaves by M. larici-populina (Hacquard et al., 2012). MLP124017 shares sequence similarity with neither other M. larici-populina nor other rust fungal proteins. In a previous study, we demonstrated the nucleocytoplasmic localization of MLP124107 in N. benthamiana and its interaction with poplar TOPLESS-related 4 protein (Petre et al., 2015). To further investigate MLP124017 structure and to get insights about its function, we first attempted to solve its 3D structure by crystallization coupled to X-ray diffraction. We were unable to obtain exploitable diffracting crystals despite the use of different versions (untagged or N-or C-terminal His-tagged) of MLP124017 protein and therefore switched to NMR. The recombinant 15 N and 13 C-labelled MLP124017 protein was used for structure determination by two-and three-dimensional NMR experiments (Table S3). The assigned 1 H, 15 N-HSQC spectra were well dispersed but the peaks for residues from the N-terminal 1-14 and 86-95 segments were missing ( Figure S5). From preliminary structures, the production of a truncated recombinant protein for the first eight N-terminal residues that could mask residues 86-95 did not improve the data. The solution structure of MLP124017 was determined based on 1727 NOE-derived distance restraints, 214 dihedral angle restraints and 102 hydrogen bond restraints. All proline residues have been determined to be in a trans-conformation according to the 13 Cβ chemical shift at 32.21; 32.46; 31.02 and 32.40 ppm for Pro36; Pro51; Pro54 and Pro146 respectively. The best conformers with the lowest energies, which exhibited no obvious NOE violations and no dihedral violations > 5° were selected for final analysis. The Ramachandran plot produced shows that 99.6% of the residues are in favoured regions (Table S4). MLP124017 structure is composed of a α+β barrel with seven β-strands forming one mixed β-sheet, four β-hairpins, four β-bulges, and four α-helices ( Figure 5A). Residues 1-14 and 150-151 having missing assignments are not defined in the final models. This arrangement of secondary structure produces a cone-shaped fold for the protein, which generates a distinctive hydrophobic cavity ( Figure 5B). To identify potential structural homologs of MLP124017, we performed structural similarity searches using the Dali server (Holm and Rosenström, 2010). Queries identified SBAL_0622 (PDB code 3BLZ) and SPO1084 (PDB code 3FKA) as the closest structural homologs with the highest Z-score of 6.0 and 6.3, respectively, and a RMSD of 4.1 and 3.5Å, respectively ( Figure S6). These two proteins, which are from the bacteria Shewanella baltica and Ruegeria pomeroyi, have no known function, but share a common Nuclear Transport Factor 2-like (NTF2) fold. The NTF2 superfamily comprises a large group of proteins that share a common fold and that are widespread in both prokaryotes and eukaryotes (Eberhardt et al., 2013). Taken together these results show that although MLP124017 do not share sequence similarities or domain with other proteins in sequence databases, its structure is similar to proteins of the NTF2 folding superfamily.

DISCUSSION
In this study, we have set up a small-throughput effectoromics pipeline based on recombinant protein production and structural characterization to get insights on eleven candidate effectors of the poplar leaf rust fungus M. larici-populina. Out of the eleven selected effectors, three were successfully purified as recombinant proteins, and two were structurally characterized by NMR spectroscopy. MLP124266 is a homolog of the M. lini AvrP4 effector protein (Catanzariti et al., 2006), and we showed that it exhibits a cystine-knot (or knottin) structural motif commonly encountered in small disulfiderich proteins. MLP124017 is an orphan protein in M. larici-populina with no known ortholog in Pucciniales. MLP124017 physically associates with the poplar TOPLESS-related protein 4 (TRP4) (Petre et al., 2015), and we showed that it exhibits a fold similar to two bacterial proteins that belong to the Nuclear-Transport factor 2-like protein superfamily.
Recombinant protein production in E. coli is a valuable approach to perform biophysical and biochemical analyses of candidate effectors (Franceschetti et al., 2017;Zhang et al., 2017). However, we have faced issues for the production of soluble small-cysteine rich proteins in this system. Indeed, among the eleven candidate effectors screened for expression only three were found in the soluble protein fraction and stable enough along the purification procedure. Although we tested different E. coli strains and protein expression induction protocols, the eight other candidate effectors were either not expressed at all or expressed as inclusion bodies. It is possible to purify recombinant proteins from inclusion bodies by using denaturing extraction conditions and further refolding proteins (Palmer and Wingfield, 2012). However, this approach is not recommended for structural analysis as the refolding of the proteins in vitro may alter folding. Another limit of prokaryote systems to produce eukaryote proteins is the lack of post-translational modifications such as methylations. An alternative to this is to use the yeast Pichia pastoris, which has proven useful for several fungal effectors such as Leptospheria maculans AvrLm4-7 or Cladosporium fulvum Avr2 and Avr4 (Blondeau et al., 2015;Rooney et al., 2005). We have tried this system to produce the candidate effector (MLP107772), but without success. Nevertheless, this system may be useful and deserves to be considered as an alternative to assay other rust effectors for which we were not able to obtain production in E. coli.
We showed that MLP124266 possesses two distinct regions with contrasted structural properties. The C-terminal part is rigidified by a cystine-knot motif, whereas the N-terminal part is globally flexible. The knottin folded proteins display a variety of functions such as venoms and spider toxins (Garcia, 2004;Lee and MacKinnon, 2004) but also antimicrobial properties such as the cyclotides (Tam et al., 1999). Some are also found to interact with receptor like protease inhibitors found in plants, insects and plant parasites (Kim et al., 2009). The three disulfide bridges within the C-terminal part of MLP124266 confer its rigidity, as confirmed by our NMR dynamic study, and probably contribute to the high protein stability (Daly et al., 2011). MLP124266 presents a β-sheet structure typical of knottins, but interestingly it also has an additional helix between β2 and β3 strands. To our knowledge, the presence of such a helix in knottins has been reported in cyclotides only, and more precisely in bracelet cyclotides containing six or seven residues in the loop between Cys(III) and Cys(IV) (Chen et al, 2005). This loop often contains an alanine, which favours the formation of the helix as well as a highly conserved glycine allowing its connection to the cystine knot (Rosengren et al., 2003). Interestingly, the loop in MLP124266 has such residues, i.e. Ala 52 and Gly 54 , but consists of four residues only. In Viola odorata cycloviolacin O2 (cO2), the additional helix is located in a hydrophobic loop that interacts with the membrane-mimicking micelles (Wang et al., 2009). Therefore, it might help disrupting membranes and thus contribute to the cytotoxicity activity of cO2 and play a role in plant defence. In MLP124266, the helical turn is not particularly hydrophobic ( Figure S4B) and may not have these properties. To our knowledge, MLP124266 is the first fungal protein to present a knottin-like structure (Postic et al., 2018). It would be interesting to collect structural data from other potential fungal knottins to find out whether the additional helix is always present and to clarify its role.
Several properties of MLP124266 N-terminal region also deserve to be pointed out. This region, approximately extending up to residue 35 and thus representing half of the primary sequence, globally presents a higher flexibility, as demonstrated by the NMR dynamic results. Nevertheless, it is not totally unstructured, as the heteronuclear NOE values remain positive and as a few residues exhibit gamma turn propensity. This results in the presence of short helical turns, especially in the segment 8-systematically found in all NMR models and their limits are very variable, several secondary structure prediction softwares predict a helix between residues 11 and 18. Moreover, the proportion of conserved amino acids is strikingly high in this N-terminal region, much higher than in the folded Cterminal part ( Figure S4). Altogether, these results suggest that the relatively flexible N-terminal region have some propensity to form helical structures, which may support a biochemical role that remains to be elucidated. Interestingly, other effectors possess a predicted disordered N-terminal region (Boutemy et al., 2011). For instance, M. lini effectors AvrL567 and AvrM have predicted disordered Nterminal regions that are susceptible to protease degradation (Catanzariti et al., 2006;Wang et al., 2007). Flexible folds are known to be adaptable linkers that favour the ability to bind partners (Chouard, 2011). As the N-terminus of many cytoplasmic effectors is anticipated to mediate protein entry into host cells (Petre and Kamoun, 2014), it is tempting to speculate that this flexible part may bind a target important for cell entry. MLP124266 belongs to a rust multigene family reported in the Melampsoraceae for which population data has been collected. The survey of synonymous and nonsynonymous nucleotide substitutions in isolates of M. lini, M. larici-populina and in several Melampsora spp. identified positions under positive selection in between strictly conserved cysteines in the C-terminus region of the proteins (Barrett et al., 2009;Hacquard et al., 2012;Persoons et al., 2014;Van der Merwe et al., 2009). In the flax rust fungus M. lini, the homolog of MLP124266 is an avirulence protein that evolved under the pressure of the plant host immune system. The resolution of the MLP124266 (AvrP4 homolog) structure suggests that its compact and structured C-terminal region may be more prone to detection by plant immune receptors.
We solved the structure of MLP124017 by NMR spectroscopy. MLP124017 showed a fold similar to members of the NTF2-like superfamily. The NTF2-like superfamily is a group of protein domains sharing a common fold, but showing no sequence similarity. MLP124017 is structurally similar to two bacterial proteins, despite the lack of sequence similarity. The structures of these two bacterial proteins consist of a β-sheet surrounding a binding pocket and α-helices acting as a lid (Marcos et al., 2017). The NTF2 family regroups catalytic and non-catalytic proteins that contain cone-like structured proteins with a cavity that often act as a molecular container involved in a wide range of cellular functions (Eberhardt et al., 2013). Interestingly, the cone-shaped structure of MLP124017 is widespread across both prokaryotes and eukaryotes. The first proteins of the NTF2 family were reported to play a role in the transport of molecules from the cytoplasm to the nucleus. Arabidopsis NTF2 protein is required to import nuclear proteins via the recognition of a nuclear localization signal (NLS). This protein also plays a role in the nuclear import of the small-GTPase Ran-GDP that is a central protein in various signal transduction pathways (e.g. mitotic spindle formation, nuclear envelope assembly, or responses to biotic stresses; (Avis and Clarke, 1996;Carazo-Salas et al., 2001;Hetzer et al., 2000;Zhang and Clarke, 2000). In bacteria, some NTF2-like proteins play a role in bacterial conjugation as part of the type IV secretion system (Goessweiner-Mohr et al., 2013), whereas non-catalytic NTF2-like domains act as immunity proteins (Zhang et al., 2012). In fungi, Saccharomyces cerevisiae NTF2 mutant is defective for nuclear import (Corbett and Silver, 1996). Although NTF2 folded proteins are widespread across kingdoms, very few is known about their role. A recent study presented that the silencing of NTF2 in wheat decreased the resistance against avirulent isolates of the wheat stripe rust fungus P. striiformis f. sp. Tritici (Zhang et al., 2018). Since MLP124017 has been shown to interact with TOPLESS and TOPLESS-related proteins (Petre et al., 2015), it is tempting to speculate that the cavity formed by the β-sheet could be involved in the association with these plant partners.
Although MLP124266 and MLP124017 show no primary sequence similarity to known proteins, they adopt a three dimensional fold similar to knottins or NTF2 family members. Thus, knowing the structure of both candidate effectors allowed us to classify them as members of large superfamilies of proteins. The concept of structural families whose members share no, or very limited, primary sequence homology emerges in effector biology (Białas et al., 2018). This concept promises to revolution the way we predict and categorize effector proteins (de Guillen et al., 2015). For instance, members of the MAX effector family share a common β-sandwich fold, but show no primary sequence similarity (de Guillen et al., 2015). Similarly, members of the WY superfamily of RXLR effectors in oomycetes share a common three-to four-helix bundle (Boutemy et al., 2011). Such features are now used to search and categorize fungal and oomycete effector proteins into structural superfamilies (de Guillen et al., 2015;Win et al., 2012). To our knowledge, MLP124266 and MLP124017 are the first effector proteins to adopt knottins and NTF2 folds. Whether other effector proteins adopt similar folds remains to be identified to determine if knottins and NTF2 folds define structural superfamilies in fungi.
To conclude, in this study we screened eleven rust candidate effectors using biochemical and structural approaches, and managed to solve the structure for two of them. This demonstrates the usefulness of such approaches for systematically screening candidate effectors and gaining information about their putative biological roles. Although it is too early to speculate about the function of MLP124266 and MLP124017, these structures form a basis for future comparative structural studies and bring complementary information regarding fungal knottins.

Cloning of selected effector encoding sequences
Open reading frames coding for the mature forms (i.e. devoid of the sequence encoding N-terminal secretion peptide) of MLP124266 and MLP124499 of M. larici-populina isolate 98AG31 were ordered as synthetic genes cloned in pBSK(+) vectors (Genecust). Coding sequence of the mature forms of the nine other candidate effectors (Mlp124111, Mlp124478, Mlp124530, Mlp124561, Mlp37347, Mlp109567, Mlp124017, Mlp107772, and Mlp124202) were amplified by polymerase chain reaction (PCR) using cDNAs from leaves of the poplar hybrid Beaupré infected by M. larici-populina (isolate 98AG31) and further cloned into pICSL01005 vector as described previously (Petre et al., 2015). The sequences encoding the mature form of each effector were subsequently cloned by PCR in either pET26b or pET28a vector between NdeI and NotI or NcoI and NotI restriction sites, respectively, using primers shown in Table S1.

Expression and purification of recombinant proteins in Escherichia coli
Expression of recombinants proteins was performed at 37°C using the E. coli SoluBL21 (DE3) pRARE2 (Amsbio Abington, UK), Rosetta2 (DE3) pLysS, Origami2 (DE3) pLysS or RosettaGami2 (DE3) pLysS strains (Novagen) containing the adequate pET expression vector coding for the selected candidate effector (Table S1) in LB medium supplemented with kanamycin (50 μg/ml) and chloramphenicol (34 μg/ml). When the cell culture reached an OD600nm of 0.7, protein expression was induced with 0.1 mM isopropyl-β-D-1-thiogalactopyranoside (IPTG) and cells were grown for a further 4 h. To improve the solubility of some recombinant candidate effectors, other protocols were used as follows. First, we added 0.5% (v/v) of ethanol in the medium when culture reached an OD600nm of 0.7. The cells were cooled to 4°C for 3 h, recombinant protein expression induced with 0.1 mM IPTG and cells further grown for 18 h at 20°C. We also tested a combination of an osmotic and a thermal shock (Oganesyan et al., 2007). When the culture reached an OD600nm of 0.5-0.6, 500 mM NaCl and 2 mM of betaine were added to the culture medium and the culture incubated at 47°C for 1 hour under stirring. Cells were then cooled to 20°C and the expression of recombinant proteins induced with 0.1 mM IPTG. After induction, cells were harvested by centrifugation, suspended in a 30 mM Tris-HCl pH 8.0 and 200 mM NaCl lysis buffer, and stored at -20°C. Cell lysis was completed by sonication (three times for 1 min with intervals of 1 min). The cell extract was then centrifuged at 35 000 g for 25 min. at 4 °C to remove cellular debris and aggregated proteins. C-terminal His-tagged recombinant proteins were then purified by gravity-flow chromatography on a nickel nitrilotriacetate (Ni-NTA) agarose resin (Qiagen) according to the manufacturer's recommendations followed by an exclusion chromatography on a Superdex75 column connected to an ÄKTA Purifier TM (GE Healthcare). The fractions containing recombinant MLP124017 were pooled, concentrated, and stored at -20°C as such, whereas for MLP124266 and MLP124499 fractions were pooled, dialyzed against 30 mM Tris-HCl, 1 mM EDTA (TE) pH 8.0 buffer, and stored at 4°C until further use. For the NMR spectroscopy analyses, MLP124266-(His6) and MLP124499-(His6) recombinant proteins were 15 N-labeled in M9 minimal synthetic medium containing 15 NH4Cl (1 g/L). MLP124017 was single 15 N or double 15 N and 13 C labelled in M9 minimal medium containing 1 g/l NH4Cl ( 15 N) and 2 g/l glucose ( 13 C) supplemented with 2,5% thiamine (m/v), 1mg/ml biotin, 50 mM FeCl3, 10 mM MnCl2, 10 mM ZnSO4, 2 mM CoCl2, 2 mM NiCl2, 2 mM NaSeO3, and 2 mM H3BO3. After purification as described above, labelled MLP124266, MLP124499, and MLP124017 were dialyzed against a 50 mM phosphate pH 6.0 or a 20 mM phosphate pH 6.8 buffer supplemented with 200 mM NaCl. The homogeneity of purified proteins was checked by SDS-PAGE and protein concentration determined by measuring the absorbance at 280 nm and using theoretical molar absorption coefficients of 500 M −1 .cm −1 , 3 105 M −1 .cm −1 , 29 450 M −1 .cm −1 deduced from the primary sequences of mature MLP124266, MLP124499, and MLP124017 proteins respectively. For MLP124266, protein concentration was also verified using a colorimetric assay (BC assay, Interchim).

Protein sample preparation for NMR spectroscopy
Uniformly labelled 15 N MLP124017 (1 mM in 20 mM phosphate pH 6.8, 200 mM NaCl) was supplemented with 0.02% sodium azide and 10% of 5 mM 4,4-dimethyl-4-silapentane-1-sulfonic acid (DSS) in D2O as a lock/reference. For the D2O experiments, the sample was lyophilized and dissolved in 200 µL D2O. For the 3D heteronuclear experiments, a 13 C/ 15 N labelled sample at 0.6 mM was prepared with 200 µL in the same phosphate buffer previously used supplemented with 20 µL of D2O at 5 mM DSS as a reference. MLP124266 and MLP124499 samples were dissolved in 50 mM phosphate pH 6.0 buffer with 10% D2O and 0.02% sodium azide. The concentration of unlabelled MLP124266 and MLP124499 was 0.85 and 0.65 mM, and for uniformly 15 N labelled MLP124266 and MLP124499 was 1.8 and 0.135 mM, respectively.

Structure calculation
NOE peaks identified in 3D 15 N-NOESY-HSQC and 2D NOESY experiments were automatically assigned during structure calculations performed by the program CYANA 2.1 (Güntert, 2004). The 15 N, HN, 13 C', 13 Cα, Hα, and 13 Cβ chemical shifts were converted into φ/Ψ dihedral angle constraints using TALOS+ (v. 1.2) (Shen et al., 2009). Hydrogen bonds constraints were determined according to 1 H/ 2 H exchange experiments of backbone amide protons (HN). Each hydrogen bond was forced using following constraints: 1.8-2.0 Å for HN,O distance and 2.7-3.0 Å for NH,O distance. Final structure calculations were performed with CYANA (v. 2.1) using all distance and angle restraints (Table S3). 600 structures were calculated with CYANA 2.1, of which the 20 conformers with the lowest target function were refined by CNS (v. 1.2) using the refinement in water of RECOORD (Nederveen et al., 2005) and validated using PROCHECK (Laskowski et al., 1993). For MLP124266 structure calculation, 783 NOE peaks identified in 3D 15 N-NOESY-HSQC or 2D NOESY spectra, 75 φ/Ψ dihedral angles generated by DANGLE (Cheung et al, 2010) and 10 hydrogen bonds were input as unambiguous restraints in ARIA2. Covalent disulfide bonds between Cys39-Cys55, Cys44-Cys58 and Cys50-Cys64 were also introduced. Among the 400 structures generated by ARIA2, 20 models of lowest energy were refined in water (Table 2). NMR assignment and structure coordinates have been deposited in the Biological Magnetic Resonance Data Bank (BMRB code 34298 and 34423) and in the RCSB Protein Data Bank (PDB code 6H0I and 6SGO), respectively.

ACKNOWLEDGMENTS
The UMR1136 is supported by a grant overseen by the French National Research Agency (ANR) as part of the "Investissements d'Avenir" program (ANR-11-LABX-0002-01, Lab of Excellence ARBRE). This work was supported by the French Infrastructure for Integrated Structural Biology (ANR-10-INBS-0005).

DATRA AVAILABILITY STATEMENT
The data that support the findings of this study are openly available in the Biological Magnetic Resonance Data Bank (http://www.bmrb.wisc.edu/) and in the RCSB Protein Data Bank (http://www.rcsb.org/).

COMPETING INTEREST
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.  indicates that expression of the gene has been detected in planta (Duplessis et al., 2011;Joly et al., 2010), red '+' indicates that the RT-qPCR expression profile of the gene has been established (Hacquard et al., 2012). The black arrows indicate the predicted cleavage site of the signal peptide.

AUTHOR'S CONTRIBUTION
Cysteine residues are in red, basic residues (lysine/K and arginine/R) are in blue, acid residues (aspartic acid/D and glutamic acid/E) are in green. Blue and green lines show basic and acid stretches of amino acids and the red line delimited the predicted cystine-knot motif. An amino acid conservation of 50% or more is marked by a colon. An amino acid conservation of 90% or more is marked by a star.   Cross peak assignments are indicated using the one-letter amino acid and number (the asterisk indicates a folded peak). The central part of the spectrum is expanded in the insert. Missing or non-assigned residues: 1-15, 40, 82, 86-95, 98, 102, 111, 128, 150, 151.