A conserved sequence in the small intracellular loop of tetraspanins forms an M-shaped inter-helix turn

Tetraspanins are a family of small proteins with four transmembrane segments (TMSs) playing multiple roles in human physiology. Nevertheless, we know little about the factors determining their structure. In the study at hand, we focus on the small intracellular loop (SIL) between TMS2 and TMS3. There we have identified a conserved five amino acid core region with three charged residues forming an M-shaped backbone, which we call the M-motif. The M's plane runs parallel to the membrane surface and the central amino acid constitutes the inter-helix turning point. At the second position of the M-motif, we identified in tetraspanin crystal structures a glutamate oriented towards a lysine in the juxtamembrane region of TMS1. Using Tspan17 as an example, we find that mutating either the glutamate or the juxtamembrane lysine, but not swapping glutamate and lysine, reduces expression level, maturation and ER exit. We conclude that the SIL is more than a short linking segment and propose that it is involved in shaping the tertiary structure of tetraspanins.

Nikolas Reppert * & Thorsten Lang *
Tetraspanins comprise a family of small membrane proteins expressed in all multicellular organisms. The human genome encodes 33 family members 1 . They are involved in many cellular processes of either physiological or pathological nature, including adhesion, cell-cell fusion, endocytosis, exosome formation, immune response, migration, neurite navigation, pericellular proteolysis, proliferation, signalling, spreading, trafficking, vascular morphogenesis and remodelling, thrombosis, tumor progression and metastasis, viral and other pathogen entry, and viral release 1 . The basis for this broad range of functions is their capability to form so-called tetraspanin-enriched microdomains (TEMs) 2 , sometimes also referred to as tetraspanin web 3 . The underlying mechanisms include weaker secondary interactions among themselves and stronger primary interactions 4 with a variety of non-tetraspanins, for instance integrins, members of the immunoglobulin superfamily, and signalling receptors 3,5,6 .
Until recently, studies have concentrated on members that locate to the plasma membrane, a characteristic that has led to their nickname 'master organizers of the plasma membrane' 7 . Lately, however, their intracellular roles have become increasingly apparent. In particular, their function in extracellular vesicle formation and targeting 8 is shifting into focus.
Tetraspanins are membrane anchored via a bundle of four transmembrane segments (TMSs). They share a conserved topology (Fig. 1A) comprised of a small and a large extracellular loop (LEL) connecting extracellularly the first and last TMS pairs, respectively. The LEL, which is often glycosylated, contains up to five helical segments (A-E) and up to four disulphide bridges 9 . With few exceptions, the intracellular N- and C-termini, as well as the small intracellular loop (SIL) connecting TMS2 and TMS3, are short segments 1,10 .
For their tertiary structure, two different models are envisioned. On the one hand, using the crystal structure of the LEL of CD81 as starting point, an early model predicts a tight four transmembrane helix bundle (similar to Fig. 1B, left). The alpha-helical structure of TMS3 and TMS4 protrudes and merges with the alpha-helical segments A and E of the LEL, respectively. As a result, the bulkier extracellular domain sits enthroned on top of the bundle 13 ; the entire structure resembles a mushroom. In cryo-EM, a closely related open conformation is found for CD81 (Fig. 1B, left) and CD9 in complex with their primary binding partners CD19 14 and EWI-F 15 , respectively. On the other hand, all known crystal structures of complete tetraspanins (CD81, CD9 and CD53) 12,16,17 reveal a funnel-shaped arrangement of the TMSs opening towards the extracellular side (Fig. 1B, right), with a cholesterol bound inside the cavity of CD81 12 . It is important to note that here the alpha-helical connections between TMS3/4 and the LEL helices are disrupted by a kink, causing the LEL to fold back onto the membrane, thereby closing the cholesterol cavity. Upon removal of the cavity-bound cholesterol in molecular dynamics simulation, the connecting segments become more alpha-helical and the LEL unbends, similar to the early

Alignment of amino acid sequences and obtaining the consensus sequences. All alignments were done with BioEdit 28 v7.0.5 (http://www.mbio.ncsu.edu/BioEdit/bioedit.html). The tetraspanins' SIL sequences were aligned at the most N-terminal glutamate or aspartate; if there was none, the sequences were aligned at the most N-terminal lysine or arginine. Sequences containing none of these residues (glutamate, aspartate, lysine or arginine) were aligned according to their overall similarity to the other sequences. The tetraspanins' N-terminal sequences, together with the first five amino acids of TMS1, were aligned at their most C-terminal lysine or arginine.
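The anchoring rule for the SIL alignment can be sketched in a few lines of Python. This is a minimal illustration and not the BioEdit workflow used in the study: the function names, the left-padding with gap characters, and the reading that glutamate and aspartate are treated equally (whichever is most N-terminal) are assumptions. KENKC and QESQC are the core sequences reported for the CD53 and CD9 crystal structures; the flanking residues in the toy input are hypothetical.

```python
def position2_index(sil):
    """Return the 0-based index of core 'position 2' in a SIL sequence.

    Assumed reading of the rule: use the most N-terminal E or D; if there
    is none, take the residue following the most N-terminal K or R
    (which then occupies 'position 1'). Return None if neither exists.
    """
    acidic = [i for i, aa in enumerate(sil) if aa in "ED"]
    if acidic:
        return acidic[0]
    basic = [i for i, aa in enumerate(sil) if aa in "KR"]
    if basic:
        return basic[0] + 1
    return None  # would have to be placed by overall similarity


def align_on_position2(sils):
    """Left-pad sequences with '-' so that all 'position 2' anchors line up."""
    anchors = {name: position2_index(seq) for name, seq in sils.items()}
    widest = max((a for a in anchors.values() if a is not None), default=0)
    aligned = {}
    for name, seq in sils.items():
        a = anchors[name]
        aligned[name] = None if a is None else "-" * (widest - a) + seq
    return aligned


# Toy input: the flanking residues "LL" and the sequence "GGG" are hypothetical.
toy = {"core_CD53": "KENKC", "flanked_CD9_core": "LLQESQC", "no_anchor": "GGG"}
for name, row in align_on_position2(toy).items():
    print(name, row)
```

Sequences for which the function returns None correspond to the cases that were aligned by overall similarity and would need manual placement.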
The consensus sequences of SIL and N-terminus were obtained by counting the frequency of each amino acid at the given position. An amino acid was counted as consensus if its frequency (f) was equal to or higher than the mean frequency (f_mean) plus its standard deviation (f ≥ f_mean + SD).
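As a hedged illustration of the consensus criterion, the following Python sketch applies the threshold to a single alignment column. Several details are assumptions, because the text does not specify them: the mean and SD are taken over the amino acids actually observed in the column, the population standard deviation is used, and gap characters are ignored. The function name and the toy columns are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def column_consensus(column):
    """Return the residues of one alignment column that meet the threshold.

    A residue counts as consensus if its frequency f is at least the mean
    frequency plus one standard deviation. Assumptions: gaps ('-') are
    ignored, the mean and SD run over observed residues only, and the
    population SD is used.
    """
    counts = Counter(aa for aa in column if aa != "-")
    if not counts:
        return []
    total = sum(counts.values())
    freqs = {aa: n / total for aa, n in counts.items()}
    threshold = mean(freqs.values()) + pstdev(freqs.values())
    hits = [aa for aa, f in freqs.items() if f >= threshold]
    # List multiple consensus residues from most to least abundant.
    return sorted(hits, key=lambda aa: -freqs[aa])


# Toy column (hypothetical data): six E, one D, one Q.
print(column_consensus(list("EEEEEEDQ")))
```

With a different convention (for example sample SD, or averaging over all 20 amino acids) the threshold shifts, so borderline residues may or may not be reported.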
For the claudin SIL sequence alignment we used the ClustalW multiple alignment tool of BioEdit.

Results
Definition of the SIL core sequence. In human tetraspanins, the shortest and longest SIL sequences comprise six and 21 amino acids, respectively 11 . Comparing these sequences, we frequently find a positively charged amino acid directly followed by a negatively charged one, to which we assigned 'position 1' and 'position 2' of the core sequence (Fig. 2A). We started the alignment of the 33 tetraspanins with position 2, assigning to it the most upstream glutamate (in 23 SILs) or aspartate (2 SILs). In eight SILs there is neither a glutamate nor an aspartate present, which is why we used position 1 for further alignment, finding altogether 24 matches (13 arginines and 11 lysines). Finally, we searched for frequently occurring amino acids in positions 3-5 (Fig. 2A; for details see figure legend). As a result, we identified a [R/K] E [N/S] [R/K/Q] C core sequence (Fig. 2B). With the exception of glutamine at position 4, which is a polar amino acid, the chemical signature of the sequence is positive charge-negative charge-polar-positive charge-polar.
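The core sequence [R/K] E [N/S] [R/K/Q] C translates directly into a regular expression, which allows a quick check of whether a given SIL matches the strict consensus. The sketch below is illustrative only; the function names and the residue classes used for the chemical signature (in particular which residues count as "polar") are assumptions. KENKC and QESQC are the core sequences of the CD53 and CD9 crystal structures.

```python
import re

# Strict consensus: [R/K] E [N/S] [R/K/Q] C
CORE = re.compile(r"[RK]E[NS][RKQ]C")

# Assumed residue classes for the chemical signature.
CLASSES = {"+": "KR", "-": "DE", "p": "NSQCT"}


def find_core(sil):
    """Return (start, match) of the first strict-consensus hit, or None."""
    m = CORE.search(sil)
    return (m.start(), m.group()) if m else None


def signature(seq):
    """Map each residue to '+', '-', 'p' (polar) or 'x' (anything else)."""
    out = []
    for aa in seq:
        out.append(next((k for k, v in CLASSES.items() if aa in v), "x"))
    return "".join(out)


for core in ("KENKC", "QESQC"):
    print(core, find_core(core), signature(core))
```

In this sketch KENKC matches the strict pattern, whereas QESQC does not, because its first and fourth positions carry glutamine instead of a positive charge. A strict pattern is therefore too narrow for finding every SIL core; frequency-based consensus columns are more forgiving.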
To compare the human SIL with other species, we determined the SIL consensus sequences, as was done for human, from Uniprot database sequences in mouse (31 family members in Mus musculus 11 ), zebrafish (50 in Danio rerio 35 , but only 39 available in the Uniprot database), fruit fly (34 in Drosophila melanogaster 36 ) and arabidopsis (17 in Arabidopsis thaliana 37 ) (Fig. 2C-F).
Hence, with the exception of plant tetraspanins (Fig. 2F), the degree of conservation between the SILs of all animal species is high (see also Fig. 2G). The question arose whether the same SIL core region is present in similarly structured proteins. Members of the claudin family have four TMSs and two extracellular loops 38 as well. However, they exhibit a longer and differently structured SIL between TMS2 and TMS3, harbouring a short beta-strand (Fig. S2).
Next, the secondary structure of the SIL core sequences was predicted employing Jpred4. In animal SILs, alpha-helicity gradually decreases from the edges to the central position (Fig. S3) [...] with a YXXΦ internalization motif outside the core region, pointing towards possibly additional specialized roles of these two SILs.
[Figure legend fragment] Part of the consensus sequence are amino acids occurring with a frequency f equal to or higher than the mean frequency plus one standard deviation of the frequencies (f ≥ f_mean + SD). In cases of more than one, they are listed in order of abundance from high to low. The alignment was created using BioEdit 28 v7.0.5 (http://www.mbio.ncsu.edu/BioEdit/bioedit.html) and the illustration of residue frequency was done using Weblogo 33
In the crystal structures of CD9 and CD53, albeit different on the level of amino acids (QESQC versus KENKC; note that for crystallographic reasons in the CD53 structure cysteine is exchanged by a serine), the SIL core structures are essentially identical. The bonds defining the SIL backbone are coplanar, forming an "M", while the two bottom M-endings mark the transition to the helical structure (Fig. 3A). The non-helical central polar residue (position 3) not only marks the turning point of the protein backbone, but also interacts with the backbone and residues of TMS3 (Fig. 3B). The glutamate (position 2) and glutamine/lysine (position 4) residues are oriented towards TMS1 and TMS4, respectively, whereas the amino acid residues at positions 1 and 5 are not oriented towards any of the TMSs. With the exception of position 4, all residues are roughly lying in the "M" plane (Fig. 3A, bottom) that runs parallel to the membrane surface. In CD9, the glutamate at position 2 forms a salt-bridge with a lysine (K11) of the N-terminus (Fig. S4A), and in CD53, the lysine at position 4 interacts with an asparagine (N207) within the C-terminus (Fig. S4B).
This raised the question whether the M-motif is a structural element for which the SIL consensus sequence is a prerequisite. For verification, we screened the PDB database for other α-helix rich proteins exhibiting M-motifs. We readily found more examples (Fig. 4A; the detailed structure of a selection is shown in Fig. S5), which, however, have sequences very different from the SIL consensus sequence (Fig. 4B). The only overlap is the asparagine at position 3, which is involved in interactions between its side chain carbonyl group and the backbone of the C-terminally attached helix in tetraspanins and non-tetraspanins. On the other hand, when measuring the φ and ψ angles in the M-motif, there was clear position-dependent segregation into φ/ψ angle categories. Amino acids in positions 1, 4 and 5 adopt angles typical for alpha helices, in position 2 we find angles associated with a left-handed helix, and for position 3 the angles are typical of a beta-strand (Fig. 4C) 39 . These specific dihedral angles are the basis of the M-motif's shape (Fig. 4D).
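The position-dependent φ/ψ pattern described above lends itself to a simple screening rule. The following Python sketch is a rough illustration under stated assumptions: the rectangular Ramachandran regions are crude, hand-picked boundaries and not the categories used in the study, and the angle values in the example are idealized textbook values, not measurements from any structure. All names are hypothetical.

```python
def classify(phi, psi):
    """Assign a (phi, psi) pair in degrees to a coarse Ramachandran region.

    The rectangular boundaries are rough assumptions for illustration.
    """
    if 20 <= phi <= 120 and -30 <= psi <= 100:
        return "left-handed helix"
    if -160 <= phi <= -20 and -120 <= psi <= 50:
        return "alpha helix"
    if -180 <= phi <= -45 and (psi >= 90 or psi <= -150):
        return "beta strand"
    return "other"


# Expected categories for positions 1-5 of the M-motif as described in the text.
M_PATTERN = (
    "alpha helix",
    "left-handed helix",
    "beta strand",
    "alpha helix",
    "alpha helix",
)


def looks_like_m_motif(angles):
    """Check whether five consecutive (phi, psi) pairs follow the M pattern."""
    return tuple(classify(phi, psi) for phi, psi in angles) == M_PATTERN


# Idealized angles (assumed, not measured): helix, left-handed, strand, helix, helix.
ideal = [(-60, -45), (60, 45), (-120, 130), (-60, -45), (-60, -45)]
print(looks_like_m_motif(ideal))
```

A sliding window of five residues over the dihedral angles of a structure would be one way to apply such a rule when screening for candidate M-motifs; any hits would still need visual inspection.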
Crosstalk between the SIL and the N terminus. As outlined above, the SIL could be simply linking TMS2 and TMS3. On the other hand, it might influence the tertiary structure. Of particular significance for stabilizing the protein could be a salt bridge between the SIL glutamate and the N-terminal lysine, such as seen in the CD9 crystal structure (Fig. S4A). We had some preliminary data pointing towards the functional importance of the SIL in Tspan17. As no crystal structure is available for Tspan17, we used another type of analysis to predict whether crosstalk between the SIL glutamate and the N-terminal lysine is possible. The N terminus consists of 19 amino acids with two lysines at positions 4 and 18. An analysis of the TMS1 N-terminal sequence shows that the positive charge, mostly provided by a lysine, is highly conserved across all animal species (Fig. S6). Based on this conserved position in the alpha helix, relative to a previously described conserved asparagine 12,13 , the lysine of Tspan17 and most other tetraspanins is likely oriented towards the middle of the four helix bundle, and consequently towards the SIL (Fig. S7). This suggests that some crosstalk between the SIL and the N-terminus is possible.
[Figure legend fragment] [...] Fig. S6); in CD53 similar arrangement but distance is too long for a salt-bridge (for details see Fig. S4). Position 3: interactions between the carbonyl-group of an asparagine with the backbone of TMS3 in CD53 (Fig. S4). Position 4: interaction between the lysine side chain and the backbone of TMS4 in CD53 (Fig. S4B). In the stick representation of the side chains, red, blue and yellow indicate oxygen, nitrogen and sulphur, respectively. ⊕/⊖ indicate amino acid residue charges at physiological pH. The scheme was created using CorelDRAW 2019 (www.corel.com) and the crystal structure images were created using PyMOL 2.5 (https://pymol.org/2/).
To test experimentally the hypothesis that the SIL interacts with the N-terminus, we analysed the expression levels of Tspan17 after mutating single SIL core amino acids to alanine (Fig. S8), including CD9 and CD53 for comparison. With the exception of CD53, we observed the strongest reduction of expression after mutation of the SIL glutamate at position 2.
If the SIL glutamate interacts electrostatically with the N-terminal lysine, exchanging in Tspan17 the lysine to alanine should affect expression just like the glutamate mutation. Because reduced expression levels can have many explanations, in the following we included as well the analysis of glycosylation by western blot and ER-exit by microscopy. In the latter assay, retention in the ER is revealed by an increase in overlap between Tspan17 and an ER-marker. As shown in Fig. 5, mutation of either the SIL glutamate or the N-terminal lysine reduces expression and glycosylation and causes ER retention. Next, we exchanged the positions of glutamate and lysine, which may neutralize the effect of the single mutations as the putative electrostatic interaction may work as well with exchanged positions of the charges. As shown in Fig. 5, all effects are back to normal in the double mutant.
The same mutations in other tetraspanins, e.g. CD9, yield the same expression pattern as for Tspan17 (Fig. S11A) but no effect on ER-exit (Fig. S12). Because in our assay we detect no change in the CD9 band pattern after tunicamycin treatment (Fig. S9), we did not employ maturation analysis of CD9 via probing its glycosylation status. Instead, we examined whether the interaction with the primary binding partner EWI-2 is affected. As shown in Fig. S13, single mutations precipitated more EWI-2 than the wild-type or the double mutant. Finally, in CD53, single mutations had no effect on expression and maturation (Fig. S11B), although the double mutation drastically diminished expression. Altogether, the data across different tetraspanins is not consistent, in particular not between CD9 and CD53.
[Figure legend fragment] The 180° turn of the direction of the protein backbone involves three structural elements. The turn is initiated by the left-handed character adopted by position 2. Between position 2 and 4 the gap is spanned by the beta-strand character of position 3. Finally, at position 4 already alpha-helicity in the opposite direction is realized. Please note that arrows shown at position 2 and 4 look identical but indicate different angle combinations, which is due to simplification in structure presentation. The alignment was created using BioEdit 28 v7.0.5 (http://www.mbio.ncsu.edu/BioEdit/bioedit.html) and the illustration of residue frequency was done using Weblogo 33 3.7.4 (https://weblogo.berkeley.edu/logo.cgi). The presentation of the dihedral angles was done using GraphPad Prism version 6.04 for Windows (www.graphpad.com) and PyMOL 2.5 (https://pymol.org/2/).
[Figure legend fragment] [...] The actin normalized GFP signal shows a drop in expression level upon glutamate or lysine mutation (wild type values are set to 100%). The mutation with interchanged charged residues restores the expression level. The drop in Tspan17 expression and maturation is detectable shortly after expression starts (Fig. S10). (C) The maturation was calculated as ratio between mature to total protein and was compared to the wild type (set to 100%). The glutamate-/lysine-mutation leads to a drop in protein maturation, whereas the double mutation has no effect. (D) The co-localization of the mEGFP-Tspan17 constructs with the ER were analyzed by confocal microscopy, comparing the distribution of the GFP signal to an ER marker fused to RFP. For quantification, the Pearson correlation coefficient between the two channels was calculated. Exemplary images of Tspan17 wt and the mutants are shown. (E) The Pearson correlation coefficient showed an increase of ER co-localization for the glutamate-/lysine-mutants and no effect of the double mutation. Values are given as means ± SD (n = 4 for Western blot analysis and n = 3 for microscopy; for each biological replicate 10 cells were imaged). The statistical analysis was done employing a repeated measures ANOVA comparing each mutation with the wild type. The full blots are shown in the supplementary data (Fig. S20). The data analysis and illustration was performed using Fiji-ImageJ 34 (https://imagej.net/) and GraphPad Prism version 6.04 for Windows (www.graphpad.com), respectively.
Scientific Reports | (2022) 12:4494 | https://doi.org/10.1038/s41598-022-07243-y www.nature.com/scientificreports/
Trying to understand better the different roles of the glutamates in CD9 and CD53, we took a closer look at the crystal structures. The short N-termini (12 residues) of CD9 and CD53 contain three and two positively charged amino acids, respectively. In CD9, as already mentioned above, the SIL core E84 forms a salt bridge (distance 2.8 Å) with the second last amino acid of the N-terminal peptide (K11) (for illustration see Fig. S4A). In CD53, although the SIL core E77 is oriented towards K7 and K10 of the N terminus, the distances are about one angstrom too long to establish a salt bridge (Fig. S4B; K7-E77, 5.1 Å and K10-E77, 5.2 Å instead of 4 Å required to form a salt bridge 40 ). This suggests a weaker or no electrostatic interaction between the CD53 SIL and the N-terminus and could explain the lacking effect in the CD53 mutants.
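The distance argument for the CD9 and CD53 glutamate-lysine pairs can be restated as a small Python sketch. The distances and the 4 Å cutoff are the values quoted in the text; the function names and the dictionary layout are hypothetical, and the coordinate-based helper is only a generic Euclidean distance that assumes the atom positions have already been extracted from a structure file.

```python
import math

SALT_BRIDGE_CUTOFF = 4.0  # Å, cutoff quoted in the text

# Glutamate-lysine distances reported in the text (Å).
reported = {
    ("CD9", "E84", "K11"): 2.8,
    ("CD53", "E77", "K7"): 5.1,
    ("CD53", "E77", "K10"): 5.2,
}


def atom_distance(a, b):
    """Euclidean distance between two (x, y, z) coordinates in Å."""
    return math.dist(a, b)


def is_salt_bridge(distance, cutoff=SALT_BRIDGE_CUTOFF):
    """Return True if the charged groups are within the cutoff distance."""
    return distance <= cutoff


for (protein, glu, lys), d in reported.items():
    excess = max(0.0, d - SALT_BRIDGE_CUTOFF)
    print(f"{protein} {glu}-{lys}: {d:.1f} Å, salt bridge: {is_salt_bridge(d)}, "
          f"excess over cutoff: {excess:.1f} Å")
```

A fixed cutoff is a simplification: it says nothing about side chain flexibility, so the CD53 pairs may still approach each other transiently in the membrane environment.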

Discussion
The SIL core sequence. The tetraspanin sequence analysis of the SIL between TMS2 and TMS3 reveals a conserved [R/K] E [N/S] [R/K/Q] C core sequence in human, similar to mouse and zebrafish. In fruit fly, the positive charge in position 4 is lacking, and in arabidopsis the sequence is very different. In proteins with the same topology, as in the family of claudins, the SIL is longer (Fig. S2), has a longer unstructured stretch with a predicted beta-strand, is more diverse, and exhibits no similarity to the tetraspanin core sequence. This may point to a specific function of the SIL in mammalian tetraspanins.
It is known that positively charged residues close to the cytosolic side of a TMS are beneficial for its membrane insertion, whereas negatively charged or polar residues decrease the TMS insertion 41 , known as the "positive-inside rule" and the "negative inside depletion/outside enrichment rule" 42 . Therefore, the presence of positively charged amino acids in the SIL is not surprising, as it aids the correct insertion of the nascent protein into the ER membrane 43 . The negatively charged glutamate neutralizes one positive charge. Apparently, this is not relevant for expression, as CD9 and Tspan17 mutants with swapped glutamate/lysine express equally well as wild-type (Fig. 5, Fig. S11A).
Apart from that, tetraspanins are known to be palmitoylated at several intracellular cysteine residues, among them cysteines in the SIL of CD9 and CD81 44 , which explains the abundance of cysteines at the end of the SIL core region (Fig. 2G).
The SIL forms an M-motif. In animal tetraspanins, the three central residues of the SIL core region are predicted to be less helical, which is consistent with the crystal structures of CD53 and CD9. Please note that a crystal structure of CD81 is also available, but could not be used for detailed SIL analysis as the 2nd and 3rd amino acids of the SIL core are unresolved.
In all animal tetraspanins, the non-helical part of the SIL is on average 2.1 amino acids in length (see also Fig. S3), which is close to the shortest possible linker between two TMSs, that is two amino acids 45 . Roughly speaking, the five core residues form a U-turn with helical arms continued by the TMS helices.
The amino acids at positions 2 and 4 constitute the upper two tips of the M-motif and their residues point towards TMS1 and TMS4, respectively. Positions 1, 3 and 5 define the lower three tips of the M-motif, all pointing away from the centre of the TMS-bundle. The M-motif shape is not exclusive to tetraspanins but found in many other soluble and membrane proteins (Fig. 4, Fig. S5), although the amino acid sequence is different from the SIL core sequence, with the exception of the central asparagine.
The left-handed character of position 2 and the beta-strand character of position 3 define the starting point and the bridge of the U-turn (Fig. 4D). Moreover, they are involved in stabilizing interactions as shown by frequent examples for the residue at position 3 that interacts with the backbone of the C-terminal helix (e.g. CD9, PDB: 6K4J and POT family transporter, PDB: 6HZP) or forms a salt-bridge with position 1 (e.g. adenosine A2A receptor, PDB: 7ARO or Smoothened, PDB: 5L7I). Additionally, the residue of position 2 can form salt-bridges with adjacent structures (e.g. CD9, PDB: 6K4J or voltage-gated calcium channel Cav1.1, PDB: 5GJV). In conclusion, the M-motif is less defined by a specific amino acid sequence (Fig. 4B) but rather by its secondary structure and interactions.
There are two known groups of short loops/turns connecting secondary structure elements, neither of which is defined by a characteristic secondary structure. The first is classified by its length and by the fact that the loop residues are not incorporated into the hydrogen bonding of the neighbouring secondary structure elements 46 . The other group is defined by a side chain (typically Asp, Asn, Ser or Thr) that interacts with the backbone but only moderately changes the backbone orientation and does not result in a pair of antiparallel helices 47 . The residues of the M-motif all form backbone hydrogen bonds with the neighbouring alpha-helices, which excludes the M-motif from the first group of turns. Frequently, there is an asparagine/serine in the M-motif that interacts with the C-terminal helix backbone, but in the M-motif a complete turn is formed, which sets it apart from the second group. Therefore, the M-motif does not strictly fit into either of the two known groups and defines its own category of inter-helix turns.
Role of the SIL glutamate in Tspan17. For Tspan17, we find that mutating the SIL glutamate or the TMS1 N-terminal lysine reduces glycosylation and ER-exit (Fig. 5). In addition, the expression level is reduced, which could be a secondary effect of disturbed trafficking through the ER. Altogether, the three assays yield a consistent picture.
Importantly, glycosylation, ER-exit and expression are back to normal levels when the SIL glutamate and N-terminal lysine are swapped (Fig. 5). This points towards a functionally important interaction between the SIL glutamate and the TMS1 N-terminal lysine. Because the positions of the two oppositely charged amino acids can be interchanged, we speculate that the interaction is most likely of electrostatic nature.
In other tetraspanins, the SIL glutamate seems to be of relevance as well, although the overall picture is unclear. For instance, in CD9, glutamate/lysine mutations have no effect on ER exit (Fig. S12), but increase CD9 association with EWI-2 (Fig. S13). This is very interesting as it implies two things. First, without the salt-bridge, CD9 still adopts a functional conformation, or in other words, lack of the salt-bridge does not lead to complete misfolding. Second, its higher affinity to EWI-2 may be explained by a switch towards an open conformation, as shown for CD81, which in the open conformation interacts more strongly with its primary binding partner CD19 12 . In cryo-EM, CD9 interacts with EWI-2 not in a completely but in a partially open conformation 16 . The four helices still arrange in a funnel shape but the LEL is folded more upright. In complex with EWI-F, which is an EWI-2 homolog, cryo-EM reveals a CD9 conformation resembling the open conformation shown in Fig. 1B (see also reference 15 ). Hence, elimination of the salt-bridge could trigger partial CD9 opening and enhance binding to EWI-2. Moreover, from the 33 human tetraspanins, we have performed mutational analysis of eight family members. In three cases each, mutation of the SIL glutamate either significantly decreases or increases expression (Figs. S8, S14).
Altogether, the picture is neither consistent nor complete and we are just at the beginning of understanding the mechanism by which the SIL modulates tetraspanin structure. In fact, we do not find it surprising that equivalent mutations produce different effects in different tetraspanins, as they have different binding partners and functions.

Conclusion
In this study, we show that the SIL of tetraspanins contains a conserved five amino acid core sequence forming a structural motif that resembles the letter M. Using Tspan17 as an example, we find that mutation of the SIL glutamate or the N-terminal lysine adjacent to TMS1 reduces glycosylation, ER-exit, and expression. All effects are back to normal levels upon position swapping of the two oppositely charged amino acids. We speculate that glutamate and lysine interact electrostatically, which might impact the tertiary structure and as a result modulate the interaction network of Tspan17.

Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.