Classes of non-conventional tetraspanins defined by alternative splicing

Tetraspanins are emerging as a family of membrane proteins that mediate an exceptionally broad diversity of functions. The name refers to their four transmembrane segments, which define the tetraspanins' typical membrane topology. In this study, we analyzed alternative splicing of tetraspanins. Besides isoforms with four transmembrane segments, most mRNA sequences code for isoforms with one, two or three transmembrane segments, representing structurally mono-, di- and trispanins. Moreover, alternative splicing may alter transmembrane topology, delete parts of the large extracellular loop, or generate alternative N- or C-termini. As a result, we define structure-based classes of non-conventional tetraspanins. The increase in gene products by alternative splicing is associated with an unexpectedly high structural variability of tetraspanins. We speculate that non-conventional tetraspanins have roles in regulating ER exit and modulating tetraspanin-enriched microdomain function.


Results and Discussion
We screened the National Center for Biotechnology Information (NCBI) database for human tetraspanin gene products. Taking into account only validated and reviewed sequences, we identified 86 mRNAs originating from the 33 human tetraspanin genes. In addition, we found via PCR the sequences of two novel mRNAs, one from a human whole-brain cDNA library and one from a natural killer cell cDNA library (Fig. S1). Finally, we included a splice variant of CD82 described in the literature 34. In total, the 89 gene products include the 33 known conventional tetraspanin proteins and 31 different non-conventional isoforms. The non-conventional isoforms originate from 18 conventional tetraspanins. Tspan17 has the highest number of isoforms, five (Table 1).

Figure 1.
Conventional tetraspanin topology. Depicted is the typical topology of a tetraspanin. Intracellular domains include the N-terminus, the small intracellular loop (SIL), and the C-terminus, which are all short (for exceptions see Table S1). On the extracellular side, a small extracellular loop (SEL) connects transmembrane segment 1 (TMS1) and TMS2, and a large extracellular loop (LEL) connects TMS3 and TMS4. For the complete tetraspanin and its different segments, the three numbers (xx-yy-zz) indicate the sequence lengths of the shortest sequence (xx), the average sequence (yy) and the longest sequence (zz) (for details see Table S1).
Compared to the structure of a conventional tetraspanin (Fig. 1), non-conventional tetraspanins display broad structural variability. As examples, we explain the isoforms of Tspan6 (for illustration of isoforms for Tspan2, Tspan3, Tspan16, Tspan17, CD53, CD82, CD63, and Tspan31 see Figs S2-S9, respectively). Figure 2A shows the genomic sequence together with five mRNAs, four of which are derived by alternative splicing (AS). In Fig. 2B, we depict the proteins deriving from the mRNA splice variants. Illustrated are remaining and deleted protein segments with reference to the conventional tetraspanin topology (Fig. 1), without yet predicting how the deletion may affect protein topology and/or the number of TMSs. Apart from the deletion of protein segments, in all Tspan6 splice variants AS produces additional changes in the 5′-UTR (untranslated region). These changes may cause diminished expression (see below).
The first mRNA codes for the conventional Tspan6 (isoform 1). In all other isoforms, the first two TMSs are missing (Fig. 2B). The second and third mRNAs differ in their 5′ ends but have the same alternative start codon. Therefore, both yield isoform 2, in which a large part of the N-terminal region is deleted, including TMS1, the SEL and TMS2. Isoform 3 also uses this alternative start codon, resulting in the same TMS1/SEL/TMS2 deletion. Moreover, splicing eliminates exon 7, so that an alternative stop codon located in exon 8 is used. This causes deletion of the C-terminal half of TMS4 and an alternative C-terminus. Finally, isoform 4 again uses the alternative start codon, resulting in the N-terminal truncation. Moreover, exon 6 is eliminated, and thus the C-terminal end of the LEL ε-helix and the N-terminal half of TMS4 are not encoded.
We wondered whether such deletions also occur in other species and analyzed tetraspanin isoforms in mouse. Here, the database has fewer entries, as only 31 tetraspanins are described, four of them with provisional status only, and in general fewer mRNA variants are available. Still, we identify eight non-conventional isoforms, including isoforms with only three predicted TMSs, LEL deletions, and changes in the N-terminus (Table S2). Between the two species, there is no direct correlation at the level of specific tetraspanins, but there is overlap in the type of structural change caused by AS. That not all structural variations occurring in human are also found in mouse may be explained by the smaller database and/or by the fact that the two species share only about a quarter of alternatively used exons 35,36.
Structural variability defines classes of non-conventional tetraspanins. We next analyzed the topologies of the human non-conventional isoforms. Based on computational analyses of the proteins' transmembrane helices (TMHMM Server, 2.0), we predict protein isoforms with overall one, two, three or four TMSs (Fig. 3). Thus, the isoforms fall into four major classes: tetraspanins that structurally are mono-, di-, tri- and tetraspanins (Fig. 4). The monospan-tetraspanins retain either TMS3 or TMS4. The dispan-tetraspanins retain TMS1 and TMS2, retain TMS3 and TMS4, or retain TMS4 and form a novel TMS. In the trispan-tetraspanins, any one of the TMSs is deleted, with the exception of TMS2. In one case in which TMS2 remains, it forms an extended TMS together with a half-deleted TMS1 (CD63 Iso2). Please note that, for simplicity, in the following we refer to e.g. trispan-tetraspanins simply as trispanins.
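The class assignment described above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function name `classify_isoform`, its arguments and the label format are hypothetical and chosen only for illustration. It assumes that the retained conventional TMSs and the number of novel TMSs are already known from a topology prediction.

```python
from typing import Iterable

# Class names by total number of predicted transmembrane segments (TMSs).
CLASS_NAMES = {1: "monospanin", 2: "dispanin", 3: "trispanin", 4: "tetraspanin"}


def classify_isoform(retained_tms: Iterable[int], novel_tms: int = 0,
                     inverted: bool = False) -> str:
    """Return a class label such as 'dispanin (TMS3 & TMS4)'.

    retained_tms: numbers (1-4) of the conventional TMSs still present.
    novel_tms:    number of newly formed TMSs (hypothetical input).
    inverted:     True if a partially or completely inverted topology is
                  predicted; marked by an asterisk, as in the classification.
    """
    retained = sorted(set(retained_tms))
    if any(t not in (1, 2, 3, 4) for t in retained):
        raise ValueError("conventional TMS numbers must be 1-4")
    total = len(retained) + novel_tms
    if total not in CLASS_NAMES:
        raise ValueError("expected one to four TMSs in total")
    parts = " & ".join(f"TMS{t}" for t in retained)
    if novel_tms:
        parts = f"{parts} + novel TMS" if parts else "novel TMS"
    return f"{CLASS_NAMES[total]} ({parts}){'*' if inverted else ''}"


if __name__ == "__main__":
    print(classify_isoform([3, 4]))                           # TMS1/TMS2 deleted
    print(classify_isoform([4], novel_tms=1, inverted=True))  # TMS4 plus a novel TMS
```

The asterisk only records what a prediction reports; as noted for Tspan15, predicted topologies should be treated with caution.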
In about half of the cases, AS results in a partially or completely inverted topology (indicated by an asterisk in Fig. 4). Surprisingly, an inverted topology is also predicted for the conventional Tspan15. However, experimental evidence indirectly indicates that murine Tspan15, which is also predicted to have an inverted topology, inserts with the correct topology 37 . Therefore, topology predictions should be treated with caution.
Most classes include representatives with a modified C-terminus. Moreover, in several cases AS affects the LEL, causing its almost complete elimination (CD53 Iso2) or its shortening (CD82 Iso4 and Tspan17 Iso2, 3, 4 and 5). Based on the structure of the CD81 LEL 9 and the prediction of secondary structural elements (Jpred 4.0), the short deletions would largely affect the variable domain of the LEL, which is interesting, as this part is thought to encode the information for specific interactions. Finally, for Tspan10, the only tetraspanin with a large N-terminus, we find an isoform with a truncation in the large N-terminal domain (Tspan10 Iso2).
In summary, for most tetraspanins AS generates several mRNAs, yielding up to five isoforms per gene (see Tspan17). The number of non-conventional tetraspanin isoforms roughly equals the number of conventional tetraspanins. However, it is very likely that this is greatly underestimated as we included only validated/reviewed sequences. Moreover, the discovery of many yet undocumented sequences is expected.
Expression of non-conventional tetraspanins. The question arises as to how likely it is that the protein isoforms are expressed at levels that would affect cellular function. Several factors play a role, such as mRNA copy number (about which the database makes no statement), the stability of the mRNA, and the stability of the expressed protein.
In the following, we evaluate the stability of the 53 listed alternatively spliced mRNAs by analyzing features that make mRNA prone to degradation or influence its expression level (Table 1). All mRNAs lack retained introns and premature termination codons (PTCs), which would promote nonsense-mediated decay (NMD) 38. This argues against enhanced degradation by such elements. For clarity, we have not included this information in Table 1.
We find 31 spliced mRNA variants with a sole alteration in the 5′ UTR (21 mRNAs coding for 8 conventional tetraspanins) or an alteration in the 5′ UTR and the ORF (10 mRNAs coding for 9 non-conventional tetraspanins). The 5′ UTR contains regulatory elements of translation. Effects on expression level upon alteration of this region are unpredictable 39–41. To be on the safe side, we make a conservative estimate and assume that expression would rather be diminished. Therefore, the proteins encoded by these 31 mRNAs are not rated as "very likely expressed" but as "likely expressed" (Fig. 5). The expression of two mRNAs coding for conventional and five mRNAs coding for non-conventional tetraspanins is not likely (Fig. 5). In these cases, we find alterations in the 3′ UTR, which may cause NMD, retention in the nucleus or miRNA-based silencing, and/or a uORF (upstream open reading frame), which reduces expression levels by 30-80% and/or makes the mRNA more likely subject to NMD 39,42. However, this may be an overcautious rating, as four mRNAs that are not alternatively spliced and encode conventional tetraspanins also contain uORFs, arguing against a complete uORF-induced decay of tetraspanin mRNAs.

(2019) 9 www.nature.com/scientificreports
Finally, 15 mRNAs from 10 tetraspanin genes, all coding for non-conventional isoforms, are very likely expressed because they lack 5′ UTR alterations, uORFs, PTCs and 3′ UTR alterations (Table 1, Fig. 5). Of these 15 mRNAs, nine code for proteins that have the extracellular loops on the extracellular side and therefore a membrane topology similar or identical to that of the respective conventional tetraspanin (Fig. 5), meaning their domains could in principle interact with binding partners. Moreover, there is another isoform with a shortened LEL (CD82 Iso4) for which the mRNA is unknown, so we cannot evaluate its expression probability. Still, from published data we can safely conclude that this isoform is expressed at levels that affect cellular function 43.

Table legend (partial). … Table S1. Second and third columns: mRNA variants are sorted by systematic name and then by the NCBI variant number of the mRNA. For Tspan16 and Tspan21, the first mRNA variants are trispanins and the second ones are tetraspanins. In these cases, we moved up the second mRNA variants, referring to them as conventional tetraspanins (isoforms 1). Fourth column: NCBI reference sequence number for the protein (NP). Column 5 lists the isoform number.

In addition, for those isoforms amplified by PCR, we compared mRNA expression levels of the conventional and the non-conventional isoform(s) by quantitative real-time PCR. In the cDNA library from human brain, the Tspan15 Iso2 mRNA level is about 10% of that of the conventional form, and this wild-type Tspan15 is expressed at a several-fold higher level than RPS9, which encodes the 40S ribosomal protein S9 and was used as a reference (Table S3). Hence, although Tspan15 Iso2 is expressed at a lower level than the conventional form, its expression level is substantial and, via a dominant negative mechanism, could be sufficient to alter cellular functions. Moreover, although expressed 10-fold less than Tspan15, Tspan15 Iso2 may dominate in the ER by accumulating there (compare Fig. S13).
CD53 expression in natural killer cells is also dominated by the conventional transcript, which is again found at a several fold higher level than RPS9. In comparison, CD53 Iso2 and Iso3 expression were found to be 5% and 3% of the conventional form. For CD53 Iso2 this is expected, as it has a lower mRNA expression probability when compared to Tspan15 Iso2 (Fig. 5). For CD53 Iso3 the expression probability cannot be evaluated, as only the coding sequence is known. In any case, for the CD53 isoforms it is difficult to predict whether they may influence cell physiology at such low expression levels. However, future analysis of other cDNA libraries may reveal cellular systems with higher expression levels.
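The three-level ranking of mRNA expression probability used in this section can be summarized as a small decision rule. The following Python sketch is only an illustration of that rule as stated in the text; the class `MrnaFeatures`, its field names and the function `expression_probability` are hypothetical, and the feature flags are assumed to have been annotated beforehand.

```python
from dataclasses import dataclass


@dataclass
class MrnaFeatures:
    """Annotated features of one mRNA variant (hypothetical field names)."""
    altered_5utr: bool = False   # alteration in the 5' UTR
    altered_3utr: bool = False   # alteration in the 3' UTR
    has_uorf: bool = False       # upstream open reading frame present


def expression_probability(features: MrnaFeatures) -> str:
    """Rank an mRNA following the conservative scheme described in the text.

    - uORF and/or 3' UTR alteration -> 'not likely expressed'
    - only a 5' UTR alteration      -> 'likely expressed'
    - none of the criteria          -> 'very likely expressed'
    """
    if features.has_uorf or features.altered_3utr:
        return "not likely expressed"
    if features.altered_5utr:
        return "likely expressed"
    return "very likely expressed"


if __name__ == "__main__":
    print(expression_probability(MrnaFeatures()))
    print(expression_probability(MrnaFeatures(altered_5utr=True)))
    print(expression_probability(MrnaFeatures(altered_5utr=True, has_uorf=True)))
```

Checking the uORF and 3′ UTR criteria first means that an mRNA carrying several alterations always receives the most cautious rating, in line with the conservative estimate made in the text.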
Retention in the ER of co-transported factors. What might be the physiological effects of expressing non-conventional tetraspanins? In most cases described here, alternative splicing results in expression of variants with missing TMSs (compare Fig. 4). The role of the individual TMSs in proper folding has been studied to some extent, and especially tight packing of TMS1 and TMS2 appears to be crucial for proper tetraspanin folding 44. Moreover, all four TMSs of Tspan20 are required for proper protein folding and forward trafficking from the ER to the plasma membrane 45. Thus, formation of a proper four-helix bundle structure appears to be crucial for ER exit.
In conclusion, it appears to be very unlikely that tetraspanins with missing TMSs will be able to leave the ER. In fact, when studying the distribution of Tspan15 Iso2, which is a dispanin, we find retention in the ER (Fig. 6 and Fig. S13).
Yet, tetraspanin variants retained in the ER could affect cell physiology in two ways. First, complementation of a truncated tetraspanin via interaction with the "missing" helix of its full-length counterpart is possible, potentially resulting in improper folding of the full-length tetraspanin. Via a domino effect, this could result in cross-linked tetraspanins not leaving the ER. In fact, formation of unspecific tetraspanin aggregates has been suggested to be a mechanism causing ER retention 45. These aggregates would likely be degraded, and therefore such a mechanism would decrease the tetraspanin level at the cell surface. Second, isoforms retained in the ER could still bind to their interaction partners, holding these in the ER and causing their degradation (Fig. 6). Evidence that such a mechanism could exist comes from a study in which a mutation in the CD81 gene produces an isoform lacking TMS4, which is accompanied by a lack of expression of the CD81 interaction partner CD19 46.
Non-functional TEMs. While deletions of TMSs cause ER retention, modifications of the N- or C-terminus or of the LEL may still allow proteins to traffic to the cell membrane. Previously, it has been shown that deletions of segments in the CD81 LEL (deleting the α/β-, γ/δ-, γ- or δ-helical segment(s)) do not result in inefficient plasma membrane targeting 47. Moreover, deletion of the entire LEL in CD53 Iso2 still allows for trafficking to the cell membrane (Fig. 6). In Jurkat T cells, which express endogenous CD81 at high levels, the additional expression of the CD81 mutant lacking the δ-helical segment inhibits viral uptake 47, which indicates that the mutant has a dominant negative effect. This suggests that LEL deletion mutants might still be able to integrate into the tetraspanin-enriched microdomains (TEMs) in which the conventional tetraspanin would otherwise reside. However, as the deletion mutant no longer interacts properly with its interaction partners, the TEM becomes non-functional (Fig. 6). It is also possible that the TEM loses only part of its functionality, resulting e.g. in aberrant cellular signaling.

Figure legend (partial). … 11. For Tspan6, no α-helical structure of the variable domain is predicted, so no γ- and δ-helices are depicted. The AS of Tspan6 Iso3 leads to an alternative C-terminus. For CD82, only the γ-helix in the variable domain is predicted to be α-helical. (B) Topology of the tetraspanin isoforms illustrated in (A) with reference to the prediction of which parts are intra- and extracellular (TMHMM Server, 2.0). Isoforms with an inverted topology are indicated by an asterisk. Yellow and orange spheres indicate cysteine and glycine residues, respectively. Cysteine residues form disulfide bridges in the LEL; the glycine residue is part of a conserved CCG motif.

Figure legend (partial). … (Fig. 3), all tetraspanins and their isoforms were classified as mono-, di-, tri- or tetraspanins. Subclasses result from the type of remaining TMSs, or whether a novel TMS is formed. Alterations of the N- or C-terminus, or of the LEL, define further subclasses. Isoforms with a partially or completely inverted topology are marked by an asterisk. In addition, mRNAs were analyzed for an upstream open reading frame (uORF), which induces NMD. They were also tested for alterations in the 3′UTR that could be associated with NMD, retention in the nucleus via nuclear RNA quality control, and miRNA-based gene silencing. Finally, they were analyzed for alterations in the 5′UTR that can alter the expression level of the mRNA. Based on these criteria, the mRNAs were sorted into three groups ranking their expression probability: very likely expressed (green; none of the criteria match), likely expressed (yellow; only alterations in the 5′UTR), or degraded (red; uORF and/or alteration in the 3′UTR).

Figure legend (partial). … generates pre-mRNA with introns (blue) and exons (red). AS generates two additional different mRNAs. After translation and insertion into the ER membrane, apart from the classical pathway (middle), isoforms may behave differently in two ways. Middle, the conventional tetraspanin (green) interacts with a binding partner (orange) and both are co-transported to the plasma membrane, where the tetraspanin forms a TEM. Left, most isoforms lack TMSs. The isoform shown (green) is an example from the largest group, the dispanins. They cannot exit the ER, but may still interact with other proteins. Thus, if the isoform is degraded together with the binding partner, the surface expression level of the binding partner is altered. Right, the LEL-deleted isoform (green) does not interact with its binding partner (orange) but exits the ER and forms TEMs in the plasma membrane. These TEMs would lack one or more co-factors and would therefore be non-functional or differently acting TEMs. Bottom, the lower panels show confocal micrographs of GFP-labeled Tspan15 Iso2 (the conventional Tspan15 …

(2019) 9:14075 | https://doi.org/10.1038/s41598-019-50267-0 www.nature.com/scientificreports

Conclusion
Little is known about the effect of AS on membrane proteins. Using the tetraspanin family as an example, we studied whether AS increases the diversity of gene products, revealing a large structural variability of tetraspanin isoforms. We speculate that non-conventional tetraspanins may regulate ER exit of tetraspanins and their interaction partners, form non-functional TEMs, or form TEMs with different roles.

Materials and Methods
Sequence acquisition and cloning. The human tetraspanin sequences were acquired from the National Center for Biotechnology Information (NCBI) database for genes (as of 5th June 2019), listed under 'NCBI Reference Sequence (RefSeq) - mRNA and Protein(s)'. For human tetraspanins, we considered only sequences with the status report 'reviewed' or 'validated'.
Analysis of transmembrane segments. The protein sequences were analyzed employing the program TMHMM Server v. 2.0 (http://www.cbs.dtu.dk/services/TMHMM/ 48). The program predicts with a certain likelihood the length and the position of transmembrane segments, and the intra- and extracellular localization of the segments connected to the TMSs. The TMS lengths and the lengths of the interconnecting segments are shown in Table S1. In addition, for Tspan10, 19 and 22, we considered a TMS as positively predicted if the segment had a total length of 15-35 residues, of which the central 13-33 residues had a transmembrane probability ≥54% and were flanked by one intracellular and one extracellular residue with a lower transmembrane probability 48. In some cases, TMSs were shifted or shortened by a few amino acids due to AS. Here, we classified the TMS as a novel one if the shift was greater than five amino acids or if more than 1/3 of the original TMS was replaced. Finally, the analysis indicates which domains or segments are changed by AS.
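The two rules given above (the additional TMS criterion and the definition of a novel TMS) can be written down as a short Python sketch. This is an illustration under stated assumptions, not the procedure actually run: the per-residue transmembrane probabilities are assumed to be available as a plain list, the function names `find_tms_candidates` and `is_novel_tms` are hypothetical, and the check that the two flanking residues lie on opposite sides of the membrane is omitted.

```python
from typing import List, Tuple

THRESHOLD = 0.54               # minimum transmembrane probability of core residues
CORE_MIN, CORE_MAX = 13, 33    # allowed length of the high-probability core


def find_tms_candidates(probs: List[float]) -> List[Tuple[int, int]]:
    """Return (start, end) indices (0-based, inclusive) of candidate TMSs.

    A candidate is a maximal run of 13-33 residues with probability
    >= THRESHOLD plus one lower-probability residue on each side, which
    gives a total segment length of 15-35 residues.
    """
    segments = []
    i, n = 0, len(probs)
    while i < n:
        if probs[i] < THRESHOLD:
            i += 1
            continue
        j = i
        while j + 1 < n and probs[j + 1] >= THRESHOLD:
            j += 1
        core_len = j - i + 1
        # The run is maximal, so existing neighbours are below THRESHOLD.
        has_both_flanks = i > 0 and j < n - 1
        if CORE_MIN <= core_len <= CORE_MAX and has_both_flanks:
            segments.append((i - 1, j + 1))
        i = j + 1
    return segments


def is_novel_tms(shift: int, replaced: int, original_length: int) -> bool:
    """A TMS counts as novel if it is shifted by more than five residues
    or if more than one third of the original TMS is replaced."""
    return shift > 5 or replaced * 3 > original_length


if __name__ == "__main__":
    profile = [0.1] * 5 + [0.9] * 20 + [0.1] * 5
    print(find_tms_candidates(profile))
    print(is_novel_tms(shift=2, replaced=8, original_length=21))
```

Comparing `replaced * 3` with the original length avoids floating-point rounding when testing the one-third criterion.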
Structure prediction. The helical structural elements of the isoforms shown in Fig. 3 were predicted by combining the results from the TMS prediction by the program TMHMM Server v. 2.0 and a secondary structure …

Figure legend (partial). … is shown in Fig. S11), CD53 or CD53 Iso2 expressed in HepG2 cells (for non-GFP-expressing control cells see Fig. S10; Western blot analysis documents the correct size of the expressed constructs; see Fig. S12). Tspan15 reaches the plasma membrane (Fig. S11), whereas Tspan15 Iso2 remains in the ER (for co-staining analysis with an ER marker see Fig. S13). Bottom, upper panels, ER retention is confirmed by analysis of cell-free plasma membrane sheets that were visualized by the membrane dye TMA-DPH. In the respective GFP channel, only a few Tspan15 Iso2 spots are detected, which arise from ER-PM contact sites 50. In contrast, CD53 and CD53 Iso2 readily reach the plasma membrane, although CD53 Iso2 does so less efficiently. CD53 Iso2 has lost its glycosylation sites and therefore appears in Western blot analysis as a single band (Fig. S12).
Expression and imaging of GFP-labelled tetraspanins. HepG2 cells were transfected essentially as described 47 with a vector for expression of GFP (pEGFP-N1, Clontech, #6085-1) or the above-described vectors for expression of GFP fused to the C-terminus of Tspan15, Tspan15 Iso2, CD53 or CD53 Iso2. Cell-free membrane sheets were produced by short ultrasound pulses 47. If not stated otherwise, epi-fluorescence microscopy was employed for imaging membrane sheets and whole cells, which in this case were additionally visualized with the membrane dye TMA-DPH (Invitrogen, #T204). TMA-DPH and GFP fluorescence were imaged by epi-fluorescence microscopy essentially as described 47. For confocal microscopy, cells additionally expressed KDEL-RFP and were stained as described below. They were imaged in the confocal mode of a 4-channel easy3D superresolution STED optics module (Abberior Instruments) coupled to an Olympus IX83 confocal microscope (Olympus, Tokyo, Japan) equipped with a UPlanSApo 100x (1.4 NA) objective (Olympus, Tokyo, Japan). For imaging details see below. Additionally, GFP was excited with a 485 nm laser and recorded with a 525/50 nm filter.

Colocalization of Tetraspanin 15 with the endoplasmic reticulum. HepG2 cells were transfected with GFP-labelled Tspan15 or Tspan15 Iso2 as described above. Six hours after transfection, cells were transduced with a KDEL-RFP fusion construct (BacMam 2.0, Life Technologies, #C10591) specifically targeting the ER, according to the manufacturer's instructions, with 20 particles per cell for an additional 16 h. Cells were fixed with 4% paraformaldehyde (PFA) in PBS for 30 minutes. Fixation solution was removed and residual PFA was quenched with 50 mM NH4Cl in PBS for 30 minutes. Cells were then permeabilized with 0.2% Triton X-100 in PBS for 2 minutes and blocked with 3% BSA for 1 hour at room temperature (RT). To enhance the GFP and RFP signals, samples were incubated with GFP-Booster Atto647N (Chromotek, #gba647n) and RFP-Booster Atto594 (Chromotek, #rba594) diluted 1:200 in 1% BSA for 1 hour at RT. Finally, samples were washed with PBS and mounted onto microscopy slides with ProLong® Gold antifade mounting medium (Invitrogen, #P36930). Coverslips were cured for 24 hours and sealed with nail polish. Cells were imaged in the confocal mode of the superresolution STED microscope described above. Atto594/RFP was excited with a 561 nm laser and detected with a 580-630 nm filter (red channel). Atto647N was excited with a 640 nm laser and recorded with a 650-720 nm filter (long red channel). For all images, pixel size was set to 50 nm and pinhole size was set to 60 µm.
Colocalization analysis was performed with the program ImageJ. Regions of interest (ROIs) were placed into the red channel to an area that showed the typical ER network structure, and then propagated to the long red channel (illustrated in the figure employing a green lookup table). The Pearson correlation coefficient (PCC) between the two areas marked by the ROIs was calculated with a custom made ImageJ macro.
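For readers unfamiliar with the measure, the Pearson correlation coefficient between two ROIs can be computed as in the following Python sketch. It is not the custom ImageJ macro used here; the helper names `roi_pixels` and `pearson_correlation`, the rectangular ROI and the representation of each channel as a nested list of intensities are assumptions made for illustration.

```python
from math import sqrt
from typing import List, Sequence


def roi_pixels(image: Sequence[Sequence[float]], top: int, left: int,
               height: int, width: int) -> List[float]:
    """Flatten the pixel intensities of a rectangular ROI, row by row."""
    return [image[r][c]
            for r in range(top, top + height)
            for c in range(left, left + width)]


def pearson_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally sized pixel lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two pixel lists of equal length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("PCC is undefined for a constant ROI")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy 3 x 3 images standing in for the red and long red channels.
    red = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    long_red = [[2, 4, 6], [8, 10, 12], [14, 16, 18]]
    # The same ROI coordinates are applied to both channels.
    roi_a = roi_pixels(red, top=0, left=0, height=2, width=2)
    roi_b = roi_pixels(long_red, top=0, left=0, height=2, width=2)
    print(pearson_correlation(roi_a, roi_b))
```

Applying identical ROI coordinates to both channels mirrors the propagation of the ROI from the red to the long red channel described above.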