Immunoglobulin light chain (IGL) genes in torafugu: Genomic organization and identification of a third teleost IGL isotype

Here, we report a genome-wide survey of immunoglobulin light chain (IGL) genes of torafugu (Takifugu rubripes) revealing multi-clusters spanning three separate chromosomes (v5 assembly) and 45 scaffolds (v4 assembly). Conventional sequence similarity searches and motif scanning approaches based on recombination signal sequence (RSS) motifs were used. We found that three IGL isotypes (L1, L2, and L3) exist in torafugu and that several loci for each isotype are present. The transcriptional orientations of the variable IGL (VL) segments were found to be either the same (in the L2 isotype) or opposite (in the L1 and L3 isotypes) to the IGL joining (JL) and constant (CL) segments, suggesting they can undergo rearrangement by deletion or inversion when expressed. Alignments of expressed sequence tags (ESTs) to corresponding germline gene segments revealed expression of the three IGL isotypes in torafugu. Taken together, our findings provide a genomic framework for torafugu IGL genes and show that the IG diversity of this species could be attributed to at least three distinct chromosomal regions.

for the classification of IGL. Another IGL classification criterion using a set of 21 conserved molecular sequence markers to distinguish κ, λ, and σ IGL isotypes was later proposed by Das et al. 14 .
Teleost IGL genes, like those of cartilaginous fish, have been shown to be in a multi-clustered configuration [15][16][17][18][19][20][21] , defined as independently rearranging mini-loci consisting of a few gene segments (multiple V segments, one J) and one C domain exon 22 . The IGL loci in teleosts form tightly linked clusters, and the number of loci for each isotype differs significantly among species. The presence of multiple clusters on one or more chromosomes, similar to those found in zebrafish (Danio rerio), three-spined stickleback (Gasterosteus aculeatus), and medaka (Oryzias latipes), suggests a major role for cluster duplication in the generation of IG diversity in teleosts 18 .
Torafugu (Takifugu rubripes) has a recognizable adaptive immune system and one of the smallest genomes (~400 Mb) among vertebrates 23 , which makes it a good model for research in comparative immunology. Two partial annotations of torafugu IGLs have been reported 18,24 , revealing IGL assemblages with respect to gene segment number, cluster orientations, and organization on three scaffolds and two clones that contain L1 and L2 loci, respectively. In this work, we scanned the torafugu genome assemblies to provide an extended annotation of torafugu IGLs as well as their genomic organization. We identified a third teleost IGL isotype (L3) in torafugu and expanded the set of IGL genes identified in previous studies.

Results
Identification of IGL genes on torafugu genomic chromosomes and scaffolds. A total of 82 IGL gene segments in torafugu were found to be localized on three different chromosomes, i.e., 2, 3, and 5, and were confined to 45 different genomic scaffolds (see annotation details in Supplementary Dataset File). Of the scaffolds, four (scaffold 10, 158, 54, and 139) were assigned to separate chromosomes, whereas most of the IGL genes could not be anchored to chromosomes. Altogether, 48 V L ( Table 1 and Supplementary Dataset File), 13 J L ( Table 2 and Supplementary Dataset File) and 21 C L ( Fig. 1 and Supplementary Dataset File) gene segments 25 (except for those that might be present in gaps and cannot be identified at present) were identified.

Identification of a third teleost IGL isotype in torafugu.
Homology in the C domain is the most reliable criterion for classifying a teleost IGL isotype 18 . As mentioned, two IGL isotypes have been reported in torafugu: L1 and L2. Here, we used the published IGL sequences from various teleosts to search the torafugu database (http://www.fugu-sg.org/). As a result, three scaffolds (scaffold 2422, 2488, and 3698) were found to carry C L sequences that had homology (47-53% amino acid identities) with the L3 C domains of zebrafish, carp (Cyprinus carpio), and channel catfish (Ictalurus punctatus). This degree of homology in the C domain exceeds the limit used to distinguish mammalian κ and λ C domains (35-37%) and thus further strengthens the identification of a torafugu L3. BLAST 26 searches with the V L segments on the three scaffolds revealed similarities with L1/L3 V from other teleosts. After amino acid identity, RSS orientation is the second most common characteristic used for distinguishing IGL isotypes 13 . The torafugu L3 RSSs have the V12-23J motif, similar to that in mammalian κ 27,28 .
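The identity comparison above can be illustrated with a minimal Python sketch. It assumes that two C domain sequences have already been aligned (gaps written as "-") and that identity is counted only over columns where both sequences have a residue; the source does not state which denominator was used, so this choice is an assumption. The function names and the toy sequences are hypothetical, and the 37% bound is the upper end of the κ/λ range quoted in the text.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns (assumed convention)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def exceeds_kappa_lambda_limit(identity: float, upper_bound: float = 37.0) -> bool:
    """True if identity is above the 35-37% range quoted for mammalian kappa vs lambda C domains."""
    return identity > upper_bound


# Toy aligned fragments (hypothetical, not real C domain sequences)
toy_identity = percent_identity("ACDE-G", "ACDFHG")
```

With real data, each torafugu C L sequence would be aligned to a reference L3 C domain first, and the resulting identity compared against the quoted range.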

Type 3 IGL organization.
Of the three scaffolds (2422, 2488, and 3698) that carry one L3 C sequence each, scaffold 2422 contains one each of a functional L1 V (V1c), L1 V without leader sequence (V1d), and J L (J3a); scaffold 3698 contains one J L (J3b); and scaffold 2488 contains three V L sequences that belong to L1 V (V1e) and L3 V (V3b and V3c) within the same cluster (Fig. 2). This heterogeneity suggests an organization of multiple clusters. If a region harboring one C L is considered as one cluster, at least three clusters should exist at the L3 loci. The L3 C sequences share 48-75% identity with each other at the amino acid level, which suggests their divergence from each other, while they are nonetheless distinguishable from the L1/L2 C sequences (10-31% identity in all inter-type pair-wise comparisons). The V L segments fall into two groups that correspond to L1 V (V1c, V1d, and V1e) and L3 V (V3b and V3c), respectively. Within a group, they are 88-92% identical at the amino acid level over the V L coding sequences; between the two groups, they share 34-42% identity. All five V L segments are arranged in the opposite transcriptional orientation to their C L and J L on each individual scaffold, similar to that described for other teleost L3 genes 10 .
The V1d sequence was defined as a pseudogene due to the absence of a leader sequence in the current assembly. However, it may rearrange functionally to J L with its identifiable V L exon and the downstream RSS sequence. Therefore, the V L on both sides of the J L /C L will likely undergo rearrangement with C3a and J3a through inversion as in other teleosts. For example, V1d will possibly invert to join J3a, while V1c will recombine through inversion of J3a and C3a (Fig. 3).
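The relationship between segment orientation and rearrangement mode described above can be written as a small Python sketch. It encodes only the rule stated in the text (same transcriptional orientation implies deletion, opposite orientation implies inversion); the function name and the "+"/"-" strand encoding are assumptions made for illustration, and the strand values in the example are placeholders, not annotated coordinates.

```python
def rearrangement_mode(v_strand: str, j_strand: str) -> str:
    """Predict how a V segment would join a J segment from their relative orientation.

    Same orientation -> deletion of the intervening DNA.
    Opposite orientation -> inversion.
    """
    for strand in (v_strand, j_strand):
        if strand not in ("+", "-"):
            raise ValueError("strand must be '+' or '-'")
    return "deletion" if v_strand == j_strand else "inversion"


# Placeholder strands: a V in the opposite orientation to its J (as for the L3 V segments)
example_mode = rearrangement_mode("-", "+")
```

Only the relative orientation matters here, so flipping both strands (for example, when a scaffold is anchored in reverse) leaves the prediction unchanged.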

Type 2 IGL organization.
A search with L2 C sequences from various teleosts showed good matches with 10 scaffolds (scaffold 4520, 4988, 5604, 7989, 8603, 2126, 2352, 2681, 3001, and 3330) in the v4 assembly. Other scaffolds were found to contain either L2 V or J sequences (Fig. 4). The torafugu L2 loci contain 22 V L , 8 J L , and 11 C L gene segments. All 22 V-matching sequences (some were found only as fragments owing to gaps in the sequences) are summarized in Table 1. The genomic organization of L2 genes is depicted in Fig. 4. C2a, C2c, and C2i are identical to the published L2 torafugu C sequence 18 . Other L2 C sequences (those with complete coding sequences) are 92-99% identical to C2a in the derived amino acid sequences and share only 15-35% identity with L1/L3 C sequences, suggesting that they duplicated among themselves and diverged long ago from other types. The L2 V gene segments are either in the same or in the opposite transcriptional orientation as their corresponding J L and C L , which is topologically similar to the three-spined stickleback L2 genes on chromosome 11 15 . It is worth noting that all the scaffolds carrying V L in the opposite orientation to C L and J L are missing sequence information between V L and J L -C L (e.g., sequences in scaffold 4988, 2352, 2681, and 3001). For example, the orientation of V2f and V2g on scaffold 2352 appears to be opposite to that of C2h and J2g. However, two possibilities should be considered: (1) the gaps between these gene segments may contain novel C L and J L segments with the same orientation as V2f and V2g and (2) scaffold joining might reveal additional V L segments that are downstream of and in the same orientation as C2h and J2g. The L2 locus is most likely occupied by eleven clusters, and on average one V L segment resides in each cluster. Conventional recombination at the L2 locus would occur. For example, rearrangement between V2d and J2d on scaffold 7989 will occur by deletion of the intervening DNA to form a V L J L .
[Table 1 footnotes, partially recovered (beginning truncated): deletion and frameshift at position 1936; 2 nt deletion and frameshift from 1955; j, 1 nt insertion and frameshift at position 439 R; 4 nt deletion and frameshift from 456 R; k, 2 nt deletions in CDR1-IMGT and CDR2-IMGT regions and frameshift mutations at 1418 and 1487; 4 nt deletion and frameshift from 1429; 1 nt deletion and frameshift at position 1462; l, 1st-CYS replaced by Ala.]
Scientific Reports | 7:40416 | DOI: 10.1038/srep40416
On scaffold 54 and 139, assigned respectively to chromosome 3 and 5 29 , only one L2 V was detected and no corresponding C L or J L could be identified, based on both v4 and v5 assemblies. The other L2 sequences identified on v4 scaffolds could not be assigned to v5 chromosome(s) due to the presence of gaps.

Type 1 IGL organization.
L1 and L3 V sequences appear to be intermixed (discussed below). We identified L1 IGL genes on at least seven genomic scaffolds (scaffolds with L1 C); thus, they might operate as seven loci. As expected, L1 C sequences possess high amino acid identity (≥ 96%) with each other and the divergence from other types was evident (15-35% identity compared to L2/L3 C). As depicted in Fig. 5, the transcriptional polarity pattern in the L1 loci presents as V L in both orientations to J L and C L . In fact, in all but one instance (chromosome 2), the overall impression is that the L1 locus is organized as V L opposite to nearby J L and C L . On chromosome 2, four V L segments were identified, with three placed in the same transcriptional orientation as the C L (C1g) and another one in the opposite direction. On the other hand, sequences on scaffold 158 were perfectly assigned to chromosome 2, including V1t and C1g, while scaffold 10 was anchored to chromosome 2 in reverse, that is, it has the same V L segments (V1q, V1r, and V1s) in opposite directions (Fig. 5).
IGL cluster estimation. Southern blots of torafugu genomic DNA from sperm probed with different types of C L reveal that the IGL genomic organization in this species is of the cluster type (Fig. 6). More than two bands in most digests suggest multiple IGL loci. Judging by the number of hybridizing bands, seven and three IGL loci are common in L1 and L3. For the L2 isotype, the number of clusters is lower than predicted. It is noticeable that the two bands digested by PstI are much stronger than other bands in L2 blots, which is attributable, at least in part, to the fact that there is no or limited polymorphism with PstI and many bands are hybridized at the same spot.
Phylogenetic analyses. The V L domains of different teleost species and IGL isotypes were aligned (Fig. 7). Similar to the report by Criscitiello and Flajnik 13 , the comparison analysis revealed the conservation of a long CDR2 in L2 V (relative to other isotypes) and a long CDR1 in L3 V. The torafugu L1 V sequences were found to possess both short CDR1 and short CDR2, and were missing the key amino acid 1st-CYS in the FR1 region; this may be a torafugu-specific finding.
[Table 2 column headings: J L gene; Fct; J-Nonamer; Spacer; J-Heptamer; J region nt and AA sequences]
A phylogenetic tree was constructed based on the alignment of V L amino acid sequences from various vertebrates (Fig. 8). The torafugu L2/σ V sequences (V2a, V2c, V2d, V2e, and V2f) clustered strongly together and were distinct from the κ group (including teleost L1 and L3), which seemed to be mingled (V L sequences from the same scaffold are not necessarily in one group). Interestingly, although all the torafugu IGLV1 and IGLV3 sequences belong to the mammalian κ isotype, they clustered into separate groups. This suggests that they are probably associated with different sub-isotypes or a teleost-specific IGL isotype, as is the case in stickleback 15 .
Torafugu C L segments were compared using phylogenetic trees to evaluate the C L relationships among vertebrates (Fig. 9). None of the torafugu C L segments cluster with mammalian κ or λ IGL sequences. However, torafugu C L segments group strongly in branches with sequences belonging to the same teleost isotype (L1, L2, and L3), suggesting that teleosts share a common derivation and that three or more IGL isotypes may have been present in a teleost ancestor. A close relationship between torafugu (belonging to the Tetraodontiformes order, Acanthopterygii superorder) and other species from the Perciformes order (Acanthopterygii), such as seabass (Dicentrarchus labrax), rockcod (Trematomus bernacchii), and wolffish (Anarhichas minor), is also evident from the tree. In addition, phylogenetic analysis consistently revealed the tendency of C L clustering according to taxonomic group rather than the isotype 13,30 . Taken together, the results of the phylogenetic analysis of the torafugu V L and C L sequences revealed different selective pressures on the two domains, wherein C L tends to cluster according to taxonomic group, while V L tends to group by isotype.
Isotype distribution was assessed for the J L segments, and J L 1, J L 2, and J L 3 sequences were distinguished (Supplementary Fig. S1). Of all J L segments identified, those belonging to L1 and L3 were most similar to each other.
Analysis of V L gene 5′ flanking regulatory sequences. We examined 5′ flanking sequences for identified V L segments to reveal possible regulatory features. The 5′ flanking region contains two conserved motifs, namely the octamer motif, which is critical to correct transcription of IGL genes, and the TATA box for the general transcription process 31 . As summarized in Table 1

Functionality of torafugu IGL loci.
A total of fifteen torafugu EST sequences associated with IGL expression were identified from the NCBI EST database. Alignment of torafugu ESTs to concordant genomic V L segments revealed that all functional IGLV3 genes were expressed, while only one IGLV2 sequence (V2k) was expressed. Additionally, expression of all the IGLV1 sequences was observed despite the fact that they were missing the 1st-CYS in the FR1 region. Expression of all the complete C L segments was also observed with one exception: the C1d on scaffold 7391. Upon detailed examination, 9 ESTs and 6 ESTs were found to be concordant with the L2 locus and L1/L3 loci, respectively. Interestingly, ESTs associated with L2 and L3 C sequences were found to lack a V L segment, except for EST AL835785, which carried a complete V L J L -C L (L2 C). In comparison, expression of L1 C sequences was often found to be with either IGLV1 or IGLV3 sequences (Supplementary Table S1). The identity of all the retrieved ESTs to genomic V L and C L segments is 95-100%, suggesting the feasibility of using this method to assign ESTs to concordant genomic sequences.

Discussion
In the present study, we have characterized the torafugu IGL genomic organization based on available genome data sets. It has been reported that torafugu has two IGL isotypes, L1 and L2. Here, a teleost L3 isotype was newly identified, demonstrating that torafugu possesses at least three IGL isotypes. All the IGL genes have been found to be partitioned over multiple scaffolds (v4 assembly). Currently, we can only speculate that torafugu IGL genes should be assigned to three different chromosomes due to incomplete sequence information from the v5 assembly. Our observations must be taken as a step forward in the elucidation of torafugu IGL genomic organization, and future studies on a more complete genome assembly may help to address the current issues with gaps and false assemblies in the whole genome sequence. During vertebrate phylogeny, IGL genes have undergone major evolutionary transitions involving genomic arrangements. One extreme example is the presence of a single IGL isotype (λ) in bird species, such as chicken and zebra finch 7,32 . Unlike mammalian κ and λ loci, which are often arranged in a translocon fashion, teleost IGL genes are organized in distinct clusters of (V L -J L -C L ) n . Herein, we show that torafugu IGL genes are arranged in a compact multi-cluster configuration, supported by both the genomic organization and the Southern blot result. This observation is similar to that found in other teleosts, suggesting a conservation of the cluster IGL organization among teleost species.
In regard to the comparative analysis of the sequences of torafugu C L with those of other vertebrates, the relative distances are in agreement with the phylogenetic relationships. The torafugu C L share the same cluster with teleost L1, L2, and L3, respectively. Moreover, a sister-group relationship (Fig. 9) in the superorder Acanthopterygii between torafugu L1 C sequences and those of the L1b subgroup (wolffish L1b, seabass L1b, and rockcod L1b) is supported by the observed high bootstrap values. At this time, we did not find an L1a C homolog in torafugu, but if such sequences are found in the future, this would further support the hypothesis that L1a and L1b subtypes exist in the Acanthopterygii L1 isotype 33 . In addition, the identification of an L3 in torafugu (Acanthopterygii), together with the presence of L3 in rockcod (Acanthopterygii) and Ostariophysi (catfish, zebrafish, and carp), suggests that the divergence between L1 and L3 took place at or before the emergence of Euteleosts 18 . Finally, screening of the EST database indicates that the majority of IGLV1 and IGLV3 genes are expressed. However, most of the ESTs associated with the expression of L2 C do not have a corresponding V L segment. This phenomenon has been previously described in zebrafish 34 and medaka IGκ 19 , and it may be related to the low efficiency in eliminating aberrant IGL transcripts 35 .
isotypes, (2) torafugu with multiple C L on a scaffold are poised to reconstruct the IGL locus by inversional rearrangement, which can bring V L from one cluster into another, similar to that of zebrafish 21 . With efforts to sequence additional genomes, it will be intriguing to investigate whether the inversional inter-cluster rearrangement is teleost-specific or commonplace in other species.

Methods
Retrieval of IGL genes from the torafugu genome. Genome builds of torafugu (assembly v4, October 2004 and assembly v5, January 2010) available from the Fugu Genome Project 29 (http://www.fugu-sg.org/) were searched to locate the IGL genes. Published IGL amino acid sequences from torafugu 18,24 and other teleosts 15,17,20 were used as queries in TBLASTN alignments (cutoff E-value of 10⁻¹⁵) to retrieve relevant scaffolds and chromosomes. Genomic sequences that contain matches for both V L and C L were downloaded for further analysis. The identified genomic sequences were subsequently used as queries in BLASTN searches against the EST database at NCBI to retrieve expression data. Expression of V L genes was determined by BLAST hits using a 95% identity threshold and a 10⁻¹⁵ E-value threshold, while ESTs were assigned to concordant C L when an identity of ≥ 99% was met.
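As an illustration of the expression thresholds above, the following Python sketch filters BLASTN hits by percent identity and E-value. It assumes the hits are available in the standard 12-column tabular BLAST format (query, subject, percent identity, ..., E-value, bit score); the source does not state which output format was used, and the function names, constants, and the example hit line are hypothetical.

```python
V_MIN_IDENTITY = 95.0   # percent identity threshold for V L expression (from the text)
V_MAX_EVALUE = 1e-15    # E-value threshold for V L expression (from the text)
C_MIN_IDENTITY = 99.0   # percent identity threshold for assigning ESTs to C L (from the text)


def parse_tabular_hit(line: str) -> dict:
    """Parse one line of 12-column tabular BLAST output (assumed input format)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError("expected 12 tab-separated columns")
    return {
        "query": fields[0],
        "subject": fields[1],
        "pident": float(fields[2]),
        "evalue": float(fields[10]),
    }


def supports_expression(segment_kind: str, pident: float, evalue: float) -> bool:
    """Apply the V L or C L threshold to a single genomic-segment-versus-EST hit."""
    if segment_kind == "V":
        return pident >= V_MIN_IDENTITY and evalue <= V_MAX_EVALUE
    if segment_kind == "C":
        return pident >= C_MIN_IDENTITY
    raise ValueError("segment_kind must be 'V' or 'C'")


# Hypothetical hit line: a V segment query against a placeholder EST identifier
example_line = "V_example\tEST_example\t97.50\t300\t7\t0\t1\t300\t1\t300\t1e-120\t540"
example_hit = parse_tabular_hit(example_line)
example_call = supports_expression("V", example_hit["pident"], example_hit["evalue"])
```

Keeping the thresholds as named constants makes it easy to see how the assignments would change under a stricter or looser cutoff.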
Annotation of torafugu IGL. Artemis 36 was used to annotate the IGL loci, including the transcriptional polarity and relative positions of V L and C L in the genomic sequences. C exons were discerned by comparing resultant genomic sequences with published IGL mRNAs. V L genes were determined based on the presence of canonical RSS (allowing 2 nucleotide mismatches), with ORFs that match for IG signature sequences using IgBLAST (www.ncbi.nlm.nih.gov/projects/igblast) and IMGT/V-QUEST 37 (the Teleostei unit), and finally by pattern searches for 23RSS or 12RSS flanking ends of gene segments. To identify the J L genes, which are too short to be detected by BLAST searches, we performed pattern searches to find J L -specific RSSs among the initial genomic sequences that contain V L and C L . The pattern is a consensus RSS heptamer and a nonamer with a 22-24 bp spacer (CACAGTG-N22-24-ACAAAAACC) region. Splice sites between leader and V exons were discerned by FSPLICE (http://linux1.softberry.com/berry.phtml). Exon boundaries of V L , J L , and C L were refined by alignment with known VJ-C cDNA sequences and torafugu EST sequences (from Fugu Genome Project) 38 .
Nomenclature. Identified IGL genes were annotated according to the IMGT® nomenclature 39 . For the V L genes, all retrieved sequences without a truncation, frameshift mutation, or premature stop codon in the leader exon and the V exon, which had conserved residues (1st-CYS, conserved-TRP, and 2nd-CYS) in FR1, FR2, and FR3 regions, respectively, and possessed a proper RSS, were deemed as functional genes. For the C L and J L gene segments, retrieved sequences without frameshift mutations and internal stop codons were regarded as potentially functional genes. In addition, examination of RSS was implemented to determine putative functionality of J L .
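The RSS pattern search described in the annotation step (CACAGTG-N22-24-ACAAAAACC) can be sketched in Python as follows. This is a minimal illustration, not the search tool used in the study: the function names are hypothetical, and counting the two allowed mismatches as a combined total over the heptamer and nonamer is an assumption, since the text does not specify how mismatches were distributed. To scan the opposite strand, the reverse complement of the sequence can be passed in.

```python
HEPTAMER = "CACAGTG"     # consensus heptamer from the text
NONAMER = "ACAAAAACC"    # consensus nonamer from the text


def count_mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (N is kept as N)."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def find_rss(seq: str, spacer_lengths=(22, 23, 24), max_mismatches: int = 2):
    """Return (start, spacer_length, mismatches) for each heptamer-spacer-nonamer match.

    Mismatches are summed over heptamer and nonamer (assumed convention).
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - len(HEPTAMER) + 1):
        hept_mm = count_mismatches(seq[start:start + 7], HEPTAMER)
        if hept_mm > max_mismatches:
            continue
        for spacer in spacer_lengths:
            non_start = start + 7 + spacer
            nonamer = seq[non_start:non_start + 9]
            if len(nonamer) < 9:
                continue  # nonamer would run past the end of the sequence
            total = hept_mm + count_mismatches(nonamer, NONAMER)
            if total <= max_mismatches:
                hits.append((start, spacer, total))
    return hits


# Synthetic test sequence (not genomic data): exact consensus with a 23 bp spacer
synthetic = "TT" + HEPTAMER + "G" * 23 + NONAMER + "TT"
synthetic_hits = find_rss(synthetic)
```

Each hit would then be checked against the neighbouring coding sequence, since a pattern match alone does not establish that a J L segment is present.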

Comparative phylogenetic studies.
Phylogenetic studies were carried out using the MEGA7 program 40 . Multiple sequence alignments were performed using MAFFT 41 . The neighbor-joining (NJ) method was used to construct phylogenetic trees (pair-wise deletion, Jones-Taylor-Thornton matrix), with rate variation among sites modeled by a gamma distribution (gamma parameter 2.5). The reliability of these trees was evaluated by a bootstrap procedure of 1000 replicates.
Southern blotting. Genomic DNA from torafugu sperm (5 μg; extracted using DNeasy® Blood & Tissue Kit, Qiagen, Valencia, CA) was digested with EcoRI, HindIII, BamHI, and PstI. The digested DNA was electrophoresed on 0.8% agarose gels for 16 h and transferred onto Hybond-N+ membranes (GE Healthcare, Piscataway, NJ). Hybridizations and subsequent detection were performed according to the manufacturer's instructions (Amersham AlkPhos Direct™, GE Healthcare). Torafugu C probes consisted of the entire C L domain of L1, L2, and L3. The probes were amplified using Platinum® Taq DNA Polymerase High Fidelity (Invitrogen, Carlsbad, CA). The conditions for the thermal cycler were: 94 °C for 2 min, followed by 30 cycles of 94 °C for 30 s, 55 °C for 30 s, 68 °C for 1 min, and a final extension at 68 °C for 5 min (see primer details in Supplementary Table S2).