Comprehensive profiling of TCRβ repertoire in a non-model species (the bank vole) using high-throughput sequencing

In recent years, immune repertoire profiling with high-throughput sequencing (HTS) has advanced our understanding of adaptive immunity. However, fast progress in the field applied mostly to human and mouse research, with only few studies devoted to other model vertebrates. We present the first in-depth characterization of the TCRβ repertoire in a non-model mammal with limited genomic resources available – the bank vole (Myodes glareolus). We used 5′RACE and Illumina HTS to describe V and J segments and to qualitatively characterize preferential V–J segment usage and CDR3 length distribution. Finally, a molecular protocol integrating unique molecular identifiers was used for quantitative analysis of CDR3 repertoire with stringent error correction. We found 37 V and 11 J genes that were orthologous to mice genes. A conservative, lower bound estimation of the TCRβ repertoire was 1.7–2.3×105 clonotypes, and the degree of sharing of the observed repertoire between any two individuals was 3.6% of nucleotide sequences and 14.3% of amino acid sequences. Our work adds a crucial element to the immunogenetic resources available for the bank vole, an important species in ecological and evolutionary research. The workflow that we developed can be applied for immune repertoire sequencing of non-model species, including endangered vertebrates.


28
The central role of T cells in the adaptive immune response 1 , both humoral and cytotoxic, is 29 mediated by T-cell receptors (TCRs), which are responsible for self/non-self distinction and 30 foreign antigen recognition in the context of the major histocompatibility complex (MHC). 31 TCRs are heterodimeric membrane proteins formed by two chains-α and β. The variability of 32 TCRs, necessary to ascertain specificity and robustness for interaction with different 33 antigens, is generated similarly to that of immunoglobulins-that is, by a random, somatic 34 recombination of different, germline DNA segments (Fig. 1). During somatic recombination, αβ TCR repertoire of mature lymphocytes has still not been precisely determined even in 51 humans, with estimates differing by an order of magnitude 4-5 from 2.5×10 7 to 1×10 8 . In the 52 house mouse (Mus musculus), the number of unique TCR αβ pairings was estimated at 53 1.9×10 6 , with the number of unique TCRβ chain types estimated to reach 5-54 8×10 5 clonotypes 6 . 55 The composition and size of the TCR repertoire is crucial for a successful immune 56 response 7-9 ; however, its enormous diversity has long impeded in-depth study of an 57 individual TCR's collections at the genomic level. Indirect techniques, such as CDR3 58 spectratyping 10 , allowed a general overview of TCR repertoire dynamics, but a direct 59 exploration of sequence diversity was only possible with exceedingly time-consuming and 60 costly cloning and Sanger sequencing (reviewed in Six et al.) 11 . Thus, profiling of immune 61 repertoires was limited to a few model species of medical or veterinary importance. However, 62 in the past decade, high-throughput sequencing (HTS) has revolutionized and prompted 63 direct, affordable, and efficient sequencing of immune repertoires 12,13 . The accuracy of TCR 64 repertoire characterization (focusing on the short, but extremely variable, CDR3 region) has 65 further been improved by using 5′ Rapid Amplification of cDNA Ends (5′ RACE) 14 , which 66 alleviates the amplification bias introduced by a multiplexed PCR 15,16 . Further integration of 67 unique molecular identifiers (UMIs) 17 into the 5′ RACE protocol allows efficient correction 68 of PCR and sequencing errors 18 . The sequence-level resolution of HTS studies has already 69 granted new insights into various aspects of adaptive immunity, such as immune response to 70 infection 19 , autoimmune diseases 20 , repertoire changes with age 5 , or cancer progression and 71 persistence 21,22 . Despite the potential offered by HTS, few attempts were made to transfer 72 these advances to non-model species or, in general, species other than human and mouse.  In the present study, we applied recent developments in TCR repertoire sequencing to 78 a non-model mammal, with limited genomic resources available, which is, nonetheless, 79 subject to much ecological and evolutionary research. The bank vole (Myodes glareolus) is a 80 small rodent of the Cricetidae family, widespread in western Europe and northern Asia, that 81 is very convenient to study a range of topics-from adaptive radiation 26,27 , genetic basis of 82 adaptation 28 , and response to selection 29 to processes shaping sexual 30 and behavioural 83 traits 31 . Interest focused on its immunity as well as its role as a reservoir of a pathogenic 84 Puumala hantavirus (PUUV) 32,33 , which causes a mild form of haemorrhagic fever with renal 85 syndrome in humans. To date, a number of components of the bank vole's immune system 86 have been studied, such as the MHC 34-37 and Toll-like receptors 38,39 . But, sequencing and 87 profiling of immune repertoires, such as TCRs, has never been attempted in this species.

88
Here, we adapted protocols available for human and mouse to present the first in-89 depth characterization of a TCRβ repertoire in a non-model rodent. We took advantage of the 90 fact that the 5′ RACE technique does not require prior knowledge of V and J segments in the 91 species of interest. Necessary primers within a constant region of the TCRβ were designed 92 based on de novo assembled transcriptomes-an approach successfully tested before in the 93 highly complex MHC gene family 37 . Amplified transcripts of the TCRβ chains from seven 94 bank voles contained full CDR3 sequences, as well as entire V and J segments. These 95 sequences were subjected to detailed qualitative analysis, which involved description of V 96 and J genes, V-J segment usage, and CDR3 length distribution. For three animals, an 97 estimation of TCR repertoire size was undertaken using a protocol that incorporated UMIs.

99
Qualitative description of the bank vole TCRβ repertoire 100 Using 5′ RACE and Illumina paired-end 300-bp MiSeq sequencing ( Supplementary Fig. S1) , 101 we obtained a full variable region of TCRβ chains from seven bank voles, from which V and 102 J segments were extracted. In this section, we describe phylogenetic comparison of these 103 segments with mouse genes, which also served as a basis for naming of the bank vole loci.

104
Furthermore, we analysed V-J segment usage and CDR3 length distribution.  and further justifies treating clades as separate loci. 118 We identified 25 distinct V-segment subgroups. Most of them comprised only one 119 locus; however, two subgroups, BV_TRBV12 and BV_TRBV13, contained multiple loci.

120
Similar expansion took place with the murine orthologue subgroups TRBV12 and TRBV13, 121 which have three loci each. Overall, in the bank vole, we found 37 V putative loci (with 1-5 122 alleles per locus, summing to a total of 116 alleles across loci), compared to 21-22 functional 123 loci in the mouse 40 . In the case of the J segment, we identified 11 loci within two subgroups-124 one with five loci and the other with six loci. This number exactly matches the number of 125 functional J loci in the mouse 40 . We identified 1-3 alleles per locus, with a total of 17 alleles 126 across loci. 127 We compared TCRβ V segments retrieved in the present work from directly    The CDR3 lengths had a bell-shaped distribution and ranged from 5 to 18 amino acids 159 (15-54 nt). The median length equals 12 amino acids, 71% of unique CDR3 sequences had a 160 length of 11-13 amino acids, and 90% of CDR3s had length ranging from 10 to 14 amino 161 acids (Fig. 4e). In silico spectratypes with V and J segments are shown in Supplementary Fig.   162 S8. The global mean length of CDR3s was 11.91 amino acids (weighed by read count), but 163 mean lengths of CDR3s formed by different V and J segments showed consistent differences 164 (Figs. 4b, 4d). For example, V segments BV_TRBV01, BV_TRBV20, and BV_TRBV30 165 tend to form shorter CDR3s (means: 10.37, 11.17, and 11.02, respectively), whereas 166 BV_TRBV02 and BV_TRBV05 slightly longer ones (mean 12.56, for both). Similarly, J 167 segments BV_TRBJ1-1, BV_TRBJ1-2, and BV_TRBJ2-1 form shorter than average CDR3s 168 (mean: 11.35, 11.37, and 11.11, respectively).   Table S3), which indicates 194 that in four samples we directly observed up to 86% splenic TCRβ CDR3 sequences.

195
In our size-estimation protocol, we sub-sampled amplicons to control for differences 196 in the sequencing depth. The 1.5 mln raw reads sufficed to retrieve most of the diversity 197 present in each amplicon and in the individual (across all replicated amplicons). For example, 198 for the highest-coverage individual (s08, mean: 3.5 mln raw reads per replicate), with sub-199 sampling to 1.5 mln, the unique number of observed CDR3s was 1.4×10 5 nucleotide 200 sequences per replicate and, altogether, 1.9×10 5 for the individual (when a total of 6 mln 201 reads were analysed). Doubling the sequencing effort increased the number of unique CDR3s 202 to 1.7×10 5 per replicate, and to a total of 2.1×10 5 for the individual (when all 13.9 mln reads 203 were analysed). The Chao2 estimator was robust to those variations, as the estimates of 204 CDR3 repertoire size increased only by 8%-from 2.3×10 5 (at 1.5 mln reads per replicate) to 205 2.5×10 5 (with ± 3.5 mln reads per replicate).

206
Private versus public repertoire 207 The vast majority of CDR3 nucleotide sequences represented private TCRβ repertoire-that is, 208 these sequences were not present in any other individual. On average, only 3.7% of CDR3 209 nucleotide sequences was shared between any two individuals, and 1.3% was present in all 210 three individuals, thus representing a public repertoire ( Supplementary Fig. S9). In contrast, 211 comparison of common CDR3 nucleotide sequence between replicates from the same 212 individual showed that, on average, 77% of sequences was shared between any two 213 replicates, and 47% was present in all four replicates. The number of shared amino acid 214 CDR3 sequences between any two individuals was, on average, 14.3% (SD: 1%). Further,  Table S3).

218
Taking advantage of advances in HTS, we were able to characterize, for the first time, the 219 TCR repertoire of a non-model rodent. We described V and J gene segments in the bank vole, 220 analysed their phylogenetic relationship to model murine TCRs, and also characterized basic   Table S4). This method was proven to be  Table   372 S4). Subsequently, samples were purified using Agencourt AMPure XP beads with 1:0.6 373 ratio of DNA:beads, to remove short, nonspecific products and primer/adaptor-dimers. Then,        The CDR3 region is defined as the sequence between the last conserved Cys in the V 467 segment and one residue (3 nt) before the first conserved Gly in the J segment.  Table S2). To estimate the lower bound of individual TCRβ diversity (repertoire size) we chose a non-529 parametric, incidence-based Chao2 estimator 46 . Chao2 does not require assumption of a priori 530 clonotype distributions, and it had been previously applied to TCR repertoire estimation 5,45 .