Original Article

Genes and Immunity (2012) 13, 469–473; doi:10.1038/gene.2012.20; published online 24 May 2012

High-throughput antibody sequencing reveals genetic evidence of global regulation of the naïve and memory repertoires that extends across individuals

B S Briney1, J R Willis2, B A McKinney3 and J E Crowe Jr1,4

  1. Departments of Pathology, Microbiology and Immunology, Vanderbilt University Medical Center, Nashville, TN, USA
  2. Departments of Chemical and Physical Biology Program, Vanderbilt University Medical Center, Nashville, TN, USA
  3. Tandy School of Computer Science and Department of Mathematics, University of Tulsa, Tulsa, OK, USA
  4. Departments of Pediatrics, Vanderbilt University Medical Center, Nashville, TN, USA

Correspondence: Dr JE Crowe Jr, Vanderbilt Vaccine Center, Vanderbilt University Medical Center, 11475 MRB IV, 2213 Garland Avenue, Nashville, TN 37232-0417, USA. E-mail: james.crowe@vanderbilt.edu

Received 21 December 2011; Revised 9 April 2012; Accepted 11 April 2012
Advance online publication 24 May 2012



Vast diversity in the antibody repertoire is a key component of the adaptive immune response. This diversity is generated centrally through the assembly of variable, diversity and joining gene segments, and peripherally by somatic hypermutation and class-switch recombination. The peripheral diversification process is thought to only occur in response to antigenic stimulus, producing antigen-selected memory B cells. Surprisingly, analyses of the variable, diversity and joining gene segments have revealed that the naïve and memory subsets are composed of similar proportions of these elements. Lacking, however, is a more detailed study, analyzing the repertoires of naïve and memory subsets at the level of the complete V(D)J recombinant. This report presents a thorough examination of V(D)J recombinants in the human peripheral blood repertoire, revealing surprisingly large repertoire differences between circulating B-cell subsets and providing genetic evidence for global control of repertoire diversity in naïve and memory circulating B-cell subsets.


Keywords: antibodies; binding sites; antibody; immunologic memory; antibody specificity



A diverse human antibody repertoire is a key element of the acquired immune response and is critical to the effective prevention and clearance of microbial infections.1 Vast diversity in the antibody repertoire is generated initially through a process of combinatorial rearrangement in which variable (V), diversity (D) and joining (J) gene segments are assembled into a complete immunoglobulin sequence.2, 3 This initial diversity is increased through the use of antigen-driven somatic hypermutation and class-switch recombination.4, 5, 6, 7 These affinity maturation processes result in the creation of distinct memory populations that contain only antigen-experienced B cells.8

Previous analysis of the antibody gene repertoire in peripheral blood B-cell subsets did not detect significant differences in germline V, D or J gene segment usage between the naïve and memory populations.9 This finding was somewhat surprising, as it was expected that the memory subset might contain an altered germline gene repertoire that was biased by antigen selection. Although more narrowly focused work was previously able to identify differential JH gene use in memory subsets,10 this study only analyzed the fraction of sequences that contain the VH6 gene segment and not the total repertoire. A study using a larger sequence pool was able to identify differences in both JH gene and VH gene family use in naïve and memory subsets, with the memory population displaying an increase in JH4 gene use, a corresponding decrease in JH6 gene use and differential use of several VH gene families.11 Recently, there have been several studies that leveraged high-throughput amplicon sequencing to perform in-depth analysis of human antibody repertoires.11, 12, 13, 14, 15 These previous studies have been limited in scope, however, owing to analysis of the use of V, D or J genes in isolation. In the peripheral blood antibody repertoire, individual V, D and J genes are not expressed in isolation, but are linked by recombination. Thus, it is imperative to study gene segment usage not only by individual gene segment use, but also in the context of complete V(D)J pairings to gain a more complete understanding of the antibody repertoire. Here, we present a thorough examination of V(D)J recombinants in the human peripheral blood repertoire. The studies reveal stark repertoire differences between circulating B-cell subsets and provide genetic evidence for global control of repertoire diversity in both naïve and memory subsets.


Results and Discussion

We separately isolated naïve, IgM memory and IgG memory B cells from four healthy individuals using flow cytometric sorting, extracted mRNA and performed RT-PCR to amplify antibody genes from those cells, and subjected the resulting amplicons to high-throughput DNA sequencing. The primers were selected for their ability to produce accurate, reproducible amplification of both naïve and mutated antibody repertoires,12, 13 and the variable gene use of our repertoire closely matched repertoire analysis in which amplification was performed on single B cells.9, 16 After selecting only high-quality antibody sequences, we obtained a total of 294232 naïve cell sequences, 161313 IgM memory cell sequences and 94841 IgG memory cell sequences.

We analyzed the V(D)J recombinant repertoire in each of the three B-cell subsets, and created Circos plots showing the relative prominence of each V(D)J recombination within the repertoire of each cell subset (Figures 1a–c). These plots revealed a large number of trends that were apparent only when analyzing the repertoire in the context of complete V(D)J recombinations. Here, we focus on three of the most prominent trends.

Figure 1.

Circos diagrams display linked VH, D and JH gene use in naïve, IgM memory or IgG memory B-cell subsets. Circos plots were generated for (a) naïve, (b) IgM memory or (c) IgG memory repertoires. On the right half of each plot (colored segments), each heavy chain variable gene family is shown in a separate color. The arc length of each segment corresponds to the relative frequency of each heavy chain variable (VH) gene family in the identified B-cell subset. The smaller ticks on the outer ring represent 1000 sequences, with major ticks marking 5000 sequences. The left half of each plot (black segments) shows each heavy chain joining (JH) gene segment. The arc length of each segment corresponds to the frequency of JH gene use in combination with the given VH gene within the identified subset. The scale of arc lengths on the JH gene side of the diagram is 1/4 that of the arcs on the VH gene side. Just inside the tick ring on the VH gene side of the diagram, segmented arcs identify the individual VH genes within each VH gene family. Within each plot, pairing of JH genes to VH gene segments is represented by colored ribbons. The color of the ribbon indicates VH gene use; increased ribbon width and color intensity correspond to increased frequency of the represented VH–JH pairing. Just outside the VH–JH links on the VH gene side of the diagram, a stacked histogram indicates D gene use for each VH–JH pairing. Diversity gene families D1, D2, D3, D4, D5, D6 and D7 are plotted in increasingly darker shades of gray (D1 is closest to the center in lightest gray; D7 is furthest from the center in darkest gray). (d) The contribution of the fifty most common V(D)J recombinations for each subset is plotted. Bars indicate mean±s.e.m. The P-values for pairwise comparisons were determined using a two-tailed Student’s t-test.

First, virtually all of the major VH–JH pairings (identified by colored ribbons in Figure 1) follow a similar pattern: increased use of VH–JH pairings that contained heavy chain joining gene 4 (JH4) and decreased use of pairings that contained JH5 or JH6 in both memory subsets, as compared with naïve cells. Use of JH4 has been shown previously to be increased in memory subsets,11 but the consistency with which the broad spectrum of VH–JH pairings exhibited increased JH4 use is surprising. Second, use of diversity gene family 3 (D3) was increased dramatically in recombinations that used heavy chain variable gene 3–30 (VH3–30) and either JH4 or JH5 in both naïve and IgG memory subsets. In the IgM memory cell subset, however, diversity gene use in recombinations that used VH3–30 and JH4 or JH5 was much lower than in the naïve or IgG memory subsets and was comparable to that of other genes in the VH3 family. These data support emerging evidence that the IgM memory repertoire is genetically distinct from the IgG memory repertoire and that this difference is likely the result of different stimuli.9, 11 Finally, the three Circos plots reveal the increased oligoclonality of both memory subsets compared with naïve. The colored ribbons in each plot represent VH–JH pairings that comprise at least 1% of the total subset repertoire. In the naïve subset, there are 66 different VH–JH pairings that each account for at least 1% of the total naïve repertoire. In the IgM memory subset, only 27 different VH–JH pairings exceed 1% of the total subset repertoire, and only 19 VH–JH pairings exceed 1% in the IgG memory subset. These data indicate that the memory subsets become increasingly oligoclonal, with a small selection of VH–JH pairings comprising a larger fraction of the total subset repertoire.
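
The 1% threshold used above to define a major VH–JH pairing can be illustrated with a minimal sketch. It assumes each sequence has already been reduced to a (VH, D, JH) gene assignment; the function name, the record layout and the toy counts are hypothetical and are not taken from the study.

```python
from collections import Counter

def pairings_above_threshold(records, threshold=0.01):
    """Return {(vh, jh): fraction} for pairings at or above `threshold`.

    `records` holds one (vh_gene, d_gene, jh_gene) tuple per sequence.
    """
    # Collapse each V(D)J assignment to its VH-JH pairing and count it.
    counts = Counter((vh, jh) for vh, _d, jh in records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        pair: n / total
        for pair, n in counts.items()
        if n / total >= threshold
    }

# Toy input (hypothetical counts, not data from the study).
toy_records = (
    [("VH3-23", "D3", "JH4")] * 120
    + [("VH3-30", "D3", "JH4")] * 78
    + [("VH1-69", "D2", "JH6")] * 1
    + [("VH4-34", "D6", "JH5")] * 1
)
major = pairings_above_threshold(toy_records)
print(len(major), "pairings at or above 1% of the toy repertoire")
```

Counting the keys of the returned dictionary for each subset would give the kind of tally reported above (66, 27 and 19 pairings), although the exact procedure used to produce those numbers is not described in the text.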

Further analysis of the variable gene use in each of the subsets revealed contrasts with recently published work on the antibody repertoire in human cord blood.15 In the cord blood repertoire, VH1–2 was found to be the most commonly used germline gene. In the naïve, IgM memory and IgG memory subsets, not only was VH1–2 not the most commonly used variable gene, it was not even the most commonly used VH1 family gene. Both VH1–18 and VH1–69 were more frequently used in the naïve and IgG memory populations and VH1–18 was used more frequently in the IgM memory population. In all three peripheral blood subsets, either VH3–23 (naïve and IgM memory) or VH3–30 (IgG memory) was the most commonly used variable gene, which is consistent with previous data.11, 12, 13, 14 Joining gene use was consistent between our sample and the previously published cord blood repertoire, however, with JH4 being the most commonly used joining gene in all peripheral blood subsets and cord blood.

The oligoclonality of each subset was further analyzed by determining the contribution of the 50 most commonly used V(D)J recombinations to the total repertoire of each subset (Figure 1d). In the IgG memory subset, which was most oligoclonal, the top 50 V(D)J recombinations accounted for 38.2% of the subset repertoire. Conversely, in the naïve and IgM memory subsets, which were significantly less oligoclonal than the IgG memory subset, the top 50 V(D)J recombinations accounted for only 19.6% (naïve, P=0.01) or 23.6% (IgM memory, P=0.02) of the total subset repertoire. Additionally, there was a trend toward reduced oligoclonality in the naïve subset compared with IgM memory that fell short of statistical significance (P=0.15).
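
As a rough illustration of this oligoclonality measure, the sketch below computes the share of a repertoire taken up by its N most common V(D)J recombinations. It assumes one (VH, D, JH) tuple per sequence; the function name and the toy records are hypothetical, and the per-donor averaging and t-test used for Figure 1d are not reproduced here.

```python
from collections import Counter

def top_n_contribution(records, n=50):
    """Fraction of all sequences that use the n most common recombinations."""
    counts = Counter(records)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no sequences supplied")
    # most_common(n) returns up to n (recombination, count) pairs.
    return sum(count for _, count in counts.most_common(n)) / total

# Toy repertoire of 10 sequences (hypothetical, not data from the study).
toy_repertoire = (
    [("VH3-30", "D3", "JH4")] * 5
    + [("VH3-23", "D3", "JH4")] * 3
    + [("VH1-18", "D2", "JH6")]
    + [("VH1-69", "D1", "JH5")]
)
share = top_n_contribution(toy_repertoire, n=2)
print(f"top 2 recombinations: {share:.1%} of the toy repertoire")
```

A higher value indicates a more oligoclonal repertoire, in the sense in which 38.2% for IgG memory is contrasted with 19.6% for naïve cells above.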

We next performed a clustering analysis on the V(D)J recombinant repertoire for each donor and subset (Figure 2). We found patterns that were robust to different clustering metrics and linkage types. Specifically, we performed hierarchical bi-clustering with the Euclidean and Pearson correlation metric in combination with single linkage, complete linkage and average linkage. In all scenarios, the samples clustered in the categories shown in Figure 2. Prior to clustering, we used a variance filter to remove genes with very low variation across subjects. Interestingly, repertoires of the same subset from different donors (inter-donor subsets) were more closely related than different subset repertoires from the same donor (intra-donor subsets). In fact, phylogenetic clustering of the subset repertoires of all four donors revealed clustering exclusively among inter-donor subsets, with no observed intra-donor subset clustering. This finding was unexpected, as each donor has experienced a unique set and sequence of pathogen encounters, and each donor might be expected to generate unique memory repertoires appropriately skewed by prior histories of infection. Also of note, donor pairs were consistently clustered together, regardless of subset. Donors HD4 and HD5 clustered most closely in each of the three subsets, as did donors HD6 and HD7. In addition to the inter-clonal subset clustering, the tight groupings of highly over-represented V(D)J recombinations within each individual memory repertoire provide further evidence of the oligoclonality of the memory subsets. The unique V(D)J recombinations that comprise the tight groupings are not shared between like subsets of different donors and, surprisingly, similar groupings are not present in the naïve subset from the same donors. 
This finding indicates that although the frequency of germline gene family use may appear similar between naïve and memory populations when the VH, D or JH families are analyzed in isolation, deeper analysis of the V(D)J recombinant repertoires of these subsets uncovers stark repertoire differences.

Figure 2.

Clustergram of V(D)J recombinants reveals global control of expressed antibody repertoires in B-cell subsets. The frequency of each V(D)J recombination was determined for the naïve, IgM memory or IgG memory subsets for four healthy donors, and a clustergram was created. V(D)J recombinants were clustered by relative frequency in each donor and subset, and the resulting phylogenetic tree is shown on the left. Results for three B-cell subsets from each donor were clustered by the V(D)J repertoire, and the resulting phylogenetic tree is shown at the top. The frequency variation for each V(D)J recombination across all subsets was determined, and standardized to a range of −3 to 3. The colorimetric scale for the heat map is shown.

The substantial differences in each subset repertoire at the individual V(D)J recombinant level, coupled with the overarching similarities of the repertoires at the germline gene family level, present something of a paradox. These data seem to suggest the presence of a broad, global mechanism for repertoire regulation at the germline gene family level. Although there is no direct evidence of such a mechanism, several recent studies indicate the presence of repertoire-based regulatory mechanisms in circulating B-cell subsets. Despite the tendency of pathogen-specific antibody responses to exhibit biased germline gene repertoires,16, 17, 18 the frequency of gene family use in naïve and memory subsets is remarkably consistent across individuals.9, 11 Further, alteration of this gene family homeostasis in the circulating B-cell repertoire is associated with disease states.19, 20, 21, 22 In recent work, long-lived plasma cells were shown to contain significantly fewer autoreactive B cells than the circulating IgG memory subset, and this difference was attributed to differential repertoire regulation within the two subsets.23 Thus, although no mechanism for regulation of circulating human antibody repertoires has been identified, mounting indirect evidence, including the data presented in this report, suggests the presence of such regulation.

In summary, this report presents a deep analysis of human antibody repertoire diversity at the V(D)J recombinant level. The data reveal a strong tendency for memory subsets to become substantially more oligoclonal than naïve subsets. The studies also show the near universality with which the memory subsets display greater use of JH4 and reduced use of JH5 and JH6 across virtually all VH–JH pairings. Finally, we identified antibody repertoire similarities between inter-donor circulating B-cell subsets that outweigh the similarities between intra-donor subsets, suggesting the existence of a mechanism for broad-based repertoire control.


Materials and methods

Sample preparation and sorting

Peripheral blood was obtained from healthy adult donors following informed consent, under a protocol approved by the Vanderbilt Institutional Review Board. Mononuclear cells from the blood of four donors were isolated by density gradient centrifugation with Histopaque 1077 (Sigma-Aldrich, St Louis, MO, USA). Prior to staining, B cells were enriched by paramagnetic separation using microbeads conjugated with antibodies to CD19 (Miltenyi Biotec, Cambridge, MA, USA). Cells from particular B-cell subsets were sorted as separate populations on a high-speed sorting cytometer (FACSAria III; Becton Dickinson, Franklin Lakes, NJ, USA) using the following phenotypic markers: naïve B cells, CD19+/CD27−/IgM+/IgG−/CD14−/CD3−; IgM memory B cells, CD19+/CD27+/IgM+/IgG−/CD14−/CD3−; and IgG memory B cells, CD19+/CD27+/IgM−/IgG+/CD14−/CD3−. Total RNA was isolated from each sorted cell subset using a commercial RNA extraction kit (RNeasy; Qiagen, Valencia, CA, USA) and stored at −80°C until analysis.
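
The sorting gates can be summarized as a small lookup table. The sketch below is only an illustration of the gating logic: it treats each marker as simply positive (True) or negative (False), which is an assumption, since real gates are set on continuous fluorescence intensities. The names `GATES` and `classify_cell` are hypothetical.

```python
# Marker state per sorted subset: True = positive, False = negative.
# Assumption: naive cells are CD27-negative and every sorted population is
# negative for CD14 and CD3, as in standard B-cell gating.
GATES = {
    "naive": {"CD19": True, "CD27": False, "IgM": True, "IgG": False,
              "CD14": False, "CD3": False},
    "IgM memory": {"CD19": True, "CD27": True, "IgM": True, "IgG": False,
                   "CD14": False, "CD3": False},
    "IgG memory": {"CD19": True, "CD27": True, "IgM": False, "IgG": True,
                   "CD14": False, "CD3": False},
}

def classify_cell(markers):
    """Return the subset whose gate matches `markers`, or None if none does."""
    for subset, gate in GATES.items():
        if all(markers.get(name) == state for name, state in gate.items()):
            return subset
    return None

# A hypothetical cell that is CD19+ CD27+ IgM- IgG+ CD14- CD3-.
cell = {"CD19": True, "CD27": True, "IgM": False, "IgG": True,
        "CD14": False, "CD3": False}
print(classify_cell(cell))
```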

Complementary DNA synthesis and PCR amplification of antibody genes

A total of 100 ng of each total RNA sample and 10 pmol of each RT-PCR primer (Supplementary Table S1) were used in duplicate 50 μl RT-PCR reactions using the OneStep RT-PCR system (Qiagen). Thermal cycling was performed in a BioRad (Hercules, CA, USA) DNA Engine PTC-0200 thermal cycler using the following protocol: 50°C for 30 min, 95°C for 15 min, 35 cycles of (94°C for 45 s, 58°C for 45 s, 72°C for 2 min), 72°C for 10 min. A total of 5 μl of each pooled RT-PCR reaction, 20 pmol of 454-adapter primers and 0.25 units of AmpliTaq Gold polymerase (Applied Biosystems, Carlsbad, CA, USA) were used for each 454-Adapter PCR reaction, performed in quadruplicate. Thermal cycling was done as before, but for 10 cycles.

Amplicon purification and quantification

Amplicons were purified from the pooled 454-adapter PCR reactions using the Agencourt AMPure XP system (Beckman Coulter Genomics, Danvers, MA, USA). Purified amplicons were quantified using a Qubit fluorometer (Invitrogen, Grand Island, NY, USA).

Amplicon nucleotide sequence analysis

Quality control of the amplicon libraries and emulsion-based clonal amplification and sequencing on the 454 Genome Sequencer FLX Titanium system were performed by the W. M. Keck Center for Comparative and Functional Genomics at the University of Illinois at Urbana-Champaign, according to the manufacturer’s instructions (454 Life Sciences, Branford, CT, USA). Signal processing and base calling were performed using the bundled 454 Data Analysis Software version 2.5.3 for amplicons.

Antibody sequence analysis

For germline gene assignments and initial analysis, the FASTA files resulting from 454 sequencing were submitted to the IMGT HighV-Quest webserver (IMGT, the international ImMunoGeneTics information system; http://www.imgt.org; founder and director: Marie-Paule Lefranc, Montpellier, France). Antibody sequences returned from IMGT were considered to be ‘high-quality’ sequences if they met the following requirements: sequence length of at least 300 nt; identified variable and joining genes; an intact, in-frame recombination; and absence of stop codons or ambiguous nucleotide calls within the reading frame.
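
The ‘high-quality’ criteria listed above amount to a simple filter over annotated sequences. The following is a minimal sketch of such a filter, assuming the annotation for each sequence has already been parsed into a record; the `AnnotatedSequence` fields are hypothetical names chosen for illustration and do not correspond to actual IMGT/HighV-QUEST output columns.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AnnotatedSequence:
    nt_sequence: str           # nucleotides within the reading frame
    v_gene: Optional[str]      # assigned variable gene, or None
    j_gene: Optional[str]      # assigned joining gene, or None
    in_frame: bool             # intact, in-frame recombination
    has_stop_codon: bool       # stop codon within the reading frame

def is_high_quality(seq, min_length=300):
    """Apply the four stated criteria to one annotated sequence."""
    # Any character other than A, C, G or T counts as an ambiguous call.
    unambiguous = set(seq.nt_sequence.upper()) <= set("ACGT")
    return (
        len(seq.nt_sequence) >= min_length
        and bool(seq.v_gene)
        and bool(seq.j_gene)
        and seq.in_frame
        and not seq.has_stop_codon
        and unambiguous
    )

# Hypothetical records for illustration only.
good = AnnotatedSequence("ACGT" * 75, "IGHV3-23", "IGHJ4", True, False)
too_short = AnnotatedSequence("ACGT" * 74, "IGHV3-23", "IGHJ4", True, False)
ambiguous = AnnotatedSequence("ACGN" * 75, "IGHV3-23", "IGHJ4", True, False)
print([is_high_quality(s) for s in (good, too_short, ambiguous)])
```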

Clustering of antibody repertoires

We performed agglomerative hierarchical clustering with complete linkage on both VDJ genes and individual donor subsets. First, we applied a filter that removed VDJ genes with low counts or low variation across all samples. We then calculated pairwise distances between genes and between samples using Pearson correlation. We standardized the values in the heat map to display in the range −3 to +3. Dendrograms and heat maps were created with Matlab R2010b.
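
The clustering steps can be sketched in plain Python as follows. This is not the Matlab analysis itself: the function names and toy frequencies are hypothetical, the distance is taken to be one minus the Pearson correlation, and the display standardization is assumed to be a per-row z-score clipped to ±3. Each of these is an assumption about details the text does not spell out.

```python
import math
from statistics import mean, pstdev, pvariance

def variance_filter(rows, min_variance):
    """Keep only rows whose variance across samples exceeds min_variance."""
    return {k: v for k, v in rows.items() if pvariance(v) > min_variance}

def pearson_distance(x, y):
    """Distance defined as 1 - Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("constant profile has no defined correlation")
    return 1.0 - cov / (sx * sy)

def complete_linkage(profiles):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, distance)."""
    clusters = [(label,) for label in profiles]
    merges = []

    def dist(c1, c2):
        # Complete linkage: distance between the two farthest members.
        return max(pearson_distance(profiles[a], profiles[b])
                   for a in c1 for b in c2)

    while len(clusters) > 1:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j], dist(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

def standardize_row(row, limit=3.0):
    """Z-score a row, then clip to [-limit, +limit] for display."""
    m, s = mean(row), pstdev(row)
    if s == 0:
        return [0.0] * len(row)
    return [max(-limit, min(limit, (v - m) / s)) for v in row]

# Toy sample profiles (hypothetical frequencies, not data from the study).
toy_profiles = {
    "HD4_naive": [1, 2, 3, 4],
    "HD5_naive": [1, 2, 3, 5],
    "HD4_IgG": [4, 3, 2, 1],
    "HD5_IgG": [6, 3, 2, 1],
}
merges = complete_linkage(toy_profiles)
for a, b, d in merges:
    print(a, "+", b, f"at distance {d:.3f}")
```

In this toy input the two naïve profiles merge first and the two IgG profiles second, which mimics the inter-donor grouping described in the Results. The toy values were chosen to produce that outcome and carry no evidential weight.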

Analysis of differential expression of V(D)J recombinants

We used the edgeR software24 to calculate differential expression between B-cell subsets. edgeR models count data with the negative binomial distribution. We obtained dispersion estimates and tested differential expression using the generalized linear model likelihood ratio test. The columns in the table show the fold change between subsets, the P-value and the Benjamini and Hochberg false discovery rate.
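
The false discovery rate step can be illustrated separately from edgeR. The sketch below implements the standard Benjamini and Hochberg step-up adjustment on a list of P-values; it does not reproduce the negative binomial model or the likelihood ratio test, which edgeR performs in R. The function name and the example P-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical P-values for four V(D)J recombinants.
fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print([round(q, 4) for q in fdr])
```

Each adjusted value is the smallest false discovery rate at which the corresponding recombinant would be called differentially expressed.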

Circos plots

All Circos plots were made using Circos software (http://www.circos.ca/software).


Conflict of interest

The authors declare no conflict of interest.



  1. Crotty S, Ahmed R. Immunological memory in humans. Semin Immunol 2004; 16: 197–203.
  2. Tonegawa S. Somatic generation of antibody diversity. Nature 1983; 302: 575–581.
  3. Schatz DG, Oettinger MA, Baltimore D. The V(D)J recombination activating gene, RAG-1. Cell 1989; 59: 1035–1048.
  4. Wilson PC, de Bouteiller O, Liu Y, Potter K, Banchereau J, Capra JD et al. Somatic hypermutation introduces insertions and deletions into immunoglobulin genes. J Exp Med 1998; 187: 59–70.
  5. Neuberger MS, Milstein C. Somatic hypermutation. Curr Opin Immunol 1995; 7: 248–254.
  6. Neuberger MS. Antibody diversification by somatic mutation: from Burnet onwards. Immunol Cell Biol 2008; 86: 124–132.
  7. Schroeder HW, Cavacini L. Structure and function of immunoglobulins. J Allergy Clin Immunol 2010; 125: S41–S52.
  8. Jackson SM, Wilson PC, James JA, Capra JD. Human B cell subsets. Adv Immunol 2008; 98: 151–224.
  9. Tian C, Luskin GK, Dischert KM, Higginbotham JN, Shepherd BE, Crowe JE. Evidence for preferential Ig gene usage and differential TdT and exonuclease activities in human naïve and memory B cells. Mol Immunol 2007; 44: 2173–2183.
  10. Rosner K, Winter DB, Tarone RE, Skovgaard GL, Bohr VA, Gearhart PJ. Third complementarity-determining region of mutated VH immunoglobulin genes contains shorter V, D, J, P, and N components than non-mutated genes. Immunology 2001; 103: 179–187.
  11. Wu Y-C, Kipling D, Leong HS, Martin V, Ademokun AA, Dunn-Walters DK. High-throughput immunoglobulin repertoire analysis distinguishes between human IgM memory and switched memory B-cell populations. Blood 2010; 116: 1070–1078.
  12. Boyd SD, Gaëta BA, Jackson KJ, Fire AZ, Marshall EL, Merker JD et al. Individual variation in the germline Ig gene repertoire inferred from variable region gene rearrangements. J Immunol 2010; 184: 6986–6992.
  13. Boyd SD, Marshall EL, Merker JD, Maniar JM, Zhang LN, Sahaf B et al. Measurement and clinical monitoring of human lymphocyte clonality by massively parallel VDJ pyrosequencing. Sci Transl Med 2009; 1: 12ra23.
  14. Arnaout R, Lee W, Cahill P, Honan T, Sparrow T, Weiand M et al. High-resolution description of antibody heavy-chain repertoires in humans. PLoS ONE 2011; 6: e22365.
  15. Prabakaran P, Chen W, Singarayan MG, Stewart CC, Streaker E, Feng Y et al. Expressed antibody repertoires in human cord blood cells: 454 sequencing and IMGT/HighV-QUEST analysis of germline gene usage, junctional diversity, and somatic mutations. Immunogenetics 2011.
  16. Tian C, Luskin GK, Dischert KM, Higginbotham JN, Shepherd BE, Crowe JE. Immunodominance of the VH1-46 antibody gene segment in the primary repertoire of human rotavirus-specific B cells is reduced in the memory compartment through somatic mutation of nondominant clones. J Immunol 2008; 180: 3279–3288.
  17. Weitkamp JH, Kallewaard NL, Bowen AL, LaFleur BJ, Greenberg HB, Crowe JE. VH1-46 is the dominant immunoglobulin heavy chain gene segment in rotavirus-specific memory B cells expressing the intestinal homing receptor alpha4beta7. J Immunol 2005; 174: 3454–3460.
  18. Gorny MK, Wang X-H, Williams C, Volsky B, Revesz K, Witover B et al. Preferential use of the VH5-51 gene segment by the human immune response to code for antibodies against the V3 domain of HIV-1. Mol Immunol 2009; 46: 917–926.
  19. David D, Demaison C, Bani L, Theze J. Progressive decrease in VH3 gene family expression in plasma cells of HIV-infected patients. Int Immunol 1996; 8: 1329–1333.
  20. David D, Demaison C, Bani L, Zouali M, Theze J. Selective variations in vivo of VH3 and VH1 gene family expression in peripheral B cell IgM, IgD and IgG during HIV infection. Eur J Immunol 1995; 25: 1524–1528.
  21. Zouali M. Nonrandom features of the human immunoglobulin variable region gene repertoire expressed in response to HIV-1. Appl Biochem Biotech 1996; 61: 149–155.
  22. Scamurra RW, Miller DJ, Dahl L, Abrahamsen M, Kapur V, Wahl SM et al. Impact of HIV-1 infection on VH3 gene repertoire of naive human B cells. J Immunol 2000; 164: 5482–5491.
  23. Scheid JF, Mouquet H, Kofer J, Yurasov S, Nussenzweig MC, Wardemann H. Differential regulation of self-reactivity discriminates between IgG+ human circulating memory B cells and bone marrow plasma cells. Proc Natl Acad Sci USA 2011; 108: 18044–18048.
  24. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 2010; 26: 139–140.


We thank all patients for participating in the study. We would like to especially thank Chris L Wright and Alvaro G Hernandez at the W. M. Keck Center for Comparative and Functional Genomics at the University of Illinois at Urbana-Champaign for performing the 454 sequencing. We are grateful to the IMGT team for its helpful collaboration and the analysis of nucleotide sequences on the IMGT/HighV-QUEST web portal.

Supplementary Information accompanies the paper on Genes and Immunity website