Letter

High frequency of shared clonotypes in human B cell receptor repertoires

Nature volume 566, pages 398–402 (2019)


The human genome contains approximately 20 thousand protein-coding genes¹, but the size of the collection of antigen receptors that the adaptive immune system generates on B cells, through recombination of gene segments with non-templated junctional additions, is unknown, although it is certainly orders of magnitude larger. It has not been established whether individuals possess unique (private) repertoires or whether substantial components of their repertoires are shared (public). Here we sequence recombined and expressed B cell receptor genes in several individuals to determine the size of their B cell receptor repertoires and the extent to which these are shared between individuals. Our experiments revealed that the circulating repertoire of each individual contained between 9 and 17 million B cell clonotypes. The three individuals that we studied shared many clonotypes: between 1 and 6% of B cell heavy-chain clonotypes were shared between two subjects (0.3% of clonotypes were shared by all three), and 20 to 34% of λ or κ light chains were shared between two subjects (16 or 22% of λ or κ light chains, respectively, were shared by all three). Some of the B cell clonotypes had thousands of clones, or somatic variants, within the clonotype lineage. Although some of these shared lineages might be driven by exposure to common antigens, previous exposure to foreign antigens was not the only force that shaped the shared repertoires, as we also identified shared clonotypes in umbilical cord blood samples and all adult repertoires. The unexpectedly high prevalence of shared clonotypes in B cell repertoires, and identification of the sequences of these shared clonotypes, should enable better understanding of the role of B cell immune repertoires in health and disease.
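The sharing percentages above come from comparing sets of clonotypes between subjects. The Python sketch below shows one way to compute pairwise and three-way overlaps. It assumes a clonotype is keyed by (V gene, J gene, CDR3 amino-acid sequence), in line with the V3J clonotypes of the Extended Data, and it expresses each percentage relative to the union of the compared repertoires; both choices are illustrative assumptions rather than the authors' definitions, and the gene calls and CDR3s in the toy data are made up.

```python
from itertools import combinations

# Hypothetical clonotype key: (V gene, J gene, CDR3 amino-acid sequence).
Clonotype = tuple[str, str, str]


def sharing_summary(repertoires: dict[str, set[Clonotype]]) -> dict:
    """Count clonotypes shared by each pair of subjects and by all subjects.

    Percentages are relative to the union of the compared repertoires,
    an illustrative choice; the paper's normalization may differ.
    """
    summary = {}
    for a, b in combinations(sorted(repertoires), 2):
        shared = repertoires[a] & repertoires[b]
        union = repertoires[a] | repertoires[b]
        summary[(a, b)] = (len(shared), 100.0 * len(shared) / len(union))
    everyone = set.intersection(*repertoires.values())  # shared by all subjects
    union_all = set.union(*repertoires.values())
    summary["all"] = (len(everyone), 100.0 * len(everyone) / len(union_all))
    return summary


# Toy repertoires with made-up gene calls and CDR3s (illustration only).
reps = {
    "HIP1": {("IGHV3-23", "IGHJ4", "CAKDRGYW"), ("IGHV1-69", "IGHJ6", "CARGGYYYGMDVW")},
    "HIP2": {("IGHV3-23", "IGHJ4", "CAKDRGYW"), ("IGHV4-34", "IGHJ4", "CARGWFDPW")},
    "HIP3": {("IGHV3-23", "IGHJ4", "CAKDRGYW"), ("IGHV1-69", "IGHJ6", "CARGGYYYGMDVW")},
}
for key, (count, pct) in sharing_summary(reps).items():
    print(key, count, f"{pct:.1f}%")
```

With real repertoires of millions of clonotypes the same set operations apply; memory, not logic, becomes the constraint.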


Data availability

Sequencing data for HIP and CORD datasets have been deposited in the NCBI Sequence Read Archive under project number PRJNA511481. FASTA files for Adaptive Biotechnologies datasets used for analyses are available from https://github.com/crowelab/PyIR. Any other relevant data are available from the corresponding author upon reasonable request.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


References

  1. Ezkurdia, I. et al. Multiple evidence strands suggest that there may be as few as 19,000 human protein-coding genes. Hum. Mol. Genet. 23, 5866–5878 (2014).
  2. Zalocusky, K. A. et al. The 10,000 immunomes project: building a resource for human immunology. Cell Rep. 25, 513–522 (2018).
  3. Ye, J., Ma, N., Madden, T. L. & Ostell, J. M. IgBLAST: an immunoglobulin variable domain sequence analysis tool. Nucleic Acids Res. 41, W34–W40 (2013).
  4. Hsieh, T. C., Ma, K. H. & Chao, A. iNEXT: an R package for rarefaction and extrapolation of species diversity (Hill numbers). Methods Ecol. Evol. 7, 1451–1456 (2016).
  5. Kaplinsky, J. & Arnaout, R. Robust estimates of overall immune-repertoire diversity from high-throughput measurements on samples. Nat. Commun. 7, 11881 (2016).
  6. Trepel, F. Number and distribution of lymphocytes in man. A critical analysis. Klin. Wochenschr. 52, 511–515 (1974).
  7. DeWitt, W. S. et al. A public database of memory and naive B-cell receptor sequences. PLoS ONE 11, e0160853 (2016).
  8. Arnaout, R. et al. High-resolution description of antibody heavy-chain repertoires in humans. PLoS ONE 6, e22365 (2011).
  9. Boyd, S. D. et al. Measurement and clinical monitoring of human lymphocyte clonality by massively parallel V-D-J pyrosequencing. Sci. Transl. Med. 1, 12ra23 (2009).
  10. Correia, B. E. et al. Proof of principle for epitope-focused vaccine design. Nature 507, 201–206 (2014).
  11. Jardine, J. G. et al. HIV-1 broadly neutralizing antibody precursor B cells revealed by germline-targeting immunogen. Science 351, 1458–1463 (2016).
  12. Briney, B. et al. Tailored immunogens direct affinity maturation toward HIV neutralizing antibodies. Cell 166, 1459–1470 (2016).
  13. Crowe, J. E. Jr. Principles of broad and potent antiviral human antibodies: insights for vaccine design. Cell Host Microbe 22, 193–206 (2017).
  14. Krause, J. C. et al. Epitope-specific human influenza antibody repertoires diversify by B cell intraclonal sequence divergence and interclonal convergence. J. Immunol. 187, 3704–3711 (2011).
  15. Xu, R. et al. A recurring motif for antibody recognition of the receptor-binding site of influenza hemagglutinin. Nat. Struct. Mol. Biol. 20, 363–370 (2013).
  16. de Bourcy, C. F. A., Dekker, C. L., Davis, M. M., Nicolls, M. R. & Quake, S. R. Dynamics of the human antibody repertoire after B cell depletion in systemic sclerosis. Sci. Immunol. 2, eaan8289 (2017).
  17. Pederson, T. The immunome. Mol. Immunol. 36, 1127–1128 (1999).
  18. Briney, B. S., Willis, J. R., Finn, J. A., McKinney, B. A. & Crowe, J. E. Jr. Tissue-specific expressed antibody variable gene repertoires. PLoS ONE 9, e100839 (2014).
  19. DeKosky, B. J. et al. High-throughput sequencing of the paired human immunoglobulin heavy and light chain repertoire. Nat. Biotechnol. 31, 166–169 (2013).
  20. DeKosky, B. J. et al. In-depth determination and analysis of the human paired heavy- and light-chain antibody repertoire. Nat. Med. 21, 86–91 (2015).
  21. Crooks, G. E., Hon, G., Chandonia, J. M. & Brenner, S. E. WebLogo: a sequence logo generator. Genome Res. 14, 1188–1190 (2004).
  22. Diss, T. C., Liu, H. X., Du, M. Q. & Isaacson, P. G. Improvements to B cell clonality analysis using PCR amplification of immunoglobulin light chain genes. Mol. Pathol. 55, 98–101 (2002).
  23. Smith, K. et al. Rapid generation of fully human monoclonal antibodies specific to a vaccinating antigen. Nat. Protoc. 4, 372–384 (2009).
  24. van Dongen, J. J. et al. Design and standardization of PCR primers and protocols for detection of clonal immunoglobulin and T-cell receptor gene recombinations in suspect lymphoproliferations: report of the BIOMED-2 Concerted Action BMH4-CT98-3936. Leukemia 17, 2257–2317 (2003).
  25. Khan, T. A. et al. Accurate and predictive antibody repertoire profiling by molecular amplification fingerprinting. Sci. Adv. 2, e1501371 (2016).
  26. Andrews, S. FastQC: a quality control tool for high throughput sequence data. https://www.bioinformatics.babraham.ac.uk/projects/fastqc/.
  27. Edgar, R. C. & Flyvbjerg, H. Error filtering, pair assembly and error correction for next-generation sequencing reads. Bioinformatics 31, 3476–3482 (2015).
  28. Roehr, J. T., Dieterich, C. & Reinert, K. Flexbar 3.0 – SIMD and multicore parallelization. Bioinformatics 33, 2941–2942 (2017).
  29. Rognes, T., Flouri, T., Nichols, B., Quince, C. & Mahé, F. VSEARCH: a versatile open source tool for metagenomics. PeerJ 4, e2584 (2016).

Acknowledgements

We thank M. Mayo and A. Pruijssers for regulatory and human subjects support; G. Sapparapu and O. Koues for technical help; Y. Umareddy for assistance with R; S. B. Day for assistance with artwork; scientists at the VANTAGE core of Vanderbilt University Medical Center (VUMC), Adaptive Biotechnologies, the Genomic Services Laboratory at the Hudson Alpha Institute for Biotechnology, and D. Zhang and team at Abhelix; New England BioLabs for early access to pre-release Abseq reagents; K. Trochez and J. Janssen of the Clinical Trials Center at VUMC and staff and physicians of the Vanderbilt University Medical Center leukapheresis clinic for assistance with large-scale human cell collections; and S. Mallal and M. Pilkinton (Vanderbilt), R. Scheuermann (JCVI), and W. Koff, T. Schenkelberg and the Advisory Board of the Human Vaccines Project for helpful discussions. This work was conducted in part using the resources of the Advanced Computing Center for Research and Education (ACCRE) at Vanderbilt University and the San Diego Supercomputer Center at the University of California, San Diego. We acknowledge the use of cord blood cells procured by the National Disease Research Interchange (NDRI) with support from NIH grant U42 OD11158. This work was supported by a grant from the Human Vaccines Project, and institutional funding from Vanderbilt University Medical Center.

Reviewer information

Nature thanks R. Arnaout, F. Breden, A. McHardy and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Author information

Author notes

  1. These authors contributed equally: Cinque Soto, Robin G. Bombardi


Affiliations

  1. The Vanderbilt Vaccine Center, Vanderbilt University Medical Center, Nashville, TN, USA

    Cinque Soto, Robin G. Bombardi, Andre Branchizio, Nurgun Kose, Pranathi Matta, Pavlo Gilchuk & James E. Crowe Jr
  2. Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN, USA

    Cinque Soto & James E. Crowe Jr
  3. Chemical and Physical Biology Program, Vanderbilt University, Nashville, TN, USA

    Alexander M. Sevy & Jessica A. Finn
  4. San Diego Supercomputer Center, University of California, San Diego, San Diego, CA, USA

    Robert S. Sinkovits
  5. Department of Pathology, Microbiology and Immunology, Vanderbilt University Medical Center, Nashville, TN, USA

    James E. Crowe Jr


  1. Search for Cinque Soto in:

  2. Search for Robin G. Bombardi in:

  3. Search for Andre Branchizio in:

  4. Search for Nurgun Kose in:

  5. Search for Pranathi Matta in:

  6. Search for Alexander M. Sevy in:

  7. Search for Robert S. Sinkovits in:

  8. Search for Pavlo Gilchuk in:

  9. Search for Jessica A. Finn in:

  10. Search for James E. Crowe Jr in:


Contributions

R.G.B., C.S. and J.E.C. planned the studies. C.S., R.G.B., A.B., R.S.S., N.K., P.M., P.G., J.A.F. and A.M.S. conducted experiments. R.G.B., C.S., A.B., R.S.S., A.M.S. and J.E.C. interpreted the studies. C.S., R.G.B. and J.E.C. wrote the first draft of the paper. All authors reviewed, edited and approved the paper. J.E.C. obtained funding.

Competing interests

J.E.C. has served as a consultant for Sanofi and Pfizer, is on the Scientific Advisory Boards of CompuVax and Meissa Vaccines, is a recipient of research grants from Takeda, Sanofi and Moderna, and is founder of IDBiologics. All other authors declare no conflicts of interest.

Corresponding author

Correspondence to James E. Crowe Jr.

Extended data figures and tables

  1. Extended Data Fig. 1 Repertoire properties for immunoglobulin V3J clonotype data belonging to HIP1–HIP3.

    a, Normalized frequency histogram of HCDR3 sequence lengths belonging to immunoglobulin heavy-chain V3J clonotypes for HIP1 (left, n = 8,623,076 unique HCDR3s, with a median length of 16 amino acids), HIP2 (middle, n = 15,413,214 unique HCDR3s, with a median length of 16 amino acids) and HIP3 (right, n = 7,081,314 unique HCDR3s, with a median length of 15 amino acids). b, Normalized frequency histogram of germline divergence values for HIP1 (left), HIP2 (middle) and HIP3 (right). Germline divergence was defined as 100 per cent minus the per cent nucleotide identity between a read and its closest-matching germline variable (V) gene sequence. Median per cent germline divergence values for HIP1, HIP2 and HIP3 were 3, 0 and 2, respectively. c, Normalized frequency histogram of germline divergence values by isotype for HIP1 (left), HIP2 (middle) and HIP3 (right). The median germline divergence was 0 for all IgM datasets. All isotype data were obtained with the AbHelix sequencing method. d, Heat map of unique VH + JH recombinations in HIP1, HIP2 and HIP3. Values in each set were converted to z-scores using that set's mean and s.d. In this figure, the IGH prefix is omitted from the gene symbols for V and J genes.

  2. Extended Data Fig. 2 Extent of sharing between immunoglobulin clonotypes belonging to HIP1–HIP3.

    a, Normalized frequency histogram of HCDR3 sequence lengths belonging to V3J clonotypes from HIP1+2+3all (blue filled curve, n = 30,156,947 unique HCDR3s, with a median length of 16 amino acids) and HIP1+2+3shared (grey bins, n = 22,934 unique HCDR3s, with a median length of 13 amino acids). Medians were statistically different (two-tailed Mann–Whitney U-test, P < 2.2 × 10⁻¹⁶, α = 0.05). b, Normalized frequency histogram of HCDR3 lengths belonging to all V3DJ clonotypes from HIP1 (n = 1,750,325 unique HCDR3s, with a median length of 19 amino acids), HIP2 (n = 3,889,527 unique HCDR3s, with a median length of 19 amino acids) and HIP3 (n = 1,437,339 unique HCDR3s, with a median length of 19 amino acids). c, Cumulative distribution of normalized VDJ triple frequencies used for simulation. HIP1, n = 4,371 unique VDJ triples; HIP2, n = 4,346 unique VDJ triples; and HIP3, n = 4,370 unique VDJ triples. d, Log–log plot of experimental versus synthetic HCDR3 length frequencies. Pearson correlation coefficient r = 1.00, P < 2.2 × 10⁻¹⁶ (α = 0.05; n = 26 CDR3 length bins for each set). e, Normalized frequency histogram of V3DJ overlap counts between all three synthetic HIP distributions (n = 3,641 common clonotypes between sequenced repertoires). f, V3J clonotypes with the largest numbers of somatic variants. Numbers in parentheses denote the number of unique somatic variants associated with a V3J clonotype for HIP1, HIP2 and HIP3. g, Percentage overlaps for Igκ V3J clonotypes from the experimentally determined repertoires belonging to HIP1–HIP3. h, Percentage overlaps for Igλ V3J clonotypes from the experimentally determined repertoires belonging to HIP1–HIP3.
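Panel c refers to a cumulative distribution of normalized VDJ triple frequencies used for simulation. One standard way such a distribution can drive a simulation is inverse-transform sampling: draw a uniform number and pick the first triple whose cumulative frequency exceeds it. The Python sketch below does this; treating a triple as (V gene, D gene, J gene) and the counts shown are assumptions, and how the authors generated a synthetic HCDR3 for each sampled triple is not described in this caption and is not modelled here.

```python
import bisect
import random
from itertools import accumulate

Triple = tuple[str, str, str]  # hypothetical (V gene, D gene, J gene) key


def build_cdf(triple_counts: dict[Triple, int]) -> tuple[list[Triple], list[float]]:
    """Normalize triple counts to frequencies and accumulate them into a CDF."""
    triples = sorted(triple_counts)
    total = sum(triple_counts.values())
    cdf = list(accumulate(triple_counts[t] / total for t in triples))
    return triples, cdf


def sample_triples(triples: list[Triple], cdf: list[float], n: int,
                   rng: random.Random) -> list[Triple]:
    """Inverse-transform sampling: map each uniform draw to the first CDF entry above it."""
    picks = []
    for _ in range(n):
        i = bisect.bisect_right(cdf, rng.random())
        # Guard against floating-point rounding leaving cdf[-1] slightly below 1.
        picks.append(triples[min(i, len(triples) - 1)])
    return picks


counts = {("IGHV3-23", "IGHD3-10", "IGHJ4"): 3, ("IGHV1-69", "IGHD2-2", "IGHJ6"): 1}
triples, cdf = build_cdf(counts)
synthetic = sample_triples(triples, cdf, 10_000, random.Random(0))
print(cdf, synthetic.count(("IGHV3-23", "IGHD3-10", "IGHJ4")) / len(synthetic))
```

The standard library's `random.choices(triples, cum_weights=cdf, k=n)` performs an equivalent weighted draw; the explicit `bisect` loop is kept only to make the role of the cumulative distribution visible.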

  3. Extended Data Fig. 3 Shared immunoglobulin heavy-chain clonotypes for three cord blood samples.

    a, V3DJ clonotype overlaps from three cord blood samples, CORD1 (n = 40,480 unique V3DJ clonotypes), CORD2 (n = 66,718 unique V3DJ clonotypes) and CORD3 (n = 105,555 unique V3DJ clonotypes). b, Cumulative distribution of normalized VDJ triple frequencies for CORD1 (n = 2,273 unique VDJ triples), CORD2 (n = 2,788 unique VDJ triples) and CORD3 (n = 3,002 unique VDJ triples). c, Log–log plot of experimental versus synthetic CDR3 length frequencies. Pearson correlation coefficient r = 1.00, P < 2.2 × 10⁻¹⁶ (α = 0.05; n = 21 bins for each set). No V3DJ clonotypes had HCDR3s shorter than eight amino acids. d, Normalized frequency histogram of V3DJ overlap counts between all three synthetic CORD distributions (n = 45 common clonotypes between all three sequenced repertoires). e, V3J clonotypes identified in HIP1, HIP2 and HIP3 (HIP1+2+3all) were combined with an independently derived, publicly available set of immunoglobulin heavy-chain V3J clonotypes⁷. Starting from the combined set of 59,193,994 clonotypes from six adult immunoglobulin heavy-chain repertoires, each of the three cord blood sets was scanned in turn, and at each step only the clonotypes also present in that cord blood set were kept. A total of 130 shared V3J clonotypes was identified.
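Panel e describes a serial scan: pool the adult clonotypes, then intersect the pool with each cord blood set in turn. A minimal Python sketch of that procedure follows; the string labels are placeholders standing in for real clonotype keys such as (V gene, J gene, CDR3) tuples, and the function name is hypothetical.

```python
def shared_with_cord(adult_sets: list[set], cord_sets: list[set]) -> set:
    """Serially intersect the pooled adult clonotypes with each cord blood set."""
    shared = set().union(*adult_sets)  # combined adult clonotype set (a new set)
    for cord in cord_sets:
        shared &= cord  # keep only clonotypes also seen in this cord sample
        if not shared:
            break  # nothing left to find in later samples
    return shared


adults = [{"c1", "c2"}, {"c3"}, {"c4", "c5"}]
cords = [{"c1", "c3", "c9"}, {"c1", "c3"}, {"c3", "c7"}]
print(shared_with_cord(adults, cords))
```

Because set intersection is associative and commutative, the result does not depend on the order in which the cord blood sets are scanned; scanning serially simply lets the working set shrink, and the loop stops early once it is empty.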

  4. Extended Data Fig. 4 Schematic showing bioinformatics sequence processing.

    The flow chart shows how a typical sequencing run of paired-end Illumina reads was processed by the bioinformatics pipeline. Detailed descriptions of each program used in the pipeline can be found in Supplementary Methods.
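The pipeline steps themselves are given in Supplementary Methods and are not reproduced here. As an illustration of one quality-control step common to such pipelines, the sketch below implements expected-error filtering in the sense of Edgar and Flyvbjerg (ref. 27): a Phred score Q implies an error probability of 10^(−Q/10), and a read is kept only if the sum over its bases stays at or below a threshold. Whether the authors used this filter, and with what threshold, is not stated in this caption; the threshold of 1.0, the Phred+33 encoding and the toy reads are assumptions.

```python
from typing import Iterable, Iterator

FastqRecord = tuple[str, str, str]  # (header, sequence, quality string)


def expected_errors(quality: str, offset: int = 33) -> float:
    """Sum of per-base error probabilities implied by Phred scores."""
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality)


def filter_by_expected_errors(records: Iterable[FastqRecord],
                              max_ee: float = 1.0) -> Iterator[FastqRecord]:
    """Yield only reads whose expected number of errors is at most max_ee."""
    for header, seq, qual in records:
        if expected_errors(qual) <= max_ee:
            yield header, seq, qual


reads = [
    ("read1", "ACGT", "IIII"),  # Phred 40 at every base: about 0.0004 expected errors
    ("read2", "ACGT", "####"),  # Phred 2 at every base: about 2.5 expected errors
]
print([h for h, _, _ in filter_by_expected_errors(reads)])
```

Unlike a mean-quality cutoff, the expected-error sum grows with read length, so long reads with a few very poor bases are rejected even when their average quality looks acceptable.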

  5. Extended Data Fig. 5 Schematic showing placement of primers.

    Annotated example of a sequence obtained with the two-step barcoded library preparation protocol. The red and yellow regions show the placement of the primers used in the first and second PCR amplification steps, respectively. The cyan region shows the location of the RID-tagged, gene-specific reverse transcription primer.
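The caption mentions an RID tag introduced by the reverse transcription primer. Assuming the RID works like a unique molecular identifier, so that reads carrying the same RID derive from the same cDNA molecule, a common use is to group reads by RID and collapse each group to a consensus, which suppresses PCR and sequencing errors. The Python sketch below does that with a per-position majority vote; the RID length, its position at the start of each read and the example reads are hypothetical, and this is not presented as the authors' procedure.

```python
from collections import Counter, defaultdict


def consensus_by_rid(reads: list[str], rid_len: int) -> dict[str, str]:
    """Group reads by a leading RID of rid_len bases and return a majority-vote consensus per RID."""
    groups: dict[str, list[str]] = defaultdict(list)
    for read in reads:
        # Hypothetical layout: RID first, then the sequence of interest.
        groups[read[:rid_len]].append(read[rid_len:])
    consensus = {}
    for rid, seqs in groups.items():
        length = min(len(s) for s in seqs)  # compare only positions present in every read
        consensus[rid] = "".join(
            Counter(s[i] for s in seqs).most_common(1)[0][0] for i in range(length)
        )
    return consensus


reads = ["AAAAACGT", "AAAAACGA", "AAAAACGT", "CCCCTTTT"]
print(consensus_by_rid(reads, rid_len=4))
```

Ties at a position are broken by whichever base `Counter` encountered first, which is arbitrary; a production pipeline would typically also weigh base qualities and discard RID groups supported by too few reads.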

  6. Extended Data Table 1 Research subject demographics
  7. Extended Data Table 2 Summary of sequencing methods and cell counts
  8. Extended Data Table 3 One-step RT–PCR primers used in this study
  9. Extended Data Table 4 Two-step RT–PCR primers used in this study

Supplementary information

  1. Supplementary Information

    This file contains Supplementary Methods and References.

  2. Reporting Summary

Source data
