  • Letter

High frequency of shared clonotypes in human B cell receptor repertoires

Abstract

The human genome contains approximately 20,000 protein-coding genes (ref. 1), but the size of the collection of antigen receptors of the adaptive immune system that is generated by the recombination of gene segments with non-templated junctional additions (on B cells) is unknown, although it is certainly orders of magnitude larger. It has not been established whether individuals possess unique (or private) repertoires or substantial components of shared (or public) repertoires. Here we sequence recombined and expressed B cell receptor genes in several individuals to determine the size of their B cell receptor repertoires, and the extent to which these are shared between individuals. Our experiments revealed that the circulating repertoire of each individual contained between 9 and 17 million B cell clonotypes. The three individuals that we studied shared many clonotypes, including between 1 and 6% of B cell heavy-chain clonotypes shared between two subjects (0.3% of clonotypes shared by all three) and 20 to 34% of λ or κ light chains shared between two subjects (16 or 22% of λ or κ light chains, respectively, were shared by all three). Some of the B cell clonotypes had thousands of clones, or somatic variants, within the clonotype lineage. Although some of these shared lineages might be driven by exposure to common antigens, previous exposure to foreign antigens was not the only force that shaped the shared repertoires, as we also identified shared clonotypes in umbilical cord blood samples and all adult repertoires. The unexpectedly high prevalence of shared clonotypes in B cell repertoires, and identification of the sequences of these shared clonotypes, should enable better understanding of the role of B cell immune repertoires in health and disease.

Fig. 1: Estimates of the diversity of V3J clonotypes from three healthy adult subjects.
Fig. 2: Shared clonotypes between three healthy adult subjects.
Fig. 3: Occurrence of public V3J clonotypes that are shared in adult and cord blood repertoires.

Data availability

Sequencing data for HIP and CORD datasets have been deposited in the NCBI Sequence Read Archive under project number PRJNA511481. FASTA files for Adaptive Biotechnologies datasets used for analyses are available from https://github.com/crowelab/PyIR. Any other relevant data are available from the corresponding author upon reasonable request.

References

  1. Ezkurdia, I. et al. Multiple evidence strands suggest that there may be as few as 19,000 human protein-coding genes. Hum. Mol. Genet. 23, 5866–5878 (2014).

  2. Zalocusky, K. A. et al. The 10,000 immunomes project: building a resource for human immunology. Cell Rep. 25, 513–522 (2018).

  3. Ye, J., Ma, N., Madden, T. L. & Ostell, J. M. IgBLAST: an immunoglobulin variable domain sequence analysis tool. Nucleic Acids Res. 41, W34–W40 (2013).

  4. Hsieh, T. C., Ma, K. H. & Chao, A. iNEXT: an R package for rarefaction and extrapolation of species diversity (Hill numbers). Methods Ecol. Evol. 7, 1451–1456 (2016).

  5. Kaplinsky, J. & Arnaout, R. Robust estimates of overall immune-repertoire diversity from high-throughput measurements on samples. Nat. Commun. 7, 11881 (2016).

  6. Trepel, F. Number and distribution of lymphocytes in man. A critical analysis. Klin. Wochenschr. 52, 511–515 (1974).

  7. DeWitt, W. S. et al. A public database of memory and naive B-cell receptor sequences. PLoS ONE 11, e0160853 (2016).

  8. Arnaout, R. et al. High-resolution description of antibody heavy-chain repertoires in humans. PLoS ONE 6, e22365 (2011).

  9. Boyd, S. D. et al. Measurement and clinical monitoring of human lymphocyte clonality by massively parallel V-D-J pyrosequencing. Sci. Transl. Med. 1, 12ra23 (2009).

  10. Correia, B. E. et al. Proof of principle for epitope-focused vaccine design. Nature 507, 201–206 (2014).

  11. Jardine, J. G. et al. HIV-1 broadly neutralizing antibody precursor B cells revealed by germline-targeting immunogen. Science 351, 1458–1463 (2016).

  12. Briney, B. et al. Tailored immunogens direct affinity maturation toward HIV neutralizing antibodies. Cell 166, 1459–1470 (2016).

  13. Crowe, J. E. Jr. Principles of broad and potent antiviral human antibodies: insights for vaccine design. Cell Host Microbe 22, 193–206 (2017).

  14. Krause, J. C. et al. Epitope-specific human influenza antibody repertoires diversify by B cell intraclonal sequence divergence and interclonal convergence. J. Immunol. 187, 3704–3711 (2011).

  15. Xu, R. et al. A recurring motif for antibody recognition of the receptor-binding site of influenza hemagglutinin. Nat. Struct. Mol. Biol. 20, 363–370 (2013).

  16. de Bourcy, C. F. A., Dekker, C. L., Davis, M. M., Nicolls, M. R. & Quake, S. R. Dynamics of the human antibody repertoire after B cell depletion in systemic sclerosis. Sci. Immunol. 2, eaan8289 (2017).

  17. Pederson, T. The immunome. Mol. Immunol. 36, 1127–1128 (1999).

  18. Briney, B. S., Willis, J. R., Finn, J. A., McKinney, B. A. & Crowe, J. E. Jr. Tissue-specific expressed antibody variable gene repertoires. PLoS ONE 9, e100839 (2014).

  19. DeKosky, B. J. et al. High-throughput sequencing of the paired human immunoglobulin heavy and light chain repertoire. Nat. Biotechnol. 31, 166–169 (2013).

  20. DeKosky, B. J. et al. In-depth determination and analysis of the human paired heavy- and light-chain antibody repertoire. Nat. Med. 21, 86–91 (2015).

  21. Crooks, G. E., Hon, G., Chandonia, J. M. & Brenner, S. E. WebLogo: a sequence logo generator. Genome Res. 14, 1188–1190 (2004).

  22. Diss, T. C., Liu, H. X., Du, M. Q. & Isaacson, P. G. Improvements to B cell clonality analysis using PCR amplification of immunoglobulin light chain genes. Mol. Pathol. 55, 98–101 (2002).

  23. Smith, K. et al. Rapid generation of fully human monoclonal antibodies specific to a vaccinating antigen. Nat. Protoc. 4, 372–384 (2009).

  24. van Dongen, J. J. et al. Design and standardization of PCR primers and protocols for detection of clonal immunoglobulin and T-cell receptor gene recombinations in suspect lymphoproliferations: report of the BIOMED-2 Concerted Action BMH4-CT98-3936. Leukemia 17, 2257–2317 (2003).

  25. Khan, T. A. et al. Accurate and predictive antibody repertoire profiling by molecular amplification fingerprinting. Sci. Adv. 2, e1501371 (2016).

  26. Andrews, S. FastQC: a quality control tool for high throughput sequence data. https://www.bioinformatics.babraham.ac.uk/projects/fastqc/.

  27. Edgar, R. C. & Flyvbjerg, H. Error filtering, pair assembly and error correction for next-generation sequencing reads. Bioinformatics 31, 3476–3482 (2015).

  28. Roehr, J. T., Dieterich, C. & Reinert, K. Flexbar 3.0 – SIMD and multicore parallelization. Bioinformatics 33, 2941–2942 (2017).

  29. Rognes, T., Flouri, T., Nichols, B., Quince, C. & Mahé, F. VSEARCH: a versatile open source tool for metagenomics. PeerJ 4, e2584 (2016).

Acknowledgements

We thank M. Mayo and A. Pruijssers for regulatory and human subjects support; G. Sapparapu and O. Koues for technical help; Y. Umareddy for assistance with R; S. B. Day for assistance with artwork; scientists at the VANTAGE core of Vanderbilt University Medical Center (VUMC), Adaptive Biotechnologies, the Genomic Services Laboratory at the Hudson Alpha Institute for Biotechnology, and D. Zhang and team at Abhelix; New England BioLabs for early access to pre-release Abseq reagents; K. Trochez and J. Janssen of the Clinical Trials Center at VUMC and staff and physicians of the Vanderbilt University Medical Center leukapheresis clinic for assistance with large-scale human cell collections; and S. Mallal and M. Pilkinton (Vanderbilt), R. Scheuermann (JCVI), and W. Koff, T. Schenkelberg and the Advisory Board of the Human Vaccines Project for helpful discussions. This work was conducted in part using the resources of the Advanced Computing Center for Research and Education (ACCRE) at Vanderbilt University and the San Diego Supercomputer Center at the University of California, San Diego. We acknowledge the use of cord blood cells procured by the National Disease Research Interchange (NDRI) with support from NIH grant U42 OD11158. This work was supported by a grant from the Human Vaccines Project, and institutional funding from Vanderbilt University Medical Center.

Reviewer information

Nature thanks R. Arnaout, F. Breden, A. McHardy and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Author information

Contributions

R.G.B., C.S. and J.E.C. planned the studies. C.S., R.G.B., A.B., R.S.S., N.K., P.M., P.G., J.A.F. and A.M.S. conducted experiments. R.G.B., C.S., A.B., R.S.S., A.M.S. and J.E.C. interpreted the studies. C.S., R.G.B. and J.E.C. wrote the first draft of the paper. All authors reviewed, edited and approved the paper. J.E.C. obtained funding.

Corresponding author

Correspondence to James E. Crowe Jr.

Ethics declarations

Competing interests

J.E.C. has served as a consultant for Sanofi and Pfizer, is on the Scientific Advisory Boards of CompuVax and Meissa Vaccines, is a recipient of research grants from Takeda, Sanofi and Moderna, and is founder of IDBiologics. All other authors declare no conflicts of interest.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Repertoire properties for immunoglobulin V3J clonotype data belonging to HIP1–HIP3.

a, Normalized frequency histogram of HCDR3 sequence lengths belonging to immunoglobulin heavy-chain V3J clonotypes for HIP1 (left, n = 8,623,076 unique HCDR3s, with a median length of 16 amino acids), HIP2 (middle, n = 15,413,214 unique HCDR3s, with a median length of 16 amino acids) and HIP3 (right, n = 7,081,314 unique HCDR3s, with a median length of 15 amino acids). b, Normalized frequency histogram of germline divergence values for HIP1 (left), HIP2 (middle) and HIP3 (right). Germline divergence was defined as 100 per cent minus the per cent nucleotide identity that a read had with its closest matching germline variable (V) gene sequence. Median per cent germline divergence values for HIP1, HIP2 and HIP3 were 3, 0 and 2, respectively. c, Normalized frequency histogram of germline divergence values by isotype for HIP1 (left), HIP2 (middle) and HIP3 (right). The median germline divergence was 0 for all IgM datasets. All isotype data were obtained from the AbHelix sequencing method. d, Heat map representation of unique VH + JH recombinations in HIP1, HIP2 and HIP3. The data from each set were transformed to obtain z-scores, using the mean and s.d. In this figure, the IGH prefix is omitted from the gene symbols for V and J genes.
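
The two transformations named in this legend (germline divergence and z-scoring) can be sketched in a few lines. This is a minimal Python illustration, not the authors' pipeline: the function names and toy values are hypothetical, and the use of the population s.d. is an assumption, because the legend does not say whether the sample or the population s.d. was used.

```python
from statistics import mean, median, pstdev


def germline_divergence(percent_identity):
    """Return 100 minus the per cent nucleotide identity to the closest germline V gene."""
    if not 0.0 <= percent_identity <= 100.0:
        raise ValueError("per cent identity must lie between 0 and 100")
    return 100.0 - percent_identity


def z_scores(values):
    """Standardize values with their mean and s.d. (population s.d. is an assumption)."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # All values identical: return zeros rather than divide by zero.
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


# Hypothetical per-read identities (per cent) to the closest germline V gene.
identities = [100.0, 97.0, 98.5]
divergences = [germline_divergence(p) for p in identities]
median_divergence = median(divergences)

# Hypothetical counts of unique VH + JH recombinations for one subject.
recombination_counts = [10, 20, 30]
standardized = z_scores(recombination_counts)
```

Standardizing each subject's counts separately, as the legend describes ("the data from each set"), puts repertoires of very different sizes on a common scale for the heat map.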

Extended Data Fig. 2 Extent of sharing between immunoglobulin clonotypes belonging to HIP1–HIP3.

a, Normalized frequency histogram of HCDR3 sequence lengths belonging to V3J clonotypes from HIP1+2+3_all (blue filled curve, n = 30,156,947 unique HCDR3s, with a median length of 16 amino acids) and HIP1+2+3_shared (grey bins, n = 22,934 unique HCDR3s, with a median length of 13 amino acids). Medians were statistically different, based on a two-tailed Mann–Whitney U-test with a P < 2.2 × 10^−16 (at an α = 0.05). b, Normalized frequency histogram of HCDR3 lengths belonging to all V3DJ clonotypes from HIP1 (n = 1,750,325 unique HCDR3s, with a median length of 19 amino acids), HIP2 (n = 3,889,527 unique HCDR3s, with a median length of 19 amino acids) and HIP3 (n = 1,437,339 unique HCDR3s, with a median length of 19 amino acids). c, Cumulative distribution of normalized VDJ triple frequencies used for simulation. HIP1, n = 4,371 unique VDJ triples; HIP2, n = 4,346 unique VDJ triples; and HIP3, n = 4,370 unique VDJ triples. d, log–log frequency plot between experimental and synthetic HCDR3 lengths. The Pearson correlation coefficient r = 1.00 with a P < 2.2 × 10^−16 (at an α = 0.05) (n = 26 CDR3 length bins for each set). e, Normalized frequency histogram of V3DJ overlap counts between all three synthetic HIP distributions (n = 3,641 common clonotypes between sequenced repertoires). f, V3J clonotypes with the largest numbers of somatic variants. Numbers in parentheses denote counts for the number of unique somatic variants associated with a V3J clonotype for HIP1, HIP2 and HIP3. g, Percentage overlaps for the Igκ V3J clonotypes from the experimentally determined repertoires belonging to HIP1–HIP3. h, Percentage overlaps for Igλ V3J clonotypes from the experimentally determined repertoires belonging to HIP1–HIP3.
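
The percentage overlaps in panels g and h come down to set intersections between repertoires. The sketch below is illustrative only and rests on two assumptions that this page does not confirm: a V3J clonotype is keyed here by V gene, J gene and CDR3 amino-acid sequence, and each percentage is taken relative to a subject's own repertoire size. The function name, dictionary keys and toy sequences are hypothetical.

```python
from itertools import combinations


def sharing_summary(repertoires):
    """Count shared clonotypes for every pair of subjects and across all subjects.

    `repertoires` maps a subject label to a set of clonotype keys.
    """
    summary = {}
    for a, b in combinations(sorted(repertoires), 2):
        shared = len(repertoires[a] & repertoires[b])
        size_a, size_b = len(repertoires[a]), len(repertoires[b])
        summary[(a, b)] = {
            "shared": shared,
            # Denominator is an assumption: each subject's own repertoire size.
            "pct_of_first": 100.0 * shared / size_a if size_a else 0.0,
            "pct_of_second": 100.0 * shared / size_b if size_b else 0.0,
        }
    # Clonotypes present in every repertoire.
    summary["all"] = len(set.intersection(*repertoires.values()))
    return summary


# Toy clonotype keys: (V gene, J gene, CDR3 amino acids); sequences are made up.
c1 = ("IGKV1-39", "IGKJ1", "CQQSYSTPWTF")
c2 = ("IGKV3-20", "IGKJ2", "CQQYGSSPYTF")
c3 = ("IGKV4-1", "IGKJ4", "CQQYYSTPLTF")
c4 = ("IGKV3-15", "IGKJ1", "CQQYNNWPWTF")
c5 = ("IGKV1-5", "IGKJ2", "CQQYNSYSYTF")
c6 = ("IGKV2-28", "IGKJ5", "CMQALQTPITF")

toy = {"HIP1": {c1, c2, c3, c4}, "HIP2": {c1, c2, c5}, "HIP3": {c1, c6}}
result = sharing_summary(toy)
```

Because the percentage depends on which repertoire supplies the denominator, reporting both values for each pair avoids ambiguity when the repertoires differ in size.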

Extended Data Fig. 3 Shared immunoglobulin heavy-chain clonotypes for three cord blood samples.

a, V3DJ clonotype overlaps from three cord blood samples, CORD1 (n = 40,480 unique V3DJ clonotypes), CORD2 (n = 66,718 unique V3DJ clonotypes) and CORD3 (n = 105,555 unique V3DJ clonotypes). b, Cumulative distribution of normalized VDJ triple frequencies for CORD1 (n = 2,273 unique VDJ triples), CORD2 (n = 2,788 unique VDJ triples) and CORD3 (n = 3,002 unique VDJ triples). c, log–log frequency plot between experimental and synthetic CDR3 lengths. The Pearson correlation coefficient r = 1.00 with a P < 2.2 × 10^−16 (at an α = 0.05) (n = 21 bins for each set). It should be noted that there were no V3DJ clonotypes with HCDR3s that were less than eight amino acids in length. d, Normalized frequency histogram of V3DJ overlap counts between all three synthetic CORD distributions (n = 45 common clonotypes between all three sequenced repertoires). e, V3J clonotypes identified in HIP1, HIP2 and HIP3 (HIP1+2+3_all) were combined with an independently derived set of immunoglobulin heavy-chain V3J clonotypes for which sequences were publicly available (ref. 7). Starting from the combined set of 59,193,994 clonotypes from six adult immunoglobulin heavy-chain repertoires, each of the three cord blood sets was scanned in a serial fashion, and only the common clonotypes were kept. A total of 130 shared V3J clonotypes was identified.
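
The serial scan described in panel e can be read as a running intersection: start from the combined adult set and, for each cord blood set in turn, keep only the clonotypes that also occur in it. The following is a minimal sketch of that reading, with a hypothetical function name and made-up clonotype labels. It is not taken from the authors' code.

```python
def serial_common(starting_set, scanned_sets):
    """Scan each set in turn, keeping only clonotypes that are also present in it."""
    common = set(starting_set)  # copy so the caller's set is not modified
    for current in scanned_sets:
        common &= current
        if not common:
            # Nothing left to share; later sets cannot add clonotypes back.
            break
    return common


# Made-up clonotype labels standing in for V3J clonotype keys.
adult_combined = {"c1", "c2", "c3", "c4"}
cord_sets = [{"c1", "c2", "x1"}, {"c1", "c2", "c3"}, {"c2", "y1"}]
shared = serial_common(adult_combined, cord_sets)
```

The final set does not depend on the order in which the cord blood sets are scanned, but the serial form means each set can be loaded and discarded one at a time, which helps when the starting set holds tens of millions of clonotypes.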

Extended Data Fig. 4 Schematic showing bioinformatics sequence processing.

The flow chart shows how a typical sequencing run using paired-end Illumina reads was processed with the bioinformatics pipeline. Detailed descriptions of each of the programs used in the pipeline can be found in Supplementary Methods.

Extended Data Fig. 5 Schematic showing placement of primers.

Annotated example of a biological sequence obtained from the two-step barcoded library preparation protocol. The red and yellow regions show the placement of the first and second steps of PCR amplification. The cyan region shows the location of the RID-tagged reverse transcription gene-specific primer.

Extended Data Table 1 Research subject demographics
Extended Data Table 2 Summary of sequencing methods and cell counts
Extended Data Table 3 One-step RT–PCR primers used in this study
Extended Data Table 4 Two-step RT–PCR primers used in this study

Supplementary information

Supplementary Information

This file contains Supplementary Methods and References

Reporting Summary

About this article

Cite this article

Soto, C., Bombardi, R.G., Branchizio, A. et al. High frequency of shared clonotypes in human B cell receptor repertoires. Nature 566, 398–402 (2019). https://doi.org/10.1038/s41586-019-0934-8

  • DOI: https://doi.org/10.1038/s41586-019-0934-8
