Letter | Published:

Commonality despite exceptional diversity in the baseline human antibody repertoire


In principle, humans can produce an antibody response to any non-self-antigen molecule in the appropriate context. This flexibility is achieved by the presence of a large repertoire of naive antibodies, the diversity of which is expanded by somatic hypermutation following antigen exposure [1]. The diversity of the naive antibody repertoire in humans is estimated to be at least 10¹² unique antibodies [2]. Because the number of peripheral blood B cells in a healthy adult human is on the order of 5 × 10⁹, the circulating B cell population samples only a small fraction of this diversity. Full-scale analyses of human antibody repertoires have been prohibitively difficult, primarily owing to their massive size. The amount of information encoded by all of the rearranged antibody and T cell receptor genes in one person (the ‘genome’ of the adaptive immune system) exceeds the size of the human genome by more than four orders of magnitude. Furthermore, because much of the B lymphocyte population is localized in organs or tissues that cannot be comprehensively sampled from living subjects, human repertoire studies have focused on circulating B cells [3]. Here we examine the circulating B cell populations of ten human subjects and present what is, to our knowledge, the largest single collection of adaptive immune receptor sequences described to date, comprising almost 3 billion antibody heavy-chain sequences. This dataset enables genetic study of the baseline human antibody repertoire at an unprecedented depth and granularity, which reveals largely unique repertoires for each individual studied, a subpopulation of universally shared antibody clonotypes, and an exceptional overall diversity of the antibody repertoire.
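As a rough check of the scale argument above, the following sketch divides the approximate number of circulating B cells by the lower-bound diversity estimate. Both figures are taken from the text; the function and variable names are illustrative, and the calculation assumes (optimistically) that every circulating B cell carries a distinct antibody.

```python
# Back-of-the-envelope bound on how much of the naive repertoire
# the circulating B cell pool can hold at any one time.
NAIVE_DIVERSITY_LOWER_BOUND = 10**12  # unique antibodies (estimate cited in the text)
CIRCULATING_B_CELLS = 5 * 10**9       # order of magnitude cited in the text


def max_fraction_sampled(cells: int, diversity: int) -> float:
    """Upper bound on the sampled fraction, assuming every cell is unique."""
    return min(1.0, cells / diversity)


fraction = max_fraction_sampled(CIRCULATING_B_CELLS, NAIVE_DIVERSITY_LOWER_BOUND)
print(f"At most {fraction:.1%} of the estimated diversity is in circulation")
```

Because the diversity figure is a lower bound and many circulating cells belong to shared lineages, the true sampled fraction would be smaller still.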


Data availability

Sequence data that support the findings in this study are available at the NCBI Sequencing Read Archive (www.ncbi.nlm.nih.gov/sra) under BioProject number PRJNA406949. Raw and processed datasets are available at www.github.com/briney/grp_paper.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


References

1. Rajewsky, K. Clonal selection and learning in the antibody system. Nature 381, 751–758 (1996).

2. Alberts, B. et al. The Generation of Antibody Diversity (Garland Science, New York, 2002).

3. Boyd, S. D. & Crowe, J. E. Jr. Deep sequencing and human antibody repertoire analysis. Curr. Opin. Immunol. 40, 103–109 (2016).

4. Briney, B. & Burton, D. Massively scalable genetic analysis of antibody repertoires. Preprint at https://www.biorxiv.org/content/early/2018/10/19/447813 (2018).

5. Briney, B., Le, K., Zhu, J. & Burton, D. R. Clonify: unseeded antibody lineage assignment from next-generation sequencing data. Sci. Rep. 6, 23901 (2016).

6. Morbach, H., Eichhorn, E. M., Liese, J. G. & Girschick, H. J. Reference values for B cell subpopulations from infancy to adulthood. Clin. Exp. Immunol. 162, 271–279 (2010).

7. Morisita, M. Measuring of the dispersion of individuals and analysis of the distributional patterns. Mem. Fac. Sci. Kyushu Univ. Ser. E 2, 5–235 (1959).

8. Horn, H. S. Measurement of ‘overlap’ in comparative ecological studies. Am. Nat. 100, 419–424 (1966).

9. Setliff, I. et al. Multi-donor longitudinal antibody repertoire sequencing reveals the existence of public antibody clonotypes in HIV-1 infection. Cell Host Microbe 23, 845–854 (2018).

10. Chao, A. Estimating the population size for capture–recapture data with unequal catchability. Biometrics 43, 783–791 (1987).

11. Kaplinsky, J. & Arnaout, R. Robust estimates of overall immune-repertoire diversity from high-throughput measurements on samples. Nat. Commun. 7, 11881 (2016).

12. Chao, A. & Chiu, C.-H. Nonparametric Estimation and Comparison of Species Richness https://doi.org/10.1002/9780470015902.a0026329 (John Wiley & Sons, 2016).

13. Eren, M. I., Chao, A., Hwang, W.-H. & Colwell, R. K. Estimating the richness of a population when the maximum number of classes is fixed: a nonparametric solution to an archaeological problem. PLoS ONE 7, e34179 (2012).

14. DeKosky, B. J. et al. In-depth determination and analysis of the human paired heavy- and light-chain antibody repertoire. Nat. Med. 21, 86–91 (2015).

15. Arnaout, R. et al. High-resolution description of antibody heavy-chain repertoires in humans. PLoS ONE 6, e22365 (2011).

16. Marcou, Q., Mora, T. & Walczak, A. M. High-throughput immune repertoire analysis with IGoR. Nat. Commun. 9, 561 (2018).

17. Morea, V., Tramontano, A., Rustici, M., Chothia, C. & Lesk, A. M. Conformations of the third hypervariable region in the VH domain of immunoglobulins. J. Mol. Biol. 275, 269–294 (1998).

18. Finn, J. A. et al. Improving loop modeling of the antibody complementarity-determining region 3 using knowledge-based restraints. PLoS ONE 11, e0154811 (2016).

19. Briney, B. S., Willis, J. R., Finn, J. A., McKinney, B. A. & Crowe, J. E. Jr. Tissue-specific expressed antibody variable gene repertoires. PLoS ONE 9, e100839 (2014).

20. van Dongen, J. J. M. et al. Design and standardization of PCR primers and protocols for detection of clonal immunoglobulin and T-cell receptor gene recombinations in suspect lymphoproliferations: report of the BIOMED-2 Concerted Action BMH4-CT98-3936. Leukemia 17, 2257–2317 (2003).

21. Masella, A. P., Bartram, A. K., Truszkowski, J. M., Brown, D. G. & Neufeld, J. D. PANDAseq: paired-end assembler for Illumina sequences. BMC Bioinformatics 13, 31 (2012).

22. Meyerhans, A., Vartanian, J. P. & Wain-Hobson, S. DNA recombination during PCR. Nucleic Acids Res. 18, 1687–1691 (1990).

23. Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 17, 10–12 (2011).

24. Rogers, T. F. et al. Zika virus activates de novo and cross-reactive memory B cell responses in dengue-experienced donors. Sci. Immunol. 2, eaan6809 (2017).



The authors thank all of the study subjects for their participation and the Genomic Services Laboratory at the HudsonAlpha Institute for Biotechnology for their sequencing expertise. This work was supported by the National Institute of Allergy and Infectious Diseases (Center for HIV/AIDS Vaccine Immunology and Immunogen Discovery, UM1AI100663 (D.R.B.); Center for Viral Systems Biology, U19AI135995 (B.B.)), the International AIDS Vaccine Initiative (IAVI) through the Neutralizing Antibody Consortium SFP1849 (D.R.B.), and the Ragon Institute of MGH, MIT and Harvard (D.R.B.).

Author information

B.B. and D.R.B. planned and designed the experiments. B.B., A.I. and C.J. performed experiments. B.B. analysed data. B.B. and D.R.B. wrote the manuscript. All authors contributed to manuscript revisions.

Competing interests

The authors declare no competing interests.

Correspondence to Bryan Briney or Dennis R. Burton.

Extended data figures and tables

Extended Data Fig. 1 Nearly full-length antibody gene amplification from biological and technical replicate samples.

a, Schematic of biological and technical replicate samples. Biological replicates (columns) are derived from distinct cell aliquots, so identical clonotypes or sequences found in multiple biological replicates must arise from different cells. Technical replicates (rows) were amplified using discrete RNA aliquots from a single cell aliquot. b, Strategy for amplifying nearly full-length antibody heavy chains. Black arrows indicate primers. Primers in the cDNA synthesis step anneal to the heavy-chain constant region (CH) and add the first unique molecular identifier (UMI) and the Illumina read 1 primer annealing site. Primers in the second-strand synthesis step anneal to the framework 1 region of the variable gene and add a second UMI and the Illumina read 2 primer annealing site.

Extended Data Fig. 2 V and J frequency correlations of technical and biological replicates.

For each subject, the frequency of V and J combinations was compared for technical replicates (left panels) or biological replicates (right panels). The coefficient of determination (r²) is shown for each plot.
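The comparison described in this caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes that r² is the squared Pearson correlation of V/J combination frequencies between two replicates, and the function names and toy gene calls are hypothetical.

```python
from collections import Counter


def vj_frequencies(pairs):
    """Map each (V gene, J gene) combination to its fraction of all sequences."""
    counts = Counter(pairs)
    total = sum(counts.values())
    return {combo: n / total for combo, n in counts.items()}


def r_squared(freq_a, freq_b):
    """Squared Pearson correlation over the union of V/J combinations."""
    combos = sorted(set(freq_a) | set(freq_b))
    xs = [freq_a.get(c, 0.0) for c in combos]
    ys = [freq_b.get(c, 0.0) for c in combos]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r^2 is undefined when one replicate has constant frequencies")
    return cov ** 2 / (var_x * var_y)


# Toy replicates (hypothetical gene calls, for illustration only).
replicate_1 = [("IGHV1-69", "IGHJ4")] * 3 + [("IGHV3-23", "IGHJ6")] * 1
replicate_2 = [("IGHV1-69", "IGHJ4")] * 5 + [("IGHV3-23", "IGHJ6")] * 3
score = r_squared(vj_frequencies(replicate_1), vj_frequencies(replicate_2))
```

Combinations missing from one replicate are given a frequency of zero so that both replicates are compared over the same set of V/J pairs.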

Extended Data Fig. 3 Nucleotide mutation frequencies.

a, The distribution of nucleotide mutations in sequences that encode IgM is shown. On the right, the number of unmutated sequences (those containing no mutations in the variable-gene segment) is also plotted. b, The distribution of nucleotide mutations in sequences that encode IgG is shown. On the right, the mean mutation frequency for the IgG population of each subject is shown. Each line represents a single subject. For legibility, the legend is split between the two plots. Although only five subjects are shown in the legend of each plot, data from all ten subjects are present in each plot.

Extended Data Fig. 4 Cross-subject repertoire similarity.

Pairwise Morisita–Horn similarity comparisons between each subject and all other subjects. Similarity was computed using the frequency of V-gene, J-gene and CDRH3 length combinations. Each line represents the mean of 20 independent repertoire samplings (with replacement). The shading surrounding the mean line indicates the 95% confidence interval.
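For readers unfamiliar with the index, here is a minimal sketch of a Morisita–Horn comparison with repeated sampling. It assumes each sequence has been reduced to a (V gene, J gene, CDRH3 length) tuple; the function names, sample size and seed are illustrative choices, not details taken from the study.

```python
import random
from collections import Counter


def morisita_horn(counts_a, counts_b):
    """Morisita-Horn overlap between two count tables (0 = disjoint, 1 = identical)."""
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both samples must contain at least one observation")
    shared = set(counts_a) & set(counts_b)
    cross = sum(counts_a[k] * counts_b[k] for k in shared)
    simpson_a = sum(n * n for n in counts_a.values()) / total_a ** 2
    simpson_b = sum(n * n for n in counts_b.values()) / total_b ** 2
    return 2 * cross / ((simpson_a + simpson_b) * total_a * total_b)


def sampled_similarity(features_a, features_b, sample_size, iterations=20, seed=0):
    """Mean Morisita-Horn index over repeated samplings with replacement."""
    rng = random.Random(seed)
    scores = []
    for _ in range(iterations):
        sample_a = Counter(rng.choices(features_a, k=sample_size))
        sample_b = Counter(rng.choices(features_b, k=sample_size))
        scores.append(morisita_horn(sample_a, sample_b))
    return sum(scores) / len(scores)
```

Repeating the sampling at increasing values of `sample_size` would trace out one curve per subject pair, with the spread across iterations giving a confidence band.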

Extended Data Fig. 5 Collapsing sequences into clonotypes.

a, To demonstrate the effect of collapsing an expanded clonal lineage into clonotypes, we selected a previously reported lineage of Zika-specific monoclonal antibodies isolated from the plasmablast population of an acutely infected patient [24]. Of 119 sequences, 89 were unique at the nucleotide level. b, Sequences encoding the same V gene, J gene and an identical CDRH3 amino acid sequence were collapsed into clonotypes, and the sequence phylogeny was coloured by clonotype. A total of 119 sequences were collapsed into 18 clonotypes. c, Sequences were collapsed into clonotypes, allowing a single mismatch in the CDRH3 amino acid sequence, and the sequence phylogeny was coloured by clonotype. A total of 119 sequences were collapsed into 10 clonotypes. d, The clonotype fraction (number of clonotypes divided by the total number of filtered sequences), when collapsing clonotypes while allowing zero or one mismatch in the CDRH3 amino acid sequence for each subject in this study. e, Number of total clonotypes recovered when allowing zero or one mismatch in the CDRH3 amino acid sequence for each subject in this study.
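The collapsing step can be sketched as below. The caption does not say how sequences within one mismatch of each other are grouped, so this example assumes single-linkage clustering by Hamming distance among CDRH3s of equal length that share a V and J gene. That clustering choice, the function names and the toy sequences are all assumptions made for illustration.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def _find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def collapse_clonotypes(sequences, max_mismatches=0):
    """Group (v_gene, j_gene, cdrh3_aa) records into clonotypes.

    Records join the same clonotype when V and J match and their CDRH3s
    are linked by chains of at most `max_mismatches` substitutions.
    """
    groups = defaultdict(set)
    for v, j, cdr3 in sequences:
        groups[(v, j, len(cdr3))].add(cdr3)
    clonotypes = []
    for (v, j, _), cdr3s in groups.items():
        members = sorted(cdr3s)
        parent = list(range(len(members)))
        for i in range(len(members)):
            for k in range(i + 1, len(members)):
                if hamming(members[i], members[k]) <= max_mismatches:
                    parent[_find(parent, i)] = _find(parent, k)
        clusters = defaultdict(set)
        for i, cdr3 in enumerate(members):
            clusters[_find(parent, i)].add((v, j, cdr3))
        clonotypes.extend(clusters.values())
    return clonotypes


# Toy records (hypothetical sequences, for illustration only).
records = [
    ("IGHV3-23", "IGHJ4", "ARDYW"),
    ("IGHV3-23", "IGHJ4", "ARDYW"),
    ("IGHV3-23", "IGHJ4", "ARDFW"),
    ("IGHV3-23", "IGHJ4", "ARGGW"),
    ("IGHV1-69", "IGHJ4", "ARDYW"),
]
exact = collapse_clonotypes(records, max_mismatches=0)
relaxed = collapse_clonotypes(records, max_mismatches=1)
clonotype_fraction = len(exact) / len(records)
```

The pairwise comparison is quadratic in the number of unique CDRH3s per group, so a dataset of the size described here would need an indexed or hashed neighbour search rather than this direct loop.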

Extended Data Fig. 6 Capture–recapture frequency.

a, Recapture frequency for each subject. Lines represent the mean of 10 random samplings (without replacement) for all subsample fractions except complete sampling (1.0). b, Mean recapture frequency for each subsample fraction.
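The subsampling procedure in this caption might look like the sketch below. The caption does not define recapture frequency, so the example assumes it is the share of sampled clonotypes that appear in more than one biological replicate. That definition, the function names and the seed are assumptions made for illustration only.

```python
import random


def recapture_frequency(observations, fraction, rng):
    """Share of sampled clonotypes that appear in more than one replicate.

    `observations` is a list of (replicate_id, clonotype) pairs; a subsample
    of the given fraction is drawn without replacement before counting.
    """
    if not observations:
        raise ValueError("observations must not be empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if fraction == 1.0:
        sample = observations
    else:
        size = max(1, round(len(observations) * fraction))
        sample = rng.sample(observations, size)
    replicates_by_clonotype = {}
    for replicate, clonotype in sample:
        replicates_by_clonotype.setdefault(clonotype, set()).add(replicate)
    recaptured = sum(1 for reps in replicates_by_clonotype.values() if len(reps) > 1)
    return recaptured / len(replicates_by_clonotype)


def mean_recapture(observations, fraction, iterations=10, seed=0):
    """Average over repeated subsamples; complete sampling needs only one pass."""
    rng = random.Random(seed)
    runs = 1 if fraction == 1.0 else iterations
    values = [recapture_frequency(observations, fraction, rng) for _ in range(runs)]
    return sum(values) / runs
```

Complete sampling is deterministic, which is why a single pass is enough at a fraction of 1.0.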

Extended Data Fig. 7 Relative light-chain diversity estimation.

Using previously reported datasets of paired heavy and light antibody chains, clonotype diversity was estimated for heavy and light chains using both Chao 2 and Recon estimators. Estimates are shown as filled or unfilled points. Lines indicate the least-squares polynomial best fit (degree = 2), extrapolated to include both the lowest (1.17 × 10⁸) and highest (9.06 × 10⁸) number of UMI-corrected sequences from the 10 sequenced subjects.
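As an illustration of the first estimator named in this caption, here is a minimal sketch of the incidence-based Chao 2 richness estimate. It uses the commonly published form with the (m − 1)/m correction and the bias-corrected fallback when no clonotype occurs in exactly two sampling units. Treating each replicate as a sampling unit is an assumption, and Recon and the polynomial fit are not shown.

```python
from collections import Counter


def chao2(incidence_sets):
    """Chao 2 estimate of total richness from presence/absence data.

    `incidence_sets` holds one set of observed clonotypes per sampling unit.
    """
    m = len(incidence_sets)
    if m < 2:
        raise ValueError("Chao 2 needs at least two sampling units")
    occupancy = Counter()
    for unit in incidence_sets:
        occupancy.update(set(unit))  # count each clonotype once per unit
    s_obs = len(occupancy)
    q1 = sum(1 for n in occupancy.values() if n == 1)  # seen in exactly one unit
    q2 = sum(1 for n in occupancy.values() if n == 2)  # seen in exactly two units
    scale = (m - 1) / m
    if q2 > 0:
        return s_obs + scale * q1 ** 2 / (2 * q2)
    # Bias-corrected form, used when no clonotype occurs in exactly two units.
    return s_obs + scale * q1 * (q1 - 1) / 2


# Toy example: three sampling units, five observed clonotypes.
units = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "e"}]
estimate = chao2(units)
```

In this toy example three clonotypes are seen once and one is seen twice, so the estimate adds three unseen clonotypes to the five observed.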

Extended Data Fig. 8 Variance between inferred V(D)J recombination models.

a, Frequency of clonotype sharing between observed human subjects (black), synthetic datasets generated with IGoR’s default recombination model (red), synthetic datasets generated with subject-specific recombination models (blue) or synthetic datasets generated with a combined-subject recombination model (purple). b, Combined Kullback–Leibler divergence (KL divergence) between pairs of subject-specific models (blue), between subject-specific models and IGoR’s default model (red), or between subject-specific models and the combined-subject model (purple). c, Combined KL divergence between pairs of subject-specific models, separated by event type.
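The divergence measure in panels b and c can be sketched as follows. This is a simplified illustration that assumes each recombination model is a dictionary of independent discrete distributions, one per event type, and that the combined value is the sum over event types. IGoR's own model format and any conditional dependencies between events are not represented here.

```python
from math import inf, log2


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits for discrete distributions."""
    total = 0.0
    for outcome, p_i in p.items():
        if p_i == 0:
            continue  # zero-probability outcomes contribute nothing
        q_i = q.get(outcome, 0.0)
        if q_i == 0:
            return inf  # p puts mass where q has none
        total += p_i * log2(p_i / q_i)
    return total


def combined_kl(model_p, model_q):
    """Sum of per-event divergences (assumes independent event types)."""
    return sum(kl_divergence(model_p[event], model_q[event]) for event in model_p)


# Toy models with a single event type (hypothetical probabilities).
subject_model = {"v_choice": {"IGHV1-69": 0.5, "IGHV3-23": 0.5}}
default_model = {"v_choice": {"IGHV1-69": 0.25, "IGHV3-23": 0.75}}
divergence = combined_kl(subject_model, default_model)
```

KL divergence is asymmetric, so the order of the two models matters when comparing subject-specific models with a reference model.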

Extended Data Table 1 Demographic information and sequencing statistics per subject
Extended Data Table 2 Primers used for antibody gene amplification

Supplementary information

Reporting Summary

Source data

Source Data Fig. 1

Source Data Fig. 2

Source Data Fig. 3

Fig. 1: Uniqueness of the repertoires of individual subjects.
Fig. 2: Clonotype and sequence diversity amongst the 10 subjects.
Fig. 3: Shared clonotypes and sequences amongst the 10 subjects.