Functional antibodies exhibit light chain coherence

Jaffe, David B.; Shahi, Payam; Adams, Bruce A.; Chrisman, Ashley M.; Finnegan, Peter M.; Raman, Nandhini; Royall, Ariel E.; Tsai, FuNien; Vollbrecht, Thomas; Reyes, Daniel S.; Hepler, N. Lance; McDonnell, Wyatt J.

doi:10.1038/s41586-022-05371-z

Download PDF

Article
Open access
Published: 26 October 2022

Functional antibodies exhibit light chain coherence

David B. Jaffe ORCID: orcid.org/0000-0001-8739-568X¹,
Payam Shahi¹^na1,
Bruce A. Adams¹,
Ashley M. Chrisman¹,
Peter M. Finnegan¹,
Nandhini Raman¹,
Ariel E. Royall¹,
FuNien Tsai¹,
Thomas Vollbrecht ORCID: orcid.org/0000-0003-1151-8844¹,
Daniel S. Reyes¹^na1,
N. Lance Hepler^na1^na2 &
…
Wyatt J. McDonnell ORCID: orcid.org/0000-0003-3859-2067¹

Nature volume 611, pages 352–357 (2022)Cite this article

19k Accesses
25 Citations
71 Altmetric
Metrics details

Subjects

Abstract

The vertebrate adaptive immune system modifies the genome of individual B cells to encode antibodies that bind particular antigens¹. In most mammals, antibodies are composed of heavy and light chains that are generated sequentially by recombination of V, D (for heavy chains), J and C gene segments. Each chain contains three complementarity-determining regions (CDR1–CDR3), which contribute to antigen specificity. Certain heavy and light chains are preferred for particular antigens^{2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22}. Here we consider pairs of B cells that share the same heavy chain V gene and CDRH3 amino acid sequence and were isolated from different donors, also known as public clonotypes^23,24. We show that for naive antibodies (those not yet adapted to antigens), the probability that they use the same light chain V gene is around 10%, whereas for memory (functional) antibodies, it is around 80%, even if only one cell per clonotype is used. This property of functional antibodies is a phenomenon that we call light chain coherence. We also observe this phenomenon when similar heavy chains recur within a donor. Thus, although naive antibodies seem to recur by chance, the recurrence of functional antibodies reveals surprising constraint and determinism in the processes of V(D)J recombination and immune selection. For most functional antibodies, the heavy chain determines the light chain.

Dynamics of heavy chain junctional length biases in antibody repertoires

Article Open access 01 May 2020

Origin and evolutionary malleability of T cell receptor α diversity

Article Open access 21 June 2023

VHH CDR-H3 conformation is determined by VH germline usage

Article Open access 19 August 2023

Main

A central challenge of immunology is the grouping of antibodies by function. Ideally, antibodies in such groups would share both an epitope and complementary paratopes dictated by their protein sequences. In practice, small numbers of antibodies are assayed in vitro–for example, for functional activities such as neutralizing capability. Larger numbers of antibodies can be assayed for simple binding to a particular antigen. In the future, antibody properties might be understood at scale from sequence information alone, perhaps via structural modelling, which could lead to antibody grouping²⁵. However, in the absence of a sufficiently large dataset with multiple antigen specificities using cells from multiple humans or donors, it is currently impossible to assess the validity of any functional grouping scheme. Innovative methods such as mitochondrial lineage tracing could perhaps be used to validate computed clonotypes²⁶.

Nevertheless, some inferences can be made. All antibodies within a clonotype—a group of antibodies that share a common ancestral recombined cell that arose in a single donor—usually perform the same function. A clonotype can therefore be treated as the minimal functional group of antibodies. Next, as has been observed, nature repeats itself by creating similar clonotypes that appear to have the same function^{2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22}, and these might be combined into groups. Such recurrences have been observed between donors, but they also occur within individual donors, as we will demonstrate. Regardless, such recurrences arise after recombination randomly creates a vast pool of potential antibodies; recurrences arise through selection from that pool.

Specific examples suggest that sequence similarity can guide the way to understanding functional groups. For example, in the case of influenza virus, antibodies binding the anchor epitope of the haemagglutinin stalk domain reuse four heavy chain V genes (IGHV3-23, IGHV3-30, IGHV3-30-3 and IGHV3-48) and two light chain V genes²¹ (IGKV3-11 and IGKV3-15). A similar observation has been made in the case of Zika virus, where a protective heavy–light chain gene pair IGVH3-23–IGVK1-5 is observed in multiple humans, that also cross-reacts with dengue virus¹⁶. Even in the setting of HIV infections, which lead to diverse and divergent viruses within a single human, recurrent and ultra-broad neutralizing antibodies such as the VRC01 lineage emerge, with subclass members using combinations of heavy chains encoded by IGHV1-2 and IGHV1-46 paired with light chains encoded by IGKV1-5, IGKV1-33, IGKV3-15 and IGKV3-20²².

Motivated by these examples, we set out to determine whether unrelated B cells with similar heavy chains also have similar light chains. We exclude related cells (that is, those in the same clonotype) because they use the same VDJ genes by definition.

We generated a large set of paired V(D)J data to investigate this question. Using peripheral blood samples from four unrelated humans (Methods and Extended Data Table 1), we captured and sequenced paired, full-length antibody sequences from a total of 1.6 million single B cells of 4 phenotypes defined by flow cytometry^27,28: naive, unswitched memory, class-switched memory and plasmablasts (Extended Data Fig. 1). For each cell, we obtained nucleotide sequences spanning from the leader sequence of the V gene through enough of the constant region to determine the isotype and subclass of the antibody.

We computationally split these antibody sequences into two types: naive and memory. To do so, we inferred V gene alleles for each of the four donors (Methods), and then for each B cell used the inferred alleles to estimate the number of somatic hypermutations (SHMs) that occurred outside the junction regions, including both chains. We labelled an antibody sequence as naive if it had no mutations relative to the inferred germline (that is, exhibited no SHM), and as memory otherwise, understanding that these categories are biological oversimplifications and do not account, for example, for class switching. We compared these categories with the flow-sorted categories (Extended Data Table 2). Approximately 80.0% of cells sorted as naive were naive by computational analysis, with a maximum of 90.9% for donor 2. Conversely, just 0.6% of cells sorted as memory were naive by computational analysis. During library preparation, we exhausted the supply of memory cells and deliberately mixed them in some libraries (for example, switched B cells plus naive B cells) to best exploit capacity. Our computational sorting also enabled us to make the best use of all the data.

Memory antibodies show light chain coherence

Next, we investigated whether for unrelated B cells, similar heavy chains imply similar light chains. We explored this question separately in memory and naive cells by considering pairs of cells, either both memory or both naive. We considered only pairs of cells with the same heavy chain V gene and the same CDRH3 length, and whose cells came from different donors. We divided the pairs into 11 sets according to their CDRH3 amino acid per cent identity, rounded down to the nearest 10%. Then for each set, we computed its light chain coherence: the percentage of cell pairs in which the light chain gene names were identical. In this work, we consider light chain V gene paralogs with D in their name to be identical (for example, IGKV1-17 and IGKV1D-17 are considered identical), as previously described²⁹.

We show the results of this analysis in Fig. 1a. For memory B cells found in separate donors having the same heavy chain V genes and 100% CDRH3 amino acid identity (2,813 cells), we found 82% coherence between their light chains, whereas light chain coherence in naive cells (754 cells) was only 10%. This makes sense, as naive cells have generally not yet been selected for functionality or undergone SHM during an immune response. This finding implies that for memory cells, which bear functional antibodies and are typically the products of thymic and peripheral selection, heavy chain coherence implies light chain coherence. We further tested light chain coherence for memory B cells using independent datasets, finding 93% coherence on a heterogeneous compendium of older data (data generated by 10x Genomics for internal development and distributed with enclone³⁰, as well as several other datasets related to multiple sclerosis³¹, COVID-19³²^,33, Kawasaki disease³⁴ and LIBRA-seq³⁵) and 79% coherence on a recent dataset³⁶ (Methods, ‘Additional data’). We redistribute these data as part of our paper and provide details about each dataset in Supplementary Table 1.

**Fig. 1: Functional public and private antibodies exhibit light chain coherence.**

We also analysed these data using only one cell per clonotype, finding 79% light chain coherence for memory B cells for the data in this work (Methods and Supplementary Table 2). Moreover, to completely eliminate the possibility that light chain coherence might be explained by cross-sample contamination, we compared pairwise between our data, the older data, and the recent data, treating each as a single super-donor, finding light chain coherence of 70%. We tested the effect of cell misclassification by randomly swapping memory and naive labels for 10% of cells, finding 82% coherence for memory cells and 17% (versus 10% without swapping) coherence for naive cells, suggesting that naive coherence might be exaggerated by actual labelling errors. We note that light chain coherence does not imply heavy chain coherence (Extended Data Table 3). We also note that if we instead define naive (CD19⁺IgD⁺CD27^±CD38^±CD24^±) and memory cells (Methods) by flow cytometry, we found 86% concordance between light chains for memory cells (87.3% for switched and 84.4% for unswitched) and 16% for naive cells. Supplementary Figs. 1 and 2 show significant associations between Vh and Vl genes and superfamilies and the frequency of Vl superfamily usage by CDRH3 length.

Light chain coherence remains even if light chain V gene paralogs are not treated as the same gene, with light chain coherence of 64% for memory cells (Extended Data Table 4). A more sophisticated approach might make more identifications, as sufficiently similar V genes should be functionally indistinguishable. In fact, light chain coherence can be observed without reference to genes at all. Given pairs of cells from different donors, we can compute their heavy and light chain edit distances. In that case, if the cells have the same CDRH3 amino acid sequence, 78% of the time their light chain edit distance is less than or equal to 20, whereas without the CDRH3 restriction this is true only 9% of the time (Extended Data Fig. 2).

Recurrences (separate recombination events) occur between different donors and also within single donors; from first principles, light chain coherence are expected to occur at a similar rate within a single donor. This is difficult to investigate, because SHM could cause two cells from the same recombination event to manifest as separate events. We addressed this by considering cells in different computed clonotypes, which—if the computation were perfect—would have arisen from independent recombinations. We required additional conditions to further increase the likelihood that the cells in fact represented a true recurrence (Methods). For example, we treated it as sufficient for two cells to have different CDRL3 lengths, as this would in all likelihood arise from separate recombination events—although this cannot be confirmed using VDJ sequencing alone. Using this approach, we observed 65% light chain coherence for memory cells at 100% CDRH3 amino acid coherence for cells from the same donor (Fig. 1b). This is lower than observed for cells from different donors (Fig. 1a), probably because the additional precautionary conditions (Methods) remove many cases of bona fide recurrent antigen-specific responses that would be expected to arise within a human.

As an analytical tool, pairs of cells are useful for understanding VDJ biology. However, it is the non-overlapping groups of memory B cells that might share function that are of greatest interest biologically. The following grouping scheme is an example of a common approach to this problem: placing all memory B cells sharing identical heavy chain V genes and 100% identical CDRH3 amino acid sequences in the same group. This is unsatisfactory, because it ignores clonotypes that share function but differ in amino acid sequence, and because some amino acids can change without affecting the ability of an antibody to bind its antigen.

These limitations are readily overcome. We provide a concrete example at 90% identity, for the sake of clarity. We consider only memory cells and only computed clonotypes consisting entirely of memory cells. We define groups by first defining when two cells are similar. We call two cells similar if they belong to the same clonotype or if they have identical heavy chain V genes and 90% identical CDRH3 amino acid sequences. We then place two cells X and Y in the same group if there is a sequence of cells X = X₁,…,X_n = Y such that for each i, X_i is similar to X_i + 1. The multiple ‘hops’ between these two cells make them transitively similar. This process is a well-known mathematical notion, which we call transitive grouping. It places every cell in a group and yields non-overlapping groups. We analysed these groups for light chain coherence by examining pairs of cells from different donors within the same group.

Transitivity enables the formation of large groups of clonotypes. As Fig. 2a shows, the light chain coherence of these groups decreases as the CDRH3 per cent identity is lowered. This is more rapid than might be expected because of the multiple hops in transitivity, each of which has the potential to connect antibodies having different functions. Even so, large coherent groups can be formed. Figure 2b shows a transitive group of clonotypes computed using 90% percent identity, using both the data of this work and that of Phad et al.³⁶ and sharing the heavy chain V gene IGHV3-9. The light chain coherence for this group is 99.7% (726 out of 728 cells), and all but four cells use the light chain genes IGKV2-30 and IGKJ2. We note that of memory B cells using the heavy chain V gene IGHV3-9 in these datasets, only 7% (1,203 out of 16,880) use IGKV2-30. The cells in the group lie in 122 computed clonotypes, each of which would represent an independent recurrence (recombination) event if the clonotype computation were perfect. Both heavy and light CDR3 sequences exhibit strong conservation. Notably, 93% of the cells have subclasses IgA1 or IgA2.

**Fig. 2: Transitive linking yields large coherent groups.**

Simulation predicts antibody recurrence

Antibody recurrences would be expected to arise preferentially from relatively common recombination events. We analysed each junction sequence by finding the most likely D region, allowing for no D region or a concatenation of two D regions to account for VDDJ junctions^37,38, and aligned the antibody nucleotide sequence to the concatenation of the V(D)J reference sequences (Fig. 3a and Tables 1 and 2). We first tested whether the observed recurrence rate was comparable to that expected by chance. Answering this poses a dilemma as it requires deep enough knowledge of recombination to accurately recapitulate the process by simulation. Other researchers have addressed this challenging problem^39,40,41. We generated random antibody junction sequences using the simulation program soNNia⁴¹ using deduplicated, naive heavy chain nucleotide sequences from our data for training. We also explored simulation variants (see Methods and Supplementary Table 3). Our simulations leverage two features that were not used in previous studies^23,24. Namely, we identify and use for simulator training truly naive junction sequences (with no detectable SHM in the other CDR or FR regions), and we account for central and peripheral selection using the post-selection model⁴¹.

**Fig. 3: Public antibody properties are consistent with recombination biology.**

Table 1 Public antibody properties are consistent with recombination biology

Full size table

Table 2 Light chain coherence in complex antibody junctions

Full size table

We then calculated how many recurrent naive antibodies are predicted by simulation, by making four groups of simulated antibodies of the same sizes as the groups of naive cells in our data, and then counting cross-donor recurrences of heavy chain gene and CDRH3 amino acid pairs from each of ten simulation replicates. Recurrences of naive antibodies are exhibited as cell counts, which makes sense because naive cells rarely appear in clonotypes having more than one cell. Whereas the actual number of recurrences we observed was 754, the average predicted value was 1,190. Given the challenges of simulation, these numbers are close. This confirmed that the soNNia simulator recapitulates statistical properties of real repertoires, including the frequency of VDDJ recombinations (Table 1) and the recurrence of junctions even when training the model on memory rather than naive sequences (Supplementary Table 3).

Next, we investigated whether the junction regions of recurrent antibodies are as complex as those of arbitrary antibodies (Fig. 3b). We found that recurrent antibodies have an order of magnitude fewer inserted bases, and that this is true for both naive and memory cells. This shows that recurrent antibodies have intrinsically less complex junctions than arbitrary antibodies and—as expected—are thus more likely to recur by chance. In fact, most recurrent antibodies, whether naive or memory, have no inserted bases (Fig. 3b).

Finally, we investigated light chain coherence in antibodies that recur in a few individuals. On average, such antibodies have relatively simple junction regions (Fig. 3b). Antibodies with more complex junctions should recur at a lower rate. Within a larger population of individuals these would be highly visible, but they can still be observed in our data. We found that recurrent antibodies with more inserted nucleotides (non-templated regions 1 and 2, and other inserted bases relative to the reference sequence of the junction alignment) have greater light chain coherence than those with no inserted nucleotides (Table 2). This lessened our concern that recurrent antibodies are exceptional with respect to light chain coherence. Indeed, our findings suggest that all antibodies are recurrent, but at varying rates depending on their junction complexity and the prevalence of their cognate antigen. Our findings also suggest that aside from frequency, more complex antibodies do not behave differently with respect to light chain coherence.

Discussion

This work supports the following model, generalizing observed constraints on gene usage by some antibodies^{2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24}. In nature, many heavy chain configurations yield effective binding of a given antibody target. However, for each of those heavy chain configurations, the cognate light chain is largely determined—at a rate of around 80% at the level of light chain gene or paralog. We call this phenomenon light chain coherence and observe it by looking for recurrences of heavy chains in memory and naive cells from four donors. The small number of donors biases our analysis towards junction regions with low complexity, although the same phenomenon occurs for more complex junctions that appear in our data. Our findings suggest that light chain coherence may apply to memory B cells in general.

Although we analysed V(D)J data for only around 2 million cells in this work, deeper data have been generated separately for heavy and light chains. This has enabled the identification of recurrences (public clonotypes) within such data, using strict definitions based on 100% CDRH3 amino acid identity^23,24. We reinterpret these data here with caution, owing to the differences in scale and technical approaches between the studies. We show here that previously described recurrences in these and other studies come from two types of B cells: naive B cells making ‘not yet functional’ antibodies with minimal light chain coherence and memory B cells making functional antibodies with light chain coherence. The lack of light chain coherence in naive B cells is expected, given their lack of acquired functionality and selection. Conversely, the light chain coherence in memory B cells is expected because of their acquired functionality and selection.

The simplest explanation for the recurrence that we and others observe between naive cells is that their sequences repeat purely by chance. We show that in our data, recurrent naive cells (as well as memory cells) have markedly lower junction complexity, and it is thus no surprise that they recur. One might ask whether the observed recurrence frequency is consistent with the mechanistic biology of V(D)J recombination. Answering this question would require precise quantitative knowledge regarding this exquisitely complex process, and such knowledge does not exist. However, we show here that simulation of this process does in fact predict recurrences, at a similar rate to our observations. Thus, we propose that naive sequences recur by chance. Conversely, recurrent memory sequences are a product of both chance and common exposure to related antigens. We suggest that recurrent naive sequences are instructive with respect to recombination and that recurrent memory sequences are instructive with respect to antibody function, central selection and peripheral selection.

We postulate that light chain coherence implies functional coherence. However, we do not claim to solve the problem of functional grouping. For this to be possible, at least two hurdles remain. First, all approaches (including ours) that are based on direct sequence comparison are naive to the structural consequences of amino acid changes. Rather than compare sequences, a more effective route to functional grouping may be to first computationally model antibody structures from their sequences and to then compare those structures^{25,42,43,44,45}. Second, far better truth data are needed to assess any method. It follows that although similar antibodies for the same antigen have been widely observed in multiple individuals, it remains unknown how frequently antibodies to different antigens might be equally similar. Truth data targeted at such questions could comprise a suite of large datasets of naturally occurring antibodies, with one dataset for each of several antigens, along with binding data for each antibody. These data would be most powerful if they included nucleotide sequences (enabling, for example, consistent VDJ gene identification) and enough donor information to distinguish bona fide recurrence from clonal expansion within given individuals. The generation of such truth data at scale is feasible using existing methods^{21,35,46,47,48}. The immunoglobulin loci and their products are complex and challenging to analyse. It is reasonable to assume that some sequences treated as naive in this study are in fact antigen-specific, as others have described in SARS-CoV-2 infection⁴⁹. Conversely, although progress has been made in the identification of novel germline alleles^50,51, some sequences treated as memory in this study may in fact be naive and produced by novel alleles not detected by our inference methods.

By virtue of how V(D)J recombination works, the light chain sequences of antibodies carry less information than the heavy chain sequences¹. Our work reveals that the light chains of functional antibodies are highly constrained, which implies that the light chains used in nature are the most functional ones: natural selection has won out. The complex dance between the heavy chain and the light chain is best studied at the level of individual cells, the context in which antibodies are produced, selected and expanded. Although we do not yet understand why, we show that the choreography of this interaction leads to a limited number of acceptable light chains. We suggest that antibody designers would be wise to actively look for optimal light chains used widely in nature, rather than focusing on the heavy chain alone. Similarly for bispecific antibodies, it could be advantageous to find two heavy chains whose native light chains are similar.

Methods

For several details, we refer to ref. ⁵¹. We provide commands using executables in the enclone code and data of this work at https://github.com/DavidBJaffe/enclone/blob/master/enclone_paper/which_code_does_what. A single line can be used to install the enclone executable and download the data (https://10xgenomics.github.io/enclone/). We used version 0.5.175 of enclone. The enclone runs on the data of this work require about 145 GB of memory and 20–40 min on a multi-core server (24 cores). As an intermediate, we generated a file per_cell_stuff (https://plus.figshare.com/articles/dataset/Dataset_supporting_Functional_antibodies_exhibit_light_chain_coherence_/20338177?file=36366549) with one line per cell, to facilitate direct analysis of the data of this work by other methods. There are also files per_cell_stuff.old_data (https://plus.figshare.com/articles/dataset/Dataset_supporting_Functional_antibodies_exhibit_light_chain_coherence_/20338177?file=36366552) and per_cell_stuff.phad (https://plus.figshare.com/articles/dataset/Dataset_supporting_Functional_antibodies_exhibit_light_chain_coherence_/20338177?file=36366543) corresponding to the other data that were used. Use of the other executables requires compilation from source code available at https://github.com/DavidBJaffe/enclone, and are in its directory enclone_paper/src/bin. These calculations (typically taking as input the files such as per_cell_stuff) were run on a MacBook Pro with 16 GB of memory.

Flow cytometry

We used a Sony MA900 cell sorter to purify single B cell suspensions from PBMCs from the four donors described in this paper. We used the following flow gating definitions for each population: naive: live, CD3⁻CD19⁺IgD⁺CD27^±CD38^±CD24^±; unswitched memory: live, CD3⁻CD19⁺CD27⁺IgD^lowIgM⁺⁺CD38^±CD24^±; switched memory: live, CD3⁻CD19⁺CD27⁺IgD⁻CD38^±CD24^±CD95^±; and plasmablast: live, CD3⁻CD19⁺CD27⁺IgD⁻CD38⁺⁺CD24⁻.

The antibody panel we used comprised of the following eight clones in a stain volume of 200 μl, totalling 1.25 μg of antibody with each antibody at a 1:40 dilution (5 μl): LIVE/DEAD dye (Invitrogen, 7-AAD, 00-6993-50), anti-CD3 (BioLegend, BV711, 317327, clone OKT3, IgG2a/kappa, mouse), anti-CD19 (BioLegend, PE, 982402, clone HIB19, IgG1/kappa, mouse), anti-IgD (BioLegend, APC, 348221, clone IA6-2, IgG2a/kappa, mouse), anti-IgM (BioLegend, PE/Dazzle 594, 314529, clone MHM-88, IgG1/kappa, mouse), anti-CD24 (BioLegend, BV605, 311123, clone ML5, IgG2a/kappa, mouse), anti-CD27 (BioLegend, FITC, 302805, clone O323, IgG1/kappa, mouse), anti-CD38 (BioLegend, BV421, 356617, clone HB-7, IgG1/kappa, mouse), and anti-CD95 (BioLegend, BV510, 305639, clone DX2, IgG1/kappa, mouse).

We titrated and developed the B cell fractionation panel using 20 million fresh PBMCs (AllCells, 3050363) from a healthy human donor whose cells were not used to generate single-cell data in this study. We thawed the cells per the 10x Genomics Demonstrated Protocol for Fresh Frozen Human Peripheral Blood Mononuclear Cells for Single Cell RNA Sequencing (CG00039, Revision D). In brief, we resuspended cells in 20 μl PBS/2% FBS and incubated the cells on ice for 30 min in the dark. Before sorting, we washed the cells in 3× 1 ml PBS/2% FBS and then resuspended them in 300 μl PBS/2% FBS for the sort step.

Single-cell data generation

Cells from four donors were flow sorted as naive, switched memory, unswitched memory, and plasmablast. Donor blood was collected under IRB-approved protocols with informed consent managed by AllCells (a subsidiary of Discovery Life Sciences); no clinical trials were conducted and no personally identifying information is reported in this study or its data. V(D)J sequences were obtained using the 10x Genomics Immune Profiling Platform, using six Chromium X HT chips and standard manufacturer methods. cDNA libraries were sequenced on the NovaSeq 6000 platform on S4 flow cells. Certain memory B cell populations were relatively uncommon within certain donors. To account for this, in some libraries we added naive B cells to isolated memory cells from each donor in order to capture a sufficiently large number of total B cells and unique sequences from each donor (see also Extended Data Table 1). We targeted 20,000 cells recovered from each lane on the HT chip.

Additional data

We used several other datasets. These included single-cell data for 247,516 cells from Phad et al.³⁶, which were approximately evenly distributed between two donors. A second combined collection (‘older data’) included data generated by 10x Genomics for internal development, and distributed with enclone, as well as several other datasets related to multiple sclerosis³¹, COVID-19^32,33, Kawasaki disease³⁴ and LIBRA-seq³⁵.

Some of these data predated dual indexing and showed evidence of mixing (that is, index hopping) within a given flow cell. In such cases we treated cells from multiple donors as belonging to a single donor. After merging there were 23 donors, and a total of 280,669 cells in the older data.

Additional condition on pairs of cells from the same donor

Pairs of cells were chosen from different computed clonotypes (Fig. 1b). In order to make it very unlikely that the two cells belonged to the same true clonotype, we required in addition that at least one of three conditions was satisfied: (1) The data support different heavy chain J gene usage for the two cells. For this, we examined framework region four for both cells. There had to exist at least three positions where the reference sequences were different, and the cells had bases consistent with those, and no positions where the reference sequences were different, and the cells both supported just one of the references. (2) The same criteria as in (1) but for the light chain instead of the heavy chain. (3) The light chain CDR3 lengths differed.

Sequence simulation

We used soNNia v0.1.2, commit 85c7169, to simulate 10 replicates of 1,408,939 heavy chain junctions. A reproducible Conda environment, scripts to generate these files, and simulation data are provided at https://plus.figshare.com/articles/dataset/Dataset_supporting_Functional_antibodies_exhibit_light_chain_coherence_/20338177?file=36354231. We trained two soNNia models using naive-annotated sequences in pre-selection and post-selection mode, and two models using memory-annotated sequences in pre-selection and post-selection mode.

Statistical testing

Permutation analysis of light chain coherence

We performed a permutation test of light chain coherence by permuting light chain data, while leaving heavy chain data fixed, then calculating light chaincoherence between pairs of cells as in Fig. 1, using only one cell per clonotype to reduce potential bias from clonal expansions. We performed 1,000 permutations at values of light chain coherence between 0% and 100% by steps of 10% and then calculated the standard error of the mean at each level. We show these curves in Fig. 1a. We tested for difference between slope coefficients of linear regression models of these curves using a sum-of-squares F test.

Vh/Vl contingency test

We tested the significance of the Vh/Vl contingency table (where each count is a clonotype with a given Vh/Vl pair) using Monte Carlo simulation of 100,000 P values.

Sequence logo plots

We used the ggseqlogo⁵² and msa⁵³ R packages to align CDRH3 and CDRL3 sequences with the MUSCLE algorithm and to generate logo plots using position-wise entropy/Shannon information (y axis unit: bits). Letters were coloured based on the properties of various amino acids. The amino acid-property colour coding and other code necessary to reproduce these figures are publicly available as part of this paper.

Allele inference

Donor alleles for V genes are partially inferred (as part of the enclone software, in the file allele.rs), using the following algorithm. The core concept is to pile up the observed sequences for a given V gene and identify variant bases. If we used all cells (one sequence per cell), then the sequences would be biased by clonal expansion and thus yield incorrect alleles. Ideally we would instead use just one cell per clonotype that uses the given V gene. However the order of operations is that we first compute donor alleles, and then compute clonotypes. Therefore we use a heuristic for picking cells that does not depend on knowing the clonotypes. The heuristic is that we pick just one cell among those using the given V gene, and that share the same CDRH3 length, CDRL3 length, and partner chain V and J genes. The pileup is then made from the V gene sequences of these cells.

Next, for each position along the V gene, excluding the last 15 bases (to avoid the junction region), we determine the distribution of bases that occur within these selected cells. We only consider those positions where a non-reference base occurs at least four times and represents at least 25% of the total. Then each cell has a footprint relative to these positions, which is its list of base calls for the given positions; we require that these footprints satisfy similar evidence criteria. Each such non-reference footprint then defines an ‘alternate allele’. We do not restrict the number of alternate alleles because they could arise from duplicated gene copies. The ability of the algorithm to reconstruct alleles is limited by the depth of coverage (counted in ‘non-redundant’ cells) of a given V gene. Moreover the algorithm cannot identify germline mutations which occur in the terminal bases of the V gene, inside the junction region. An example of allele calling may be found in ref. ⁵¹, in the section on donor reference analysis.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

All data are publicly available at https://doi.org/10.25452/figshare.plus.20338177, including processed full-length V(D)J sequences and annotations.

Code availability

All code to replicate key findings and figures of the paper are available at https://github.com/DavidBJaffe/enclone (Git hash 561e3ac); a separate copy of this code has also been deposited on Figshare+ at https://plus.figshare.com/articles/dataset/Dataset_supporting_Functional_antibodies_exhibit_light_chain_coherence_/20338177?file=37819143.

References

Tonegawa, S. Somatic generation of antibody diversity. Nature 302, 575–581 (1983).
Article ADS CAS PubMed Google Scholar
Forgacs, D. et al. Convergent antibody evolution and clonotype expansion following influenza virus vaccination. PLoS ONE 16, e0247253 (2021).
Article CAS PubMed PubMed Central Google Scholar
Heilmann, C. & Barington, T. Distribution of κ and λ light chain isotypes among human blood immunoglobulin-secreting cells after vaccination with pneumococcal polysaccharides. Scand. J. Immunol. 29, 159–164 (1989).
Article CAS PubMed Google Scholar
Roy, B. et al. High-throughput single-cell analysis of B cell receptor usage among autoantigen-specific plasma cells in celiac disease. J. Immunol. 199, 782–791 (2017).
Article CAS PubMed Google Scholar
Zhu, D., Lossos, C., Chapman-Fredricks, J. R. & Lossos, I. S. Biased immunoglobulin light chain use in the Chlamydophila psittaci negative ocular adnexal marginal zone lymphomas. Am. J. Hematol. 88, 379–384 (2013).
Article CAS PubMed PubMed Central Google Scholar
Wang, L. T. et al. The light chain of the L9 antibody is critical for binding circumsporozoite protein minor repeats and preventing malaria. Cell Rep. 38, 110367 (2022).
Article CAS PubMed PubMed Central Google Scholar
Zachova, K. et al. Galactose-deficient IgA1 B cells in the circulation of IgA nephropathy patients carry preferentially lambda light chains and mucosal homing receptors. J. Am. Soc. Nephrol. 33, 908–917 (2022).
Article CAS PubMed Google Scholar
Hadzidimitriou, A. et al. Evidence for the significant role of immunoglobulin light chains in antigen recognition and selection in chronic lymphocytic leukemia. Blood 113, 403–411 (2009).
Article CAS PubMed Google Scholar
Shah, H. B. et al. Human C. difficile toxin-specific memory B cell repertoires encode poorly neutralizing antibodies. JCI Insight 5, e138137 (2020).
Article PubMed Central Google Scholar
Lindop, R. et al. Molecular signature of a public clonotypic autoantibody in primary Sjögren’s syndrome: a ‘forbidden’ clone in systemic autoimmunity. Arthritis Rheum. 63, 3477–3486 (2011).
Article CAS PubMed Google Scholar
Parameswaran, P. et al. Convergent antibody signatures in human dengue. Cell Host Microbe 13, 691–700 (2013).
Article CAS PubMed PubMed Central Google Scholar
Al Kindi, M. A. et al. Serum SmD autoantibody proteomes are clonally restricted and share variable-region peptides. J. Autoimmun. 57, 77–81 (2015).
Article CAS PubMed Google Scholar
Hou, D. et al. Immune repertoire diversity correlated with mortality in avian influenza A (H7N9) virus infected patients. Sci Rep. 6, 33843 (2016).
Article ADS CAS PubMed PubMed Central Google Scholar
Bailey, J. R. et al. Broadly neutralizing antibodies with few somatic mutations and hepatitis C virus clearance. JCI Insight 2, e92872 (2017).
Article PubMed Central Google Scholar
Pieper, K. et al. Public antibodies to malaria antigens generated by two LAIR1 insertion modalities. Nature 548, 597–601 (2017).
Article ADS CAS PubMed PubMed Central Google Scholar
Robbiani, D. F. et al. Recurrent potent human neutralizing antibodies to Zika virus in Brazil and Mexico. Cell 169, 597–609.e11 (2017).
Article CAS PubMed PubMed Central Google Scholar
Setliff, I. et al. Multi-donor longitudinal antibody repertoire sequencing reveals the existence of public antibody clonotypes in HIV-1 infection. Cell Host Microbe 23, 845–854.e6 (2018).
Article CAS PubMed PubMed Central Google Scholar
Ahmed, R. et al. A public BCR present in a unique dual-receptor-expressing lymphocyte from type 1 diabetes patients encodes a potent T cell autoantigen. Cell 177, 1583–1599.e16 (2019).
Article CAS PubMed PubMed Central Google Scholar
Ehrhardt, S. A. et al. Polyclonal and convergent antibody response to Ebola virus vaccine rVSV-ZEBOV. Nat. Med. 25, 1589–1600 (2019).
Article CAS PubMed Google Scholar
Sheward, D. J. et al. Structural basis of Omicron neutralization by affinity-matured public antibodies. Preprint at bioRxiv https://doi.org/10.1101/2022.01.03.474825 (2022).
Guthmiller, J. J. et al. Broadly neutralizing antibodies target a haemagglutinin anchor epitope. Nature 602, 314–320 (2022).
Article ADS CAS PubMed Google Scholar
Havenar-Daughton, C. et al. The human naive B cell repertoire contains distinct subclasses for a germline-targeting HIV-1 vaccine immunogen. Sci. Transl. Med. 10, eaat0381 (2018).
Article PubMed PubMed Central Google Scholar
Soto, C. et al. High frequency of shared clonotypes in human B cell receptor repertoires. Nature 566, 398–402 (2019).
Article ADS CAS PubMed PubMed Central Google Scholar
Briney, B., Inderbitzin, A., Joyce, C. & Burton, D. R. Commonality despite exceptional diversity in the baseline human antibody repertoire. Nature 566, 393–397 (2019).
Article ADS PubMed PubMed Central Google Scholar
Raybould, M. I. J., Rees, A. R. & Deane, C. M. Current strategies for detecting functional convergence across B-cell receptor repertoires. mAbs 13, 1996732 (2021).
Article PubMed PubMed Central Google Scholar
Miller, T. E. et al. Mitochondrial variant enrichment from high-throughput single-cell RNA sequencing resolves clonal populations. Nat. Biotechnol. 40, 1030–1034 (2022).
Article CAS PubMed Google Scholar
Akkaya, M., Kwak, K. & Pierce, S. K. B cell memory: building two walls of protection against pathogens. Nat. Rev. Immunol. 20, 229–238 (2020).
Article CAS PubMed Google Scholar
Weisel, F. & Shlomchik, M. Memory B cells of mice and humans. Annu. Rev. Immunol. 35, 255–284 (2017).
Article CAS PubMed Google Scholar
Pech, M. et al. A large section of the gene locus encoding human immunoglobulin variable regions of the Kappa type is duplicated. J. Mol. Biol. 183, 291–299 (1985).
Article CAS PubMed Google Scholar
Jaffe, D. B. et al. enclone: precision clonotyping and analysis of immune receptors. Preprint at bioRxiv https://doi.org/10.1101/2022.04.21.489084 (2022).
Ramesh, A. et al. A pathogenic and clonally expanded B cell transcriptome in active multiple sclerosis. Proc. Natl Acad. Sci. USA 117, 22932–22943 (2020).
Article ADS CAS PubMed PubMed Central Google Scholar
Sokal, A. et al. Maturation and persistence of the anti-SARS-CoV-2 memory B cell response. Cell 184, 1201–1213.e14 (2021).
Article CAS PubMed PubMed Central Google Scholar
Woodruff, M. C. et al. Extrafollicular B cell responses correlate with neutralizing antibodies and morbidity in COVID-19. Nat. Immunol. 21, 1506–1516 (2020).
Article PubMed PubMed Central Google Scholar
Wang, Z. et al. Single-cell RNA sequencing of peripheral blood mononuclear cells from acute Kawasaki disease patients. Nat. Commun. 12, 5444 (2021).
Article ADS CAS PubMed PubMed Central Google Scholar
Setliff, I. et al. High-throughput mapping of B cell receptor sequences to antigen specificity. Cell 179, 1636–1646.e15 (2019).
Article CAS PubMed PubMed Central Google Scholar
Phad, G. E. et al. Clonal structure, stability and dynamics of human memory B cells and circulating plasmablasts. Nat. Immunol. 23, 1–10 (2022).
Article CAS PubMed PubMed Central Google Scholar
Briney, B. S., Willis, J. R., Hicar, M. D., Thomas, J. W. 2nd & Crowe, J. E. Jr. Frequency and genetic characterization of V(DD)J recombinants in the human peripheral blood antibody repertoire. Immunology 137, 56–64 (2012).
Article CAS PubMed PubMed Central Google Scholar
Safonova, Y. & Pevzner, P. A. V(DD)J recombination is an important and evolutionarily conserved mechanism for generating antibodies with unusually long CDR3s. Genome Res. 30, 1547–1558 (2020).
Article CAS PubMed PubMed Central Google Scholar
Sethna, Z., Elhanati, Y., Callan, C. G., Walczak, A. M. & Mora, T. OLGA: fast computation of generation probabilities of B- and T-cell receptor amino acid sequences and motifs. Bioinformatics 35, 2974–2981 (2019).
Article CAS PubMed PubMed Central Google Scholar
Marcou, Q., Mora, T. & Walczak, A. M. High-throughput immune repertoire analysis with IGoR. Nat. Commun. 9, 561 (2018).
Article ADS PubMed PubMed Central Google Scholar
Isacchini, G., Walczak, A. M., Mora, T. & Nourmohammad, A. Deep generative selection models of T and B cell receptor repertoires with soNNia. Proc. Natl Acad. Sci. USA 118, e2023141118 (2021).
Article CAS PubMed PubMed Central Google Scholar
Finn, J. A. et al. Identification of structurally related antibodies in antibody sequence databases using Rosetta-derived position-specific scoring. Structure 28, 1124–1130.e5 (2020).
Article CAS PubMed PubMed Central Google Scholar
Wong, W. K. et al. Ab-Ligity: identifying sequence-dissimilar antibodies that bind to the same epitope. mAbs 13, 1873478 (2021).
Article PubMed PubMed Central Google Scholar
Richardson, E. et al. A computational method for immune repertoire mining that identifies novel binders from different clonotypes, demonstrated by identifying anti-pertussis toxoid antibodies. mAbs 13, 1869406 (2021).
Article PubMed PubMed Central Google Scholar
Robinson, S. A. et al. Epitope profiling using computational structural modelling demonstrated on coronavirus-binding antibodies. PLoS Comput. Biol. 17, e1009675 (2021).
Article CAS PubMed PubMed Central Google Scholar
Wilson, P. et al. Distinct B cell subsets give rise to antigen-specific antibody responses against SARS-CoV-2. Preprint at Research Square https://doi.org/10.21203/rs.3.rs-80476/v1 (2020).
Shiakolas, A. R. et al. Efficient discovery of SARS-CoV-2-neutralizing antibodies via B cell receptor sequencing and ligand blocking. Nat. Biotechnol. 40, 1270–1275 (2022).
Article CAS PubMed PubMed Central Google Scholar
Rush, S. A. et al. Characterization of prefusion-F-specific antibodies elicited by natural infection with human metapneumovirus. Cell Rep. 40, 111399 (2022).
Kim, S. I. et al. Stereotypic neutralizing Vh antibodies against SARS-CoV-2 spike protein receptor binding domain in patients with COVID-19 and healthy individuals. Sci. Transl. Med. 13, eabd6990 (2021).
Article CAS PubMed PubMed Central Google Scholar
Lees, W. et al. OGRDB: a reference database of inferred immune receptor genes. Nucleic Acids Res. 48, D964–D970 (2020).
Article CAS PubMed Google Scholar
Rodriguez, O. L. et al. Genetic variation in the immunoglobulin heavy chain locus shapes the human antibody repertoire. Preprint at bioRxiv https://doi.org/10.1101/2022.07.04.498729 (2022).
Wagih, O. ggseqlogo: a versatile R package for drawing sequence logos. Bioinformatics 33, 3645–3647 (2017).
Article CAS PubMed Google Scholar
Bodenhofer, U., Bonatesta, E., Horejš-Kainrath, C. & Hochreiter, S. msa: an R package for multiple sequence alignment. Bioinformatics 31, 3997–3999 (2015).
CAS PubMed Google Scholar

Download references

Acknowledgements

We wish to express our gratitude to the donors for their patience in undergoing apheresis and donating their blood. We thank P. Marks, M. Stubbington, S. Taylor, P. Shah and V. Kumar for their comments and suggestions. Funding and resources were provided by 10x Genomics.

Author information

These authors contributed equally: Payam Shahi, Daniel S. Reyes, N. Lance Hepler
Unaffiliated: N. Lance Hepler

Authors and Affiliations

10x Genomics, Pleasanton, CA, USA
David B. Jaffe, Payam Shahi, Bruce A. Adams, Ashley M. Chrisman, Peter M. Finnegan, Nandhini Raman, Ariel E. Royall, FuNien Tsai, Thomas Vollbrecht, Daniel S. Reyes & Wyatt J. McDonnell

Authors

David B. Jaffe
View author publications
You can also search for this author in PubMed Google Scholar
Payam Shahi
View author publications
You can also search for this author in PubMed Google Scholar
Bruce A. Adams
View author publications
You can also search for this author in PubMed Google Scholar
Ashley M. Chrisman
View author publications
You can also search for this author in PubMed Google Scholar
Peter M. Finnegan
View author publications
You can also search for this author in PubMed Google Scholar
Nandhini Raman
View author publications
You can also search for this author in PubMed Google Scholar
Ariel E. Royall
View author publications
You can also search for this author in PubMed Google Scholar
FuNien Tsai
View author publications
You can also search for this author in PubMed Google Scholar
Thomas Vollbrecht
View author publications
You can also search for this author in PubMed Google Scholar
Daniel S. Reyes
View author publications
You can also search for this author in PubMed Google Scholar
N. Lance Hepler
View author publications
You can also search for this author in PubMed Google Scholar
Wyatt J. McDonnell
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

D.B.J. and W.J.M. were responsible for conceptualization, formal analysis, methodology, software, supervision, validation and writing of the original draft manuscript. D.B.J., P.S., B.A.A., N.R., D.S.R., N.L.H. and W.J.M. were responsible for data curation. All authors were responsible for investigation. D.B.J., P.S., D.S.R., N.L.H. and W.J.M. were responsible for project administration. D.B.J., P.S., N.R. and W.J.M. were responsible for visualization. All authors were responsible for review and editing of the final draft manuscript.

Corresponding authors

Correspondence to David B. Jaffe or Wyatt J. McDonnell.

Ethics declarations

Competing interests

All authors except N.L.H. were employees of 10x Genomics at the time of submission. Several authors were also shareholders of 10x Genomics at the time of submission. D.B.J., P.S., B.A.A. and W.J.M. are inventors on patent applications assigned to 10x Genomics in relation to algorithms and methods for the study of immune repertoires.

Peer review

Peer review information

Nature thanks Yana Safonova and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer review reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Flow cytometry gating schemes for B cell subsets.

Gating strategy for B cell isolation. Panels a—d show naive cell gating, panels e—g show memory cell gating, and panels h—i show plasmablast gating. a, Hierarchical gating scheme for lymphocytes, single cells, live cells, and CD3-negative cells. b, We gated CD19+CD27± cells from CD3− cells for further analysis. Donor samples displayed noticeable differences in CD19 and CD27 expression. c, We analyzed CD19+CD27− cells for surface IgD expression and gated IgD+ cells for further analysis. d, We selected naive B cells by sorting CD19+CD27–IgD+CD24±CD38± B cells. e, For memory cell gating, we selected CD19+CD27+ cells from b for CD24+CD38+ positivity. f, We analyzed cells from e and isolated unswitched memory cells using IgD±IgM++ gating. g, We analyzed cells from e and isolated switched memory cells using IgD−CD95+ gating. h, We analyzed CD19+CD27+ cells from b and gated the IgD–CD27+ population. i, We sorted plasmablasts using CD24–CD38++ gating.

Extended Data Fig. 2 Light chain coherence is visible by sequence similarity.

Each point represents a pair of memory cells from different donors. Heavy and light chain edit distances are plotted, using the amino acids starting at the end of the leader and continuing through the last amino acid in the J segment. Points with identical coordinates are combined by showing a large point whose area is proportional to the number of such points. a, Cell pairs are displayed if the two cells in the pair have the same CDRH3 amino acid sequence. To increase readability, only one third of such pairs were selected at random for display. Of the pairs, 78% have light chain edit distance ≤ 20. This number (78%) is the fraction of cell pairs lying below the horizontal line at light chain edit distance 20, and was computed separately. It is proportional to the fraction of red below the line, if overlap is taken into account. b, [control] The same number of cell pairs were selected at random for display, without regard to CDRH3. Of the pairs, 9% have light chain edit distance ≤ 20.

Extended Data Table 1 Donor information

Full size table

Extended Data Table 2 Flow categories by donor with number of cells and naive fraction

Full size table

Extended Data Table 3 Light chain coherence does not imply heavy chain coherence

Full size table

Extended Data Table 4 Light chain coherence in memory B cells (public antibodies), without identifying light chain V gene paralogs

Full size table

Supplementary information

Supplementary Information

Supplementary Table 1 describes all data analysed and distributed as part of this work, and how to retrieve them. Supplementary Table 2 exhibits light chain coherence, computed in different ways for several datasets. Supplementary Table 3 displays summary statistics related to junctions within each dataset and recurrence rates within groups of datasets. Supplementary Fig. 1 exhibits associations between Vh and Vl gene usage from 1.2 million single B cells. Supplementary Fig. 2 exhibits per donor CDRH3 length distributions.

Reporting Summary

Peer Review File

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Jaffe, D.B., Shahi, P., Adams, B.A. et al. Functional antibodies exhibit light chain coherence. Nature 611, 352–357 (2022). https://doi.org/10.1038/s41586-022-05371-z

Download citation

Received: 24 April 2022
Accepted: 21 September 2022
Published: 26 October 2022
Issue Date: 10 November 2022
DOI: https://doi.org/10.1038/s41586-022-05371-z

This article is cited by

Adaptive immune receptor repertoire analysis
- Vanessa Mhanna
- Habib Bashour
- Encarnita Mariotti-Ferrandiz
Nature Reviews Methods Primers (2024)
Single-cell immune repertoire analysis
- Sergio E. Irac
- Megan Sioe Fei Soon
- Zewen Kelvin Tuong
Nature Methods (2024)
Contextualising the developability risk of antibodies with lambda light chains using enhanced therapeutic antibody profiling
- Matthew I. J. Raybould
- Oliver M. Turnbull
- Charlotte M. Deane
Communications Biology (2024)
Fast, accurate antibody structure prediction from deep learning on massive set of natural antibodies
- Jeffrey A. Ruffolo
- Lee-Shin Chu
- Jeffrey J. Gray
Nature Communications (2023)

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.

Subjects

Abstract

Similar content being viewed by others

Main

Memory antibodies show light chain coherence

Simulation predicts antibody recurrence

Discussion

Methods

Flow cytometry

Single-cell data generation

Additional data

Additional condition on pairs of cells from the same donor

Sequence simulation

Statistical testing

Permutation analysis of light chain coherence

Vh/Vl contingency test

Sequence logo plots

Allele inference

Reporting summary

Data availability

Code availability

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding authors

Ethics declarations

Competing interests

Peer review

Peer review information

Additional information

Extended data figures and tables

Supplementary information

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

Comments

Search

Quick links