Contextualising the developability risk of antibodies with lambda light chains using enhanced therapeutic antibody profiling

Raybould, Matthew I. J.; Turnbull, Oliver M.; Suter, Annabel; Guloglu, Bora; Deane, Charlotte M.

doi:10.1038/s42003-023-05744-8

Download PDF

Article
Open access
Published: 08 January 2024

Contextualising the developability risk of antibodies with lambda light chains using enhanced therapeutic antibody profiling

Communications Biology volume 7, Article number: 62 (2024) Cite this article

3324 Accesses
8 Altmetric
Metrics details

Subjects

Abstract

Antibodies with lambda light chains (λ-antibodies) are generally considered to be less developable than those with kappa light chains (κ-antibodies). Though this hypothesis has not been formally established, it has led to substantial systematic biases in drug discovery pipelines and thus contributed to kappa dominance amongst clinical-stage therapeutics. However, the identification of increasing numbers of epitopes preferentially engaged by λ-antibodies shows there is a functional cost to neglecting to consider them as potential lead candidates. Here, we update our Therapeutic Antibody Profiler (TAP) tool to use the latest data and machine learning-based structure prediction, and apply it to evaluate developability risk profiles for κ-antibodies and λ-antibodies based on their surface physicochemical properties. We find that while human λ-antibodies on average have a higher risk of developability issues than κ-antibodies, a sizeable proportion are assigned lower-risk profiles by TAP and should represent more tractable candidates for therapeutic development. Through a comparative analysis of the low- and high-risk populations, we highlight opportunities for strategic design that TAP suggests would enrich for more developable λ-antibodies. Overall, we provide context to the differing developability of κ- and λ-antibodies, enabling a rational approach to incorporate more diversity into the initial pool of immunotherapeutic candidates.

Refining the impact of genetic evidence on clinical success

Article Open access 17 April 2024

An autoantibody signature predictive for multiple sclerosis

Article 19 April 2024

Antibody drug conjugate: the “biological missile” for targeted cancer therapy

Article Open access 22 March 2022

Introduction

Antibodies are the dominant category of biotherapeutics; more than 140 therapeutic antibodies have now been approved by regulators with over 550 currently active in clinical trials^1,2. Their popularity is tied to their use by natural immune systems and their ability to achieve high affinity/specificity for seemingly any targeted pathogen (antigen), enabling its selective eradication³.

Conventional antibodies are dimeric, comprise two identical heavy and light chains and accomplish precise antigen recognition through two dedicated antigen binding sites, termed the variable regions (Fvs). These Fvs are identical and structurally/chemically intricate, containing six proximal complementarity-determining region (CDR) loops — three on the variable domain of the heavy chain (VH, CDRH1-3) and three on the variable domain of the light chain (VL, CDRL1-3).

A large portion of the VH sequence derives from the recombination of a heavy V, D, and J gene, while most of the VL sequence is analogously the product of recombination of a light V and J gene. These heavy and light chain immunoglobulin germline genes are encoded at different loci across the chromosomes. For example, in humans, the heavy chain V, D, and J genes (IGHV, IGHD, IGHJ) lie solely on chromosome 14, while light chain V and J genes exist at two loci; a kappa (κ, IGKV and IGKJ) locus on chromosome 2, and a lambda (λ, IGLV and IGLJ) locus on chromosome 22^4,5.

Within each locus, different V(D)J genes recombine to create a considerable baseline diversity in both the VH and VL sequence⁶. Nucleotide insertions/deletions in the junction region between genes (which falls within the CDR3 loops) further contribute to exceptional VH and VL sequence diversification. Pairing of the recombined heavy and light chains then adds an additional combinatorial diversity; in this manuscript we term antibodies containing a κ light chain as κ-antibodies, and those containing a λ light chain as λ-antibodies. Finally, antibody sequence diversity is magnified through somatic hypermutation during an immune response. This process is often artificially mimicked during therapeutic development through in vitro affinity maturation/engineering.

Although the VH sequence is more diversified, the VL sequence is often critical to an antibody’s function. For example, it has been observed in different toxin, virus, and vaccine response contexts that κ- and λ-antibodies are expressed in characteristic proportions with restricted usages, and that they tend to have different antigen specificities⁷. Amongst the thousands of anti-coronavirus antibodies independently isolated throughout the pandemic, the same VL germline genes have been frequently observed amongst antibodies with a high confidence of engaging the same epitope^8,9. This link between VL germline genes and function has recently been shown to apply more generally, as evidenced by Jaffe et al. who found light chain coherence of memory B-cell compartments¹⁰, and by Shrock et al. who identified the presence of germline amino acid-binding motifs — many of which lie in the VL sequence¹¹. Together, these phenomena are likely by-products of the documented sequence^11,12,13,14 and structural^11,15,16,17 differences between κ- and λ-VLs, which may have evolved to increase the efficacy of receptor editing¹⁸, a process during which maturing BCRs can exchange their initial recombined κ light chain for a λ light chain to prevent autoreactivity.

Despite their functional utility, λ-antibodies are currently under-represented across clinical-stage therapeutic antibodies (CSTs). Of a set of 242 CSTs curated in 2019¹⁹, all of which were designed for human application, only around 10% derived from λ-genes. By comparison, λ-antibodies are estimated to comprise roughly 35% of natural human repertoires^20,21.

The precise reasons behind the paucity of λ-antibodies in the clinic are unknown, but there are several probable origins. These include factors related to the dominant methods of therapeutic discovery^22,23, such as unintended selection bias in screening library design²⁴ and the higher κ: λ ratios (up to 20:1) seen in mouse antibody repertoires²⁵.

There are also reports that suggest that λ-antibodies may fail more frequently than κ-antibodies to advance through pre-clinical development²⁶. Several studies have identified that λ-VLs exhibit a higher average hydrophobicity than κ-VLs^12,13,18,19; higher hydrophobicity suggests an increased propensity towards the formation of aggregates via the hydrophobic effect. This mechanism is understood to be the primary force driving light chain amyloidosis, where free light chains self-associate, and data suggest that λ-VLs prone to dimerisation outnumber κ-VLs²⁷.

This has earned λ-antibodies a reputation for poor developability that has fed back into systematic discovery biases, such as through the intentional development of κ-only screening libraries²⁸, or, when given a choice of progressing κ- or λ-antibodies to downstream lead optimisation, a tendency to prioritise the former. However, it is probable that a sizeable proportion of λ-antibodies are indeed developable, and that rational engineering could be used to make some more challenging λ-antibodies biophysically tractable²⁹. In general, better distinction between more developable and less developable λ-antibodies should be applied to limit the degree to which we artificially restrict candidate diversity, and therefore targetable epitope space, during early stage discovery.

In 2019, we published the Therapeutic Antibody Profiler (TAP), a method for the computational developability assessment of lead candidates based on comparing their 3D biophysical properties to those of CSTs¹⁹. At the time, we had access to only 25 λ-based CST sequences and artificially-paired representations of natural human antibodies. Now, through dedicated efforts to track the sequences of CSTs as they are designated by the World Health Organisation (WHO)¹ and increased public availability of paired-chain natural antibody repertoires³⁰, we are able to more confidently profile the physicochemical properties of therapeutic and natively-expressed human λ-antibodies.

In this paper, we first improve TAP by incorporating ABodyBuilder2³¹, a state-of-the-art deep-learning based antibody structure prediction method and highlight changes and robustness of the new guideline values. We then use our updated protocol to characterise developability-linked biophysical differences across CST and natural κ- and λ-antibodies. Finally, we probe the subset of red-flagging antibodies for recurrent features associated with extreme scores, and which may be avoided to derisk the incorporation of λ-antibodies into screening libraries.

Overall, our study provides an improved methodology for therapeutic antibody profiling and adds context to the developability of λ-antibodies, facilitating their selective consideration as leads during early-stage drug discovery.

Results

Curating datasets of therapeutic and natural antibodies

We first curated the latest set of non-redundant, post Phase-I clinical stage therapeutics (CSTs) designated for use in humans from Thera-SAbDab¹ (25th January, 2023). We obtained 664 CST Fv sequences (the CST_all dataset), compared to the 242 used in our previous analysis¹⁹. To obtain a reference set of natural human antibodies, we utilised the paired Observed Antibody Space (OAS) database³⁰, which tracks, cleans, and annotates single-cell antibody V(D)J sequencing datasets in the public domain. We curated all 79,759 non-redundant natively-paired human antibody sequences from paired OAS (25th January, 2023), which we term the Nat_all dataset. This compares to datasets of between 14,000–19,000 artificially-paired human antibody sequences used in our previous analysis¹⁹.

Benchmarking a new TAP modeling protocol

The original Therapeutic Antibody Profiler used the homology modeling tool ABodyBuilder³² (ABodyBuilder1, for clarity) for antibody structural modeling. In 2018, this was the state-of-the-art tool for high-throughput antibody modeling. However, recent advances in deep learning have yielded several pretrained ab initio structure prediction architectures that can be applied or adapted to the task of rapid antibody/CDR loop modeling^31,33,34,35. Their average performance has been shown to be consistently higher than that of homology-based antibody modeling methods. Since better models of antibodies should improve the reliability of our developability guidelines, we explored the case for updating our TAP protocol to use a more recent machine learning-based tool (ABodyBuilder2³¹) for 3D structural modeling. We selected ABodyBuilder2 as it has been shown to outperform other antibody-specific modelling methods³¹, while being competitive with AlphaFold Multimer³³ at orders of magnitude faster modelling rates.

We first confirmed that ABodyBuilder2’s improved general performance translates specifically to CSTs, observing increased backbone and side chain modeling accuracy relative to ABodyBuilder1 across a set of recently-solved therapeutics (Supplementary Note 1, Supplementary Tables 1–2). This motivated us to formally adopt ABodyBuilder2 as the tool for 3D modeling prior to computation of the TAP metrics.

We next analysed the impact of using ABodyBuilder2 versus ABodyBuiler1 for structural modeling on the TAP developability guidelines calculated across the CST_all set. We measure this based on their impact on the amber and red flagging thresholds — characteristic percentile values used to demark the extrema of each TAP property distribution linked to poor developability¹⁹.

For reference, amber flags for Total CDR Length (L_tot) or Patches of Surface Hydrophobicity (PSH) are assigned to scores in the 0th-5th or 95th-100th percentiles relative to CSTs, while red flags are assigned if the L_tot/PSH score falls below the 0th or above the 100th percentile. Amber flags for Patches of Positive Charge (PPC) or Patches of Negative Charge (PNC) are assigned if a score falls in the 95th–100th percentile range relative to CSTs, and red flags are assigned to scores above the 100th percentile. Finally, amber flags are assigned for the Structural Fv Charge Symmetry Parameter (SFvCSP) metric if the score falls between the 0th-5th percentile values relative to CSTs, while red flags are assigned to scores below the 0th percentile.

The ABodyBuilder2-modeled CST_all flagging thresholds show high similarity to those obtained by ABodyBuilder1 (Table 1, Supplementary Fig. 1). There is, however, some evidence of a systematic bias associated with the different modeling protocol. Comparing the amber flag thresholds (less volatile than red flag thresholds as they capture 5% of the data) shows that ABodyBuilder2-modeled CSTs have lower PSH scores than ABodyBuilder1-modeled CSTs. This drop in PSH score is consistent with ABodyBuilder2’s more accurate modeling; we found in our original TAP paper that PSH values calculated over solved crystal structures (theoretically perfect predictions) were lower on average (a difference of c. 10) than those calculated over ABodyBuilder1 models of the same CSTs¹⁹. This emphasises the need to use the same modelling tool for setting the guidelines and evaluating new candidates.

Table 1 Latest TAP thresholds based on ABodyBuilder1 or ABodyBuilder2 models.

Full size table

Testing the robustness of the TAP developability guidelines

We then probed the robustness of our developability guidelines to various perturbations. TAP values calculated on the subset of CSTs modeled with higher certainty should be more reliable. Model confidence can be estimated through the frame-aligned prediction error (FAPE) metric, a property minimised as part of the ABodyBuilder2 loss function that can be interpreted as a measure of backbone prediction uncertainty for each residue³¹. To investigate the impact of FAPE-based confidence filtering on our guidelines, we first determined an appropriate CDRH3 root-mean squared predicted error threshold that would filter out the least-confidently modeled CDRH3s (1.31 Å, see Methods for the derivation), then calculated our developability guidelines based only on the subset of most confident CST predictions (the CST_conf set, see Supplementary Fig. 2, Supplementary Table 3). This set of generally higher quality models provides a more accurate reference set of physicochemical distributions, which, if they were to differ substantially from the general set, would imply that ABodyBuilder2’s model accuracy has a systematic impact on the aggregate guidelines set over all CSTs. Overall, the guidelines derived from only the most confident models aligned closely with those set over all CSTs, suggesting that the new TAP guidelines are robust to the variable prediction accuracy of ABodyBuilder2.

Next, we examined the effect of ABodyBuilder2’s non-deterministic side chain modeling to explore statistically how side chain conformational uncertainty impacts the guidelines. We ran the TAP protocol three times per CST and investigated the consistency of structure-dependent TAP metrics for each CST (Supplementary Fig. 3). The results for all metrics were all highly consistent between runs. PPC, PNC, and SFvCSP values were the most consistent, with Pearson’s coefficient values in the range of 0.993–0.996. Due to their sensitivity to structural variations in any CDR vicinity residue, we expected the PSH values to be more susceptible to inter-run fluctuations. This was borne out, however PSH remained strongly correlated between two independent modeling runs (ρ: 0.945), and even more strongly correlated between one run and the mean of three independent runs (ρ: 0.981). The proportions of flagging inconsistencies (instances where a CST would be flagged for that property based on one TAP run but not based on three repeats), were as follows: PSH (lower): 3.31%, PSH (upper): 1.81%, PSH (overall): 5.12%, PPC: 0.30%, PNC: 0.30%, SFvCSP: 0.75%.

To capture the absolute variability of scores across repeats, we evaluated for each metric/CST the variance across the three runs and averaged these values on a per metric basis across the CST_all dataset. The mean PSH variance was 10.53 while the mean PPC, PNC, and SFvCSP variances were below 1 (Supplementary Table 4). When values from three runs were amalgamated to establish aggregate TAP guidelines, this translated to a very small variation in threshold values from those obtained based on a single model of each CST (Supplementary Table 5).

In addition to this statistical sampling of energy-minimised side chain conformations, we also evaluated the variation in TAP scores calculated over the course of molecular dynamics simulations; incorporation of dynamics in guideline evaluation was suggested in a recent study on computational developability prediction³⁶. We selected 14 case study CSTs, seven of which had solved Fv coordinates in the ABodyBuilder2 training set and seven of which did not; for details of the molecular dynamics simulation and TAP calculations, see Methods.

The profiles for each of the four structure-dependent TAP metrics are shown in the Supplementary Information (Supplementary Figs. 4–7). The mean value of the TAP properties over the course of the simulation showed good agreement with an ensemble of three TAP predictions on the static Fv models output directly by ABodyBuilder2. Based on a paradigm where if a developability flag is raised on any of the repeat calculations we consider the antibody formally flagged for that property, the ensemble of direct ABodyBuilder2 outputs agreed with the flag assigned to the simulation mean for 12/14 calculations for PSH, 13/14 calculations for PPC, and 14/14 calculations for PNC and SFvCSP. Furthermore, we tested whether three ABodyBuilder2 modelling runs were sufficient to explore the diversity of side chain conformations by doubling to six runs and comparing the results. Based on an analogous ensemble paradigm, the agreement remained the same. However, there was some evidence that the additional runs helped to improve statistical consensus with the simulation-mean flag (Supplementary Table 6). For example, Simaravibart and Regdanvimab - which were assigned flags for PSH based on the molecular dynamics - flagged in a higher proportion of the six runs than the first three (1/3 vs. 3/6 runs, and 2/3 vs. 5/6 runs, respectively).

TAP metric profiles over time and by development status

Finally, we investigated the impact on our metric distributions of filtering our CSTs by metadata properties.

To assess the properties of CSTs over time, we split the set by the year they were given a proposed WHO International Non-proprietary Name, yielding 356 named between 1987–2017 and 308 named between 2018 and the present day. Comparing their TAP property distributions (Supplementary Fig. 8) indicates that while their charge metrics are similar, the amber and red flag thresholds of the L_tot and PSH properties have shifted to more extreme values at both tails, suggesting an increased willingness to push CST design into new property spaces and perhaps reflecting formulation advances able to accommodate more extreme physicochemical properties.

Recent studies have suggested that developability guidelines may be better derived from marketed therapeutics^36,37; we evaluated our TAP distributions for the subsets of CSTs in Phase-II (341), Phase-III (141), or in Preregistration/Approved (182), however observed no clear trend in their properties along the clinical progression axis (Supplementary Fig. 9). Equally, we saw little difference in the properties of CSTs known to be in active development or that completed the development pipeline versus CSTs whose campaigns were terminated before reaching approval (Supplementary Fig. 10). These observations are consistent with the principle that CSTs with unmanageable developability issues do not tend to progress past pre-clinical/Phase-I development, and that decisions to terminate campaigns at later clinical stages are often attributable to other causes.

Updated comparison of CSTs to natural human antibodies

A key biotechnological advance in recent years has been the advent of high-throughput paired B-cell receptor (BCR) sequencing³⁸. Publicly available paired antibody sequences are increasingly abundant³⁰ and provide a higher fidelity comparison set than the artificially-paired natural single chain reads we used in previous repertoire characterisation work^19,39. These samples, coupled with the availability of 2.75 times as many CSTs and a more accurate/versatile modeling protocol, provides an ideal opportunity to revisit prior analyses and explore whether we observe similar trends in the biophysical properties of therapeutics and natural antibodies.

We calculated the TAP profiles for our new curated datasets of naturally-paired human sequences (Nat_all, N = 88,274) and CSTs (CST_all, N = 664). The patterns of the distributions aligned with our findings in the original paper (Fig. 1a–e). CSTs and natural human antibodies adopted similar PPC, PNC, and SFvCSP distributions, but natural antibodies were even more enriched at longer CDRs and higher PSH scores than observed previously (30.16% and 24.23% fall above the upper amber flag thresholds set by the top-5% of CSTs, respectively). To ensure the length bias was not the sole driver of higher PSH scores, we plotted the L_tot against the PSH score for the Nat_all and CST_all datasets (Fig. 1f). While almost all the natural antibodies found at extreme L_tot values flag for PSH, so too do a disproportionate number of natural antibodies at more moderate CDR lengths, even down to the smallest recorded L_tot value.

**Fig. 1: TAP metric distributions based on ABodyBuilder2 models of the latest therapeutic data.**

To further test the robustness of these conclusions, we then restricted our analysis to a confidence-filtered subset of ABodyBuilder2 models of the CSTs (CST_conf, N = 510) and the natural data (Nat_conf, N = 30,402), generated using the FAPE threshold benchmarked on CSTs (see Methods). In these sets a much smaller fraction of natural antibodies survived the filtering cut-off (~38% of Nat_allversu ~75% of CST_all), likely due to the fact that natural antibodies sample longer CDRH3 lengths — which are both more conformationally diverse and harder to crystallise — as well as the general under-representation of natural antibodies in the Protein Data Bank⁴⁰, on which ABodyBuilder2 is trained.

The resulting CST_conf and Nat_conf TAP distributions show analogous relative positioning to our original results¹⁹, with CSTs occupying shorter L_tot and smaller PSH values than natural antibodies, but having similar charge characteristics (Supplementary Fig. 2). Quantitatively, over 16% and over 18% of Nat_conf antibodies surpassed the L_tot and PSH upper amber thresholds set by the CST_conf set (compared with ~30% and ~24% on the Nat_all set, respectively). The large reduction in the number of natural antibodies flagging for L_tot confirms that antibodies with longer CDR loops are modeled with lower confidence. The smaller percentage reduction in natural antibodies flagging for PSH reflects the increased tendency for natural antibodies of all lengths to occupy higher PSH values.

In summary, our investigations strengthen the evidence that CSTs and natural antibodies differ in their CDR length and surface hydrophobicity properties.

Using the new TAP protocol to explore λ-antibody developability

We then used our updated TAP protocol to explore the relative developability of κ- and λ-antibodies.

We examined the growth trends of κ- and λ-CSTs. Plotting their numbers over time reveals distinct patterns in usage (Fig. 2). For example, while novel κ-CST Fvs have been continuously released in double-digit quantities per year since 2007, new λ-CST Fvs only reached this level in 2018. In 2019, for example, the industry developed 53 new κ-CST Fvs, but only 10 new λ-CSTs.

**Fig. 2: Progression of kappa and lambda therapeutic antibodies into late-stage clinical trials, by year.**

This lag has led to a disparity in the abundance of κ- and λ-CSTs. As of January 2023, Thera-SAbDab contained 576 non-redundant κ-CST Fvs (86.7%) and 88 non-redundant λ-CST Fvs (13.3%); far below the relative abundance of human λ-antibodies (30–35%^20,21). However, prior to the disruption of the pandemic, there were signs of an upwards growth trend in λ-CSTs (Fig. 2). There is evidence to suggest this is driven by the propensity of λ-VLs to bind different targets/epitopes to κ-VLs; amongst the therapeutics designated by the WHO since 2022, six λ-antibodies (Acimtamig, Firastotug, Golocdacimig, Temtokibart, Tolevibart, and Zinlivimab) are first-in-class clinical candidates against novel antigen targets or epitopes (FCGR3A, HHV gB AD, OLR1, IL22RA1, HPV Envelope Protein, and the HIV-1 gp120 V3 epitope, respectively¹). Overall, the 88 sequence non-redundant λ-CSTs now in Thera-SAbDab represents a 250% increase on the 25 λ-CSTs we had access to when developing our original guidelines.

TAP distributions across κ- and λ-antibodies

We studied the CST biophysical property distributions for the two types of light chain (Fig. 3a–e; kappa N = 576, lambda N = 88).

**Fig. 3: TAP properties of kappa and lambda therapeutic antibodies.**

λ-CSTs disproportionately amber flag at the upper extrema of the L_tot (3.1% κ, 27.27% λ) and PSH (2.7% κ, 21.1% λ) distributions, and, to a lesser extent, for PPC (4.1% κ, 11.8% λ). In contrast, κ-CSTs predominate in the lower extrema of the L_tot (10.4% κ, 1.3% λ) and PSH (5.8% κ, 0% λ) distributions. While the relative proportions in the flagging region for the SFvCSP score are similar, only κ-CSTs occupy the most extreme values (below −15).

As PSH values correlate to some extent with L_tot, we checked whether the preponderance of λ-CSTs at high PSH values was simply a by-product of length (Fig. 3f). Our results indicate that λ-CSTs are not noticeably more driven towards high PSH scores by longer CDR lengths than κ-CSTs are, with λ-CSTs having higher average PSH scores than κ-CSTs at every sampled L_tot value.

The observation that λ-CSTs sit at such longer average L_tot values than κ-CSTs was surprising. While λ-CDRL3s are known to be longer on average than their κ equivalents¹⁸, this alone cannot explain the shift. Instead, for this dataset, the disparity is also driven by biased pairing of λ-VLs with VH chains with longer average CDRH3 lengths (μ_{κ-CST, H3}: 12.53 ± 3.07, μ_{λ-CST, H3}: 14.30 ± 3.87).

We then studied the biophysical property distributions for the natural human sets of κ- (N = 44,420) and λ-antibodies (N = 35,341; Fig. 4). On these datasets we found a much smaller difference in L_tot scores between the natural κ-antibody and λ-antibody models than observed in the CSTs; an offset fully explained by CDRL3 length biases across the two types of light chain (μ_{κ-Nat, L3}: 9.12 ± 0.8, μ_{λ-Nat, L3}: 10.61 ± 1.03). This result is consistent with Townsend et al.¹⁸ and provides strong evidence that the longer CDRH3 lengths seen in λ-CSTs are due to a bias (e.g. species origin¹⁹) in therapeutic development.

**Fig. 4: TAP properties of natural and therapeutic antibodies, split by light chain type.**

Analogous to the CSTs, natural human λ antibodies were disproportionately flagged for high PSH relative to human κ-antibodies, but both were flagged at an even higher rate: 11.26% of natural κ-antibodies and 40.52% of natural λ-antibodies flagged, relative to 1.91% and 26.14% of κ-CSTs and λ-CSTs, respectively.

Both κ-CSTs and λ-CSTs therefore occupy a lower-risk region of CDR length and PSH property space relative to natural antibodies, strengthening the findings from the original TAP paper¹⁹ where we suggested that CSTs in general require more conservative values of these properties than natural antibodies to be amenable to therapeutic development. It also highlights the complexity of developability optimisation in drug discovery: improvements in the humanness of the antibodies in screening libraries can have the unintended byproduct of enhanced therapeutic aggregation risk, regardless of the genetics of the VL sequence.

The charge properties of the natural κ- and λ-antibodies can be found in Supplementary Fig. 11. The natural λ-antibodies also show an enhanced propensity for PPC values over 1 relative to their kappa equivalents, suggesting a natural origin for the disproportionate flagging of λ-CSTs for PPC.

Residue positions associated with driving λ-antibodies towards high PSH scores

Our analyses suggest that the structure-dependent property biases across λ-CSTs are inherited from natural trends, especially for the PSH score. We therefore examined the λ-CSTs to determine which features in the Fv tend to correlate with their high PSH scores, with a view to guiding rational engineering and library design.

We selected the upper red-flagging sets of natural κ- (N = 134) and λ-antibodies (N = 968) and decomposed the overall TAP PSH score into its pairwise-residue component parts. We investigate in more detail the top-20-most hydrophobic sequence-adjacent interactions, and top-20-most hydrophobic sequence non-adjacent interactions, across antibodies red-flagging for PSH.

We observed a broad diversity of VH (Supplementary Fig. 12) and VL (Supplementary Fig. 13) residues involved in driving extreme PSH scores, emphasising the challenge of finding molecular rules of thumb for antibody optimisation. However, the VH residue positions involved in elevating the PSH of either κ-antibodies or λ-antibodies were highly similar, suggesting minimal bias in the physicochemical properties of VH sequences associating with κ- or λ-VLs.

Of particular interest to antibody optimisation engineers are positions outside of the CDR regions, since mutations at these sites are less likely to impact antigen specificity. Amongst the dominant residues contributing to high PSH scores were κ-VL positions 1–3 (framework region L1) and 79–85 (framework region L3), and λ-VL positions 24–26 (framework region L1) and 71–72 (framework region L3); while not in the formal CDRs, these residues lie in the vicinity of the CDRs and may be serving to extend hydrophobic self-association surfaces.

As a case study, we investigated in more detail VL positions 24–26, which drive higher PSH scores in λ-antibodies but not κ-antibodies (Fig. 5a, Supplementary Fig. 14). The λ-antibodies exhibit a broader diversity of amino acids at these positions than κ-antibodies, although, with the exception of leucine at position 25 (Supplementary Fig. 14), do not exhibit particularly hydrophobic residues. However, we observed that over 90% of κ-antibodies have positively-charged residues at position 24, while λ-antibodies almost exclusively use smaller, less polar residues (Fig. 5a), which would be expected to be more accommodating of a hydrophobic self-association interface. Meanwhile, though serine is the most commonly observed residue at position 26 in λ-antibodies (and seen in 100% of κ-antibodies, Supplementary Fig. 14), threonine becomes by far the most prevalent residue amongst red-flagging λ-antibodies when position 26 is involved in the subset of most hydrophobic interactions (Fig. 5b). This is due both to its slightly higher intrinsic hydrophobicity and to more complex co-associations with other residues.

**Fig. 5: Features of green- and red-flagging kappa and lambda antibodies.**

In summary, by decomposing TAP scores such as the PSH into pairwise residue components, we can elucidate the regions driving high risk scores for individual or classes of antibodies and help to orient developability-motivated mutagenesis studies.

λ-VL genes harbour characteristic risk profiles

Associations of certain genes or gene families with PSH flagging propensity would offer a simple strategy to develop diverse but developable screening libraries, for example by incorporating only select lambda genes with a more moderate risk of poor developability. To investigate if such associations exist, we evaluated the gene usages of λ-antibodies that green-flagged (N = 21,001) or red-flagged (N = 968) for PSH (Fig. 5c, d). From a gene family perspective (Fig. 5c), IGLV1 and IGLV3 were associated with a lower PSH-mediated developability risk, while others such as IGLV2, were highly enriched amongst red-flagging λ-antibodies. IGLV9 is almost exclusively found amongst flagging antibodies. These risk profiles are supported by the gene family usages across λ-CSTs (Supplementary Table 7): IGLV1 and IGLV3 are over-represented relative to their natural abundances, while IGLV2 is under-represented, and no CST has yet derived from an IGLV9 gene.

To study what features might be driving differential risk across families, we generated separate sequence logo plots of all IGLV2 sequences and all non-IGLV2 sequences (Supplementary Fig. 15). This highlighted positions that are commonly more hydrophobic among IGLV2 antibodies. For example, position 57 is mostly valine with a trace of glycine in IGLV2 antibodies, whereas it is predominantly asparagine or aspartate in antibodies from the other families. Similarly, position 3 is entirely hydrophobic across IGLV2 antibodies but is found to be glutamate in roughly 1/3 of the other LV gene subgroups. Consistent with Fig. 5b, position 26 is almost entirely threonine in IGLV2 antibodies, while other families tend to use the less hydrophobic serine and highly polar residues such as asparagine and aspartate. Additionally, we observed that the CDRL1 loop, which typically bears a central motif containing hydrophobic residues, is frequently longer, and therefore more protruding, in IGLV2 antibodies.

We then dissected the TAP PSH risk profiles for the IGLV2 antibodies into profiles for individual genes (Fig. 5d). This demonstrated that the higher developability risk associated with the family is not shared evenly amongst its constituent genes: for example, IGLV2-14*01 is found in a higher fraction of green-flagging antibodies than red-flagging antibodies, while every allele of IGLV2-23 is associated with enhanced abundance among red-flagging λ-antibodies. Again, this is supported by gene usages across λ-CSTs: IGLV2-14 is the dominant gene observed amongst the relatively small number of CSTs deriving from the IGLV2 family (Supplementary Table 8).

Together these results suggest that a λ-antibody’s germline V gene contributes substantially to its developability risk profile, and that TAP can be used to stratify lower from higher risk scaffolds.

Discussion

In this paper, we benchmarked the latest machine learning-based antibody modeling technology for use in the Therapeutic Antibody Profiler¹⁹.

We found that, while the precise guideline values we derived in 2019 have modulated slightly due to the availability of nearly three-times as many CST datapoints, the broad trends in property distribution between CSTs and natural antibodies have remained consistent; i.e. CSTs as a whole have shorter CDR loops and smaller patches of surface hydrophobicity, while their charge properties are highly similar. The patterns also hold when limited to the subset of higher-certainty models (based on ABodyBuilder2’s statistical heuristic³¹).

When split by year of designation by the WHO, new therapeutics are more frequently sampling the extremes of CDR and PSH property space, indicating that our definitions of druglikeness are likely to continue evolving over time. This phenomenon has also been observed in small molecule drug discovery, where several typical properties of today’s drugs differ from the original trends documented by Lipinski et al.^41,42, and may be related to advances in developmental/formulation technologies.

On the other hand, we observe no obvious trends in the properties of post-Phase I active/approved therapeutics versus discontinued therapeutics, nor in therapeutics that have advanced to different clinical stages, suggesting that, at least in terms of the TAP properties, we would not expect predictive power to improve by only considering therapeutics that have advanced to later stages. Unfortunately, there remains a void in publicly available data on antibodies that failed pre-clinical evaluation due to poor developability, against which physicochemical property guideline thresholds could be benchmarked.

Due to ABodyBuilder2’s modeling protocol, statistical uncertainty in side chain positioning can now be captured to some extent through repeat modeling and TAP calculations. Guidelines derived from repeat runs are almost identical to the guidelines derived from a single run per therapeutic, while mean variances of property values of the CST therapeutics are near-0 for charge metrics and only around 10 for the PSH metric. Variances on this order can lead to classification disparities across repeat runs between adjacent boundaries (i.e. green/amber risk, or amber/red risk) but are extremely unlikely to result in the same antibody being assigned green risk and red risk for a given property.

Molecular dynamics simulations of a representative set of CST Fab models indicate that the flags assigned by an ensemble of repeat static ABodyBuilder2 predictions are highly consistent with simulation-average flags. Best agreement with simulation is obtained by considering an antibody to have flagged for a property if a flag is seen on any of the repeat runs. As running TAP multiple times takes a few minutes, several orders of magnitude faster than running molecular dynamics, repeat TAP calculations may offer a sensible strategy for high-throughput developability screening with consideration for side chain mobility.

We then used our new TAP protocol to investigate developability-linked property biases across κ- and λ-antibodies, exploiting the rise in both CST and paired-chain natural sequence data. λ-VLs have distinct epitope specificities to κ-VLs, driven by features such as locus-specific germline-encoded amino acid binding-motifs¹¹. However λ-antibodies have been suggested to be less developable than κ-antibodies²⁶ and are heavily under-sampled amongst CSTs relative to their natural abundance. Therapeutic antibody profiling adds quantitative evidence that natural λ-antibodies are generally at higher risk of developablity issues, especially hydrophobicity-driven aggregation, than natural κ-antibodies. Indeed, the mean of the natural λ-antibody distribution sits just below the amber-flagging threshold; a substantial population of λ-antibodies are prone to being nudged into being flagged by, say, an affinity maturation pipeline based on unconstrained mutagenesis.

However, through a quantification of the risk of each λ-antibody, TAP profiles can now enable us to identify subpopulations expected to be more amenable to therapeutic development, and therefore to offer strategies towards augmenting the targetable epitope space through rational design. The observation of particular lambda gene associations with higher risk profiles, and a preliminary concomitant signal in the germline gene origin distributions of λ-CSTs that have so far progressed to the clinic, suggest that approaches such as family-holdout (e.g. all IGLV2) or gene-holdout (e.g. all IGLV2-23) libraries should enrich for λ-antibodies with lower expected developability risk. Alternatively, libraries could be constructed at a more granular level, incorporating more risk-prone genes but only when the associated sequence is considered by TAP to be lower risk. While sequence-by-sequence in vitro screening library design may still be a distant prospect, such approaches are gaining traction in the field of in silico library design^39,43.

The interpretability of the TAP metrics means they can be readily deconstructed to explore which regions of the CDR vicinity tend to contribute to higher developability risk. We show that the positions that contribute most to high risk scores in both κ- and λ-antibodies are diverse and distinct. Differences lie in the VL sequence itself rather than through any biases in the properties of their paired VH sequences. While preliminary, we note that some residues in the periphery of the CDR vicinity can help drive antibodies towards being red flagged; on a case-by-case basis, mutations to these regions may impact developability while being less likely to affect antibody specificity.

To date, TAP has primarily found use in industry for the early-stage removal of candidate antibodies more likely to suffer from developability issues. This increases the efficiency of drug discovery, but risks artificially constraining diversity, reinforcing current established property trends. We have shown how TAP, applied to identify more nuanced λ-VL residue and λ-gene associations with developability risk, could also guide selective consideration of a broader diversity of lead candidates and so enable access to a wider pool of epitopes.

Methods

Dataset curation

The Therapeutic Structural Antibody Database (Thera-SAbDab) was downloaded on 25th January, 2023¹. The entries were filtered for those designed for human application, that have reached at least Phase-II of clinical trials, and that have complete variable regions (Fvs, i.e. no single domain antibodies were carried forward). This set was then mined for sequence non-redundant Fv regions (at the level of 100% identity), to filter out biosimilars with no changes to the Fv and to reduce biases caused by the use of previously-developed monoclonal Fv domain sequences in new multispecific formats. This resulted in 664 non-redundant CST Fvs (the CST_all dataset), of which 576 were κ-based (86.7%) and 88 were λ-based (13.3%). Thera-SAbDab light-gene locus labels were confirmed via alignment to the latest set of human and mouse IMGT V domains using ANARCI⁴⁴.

The sequence non-redundant (100% identity) Fv sequences of 88,274 natural human antibodies were retrieved from the Observed Antibody Space (OAS) database³⁰ (timestamp: 25th January 2023). These were filtered for sequences with complete CDRs⁴⁵, leaving 79,761 antibodies (the Nat_all dataset): 44,420 (55.7%) κ-antibodies, 35,341 (44.3%) λ-antibodies.

Benchmarking TAP modeling methods

ABodyBuilder1³² was run using template databases built on a copy of SAbDab^45,46 timestamped to 30th April 2022, and with a template sequence identity cut-off of 99% to ensure genuine models were produced³². ABodyBuilder2 was run using the pre-trained weights from the paper³⁵. Relative performance to ABodyBuilder1 was evaluated by root-mean-squared deviation (RMSD) by IMGT-defined region⁴⁷ and was calculated with an in-house script that first aligns each model structure to the ground truth structure based on the backbone atoms of the framework region of the investigated chain and then calculates the RMSD over the backbone atoms of the residues of the region (for heavy or light chains in the IMGT numbering scheme, CDR1: residues 27–38, CDR2: 56-65, CDR3: 105-117).

The classification of residues as solvent exposed or buried was based on an in-house implementation of the Shrake and Rupley algorithm⁴⁸, using a spherical probe of radius 1.4 Å. A residue ‘X’ was considered exposed if its solvent-accessible surface area (SASA) was ≥7.5% of its theoretical maximum value (based on the open-chain form of Alanine-X-Alanine)¹⁹. In accordance with the parametrisations of these theoretical maximum SASAs, all hydrogen atoms were stripped out of ABodyBuilder2 predictions prior to SASA calculations.

A threshold to filter out the least confident ABodyBuilder2 models was obtained by a two-step process. First, the 119/664 CST Fv domains for which 100% sequence identical X-ray crystal structures exist (identified using Thera-SAbDab metadata¹, Supplementary Table 2), were filtered out of the CST_all dataset. The root-mean-squared predicted error for each remaining CST CDRH3 was then calculated as \(\sqrt{\frac{{\sum }_{res(CDRH3)}P{E}^{2}}{{L}_{CDRH3}}}\), where PE represents backbone predicted error, res(CDRH3) represents the sum over all CDRH3 residues and L(CDRH3) represents the length of the CDRH3. Finally, the threshold was derived by evaluating the 75th percentile (1.31 Å). This filter was applied to retain the 510 most confidently-modeled CSTs (the CST_conf dataset), and applied to the Nat_all dataset to derive the 30,402 most confidently modeled natural antibodies (the Nat_conf subset).

Running TAP on ABodyBuilder2 models

Sets of CST and natural antibody Fv domains were run through TAP and their five computational developability metrics calculated (Total IMGT-defined⁴⁷ CDR Length [L_tot], Patches of Surface Hydrophobicity using the Kyte and Doolittle scale [PSH], Patches of Surface Positive Charge [PPC], Patches of Surface Negative Charge [PNC] and Structural Fv Charge Symmetry Parameter [SFvCSP])¹⁹. PSH, PPC, and PNC metrics were calculated across the CDR vicinity (IMGT-defined CDR residues ±2 on each side plus any other surface exposed residue within 4.5 Å of one of these residues). Throughout the work, amber and red thresholds and were set at the percentile values suggested in the original paper¹⁹.

Whenever the properties of κ- and λ-antibodies where compared, threshold values were calculated from the CST_all set (i.e. not evaluated separately by light chain type).

Assessing TAP score variation over molecular dynamics trajectories

14 CST Fab regions were modelled by grafting the constant regions of their crystal structures (see Supplementary Table 9) onto the Fv models generated by ABodyBuilder2, obtaining the initial arrangement by aligning the crystal and model Fv backbones (full Fab regions were used instead of Fv regions based on the results of previous studies⁴⁹). We then modelled-in missing residues in the constant region using MODELLER v10.2⁵⁰ and generated 10 models using the ‘very slow’ refinement setting, selecting the lowest energy model. All systems were prepared and simulations performed using OpenMM v7.7⁵¹. N-methyl groups were used to cap C-termini using an in-house script. Next, using pdbfixer⁵¹, we protonated the models at a pH of 7.5, soaked them in truncated octahedral water boxes with a padding distance of 1 nm, and added sodium or chloride counter-ions to neutralise charges and then NaCl to an ionic strength of 150 mM. We parameterised the systems using the Amber14-SB forcefield⁵² and modelled water molecules using the TIP3P-FB model⁵³. Non-bonded interactions were calculated using the particle mesh Ewald method⁵⁴ using a cut-off of distance of 0.9 nm, with an error tolerance of 5x10^-4. Water molecules and heavy atom-hydrogen bonds were rigidified using the SETTLE⁵⁵ and SHAKE⁵⁶ algorithms, respectively. We used hydrogen mass repartitioning⁵⁷ to allow for 4 fs time steps. Simulations were run using the mixed-precision CUDA platform in OpenMM using the Middle Langevin Integrator with a friction coefficient of 1 ps⁻¹ and the Monte-Carlo Barostat set to 1 atm. We equilibrated systems using a multi-step protocol detailed in Supplementary Table 10. Following equilibration, we performed 200 ns of unrestrained simulation of the NPT ensemble at 300K, calculating TAP properties over the final 120 ns of each simulation, when all systems had reached relatively stable RMSD values from their initial coordinates (Supplementary Fig. 16). To estimate convergence, we aligned Fv regions on the starting structure using mdtraj v1.9.6⁵⁸ and calculated the RMSD of the Fv domains relative to the starting structure.

Determining molecular correlates with poor developability

Natural human antibodies lying above the red flag thresholds set by TAP across all CSTs were investigated for recurrent molecular patterns that contribute towards their high scores. The PSH scores for each antibody were split into components from sequence-adjacent residues and components from sequence non-adjacent residues, and these pairwise interactions were separately rank-ordered by hydrophobicity. Germline assignments for natural sequences were taken from the OAS Paired metadata³⁰, which derives from IgBlast⁵⁹ alignments of each nucleotide sequence to a recent set of human genes from the IMGT GeneDB⁵. Germline assignments for CSTs were evaluated using ANARCI⁴⁴ on amino acid sequences (allele predictions were ignored here due to the difficulty of accurately assigning alleles at the amino acid level). All percentage abundances of gene/gene family usages across λ-CSTs were calculated based the subset that mapped closest to human rather than mouse germlines.

Visualisations

All visualisations were made using open-source PyMOL or matplotlib version 3.5.2.

Statistics and reproducibility

All statistics were calculated using the numpy Python package (version 1.23.3). For fairness, the relative performance of ABodyBuilder1 and ABodyBuilder2 was benchmarked using CSTs whose structures were not available in the database or training set of the model, respectively. Additionally, TAP metrics were calculated across a redundancy-filtered set of CSTs to reduce bias caused by biosimilars with no changes to the Fv and by the use of the same variable domain sequence in multiple formats. We explored the impact of performing up to six independent ABodyBuilder2 modeling runs on TAP metric values, thresholds, and agreement with molecular dynamics simulations.

Data availability

All crystal structures of antibodies were downloaded from SAbDab⁴⁵. Numerical source data for all figures and tables is supplied as Supplementary Data 1. Additional supplementary files can be accessed on Zenodo (10.5281/zenodo.10357509), including the curated structures used for ABodyBuilder2 benchmarking and the ABodyBuilder2 models of all CSTs analysed in this study. ABodyBuilder2 models of natural paired-chain human antibodies were released in the Supplementary Materials of Abanades et al.^31,60.

Code availability

The updated TAP protocol is available on our web application (https://opig.stats.ox.ac.uk/webapps/tap) and the source code is available under a free academic licence via the Vagrant Virtual Machine (https://process.innovation.ox.ac.uk/software/p/15303a/sabbox-academic/1) and Singularity container (https://process.innovation.ox.ac.uk/software/p/20120-a/sabbox-singularity-platform---academic-use/1) releases of our SAbDab-SAbPred codebase. The ABodyBuilder2 source code is available from GitHub (https://github.com/oxpig/ImmuneBuilder), with model weights used in this study available on Zenodo (https://doi.org/10.5281/zenodo.7258552).

References

Raybould, M. I. J. et al. Thera-SAbDab: the therapeutic structural antibody database. Nucleic Acids Res. 48, D383–D388 (2019).
Article PubMed Central Google Scholar
Senior, M. M. Fresh from the biotech pipeline: fewer approvals, but biologics gain share. Nat. Biotechnol. 42, 174–182 (2023).
Google Scholar
Mullard, A. FDA approves 100th monoclonal antibody product. Nat. Rev. Drug Discov. 20, 491–495 (2021).
Article CAS PubMed Google Scholar
Lefranc, M.-P. Nomenclature of the Human Immunoglobulin Genes. Curr. Protoc. Immunol. 40, 1–37 (2001).
Google Scholar
Giudicelli, V., Chaume, D. & Lefranc, M.-P. IMGT/GENE-DB: a comprehensive database for human and mouse immunoglobulin and T cell receptor genes. Nucleic Acids Res. 33, D256–D261 (2005).
Article CAS PubMed Google Scholar
Rees, A. R. Understanding the human antibody repertoire. mAbs 12, 1729683 (2020).
Article PubMed PubMed Central Google Scholar
Smith, K. et al. Antigen nature and complexity influence human antibody light chain usage and specificity. Vaccine 34, 2813–2820 (2016).
Article CAS PubMed PubMed Central Google Scholar
Raybould, M. I. J., Kovaltsuk, A., Marks, C. & Deane, C. M. CoV-AbDab: the coronavirus antibody database. Bioinformatics 37, 734–735 (2021).
Article CAS PubMed Google Scholar
Robinson, S. A. et al. Epitope profiling using computational structural modelling demonstrated on coronavirus-binding antibodies. PLoS Comput. Biol. 17, e1009675 (2021).
Article CAS PubMed PubMed Central Google Scholar
Jaffe, D. B. et al. Functional antibodies exhibit light chain coherence. Nature 611, 352–357 (2022).
Article CAS PubMed PubMed Central Google Scholar
Shrock, E. L. et al. Germline-encoded amino acid–binding motifs drive immunodominant public antibody responses. Science 380, eadc9498 (2023).
DeKosky, B. J. et al. Large-scale sequence and structural comparisons of human naive and antigen-experienced antibody repertoires. Proc. Natl Acad. Sci. 113, E2636–E2645 (2016).
Article CAS PubMed PubMed Central Google Scholar
Rawat, P., Prabakaran, R., Kumar, S. & Gromiha, M. M. Exploring the sequence features determining amyloidosis in human antibody light chains. Sci. Rep. 11, 13785 (2021).
Article CAS PubMed PubMed Central Google Scholar
Gibson, W. S. et al. Characterization of the immunoglobulin lambda chain locus from diverse populations reveals extensive genetic variation. Genes Immun. 24, 21–31 (2023).
Article CAS PubMed Google Scholar
Stanfield, R. L., Zemla, A., Wilson, I. A. & Rupp, B. Antibody elbow angles are influenced by their light chain class. J. Mol. Biol. 357, 1566–1574 (2006).
Article CAS PubMed Google Scholar
Kuroda, D., Shirai, H., Kobori, M. & Nakamura, H. Systematic classification of CDR-L3 in antibodies: implications of the light chain subtypes and the VL-VH interface. Proteins 75, 139–146 (2009).
Article CAS PubMed Google Scholar
van der Kant, R. et al. Adaption of human antibody λ and κ light chain architectures to CDR repertoires. Protein Eng. Des. Sel. 32, 109–127 (2019).
Article PubMed PubMed Central Google Scholar
Townsend, C. L. et al. Significant differences in physicochemical properties of human immunoglobulin Kappa and Lambda CDR3 regions. Front. Immunol. 7, 388 (2016).
Article PubMed PubMed Central Google Scholar
Raybould, M. I. J. et al. Five computational developability guidelines for therapeutic antibody profiling. Proc. Natl Acad. Sci. 116, 4025–4030 (2019).
Article CAS PubMed PubMed Central Google Scholar
Molé, C. M., Béné, M. C., Montagne, P. M., Seilles, E. & Faurea, G. C. Light chains of immunoglobulins in human secretions. Clin. Clim. Acta 224, 191–197 (1994).
Article Google Scholar
Kovaltsuk, A. et al. Observed antibody space: a resource for data mining next-generation sequencing of antibody repertoires. J. Immunol. 201, 2502–2509 (2018).
Article CAS PubMed Google Scholar
Lu, R.-M. et al. Development of therapeutic antibodies for the treatment of diseases. J. Biomed. Sci. 27, 1 (2020).
Article CAS PubMed PubMed Central Google Scholar
Laustsen, A. H., Greiff, V., Karatt-Vellatt, A., Muyldermans, S. & Jenkins, T. P. Animal immunization, in vitro display technologies, and machine learning for antibody discovery. Trends Biotechnol. 39, 1263–1273 (2021).
Article CAS PubMed Google Scholar
Teixeira, A. A. R. et al. Drug-like antibodies with high affinity, diversity and developability directly from next-generation antibody libraries. mAbs 13, 1980942 (2021).
Article Google Scholar
Larijani, M. et al. The recombination difference between mouse kappa and lambda segments is mediated by a pair-wise regulation mechanism. Mol. Immunol. 43, 870–881 (2006).
Article CAS PubMed Google Scholar
Lehmann, A. et al. Stability engineering of anti-EGFR scFv antibodies by rational design of a lambda-to-kappa swap of the VL framework using a structure-guided approach. mAbs 7, 1058–1071 (2015).
Article CAS PubMed PubMed Central Google Scholar
Bodi, K. et al. AL-Base: a visual platform analysis tool for the study of amyloidogenic immunoglobulin light chain sequences. Amyloid 16, 1–8 (2009).
Article CAS PubMed PubMed Central Google Scholar
Almagro, J. C., Pedraza-Escalon, M., Arrieta, H. I. & Pérez-Tapia, S. M. Phage display libraries for antibody therapeutic discovery and development. Antibodies 8, 44 (2019).
Article CAS PubMed PubMed Central Google Scholar
Kumar, S. et al. Rational optimization of a monoclonal antibody for simultaneous improvements in its solution properties and biological activity. Prot. Eng. Des. Sel. 31, 313–325 (2018).
Article CAS Google Scholar
Olsen, T. H., Boyles, F. & Deane, C. M. Observed antibody space: a diverse database of cleaned, annotated, and translated unpaired and paired antibody sequences. Protein Sci. 31, 141–146 (2022).
Article CAS PubMed Google Scholar
Abanades, B. et al. ImmuneBuilder: Deep-Learning models for predicting the structures of immune proteins. Commun. Biol. 6, 575 (2023).
Article CAS PubMed PubMed Central Google Scholar
Leem, J., Dunbar, J., Georges, G., Shi, J. & Deane, C. M. ABodyBuilder: Automated antibody structure prediction with data–driven accuracy estimation. mAbs 8, 1259–1268 (2016).
Article CAS PubMed PubMed Central Google Scholar
Evans, R. et al. Protein complex prediction with AlphaFold-Multimer. bioRxiv https://doi.org/10.1101/2021.10.04.463034 (2022).
Ruffolo, J. A., Chu, L.-S., Mahajan, S. P. & Gray, J. J. Fast, accurate antibody structure prediction from deep learning on massive set of natural antibodies. Nat. Commun. 14, 2389 (2023).
Article CAS PubMed PubMed Central Google Scholar
Abanades, B., Georges, G., Bujotzek, A. & Deane, C. M. ABlooper: fast accurate antibody CDR loop structure prediction with accuracy estimation. Bioinformatics 38, 1877–1880 (2022).
Article CAS PubMed PubMed Central Google Scholar
Licari, G. et al. Embedding dynamics in intrinsic physicochemical profiles of market-stage antibody-based biotherapeutics. Mol. Pharmaceutics 20, 1096–1111 (2023).
Article CAS Google Scholar
Ahmed, L. et al. Intrinsic physicochemical profile of marketed antibody-based biotherapeutics. Proc. Natl Acad. Sci. 118, e2020577118 (2021).
Article CAS PubMed PubMed Central Google Scholar
Goldstein, L. D. et al. Massively parallel single-cell B-cell receptor sequencing enables rapid discovery of diverse antigen-reactive antibodies. Commun. Biol. 2, 304 (2019).
Article PubMed PubMed Central Google Scholar
Raybould, M. I. J. et al. Public Baseline and shared response structures support the theory of antibody repertoire functional commonality. PLoS Comput. Biol. 17, e1008781 (2021).
Article CAS PubMed PubMed Central Google Scholar
wwPDB Consortium. Protein Data Bank: the single global archive for 3D macromolecular structure data. Nucleic Acids Res. 47, D520–D528 (2019).
Article Google Scholar
Lipinski, C. A., Lombardo, F., Dominy, B. W. & Feeney, P. J. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug Deliv. Rev. 23, 3–25 (1997).
Article CAS Google Scholar
Hartung, I. V., Huck, B. R. & Crespo, A. Rules were made to be broken. Nat. Rev. Chem. 7, 3–4 (2023).
Article PubMed Google Scholar
Amimeur, T. et al. Designing feature-controlled humanoid antibody discovery libraries using generative adversarial networks. bioRxiv https://doi.org/10.1101/2020.04.12.024844 (2020).
Dunbar, J. & Deane, C. M. ANARCI: antigen receptor numbering and receptor classification. Bioinformatics 32, 298–300 (2016).
Article CAS PubMed Google Scholar
Dunbar, J. et al. SAbDab: the structural antibody database. Nucleic Acids Res. 42, D1140–D1146 (2014).
Article CAS PubMed Google Scholar
Schneider, C., Raybould, M. I. J. & Deane, C. M. SAbDab in the age of biotherapeutics: updates including SAbDab-nano, the nanobody structure tracker. Nucleic Acids Res. 50, D1368–D1372 (2022).
Article CAS PubMed Google Scholar
Lefranc, M.-P. et al. IMGT unique numbering for immunoglobulin and T cell receptor variable domains and Ig superfamily V-like domains. Dev. Comp. Immunol. 27, 55–77 (2003).
Article CAS PubMed Google Scholar
Shrake, A. & Rupley, J. A. Environment and exposure to solvent of protein atoms. Lysozyme and insulin. J. Mol. Biol. 79, 351–371 (1973).
Article CAS PubMed Google Scholar
Knapp, B., Dunbar, J., Alcala, M. & Deane, C. M. Variable regions of antibodies and T-Cell receptors may not be sufficient in molecular simulations investigating binding. J. Chem. Theory Comput. 13, 3097–3105 (2017).
Article CAS PubMed Google Scholar
Webb, B. & Sali, A. Comparative protein structure modeling using modeller. Curr. Protoc. Bioinform. 54, 5.7.1–5.7.37 (2016).
Article Google Scholar
Eastman, P. et al. OpenMM 7: Rapid development of high per-formance algorithms for molecular dynamics. PLoS Comput. Biol. 13, e1005659 (2017).
Article PubMed PubMed Central Google Scholar
Maier, J. A. et al. ff14sb: Improving the accuracy of protein side chain and backbone parameters from ff99sb. J. Chem. Theory Comput. 11, 3696–3713 (2015).
Article CAS PubMed PubMed Central Google Scholar
Neria, E., Fischer, S. & Karplus, M. Simulation of activation free energies in molecular systems. J. Chem. Phys. 105, 1902–1921 (1996).
Article CAS Google Scholar
Darden, T., York, D. & Pedersen, L. Particle mesh Ewald: AnN ⋅ log(N) method for Ewald sums in large systems. J. Chem. Phys. 98, 10089–10092 (1993).
Article CAS Google Scholar
Miyamoto, S. & Kollman, P. A. Settle: An analytical version of the SHAKE and RATTLE algorithm for rigid water models. J. Comp. Chem. 13, 952–962 (1992).
Article CAS Google Scholar
Ryckaert, J. P., Ciccotti, G. & Berendsen, H. J. C. Numerical integration of the cartesian equations of motion of a system with constraints: molecular dynamics of n-alkanes. J. Chem. Phys. 23, 327–341 (1977).
CAS Google Scholar
Hopkins, C. W., Le Grand, S., Walker, R. C. & Roitberg, A. E. Long-time-step molecular dynamics through hydrogen mass repartitioning. J. Chem. Theory Comput. 11, 1864–1874 (2015).
Article CAS PubMed Google Scholar
McGibbon, R. T. et al. MDTraj: a modern open library for the analysis of molecular dynamics trajectories. Biophys. J. 109, 1528–1532 (2015).
Article CAS PubMed PubMed Central Google Scholar
Ye, J., Ma, N., Madden, T. L. & Ostell, J. M. IgBLAST: an immunoglobulin variable domain sequence analysis tool. Nucleic Acids Res. 41, W34–W40 (2013).
Article PubMed PubMed Central Google Scholar
Abanades, B. ImmuneBuilder: Deep-Learning models for predicting the structures of immune proteins [Data set]. Zenodo https://doi.org/10.5281/zenodo.7258553 (2022)

Download references

Acknowledgements

The authors would like to thank Dr. Sandeep Kumar (Boehringer Ingelheim) for critically reviewing the manuscript. This work was supported by a Postdoctoral Research grant funded by Boehringer Ingelheim (MR), and funding from the UK Engineering and Physical Sciences Research Council (OT, reference EP/S024093/1) and the Wellcome Trust (BG, reference 102164/Z/13/Z).

Author information

Authors and Affiliations

Oxford Protein Informatics Group, Department of Statistics, University of Oxford, 24-29 St Giles’, Oxford, OX1 3LB, UK
Matthew I. J. Raybould, Oliver M. Turnbull, Annabel Suter, Bora Guloglu & Charlotte M. Deane

Authors

Matthew I. J. Raybould
View author publications
You can also search for this author in PubMed Google Scholar
Oliver M. Turnbull
View author publications
You can also search for this author in PubMed Google Scholar
Annabel Suter
View author publications
You can also search for this author in PubMed Google Scholar
Bora Guloglu
View author publications
You can also search for this author in PubMed Google Scholar
Charlotte M. Deane
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

M.R.: designed the research, performed the research, analysed data, wrote and edited the paper. O.T.: performed the research, analysed data, wrote and edited the paper. A.S.: performed the research, analysed data. B.G.: performed the research, analysed data, wrote and edited the paper. C.D.: designed the research, analysed data, wrote and edited the paper, supervised the research.

Corresponding author

Correspondence to Charlotte M. Deane.

Ethics declarations

Competing interests

The authors declare the following competing interests: C.D. discloses part-time employment by Exscientia plc and membership of the Scientific Advisory Board of Fusion Antibodies and AI proteins.

Peer review

Peer review information

Communications Biology thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editors: Theam Soon Lim, Anam Akhtar and Dario Ummarino. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Peer Review File

Supplementary Information

Description of Additional Supplementary Files

Supplementary Data 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Raybould, M.I.J., Turnbull, O.M., Suter, A. et al. Contextualising the developability risk of antibodies with lambda light chains using enhanced therapeutic antibody profiling. Commun Biol 7, 62 (2024). https://doi.org/10.1038/s42003-023-05744-8

Download citation

Received: 09 August 2023
Accepted: 26 December 2023
Published: 08 January 2024
DOI: https://doi.org/10.1038/s42003-023-05744-8

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.