ImmuneBuilder: Deep-Learning models for predicting the structures of immune proteins

Abanades, Brennan; Wong, Wing Ki; Boyles, Fergus; Georges, Guy; Bujotzek, Alexander; Deane, Charlotte M.

doi:10.1038/s42003-023-04927-7

Open access
Published: 29 May 2023

Communications Biology volume 6, Article number: 575 (2023)

Abstract

Immune receptor proteins play a key role in the immune system and have shown great promise as biotherapeutics. The structure of these proteins is critical for understanding their antigen binding properties. Here, we present ImmuneBuilder, a set of deep learning models trained to accurately predict the structure of antibodies (ABodyBuilder2), nanobodies (NanoBodyBuilder2) and T-Cell receptors (TCRBuilder2). We show that ImmuneBuilder generates structures with state-of-the-art accuracy while being far faster than AlphaFold2. For example, on a benchmark of 34 recently solved antibodies, ABodyBuilder2 predicts CDR-H3 loops with an RMSD of 2.81 Å, a 0.09 Å improvement over AlphaFold-Multimer, while being over a hundred times faster. Similar results are achieved for nanobodies (NanoBodyBuilder2 predicts CDR-H3 loops with an average RMSD of 2.89 Å, a 0.55 Å improvement over AlphaFold2) and for TCRs. By predicting an ensemble of structures, ImmuneBuilder also gives an error estimate for every residue in its final prediction. ImmuneBuilder is made freely available, both to download (https://github.com/oxpig/ImmuneBuilder) and to use via our webserver (http://opig.stats.ox.ac.uk/webapps/newsabdab/sabpred). We also make available structural models for ~150 thousand non-redundant paired antibody sequences (https://doi.org/10.5281/zenodo.7258553).

Introduction

The adaptive immune system in humans is effective at identifying and neutralising a wide range of pathogens. To achieve this, immune cells have developed antigen-specific proteins such as T-cell receptors (TCRs) or, in the case of B-cells, antibodies. While antibodies are capable of binding with great affinity and specificity to the surface of almost any antigen, TCRs target digested pieces of intracellular proteins that are presented on the cell surface by the major histocompatibility complex. Due to their key role in identifying a wide range of antigens, antibodies and TCRs have become proteins of particular interest for therapeutic development, with several TCR drugs in clinical trials¹ and over a hundred approved antibody drugs^2,3. Nanobodies, single-domain antibodies naturally found in organisms such as camelids and sharks, have also received significant interest as therapeutics, with a recently accepted nanobody drug and a number undergoing clinical trials⁴.

All three of these immune proteins are built up from immunoglobulin (Ig) domains with the binding site either sitting across two Ig domains in the case of antibodies (VH and VL) and TCRs (Vα and Vβ), or being found at the tip of one Ig domain, in the case of nanobodies.

The binding site of antibodies and TCRs is concentrated in six loops, three on each of the two Ig domains, known collectively as the complementarity-determining regions (CDRs). In nanobodies, the binding site is concentrated in only three CDR loops on the single Ig domain. These CDR loops show variable length, composition and structure, with the most variable being CDR-H3 in the case of antibodies and nanobodies⁵. This loop also tends to be the largest contributor to the binding site⁶. Example structures of an antibody variable domain, a TCR variable domain and a nanobody are shown in Fig. 1.

**Fig. 1: Structural representation of an antibody variable domain (PDB code 1GIG), a TCR variable domain (PDB code 7SU9) and a nanobody (PDB code 4LAJ) with labelled regions.**

Despite the similarities in the global structure of antibodies, TCRs and nanobodies, their binding sites are known to have distinct properties and their CDRs have different length distributions as well as occupying distinct areas of structural space^7,8.

As with many proteins, the availability of sequence data far outstrips structural information^9,10,11,12,13, but structural information allows for a more in-depth understanding than studies focused on sequence alone¹⁴. For example, knowledge of CDR loop conformations has been used to help identify antibodies that bind to similar targets¹⁵, while accurate knowledge of side chain atom placement can aid in identifying key interactions in antibody-antigen binding^16,17.

Experimental structure determination is time-consuming and expensive¹⁸. Computationally predicted structural models can be used to circumvent this problem. This is particularly the case for immune proteins, as next-generation sequencing of immune receptor repertoires is now routinely used in the study of the adaptive immune system^19,20. These methods enable researchers to obtain millions of sequences per study, making structural analysis of this data a challenge. For example, Observed Antibody Space (OAS) contains over two billion antibody heavy chain sequences and is growing rapidly^9,10. If this huge amount of sequence data is to be even partially analysed in terms of structure, rapid, accurate methods for the prediction of antibody structures are required.

AlphaFold2 is a deep learning method that has revolutionised the field of computational protein structure prediction, achieving near experimental accuracy for a large number of single-chain proteins²¹. This was then extended to AlphaFold-Multimer to accurately predict protein complexes²². Many methods have followed from AlphaFold2 and AlphaFold-Multimer but these remain the de facto gold standard for single domains and complexes^23,24,25.

The AlphaFold2 model can be divided into two main steps. In the first, the Evoformer module extracts evolutionary couplings from alignments of many protein sequences into information-rich embeddings. In the second, the structure module uses these embeddings to predict the 3D structure of a given protein sequence.

Structure prediction methods specific to a certain class of protein tend to outperform more general methods^26,27. By using knowledge specific to a type of protein, they can easily predict the conserved regions in that protein allowing greater focus on harder details. For example, DeepH3 was shown to outperform TrRosetta on antibodies^28,29, while Nanonet obtains results of similar accuracy to AlphaFold2 on nanobodies with a far simpler architecture³⁰. More recent examples of this are IgFold²⁵ and EquiFold³¹, where the authors trained antibody-specific models that predict structures of comparable accuracy to AlphaFold-Multimer.

In this paper, we present ImmuneBuilder, a set of deep learning models developed to predict the structure of proteins of the immune system. By training on specific protein types, we are able to create rapid, accurate models, enabling ImmuneBuilder to be routinely used on large sequence data sets. We have built three models: ABodyBuilder2, an antibody-specific model; NanoBodyBuilder2, a nanobody-specific model; and TCRBuilder2, a TCR-specific model. We show that these methods perform at least as well as state-of-the-art methods for their respective protein types while predicting structures in a fraction of the time. We also demonstrate that these methods both accurately predict details of the structure and create physically and biologically sensible structures.

The three ImmuneBuilder models are made freely available for download and as web servers.

Results

Throughout the results section we will focus on the results for ABodyBuilder2 (AB2) on antibodies with the results for NanoBodyBuilder2 (nanobodies) and TCRBuilder2 (TCRs) discussed in Supplementary Notes 1 and 2. All three methods show qualitatively similar results.

We compare ABodyBuilder2 to several other methods: a homology modelling method (the original version of ABodyBuilder³² (ABB)), a general protein structure prediction method (AlphaFold-Multimer²² (AFM)), and three antibody-specific methods (ABlooper³³ (ABL), IgFold²⁵ (IgF) and EquiFold³¹ (EqF)). As a benchmark, we selected a non-redundant set of 34 antibody structures recently added to SAbDab^11,13 (see Methods). This was done so that none of the antibody structures in the benchmark would have been seen during training by any of the benchmarked methods. To give a complete picture of how these methods perform, we carry out a comprehensive benchmark using five different measures. Figure 2 shows an example of a prediction by ABodyBuilder2, highlighting important aspects of structural modelling.

**Fig. 2: Example of an antibody structure predicted with ABodyBuilder2.**

Accuracy of prediction

To measure how accurately the backbone atoms are predicted, the RMSD between predicted and true structures for each antibody region was compared. The RMSD for each CDR and framework is computed by aligning each protein chain to the crystal structure and then calculating the RMSD between the Cα, N, C and Cβ atoms. Regions are defined using the IMGT numbering scheme³⁴. The results of this analysis are shown in Table 1.
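The per-region RMSD described above can be sketched as follows. This is a minimal illustration rather than the benchmarking code used in the paper: it assumes the predicted chain has already been superposed onto the crystal chain, and the `region_rmsd` helper, the dictionary layout and the toy coordinates are all hypothetical.

```python
import math

BACKBONE_ATOMS = ("CA", "N", "C", "CB")  # atoms named in the text


def region_rmsd(pred, true, region):
    """RMSD (in Å) over the backbone atoms of one region.

    pred, true: dicts mapping (residue_id, atom_name) -> (x, y, z).
                pred is assumed to be superposed on true beforehand.
    region:     residue ids belonging to the region (for example CDR-H3).
    """
    squared = []
    for res in region:
        for atom in BACKBONE_ATOMS:
            key = (res, atom)
            # Skip atoms absent from either structure (glycine has no CB).
            if key in pred and key in true:
                squared.append(
                    sum((p - t) ** 2 for p, t in zip(pred[key], true[key]))
                )
    if not squared:
        raise ValueError("no shared atoms in this region")
    return math.sqrt(sum(squared) / len(squared))


# Toy coordinates for a single residue, for illustration only.
true = {(105, "CA"): (0.0, 0.0, 0.0), (105, "N"): (1.0, 0.0, 0.0)}
pred = {(105, "CA"): (0.0, 3.0, 0.0), (105, "N"): (1.0, 0.0, 4.0)}
rmsd = region_rmsd(pred, true, [105])
print(f"{rmsd:.3f}")
```

Because the superposition is done on the whole chain and the RMSD is then taken over one region only, a loop that is internally well modelled but misplaced relative to the framework still scores poorly.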

Table 1 Comparison between ABodyBuilder, ABlooper, IgFold, EquiFold, AlphaFold-Multimer and ABodyBuilder2 at predicting the backbone atoms for each antibody chain.

The experimental error in protein structures generated via X-ray crystallography has been estimated to be around 0.6Å for regions with organised secondary structures (such as the antibody frameworks) and around 1Å for protein loops³⁵. On average, the predicted structures for most of the antibody regions using any method have errors within the range of what would be expected from experimentally resolved crystal structures. The exception to this is CDR-H3, where all methods make the worst predictions.

ABodyBuilder2 and AlphaFold-Multimer are the most accurate methods at predicting the structure of CDR-H3 (RMSD of 2.81 Å and 2.90 Å, respectively). EquiFold, IgFold and ABlooper generate structures with CDR-H3 loops around 10% less accurate than ABodyBuilder2 and AlphaFold-Multimer. The worst method for predicting CDR-H3 loops is the original version of ABodyBuilder, showcasing how much deep learning has improved CDR-H3 modelling. Supplementary Note 4 explores how the accuracy of ABodyBuilder2 for CDR-H3 prediction correlates with the maximum sequence identity in the training set. A comparison of the CDR-H3 RMSD for each individual structure in the test set between each pair of methods is shown in Supplementary Fig. 4.

Tables 2 and 3 show how accurate TCRBuilder2 and NanoBodyBuilder2 are at predicting the position of atoms in the backbone. We compare them to homology modelling methods (RepertoireBuilder³⁶ and TCRBuilder²⁶ for TCR modelling, and MOE³⁷ and ABodyBuilder³² for nanobody modelling) and machine learning methods AlphaFold-Multimer²² for TCRs and AlphaFold2²¹ for nanobodies. Supplementary Fig. 1 provides a visual representation of the potential differences in the predicted conformation of the CDR-H3 region using NanoBodyBuilder2 and ABodyBuilder2, even when they correspond to the same sequence. Full results for TCRBuilder2 and NanoBodyBuilder2 and details on their respective test sets are given in Supplementary Notes 1 and 2.

Table 2 Comparison between TCRBuilder, RepertoireBuilder, AlphaFold-Multimer, ABodyBuilder2 and TCRBuilder2 at predicting the backbone atoms for each TCR chain.

Table 3 Comparison between ABodyBuilder, MOE, AlphaFold2 and NanoBodyBuilder2 at predicting the backbone atoms of nanobodies.

Heavy and light chain packing

As described in the introduction, in antibodies the binding site sits across the heavy and light chain variable regions (VH and VL). With half of the CDRs on each chain, the relative VH-VL orientation can have an impact on the structure of the binding site.

To quantify how accurate each method is at predicting the relative orientation between chains, in Table 4 we show the average absolute error in the five angles (HL, HC1, HC2, LC1, LC2) and distance (dc) that fully characterise VH-VL orientation³⁸. A brief description of how these values are defined is given in Supplementary Note 3; for a more complete description see ref. ³⁸. The results for TCR domains are given in Supplementary Table 2.

Table 4 Comparison of VH-VL orientation between ABodyBuilder (ABB), ABlooper (ABL), IgFold (IgF), EquiFold (EqF), AlphaFold-Multimer (AFM) and ABodyBuilder2 (AB2).

As an upper bound for the accuracy of predicted structures, the average standard deviation of the VH-VL orientation measurements in 55 antibodies with structures resolved over five times is shown in Table 4. In the original study³⁸, the vector dc was chosen as an axis as it was found to be the most conserved amongst antibody structures. All of the benchmarked methods predict this distance with very high accuracy. All methods are also accurate at predicting the angles, with small errors with respect to what is observed in experiments. However, small deviations in these angles will still have an impact on the structure of the binding site. ABodyBuilder2 is on average the most accurate method at heavy and light chain packing by a small margin.

Side chain and chemical surface accuracy

During binding, an antigen will mostly form interactions via side chain atoms on the surface. Therefore, to study antigen binding, predicted antibody structures must accurately model the position of side chain atoms and whether they are exposed on the surface or buried. To benchmark the accuracy of side-chain modelling we use a method similar to ref. ³⁹, where a side-chain torsion angle is considered correct if it is within 40 degrees of the true conformation. The original implementation of ABodyBuilder will occasionally fail to predict a side chain; this is treated as an incorrect prediction. A residue is labelled as buried if its relative solvent accessibility (calculated as described in ref. ⁴⁰) is below 7.5%. The results of this analysis are given in Table 5 for ABodyBuilder2 and in Supplementary Tables 1 and 2 for nanobodies and TCRs, respectively.
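The two criteria in this paragraph can be written down compactly. The sketch below is illustrative only: the function names are hypothetical, the torsion comparison wraps around ±180°, and a missing side chain is passed as `None` and counted as incorrect, as described for the original ABodyBuilder.

```python
def angle_difference(a_deg, b_deg):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def torsion_is_correct(pred_deg, true_deg, tolerance_deg=40.0):
    """True if the predicted torsion is within the tolerance of the true one.

    A side chain that was not predicted (None) counts as incorrect.
    """
    if pred_deg is None:
        return False
    return angle_difference(pred_deg, true_deg) <= tolerance_deg


def is_buried(relative_solvent_accessibility_percent, cutoff_percent=7.5):
    """True if the residue falls below the burial cut-off from the text."""
    return relative_solvent_accessibility_percent < cutoff_percent


print(torsion_is_correct(170.0, -170.0))  # 20 degrees apart across the wrap
print(torsion_is_correct(60.0, 110.0))    # 50 degrees apart
print(is_buried(5.0), is_buried(7.5))
```

The wrap-around matters because torsion angles near +180° and -180° describe almost the same conformation, so a naive subtraction would mark many correct predictions as wrong.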

Table 5 Comparison of surface accuracy for ABodyBuilder (ABB), ABlooper (ABL), IgFold (IgF), EquiFold (EqF), AlphaFold-Multimer (AFM) and ABodyBuilder2 (AB2).

As ABlooper and IgFold are deep learning methods that only predict the backbone (leaving side chain prediction to OpenMM⁴¹ and Rosetta⁴², respectively), it is perhaps not surprising that they are the least accurate at modelling the chemical surface. EquiFold, AlphaFold-Multimer and ABodyBuilder2, all of which output all-atom structures, predict the χ1 and χ2 side chain torsion angles with high accuracy while struggling to model longer side chains. The original implementation of ABodyBuilder predicts side chains with comparable accuracy to AlphaFold-Multimer and ABodyBuilder2. All methods are highly accurate at predicting whether a residue is exposed or buried, with EquiFold being the most accurate.

Physical plausibility and accurate stereochemistry

Although deep learning models are trained on crystal structures, they will occasionally predict conformations that are very rare or do not occur in nature. We next check for the presence of steric clashes, cis-peptide bonds, D-amino acids, or bonds with nonphysical lengths in the models generated by each method. For bond lengths, only the peptide bond is checked as all other bond lengths are fixed to their literature value in all benchmarked methods but ABlooper and EquiFold.
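One of these checks, the detection of cis-peptide bonds, can be sketched with a plain dihedral calculation. The text does not state the thresholds used, so the 30° cut-off below is an assumption, as are the helper names; the omega torsion is taken over the atoms Cα(i), C(i), N(i+1) and Cα(i+1).

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points."""
    b0, b1, b2 = _sub(p0, p1), _sub(p2, p1), _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def is_cis_peptide(ca_i, c_i, n_next, ca_next, cutoff_deg=30.0):
    """Flag a peptide bond as cis when |omega| is below the assumed cut-off."""
    return abs(dihedral_deg(ca_i, c_i, n_next, ca_next)) < cutoff_deg


# Idealised toy geometries, not real protein coordinates.
trans_omega = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
cis_omega = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
print(abs(trans_omega), abs(cis_omega))
```

A trans peptide bond has an omega close to 180° and a cis bond close to 0°, so a single cut-off on the absolute value separates the two cases.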

ABodyBuilder2 and AlphaFold-Multimer both generate structures of comparable quality to experimentally resolved ones, whereas IgFold appears to generate a number of cis-peptide bonds and clashes even after being refined with Rosetta⁴². EquiFold does not use an energy-based method to refine its predicted structures and hence all of the structures it generates are unphysical. This shows that a refinement step is still necessary to ensure structures generated by deep learning-based methods are realistic (Table 6).

Table 6 Quality check for models generated using ABodyBuilder (ABB), ABlooper (ABL), IgFold (IgF), EquiFold (EqF), AlphaFold-Multimer (AFM) and ABodyBuilder2 (AB2).

The results for nanobodies and TCRs are shown in Supplementary Tables 1 and 2, respectively.

Computational cost

The original version of AlphaFold-Multimer is by far the most computationally expensive of the benchmarked methods. It requires over one terabyte of sequence data and takes around three hours to generate one structure when run on five CPUs. Large speed-ups can be obtained by reducing the size of the sequence database, using faster sequence alignment algorithms, or using GPUs^43,44. Even with these modifications, it takes around thirty minutes on a GPU to generate a single structure. All other methods benchmarked can be run on five CPUs in under a minute, with the fastest being EquiFold due to its lack of a refinement step. This makes them all well suited for high-throughput structural modelling of next-generation sequencing data. ABodyBuilder2 can also be sped up significantly by using a GPU, taking around five seconds to generate an antibody structure on a Tesla P100.

Error estimation

ABodyBuilder2 predicts four structures for each antibody. We found that the diversity between predictions, as in ABlooper, can be used to estimate the uncertainty in the final prediction. If the structures predicted by all four models disagree in a region then the final prediction for this region is likely to be incorrect. This allows ABodyBuilder2 to give a confidence score for each residue that can be used to filter for incorrectly modelled structures. In Fig. 3 we show how the root mean squared predicted error for CDR-H3 residues correlates with CDR-H3 RMSD.
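As an illustration of how ensemble diversity can be turned into a per-residue score, the sketch below computes the root mean squared deviation of each residue from the ensemble mean. The exact formula used by ABodyBuilder2 is not given in this section, so this particular definition is an assumption, and the function name and toy coordinates are hypothetical. The models are assumed to be aligned already.

```python
import math


def per_residue_spread(models):
    """Root mean squared deviation of each residue from the ensemble mean.

    models: list of structures, each a list of (x, y, z) tuples with one
            representative atom per residue, all in the same frame.
    """
    n_models = len(models)
    spread = []
    for r in range(len(models[0])):
        mean = [sum(m[r][k] for m in models) / n_models for k in range(3)]
        msd = sum(
            sum((m[r][k] - mean[k]) ** 2 for k in range(3)) for m in models
        ) / n_models
        spread.append(math.sqrt(msd))
    return spread


# Four toy models with two residues: all agree on the first residue
# and disagree by 1 Å either side of the mean on the second.
models = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
]
spread = per_residue_spread(models)
print(spread)
```

Residues on which the four models agree receive a score near zero, while residues in regions of disagreement, typically CDR-H3, receive larger values.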

**Fig. 3: Scatter plot showing the CDR-H3 RMSD against the root mean squared predicted error for all structures in the benchmark.**

A low predicted error does not necessarily indicate an accurate structure. However, a high predicted error works as a good filter for removing inaccurate models. For example, if a predicted error cut-off of around 1 Å is set for the current benchmark, it would remove five structures with an average RMSD of 4.46 Å. The average CDR-H3 RMSD for the remaining set would then be 2.53 Å.
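The figures in this example are consistent with the benchmark mean reported earlier. The short check below assumes that 2.81 Å is the mean CDR-H3 RMSD over all 34 benchmark structures; the variable names are illustrative.

```python
n_total, mean_total = 34, 2.81      # whole benchmark (assumed to be the mean)
n_removed, mean_removed = 5, 4.46   # structures above the error cut-off

# The sum of RMSDs over the remaining structures is the total minus the
# removed part; dividing by the remaining count gives the new mean.
remaining_mean = (n_total * mean_total - n_removed * mean_removed) / (
    n_total - n_removed
)
print(round(remaining_mean, 2))
```

Rounded to two decimal places this gives 2.53 Å, matching the value quoted in the text.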

Discussion

We present ImmuneBuilder, a set of three open-source and freely available tools for modelling immune proteins capable of rapidly generating accurate antibody, TCR, and nanobody structures. ImmuneBuilder can produce structures of antibodies and TCRs with accuracy comparable to AlphaFold-Multimer while being over a hundred times faster and without the need for large sequence databases or multiple sequence alignments. ABodyBuilder2 is shown to be the most accurate of the antibody-specific tools and the only one to consistently predict structures with correct stereochemistry.

The comparison with homology modelling methods, such as ABodyBuilder, shows the benefits that deep learning has brought to the field of antibody structure prediction. However, all methods still struggle to accurately predict the conformation of CDR-H3, suggesting that models capable of predicting multiple conformations may be required to accurately capture this loop. Deep learning methods also still struggle to consistently predict physically plausible structures. This challenge can be addressed by using physics-based methods, such as restrained energy minimisation, but for fast methods like ABodyBuilder2 this greatly increases computational cost.

By measuring the variability between predictions, ImmuneBuilder is able to provide an error estimate for each residue. In combination with its prediction speed and accuracy, the ability to filter for incorrect models makes it a useful tool for incorporating structural information into data from next-generation sequencing experiments.

To further demonstrate the usefulness of ImmuneBuilder, we predicted the structure of around 148 thousand non-redundant paired antibody sequences from OAS¹⁰ and make these freely available at https://doi.org/10.5281/zenodo.7258553.

Methods

In the methods section we describe in detail the data and models for creating ABodyBuilder2. An overview of the steps used to predict an antibody structure with ABodyBuilder2 is shown in Fig. 4. Details for NanoBodyBuilder2 and TCRBuilder2 are given in Supplementary Notes 1 and 2.

**Fig. 4: Pipeline used to predict structures by ABodyBuilder2.**

Data

The data used to train, test, and validate ABodyBuilder2 was extracted from SAbDab¹¹, a database containing all antibody structures in the PDB⁴⁵. The training data was extracted on the 31st of July 2021 resulting in a total of 7084 structures. Filters were used to ensure structures in the training data had both the VH and VL chains, were not missing residues other than at the start and end of the chain, and had a resolution of 3.5 Å or better. Structures with the same amino acid sequence were kept in the training data to expose the model to antibodies with multiple conformations. As a validation set, we used the set of 49 antibodies in the Rosetta Antibody Benchmark. Structures with the same sequence as antibodies in the validation set were removed from the training set.

For the test data, we extracted all PDB files containing antibody Fv structures in SAbDab added between the 1st of August 2021 and the 1st of June 2022. Only crystal structures resolved by X-ray diffraction and with a resolution better than 2.3 Å were kept. A set of non-redundant Fvs were then selected from these and further filtered to remove antibodies with CDR-H3s longer than 22 amino acids and structures with missing residues. Finally, it was ensured that there were no structures with the same sequence in the test, training, and validation sets. This resulted in the set of 34 Fvs that was used to benchmark ABodyBuilder2 against other methods. A comparison of the maximum sequence identity to the training set against CDR-H3 RMSD for each Fv in the test set is shown in Supplementary Fig. 3. A full list of PDB codes for structures used in the training, validation and test set is given at https://github.com/oxpig/ImmuneBuilder.
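The per-structure criteria for the test set can be expressed as a simple filter. The sketch below is hypothetical: the `FvEntry` record and its field names are not SAbDab's schema, the date bounds are treated as inclusive by assumption, and the later redundancy and sequence-overlap steps are not shown.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class FvEntry:
    """Hypothetical record for one Fv structure."""
    pdb_code: str
    date_added: date
    method: str
    resolution: float  # in Å
    cdrh3_length: int
    has_missing_residues: bool


def passes_test_filters(entry: FvEntry) -> bool:
    """Apply the per-structure test-set criteria stated in the text."""
    return (
        date(2021, 8, 1) <= entry.date_added <= date(2022, 6, 1)
        and entry.method == "X-RAY DIFFRACTION"
        and entry.resolution < 2.3      # "better than 2.3 Å"
        and entry.cdrh3_length <= 22    # drop CDR-H3s longer than 22
        and not entry.has_missing_residues
    )


# Made-up entries for illustration only.
candidates = [
    FvEntry("aaaa", date(2021, 10, 5), "X-RAY DIFFRACTION", 1.9, 14, False),
    FvEntry("bbbb", date(2021, 10, 5), "ELECTRON MICROSCOPY", 1.9, 14, False),
    FvEntry("cccc", date(2022, 1, 20), "X-RAY DIFFRACTION", 2.8, 12, False),
    FvEntry("dddd", date(2022, 3, 2), "X-RAY DIFFRACTION", 2.0, 25, False),
]
kept = [e.pdb_code for e in candidates if passes_test_filters(e)]
print(kept)
```

Only the first made-up entry passes; the others fail on experimental method, resolution and CDR-H3 length, respectively.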

Deep learning architecture

The architecture of the deep learning model behind ABodyBuilder2 is an antibody-specific version of the structure module in AlphaFold-Multimer with several modifications. Residues are treated as rigid bodies, each one defined by a 3D point in space and a matrix representing its orientation. The input node features are a one-hot encoded representation of the sequence and the input edge features are relative positional encodings. At the start, all residues are set at the origin with the same orientation.

The model is composed of eight update blocks that run sequentially. At every iteration, the node features are first updated in a structurally aware way using the Invariant Point Attention layer, and then residue coordinates and orientations are updated using the Backbone Update layer. For further details on how these layers work, see the original AlphaFold2 paper²¹. Finally, torsion angles for each residue are predicted from node features and are then used to reconstruct an all-atom structure using hard-coded rules. Unlike AlphaFold-Multimer, all blocks have their own weights.

The main term in the loss function is the Frame Aligned Point Error (FAPE) loss, which quantifies how structurally similar the true and predicted structures are in the local reference frame of each residue. For details see ref. ²¹. In AlphaFold2, the FAPE loss is clamped at 10 Å focusing on correctly placing residues relative to those closest to it. Similar to AlphaFold-Multimer, a modified version of FAPE loss is used for ABodyBuilder2 in which more focus is given to correctly placing CDR residues relative to the framework. This is achieved by clamping the FAPE loss at 30 Å when it is calculated between framework and CDR residues and at 10 Å otherwise. The final loss term is a sum of the average backbone FAPE loss after every backbone update and the full atom FAPE loss from the final structure.
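The region-dependent clamping can be illustrated on a precomputed error matrix. This sketch covers only the clamping step: it assumes the frame-aligned errors have been calculated elsewhere, omits any length-scale normalisation, and uses hypothetical names throughout.

```python
def clamped_fape(errors, is_cdr, cdr_framework_clamp=30.0, default_clamp=10.0):
    """Average frame-aligned error after region-dependent clamping.

    errors: errors[i][j] is the position error (Å) of residue j measured
            in the local frame of residue i, assumed to be precomputed.
    is_cdr: is_cdr[k] is True when residue k belongs to a CDR.
    """
    total, count = 0.0, 0
    for i, row in enumerate(errors):
        for j, d in enumerate(row):
            # A pair mixing one framework and one CDR residue gets the
            # looser clamp, so large CDR misplacements still contribute.
            mixed_pair = is_cdr[i] != is_cdr[j]
            clamp = cdr_framework_clamp if mixed_pair else default_clamp
            total += min(d, clamp)
            count += 1
    return total / count


# Toy example: residue 0 is framework, residue 1 is in a CDR.
errors = [[0.0, 25.0], [40.0, 12.0]]
loss = clamped_fape(errors, is_cdr=[False, True])
print(loss)
```

With a uniform 10 Å clamp, the 25 Å and 40 Å errors in the toy example would both be capped at 10 Å and be indistinguishable; the 30 Å clamp keeps them separable.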

As is done in AlphaFold2, a structural violation term is added to the loss function. This penalises nonphysical conformations with a term for bond angles, bond distances, and clashing heavy atoms. In our models, this term was reduced by an order of magnitude with respect to AlphaFold2 as this was found to slightly improve prediction accuracy without significantly harming the physicality of the final prediction. Finally, the side-chain and backbone torsion angle loss from AlphaFold2 is also used.

Each model was trained in two stages. In the first stage, the structural violations term of the loss function was set to zero and a dropout of 10% was used. The RAdam optimiser⁴⁶ was used with a cosine annealing scheduler with warm restarts every 50 epochs, learning rates between 1e-3 and 1e-4, and a weight decay of 1e-3. In the second stage, the structural violations loss was added and dropout was set to zero. RAdam was also used for this stage, with a fixed learning rate of 1e-4 and a weight decay of 1e-3. To aid stability, the norm of the gradients was clipped to a value of 0.1 in the second stage of training. For both stages, a batch size of 64 was used and training was stopped if there was no improvement on the validation set after 100 epochs. On average, training took around four weeks for each model on a single GPU.
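The first-stage learning-rate schedule can be illustrated with the standard cosine annealing formula with warm restarts. Only the period and the learning-rate range come from the text; whether the schedule was stepped per epoch and with a constant period is an assumption, and `lr_at_epoch` is a hypothetical helper.

```python
import math


def lr_at_epoch(epoch, period=50, lr_max=1e-3, lr_min=1e-4):
    """Cosine-annealed learning rate that restarts every `period` epochs."""
    t_cur = epoch % period  # epochs since the last restart
    cosine = math.cos(math.pi * t_cur / period)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cosine)


for epoch in (0, 25, 49, 50):
    print(epoch, f"{lr_at_epoch(epoch):.2e}")
```

The rate starts at 1e-3, decays towards 1e-4 over 50 epochs, and then jumps back to 1e-3 at the restart.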

Model selection

ABodyBuilder2 is composed of four deep-learning models trained independently to predict antibody structures. To select the best prediction, we align all predicted structures and choose as the final prediction the closest one to the average. This reduces the method’s sensitivity to small fluctuations in the training set. It also results in a small improvement in prediction accuracy.
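A minimal sketch of this selection rule is shown below. It assumes the predicted structures are already aligned and represented by one coordinate per residue; the function name and the toy data are hypothetical.

```python
import math


def closest_to_average(models):
    """Index of the model with the lowest RMSD to the ensemble average.

    models: list of structures, each a list of (x, y, z) tuples in a
            common frame (alignment is assumed to be done beforehand).
    """
    n_models, n_res = len(models), len(models[0])
    mean = [
        tuple(sum(m[r][k] for m in models) / n_models for k in range(3))
        for r in range(n_res)
    ]

    def rmsd_to_mean(model):
        sq = sum(
            sum((model[r][k] - mean[r][k]) ** 2 for k in range(3))
            for r in range(n_res)
        )
        return math.sqrt(sq / n_res)

    return min(range(n_models), key=lambda i: rmsd_to_mean(models[i]))


# Four toy single-residue models placed along the x axis.
toy_models = [[(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)],
              [(2.0, 0.0, 0.0)], [(9.0, 0.0, 0.0)]]
best = closest_to_average(toy_models)
print(best)
```

Choosing an actual member of the ensemble, rather than the averaged coordinates themselves, keeps the output a structure the network produced; averaged coordinates would generally distort bond geometry.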

Structural refinement

Although the models are encouraged to predict physically plausible structures during training, they will occasionally produce structures with steric clashes, incorrect peptide bond lengths, or cis-peptide bonds. A restrained energy minimisation procedure with OpenMM is used to resolve these issues. The AMBER14 protein force field⁴⁷ with an added harmonic force term to keep the heavy atoms of the backbone close to their original positions is used. In the rare case when two side chain atoms are predicted by the model to be within 0.2 Å of each other, the clashing side chains are deleted and remodelled using pdbfixer⁴¹. AMBER14 does not explicitly consider chirality, so when the predicted structure contains peptide bonds in the cis configuration, an additional force is added to flip their torsion angles into the trans configuration.

By design, the ABodyBuilder2 deep learning model will always generate amino acids in their L-stereoisomeric form. However, it was found that during energy minimisation residues are occasionally flipped into their D-stereoisomer. To fix this, a method similar to that in ref. ⁴⁸ is used. First, the chirality at the carbon alpha centre of each D-stereoisomeric residue is fixed by flipping the hydrogen atom. The structure is then relaxed keeping the flipped hydrogen atoms in place before a final minimisation.

Benchmarked methods

We compared ABodyBuilder2 to five other methods: AlphaFold-Multimer, EquiFold, IgFold, the original version of ABodyBuilder and ABlooper. AlphaFold-Multimer was run using the freely available version of code²². It was run using the weights from version 2.2 and without the use of templates. The effect of templates on antibody structure prediction is shown in Supplementary Table 3. This generated 25 structures per antibody out of which the top-ranked was selected for the benchmark. The public version of their respective code bases (as of December 15th) was used to generate EquiFold³¹ and IgFold²⁵ models. As in their paper, Rosetta⁴² is used to minimise IgFold models. The original version of ABodyBuilder³² was run by using the SAbBox Singularity container (https://process.innovation.ox.ac.uk/software/p/20120-a/sabbox-singularity-platform—academic-use/1) from July 2022 excluding all templates with a sequence identity of 99% or higher. ABlooper³³ (version 1.1.2) was run to remodel the CDR loops from the ABodyBuilder predictions. Structures generated by all methods were numbered using ANARCI⁴⁹.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The data used to generate the ImmuneBuilder models was extracted from public repositories such as SAbDab¹¹ and STCRDab¹². All data generated from this study is available in the public repository located at https://doi.org/10.5281/zenodo.7258553.

Code availability

Source code for the ImmuneBuilder models, trained weights and inference script are available under an open-source license at https://github.com/oxpig/ImmuneBuilder.

References

Kingwell, K. T cell receptor therapeutics hit the immuno-oncology stage. Nat. reviews. Drug Discov. https://www.nature.com/articles/d41573-022-00073-7 (2022).
Kaplon, H., Chenoweth, A., Crescioli, S. & Reichert, J. M. Antibodies to watch in 2022. mAbs 14, 2014296 (2022).
Article PubMed PubMed Central Google Scholar
Lu, R.-M. et al. Development of therapeutic antibodies for the treatment of diseases. J. Biomed. Sci. 27, 1–30 (2020).
Article CAS PubMed PubMed Central Google Scholar
Yang, E. Y. & Shah, K. Nanobodies: next generation of cancer diagnostics and therapeutics. Front. Oncol. 10, 1182 (2020).
Article CAS PubMed PubMed Central Google Scholar
Regep, C., Georges, G., Shi, J., Popovic, B. & Deane, C. M. The H3 loop of antibodies shows unique structural characteristics. Proteins Struct. Funct., Bioinform. 85, 1311–1318 (2017).
Article CAS Google Scholar
Tsuchiya, Y. & Mizuguchi, K. The diversity of H3 loops determines the antigen-binding tendencies of antibody CDR loops. Protein Sci. 25, 815–825 (2016).
Article CAS PubMed PubMed Central Google Scholar
Wong, W. K., Leem, J. & Deane, C. M. Comparative analysis of the CDR loops of antigen receptors. Front. Immunol. 10, 2454 (2019).
Article CAS PubMed PubMed Central Google Scholar
Mitchell, L. S. & Colwell, L. J. Comparative analysis of nanobody sequence and structure data. Proteins: Struct. Funct. Bioinform. 86, 697–706 (2018).
Article CAS Google Scholar
Kovaltsuk, A. et al. Observed antibody space: a resource for data mining next-generation sequencing of antibody repertoires. J. Immunol. 201, 2502–2509 (2018).
Article CAS PubMed Google Scholar
Olsen, T. H., Boyles, F. & Deane, C. M. Observed antibody space: a diverse database of cleaned, annotated, and translated unpaired and paired antibody sequences. Protein Sci. 31, 141–146 (2022).
Article CAS PubMed Google Scholar
Dunbar, J. et al. SAbDab: the structural antibody database. Nucleic Acids Res. 42, D1140–D1146 (2014).
Article CAS PubMed Google Scholar
Leem, J., de Oliveira, S. H. P., Krawczyk, K. & Deane, C. M. STCRDab: the structural T-cell receptor database. Nucleic Acids Res. 46, D406–D412 (2018).
Article CAS PubMed Google Scholar
Schneider, C., Raybould, M. I. & Deane, C. M. SAbDab in the age of biotherapeutics: updates including SAbDab-nano, the nanobody structure tracker. Nucleic Acids Res. 50, D1368–D1372 (2022).
Chiu, M. L., Goulet, D. R., Teplyakov, A. & Gilliland, G. L. Antibody structure and function: the basis for engineering therapeutics. Antibodies 8, 55 (2019).
Robinson, S. A. et al. Epitope profiling using computational structural modelling demonstrated on coronavirus-binding antibodies. PLoS Comput. Biol. 17, e1009675 (2021).
Ambrosetti, F., Jiménez-García, B., Roel-Touris, J. & Bonvin, A. M. Modeling antibody-antigen complexes by information-driven docking. Structure 28, 119–129 (2020).
Schneider, C., Buchanan, A., Taddese, B. & Deane, C. M. DLAB: deep learning methods for structure-based virtual screening of antibodies. Bioinformatics 38, 377–383 (2021).
Slabinski, L. et al. The challenge of protein structure determination-lessons from structural genomics. Protein Sci. 16, 2472–2482 (2007).
Brown, A. J. et al. Augmenting adaptive immunity: progress and challenges in the quantitative engineering and analysis of adaptive immune receptor repertoires. Mol. Syst. Des. Eng. 4, 701–736 (2019).
Nielsen, S. C. & Boyd, S. D. Human adaptive immune receptor repertoire analysis-past, present, and future. Immunol. Rev. 284, 9–23 (2018).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Evans, R. et al. Protein complex prediction with AlphaFold-Multimer. bioRxiv (2021).
Lin, Z. et al. Language models of protein sequences at the scale of evolution enable accurate structure prediction. bioRxiv (2022).
Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876 (2021).
Ruffolo, J. A., Chu, L.-S., Mahajan, S. P. & Gray, J. J. Fast, accurate antibody structure prediction from deep learning on massive set of natural antibodies. Nat. Commun. 14, 2389 (2023).
Wong, W. K. et al. TCRBuilder: multi-state T-cell receptor structure prediction. Bioinformatics 36, 3580–3581 (2020).
Ruffolo, J. A., Sulam, J. & Gray, J. J. Antibody structure prediction using interpretable deep learning. Patterns 3, 100406 (2022).
Ruffolo, J. A., Guerra, C., Mahajan, S. P., Sulam, J. & Gray, J. J. Geometric potentials from deep learning improve prediction of CDR H3 loop structures. Bioinformatics 36, i268–i275 (2020).
Yang, J. et al. Improved protein structure prediction using predicted interresidue orientations. Proc. Natl Acad. Sci. 117, 1496–1503 (2020).
Cohen, T., Halfon, M. & Schneidman-Duhovny, D. Nanonet: rapid and accurate end-to-end nanobody modeling by deep learning. Front. Immunol. 13, 958584 (2022).
Lee, J. H. et al. Equifold: Protein structure prediction with a novel coarse-grained structure representation. bioRxiv (2022).
Leem, J., Dunbar, J., Georges, G., Shi, J. & Deane, C. M. ABodyBuilder: automated antibody structure prediction with data-driven accuracy estimation. MAbs 8, 1259–1268 (2016).
Abanades, B., Georges, G., Bujotzek, A. & Deane, C. M. ABlooper: fast accurate antibody CDR loop structure prediction with accuracy estimation. Bioinformatics 38, 1877–1880 (2022).
Lefranc, M.-P. et al. IMGT unique numbering for immunoglobulin and T cell receptor variable domains and Ig superfamily V-like domains. Dev. Comp. Immunol. 27, 55–77 (2003).
Eyal, E., Gerzon, S., Potapov, V., Edelman, M. & Sobolev, V. The limit of accuracy of protein modeling: influence of crystal packing on protein structure. J. Mol. Biol. 351, 431–442 (2005).
Schritt, D. et al. Repertoire builder: high-throughput structural modeling of B and T cell receptors. Mol. Syst. Des. Eng. 4, 761–768 (2019).
Maier, J. K. & Labute, P. Assessment of fully automated antibody homology modeling protocols in molecular operating environment. Proteins: Struct. Funct. Bioinform. 82, 1599–1610 (2014).
Dunbar, J., Fuchs, A., Shi, J. & Deane, C. M. ABangle: characterising the VH-VL orientation in antibodies. Protein Eng. Des. Sel. 26, 611–620 (2013).
Leem, J., Georges, G., Shi, J. & Deane, C. M. Antibody side-chain conformations are position-dependent. Proteins: Struct. Funct. Bioinform. 86, 383–392 (2018).
Tien, M. Z., Meyer, A. G., Sydykova, D. K., Spielman, S. J. & Wilke, C. O. Maximum allowed solvent accessibilites of residues in proteins. PLoS One 8, e80635 (2013).
Eastman, P. et al. OpenMM 7: rapid development of high-performance algorithms for molecular dynamics. PLoS Comput. Biol. 13, e1005659 (2017).
Alford, R. F. et al. The Rosetta all-atom energy function for macromolecular modeling and design. J. Chem. Theory Comput. 13, 3031–3048 (2017).
Steinegger, M. & Söding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat. Biotechnol. 35, 1026–1028 (2017).
Mirdita, M. et al. ColabFold: making protein folding accessible to all. Nat. Methods 19, 679–682 (2022).
Berman, H. M. et al. The protein data bank. Nucleic Acids Res. 28, 235–242 (2000).
Liu, L. et al. On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019).
Maier, J. A. et al. ff14SB: improving the accuracy of protein side chain and backbone parameters from ff99SB. J. Chem. Theory Comput. 11, 3696–3713 (2015).
Schreiner, E., Trabuco, L. G., Freddolino, P. L. & Schulten, K. Stereochemical errors and their implications for molecular dynamics simulations. BMC Bioinform. 12, 1–9 (2011).
Dunbar, J. & Deane, C. M. ANARCI: antigen receptor numbering and receptor classification. Bioinformatics 32, 298–300 (2016).

Acknowledgements

This work was funded by the Engineering and Physical Sciences Research Council (EPSRC), grant number EP/S024093/1, and by Roche.

Author information

Authors and Affiliations

Department of Statistics, University of Oxford, Oxford, UK
Brennan Abanades, Fergus Boyles & Charlotte M. Deane
Large Molecule Research, Roche Pharma Research and Early Development, Roche Innovation Center Munich, Penzberg, Germany
Wing Ki Wong, Guy Georges & Alexander Bujotzek

Contributions

B.A. and C.M.D. conceived the project and designed the study with input from all authors. B.A. designed and implemented the deep learning model. B.A. and W.K.W. ran ablation studies to optimise the model architecture and training procedure. B.A. trained the final models and implemented the final version of the code. B.A. compared ImmuneBuilder against other methods and compiled the results. F.B. built a web server to run the model and visualise predictions. B.A. and C.M.D. wrote the manuscript with input from all authors. C.M.D., W.K.W., G.G., and A.B. supervised the project.

Corresponding author

Correspondence to Charlotte M. Deane.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Communications Biology thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editors: Zhijuan Qiu and Gene Chong. A peer review file is available.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Peer Review File

Supplementary Information

Description of Additional Supplementary Files

Supplementary Data 1

Supplementary Data 2

Supplementary Data 3

Supplementary Data 4

Reporting Summary

Rights and permissions

Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Abanades, B., Wong, W.K., Boyles, F. et al. ImmuneBuilder: Deep-Learning models for predicting the structures of immune proteins. Commun Biol 6, 575 (2023). https://doi.org/10.1038/s42003-023-04927-7

Received: 15 December 2022
Accepted: 11 May 2023
Published: 29 May 2023
DOI: https://doi.org/10.1038/s42003-023-04927-7
