Perspective

Linguistics-based formalization of the antibody language as a basis for antibody language models

Abstract

Apparent parallels between natural language and antibody sequences have led to a surge in deep language models applied to antibody sequences for predicting cognate antigen recognition. However, a formal linguistic definition of the antibody language does not exist, and how antibody language models capture antibody-specific binding features remains largely uninterpretable. Here we describe how a linguistic formalization of the antibody language, by characterizing its tokens and grammar, could address current challenges in antibody language model rule mining.
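
To make the notion of antibody-sequence "tokens" concrete, the sketch below contrasts two candidate token inventories for an antibody language model: single amino-acid (character-level) tokens versus overlapping k-mer tokens. It is a minimal illustration only; the CDR-H3 fragment, the choice of k = 3 and the function names are assumptions made for this example and are not taken from the article.

```python
# Minimal sketch (assumptions: invented CDR-H3-like fragment, k = 3):
# two candidate token inventories for an antibody language model.

def residue_tokens(seq: str) -> list[str]:
    """Character-level tokenization: each amino acid is one token."""
    return list(seq)


def kmer_tokens(seq: str, k: int = 3) -> list[str]:
    """Overlapping k-mer tokenization: each window of k residues is one token."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


if __name__ == "__main__":
    cdrh3 = "ARDGYSSGWYFDV"  # invented CDR-H3-like fragment, for illustration
    print(residue_tokens(cdrh3))    # ['A', 'R', 'D', 'G', ...]
    print(kmer_tokens(cdrh3, k=3))  # ['ARD', 'RDG', 'DGY', ...]
```

The choice of token inventory determines what counts as a "word" for the model and therefore which statistical regularities (the putative grammar) it can learn.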

Fig. 1: Integration of a linguistic formalization of antibody sequences into antibody LMs.
Fig. 2: Linguistic formalization as a solution to current challenges in antibody-specificity prediction with LMs.
Fig. 3: Shared properties between linguistic and antibody sequences.
Fig. 4: The formalization of the antibody language.
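
As a companion to the figure titles above, and to Fig. 4 ("The formalization of the antibody language") in particular, the following sketch shows one way a generative grammar over antibody building blocks could be written down: a toy context-free grammar whose non-terminals stand for V-, D- and J-derived regions of a heavy-chain sequence. All production rules and terminal strings are invented placeholders for illustration; they are not the grammar proposed in the article.

```python
# Toy context-free grammar over antibody building blocks (illustration only).
# Upper-case symbols are non-terminals; the terminal strings are invented
# placeholders, not real germline segments.
import random

GRAMMAR = {
    "HEAVY": [["V_REGION", "D_REGION", "J_REGION"]],
    "V_REGION": [["EVQLV"], ["QVQLQ"]],
    "D_REGION": [["GYSSG"], ["DTAMV"]],
    "J_REGION": [["WGQGT"], ["WGRGT"]],
}


def derive(symbol: str, rng: random.Random) -> str:
    """Expand a symbol by recursively choosing one production per non-terminal."""
    if symbol not in GRAMMAR:  # terminal string: emit as-is
        return symbol
    production = rng.choice(GRAMMAR[symbol])
    return "".join(derive(s, rng) for s in production)


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(derive("HEAVY", rng))  # e.g. 'EVQLVGYSSGWGQGT'
```

Under such a formalization, asking whether an antibody language model has learned the grammar reduces to testing whether the sequences it generates or scores highly are derivable from the rules.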

Acknowledgements

Supported by the Leona M. and Harry B. Helmsley Charitable Trust (number 2019PG-T1D011, to V.G.), UiO World-Leading Research Community (to V.G.), UiO: LifeScience Convergence Environment Immunolingo (to V.G. and G.K.S.), EU Horizon 2020 iReceptorplus (number 825821, to V.G.), a Norwegian Cancer Society grant (number 215817, to V.G.), Research Council of Norway projects (numbers 300740 and 331890, to V.G.), a Research Council of Norway IKTPLUSS project (number 311341, to V.G. and G.K.S.), and Stiftelsen Kristian Gerhard Jebsen (K. G. Jebsen Coeliac Disease Research Centre, to G.K.S.).

Author information

Contributions

All authors contributed to the conceptualization, writing and editing of this paper.

Corresponding authors

Correspondence to Mai Ha Vu or Victor Greiff.

Ethics declarations

Competing interests

V.G. declares advisory board positions in aiNET GmbH, Enpicom B.V., Specifica Inc., Adaptyv Biosystems, EVQLV, Omniscope, Diagonal Therapeutics and Absci. V.G. is a consultant for Roche/Genentech, immunai, Proteinea and LabGenius.

Peer review

Peer review information

Nature Computational Science thanks Sheng-ce Tao and Hao Zhou for their contribution to the peer review of this work. Primary Handling Editor: Ananya Rastogi, in collaboration with the Nature Computational Science team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Vu, M.H., Robert, P.A., Akbar, R. et al. Linguistics-based formalization of the antibody language as a basis for antibody language models. Nat Comput Sci 4, 412–422 (2024). https://doi.org/10.1038/s43588-024-00642-3
