We trained DEDAL, an algorithm based on deep-learning language models, to generate pairwise alignments of protein sequences that take into account the sequence-specific context of amino acid substitutions and gaps. DEDAL improved alignment correctness on remote homologs by up to threefold, and it also improved the discrimination of remote homologs from evolutionarily unrelated sequences.
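Classical local alignment in the Smith–Waterman style scores each aligned residue pair with a fixed substitution matrix and charges fixed gap penalties. The idea summarized above is that these scores should instead depend on sequence context. The Python sketch below illustrates only that idea. It runs local alignment with affine gaps, and the substitution score, gap-open penalty and gap-extend penalty are all supplied separately for every residue pair (i, j). The helpers `one_hot_embed` and `pairwise_scores` are hypothetical stand-ins. A real language model would return context-dependent embeddings, and the scaled dot-product scoring is an illustrative assumption, not DEDAL's published parameterization. Only the best score is computed; traceback is omitted.

```python
from typing import List, Sequence

Matrix = List[List[float]]
NEG_INF = float("-inf")


def one_hot_embed(seq: str, alphabet: str = "ACDEFGHIKLMNPQRSTVWY") -> Matrix:
    """Hypothetical stand-in for a language model: one vector per residue.
    One-hot vectors ignore context; a learned model would not."""
    return [[1.0 if a == r else 0.0 for a in alphabet] for r in seq]


def pairwise_scores(ex: Matrix, ey: Matrix,
                    scale: float = 3.0, bias: float = -1.0) -> Matrix:
    """Assumed scoring form: scale * <ex[i], ey[j]> + bias for every pair (i, j)."""
    return [[scale * sum(a * b for a, b in zip(u, v)) + bias for v in ey]
            for u in ex]


def local_align_score(sub: Sequence[Sequence[float]],
                      gap_open: Sequence[Sequence[float]],
                      gap_extend: Sequence[Sequence[float]]) -> float:
    """Best local alignment score with affine gaps (Gotoh recurrences).

    sub[i][j] scores aligning x[i] with y[j]; gap_open[i][j] and gap_extend[i][j]
    are positive penalties subtracted when a gap is opened or extended at that
    cell, so every parameter may differ from one position pair to another.
    """
    n = len(sub)
    m = len(sub[0]) if n else 0
    # Tables are (n + 1) x (m + 1); row 0 and column 0 stay at -inf.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends with x_i aligned to y_j
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends with x_i against a gap
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends with y_j against a gap
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            go, ge = gap_open[i - 1][j - 1], gap_extend[i - 1][j - 1]
            # The 0.0 term lets a local alignment start fresh at (i, j).
            M[i][j] = sub[i - 1][j - 1] + max(
                0.0, M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] - go, X[i - 1][j] - ge)
            Y[i][j] = max(M[i][j - 1] - go, Y[i][j - 1] - ge)
            best = max(best, M[i][j])
    return best


if __name__ == "__main__":
    x, y = "ACGTT", "ACT"
    sub = pairwise_scores(one_hot_embed(x), one_hot_embed(y))  # match 2, mismatch -1
    gap_open = [[1.0] * len(y) for _ in x]
    gap_extend = [[0.5] * len(y) for _ in x]
    print(local_align_score(sub, gap_open, gap_extend))  # 5.0 (ACGT vs AC-T)
    # Make opening a gap cheaper only where x[2] ('G') meets y[1] ('C').
    gap_open[2][1] = 0.0
    print(local_align_score(sub, gap_open, gap_extend))  # 6.0
```

With uniform parameters the recurrences reduce to ordinary affine-gap local alignment. The second call shows what position-specific parameters add: lowering the gap-open penalty at a single cell changes the optimal score without altering any other score.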
This is a summary of: Llinares-López, F., Berthet, Q., Blondel, M., Teboul, O. & Vert, J.-P. Deep embedding and alignment of protein sequences. Nat. Methods https://doi.org/10.1038/s41592-022-01700-2 (2022).
Cite this article: Deep-learning language models help to improve protein sequence alignment. Nat. Methods 20, 40–41 (2023). https://doi.org/10.1038/s41592-022-01707-9