
  • Research Briefing

Deep-learning language models help to improve protein sequence alignment

We trained DEDAL, an algorithm based on deep-learning language models, to generate pairwise alignments of protein sequences while taking into account the sequence-specific context of amino acid substitutions and gaps. DEDAL improved alignment correctness on remote homologs by up to threefold and better discriminated remote homologs from evolutionarily unrelated sequences.
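The idea can be illustrated with a minimal Python sketch (not the authors' implementation): a Smith–Waterman-style dynamic program in which the score for aligning each residue pair is produced from learned, context-dependent sequence embeddings rather than read from a fixed matrix such as BLOSUM62. Here `embed` is a hypothetical stand-in for DEDAL's transformer encoder, and the gap penalties are constants, whereas DEDAL also predicts position-specific gap-open and gap-extend penalties and is trained end to end through a smoothed, differentiable version of the alignment recursion.

```python
# Minimal sketch of context-dependent local alignment (Gotoh-style
# affine gaps). `embed` is a hypothetical placeholder: a real model
# would return transformer embeddings that depend on each residue's
# sequence context.
import numpy as np

def embed(seq: str, dim: int = 8) -> np.ndarray:
    """Placeholder per-residue embeddings (random but deterministic)."""
    rng = np.random.default_rng(list(seq.encode()))
    return rng.standard_normal((len(seq), dim))

def smith_waterman_contextual(x: str, y: str,
                              gap_open: float = 3.0,
                              gap_extend: float = 1.0):
    """Local alignment where the score of aligning x[i] with y[j] is the
    dot product of their embeddings, so it varies with sequence context."""
    S = embed(x) @ embed(y).T          # position-specific substitution scores
    n, m = len(x), len(y)
    M = np.full((n + 1, m + 1), -np.inf)  # state: x[i] aligned to y[j]
    X = np.full((n + 1, m + 1), -np.inf)  # state: gap in y (x[i] unmatched)
    Y = np.full((n + 1, m + 1), -np.inf)  # state: gap in x (y[j] unmatched)
    M[0, :] = 0.0
    M[:, 0] = 0.0                      # local alignment may start anywhere
    best, best_end = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i, j] = max(0.0, S[i - 1, j - 1]
                          + max(M[i - 1, j - 1], X[i - 1, j - 1], Y[i - 1, j - 1]))
            X[i, j] = max(M[i - 1, j] - gap_open, X[i - 1, j] - gap_extend)
            Y[i, j] = max(M[i, j - 1] - gap_open, Y[i, j - 1] - gap_extend)
            if M[i, j] > best:
                best, best_end = M[i, j], (i, j)
    return best, best_end

score, end = smith_waterman_contextual("HEAGAWGHEE", "PAWHEAE")
print(f"local alignment score {score:.2f}, ending at {end}")
```

Replacing the hard maximum in this recursion with a soft maximum makes the whole computation differentiable, which is what allows the embedding model to be trained directly against reference alignments.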


Fig. 1: Example of pairwise alignment of two protein domain sequences.

References

  1. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021). This paper describes AlphaFold2, the state-of-the-art method for protein structure prediction, which relies on multiple sequence alignments.


  2. Smith, T. F. & Waterman, M. S. Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197 (1981). This paper introduces the classical Smith–Waterman algorithm for pairwise sequence alignment.


  3. Altschul, S. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997). This paper presents the most widely used heuristic variant of the Smith–Waterman algorithm.


  4. Devlin, J. et al. BERT: pre-training of deep bidirectional transformers for language understanding. Proc. NAACL-HLT 1, 4171–4186 (2019). This paper proposes BERT, a technique for unsupervised pretraining of language models.



Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This is a summary of: Llinares-López, F., Berthet, Q., Blondel, M., Teboul, O. & Vert, J.-P. Deep embedding and alignment of protein sequences. Nat. Methods https://doi.org/10.1038/s41592-022-01700-2 (2022).



Cite this article

Deep-learning language models help to improve protein sequence alignment. Nat Methods 20, 40–41 (2023). https://doi.org/10.1038/s41592-022-01707-9

