
  • Research Briefing

Deep-learning language models help to improve protein sequence alignment

We trained DEDAL, an algorithm based on deep-learning language models, to generate pairwise alignments of protein sequences while taking into account the sequence-specific context of amino acid substitutions and gaps. DEDAL improved alignment correctness on remote homologs by up to threefold and better discriminated remote homologs from evolutionarily unrelated sequences.
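The idea can be illustrated with a minimal Python sketch (not the authors' implementation): a Smith–Waterman-style dynamic program in which the score for aligning each residue pair is produced from learned, context-dependent sequence embeddings rather than read from a fixed matrix such as BLOSUM62. Here `embed` is a hypothetical stand-in for DEDAL's transformer encoder, and the gap penalties are constants, whereas DEDAL also predicts position-specific gap-open and gap-extend penalties and is trained end to end through a smoothed, differentiable version of the alignment recursion.

```python
# Minimal sketch of context-dependent local alignment (Gotoh-style
# affine gaps). `embed` is a hypothetical placeholder: a real model
# would return transformer embeddings that depend on each residue's
# sequence context.
import numpy as np

def embed(seq: str, dim: int = 8) -> np.ndarray:
    """Placeholder per-residue embeddings (random but deterministic)."""
    rng = np.random.default_rng(list(seq.encode()))
    return rng.standard_normal((len(seq), dim))

def smith_waterman_contextual(x: str, y: str,
                              gap_open: float = 3.0,
                              gap_extend: float = 1.0):
    """Local alignment where the score of aligning x[i] with y[j] is the
    dot product of their embeddings, so it varies with sequence context."""
    S = embed(x) @ embed(y).T          # position-specific substitution scores
    n, m = len(x), len(y)
    M = np.full((n + 1, m + 1), -np.inf)  # state: x[i] aligned to y[j]
    X = np.full((n + 1, m + 1), -np.inf)  # state: gap in y (x[i] unmatched)
    Y = np.full((n + 1, m + 1), -np.inf)  # state: gap in x (y[j] unmatched)
    M[0, :] = 0.0
    M[:, 0] = 0.0                      # local alignment may start anywhere
    best, best_end = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i, j] = max(0.0, S[i - 1, j - 1]
                          + max(M[i - 1, j - 1], X[i - 1, j - 1], Y[i - 1, j - 1]))
            X[i, j] = max(M[i - 1, j] - gap_open, X[i - 1, j] - gap_extend)
            Y[i, j] = max(M[i, j - 1] - gap_open, Y[i, j - 1] - gap_extend)
            if M[i, j] > best:
                best, best_end = M[i, j], (i, j)
    return best, best_end

score, end = smith_waterman_contextual("HEAGAWGHEE", "PAWHEAE")
print(f"local alignment score {score:.2f}, ending at {end}")
```

Replacing the hard maximum in this recursion with a soft maximum makes the whole computation differentiable, which is what allows the embedding model to be trained directly against reference alignments.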


Fig. 1: Example of pairwise alignment of two protein domain sequences.

References

  1. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021). This paper describes AlphaFold2, the state-of-the-art method for protein structure prediction, which relies on multiple sequence alignments.


  2. Smith, T. F. & Waterman, M. S. Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197 (1981). This paper introduces the classical Smith–Waterman algorithm for pairwise sequence alignment.


  3. Altschul, S. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997). This paper presents the most widely used heuristic variant of the Smith–Waterman algorithm.


  4. Devlin, J. et al. BERT: pre-training of deep bidirectional transformers for language understanding. Proc. NAACL-HLT 1, 4171–4186 (2019). This paper proposes BERT, a technique for unsupervised pretraining of language models.



Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This is a summary of: Llinares-López, F., Berthet, Q., Blondel, M., Teboul, O. & Vert, J.-P. Deep embedding and alignment of protein sequences. Nat. Methods https://doi.org/10.1038/s41592-022-01700-2 (2022).



Cite this article

Deep-learning language models help to improve protein sequence alignment. Nat Methods 20, 40–41 (2023). https://doi.org/10.1038/s41592-022-01707-9

