Uncovering new families and folds in the natural protein universe

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Abstract

We are now entering a new era in protein sequence and structure annotation, with hundreds of millions of predicted protein structures made available through the AlphaFold database¹. These models cover nearly all known proteins, including those that are challenging to annotate for function or putative biological role using standard homology-based approaches. In this study, we examine the extent to which the AlphaFold database has structurally illuminated this “dark matter” of the natural protein universe at high predicted accuracy. We further describe the protein diversity that these models cover as an annotated interactive sequence similarity network, accessible at https://uniprot3d.org/atlas/AFDB90v4. By searching for novelties from sequence, structure, and semantic perspectives, we uncovered the β-flower fold, added multiple protein families to the Pfam database², and experimentally demonstrated that one of these belongs to a new superfamily of translation-targeting toxin-antitoxin systems, TumE-TumA. This work underscores the value of large-scale efforts in identifying, annotating, and prioritising novel protein families. By leveraging the recent deep learning revolution in protein bioinformatics, we can now shed light on uncharted areas of the protein universe at an unprecedented scale, paving the way to innovations in life sciences and biotechnology.

Author information

Corresponding authors

Correspondence to Torsten Schwede or Joana Pereira.

Supplementary information

Supplementary Table 1

Top 10 GO term predictions for the members of the clusters in the high-resolution sequence similarity network of component 159 (TumE) in Figure 4a and their cognate antitoxin families, as predicted with DeepFRI. For each cluster, the number of models used for predictions (n) is displayed, as well as a boxplot depicting the distribution of DeepFRI scores for each prediction. Boxplots show the quartiles of the dataset. Outliers that are outside the inter-quartile range are displayed as single points.
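As an illustration of how the quartiles and outlier points of such a boxplot can be derived, the following is a minimal sketch in Python. It assumes the common Tukey convention, in which points lying more than 1.5 times the inter-quartile range beyond the first or third quartile are treated as outliers; the caption does not state which convention was used. The function name `boxplot_summary` and the example scores are hypothetical and are not taken from the study.

```python
import statistics


def boxplot_summary(scores, whisker=1.5):
    """Return the quartiles and outliers of a list of scores.

    Assumption: outliers follow the Tukey rule (beyond whisker * IQR
    from the first or third quartile); the source does not confirm this.
    """
    # "inclusive" treats the data as the full population, so the
    # minimum and maximum are the 0th and 100th percentiles.
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    outliers = [s for s in scores if s < low or s > high]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}


# Hypothetical prediction scores for one GO term in one cluster.
example_scores = [0.10, 0.52, 0.55, 0.58, 0.60, 0.61, 0.63, 0.65, 0.95]
summary = boxplot_summary(example_scores)
print(summary)
```

With these made-up scores, the two extreme values fall outside the whisker range and would be drawn as single points, while the remaining values define the box and whiskers.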

Supplementary Table 2

List of 290 connected components with a semantic diversity higher than 20%. 133 new Pfam families will be available in the next two releases of Pfam (36.0 and 37.0), and 17 were found to merge with previously defined Pfam families. For all of these, the corresponding Pfam and Pfam description are provided.

Supplementary Table 3

DNA fragments and DNA oligonucleotides used for plasmid construction during the experimental validation and characterisation of TumE and TumA.

Cite this article

Durairaj, J., Waterhouse, A.M., Mets, T. et al. Uncovering new families and folds in the natural protein universe. Nature (2023). https://doi.org/10.1038/s41586-023-06622-3
