Predicted protein structures expand the CATH database

Singh, Arunima

doi:10.1038/s41592-023-01857-4

Research Highlight
Published: 12 April 2023

Computational biology

Nature Methods volume 20, page 483 (2023)

Christine Orengo of University College London, one of the original developers of the CATH database, continues to maintain it. To handle the sudden expansion in protein structural data, Orengo and colleagues now describe CATH-Assign, a set of automated methods for assigning protein structural domains. “CATH-Assign includes profile HMM-based methods for identifying domains in UniProt proteins (CATH-Resolve-Hits); a deep learning method for assigning homology to known families in CATH (CATHe); and fast structure comparison methods (FoldSeek), developed by the Martin Steinegger group, for verifying these relationships through determination of structure similarity to known relatives. Methods for assessing structure quality are also applied,” says Orengo.

CATHe uses sequence embeddings generated by Prot-BERT-T5, a large language model developed by the Burkhard Rost group, and is highly sensitive: it enabled the assignment of even remote homologues with less than 20% sequence identity to CATH superfamilies. The researchers used this approach to analyze the AlphaFold2-generated models for the proteomes of 21 model organisms. About half of these models were of high enough quality for CATH classification, and 92% of those could be assigned to existing CATH superfamilies. The researchers “manually analyzed a subset of unclassified structure clusters containing at least one human protein that could not be assigned to CATH superfamilies, and identified 24 novel superfamilies. Novel architectures were found, one of which, the ‘heart’ domain, adopts alternative conformations in solution,” says Orengo.
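For readers unfamiliar with the measure, sequence identity is usually reported as the percentage of matching residues in a pairwise alignment. The sketch below is illustrative only and is not part of CATHe or CATH-Assign: the function names, the choice of denominator (aligned columns where neither sequence has a gap) and the toy sequences are all assumptions, and other definitions of identity exist.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumes the two strings are rows of an existing pairwise alignment,
    so they must have the same length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0  # columns with a residue in both sequences
    matches = 0   # columns where those residues are identical
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def below_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 20.0) -> bool:
    """True if the pair falls under the identity threshold (20% by default)."""
    return percent_identity(aligned_a, aligned_b) < threshold


# Hypothetical toy alignment: 1 identical residue out of 10 compared columns.
identity = percent_identity("AAAAAAAAAA", "ACCCCCCCCC")
print(f"{identity:.1f}% identity")
```

Falling below the threshold says only that two sequences are highly diverged; it does not show that they are homologous, which is why a method such as CATHe is needed to detect the relationship.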

Author information

Authors and Affiliations

Nature Methods https://www.nature.com/nmeth/
Arunima Singh

Corresponding author

Correspondence to Arunima Singh.

About this article

Cite this article

Singh, A. Predicted protein structures expand the CATH database. Nat Methods 20, 483 (2023). https://doi.org/10.1038/s41592-023-01857-4

Issue Date: April 2023
