

US-align: universal structure alignments of proteins, nucleic acids, and macromolecular complexes

Abstract

Structure comparison and alignment are of fundamental importance in structural biology studies. We developed the first universal platform, US-align, to uniformly align monomer and complex structures of different macromolecules—proteins, RNAs and DNAs. The pipeline is built on a uniform TM-score objective function coupled with a heuristic alignment searching algorithm. Large-scale benchmarks demonstrated consistent advantages of US-align over state-of-the-art methods in pairwise and multiple structure alignments of different molecules. Detailed analyses showed that the main advantage of US-align lies in the extensive optimization of the unified objective function powered by efficient heuristic search iterations, which substantially improve the accuracy and speed of the structural alignment process. Meanwhile, the universal protocol fusing different molecular and structural types helps facilitate the heterogeneous oligomer structure comparison and template-based protein–protein and protein–RNA/DNA docking.
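For readers unfamiliar with the TM-score objective mentioned above, the following is a minimal sketch of how the score is evaluated for one fixed superposition, using the standard protein definition (Zhang & Skolnick, 2004). It is not US-align's implementation. The function names `tm_d0` and `tm_score` and the 0.5 Å floor on the distance scale are assumptions made for illustration, and the actual program additionally searches over alignments and superpositions to maximize the score and uses a different distance scale for nucleic acids.

```python
def tm_d0(length):
    """Distance scale d0 (in angstroms) for a protein target of `length` residues."""
    # The cube-root formula is undefined or too small for short chains, so this
    # sketch clamps d0 to a 0.5 angstrom floor (an assumption, not from the paper).
    if length <= 15:
        return 0.5
    return max(1.24 * (length - 15) ** (1.0 / 3.0) - 1.8, 0.5)


def tm_score(distances, length):
    """TM-score of one fixed superposition.

    distances: distances (angstroms) between aligned residue pairs
    length:    number of residues in the structure used for normalization
    """
    d0 = tm_d0(length)
    # Each aligned pair contributes between 0 and 1; unaligned residues contribute 0.
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / length


if __name__ == "__main__":
    # Hypothetical input: 80 of 100 residues aligned, each pair 2 angstroms apart.
    print(round(tm_score([2.0] * 80, 100), 3))
```

Because every term is normalized by the target length rather than by the number of aligned pairs, a short but tight alignment cannot score higher than a long, equally tight one, which is what makes the score usable as a single objective across monomers and complexes of different sizes.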


Fig. 1: Four different structure alignment modes of US-align.
Fig. 2: Performance of three oligomeric alignment programs.
Fig. 3: Structure alignments between two protein–RNA complexes from two different bacteria.
Fig. 4: US-align outperforms four control RNA structure alignment programs.
Fig. 5: MSTA RNA alignment by US-align, Matt and MUSTANG.
Fig. 6: Application of US-align to RNA-protein docking.


Data availability

All data needed to reproduce this work are available at https://doi.org/10.6084/m9.figshare.16725745 under CC BY v.4.0. Source data are provided with this paper.

Code availability

An online webserver and the standalone program of US-align are available at https://zhanggroup.org/US-align. The latest source code of US-align is also available at https://github.com/pylelab/USalign, while the source code for US-align version 20220227 used by this manuscript is included in Supplementary Software. The code was tested on Linux, Windows and Mac OS, where no notable differences in speed across different operating systems were found (Supplementary Fig. 15).


Acknowledgements

We thank Y. Cao for technical assistance in developing qTMclust and X. Wei for insightful discussions. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI1548562. C.Z. is a Howard Hughes Medical Institute postdoctoral fellow. A.M.P. is a Howard Hughes Medical Institute Investigator. This work is supported in part by the National Human Genome Research Institute (HG011868 to A.M.P.), National Institute of General Medical Sciences (GM136422, OD026825 to Y.Z.), the National Institute of Allergy and Infectious Diseases (AI134678 to Y.Z.) and the National Science Foundation (IIS1901191, DBI2030790, MTM2025426 to Y.Z.).

Author information


Contributions

Y.Z. conceived the study. C.Z., A.M.P. and Y.Z. designed the experiments. C.Z. developed the method. C.Z. and M.S. drafted the manuscript. All authors revised the manuscript and approved the final version.

Corresponding author

Correspondence to Yang Zhang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks Ruth Nussinov and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Arunima Singh in collaboration with the Nature Methods team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Texts 1–6, Figs. 1–15 and Tables 1–8.

Reporting Summary

Supplementary Software

US-align version 20220227.

Supplementary Data

Structure coordinate files for Supplementary Fig. 1.

Supplementary Data

Structure coordinate files for Supplementary Fig. 2.

Supplementary Data

Statistical source data for Supplementary Fig. 3.

Supplementary Data

Statistical source data for Supplementary Fig. 4.

Supplementary Data

Statistical source data for Supplementary Fig. 6.

Supplementary Data

Statistical source data for Supplementary Fig. 7.

Supplementary Data

Statistical source data for Supplementary Fig. 9.

Supplementary Data

Statistical source data for Supplementary Fig. 15.

Source data

Source Data Fig. 2

Statistical source data and structure coordinate files.

Source Data Fig. 3

Structure coordinate files.

Source Data Fig. 4

Statistical source data and structure coordinate files.

Source Data Fig. 5

Statistical source data.

Source Data Fig. 6

Statistical source data and structure coordinate files.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhang, C., Shine, M., Pyle, A.M. et al. US-align: universal structure alignments of proteins, nucleic acids, and macromolecular complexes. Nat Methods 19, 1109–1115 (2022). https://doi.org/10.1038/s41592-022-01585-1


  • DOI: https://doi.org/10.1038/s41592-022-01585-1
