Abstract
Structure comparison and alignment are of fundamental importance in structural biology. We developed the first universal platform, US-align, to uniformly align monomer and complex structures of different macromolecules: proteins, RNAs and DNAs. The pipeline is built on a uniform TM-score objective function coupled with a heuristic alignment search algorithm. Large-scale benchmarks demonstrated consistent advantages of US-align over state-of-the-art methods in pairwise and multiple structure alignments of different molecules. Detailed analyses showed that the main advantage of US-align lies in the extensive optimization of the unified objective function, powered by efficient heuristic search iterations that substantially improve the accuracy and speed of structural alignment. In addition, the universal protocol, which fuses different molecular and structural types, facilitates the comparison of heterogeneous oligomer structures and template-based protein–protein and protein–RNA/DNA docking.
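To make the TM-score objective concrete, the following minimal sketch evaluates the widely used protein form of the score for residue pairs that have already been aligned and superimposed. It is an illustration under stated assumptions, not US-align's implementation: the function names `d0_protein` and `tm_score` are hypothetical, the distances are taken as given rather than found by a superposition search, and only the protein distance scale is shown (nucleic acids use a different scale in the published methods).

```python
import math
from typing import Sequence


def d0_protein(l_target: int) -> float:
    """Length-dependent distance scale d0 (in Angstroms) for proteins.

    Uses the common form d0 = 1.24 * (L - 15)^(1/3) - 1.8.
    Assumption: short chains (L <= 21) are clamped to 0.5, which also
    avoids taking a fractional power of a negative number.
    """
    if l_target <= 21:
        return 0.5
    return 1.24 * (l_target - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances: Sequence[float], l_target: int) -> float:
    """TM-score for one fixed superposition.

    distances: distance (Angstroms) between each aligned residue pair
               after superposition; unaligned residues are simply absent.
    l_target:  length of the structure used for normalization.
    """
    if l_target <= 0:
        raise ValueError("l_target must be positive")
    if len(distances) > l_target:
        raise ValueError("more aligned pairs than residues in the target")
    d0 = d0_protein(l_target)
    # Each aligned pair contributes a value in (0, 1]; close pairs count most.
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / l_target


if __name__ == "__main__":
    # Hypothetical example: 80 of 100 residues aligned, each 2 Angstroms apart.
    print(round(tm_score([2.0] * 80, 100), 3))
```

Because the sum is divided by the full target length, unaligned residues lower the score, and the length-dependent scale d0 is what makes scores comparable between structures of different sizes. An alignment program searches over residue pairings and superpositions to maximize this quantity; the sketch only scores a single given superposition.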
Data availability
All data needed to reproduce this work are available at https://doi.org/10.6084/m9.figshare.16725745 under CC BY v.4.0. Source data are provided with this paper.
Code availability
An online webserver and the standalone program of US-align are available at https://zhanggroup.org/US-align. The latest source code of US-align is also available at https://github.com/pylelab/USalign, while the source code for US-align version 20220227 used by this manuscript is included in Supplementary Software. The code was tested on Linux, Windows and Mac OS, where no notable differences in speed across different operating systems were found (Supplementary Fig. 15).
Acknowledgements
We thank Y. Cao for technical assistance in developing qTMclust and X. Wei for insightful discussions. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI1548562. C.Z. is a Howard Hughes Medical Institute postdoctoral fellow. A.M.P. is a Howard Hughes Medical Institute Investigator. This work is supported in part by the National Human Genome Research Institute (HG011868 to A.M.P.), National Institute of General Medical Sciences (GM136422, OD026825 to Y.Z.), the National Institute of Allergy and Infectious Diseases (AI134678 to Y.Z.) and the National Science Foundation (IIS1901191, DBI2030790, MTM2025426 to Y.Z.).
Author information
Contributions
Y.Z. conceived the study. C.Z., A.M.P. and Y.Z. designed the experiments. C.Z. developed the method. C.Z. and M.S. drafted the manuscript. All authors revised the manuscript and approved the final version.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Methods thanks Ruth Nussinov and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Arunima Singh in collaboration with the Nature Methods team.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Texts 1–6, Figs. 1–15 and Tables 1–8.
Supplementary Software
US-align version 20220227.
Supplementary Data
Structure coordinate files for Supplementary Fig. 1.
Supplementary Data
Structure coordinate files for Supplementary Fig. 2.
Supplementary Data
Statistical source data for Supplementary Fig. 3.
Supplementary Data
Statistical source data for Supplementary Fig. 4.
Supplementary Data
Statistical source data for Supplementary Fig. 6.
Supplementary Data
Statistical source data for Supplementary Fig. 7.
Supplementary Data
Statistical source data for Supplementary Fig. 9.
Supplementary Data
Statistical source data for Supplementary Fig. 15.
Source data
Source Data Fig. 2
Statistical source data and structure coordinate files.
Source Data Fig. 3
Structure coordinate files.
Source Data Fig. 4
Statistical source data and structure coordinate files.
Source Data Fig. 5
Statistical source data.
Source Data Fig. 6
Statistical source data and structure coordinate files.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, C., Shine, M., Pyle, A.M. et al. US-align: universal structure alignments of proteins, nucleic acids, and macromolecular complexes. Nat Methods 19, 1109–1115 (2022). https://doi.org/10.1038/s41592-022-01585-1
DOI: https://doi.org/10.1038/s41592-022-01585-1