Abstract
Metal–organic frameworks (MOFs) are a class of crystalline porous materials that exhibit a vast chemical space owing to their tunable molecular building blocks with diverse topologies. An unlimited number of MOFs can, in principle, be synthesized. Machine learning approaches can help to explore this vast chemical space by learning structure–property relationships and using them to identify optimal candidates with desired properties. Here we introduce MOFTransformer, a multi-modal Transformer encoder pre-trained with 1 million hypothetical MOFs. This multi-modal model integrates atom-based graph embeddings and energy-grid embeddings to capture the local and global features of MOFs, respectively. By fine-tuning the pre-trained model with small datasets ranging from 5,000 to 20,000 MOFs, our model achieves state-of-the-art results in predicting various properties, including gas adsorption, diffusion, electronic properties and even text-mined data. Beyond its universal transfer learning capabilities, MOFTransformer generates chemical insights by analyzing feature importance through attention scores within the self-attention layers. As such, this model can serve as a platform for other MOF researchers who seek to develop new machine learning models for their work.
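To make the multi-modal input and the attention-based feature importance more concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the MOFTransformer implementation: the grid size (30 × 30 × 30), patch size (5 × 5 × 5), embedding width, function names and random weights are all hypothetical, and only a single layer with one attention head is shown. Atom-level vectors (standing in for the graph embeddings) and flattened energy-grid patches are projected to a common width, tagged with a modality-type embedding, prepended with a [CLS] token and passed through self-attention. Reading the [CLS] row of the attention matrix is one common way to obtain per-token importance scores; the exact procedure used in the paper may differ.

```python
import numpy as np


def patchify_grid(grid: np.ndarray, patch: int) -> np.ndarray:
    """Split a cubic 3D energy grid into non-overlapping patch^3 cubes, each flattened."""
    n = grid.shape[0]
    assert grid.shape == (n, n, n) and n % patch == 0
    k = n // patch  # number of patches along each axis
    # (k, p, k, p, k, p) -> (k, k, k, p, p, p) -> (k^3, p^3)
    cubes = grid.reshape(k, patch, k, patch, k, patch).transpose(0, 2, 4, 1, 3, 5)
    return cubes.reshape(k ** 3, patch ** 3)


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multimodal_attention(atom_emb, grid, patch=5, d=64, seed=0):
    """One hypothetical single-head self-attention layer over [CLS] + atom + grid tokens."""
    rng = np.random.default_rng(seed)
    n_atoms, d_atom = atom_emb.shape
    patches = patchify_grid(grid, patch)
    # Random projections stand in for learned weights (hypothetical).
    w_atom = rng.normal(0, d_atom ** -0.5, (d_atom, d))
    w_grid = rng.normal(0, patches.shape[1] ** -0.5, (patches.shape[1], d))
    atom_tokens = atom_emb @ w_atom
    grid_tokens = patches @ w_grid
    # Modality-type embeddings let the encoder distinguish the two token streams.
    type_emb = rng.normal(0, 0.02, (2, d))
    atom_tokens = atom_tokens + type_emb[0]
    grid_tokens = grid_tokens + type_emb[1]
    cls = rng.normal(0, 0.02, (1, d))
    tokens = np.concatenate([cls, atom_tokens, grid_tokens], axis=0)
    w_q, w_k, w_v = (rng.normal(0, d ** -0.5, (d, d)) for _ in range(3))
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    attn = softmax(q @ k.T / np.sqrt(d), axis=-1)  # each row sums to 1
    out = attn @ v
    # Row 0: how strongly the [CLS] summary token attends to every other token.
    cls_attn = attn[0, 1:]
    return out, cls_attn[:n_atoms], cls_attn[n_atoms:]


if __name__ == "__main__":
    atoms = np.random.default_rng(1).normal(size=(40, 16))      # 40 hypothetical atoms
    grid = np.random.default_rng(2).normal(size=(30, 30, 30))   # hypothetical energy grid
    out, atom_imp, patch_imp = multimodal_attention(atoms, grid)
    print(out.shape, atom_imp.shape, patch_imp.shape)  # (257, 64) (40,) (216,)
    print("most attended atom:", int(atom_imp.argmax()))
    print("most attended grid patch:", int(patch_imp.argmax()))
```

Keeping both modalities in one token sequence is what allows a single attention matrix to rank atoms (local features) and grid patches (global features) on the same scale; in a trained model these scores, rather than the random ones here, would carry the chemical meaning.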
Data availability
Data used in this work are available via Figshare (https://doi.org/10.6084/m9.figshare.21155506; ref. 53). This repository provides the pre-trained model, the atom-based graph embeddings and energy-grid embeddings used as MOFTransformer inputs for the CoRE MOF and QMOF databases, and the fine-tuning data. In addition, the UFF-optimized CIF files of the hMOFs used in this work are available via Figshare (https://doi.org/10.6084/m9.figshare.21810147; ref. 54).
Code availability
The MOFTransformer library is available at https://github.com/hspark1212/MOFTransformer (ref. 55). Documentation for the library is available at https://hspark1212.github.io/MOFTransformer and provides up-to-date guidance on pre-training, fine-tuning and feature importance analysis with MOFTransformer. For reproducibility, all results in this paper were obtained with version 1.0.1 of the MOFTransformer library, which is available at https://pypi.org/project/moftransformer/1.0.1.
References
Deng, H. et al. Large-pore apertures in a series of metal-organic frameworks. Science 336, 1018–1023 (2012).
Ding, M., Cai, X. & Jiang, H.-L. Improving MOF stability: approaches and applications. Chem. Sci. 10, 10209–10230 (2019).
Wang, C., Liu, D. & Lin, W. Metal–organic frameworks as a tunable platform for designing functional molecular materials. J. Am. Chem. Soc. 135, 13222–13234 (2013).
Freund, R. et al. The current status of MOF and COF applications. Angew. Chem. Int. Ed. 60, 23975–24001 (2021).
Kumar, S. et al. Green synthesis of metal–organic frameworks: a state-of-the-art review of potential environmental and medical applications. Coord. Chem. Rev. 420, 213407 (2020).
Qian, Q. et al. MOF-based membranes for gas separations. Chem. Rev. 120, 8161–8266 (2020).
Lee, J. et al. Metal–organic framework materials as catalysts. Chem. Soc. Rev. 38, 1450–1459 (2009).
Colón, Y. J. & Snurr, R. Q. High-throughput computational screening of metal–organic frameworks. Chem. Soc. Rev. 43, 5735–5749 (2014).
Boyd, P. G. et al. Data-driven design of metal–organic frameworks for wet flue gas CO2 capture. Nature 576, 253–256 (2019).
Daglar, H. & Keskin, S. Recent advances, opportunities, and challenges in high-throughput computational screening of MOFs for gas separations. Coord. Chem. Rev. 422, 213470 (2020).
Lee, S. et al. Computational screening of trillions of metal–organic frameworks for high-performance methane storage. ACS Appl. Mater. Interfaces 13, 23647–23654 (2021).
Altintas, C., Altundal, O. F., Keskin, S. & Yildirim, R. Machine learning meets with metal organic frameworks for gas storage and separation. J. Chem. Inf. Model. 61, 2131–2146 (2021).
Chong, S., Lee, S., Kim, B. & Kim, J. Applications of machine learning in metal-organic frameworks. Coord. Chem. Rev. 423, 213487 (2020).
Ahmed, A. & Siegel, D. J. Predicting hydrogen storage in MOFs via machine learning. Patterns 2, 100291 (2021).
Simon, C. M. et al. The materials genome in action: identifying the performance limits for methane storage. Energy Environ. Sci. 8, 1190–1199 (2015).
Lim, Y. & Kim, J. Application of transfer learning to predict diffusion properties in metal–organic frameworks. Mol. Syst. Des. Eng. 7, 1056–1064 (2022).
Bucior, B. J. et al. Energy-based descriptors to rapidly predict hydrogen storage in metal–organic frameworks. Mol. Syst. Des. Eng. 4, 162–174 (2019).
Orhan, I. B., Daglar, H., Keskin, S., Le, T. C. & Babarao, R. Prediction of O2/N2 selectivity in metal–organic frameworks via high-throughput computational screening and machine learning. ACS Appl. Mater. Interfaces 14, 736–749 (2021).
Rosen, A. S. et al. Machine learning the quantum-chemical properties of metal–organic frameworks for accelerated materials discovery. Matter 4, 1578–1597 (2021).
Ma, R., Colón, Y. J. & Luo, T. Transfer learning study of gas adsorption in metal–organic frameworks. ACS Appl. Mater. Interfaces 12, 34041–34048 (2020).
Xie, T. & Grossman, J. C. Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties. Phys. Rev. Lett. 120, 145301 (2018).
Moosavi, S. M. et al. Understanding the diversity of the metal-organic framework ecosystem. Nat. Commun. 11, 4068 (2020).
Nandy, A. et al. MOFSimplify, machine learning models with extracted stability data of three thousand metal–organic frameworks. Sci. Data 9, 1–11 (2022).
Yao, Z. et al. Inverse design of nanoporous crystalline reticular materials with deep generative models. Nat. Mach. Intell. 3, 76–86 (2021).
Lim, Y., Park, J., Lee, S. & Kim, J. Finely tuned inverse design of metal–organic frameworks with user-desired Xe/Kr selectivity. J. Mater. Chem. A 9, 21175–21183 (2021).
Willems, T. F., Rycroft, C. H., Kazi, M., Meza, J. C. & Haranczyk, M. Algorithms and tools for high-throughput geometry-based analysis of crystalline porous materials. Microporous Mesoporous Mater. 149, 134–141 (2012).
Vaswani, A. et al. Attention is all you need. In Advances in Neural Information Processing Systems 30 (2017).
Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. Preprint at https://doi.org/10.48550/arXiv.1810.04805 (2018).
Dosovitskiy, A. et al. An image is worth 16x16 words: Transformers for image recognition at scale. Preprint at https://doi.org/10.48550/arXiv.2010.11929 (2020).
Hu, R. & Singh, A. UniT: multimodal multitask learning with a unified transformer. Preprint at https://arxiv.org/abs/2102.10772 (2021).
Zhou, L. et al. Unified vision-language pre-training for image captioning and VQA. Preprint at https://arxiv.org/abs/1909.11059 (2019).
Li, L. H., Yatskar, M., Yin, D., Hsieh, C.-J. & Chang, K.-W. VisualBERT: a simple and performant baseline for vision and language. Preprint at https://doi.org/10.48550/arXiv.1908.03557 (2019).
Kim, W., Son, B. & Kim, I. ViLT: vision-and-language transformer without convolution or region supervision. Preprint at https://arxiv.org/abs/2102.03334 (2021).
Cao, Z., Magar, R., Wang, Y. & Farimani, A. B. MOFormer: self-supervised transformer model for metal-organic framework property prediction. Preprint at https://doi.org/10.48550/arXiv.2210.14188 (2022).
Chen, P., Jiao, R., Liu, J., Liu, Y. & Lu, Y. Interpretable graph transformer network for predicting adsorption isotherms of metal–organic frameworks. J. Chem. Inf. Model. 62, 5446–5456 (2022).
Rappé, A. K., Casewit, C. J., Colwell, K., Goddard, W. A. III & Skiff, W. M. UFF, a full periodic table force field for molecular mechanics and molecular dynamics simulations. J. Am. Chem. Soc. 114, 10024–10035 (1992).
Martin, M. G. & Siepmann, J. I. Transferable potentials for phase equilibria. 1. United-atom description of n-alkanes. J. Phys. Chem. B 102, 2569–2577 (1998).
Rosen, A. QMOF Database. figshare https://doi.org/10.6084/m9.figshare.13147324.v13 (2020).
Chen, C., Ye, W., Zuo, Y., Zheng, C. & Ong, S. P. Graph networks as a universal machine learning framework for molecules and crystals. Chem. Mater. 31, 3564–3572 (2019).
Nandy, A., Duan, C. & Kulik, H. J. Using machine learning and data mining to leverage community knowledge for the engineering of stable metal–organic frameworks. J. Am. Chem. Soc. 143, 17535–17547 (2021).
Janet, J. P. & Kulik, H. J. Resolving transition metal chemical space: feature selection for machine learning and structure–property relationships. J. Phys. Chem. A 121, 8939–8954 (2017).
Koizumi, K., Nobusada, K. & Boero, M. Hydrogen storage mechanism and diffusion in metal–organic frameworks. Phys. Chem. Chem. Phys. 21, 7756–7764 (2019).
Colón, Y. J., Gomez-Gualdron, D. A. & Snurr, R. Q. Topologically guided, automated construction of metal–organic frameworks and their evaluation for energy-related applications. Cryst. Growth Des. 17, 5801–5810 (2017).
Chung, Y. G. et al. Advances, updates, and analytics for the computation-ready, experimental metal–organic framework database: CoRE MOF 2019. J. Chem. Eng. Data 64, 5985–5998 (2019).
O’Keeffe, M., Peskov, M. A., Ramsden, S. J. & Yaghi, O. M. The reticular chemistry structure resource (RCSR) database of, and symbols for, crystal nets. Acc. Chem. Res. 41, 1782–1789 (2008).
Plimpton, S. Fast parallel algorithms for short-range molecular dynamics. J. Comput. Phys. 117, 1–19 (1995).
Dubbeldam, D., Calero, S., Ellis, D. E. & Snurr, R. Q. RASPA: molecular simulation software for adsorption and diffusion in flexible nanoporous materials. Mol. Simul. 42, 81–101 (2016).
Feynman, R. P., Hibbs, A. R. & Styer, D. F. Quantum Mechanics and Path Integrals (Courier, 2010).
Fischer, M., Hoffmann, F. & Fröba, M. Preferred hydrogen adsorption sites in various MOFs—a comparative computational study. ChemPhysChem 10, 2647–2657 (2009).
Daglar, H., Erucar, I. & Keskin, S. Exploring the performance limits of MOF/polymer MMMs for O2/N2 separation using computational screening. J. Membr. Sci. 618, 118555 (2021).
Ewald, P. P. Die Berechnung optischer und elektrostatischer Gitterpotentiale. Ann. Phys. 369, 253–287 (1921).
Loshchilov, I. & Hutter, F. Decoupled weight decay regularization. Preprint at https://doi.org/10.48550/arXiv.1711.05101 (2017).
Kang, Y. et al. MOFTransformer. figshare https://doi.org/10.6084/m9.figshare.21155506.v2 (2022).
Kang, Y. et al. 1 million hypothetical MOFs. figshare https://doi.org/10.6084/m9.figshare.21810147.v2 (2022).
Kang, Y. et al. MOFTransformer. Zenodo https://doi.org/10.5281/zenodo.7593333 (2022).
Acknowledgements
H.P., Y.K. and J.K. acknowledge funding from the National Research Foundation of Korea under project numbers 2021M3A7C208974513 and 2021R1A2C2003583. This work was supported by the National Supercomputing Center with supercomputing resources including technical support (KSC-2021-CRE-0460). B.S. is supported by the PrISMa Project, which is funded through the ACT programme (Accelerating CCS Technologies, Horizon2020 project number 294766). Financial contributions made by the Department for Business, Energy & Industrial Strategy (BEIS), together with extra funding from the Natural Environment Research Council (NERC) and the Engineering and Physical Sciences Research Council (EPSRC), UK; the Research Council of Norway (RCN), Norway; the Swiss Federal Office of Energy (SFOE), Switzerland; and the United States Department of Energy (US-DOE), USA, are gratefully acknowledged. Additional financial support from TOTAL and Equinor is also gratefully acknowledged.
Author information
Contributions
Y.K. and H.P. contributed equally to this work. Y.K. and H.P. developed MOFTransformer and wrote the paper with J.K. The paper was written through the contributions of all authors. All authors have given approval for the final version of the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Machine Intelligence thanks the anonymous reviewers for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Notes 1–5, Figs. 1–12 and Table 1.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Kang, Y., Park, H., Smit, B. et al. A multi-modal pre-training transformer for universal transfer learning in metal–organic frameworks. Nat Mach Intell 5, 309–318 (2023). https://doi.org/10.1038/s42256-023-00628-2
DOI: https://doi.org/10.1038/s42256-023-00628-2