A multi-modal pre-training transformer for universal transfer learning in metal–organic frameworks

Abstract

Metal–organic frameworks (MOFs) are a class of crystalline porous materials that exhibit a vast chemical space owing to their tunable molecular building blocks with diverse topologies. An unlimited number of MOFs can, in principle, be synthesized. Machine learning approaches can help to explore this vast chemical space by identifying optimal candidates with desired properties from structure–property relationships. Here we introduce MOFTransformer, a multi-modal Transformer encoder pre-trained with 1 million hypothetical MOFs. This multi-modal model integrates atom-based graph embeddings and energy-grid embeddings to capture the local and global features of MOFs, respectively. By fine-tuning the pre-trained model with small datasets ranging from 5,000 to 20,000 MOFs, our model achieves state-of-the-art results in predicting a range of properties, including gas adsorption, diffusion, electronic properties and even text-mined data. Beyond its universal transfer learning capabilities, MOFTransformer generates chemical insights by analyzing feature importance through attention scores within the self-attention layers. As such, this model can serve as a platform for other MOF researchers who seek to develop new machine learning models for their work.
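To make the idea of reading feature importance from attention scores concrete, the following is a minimal sketch of scaled dot-product attention in plain Python. It is an illustration only and not the MOFTransformer implementation: the token labels, the two-dimensional vectors and the function names are hypothetical, and a real model would use learned projections and multiple heads and layers.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_scores(query, keys):
    """Scaled dot-product attention weights of one query over all keys."""
    dim = len(query)
    logits = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
        for key in keys
    ]
    return softmax(logits)


# Hypothetical toy tokens: two atom-based (local) tokens and two
# energy-grid (global) tokens. The vectors are made up for illustration.
labels = ["atom:Zn", "atom:O", "grid:patch0", "grid:patch1"]
keys = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.2, 0.1]]
cls_query = [2.0, 0.0]  # stands in for a class-token query

scores = attention_scores(cls_query, keys)

# Rank tokens by the attention they receive from the query.
ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
for label, score in ranked:
    print(f"{label}: {score:.3f}")
```

In this toy setting, the token whose key aligns best with the query receives the largest weight, which is the sense in which attention scores can be read as a ranking of input features.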

Fig. 1: Overall schematics and architecture of MOFTransformer.
Fig. 2: Relationship between global and local features and MOF properties.
Fig. 3: Results of pre-training and fine-tuning.
Fig. 4: Schematics of attention scores in IRMOF-1.
Fig. 5: Feature importance analysis with attention scores.

Data availability

Data used in this work are available via Figshare (https://doi.org/10.6084/m9.figshare.21155506; ref. 53). The repository provides the pre-trained model, together with the atom-based graph embeddings and energy-grid embeddings used as inputs to MOFTransformer for the CoRE MOF and QMOF databases, as well as the fine-tuning data. In addition, the UFF-optimized CIF files of the hMOFs used in this work are available via Figshare (https://doi.org/10.6084/m9.figshare.21810147; ref. 54).

Code availability

The MOFTransformer library is available at https://github.com/hspark1212/MOFTransformer (ref. 55). Documentation for the library is available at https://hspark1212.github.io/MOFTransformer and provides up-to-date guidance on pre-training, fine-tuning and feature importance analysis with MOFTransformer. For the sake of reproducibility, all results in this paper were obtained with version 1.0.1 of the MOFTransformer library, which is available at https://pypi.org/project/moftransformer/1.0.1.
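Readers who wish to confirm that their environment matches the version used for the reported results could run a check along the following lines. This is a minimal sketch that relies only on the Python standard library; the helper name check_pinned is hypothetical and is not part of the MOFTransformer library, and the distribution name moftransformer is taken from the PyPI URL above.

```python
from importlib.metadata import PackageNotFoundError, version

PINNED_VERSION = "1.0.1"  # version stated in the Code availability section


def check_pinned(package="moftransformer", pinned=PINNED_VERSION):
    """Return a short message comparing the installed version with the pinned one."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return f"{package} is not installed; try: pip install {package}=={pinned}"
    if installed != pinned:
        return (
            f"{package} {installed} is installed, but the paper used {pinned}; "
            f"try: pip install {package}=={pinned}"
        )
    return f"{package} {installed} matches the version used in the paper"


print(check_pinned())
```

Pinning the exact version in this way guards against behavioural changes in later releases when reproducing the reported numbers.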

References

  1. Deng, H. et al. Large-pore apertures in a series of metal-organic frameworks. Science 336, 1018–1023 (2012).

  2. Ding, M., Cai, X. & Jiang, H.-L. Improving MOF stability: approaches and applications. Chem. Sci. 10, 10209–10230 (2019).

  3. Wang, C., Liu, D. & Lin, W. Metal–organic frameworks as a tunable platform for designing functional molecular materials. J. Am. Chem. Soc. 135, 13222–13234 (2013).

  4. Freund, R. et al. The current status of MOF and COF applications. Angew. Chem. Int. Ed. 60, 23975–24001 (2021).

  5. Kumar, S. et al. Green synthesis of metal–organic frameworks: a state-of-the-art review of potential environmental and medical applications. Coord. Chem. Rev. 420, 213407 (2020).

  6. Qian, Q. et al. MOF-based membranes for gas separations. Chem. Rev. 120, 8161–8266 (2020).

  7. Lee, J. et al. Metal–organic framework materials as catalysts. Chem. Soc. Rev. 38, 1450–1459 (2009).

  8. Colón, Y. J. & Snurr, R. Q. High-throughput computational screening of metal–organic frameworks. Chem. Soc. Rev. 43, 5735–5749 (2014).

  9. Boyd, P. G. et al. Data-driven design of metal–organic frameworks for wet flue gas CO2 capture. Nature 576, 253–256 (2019).

  10. Daglar, H. & Keskin, S. Recent advances, opportunities, and challenges in high-throughput computational screening of MOFs for gas separations. Coord. Chem. Rev. 422, 213470 (2020).

  11. Lee, S. et al. Computational screening of trillions of metal–organic frameworks for high-performance methane storage. ACS Appl. Mater. Interfaces 13, 23647–23654 (2021).

  12. Altintas, C., Altundal, O. F., Keskin, S. & Yildirim, R. Machine learning meets with metal organic frameworks for gas storage and separation. J. Chem. Inf. Model. 61, 2131–2146 (2021).

  13. Chong, S., Lee, S., Kim, B. & Kim, J. Applications of machine learning in metal-organic frameworks. Coord. Chem. Rev. 423, 213487 (2020).

  14. Ahmed, A. & Siegel, D. J. Predicting hydrogen storage in MOFs via machine learning. Patterns 2, 100291 (2021).

  15. Simon, C. M. et al. The materials genome in action: identifying the performance limits for methane storage. Energy Environ. Sci. 8, 1190–1199 (2015).

  16. Lim, Y. & Kim, J. Application of transfer learning to predict diffusion properties in metal–organic frameworks. Mol. Syst. Des. Eng. 7, 1056–1064 (2022).

  17. Bucior, B. J. et al. Energy-based descriptors to rapidly predict hydrogen storage in metal–organic frameworks. Mol. Syst. Des. Eng. 4, 162–174 (2019).

  18. Orhan, I. B., Daglar, H., Keskin, S., Le, T. C. & Babarao, R. Prediction of O2/N2 selectivity in metal–organic frameworks via high-throughput computational screening and machine learning. ACS Appl. Mater. Interfaces 14, 736–749 (2021).

  19. Rosen, A. S. et al. Machine learning the quantum-chemical properties of metal–organic frameworks for accelerated materials discovery. Matter 4, 1578–1597 (2021).

  20. Ma, R., Colon, Y. J. & Luo, T. Transfer learning study of gas adsorption in metal–organic frameworks. ACS Appl. Mater. Interfaces 12, 34041–34048 (2020).

  21. Xie, T. & Grossman, J. C. Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties. Phys. Rev. Lett. 120, 145301 (2018).

  22. Moosavi, S. M. et al. Understanding the diversity of the metal-organic framework ecosystem. Nat. Commun. 11, 4068 (2020).

  23. Nandy, A. et al. MOFSimplify, machine learning models with extracted stability data of three thousand metal–organic frameworks. Sci. Data 9, 1–11 (2022).

  24. Yao, Z. et al. Inverse design of nanoporous crystalline reticular materials with deep generative models. Nat. Mach. Intell. 3, 76–86 (2021).

  25. Lim, Y., Park, J., Lee, S. & Kim, J. Finely tuned inverse design of metal–organic frameworks with user-desired Xe/Kr selectivity. J. Mater. Chem. A 9, 21175–21183 (2021).

  26. Willems, T. F., Rycroft, C. H., Kazi, M., Meza, J. C. & Haranczyk, M. Algorithms and tools for high-throughput geometry-based analysis of crystalline porous materials. Microporous Mesoporous Mater. 149, 134–141 (2012).

  27. Vaswani, A. et al. Attention is all you need. In Advances in Neural Information Processing Systems 30 (2017).

  28. Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. Preprint at https://doi.org/10.48550/arXiv.1810.04805 (2018).

  29. Dosovitskiy, A. et al. An image is worth 16x16 words: transformers for image recognition at scale. Preprint at https://doi.org/10.48550/arXiv.2010.11929 (2020).

  30. Hu, R. & Singh, A. UniT: multimodal multitask learning with a unified transformer. Preprint at https://arxiv.org/abs/2102.10772 (2021).

  31. Zhou, L. et al. Unified vision-language pre-training for image captioning and VQA. Preprint at https://arxiv.org/abs/1909.11059 (2019).

  32. Li, L. H., Yatskar, M., Yin, D., Hsieh, C.-J. & Chang, K.-W. VisualBERT: a simple and performant baseline for vision and language. Preprint at https://doi.org/10.48550/arXiv.1908.03557 (2019).

  33. Kim, W., Son, B. & Kim, I. ViLT: vision-and-language transformer without convolution or region supervision. Preprint at https://arxiv.org/abs/2102.03334 (2021).

  34. Cao, Z., Magar, R., Wang, Y. & Farimani, A. B. MOFormer: self-supervised transformer model for metal-organic framework property prediction. Preprint at https://doi.org/10.48550/arXiv.2210.14188 (2022).

  35. Chen, P., Jiao, R., Liu, J., Liu, Y. & Lu, Y. Interpretable graph transformer network for predicting adsorption isotherms of metal–organic frameworks. J. Chem. Inf. Model. 62, 5446–5456 (2022).

  36. Rappé, A. K., Casewit, C. J., Colwell, K., Goddard, W. A. III & Skiff, W. M. UFF, a full periodic table force field for molecular mechanics and molecular dynamics simulations. J. Am. Chem. Soc. 114, 10024–10035 (1992).

  37. Martin, M. G. & Siepmann, J. I. Transferable potentials for phase equilibria. 1. United-atom description of n-alkanes. J. Phys. Chem. B 102, 2569–2577 (1998).

  38. Rosen, A. QMOF Database. figshare https://doi.org/10.6084/m9.figshare.13147324.v13 (2020).

  39. Chen, C., Ye, W., Zuo, Y., Zheng, C. & Ong, S. P. Graph networks as a universal machine learning framework for molecules and crystals. Chem. Mater. 31, 3564–3572 (2019).

  40. Nandy, A., Duan, C. & Kulik, H. J. Using machine learning and data mining to leverage community knowledge for the engineering of stable metal–organic frameworks. J. Am. Chem. Soc. 143, 17535–17547 (2021).

  41. Janet, J. P. & Kulik, H. J. Resolving transition metal chemical space: feature selection for machine learning and structure–property relationships. J. Phys. Chem. A 121, 8939–8954 (2017).

  42. Koizumi, K., Nobusada, K. & Boero, M. Hydrogen storage mechanism and diffusion in metal–organic frameworks. Phys. Chem. Chem. Phys. 21, 7756–7764 (2019).

  43. Colón, Y. J., Gomez-Gualdron, D. A. & Snurr, R. Q. Topologically guided, automated construction of metal–organic frameworks and their evaluation for energy-related applications. Cryst. Growth Des. 17, 5801–5810 (2017).

  44. Chung, Y. G. et al. Advances, updates, and analytics for the computation-ready, experimental metal–organic framework database: CoRE MOF 2019. J. Chem. Eng. Data 64, 5985–5998 (2019).

  45. O’Keeffe, M., Peskov, M. A., Ramsden, S. J. & Yaghi, O. M. The reticular chemistry structure resource (RCSR) database of, and symbols for, crystal nets. Acc. Chem. Res. 41, 1782–1789 (2008).

  46. Plimpton, S. Fast parallel algorithms for short-range molecular dynamics. J. Comput. Phys. 117, 1–19 (1995).

  47. Dubbeldam, D., Calero, S., Ellis, D. E. & Snurr, R. Q. RASPA: molecular simulation software for adsorption and diffusion in flexible nanoporous materials. Mol. Simul. 42, 81–101 (2016).

  48. Feynman, R. P., Hibbs, A. R. & Styer, D. F. Quantum Mechanics and Path Integrals. (Courier, 2010).

  49. Fischer, M., Hoffmann, F. & Fröba, M. Preferred hydrogen adsorption sites in various MOFs—a comparative computational study. ChemPhysChem 10, 2647–2657 (2009).

  50. Daglar, H., Erucar, I. & Keskin, S. Exploring the performance limits of MOF/polymer MMMs for O2/N2 separation using computational screening. J. Membr. Sci. 618, 118555 (2021).

  51. Ewald, P. P. Die Berechnung optischer und elektrostatischer Gitterpotentiale. Ann. Phys. 369, 253–287 (1921).

  52. Loshchilov, I. & Hutter, F. Decoupled weight decay regularization. Preprint at https://doi.org/10.48550/arXiv.1711.05101 (2017).

  53. Kang, Y. et al. MOFTransformer. figshare https://doi.org/10.6084/m9.figshare.21155506.v2 (2022).

  54. Kang, Y. et al. 1 million hypothetical MOFs. figshare https://doi.org/10.6084/m9.figshare.21810147.v2 (2022).

  55. Kang, Y. et al. MOFTransformer. Zenodo https://doi.org/10.5281/zenodo.7593333 (2022).

Acknowledgements

H.P., Y.K. and J.K. acknowledge funding from the National Research Foundation of Korea under project numbers 2021M3A7C208974513 and 2021R1A2C2003583. This work was supported by the National Supercomputing Center with supercomputing resources including technical support (KSC-2021-CRE-0460). B.S. is supported by the PrISMa Project, which is funded through the ACT programme (Accelerating CCS Technologies, Horizon2020 project number 294766). Financial contributions from Business, Energy & Industrial Strategy (BEIS), together with extra funding from the Natural Environment Research Council (NERC) and the Engineering and Physical Sciences Research Council (EPSRC), UK; the Research Council of Norway (RCN), Norway; the Swiss Federal Office of Energy (SFOE), Switzerland; and the United States Department of Energy (US-DOE), USA, are gratefully acknowledged. Additional financial support from TOTAL and Equinor is also gratefully acknowledged.

Author information

Contributions

Y.K. and H.P. contributed equally to this work. Y.K. and H.P. developed MOFTransformer and wrote the paper with J.K. The paper was written through the contributions of all authors. All authors have given approval for the final version of the paper.

Corresponding author

Correspondence to Jihan Kim.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Notes 1–5, Figs. 1–12 and Table 1.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Kang, Y., Park, H., Smit, B. et al. A multi-modal pre-training transformer for universal transfer learning in metal–organic frameworks. Nat Mach Intell 5, 309–318 (2023). https://doi.org/10.1038/s42256-023-00628-2

  • DOI: https://doi.org/10.1038/s42256-023-00628-2
