
  • Review Article
Diffusion models in bioinformatics and computational biology

Abstract

Denoising diffusion models are a form of generative artificial intelligence with applications in computer vision, natural language processing and bioinformatics. In this Review, we introduce the key concepts and theoretical foundations of three diffusion modelling frameworks: denoising diffusion probabilistic models, noise-conditioned score networks and score stochastic differential equations. We then explore their applications in bioinformatics and computational biology, including protein design and generation, drug and small-molecule design, protein–ligand interaction modelling, cryo-electron microscopy image data analysis and single-cell data analysis. Finally, we highlight open-source diffusion model tools and consider the future applications of diffusion models in bioinformatics.

Key points

  • Diffusion models are a generative artificial intelligence technology that can be applied in natural language processing, image synthesis and bioinformatics.

  • Diffusion models have contributed greatly to computational protein design and generation, drug and small-molecule design, protein–ligand interaction modelling, cryo-electron microscopy data enhancement and single-cell data analysis.

  • Many diffusion models are also available as open-source tools.

  • Although diffusion models can outperform other generative approaches, such as generative adversarial networks and variational auto-encoders, their computational resource requirements remain high.


Fig. 1: Timeline of advances in diffusion models and their applications in bioinformatics.
Fig. 2: Forward and reverse processes of diffusion models.
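The forward and reverse processes illustrated in Fig. 2 can be sketched minimally for a denoising diffusion probabilistic model. This is an illustrative sketch only, assuming a linear variance schedule and using the true forward noise as a stand-in for a learned noise predictor; in practice the predictor is a trained neural network.

```python
import numpy as np

# Linear variance schedule beta_t over T steps (an assumed, common choice).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # \bar{alpha}_t = prod_{s<=t} alpha_s

def forward_sample(x0, t, rng):
    """Forward process: sample x_t ~ q(x_t | x_0)
    = N(sqrt(abar_t) * x0, (1 - abar_t) * I)."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps

def reverse_step(xt, t, eps_pred, rng):
    """Reverse process: one ancestral sampling step of
    p(x_{t-1} | x_t), given a predicted noise eps_pred."""
    coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
    mean = (xt - coef * eps_pred) / np.sqrt(alphas[t])
    if t == 0:
        return mean  # final step returns the mean only
    return mean + np.sqrt(betas[t]) * rng.standard_normal(xt.shape)

rng = np.random.default_rng(0)
x0 = np.zeros(4)                          # toy "data" point
xt, eps = forward_sample(x0, T - 1, rng)  # fully noised sample
# Using the true forward noise in place of a learned predictor,
# one reverse step moves x_t back toward the data distribution.
x_prev = reverse_step(xt, T - 1, eps, rng)
```

In a real model the `eps_pred` argument would come from a trained network evaluated at `(xt, t)`, and sampling would iterate `reverse_step` from `t = T - 1` down to `t = 0`.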


References

  1. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015). This article provides a comprehensive overview of the advances, challenges and potential of deep learning methods.

    Article  CAS  PubMed  ADS  Google Scholar 

  2. Eickholt, J. & Cheng, J. Predicting protein residue–residue contacts using deep networks and boosting. Bioinformatics 28, 3066–3072 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  3. Baek, M. & Baker, D. Deep learning and protein structure modeling. Nat. Methods 19, 13–14 (2022).

    Article  CAS  PubMed  Google Scholar 

  4. Kulmanov, M. & Hoehndorf, R. DeepGOPlus: improved protein function prediction from sequence. Bioinformatics 36, 422–429 (2020).

    Article  CAS  PubMed  Google Scholar 

  5. Aggarwal, D. & Hasija, Y. A review of deep learning techniques for protein function prediction. Preprint at arXiv https://doi.org/10.48550/arXiv.2211.09705 (2022).

  6. Gligorijević, V. et al. Structure-based protein function prediction using graph convolutional networks. Nat. Commun. 12, 3168 (2021).

    Article  PubMed  PubMed Central  ADS  Google Scholar 

  7. Bileschi, M. L. et al. Using deep learning to annotate the protein universe. Nat. Biotechnol. 40, 932–937 (2022).

    Article  CAS  PubMed  Google Scholar 

  8. Cai, Y., Wang, J. & Deng, L. SDN2GO: an integrated deep learning model for protein function prediction. Front. Bioeng. Biotechnol. 8, 391 (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  9. Ko, C. W., Huh, J. & Park, J.-W. Deep learning program to predict protein functions based on sequence information. MethodsX 9, 101622 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  10. Dhakal, A., McKay, C., Tanner, J. J. & Cheng, J. Artificial intelligence in the prediction of protein–ligand interactions: recent advances and future directions. Brief. Bioinform. 23, bbab476 (2022).

    Article  PubMed  Google Scholar 

  11. Verma, N. et al. Ssnet: a deep learning approach for protein–ligand interaction prediction. Int. J. Mol. Sci. 22, 1392 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  12. Gomes, J., Ramsundar, B., Feinberg, E. N. & Pande, V. S. Atomic convolutional networks for predicting protein–ligand binding affinity. Preprint at arXiv https://doi.org/10.48550/arXiv.1703.10603 (2017).

  13. Jiménez, J., Skalic, M., Martinez-Rosell, G. & De Fabritiis, G. Kdeep: protein–ligand absolute binding affinity prediction via 3D-convolutional neural networks. J. Chem. Inf. Model 58, 287–296 (2018).

    Article  PubMed  Google Scholar 

  14. Öztürk, H., Özgür, A. & Ozkirimli, E. DeepDTA: deep drug–target binding affinity prediction. Bioinformatics 34, i821–i829 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  15. Zhou, J. et al. Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk. Nat. Genet. 50, 1171–1179 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  16. Avsec, Ž. et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat. Methods 18, 1196–1203 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  17. Zrimec, J. et al. Deep learning suggests that gene expression is encoded in all parts of a co-evolving interacting gene regulatory structure. Nat. Commun. 11, 6141 (2020).

    Article  CAS  PubMed  PubMed Central  ADS  Google Scholar 

  18. Yuan, Y. & Bar-Joseph, Z. Deep learning for inferring gene relationships from single-cell expression data. Proc. Natl Acad. Sci. 116, 27151–27158 (2019).

    Article  CAS  PubMed  PubMed Central  ADS  Google Scholar 

  19. Khan, A. & Lee, B. Gene transformer: transformers for the gene expression-based classification of lung cancer subtypes. Preprint at arXiv https://doi.org/10.48550/arXiv.2108.11833 (2021).

  20. Singh, R., Lanchantin, J., Robins, G. & Qi, Y. DeepChrome: deep-learning for predicting gene expression from histone modifications. Bioinformatics 32, i639–i648 (2016).

    Article  CAS  PubMed  Google Scholar 

  21. Shu, H. et al. Modeling gene regulatory networks using neural network architectures. Nat. Comput. Sci. 1, 491–501 (2021).

    Article  PubMed  Google Scholar 

  22. Razaghi-Moghadam, Z. & Nikoloski, Z. Supervised learning of gene-regulatory networks based on graph distance profiles of transcriptomics data. npj Syst. Biol. Appl. 6, 21 (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  23. Chen, C. et al. DeepGRN: prediction of transcription factor binding site across cell-types using attention-based deep neural networks. BMC Bioinform. 22, 38 (2021).

    CAS  Google Scholar 

  24. Xu, R., Zhang, L. & Chen, Y. CdtGRN: Construction of qualitative time-delayed gene regulatory networks with a deep learning method. Preprint at arXiv https://doi.org/10.48550/arXiv.2111.00287 (2021).

  25. Kwon, M. S., Lee, B. T., Lee, S. Y. & Kim, H. U. Modeling regulatory networks using machine learning for systems metabolic engineering. Curr. Opin. Biotechnol. 65, 163–170 (2020).

    Article  CAS  PubMed  Google Scholar 

  26. LeCun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998).

    Article  Google Scholar 

  27. Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).

    Article  CAS  PubMed  Google Scholar 

  28. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition 770–778 (IEEE, 2016).

  29. Goodfellow, I. et al. Generative adversarial networks. Commun. ACM 63, 139–144 (2020).

    Article  Google Scholar 

  30. Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M. & Monfardini, G. The graph neural network model. IEEE Trans. Neural Netw. 20, 61–80 (2008).

    Article  PubMed  Google Scholar 

  31. Vaswani, A. et al. Attention is All you Need. In Advances in Neural Information Processing Systems Vol. 30 (eds Guyon, I. et al.) (Curran Associates, 2017).

  32. Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N. & Ganguli, S. Deep Unsupervised Learning using Nonequilibrium Thermodynamics. In Proc. 32nd Int. Conference on Machine Learning Vol. 37 (eds Bach, F. & Blei, D.) 2256–2265 (PMLR, 2015).

  33. Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 33, 6840–6851 (2020). This article introduces the denoising diffusion probabilistic model, which was the first diffusion model capable of generating high-resolution data.

    Google Scholar 

  34. Song, Y. & Ermon, S. Generative Modeling by Estimating Gradients of the Data Distribution. In Advances in Neural Information Processing Systems Vol. 32 (eds Wallach, H. et al.) (Curran Associates, 2019). This article introduces the noise-conditioned score network, which is one of the three main diffusion model frameworks.

  35. Song, Y. et al. Score-based generative modeling through stochastic differential equations. Preprint at arXiv https://doi.org/10.48550/arXiv.2011.13456 (2020). This article introduces score stochastic differential equations for unconditional image generation.

  36. Rombach, R., Blattmann, A., Lorenz, D., Esser, P. & Ommer, B. High-resolution image synthesis with latent diffusion models. In Proc. IEEE/CVF Conf. on Computer Vision and Pattern Recognition 10684–10695 (2022). This article reports stable diffusion for image inpainting, class-conditional image synthesis and other tasks, including text-to-image synthesis and unconditional image generation.

  37. Saharia, C. et al. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding. In Advances in Neural Information Processing Systems Vol. 35 (eds Koyejo, S. et al.) 36479–36494 (Curran Associates, 2022).

  38. Wang, Z., Zheng, H., He, P., Chen, W. & Zhou, M. Diffusion-GAN: Training GANs with diffusion. Preprint at arXiv https://doi.org/10.48550/arXiv.2206.02262 (2022).

  39. Zheng, H., He, P., Chen, W. & Zhou, M. Truncated diffusion probabilistic models and diffusion-based adversarial auto-encoders. Preprint at arXiv https://doi.org/10.48550/arXiv.2202.09671 (2022).

  40. Xie, P. et al. Vector quantized diffusion model with CodeUnet for text-to-sign pose sequences generation. Preprint at arXiv https://doi.org/10.48550/arXiv.2208.09141 (2022).

  41. Kim, D., Kim, Y., Kang, W. & Moon, I.-C. Refining generative process with discriminator guidance in score-based diffusion models. Preprint at arXiv https://doi.org/10.48550/arXiv.2211.17091 (2022).

  42. Zheng, G. et al. Entropy-driven sampling and training scheme for conditional diffusion generation. In Eur. Conf. on Computer Vision 754–769 (Springer, 2022).

  43. Saharia, C. et al. Palette: image-to-image diffusion models. In ACM SIGGRAPH ‘22 Conf. Proc. https://doi.org/10.1145/3528233.3530757 (ACM, 2022).

  44. Wang, Y., Yu, J. & Zhang, J. Zero-shot image restoration using denoising diffusion null-space model. Preprint at arXiv https://doi.org/10.48550/arXiv.2212.00490 (2022).

  45. Lam, M. W., Wang, J., Su, D. & Yu, D. BDDM: bilateral denoising diffusion models for fast and high-quality speech synthesis. Preprint at arXiv https://doi.org/10.48550/arXiv.2203.13508 (2022).

  46. van den Oord, A. et al. Conditional Image Generation with PixelCNN Decoders. In Advances in Neural Information Processing Systems Vol. 29 (eds Lee, D. et al.) (Curran Associates, 2016).

  47. Papamakarios, G., Nalisnick, E. T., Rezende, D. J., Mohamed, S. & Lakshminarayanan, B. Normalizing flows for probabilistic modeling and inference. J. Mach. Learn. Res. 22, 1–64 (2021).

    MathSciNet  Google Scholar 

  48. LeCun, Y., Chopra, S., Hadsell, R., Ranzato, M. & Huang, F. A tutorial on energy-based learning. In Predicting Structured Data (eds Bakir, G., Hofman, T., Schölkopf, B., Smola, A. & Taskar, B.) Vol. 1 (MIT Press, 2006).

  49. Kingma, D. P. & Welling, M. Auto-encoding variational bayes. Preprint at arXiv https://doi.org/10.48550/arXiv.1312.6114 (2013).

  50. Dhariwal, P. & Nichol, A. Diffusion models beat GANs on image synthesis. Adv. Neural Inf. Process. Syst. 34, 8780–8794 (2021).

    Google Scholar 

  51. Li, H. et al. SRDiff: single image super-resolution with diffusion probabilistic models. Neurocomputing 479, 47–59 (2022).

    Article  Google Scholar 

  52. Giannone, G., Nielsen, D. & Winther, O. Few-shot diffusion models. Preprint at arXiv https://doi.org/10.48550/arXiv.2205.15463 (2022).

  53. Lyu, Z., Kong, Z., Xu, X., Pan, L. & Lin, D. A conditional point diffusion-refinement paradigm for 3d point cloud completion. Preprint at arXiv https://doi.org/10.48550/arXiv.2112.03530 (2021).

  54. Hoogeboom, E., Satorras, V. c. G., Vignac, C. & Welling, M. Equivariant Diffusion for Molecule Generation in 3D. In Proc. 39th Int. Conference on Machine Learning Vol. 162 (eds Chaudhuri, K. et al.) 8867–8887 (PMLR, 2022). This article reports a foundational diffusion model that directly generates molecules in 3D space based on an equivariant graph neural network architecture.

  55. Li, X., Thickstun, J., Gulrajani, I., Liang, P. S. & Hashimoto, T. B. Diffusion-LM Improves Controllable Text Generation. In Advances in Neural Information Processing Systems Vol. 35 (eds Koyejo, S. et al.) 4328–4343 (Curran Associates, 2022).

  56. Amit, T., Nachmani, E., Shaharbany, T. & Wolf, L. SegDiff: image segmentation with diffusion probabilistic models. Preprint at arXiv https://doi.org/10.48550/arXiv.2112.00390 (2021).

  57. Baranchuk, D., Rubachev, I., Voynov, A., Khrulkov, V. & Babenko, A. Label-efficient semantic segmentation with diffusion models. Preprint at arXiv https://doi.org/10.48550/arXiv.2112.03126 (2021).

  58. Brempong, E. A. et al. Denoising pretraining for semantic segmentation. In Proc. IEEE/CVF Conf. on Computer Vision and Pattern Recognition 4175–4186 (IEEE, 2022).

  59. Cai, R. et al. Learning gradient fields for shape generation. In Eur. Conf. on Computer Vision 364–381 (Springer, 2020).

  60. Ho, J. et al. Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23, 1–33 (2022).

    MathSciNet  CAS  Google Scholar 

  61. Ho, J. et al. Video diffusion models. Preprint at arXiv https://doi.org/10.48550/arXiv.2204.03458 (2022).

  62. Kawar, B., Vaksman, G. & Elad, M. Stochastic image denoising by sampling from the posterior distribution. In Proc. IEEE/CVF Int. Conf. on Computer Vision 1866–1875 (2021).

  63. Kim, B., Han, I. & Ye, J. C. DiffuseMorph: unsupervised deformable image registration along continuous trajectory using diffusion models. Preprint at arXiv https://doi.org/10.48550/arXiv.2112.05149 (2021).

  64. Luo, S. & Hu, W. Score-based point cloud denoising. In Proc. IEEE/CVF Int. Conf. on Computer Vision 4583–4592 (IEEE, 2021).

  65. Meng, C. et al. Sdedit: Guided image synthesis and editing with stochastic differential equations. Preprint at arXiv https://doi.org/10.48550/arXiv.2108.01073 (2021).

  66. Özbey, M. et al. Unsupervised medical image translation with adversarial diffusion models. Preprint at arXiv https://doi.org/10.48550/arXiv.2207.08208 (2023).

  67. Saharia, C. et al. Image super-resolution via iterative refinement. In IEEE Trans. on Pattern Analysis and Machine Intelligence 4713–4726 (IEEE, 2022).

  68. Whang, J. et al. Deblurring via stochastic refinement. In Proc. IEEE/CVF Conf. on Computer Vision and Pattern Recognition 16293–16303 (IEEE, 2022).

  69. Yang, R. & Mandt, S. Lossy image compression with conditional diffusion models. Preprint at arXiv https://doi.org/10.48550/arXiv.2209.06950 (2022).

  70. Zhao, M., Bao, F., Chongxuan, L. I. & Zhu, J. EGSDE: Unpaired Image-to-Image Translation via Energy-Guided Stochastic Differential Equations. In Advances in Neural Information Processing Systems Vol. 35 (eds Koyejo, S. et al.) 3609–3623 (Curran Associates, 2022).

  71. Zimmermann, R. S., Schott, L., Song, Y., Dunn, B. A. & Klindt, D. A. Score-based generative classifiers. Preprint at arXiv https://doi.org/10.48550/arXiv.2110.00473 (2021).

  72. Austin, J., Johnson, D. D., Ho, J., Tarlow, D. & van den Berg, R. Structured denoising diffusion models in discrete state-spaces. Adv. Neural Inf. Process. Syst. 34, 17981–17993 (2021).

    Google Scholar 

  73. Hoogeboom, E., Nielsen, D., Jaini, P., Forré, P. & Welling, M. Argmax flows and multinomial diffusion: learning categorical distributions. Adv. Neural Inf. Process. Syst. 34, 12454–12465 (2021).

    Google Scholar 

  74. Savinov, N., Chung, J., Binkowski, M., Elsen, E. & Oord, A. V. D. Step-unrolled denoising autoencoders for text generation. Preprint at arXiv https://doi.org/10.48550/arXiv.2112.06749 (2021).

  75. Yu, P. et al. Latent diffusion energy-based model for interpretable text modeling. Preprint at arXiv https://doi.org/10.48550/arXiv.2206.05895 (2022).

  76. Alcaraz, J. M. L. & Strodthoff, N. Diffusion-based time series imputation and forecasting with structured state space models. Preprint at arXiv https://doi.org/10.48550/arXiv.2208.09399 (2022).

  77. Chen, N. et al. WaveGrad: estimating gradients for waveform generation. Preprint at arXiv https://doi.org/10.48550/arXiv.2009.00713 (2020).

  78. Kong, Z., Ping, W., Huang, J., Zhao, K. & Catanzaro, B. DiffWave: a versatile diffusion model for audio synthesis. Preprint at arXiv https://doi.org/10.48550/arXiv.2009.09761 (2020).

  79. Rasul, K., Sheikh, A.-S., Schuster, I., Bergmann, U. & Vollgraf, R. Multivariate probabilistic time series forecasting via conditioned normalizing flows. Preprint at arXiv https://doi.org/10.48550/arXiv.2002.06103 (2020).

  80. Tashiro, Y., Song, J., Song, Y. & Ermon, S. CSDI: conditional score-based diffusion models for probabilistic time series imputation. Adv. Neural Inf. Process. Syst. 34, 24804–24816 (2021).

    Google Scholar 

  81. Yan, T., Zhang, H., Zhou, T., Zhan, Y. & Xia, Y. ScoreGrad: multivariate probabilistic time series forecasting with continuous energy-based generative models. Preprint at arXiv https://doi.org/10.48550/arXiv.2106.10121 (2021).

  82. Avrahami, O., Lischinski, D. & Fried, O. Blended diffusion for text-driven editing of natural images. In Proc. IEEE/CVF Conf. on Computer Vision and Pattern Recognition 18208–18218 (IEEE, 2022).

  83. Ramesh, A., Dhariwal, P., Nichol, A., Chu, C. & Chen, M. Hierarchical text-conditional image generation with clip latents. Preprint at arXiv https://doi.org/10.48550/arXiv.2204.06125 (2022).

  84. Anand, N. & Achim, T. Protein structure and sequence generation with equivariant denoising diffusion probabilistic models. Preprint at arXiv https://doi.org/10.48550/arXiv.2205.15019 (2022).

  85. Cao, C., Cui, Z.-X., Liu, S., Liang, D. & Zhu, Y. High-frequency space diffusion models for accelerated MRI. Preprint at arXiv https://doi.org/10.48550/arXiv.2208.05481 (2022).

  86. Chung, H., Lee, E. S. & Ye, J. C. MR image denoising and super-resolution using regularized reverse diffusion. IEEE Trans. Med. Imaging 42, 922–934 (2022).

    Article  Google Scholar 

  87. Chung, H., Sim, B. & Ye, J. C. Come-closer-diffuse-faster: accelerating conditional diffusion models for inverse problems through stochastic contraction. In Proc. IEEE/CVF Conf. on Computer Vision and Pattern Recognition 12413–12422 (IEEE, 2022).

  88. Chung, H. & Ye, J. C. Score-based diffusion models for accelerated MRI. Med. Image Anal. 80, 102479 (2022).

    Article  PubMed  Google Scholar 

  89. Güngör, A. et al. Adaptive diffusion priors for accelerated MRI reconstruction. Med. Image Anal. 88, 102872 (2023).

    Article  PubMed  Google Scholar 

  90. Jing, B., Corso, G., Chang, J., Barzilay, R. & Jaakkola, T. Torsional Diffusion for Molecular Conformer Generation. In Advances in Neural Information Processing Systems Vol. 35 (eds Koyejo, S. et al.) 24240–24253 (Curran Associates, 2022).

  91. Lee, J. S. & Kim, P. M. ProteinSGM: score-based generative modeling for de novo protein design. Preprint at bioRxiv https://doi.org/10.1101/2022.07.13.499967 (2022).

  92. Luo, S. et al. Antigen-Specific Antibody Design and Optimization with Diffusion-Based Generative Models for Protein Structures. In Advances in Neural Information Processing Systems Vol. 35 (eds Koyejo, S. et al.) 9754–9767 (Curran Associates, 2022).

  93. Mei, S., Fan, F. & Maier, A. Metal inpainting in CBCT projections using score-based generative model. Preprint at arXiv https://doi.org/10.48550/arXiv.2209.09733 (2022).

  94. Du, Y. & Mordatch, I. Implicit Generation and Modeling with Energy Based Models. In Advances in Neural Information Processing Systems Vol. 32 (eds Wallach, H. et al.) (Curran Associates, 2019).

  95. Brock, A., Donahue, J. & Simonyan, K. Large scale GAN training for high fidelity natural image synthesis. Preprint at arXiv https://doi.org/10.48550/arXiv.1809.11096 (2018).

  96. Karras, T. et al. Training generative adversarial networks with limited data. Adv. Neural Inf. Process. Syst. 33, 12104–12114 (2020).

    Google Scholar 

  97. Song, J., Meng, C. & Ermon, S. Denoising diffusion implicit models. Preprint at arXiv https://doi.org/10.48550/arXiv.2010.02502 (2020).

  98. Kreis, K., Dockhorn, T., Li, Z. & Zhong, E. Latent space diffusion models of cryo-EM structures. Preprint at arXiv https://doi.org/10.48550/arXiv.2211.14169 (2022).

  99. Waibel, D. J., Röell, E., Rieck, B., Giryes, R. & Marr, C. A diffusion model predicts 3D shapes from 2D microscopy images. Preprint at arXiv https://doi.org/10.48550/arXiv.2208.14125 (2022).

  100. Tjärnberg, A. et al. Optimal tuning of weighted kNN- and diffusion-based methods for denoising single cell genomics data. PLoS Comput. Biol. 17, e1008569 (2021).

    Article  PubMed  PubMed Central  Google Scholar 

  101. Wu, K. E. et al. Protein structure generation via folding diffusion. Preprint at arXiv https://doi.org/10.48550/arXiv.2209.15611 (2022).

  102. Gao, Z., Tan, C. & Li, S. Z. DiffSDS: a language diffusion model for protein backbone inpainting under geometric conditions and constraints. Preprint at arXiv https://doi.org/10.48550/arXiv.2301.09642 (2023).

  103. Lin, Y. & AlQuraishi, M. Generating novel, designable, and diverse protein structures by equivariantly diffusing oriented residue clouds. Preprint at arXiv https://doi.org/10.48550/arXiv.2301.12485 (2023).

  104. Trippe, B. L. et al. Diffusion probabilistic modeling of protein backbones in 3D for the motif-scaffolding problem. Preprint at arXiv https://doi.org/10.48550/arXiv.2206.04119 (2022).

  105. Watson, J. L. et al. De novo design of protein structure and function with RFdiffusion. Nature 620, 1089–1100 (2023). This article presents RFdiffusion, which can be applied to complex protein-generation tasks.

    Article  CAS  PubMed  PubMed Central  ADS  Google Scholar 

  106. Yim, J. et al. SE(3) diffusion model with application to protein backbone generation. Preprint at arXiv https://doi.org/10.48550/arXiv.2302.02277 (2023).

  107. Ingraham, J. et al. Illuminating protein space with a programmable generative model. Preprint at bioRxiv https://doi.org/10.1101/2022.12.01.518682 (2022). This article reports the graph-neural-network-based conditional diffusion model Chroma, which can generate large single-chain proteins and protein complexes with programmable properties and functions.

  108. Huang, H., Sun, L., Du, B. & Lv, W. Conditional diffusion based on discrete graph structures for molecular graph generation. Preprint at arXiv https://doi.org/10.48550/arXiv.2301.00427 (2023).

  109. Wu, L., Gong, C., Liu, X., Ye, M. & Liu, Q. Diffusion-based Molecule Generation with Informative Prior Bridges. In Advances in Neural Information Processing Systems Vol. 35 (eds Koyejo, S. et al.) 36533–36545 (Curran Associates, 2022).

  110. Luo, S., Shi, C., Xu, M. & Tang, J. Predicting molecular conformation via dynamic graph score matching. Adv. Neural Inf. Process. Syst. 34, 19784–19795 (2021).

    Google Scholar 

  111. Zhang, H. et al. SDEGen: learning to evolve molecular conformations from thermodynamic noise for conformation generation. Chem. Sci. 14, 1557–1568 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  112. Wu, F. & Li, S. Z. DIFFMD: a geometric diffusion model for molecular dynamics simulations. In Proc. AAAI Conference Artificial Intelligence 37, 5321–5329 (2003).

    Article  Google Scholar 

  113. Igashov, I. et al. Equivariant 3D-conditional diffusion models for molecular linker design. Preprint at arXiv https://doi.org/10.48550/arXiv.2210.05274 (2022).

  114. Lin, H. et al. DiffBP: generative diffusion of 3D molecules for target protein binding. Preprint at arXiv https://doi.org/10.48550/arXiv.2211.11214 (2022).

  115. Schneuing, A. et al. Structure-based drug design with equivariant diffusion models. Preprint at arXiv https://doi.org/10.48550/arXiv.2210.13695 (2022).

  116. Corso, G., Stärk, H., Jing, B., Barzilay, R. & Jaakkola, T. DiffDock: diffusion steps, twists, and turns for molecular docking. Preprint at arXiv https://doi.org/10.48550/arXiv.2210.01776 (2022). This article presents the diffusion model DiffDock for protein pocket docking.

  117. Qiao, Z., Nie, W., Vahdat, A., Miller III, T. F. & Anandkumar, A. Dynamic-backbone protein–ligand structure prediction with multiscale generative diffusion models. Preprint at arXiv https://doi.org/10.48550/arXiv.2209.15171 (2022).

  118. Jin, W., Sarkizova, S., Chen, X., Hacohen, N. & Uhler, C. Unsupervised protein–ligand binding energy prediction via neural Euler’s rotation equation. Preprint at arXiv https://doi.org/10.48550/arXiv.2301.10814 (2023).

  119. Song, Y. & Ermon, S. Improved techniques for training score-based generative models. Adv. Neural Inf. Process. Syst. 33, 12438–12448 (2020).

    Google Scholar 

  120. Song, Y., Durkan, C., Murray, I. & Ermon, S. Maximum likelihood training of score-based diffusion models. Adv. Neural Inf. Process. Syst. 34, 1415–1428 (2021).

    Google Scholar 

  121. Hyvärinen, A. & Dayan, P. Estimation of non-normalized statistical models by score matching. J. Mach. Learn. Res. 6, 95–709 (2005).

  122. Raphan, M. & Simoncelli, E. P. Least squares estimation without priors or supervision. Neural Comput. 23, 374–420 (2011).

    Article  MathSciNet  PubMed  Google Scholar 

  123. Raphan, M. & Simoncelli, E. Learning to be Bayesian without Supervision. In Advances in Neural Information Processing Systems Vol. 19 (eds Scholkopf, B., Platt, J. & Hoffman, T) (MIT Press, 2006).

  124. Vincent, P. A connection between score matching and denoising autoencoders. Neural Comput. 23, 1661–1674 (2011).

    Article  MathSciNet  PubMed  ADS  Google Scholar 

  125. Song, Y., Garg, S., Shi, J. & Ermon, S. Sliced Score Matching: A Scalable Approach to Density and Score Estimation. In Proc. 35th Uncertainty in Artificial Intelligence Conference Vol. 115 (eds Adams R., & Gogate, V.) 574–584 (PMLR, 2020).

  126. Kingma, D., Salimans, T., Poole, B. & Ho, J. Variational diffusion models. Adv. Neural Inf. Process. Syst. 34, 21696–21707 (2021).

    Google Scholar 

  127. Luo, C. Understanding diffusion models: a unified perspective. Preprint at arXiv https://doi.org/10.48550/arXiv.2208.11970 (2022).

  128. Arnold, L. Stochastic Differential Equations (Wiley, 1974).

  129. Anderson, B. D. Reverse-time diffusion equation models. Stoch. Process. Appl. 12, 313–326 (1982).

    Article  MathSciNet  Google Scholar 

  130. Nichol, A. Q. & Dhariwal, P. Improved Denoising Diffusion Probabilistic Models. In Proc. 38th Int. Conference on Machine Learning Vol. 139 (eds Meila, M. & Zhang, T.) 8162–8171 (PMLR, 2021).

  131. Bansal, A. et al. Cold diffusion: inverting arbitrary image transforms without noise. Preprint at arXiv https://doi.org/10.48550/arXiv.2208.09392 (2022).

  132. Kong, Z. & Ping, W. On fast sampling of diffusion probabilistic models. Preprint at arXiv https://doi.org/10.48550/arXiv.2106.00132 (2021).

  133. Salimans, T. & Ho, J. Progressive distillation for fast sampling of diffusion models. Preprint at arXiv https://doi.org/10.48550/arXiv.2202.00512 (2022).

  134. Jolicoeur-Martineau, A., Li, K., Piché-Taillefer, R., Kachman, T. & Mitliagkas, I. Gotta go fast when generating data with score-based models. Preprint at arXiv https://doi.org/10.48550/arXiv.2105.14080 (2021).

  135. Karras, T., Aittala, M., Aila, T. & Laine, S. Elucidating the Design Space of Diffusion-Based Generative Models. In Advances in Neural Information Processing Systems Vol. 35 (eds Koyejo, S. et al.) 26565–26577 (Curran Associates, 2022).

  136. Lu, C. et al. DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps. In Advances in Neural Information Processing Systems Vol. 35 (eds Koyejo, S. et al.) 5775–5787 (Curran Associates, 2022).

  137. Liu, L., Ren, Y., Lin, Z. & Zhao, Z. Pseudo numerical methods for diffusion models on manifolds. Preprint at arXiv https://doi.org/10.48550/arXiv.2202.09778 (2022).

  138. Bao, F., Li, C., Zhu, J. & Zhang, B. Analytic-DPM: an analytic estimate of the optimal reverse variance in diffusion probabilistic models. Preprint at arXiv https://doi.org/10.48550/arXiv.2201.06503 (2022).

  139. Lu, C. et al. DPM-solver++: fast solver for guided sampling of diffusion probabilistic models. Preprint at arXiv https://doi.org/10.48550/arXiv.2211.01095 (2022).

  140. Vahdat, A., Kreis, K. & Kautz, J. Score-based generative modeling in latent space. Adv. Neural Inf. Process. Syst. 34, 11287–11302 (2021).

    Google Scholar 

  141. Zhang, Q. & Chen, Y. Diffusion normalizing flow. Adv. Neural Inf. Process. Syst. 34, 16280–16291 (2021).

    Google Scholar 

  142. Pandey, K., Mukherjee, A., Rai, P. & Kumar, A. DiffuseVAE: efficient, controllable and high-fidelity generation from low-dimensional latents. Preprint at arXiv https://doi.org/10.48550/arXiv.2201.00308 (2022).

  143. Luo, S. & Hu, W. Diffusion probabilistic models for 3D point cloud generation. In Proc. IEEE/CVF Conf. on Computer Vision and Pattern Recognition 2837–2845 (IEEE, 2021).

  144. Shi, C., Luo, S., Xu, M. & Tang, J. Learning Gradient Fields for Molecular Conformation Generation. In Proc. 38th Int. Conference on Machine Learning Vol. 139 (ed Meila, M. & Zhang, T.) 9558–9568 (PMLR, 2021).

  145. Zhou, L., Du, Y. & Wu, J. 3D shape generation and completion through point-voxel diffusion. In Proc. IEEE/CVF Int. Conf. on Computer Vision 5826–5835 (IEEE, 2021).

  146. Hoogeboom, E. et al. Autoregressive diffusion models. Preprint at arXiv https://doi.org/10.48550/arXiv.2110.02037 (2021).

  147. Xu, M. et al. GeoDiff: a geometric diffusion model for molecular conformation generation. Preprint at arXiv https://doi.org/10.48550/arXiv.2203.02923 (2022).

  148. Jo, J., Lee, S. & Hwang, S. J. Score-based Generative Modeling of Graphs via the System of Stochastic Differential Equations. In Proc. 39th Int. Conference on Machine Learning Vol. 162 (eds Chaudhuri, K. et al.) 10362–10383 (PMLR, 2022).

  149. De Bortoli, V. et al. Riemannian score-based generative modelling. Adv. Neural Inf. Process. 35, 2406–2422 (2022).

    Google Scholar 

  150. Chen, T., Zhang, R. & Hinton, G. Analog bits: generating discrete data using diffusion models with self-conditioning. Preprint at arXiv https://doi.org/10.48550/arXiv.2208.04202 (2022).

  151. Niu, C. et al. Permutation Invariant Graph Generation via Score-Based Generative Modeling. In Proc. 23rd Int. Conference on Artificial Intelligence and Statistics Vol. 108 (eds Chiappa, S. & Calandra, R.) 4474–4484 (PMLR, 2020).

  152. Yang, L. et al. Diffusion models: a comprehensive survey of methods and applications. Preprint at arXiv https://doi.org/10.48550/arXiv.2209.00796 (2022).

  153. Anand, N. & Huang, P. Generative modeling for protein structures. In Advances in Neural Information Processing Systems Vol. 31 (eds Bengio, S. et al.) (Curran Associates, 2018).

  154. Lin, Z., Sercu, T., LeCun, Y. & Rives, A. Deep generative models create new and diverse protein structures. In Machine Learning for Structural Biology Workshop, NeurIPS (2021).

  155. Eguchi, R. R., Choe, C. A. & Huang, P.-S. Ig-VAE: generative modeling of protein structure by direct 3D coordinate generation. PLoS Comput. Biol. 18, e1010271 (2022).

  156. Anishchenko, I. et al. De novo protein design by deep network hallucination. Nature 600, 547–552 (2021).

  157. Greener, J. G., Moffat, L. & Jones, D. T. Design of metalloproteins and novel protein folds using variational autoencoders. Sci. Rep. 8, 16189 (2018).

  158. Anand, N., Eguchi, R. & Huang, P.-S. Fully differentiable full-atom protein backbone generation. In Proc. Deep Generative Models for Highly Structured Data, ICLR 2019 Workshop (OpenReview.net, 2019).

  159. Karimi, M., Zhu, S., Cao, Y. & Shen, Y. De novo protein design for novel folds using guided conditional Wasserstein generative adversarial networks. J. Chem. Inf. Model 60, 5667–5681 (2020).

  160. Simons, K. T., Bonneau, R., Ruczinski, I. & Baker, D. Ab initio protein structure prediction of CASP III targets using ROSETTA. Proteins Struct. Funct. Bioinform. 37, 171–176 (1999).

  161. Satorras, V. G., Hoogeboom, E. & Welling, M. E(n) Equivariant Graph Neural Networks. In Proc. 38th Int. Conference on Machine Learning Vol. 139 (eds Meila, M. & Zhang, T.) 9323–9332 (PMLR, 2021).

  162. Thomas, N. et al. Tensor field networks: rotation-and translation-equivariant neural networks for 3D point clouds. Preprint at arXiv https://doi.org/10.48550/arXiv.1802.08219 (2018).

  163. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).

  164. Anand, N. et al. Protein sequence design with a learned potential. Nat. Commun. 13, 746 (2022).

  165. Leaver-Fay, A. et al. Scientific benchmarks for guiding macromolecular energy function improvement. Methods Enzymol. 523, 109–143 (2013).

  166. Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876 (2021).

  167. Chène, P. Inhibiting the p53–MDM2 interaction: an important target for cancer therapy. Nat. Rev. Cancer 3, 102–109 (2003).

  168. Salgado, E. N., Lewis, R. A., Mossin, S., Rheingold, A. L. & Tezcan, F. A. Control of protein oligomerization symmetry by metal coordination: C2 and C3 symmetrical assemblies through CuII and NiII coordination. Inorg. Chem. 48, 2726–2728 (2009).

  169. Salgado, E. N. et al. Metal templated design of protein interfaces. Proc. Natl Acad. Sci. 107, 1827–1832 (2010).

  170. Cao, L. et al. Design of protein-binding proteins from the target structure alone. Nature 605, 551–560 (2022).

  171. Chevalier, A. et al. Massively parallel de novo protein design for targeted therapeutics. Nature 550, 74–79 (2017).

  172. Gómez-Bombarelli, R. et al. Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent. Sci. 4, 268–276 (2018).

  173. De Cao, N. & Kipf, T. MolGAN: an implicit generative model for small molecular graphs. Preprint at arXiv https://doi.org/10.48550/arXiv.1805.11973 (2018).

  174. Wu, Z. et al. MoleculeNet: a benchmark for molecular machine learning. Chem. Sci. 9, 513–530 (2018).

  175. You, J., Liu, B., Ying, Z., Pande, V. & Leskovec, J. Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation. In Advances in Neural Information Processing Systems Vol. 31 (eds Bengio, S. et al.) (Curran Associates, 2018).

  176. Kloeden, P. E. & Platen, E. Stochastic Differential Equations (Springer, 1992).

  177. Shi, C. et al. GraphAF: a flow-based autoregressive model for molecular graph generation. Preprint at arXiv https://doi.org/10.48550/arXiv.2001.09382 (2020).

  178. Luo, Y., Yan, K. & Ji, S. GraphDF: A Discrete Flow Model for Molecular Graph Generation. In Proc. 38th Int. Conference on Machine Learning Vol. 139 (eds Meila, M. & Zhang, T.) 7192–7203 (PMLR, 2021).

  179. Zang, C. & Wang, F. MoFlow: an invertible flow model for generating molecular graphs. In Proc. 26th ACM SIGKDD Int. Conf. on Knowledge Discovery & Data Mining 617–626 (2020).

  180. Lippe, P. & Gavves, E. Categorical normalizing flows via continuous transformations. Preprint at arXiv https://doi.org/10.48550/arXiv.2006.09790 (2020).

  181. Liu, M., Yan, K., Oztekin, B. & Ji, S. GraphEBM: molecular graph generation with energy-based models. Preprint at arXiv https://doi.org/10.48550/arXiv.2102.00546 (2021).

  182. Erdős, P. & Rényi, A. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci. 5, 17–60 (1960).

  183. Kipf, T. N. & Welling, M. Variational graph auto-encoders. Preprint at arXiv https://doi.org/10.48550/arXiv.1611.07308 (2016).

  184. You, J., Ying, R., Ren, X., Hamilton, W. & Leskovec, J. GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models. In Proc. 35th Int. Conference on Machine Learning Vol. 80 (eds Dy, J. & Krause, A.) 5708–5717 (PMLR, 2018).

  185. Liao, R. et al. Efficient Graph Generation with Graph Recurrent Attention Networks. In Advances in Neural Information Processing Systems Vol. 32 (eds Wallach, H. et al.) (Curran Associates, 2019).

  186. Garcia Satorras, V., Hoogeboom, E., Fuchs, F., Posner, I. & Welling, M. E(n) Equivariant Normalizing Flows. In Advances in Neural Information Processing Systems Vol. 34 (eds Ranzato, M. et al.) 4181–4192 (Curran Associates, 2021).

  187. Gebauer, N., Gastegger, M. & Schütt, K. Symmetry-adapted generation of 3D point sets for the targeted discovery of molecules. In Advances in Neural Information Processing Systems Vol. 32 (eds Wallach, H. et al.) (Curran Associates, 2019).

  188. Simonovsky, M. & Komodakis, N. GraphVAE: towards generation of small graphs using variational autoencoders. In Artificial Neural Networks and Machine Learning (ICANN 2018) 27th Int. Conf. on Artificial Neural Networks Proc. Part I 412–422 (Springer, 2018).

  189. Mitton, J., Senn, H. M., Wynne, K. & Murray-Smith, R. A graph VAE graph transformer approach to generating molecular graphs. Preprint at arXiv https://doi.org/10.48550/arXiv.2104.04345 (2021).

  190. Vignac, C. & Frossard, P. Top-N: equivariant set and graph generation without exchangeability. Preprint at arXiv https://doi.org/10.48550/arXiv.2110.02096 (2021).

  191. Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O. & Dahl, G. E. Neural Message Passing for Quantum Chemistry. In Proc. 34th Int. Conference on Machine Learning Vol. 70 (eds Precup, D. & Teh, Y.) 1263–1272 (PMLR, 2017).

  192. Riniker, S. & Landrum, G. A. Better informed distance geometry: using what we know to improve conformation generation. J. Chem. Inf. Model 55, 2562–2574 (2015).

  193. Xu, M., Luo, S., Bengio, Y., Peng, J. & Tang, J. Learning neural generative dynamics for molecular conformation generation. Preprint at arXiv https://doi.org/10.48550/arXiv.2102.10240 (2021).

  194. Simm, G. N. & Hernández-Lobato, J. M. A generative model for molecular distance geometry. Preprint at arXiv https://doi.org/10.48550/arXiv.1909.11459 (2019).

  195. Mansimov, E., Mahmood, O., Kang, S. & Cho, K. Molecular geometry prediction using a deep generative graph neural network. Sci. Rep. 9, 20381 (2019).

  196. Zhu, J. et al. Direct molecular conformation generation. Preprint at arXiv https://doi.org/10.48550/arXiv.2202.01356 (2022).

  197. Köhler, J., Klein, L. & Noé, F. Equivariant flows: sampling configurations for multi-body systems with symmetric energies. Preprint at arXiv https://doi.org/10.48550/arXiv.1910.00753 (2019).

  198. Fuchs, F., Worrall, D., Fischer, V. & Welling, M. SE(3)-transformers: 3D roto-translation equivariant attention networks. Adv. Neural Inf. Process. Syst. 33, 1970–1981 (2020).

  199. Huang, W. et al. Equivariant graph mechanics networks with constraints. Preprint at arXiv https://doi.org/10.48550/arXiv.2203.06442 (2022).

  200. Gao, A. & Remsing, R. C. Self-consistent determination of long-range electrostatics in neural network potentials. Nat. Commun. 13, 1572 (2022).

  201. Imrie, F., Bradley, A. R., van der Schaar, M. & Deane, C. M. Deep generative models for 3D linker design. J. Chem. Inf. Model 60, 1983–1995 (2020).

  202. Huang, Y., Peng, X., Ma, J. & Zhang, M. 3DLinker: an E(3) equivariant variational autoencoder for molecular linker design. Preprint at arXiv https://doi.org/10.48550/arXiv.2205.07309 (2022).

  203. Liu, M., Luo, Y., Uchino, K., Maruhashi, K. & Ji, S. Generating 3D molecules for target protein binding. Preprint at arXiv https://doi.org/10.48550/arXiv.2204.09410 (2022).

  204. Masuda, T., Ragoza, M. & Koes, D. R. Generating 3D molecular structures conditional on a receptor binding site with deep generative models. Preprint at arXiv https://doi.org/10.48550/arXiv.2010.14442 (2020).

  205. Luo, S., Guan, J., Ma, J. & Peng, J. A 3D generative model for structure-based drug design. Adv. Neural Inf. Process. Syst. 34, 6229–6239 (2021).

  206. Peng, X. et al. Pocket2Mol: Efficient Molecular Sampling Based on 3D Protein Pockets. In Proc. 39th Int. Conference on Machine Learning Vol. 162 (eds Chaudhuri, K. et al.) 17644–17655 (PMLR, 2022).

  207. Jing, B., Eismann, S., Soni, P. N. & Dror, R. O. Equivariant graph neural networks for 3D macromolecular structure. Preprint at arXiv https://doi.org/10.48550/arXiv.2106.03843 (2021).

  208. Francoeur, P. G. et al. Three-dimensional convolutional neural networks and a cross-docked data set for structure-based drug design. J. Chem. Inf. Model. 60, 4200–4215 (2020).

  209. Koes, D. R., Baumgartner, M. P. & Camacho, C. J. Lessons learned in empirical scoring with smina from the CSAR 2011 benchmarking exercise. J. Chem. Inf. Model 53, 1893–1904 (2013).

  210. Hassan, N. M., Alhossary, A. A., Mu, Y. & Kwoh, C.-K. Protein–ligand blind docking using QuickVina-W with inter-process spatio-temporal integration. Sci. Rep. 7, 15451 (2017).

  211. Friesner, R. A. et al. Glide: a new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. J. Med. Chem. 47, 1739–1749 (2004).

  212. McNutt, A. T. et al. GNINA 1.0: molecular docking with deep learning. J. Cheminform. 13, 43 (2021).

  213. Stärk, H., Ganea, O., Pattanaik, L., Barzilay, R. & Jaakkola, T. EquiBind: Geometric Deep Learning for Drug Binding Structure Prediction. In Proc. 39th Int. Conference on Machine Learning Vol. 162 (eds Chaudhuri, K. et al.) 20503–20521 (PMLR, 2022).

  214. Lu, W. et al. TANKBind: Trigonometry-Aware Neural NetworKs for Drug-Protein Binding Structure Prediction. In Advances in Neural Information Processing Systems Vol. 35 (eds Koyejo, S. et al.) 7236–7249 (Curran Associates, 2022).

  215. Liu, Y. et al. CB-Dock: a web server for cavity detection-guided protein–ligand blind docking. Acta Pharmacol. Sin. 41, 138–144 (2020).

  216. Wang, R., Fang, X., Lu, Y. & Wang, S. The PDBbind database: collection of binding affinities for protein−ligand complexes with known three-dimensional structures. J. Med. Chem. 47, 2977–2980 (2004).

  217. Dunbar, J. et al. SAbDab: the structural antibody database. Nucleic Acids Res. 42, D1140–D1146 (2014).

  218. Miller, B. R. III et al. MMPBSA.py: an efficient program for end-state free energy calculations. J. Chem. Theory Comput. 8, 3314–3321 (2012).

  219. Mooij, W. T. & Verdonk, M. L. General and targeted statistical potentials for protein–ligand interactions. Proteins 61, 272–287 (2005).

  220. Dittrich, J., Schmidt, D., Pfleger, C. & Gohlke, H. Converging a knowledge-based scoring function: DrugScore2018. J. Chem. Inf. Model 59, 509–521 (2018).

  221. Pierce, B. & Weng, Z. ZRANK: reranking protein docking predictions with an optimized energy function. Proteins 67, 1078–1086 (2007).

  222. Pierce, B. & Weng, Z. A combination of rescoring and refinement significantly improves protein docking performance. Proteins 72, 270–279 (2008).

  223. Alford, R. F. et al. The Rosetta all-atom energy function for macromolecular modeling and design. J. Chem. Theory Comput. 13, 3031–3048 (2017).

  224. Grosdidier, S., Pons, C., Solernou, A. & Fernández-Recio, J. Prediction and scoring of docking poses with pyDock. Proteins 69, 852–858 (2007).

  225. Pons, C., Talavera, D., De La Cruz, X., Orozco, M. & Fernandez-Recio, J. Scoring by intermolecular pairwise propensities of exposed residues (SIPPER): a new efficient potential for protein−protein docking. J. Chem. Inf. Model 51, 370–377 (2011).

  226. Viswanath, S., Ravikant, D. & Elber, R. Improving ranking of models for protein complexes with side chain modeling and atomic potentials. Proteins 81, 592–606 (2013).

  227. Ravikant, D. & Elber, R. PIE—efficient filters and coarse grained potentials for unbound protein–protein docking. Proteins 78, 400–419 (2010).

  228. Andrusier, N., Nussinov, R. & Wolfson, H. J. FireDock: fast interaction refinement in molecular docking. Proteins 69, 139–159 (2007).

  229. Dubochet, J. et al. Cryo-electron microscopy of vitrified specimens. Q. Rev. Biophys. 21, 129–228 (1988).

  230. Frank, J. et al. SPIDER and WEB: processing and visualization of images in 3D electron microscopy and related fields. J. Struct. Biol. 116, 190–199 (1996).

  231. Ludtke, S. J., Baldwin, P. R. & Chiu, W. EMAN: semiautomated software for high-resolution single-particle reconstructions. J. Struct. Biol. 128, 82–97 (1999).

  232. Scheres, S. H. RELION: implementation of a Bayesian approach to cryo-EM structure determination. J. Struct. Biol. 180, 519–530 (2012).

  233. Nogales, E. & Scheres, S. H. Cryo-EM: a unique tool for the visualization of macromolecular complexity. Mol. Cell 58, 677–689 (2015).

  234. Fernandez-Leiro, R. & Scheres, S. H. Unravelling biological macromolecules with cryo-electron microscopy. Nature 537, 339–346 (2016).

  235. Merk, A. et al. Breaking cryo-EM resolution barriers to facilitate drug discovery. Cell 165, 1698–1707 (2016).

  236. Zhong, E. D., Bepler, T., Davis, J. H. & Berger, B. Reconstructing continuous distributions of 3D protein structure from cryo-EM images. Preprint at arXiv https://doi.org/10.48550/arXiv.1909.05215 (2019).

  237. Ronneberger, O., Fischer, P. & Brox, T. U-net: convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015) 18th Int. Conf. Proc. Part III 234–241 (Springer, 2015).

  238. Waibel, D. J. E. et al. SHAPR—an AI approach to predict 3D cell shapes from 2D microscopic images. iScience https://doi.org/10.1016/j.isci.2022.105298 (2022).

  239. Waibel, D. J., Atwell, S., Meier, M., Marr, C. & Rieck, B. Capturing shape information with multi-scale topological loss terms for 3D reconstruction. In Medical Image Computing and Computer Assisted Intervention (MICCAI 2022) 25th Int. Conf. Proc. Part IV 150–159 (Springer, 2022).

  240. Van Dijk, D. et al. Recovering gene interactions from single-cell data using data diffusion. Cell 174, 716–729.e27 (2018).

  241. Wagner, F., Yan, Y. & Yanai, I. K-nearest neighbor smoothing for high-throughput single-cell RNA-seq data. Preprint at bioRxiv https://doi.org/10.1101/217737 (2017).

  242. Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).

  243. Trieu, T. & Cheng, J. 3D genome structure modeling by Lorentzian objective function. Nucleic Acids Res. 45, 1049–1058 (2017).

  244. Trieu, T. & Cheng, J. MOGEN: a tool for reconstructing 3D models of genomes from chromosomal conformation capturing data. Bioinformatics 32, 1286–1292 (2016).

  245. Highsmith, M. & Cheng, J. VEHiCLE: a variationally encoded Hi-C loss enhancement algorithm for improving and generating Hi-C data. Sci. Rep. 11, 1–13 (2021).

  246. Wang, Y., Guo, Z. & Cheng, J. Single-cell Hi-C data enhancement with deep residual and generative adversarial networks. Bioinformatics 39, btad458 (2023).

  247. Taskiran, I. I., Spanier, K. I., Christiaens, V., Mauduit, D. & Aerts, S. Cell type directed design of synthetic enhancers. Preprint at bioRxiv https://doi.org/10.1101/2022.07.26.501466 (2022).

  248. Tang, G. et al. EMAN2: an extensible image processing suite for electron microscopy. J. Struct. Biol. 157, 38–46 (2007).

  249. Al-Azzawi, A. et al. DeepCryoPicker: fully automated deep neural network for single protein particle picking in cryo-EM. BMC Bioinform. 21, 1–38 (2020).

  250. Kawar, B., Elad, M., Ermon, S. & Song, J. Denoising diffusion restoration models. Adv. Neural Inf. Process. Syst. 35, 23593–23606 (2022).

  251. Evans, R. et al. Protein complex prediction with AlphaFold-Multimer. Preprint at bioRxiv https://doi.org/10.1101/2021.10.04.463034 (2021).

  252. Huang, C.-W., Lim, J. H. & Courville, A. C. A variational perspective on diffusion-based generative models and score matching. Adv. Neural Inf. Process. Syst. 34, 22863–22876 (2021).

  253. Kim, D., Shin, S., Song, K., Kang, W. & Moon, I.-C. Soft truncation: a universal training technique of score-based diffusion model for High Precision Score Estimation. Preprint at arXiv https://doi.org/10.48550/arXiv.2106.05527 (2021).

  254. Gu, S. et al. Vector quantized diffusion model for text-to-image synthesis. In Proc. IEEE/CVF Conf. on Computer Vision and Pattern Recognition 10696–10706 (IEEE, 2022).

  255. Tang, Z., Gu, S., Bao, J., Chen, D. & Wen, F. Improved vector quantized diffusion models. Preprint at arXiv https://doi.org/10.48550/arXiv.2205.16007 (2022).

  256. Poole, B., Jain, A., Barron, J. T. & Mildenhall, B. DreamFusion: text-to-3D using 2D diffusion. Preprint at arXiv https://doi.org/10.48550/arXiv.2209.14988 (2022).

  257. Hong, S., Lee, G., Jang, W. & Kim, S. Improving sample quality of diffusion models using self-attention guidance. Preprint at arXiv https://doi.org/10.48550/arXiv.2210.00939 (2022).

  258. Li, W. Automatic segmentation of liver tumor in CT images with deep convolutional neural networks. J. Comput. Commun. 3, 146 (2015).

  259. Menze, B. H. et al. The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Trans. Med. Imaging 34, 1993–2024 (2014).

  260. Cheng, J. et al. Superpixel classification based optic disc and optic cup segmentation for glaucoma screening. IEEE Trans. Med. Imaging 32, 1019–1032 (2013).

  261. Wang, S. et al. Central focused convolutional neural networks: developing a data-driven model for lung nodule segmentation. Med. Image Anal. 40, 172–183 (2017).

  262. Srinivasu, P. N. et al. Classification of skin disease using deep learning neural networks with MobileNet V2 and LSTM. Sensors 21, 2852 (2021).

  263. Swapna, G., Vinayakumar, R. & Soman, K. Diabetes detection using deep learning algorithms. ICT Express 4, 243–246 (2018).

  264. Das, A., Acharya, U. R., Panda, S. S. & Sabut, S. Deep learning based liver cancer detection using watershed transform and Gaussian mixture model techniques. Cogn. Syst. Res. 54, 165–175 (2019).

  265. Jo, T., Nho, K. & Saykin, A. J. Deep learning in Alzheimer’s disease: diagnostic classification and prognostic prediction using neuroimaging data. Front. Aging Neurosci. 11, 220 (2019).

  266. Arévalo, A., Niño, J., Hernández, G. & Sandoval, J. High-frequency trading strategy based on deep neural networks. In Int. Conf. on Intelligent Computing 424–436 (Springer, 2016).

  267. Bao, W., Yue, J. & Rao, Y. A deep learning framework for financial time series using stacked autoencoders and long-short term memory. PLoS One 12, e0180944 (2017).

  268. Xiao, Q., Li, K., Zhang, D. & Xu, W. Security risks in deep learning implementations. In 2018 IEEE Security and Privacy Workshops (SPW) 123–128 (IEEE, 2018).

  269. Halstead, M., Ahmadi, A., Smitt, C., Schmittmann, O. & McCool, C. Crop agnostic monitoring driven by deep learning. Front. Plant Sci. 12, 786702 (2021).

  270. Feng, A., Zhou, J., Vories, E. & Sudduth, K. A. Evaluation of cotton emergence using UAV-based imagery and deep learning. Comput. Electron. Agric. 177, 105711 (2020).

  271. Liu, J. & Wang, X. Plant diseases and pests detection based on deep learning: a review. Plant Methods 17, 22 (2021).

  272. Brown, T. et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33, 1877–1901 (2020).

  273. Nichol, A. et al. Glide: towards photorealistic image generation and editing with text-guided diffusion models. Preprint at arXiv https://doi.org/10.48550/arXiv.2112.10741 (2021).

Acknowledgements

The work was partly supported by the US National Institutes of Health (grants R01GM146340 (to J.C.), R01GM093123 (to J.C.) and R35GM126985 (to D.X.)) and the US National Science Foundation (grant DBI2308699 (to J.C.)).

Author information

Contributions

Z.G., J.L., Y.W. and M.C. collected data. J.C., D.X. and D.W. provided guidance on organizing the content. J.C. envisioned the developments for proteins, 3D genomics, single-cell Hi-C data analytics, cryo-EM, DNA design, peptide, proteomics and metabolomics. D.X. envisioned the developments in single-cell reconstruction and inference. Z.G., J.C., J.L., Y.W., D.X., D.W. and M.C. wrote and edited the manuscript.

Corresponding author

Correspondence to Jianlin Cheng.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Reviews Bioengineering thanks Bonnie Berger, Daniel Lazarev and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Guo, Z., Liu, J., Wang, Y. et al. Diffusion models in bioinformatics and computational biology. Nat Rev Bioeng 2, 136–154 (2024). https://doi.org/10.1038/s44222-023-00114-9
