
  • Review Article
Diffusion models in bioinformatics and computational biology

Abstract

Denoising diffusion models are a form of generative artificial intelligence with applications in computer vision, natural language processing and bioinformatics. In this Review, we introduce the key concepts and theoretical foundations of three diffusion modelling frameworks: denoising diffusion probabilistic models, noise-conditioned score networks and score stochastic differential equations. We then explore their applications in bioinformatics and computational biology, including protein design and generation, drug and small-molecule design, protein–ligand interaction modelling, cryo-electron microscopy image data analysis and single-cell data analysis. Finally, we highlight open-source diffusion model tools and consider the future applications of diffusion models in bioinformatics.

Key points

  • Diffusion models are a generative artificial intelligence technology that can be applied in natural language processing, image synthesis and bioinformatics.

  • Diffusion models have contributed greatly to computational protein design and generation, drug and small-molecule design, protein–ligand interaction modelling, cryo-electron microscopy data enhancement and single-cell data analysis.

  • Many diffusion models are also available as open-source tools.

  • Although diffusion models can outperform other generative approaches, such as generative adversarial networks and variational auto-encoders, their computational resource requirements remain high.


Fig. 1: Timeline of advances in diffusion models and their applications in bioinformatics.
Fig. 2: Forward and reverse processes of diffusion models.
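The forward and reverse processes illustrated in Fig. 2 can be sketched minimally for a denoising diffusion probabilistic model. This is an illustrative sketch only, assuming a linear variance schedule and using the true forward noise as a stand-in for a learned noise predictor; in practice the predictor is a trained neural network.

```python
import numpy as np

# Linear variance schedule beta_t over T steps (an assumed, common choice).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # \bar{alpha}_t = prod_{s<=t} alpha_s

def forward_sample(x0, t, rng):
    """Forward process: sample x_t ~ q(x_t | x_0)
    = N(sqrt(abar_t) * x0, (1 - abar_t) * I)."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps

def reverse_step(xt, t, eps_pred, rng):
    """Reverse process: one ancestral sampling step of
    p(x_{t-1} | x_t), given a predicted noise eps_pred."""
    coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
    mean = (xt - coef * eps_pred) / np.sqrt(alphas[t])
    if t == 0:
        return mean  # final step returns the mean only
    return mean + np.sqrt(betas[t]) * rng.standard_normal(xt.shape)

rng = np.random.default_rng(0)
x0 = np.zeros(4)                          # toy "data" point
xt, eps = forward_sample(x0, T - 1, rng)  # fully noised sample
# Using the true forward noise in place of a learned predictor,
# one reverse step moves x_t back toward the data distribution.
x_prev = reverse_step(xt, T - 1, eps, rng)
```

In a real model the `eps_pred` argument would come from a trained network evaluated at `(xt, t)`, and sampling would iterate `reverse_step` from `t = T - 1` down to `t = 0`.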


References

  1. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015). This article provides a comprehensive overview of the advances, challenges and potential of deep learning methods.

    Article  CAS  PubMed  ADS  Google Scholar 

  2. Eickholt, J. & Cheng, J. Predicting protein residue–residue contacts using deep networks and boosting. Bioinformatics 28, 3066–3072 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  3. Baek, M. & Baker, D. Deep learning and protein structure modeling. Nat. Methods 19, 13–14 (2022).

    Article  CAS  PubMed  Google Scholar 

  4. Kulmanov, M. & Hoehndorf, R. DeepGOPlus: improved protein function prediction from sequence. Bioinformatics 36, 422–429 (2020).

    Article  CAS  PubMed  Google Scholar 

  5. Aggarwal, D. & Hasija, Y. A review of deep learning techniques for protein function prediction. Preprint at arXiv https://doi.org/10.48550/arXiv.2211.09705 (2022).

  6. Gligorijević, V. et al. Structure-based protein function prediction using graph convolutional networks. Nat. Commun. 12, 3168 (2021).

    Article  PubMed  PubMed Central  ADS  Google Scholar 

  7. Bileschi, M. L. et al. Using deep learning to annotate the protein universe. Nat. Biotechnol. 40, 932–937 (2022).

    Article  CAS  PubMed  Google Scholar 

  8. Cai, Y., Wang, J. & Deng, L. SDN2GO: an integrated deep learning model for protein function prediction. Front. Bioeng. Biotechnol. 8, 391 (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  9. Ko, C. W., Huh, J. & Park, J.-W. Deep learning program to predict protein functions based on sequence information. MethodsX 9, 101622 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  10. Dhakal, A., McKay, C., Tanner, J. J. & Cheng, J. Artificial intelligence in the prediction of protein–ligand interactions: recent advances and future directions. Brief. Bioinform. 23, bbab476 (2022).

    Article  PubMed  Google Scholar 

  11. Verma, N. et al. Ssnet: a deep learning approach for protein–ligand interaction prediction. Int. J. Mol. Sci. 22, 1392 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  12. Gomes, J., Ramsundar, B., Feinberg, E. N. & Pande, V. S. Atomic convolutional networks for predicting protein–ligand binding affinity. Preprint at arXiv https://doi.org/10.48550/arXiv.1703.10603 (2017).

  13. Jiménez, J., Skalic, M., Martinez-Rosell, G. & De Fabritiis, G. Kdeep: protein–ligand absolute binding affinity prediction via 3D-convolutional neural networks. J. Chem. Inf. Model 58, 287–296 (2018).

    Article  PubMed  Google Scholar 

  14. Öztürk, H., Özgür, A. & Ozkirimli, E. DeepDTA: deep drug–target binding affinity prediction. Bioinformatics 34, i821–i829 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  15. Zhou, J. et al. Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk. Nat. Genet. 50, 1171–1179 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  16. Avsec, Ž. et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat. Methods 18, 1196–1203 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  17. Zrimec, J. et al. Deep learning suggests that gene expression is encoded in all parts of a co-evolving interacting gene regulatory structure. Nat. Commun. 11, 6141 (2020).

    Article  CAS  PubMed  PubMed Central  ADS  Google Scholar 

  18. Yuan, Y. & Bar-Joseph, Z. Deep learning for inferring gene relationships from single-cell expression data. Proc. Natl Acad. Sci. 116, 27151–27158 (2019).

    Article  CAS  PubMed  PubMed Central  ADS  Google Scholar 

  19. Khan, A. & Lee, B. Gene transformer: transformers for the gene expression-based classification of lung cancer subtypes. Preprint at arXiv https://doi.org/10.48550/arXiv.2108.11833 (2021).

  20. Singh, R., Lanchantin, J., Robins, G. & Qi, Y. DeepChrome: deep-learning for predicting gene expression from histone modifications. Bioinformatics 32, i639–i648 (2016).

    Article  CAS  PubMed  Google Scholar 

  21. Shu, H. et al. Modeling gene regulatory networks using neural network architectures. Nat. Comput. Sci. 1, 491–501 (2021).

    Article  PubMed  Google Scholar 

  22. Razaghi-Moghadam, Z. & Nikoloski, Z. Supervised learning of gene-regulatory networks based on graph distance profiles of transcriptomics data. npj Syst. Biol. Appl. 6, 21 (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  23. Chen, C. et al. DeepGRN: prediction of transcription factor binding site across cell-types using attention-based deep neural networks. BMC Bioinform. 22, 38 (2021).

    CAS  Google Scholar 

  24. Xu, R., Zhang, L. & Chen, Y. CdtGRN: Construction of qualitative time-delayed gene regulatory networks with a deep learning method. Preprint at arXiv https://doi.org/10.48550/arXiv.2111.00287 (2021).

  25. Kwon, M. S., Lee, B. T., Lee, S. Y. & Kim, H. U. Modeling regulatory networks using machine learning for systems metabolic engineering. Curr. Opin. Biotechnol. 65, 163–170 (2020).

    Article  CAS  PubMed  Google Scholar 

  26. LeCun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998).

    Article  Google Scholar 

  27. Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).

    Article  CAS  PubMed  Google Scholar 

  28. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition 770–778 (IEEE, 2016).

  29. Goodfellow, I. et al. Generative adversarial networks. Commun. ACM 63, 139–144 (2020).

    Article  Google Scholar 

  30. Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M. & Monfardini, G. The graph neural network model. IEEE Trans. Neural Netw. 20, 61–80 (2008).

    Article  PubMed  Google Scholar 

  31. Vaswani, A. et al. Attention is All you Need. In Advances in Neural Information Processing Systems Vol. 30 (eds Guyon, I. et al.) (Curran Associates, 2017).

  32. Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N. & Ganguli, S. Deep Unsupervised Learning using Nonequilibrium Thermodynamics. In Proc. 32nd Int. Conference on Machine Learning Vol. 37 (eds Bach, F. & Blei, D.) 2256–2265 (PMLR, 2015).

  33. Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 33, 6840–6851 (2020). This article introduces the denoising diffusion probabilistic model, which was the first diffusion model capable of generating high-resolution data.

    Google Scholar 

  34. Song, Y. & Ermon, S. Generative Modeling by Estimating Gradients of the Data Distribution. In Advances in Neural Information Processing Systems Vol. 32 (eds Wallach, H. et al.) (Curran Associates, 2019). This article introduces the noise-conditioned score network, which is one of the three main diffusion model frameworks.

  35. Song, Y. et al. Score-based generative modeling through stochastic differential equations. Preprint at arXiv https://doi.org/10.48550/arXiv.2011.13456 (2020). This article introduces score stochastic differential equations for unconditional image generation.

  36. Rombach, R., Blattmann, A., Lorenz, D., Esser, P. & Ommer, B. High-resolution image synthesis with latent diffusion models. In Proc. IEEE/CVF Conf. on Computer Vision and Pattern Recognition 10684–10695 (2022). This article reports stable diffusion for image inpainting, class-conditional image synthesis and other tasks, including text-to-image synthesis and unconditional image generation.

  37. Saharia, C. et al. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding. In Advances in Neural Information Processing Systems Vol. 35 (eds Koyejo, S. et al.) 36479–36494 (Curran Associates, 2022).

  38. Wang, Z., Zheng, H., He, P., Chen, W. & Zhou, M. Diffusion-GAN: Training GANs with diffusion. Preprint at arXiv https://doi.org/10.48550/arXiv.2206.02262 (2022).

  39. Zheng, H., He, P., Chen, W. & Zhou, M. Truncated diffusion probabilistic models and diffusion-based adversarial auto-encoders. Preprint at arXiv https://doi.org/10.48550/arXiv.2202.09671 (2022).

  40. Xie, P. et al. Vector quantized diffusion model with CodeUnet for text-to-sign pose sequences generation. Preprint at arXiv https://doi.org/10.48550/arXiv.2208.09141 (2022).

  41. Kim, D., Kim, Y., Kang, W. & Moon, I.-C. Refining generative process with discriminator guidance in score-based diffusion models. Preprint at arXiv https://doi.org/10.48550/arXiv.2211.17091 (2022).

  42. Zheng, G. et al. Entropy-driven sampling and training scheme for conditional diffusion generation. In Eur. Conf. on Computer Vision 754–769 (Springer, 2022).

  43. Saharia, C. et al. Palette: image-to-image diffusion models. In ACM SIGGRAPH ‘22 Conf. Proc. https://doi.org/10.1145/3528233.3530757 (ACM, 2022).

  44. Wang, Y., Yu, J. & Zhang, J. Zero-shot image restoration using denoising diffusion null-space model. Preprint at arXiv https://doi.org/10.48550/arXiv.2212.00490 (2022).

  45. Lam, M. W., Wang, J., Su, D. & Yu, D. BDDM: bilateral denoising diffusion models for fast and high-quality speech synthesis. Preprint at arXiv https://doi.org/10.48550/arXiv.2203.13508 (2022).

  46. van den Oord, A. et al. Conditional Image Generation with PixelCNN Decoders. In Advances in Neural Information Processing Systems Vol. 29 (eds Lee, D. et al.) (Curran Associates, 2016).

  47. Papamakarios, G., Nalisnick, E. T., Rezende, D. J., Mohamed, S. & Lakshminarayanan, B. Normalizing flows for probabilistic modeling and inference. J. Mach. Learn. Res. 22, 1–64 (2021).

    MathSciNet  Google Scholar 

  48. LeCun, Y., Chopra, S., Hadsell, R., Ranzato, M. & Huang, F. A tutorial on energy-based learning. In Predicting Structured Data (eds Bakir, G., Hofman, T., Schölkopf, B., Smola, A. & Taskar, B.) Vol. 1 (MIT Press, 2006).

  49. Kingma, D. P. & Welling, M. Auto-encoding variational bayes. Preprint at arXiv https://doi.org/10.48550/arXiv.1312.6114 (2013).

  50. Dhariwal, P. & Nichol, A. Diffusion models beat GANs on image synthesis. Adv. Neural Inf. Process. Syst. 34, 8780–8794 (2021).

    Google Scholar 

  51. Li, H. et al. SRDiff: single image super-resolution with diffusion probabilistic models. Neurocomputing 479, 47–59 (2022).

    Article  Google Scholar 

  52. Giannone, G., Nielsen, D. & Winther, O. Few-shot diffusion models. Preprint at arXiv https://doi.org/10.48550/arXiv.2205.15463 (2022).

  53. Lyu, Z., Kong, Z., Xu, X., Pan, L. & Lin, D. A conditional point diffusion-refinement paradigm for 3d point cloud completion. Preprint at arXiv https://doi.org/10.48550/arXiv.2112.03530 (2021).

  54. Hoogeboom, E., Satorras, V. c. G., Vignac, C. & Welling, M. Equivariant Diffusion for Molecule Generation in 3D. In Proc. 39th Int. Conference on Machine Learning Vol. 162 (eds Chaudhuri, K. et al.) 8867–8887 (PMLR, 2022). This article reports a foundational diffusion model that directly generates molecules in 3D space based on an equivariant graph neural network architecture.

  55. Li, X., Thickstun, J., Gulrajani, I., Liang, P. S. & Hashimoto, T. B. Diffusion-LM Improves Controllable Text Generation. In Advances in Neural Information Processing Systems Vol. 35 (eds Koyejo, S. et al.) 4328–4343 (Curran Associates, 2022).

  56. Amit, T., Nachmani, E., Shaharbany, T. & Wolf, L. SegDiff: image segmentation with diffusion probabilistic models. Preprint at arXiv https://doi.org/10.48550/arXiv.2112.00390 (2021).

  57. Baranchuk, D., Rubachev, I., Voynov, A., Khrulkov, V. & Babenko, A. Label-efficient semantic segmentation with diffusion models. Preprint at arXiv https://doi.org/10.48550/arXiv.2112.03126 (2021).

  58. Brempong, E. A. et al. Denoising pretraining for semantic segmentation. In Proc. IEEE/CVF Conf. on Computer Vision and Pattern Recognition 4175–4186 (IEEE, 2022).

  59. Cai, R. et al. Learning gradient fields for shape generation. In Eur. Conf. on Computer Vision 364–381 (Springer, 2020).

  60. Ho, J. et al. Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23, 1–33 (2022).

    MathSciNet  CAS  Google Scholar 

  61. Ho, J. et al. Video diffusion models. Preprint at arXiv https://doi.org/10.48550/arXiv.2204.03458 (2022).

  62. Kawar, B., Vaksman, G. & Elad, M. Stochastic image denoising by sampling from the posterior distribution. In Proc. IEEE/CVF Int. Conf. on Computer Vision 1866–1875 (2021).

  63. Kim, B., Han, I. & Ye, J. C. DiffuseMorph: unsupervised deformable image registration along continuous trajectory using diffusion models. Preprint at arXiv https://doi.org/10.48550/arXiv.2112.05149 (2021).

  64. Luo, S. & Hu, W. Score-based point cloud denoising. In Proc. IEEE/CVF Int. Conf. on Computer Vision 4583–4592 (IEEE, 2021).

  65. Meng, C. et al. Sdedit: Guided image synthesis and editing with stochastic differential equations. Preprint at arXiv https://doi.org/10.48550/arXiv.2108.01073 (2021).

  66. Özbey, M. et al. Unsupervised medical image translation with adversarial diffusion models. Preprint at arXiv https://doi.org/10.48550/arXiv.2207.08208 (2023).

  67. Saharia, C. et al. Image super-resolution via iterative refinement. In IEEE Trans. on Pattern Analysis and Machine Intelligence 4713–4726 (IEEE, 2022).

  68. Whang, J. et al. Deblurring via stochastic refinement. In Proc. IEEE/CVF Conf. on Computer Vision and Pattern Recognition 16293–16303 (IEEE, 2022).

  69. Yang, R. & Mandt, S. Lossy image compression with conditional diffusion models. Preprint at arXiv https://doi.org/10.48550/arXiv.2209.06950 (2022).

  70. Zhao, M., Bao, F., Chongxuan, L. I. & Zhu, J. EGSDE: Unpaired Image-to-Image Translation via Energy-Guided Stochastic Differential Equations. In Advances in Neural Information Processing Systems Vol. 35 (eds Koyejo, S. et al.) 3609–3623 (Curran Associates, 2022).

  71. Zimmermann, R. S., Schott, L., Song, Y., Dunn, B. A. & Klindt, D. A. Score-based generative classifiers. Preprint at arXiv https://doi.org/10.48550/arXiv.2110.00473 (2021).

  72. Austin, J., Johnson, D. D., Ho, J., Tarlow, D. & van den Berg, R. Structured denoising diffusion models in discrete state-spaces. Adv. Neural Inf. Process. Syst. 34, 17981–17993 (2021).

    Google Scholar 

  73. Hoogeboom, E., Nielsen, D., Jaini, P., Forré, P. & Welling, M. Argmax flows and multinomial diffusion: learning categorical distributions. Adv. Neural Inf. Process. Syst. 34, 12454–12465 (2021).

    Google Scholar 

  74. Savinov, N., Chung, J., Binkowski, M., Elsen, E. & Oord, A. V. D. Step-unrolled denoising autoencoders for text generation. Preprint at arXiv https://doi.org/10.48550/arXiv.2112.06749 (2021).

  75. Yu, P. et al. Latent diffusion energy-based model for interpretable text modeling. Preprint at arXiv https://doi.org/10.48550/arXiv.2206.05895 (2022).

  76. Alcaraz, J. M. L. & Strodthoff, N. Diffusion-based time series imputation and forecasting with structured state space models. Preprint at arXiv https://doi.org/10.48550/arXiv.2208.09399 (2022).

  77. Chen, N. et al. WaveGrad: estimating gradients for waveform generation. Preprint at arXiv https://doi.org/10.48550/arXiv.2009.00713 (2020).

  78. Kong, Z., Ping, W., Huang, J., Zhao, K. & Catanzaro, B. DiffWave: a versatile diffusion model for audio synthesis. Preprint at arXiv https://doi.org/10.48550/arXiv.2009.09761 (2020).

  79. Rasul, K., Sheikh, A.-S., Schuster, I., Bergmann, U. & Vollgraf, R. Multivariate probabilistic time series forecasting via conditioned normalizing flows. Preprint at arXiv https://doi.org/10.48550/arXiv.2002.06103 (2020).

  80. Tashiro, Y., Song, J., Song, Y. & Ermon, S. CSDI: conditional score-based diffusion models for probabilistic time series imputation. Adv. Neural Inf. Process. Syst. 34, 24804–24816 (2021).

    Google Scholar 

  81. Yan, T., Zhang, H., Zhou, T., Zhan, Y. & Xia, Y. ScoreGrad: multivariate probabilistic time series forecasting with continuous energy-based generative models. Preprint at arXiv https://doi.org/10.48550/arXiv.2106.10121 (2021).

  82. Avrahami, O., Lischinski, D. & Fried, O. Blended diffusion for text-driven editing of natural images. In Proc. IEEE/CVF Conf. on Computer Vision and Pattern Recognition 18208–18218 (IEEE, 2022).

  83. Ramesh, A., Dhariwal, P., Nichol, A., Chu, C. & Chen, M. Hierarchical text-conditional image generation with clip latents. Preprint at arXiv https://doi.org/10.48550/arXiv.2204.06125 (2022).

  84. Anand, N. & Achim, T. Protein structure and sequence generation with equivariant denoising diffusion probabilistic models. Preprint at arXiv https://doi.org/10.48550/arXiv.2205.15019 (2022).

  85. Cao, C., Cui, Z.-X., Liu, S., Liang, D. & Zhu, Y. High-frequency space diffusion models for accelerated MRI. Preprint at arXiv https://doi.org/10.48550/arXiv.2208.05481 (2022).

  86. Chung, H., Lee, E. S. & Ye, J. C. MR image denoising and super-resolution using regularized reverse diffusion. IEEE Trans. Med. Imaging 42, 922–934 (2022).

    Article  Google Scholar 

  87. Chung, H., Sim, B. & Ye, J. C. Come-closer-diffuse-faster: accelerating conditional diffusion models for inverse problems through stochastic contraction. In Proc. IEEE/CVF Conf. on Computer Vision and Pattern Recognition 12413–12422 (IEEE, 2022).

  88. Chung, H. & Ye, J. C. Score-based diffusion models for accelerated MRI. Med. Image Anal. 80, 102479 (2022).

    Article  PubMed  Google Scholar 

  89. Güngör, A. et al. Adaptive diffusion priors for accelerated MRI reconstruction. Med. Image Anal. 88, 102872 (2023).

    Article  PubMed  Google Scholar 

  90. Jing, B., Corso, G., Chang, J., Barzilay, R. & Jaakkola, T. Torsional Diffusion for Molecular Conformer Generation. In Advances in Neural Information Processing Systems Vol. 35 (eds Koyejo, S. et al.) 24240–24253 (Curran Associates, 2022).

  91. Lee, J. S. & Kim, P. M. ProteinSGM: score-based generative modeling for de novo protein design. Preprint at bioRxiv https://doi.org/10.1101/2022.07.13.499967 (2022).

  92. Luo, S. et al. Antigen-Specific Antibody Design and Optimization with Diffusion-Based Generative Models for Protein Structures. In Advances in Neural Information Processing Systems Vol. 35 (eds Koyejo, S. et al.) 9754–9767 (Curran Associates, 2022).

  93. Mei, S., Fan, F. & Maier, A. Metal inpainting in CBCT projections using score-based generative model. Preprint at arXiv https://doi.org/10.48550/arXiv.2209.09733 (2022).

  94. Du, Y. & Mordatch, I. Implicit Generation and Modeling with Energy Based Models. In Advances in Neural Information Processing Systems Vol. 32 (eds Wallach, H. et al.) (Curran Associates, 2019).

  95. Brock, A., Donahue, J. & Simonyan, K. Large scale GAN training for high fidelity natural image synthesis. Preprint at arXiv https://doi.org/10.48550/arXiv.1809.11096 (2018).

  96. Karras, T. et al. Training generative adversarial networks with limited data. Adv. Neural Inf. Process. Syst. 33, 12104–12114 (2020).

    Google Scholar 

  97. Song, J., Meng, C. & Ermon, S. Denoising diffusion implicit models. Preprint at arXiv https://doi.org/10.48550/arXiv.2010.02502 (2020).

  98. Kreis, K., Dockhorn, T., Li, Z. & Zhong, E. Latent space diffusion models of cryo-EM structures. Preprint at arXiv https://doi.org/10.48550/arXiv.2211.14169 (2022).

  99. Waibel, D. J., Röell, E., Rieck, B., Giryes, R. & Marr, C. A diffusion model predicts 3D shapes from 2D microscopy images. Preprint at arXiv https://doi.org/10.48550/arXiv.2208.14125 (2022).

  100. Tjärnberg, A. et al. Optimal tuning of weighted kNN- and diffusion-based methods for denoising single cell genomics data. PLoS Comput. Biol. 17, e1008569 (2021).

    Article  PubMed  PubMed Central  Google Scholar 

  101. Wu, K. E. et al. Protein structure generation via folding diffusion. Preprint at arXiv https://doi.org/10.48550/arXiv.2209.15611 (2022).

  102. Gao, Z., Tan, C. & Li, S. Z. DiffSDS: a language diffusion model for protein backbone inpainting under geometric conditions and constraints. Preprint at arXiv https://doi.org/10.48550/arXiv.2301.09642 (2023).

  103. Lin, Y. & AlQuraishi, M. Generating novel, designable, and diverse protein structures by equivariantly diffusing oriented residue clouds. Preprint at arXiv https://doi.org/10.48550/arXiv.2301.12485 (2023).

  104. Trippe, B. L. et al. Diffusion probabilistic modeling of protein backbones in 3D for the motif-scaffolding problem. Preprint at arXiv https://doi.org/10.48550/arXiv.2206.04119 (2022).

  105. Watson, J. L. et al. De novo design of protein structure and function with RFdiffusion. Nature 620, 1089–1100 (2023). This article presents RFdiffusion, which can be applied to complex protein-generation tasks.

    Article  CAS  PubMed  PubMed Central  ADS  Google Scholar 

  106. Yim, J. et al. SE(3) diffusion model with application to protein backbone generation. Preprint at arXiv https://doi.org/10.48550/arXiv.2302.02277 (2023).

  107. Ingraham, J. et al. Illuminating protein space with a programmable generative model. Preprint at bioRxiv https://doi.org/10.1101/2022.12.01.518682 (2022). This article reports the graph-neural-network-based conditional diffusion model Chroma, which can generate large single-chain proteins and protein complexes with programmable properties and functions.

  108. Huang, H., Sun, L., Du, B. & Lv, W. Conditional diffusion based on discrete graph structures for molecular graph generation. Preprint at arXiv https://doi.org/10.48550/arXiv.2301.00427 (2023).

  109. Wu, L., Gong, C., Liu, X., Ye, M. & Liu, Q. Diffusion-based Molecule Generation with Informative Prior Bridges. In Advances in Neural Information Processing Systems Vol. 35 (eds Koyejo, S. et al.) 36533–36545 (Curran Associates, 2022).

  110. Luo, S., Shi, C., Xu, M. & Tang, J. Predicting molecular conformation via dynamic graph score matching. Adv. Neural Inf. Process. Syst. 34, 19784–19795 (2021).

    Google Scholar 

  111. Zhang, H. et al. SDEGen: learning to evolve molecular conformations from thermodynamic noise for conformation generation. Chem. Sci. 14, 1557–1568 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  112. Wu, F. & Li, S. Z. DIFFMD: a geometric diffusion model for molecular dynamics simulations. In Proc. AAAI Conference Artificial Intelligence 37, 5321–5329 (2003).

    Article  Google Scholar 

  113. Igashov, I. et al. Equivariant 3D-conditional diffusion models for molecular linker design. Preprint at arXiv https://doi.org/10.48550/arXiv.2210.05274 (2022).

  114. Lin, H. et al. DiffBP: generative diffusion of 3D molecules for target protein binding. Preprint at arXiv https://doi.org/10.48550/arXiv.2211.11214 (2022).

  115. Schneuing, A. et al. Structure-based drug design with equivariant diffusion models. Preprint at arXiv https://doi.org/10.48550/arXiv.2210.13695 (2022).

  116. Corso, G., Stärk, H., Jing, B., Barzilay, R. & Jaakkola, T. DiffDock: diffusion steps, twists, and turns for molecular docking. Preprint at arXiv https://doi.org/10.48550/arXiv.2210.01776 (2022). This article presents the diffusion model DiffDock for protein pocket docking.

  117. Qiao, Z., Nie, W., Vahdat, A., Miller III, T. F. & Anandkumar, A. Dynamic-backbone protein–ligand structure prediction with multiscale generative diffusion models. Preprint at arXiv https://doi.org/10.48550/arXiv.2209.15171 (2022).

  118. Jin, W., Sarkizova, S., Chen, X., Hacohen, N. & Uhler, C. Unsupervised protein–ligand binding energy prediction via neural Euler’s rotation equation. Preprint at arXiv https://doi.org/10.48550/arXiv.2301.10814 (2023).

  119. Song, Y. & Ermon, S. Improved techniques for training score-based generative models. Adv. Neural Inf. Process. Syst. 33, 12438–12448 (2020).

    Google Scholar 

  120. Song, Y., Durkan, C., Murray, I. & Ermon, S. Maximum likelihood training of score-based diffusion models. Adv. Neural Inf. Process. Syst. 34, 1415–1428 (2021).

    Google Scholar 

  121. Hyvärinen, A. & Dayan, P. Estimation of non-normalized statistical models by score matching. J. Mach. Learn. Res. 6, 95–709 (2005).

  122. Raphan, M. & Simoncelli, E. P. Least squares estimation without priors or supervision. Neural Comput. 23, 374–420 (2011).

    Article  MathSciNet  PubMed  Google Scholar 

  123. Raphan, M. & Simoncelli, E. Learning to be Bayesian without Supervision. In Advances in Neural Information Processing Systems Vol. 19 (eds Scholkopf, B., Platt, J. & Hoffman, T) (MIT Press, 2006).

  124. Vincent, P. A connection between score matching and denoising autoencoders. Neural Comput. 23, 1661–1674 (2011).

    Article  MathSciNet  PubMed  ADS  Google Scholar 

  125. Song, Y., Garg, S., Shi, J. & Ermon, S. Sliced Score Matching: A Scalable Approach to Density and Score Estimation. In Proc. 35th Uncertainty in Artificial Intelligence Conference Vol. 115 (eds Adams R., & Gogate, V.) 574–584 (PMLR, 2020).

  126. Kingma, D., Salimans, T., Poole, B. & Ho, J. Variational diffusion models. Adv. Neural Inf. Process. Syst. 34, 21696–21707 (2021).

    Google Scholar 

  127. Luo, C. Understanding diffusion models: a unified perspective. Preprint at arXiv https://doi.org/10.48550/arXiv.2208.11970 (2022).

  128. Arnold, L. Stochastic Differential Equations (Wiley, 1974).

  129. Anderson, B. D. Reverse-time diffusion equation models. Stoch. Process. Appl. 12, 313–326 (1982).

    Article  MathSciNet  Google Scholar 

  130. Nichol, A. Q. & Dhariwal, P. Improved Denoising Diffusion Probabilistic Models. In Proc. 38th Int. Conference on Machine Learning Vol. 139 (eds Meila, M. & Zhang, T.) 8162–8171 (PMLR, 2021).

  131. Bansal, A. et al. Cold diffusion: inverting arbitrary image transforms without noise. Preprint at arXiv https://doi.org/10.48550/arXiv.2208.09392 (2022).

  132. Kong, Z. & Ping, W. On fast sampling of diffusion probabilistic models. Preprint at arXiv https://doi.org/10.48550/arXiv.2106.00132 (2021).

  133. Salimans, T. & Ho, J. Progressive distillation for fast sampling of diffusion models. Preprint at arXiv https://doi.org/10.48550/arXiv.2202.00512 (2022).

  134. Jolicoeur-Martineau, A., Li, K., Piché-Taillefer, R., Kachman, T. & Mitliagkas, I. Gotta go fast when generating data with score-based models. Preprint at arXiv https://doi.org/10.48550/arXiv.2105.14080 (2021).

  135. Karras, T., Aittala, M., Aila, T. & Laine, S. Elucidating the Design Space of Diffusion-Based Generative Models. In Advances in Neural Information Processing Systems Vol. 35 (eds Koyejo, S. et al.) 26565–26577 (Curran Associates, 2022).

  136. Lu, C. et al. DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps. In Advances in Neural Information Processing Systems Vol. 35 (eds Koyejo, S. et al.) 5775–5787 (Curran Associates, 2022).

  137. Liu, L., Ren, Y., Lin, Z. & Zhao, Z. Pseudo numerical methods for diffusion models on manifolds. Preprint at arXiv https://doi.org/10.48550/arXiv.2202.09778 (2022).

  138. Bao, F., Li, C., Zhu, J. & Zhang, B. Analytic-DPM: an analytic estimate of the optimal reverse variance in diffusion probabilistic models. Preprint at arXiv https://doi.org/10.48550/arXiv.2201.06503 (2022).

  139. Lu, C. et al. DPM-solver++: fast solver for guided sampling of diffusion probabilistic models. Preprint at arXiv https://doi.org/10.48550/arXiv.2211.01095 (2022).

  140. Vahdat, A., Kreis, K. & Kautz, J. Score-based generative modeling in latent space. Adv. Neural Inf. Process. Syst. 34, 11287–11302 (2021).

    Google Scholar 

  141. Zhang, Q. & Chen, Y. Diffusion normalizing flow. Adv. Neural Inf. Process. Syst. 34, 16280–16291 (2021).

    Google Scholar 

  142. Pandey, K., Mukherjee, A., Rai, P. & Kumar, A. DiffuseVAE: efficient, controllable and high-fidelity generation from low-dimensional latents. Preprint at arXiv https://doi.org/10.48550/arXiv.2201.00308 (2022).

  143. Luo, S. & Hu, W. Diffusion probabilistic models for 3D point cloud generation. In Proc. IEEE/CVF Conf. on Computer Vision and Pattern Recognition 2837–2845 (IEEE, 2021).

  144. Shi, C., Luo, S., Xu, M. & Tang, J. Learning Gradient Fields for Molecular Conformation Generation. In Proc. 38th Int. Conference on Machine Learning Vol. 139 (ed Meila, M. & Zhang, T.) 9558–9568 (PMLR, 2021).

  145. Zhou, L., Du, Y. & Wu, J. 3D shape generation and completion through point-voxel diffusion. In Proc. IEEE/CVF Int. Conf. on Computer Vision 5826–5835 (IEEE, 2021).

  146. Hoogeboom, E. et al. Autoregressive diffusion models. Preprint at arXiv https://doi.org/10.48550/arXiv.2110.02037 (2021).

  147. Xu, M. et al. GeoDiff: a geometric diffusion model for molecular conformation generation. Preprint at arXiv https://doi.org/10.48550/arXiv.2203.02923 (2022).

  148. Jo, J., Lee, S. & Hwang, S. J. Score-based Generative Modeling of Graphs via the System of Stochastic Differential Equations. In Proc. 39th Int. Conference on Machine Learning Vol. 162 (eds Chaudhuri, K. et al.) 10362–10383 (PMLR, 2022).

  149. De Bortoli, V. et al. Riemannian score-based generative modelling. Adv. Neural Inf. Process. 35, 2406–2422 (2022).

    Google Scholar 

  150. Chen, T., Zhang, R. & Hinton, G. Analog bits: generating discrete data using diffusion models with self-conditioning. Preprint at arXiv https://doi.org/10.48550/arXiv.2208.04202 (2022).

  151. Niu, C. et al. Permutation Invariant Graph Generation via Score-Based Generative Modeling. In Proc. 23rd Int. Conference on Artificial Intelligence and Statistics Vol. 108 (eds Chiappa, S. & Calandra, R.) 4474–4484 (PMLR, 2020).

  152. Yang, L. et al. Diffusion models: a comprehensive survey of methods and applications. Preprint at arXiv https://doi.org/10.48550/arXiv.2209.00796 (2022).

  153. Anand, N. & Huang, P. Generative modeling for protein structures. In Advances in Neural Information Processing Systems Vol. 31 (eds Bengio, S. et al.) (Curran Associates, 2018).

  154. Lin, Z., Sercu, T., LeCun, Y. & Rives, A. Deep generative models create new and diverse protein structures. In Machine Learning for Structural Biology Workshop, NeurIPS (2021).

  155. Eguchi, R. R., Choe, C. A. & Huang, P.-S. Ig-VAE: generative modeling of protein structure by direct 3D coordinate generation. PLoS Comput. Biol. 18, e1010271 (2022).

  156. Anishchenko, I. et al. De novo protein design by deep network hallucination. Nature 600, 547–552 (2021).

  157. Greener, J. G., Moffat, L. & Jones, D. T. Design of metalloproteins and novel protein folds using variational autoencoders. Sci. Rep. 8, 16189 (2018).

  158. Anand, N., Eguchi, R. & Huang, P.-S. Fully differentiable full-atom protein backbone generation. In Proc. Deep Generative Models for Highly Structured Data, ICLR 2019 Workshop (OpenReview.net, 2019).

  159. Karimi, M., Zhu, S., Cao, Y. & Shen, Y. De novo protein design for novel folds using guided conditional Wasserstein generative adversarial networks. J. Chem. Inf. Model 60, 5667–5681 (2020).

  160. Simons, K. T., Bonneau, R., Ruczinski, I. & Baker, D. Ab initio protein structure prediction of CASP III targets using ROSETTA. Proteins Struct. Funct. Bioinform. 37, 171–176 (1999).

  161. Satorras, V. G., Hoogeboom, E. & Welling, M. E(n) Equivariant Graph Neural Networks. In Proc. 38th Int. Conference on Machine Learning Vol. 139 (eds Meila, M. & Zhang, T.) 9323–9332 (PMLR, 2021).

  162. Thomas, N. et al. Tensor field networks: rotation-and translation-equivariant neural networks for 3D point clouds. Preprint at arXiv https://doi.org/10.48550/arXiv.1802.08219 (2018).

  163. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).

  164. Anand, N. et al. Protein sequence design with a learned potential. Nat. Commun. 13, 746 (2022).

  165. Leaver-Fay, A. et al. Scientific benchmarks for guiding macromolecular energy function improvement. Methods Enzymol. 523, 109–143 (2013).

  166. Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876 (2021).

  167. Chène, P. Inhibiting the p53–MDM2 interaction: an important target for cancer therapy. Nat. Rev. Cancer 3, 102–109 (2003).

  168. Salgado, E. N., Lewis, R. A., Mossin, S., Rheingold, A. L. & Tezcan, F. A. Control of protein oligomerization symmetry by metal coordination: C2 and C3 symmetrical assemblies through CuII and NiII coordination. Inorg. Chem. 48, 2726–2728 (2009).

  169. Salgado, E. N. et al. Metal templated design of protein interfaces. Proc. Natl Acad. Sci. 107, 1827–1832 (2010).

  170. Cao, L. et al. Design of protein-binding proteins from the target structure alone. Nature 605, 551–560 (2022).

  171. Chevalier, A. et al. Massively parallel de novo protein design for targeted therapeutics. Nature 550, 74–79 (2017).

  172. Gómez-Bombarelli, R. et al. Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent. Sci. 4, 268–276 (2018).

  173. De Cao, N. & Kipf, T. MolGAN: an implicit generative model for small molecular graphs. Preprint at arXiv https://doi.org/10.48550/arXiv.1805.11973 (2018).

  174. Wu, Z. et al. MoleculeNet: a benchmark for molecular machine learning. Chem. Sci. 9, 513–530 (2018).

  175. You, J., Liu, B., Ying, Z., Pande, V. & Leskovec, J. Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation. In Advances in Neural Information Processing Systems Vol. 31 (eds Bengio, S. et al.) (Curran Associates, 2018).

  176. Kloeden, P. E. & Platen, E. Stochastic Differential Equations (Springer, 1992).

  177. Shi, C. et al. GraphAF: a flow-based autoregressive model for molecular graph generation. Preprint at arXiv https://doi.org/10.48550/arXiv.2001.09382 (2020).

  178. Luo, Y., Yan, K. & Ji, S. GraphDF: A Discrete Flow Model for Molecular Graph Generation. In Proc. 38th Int. Conference on Machine Learning Vol. 139 (eds Meila, M. & Zhang, T.) 7192–7203 (PMLR, 2021).

  179. Zang, C. & Wang, F. MoFlow: an invertible flow model for generating molecular graphs. In Proc. 26th ACM SIGKDD Int. Conf. on Knowledge Discovery & Data Mining 617–626 (2020).

  180. Lippe, P. & Gavves, E. Categorical normalizing flows via continuous transformations. Preprint at arXiv https://doi.org/10.48550/arXiv.2006.09790 (2020).

  181. Liu, M., Yan, K., Oztekin, B. & Ji, S. GraphEBM: molecular graph generation with energy-based models. Preprint at arXiv https://doi.org/10.48550/arXiv.2102.00546 (2021).

  182. Erdős, P. & Rényi, A. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci. 5, 17–60 (1960).

  183. Kipf, T. N. & Welling, M. Variational graph auto-encoders. Preprint at arXiv https://doi.org/10.48550/arXiv.1611.07308 (2016).

  184. You, J., Ying, R., Ren, X., Hamilton, W. & Leskovec, J. GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models. In Proc. 35th Int. Conference on Machine Learning Vol. 80 (eds Dy, J. & Krause, A.) 5708–5717 (PMLR, 2018).

  185. Liao, R. et al. Efficient Graph Generation with Graph Recurrent Attention Networks. In Advances in Neural Information Processing Systems Vol. 32 (eds Wallach, H. et al.) (Curran Associates, 2019).

  186. Garcia Satorras, V., Hoogeboom, E., Fuchs, F., Posner, I. & Welling, M. E(n) Equivariant Normalizing Flows. In Advances in Neural Information Processing Systems Vol. 34 (eds Ranzato, M. et al.) 4181–4192 (Curran Associates, 2021).

  187. Gebauer, N., Gastegger, M. & Schütt, K. Symmetry-adapted generation of 3D point sets for the targeted discovery of molecules. In Advances in Neural Information Processing Systems Vol. 32 (eds Wallach, H. et al.) (Curran Associates, 2019).

  188. Simonovsky, M. & Komodakis, N. GraphVAE: towards generation of small graphs using variational autoencoders. In Artificial Neural Networks and Machine Learning (ICANN 2018) 27th Int. Conf. on Artificial Neural Networks Proc. Part I 412–422 (Springer, 2018).

  189. Mitton, J., Senn, H. M., Wynne, K. & Murray-Smith, R. A graph VAE graph transformer approach to generating molecular graphs. Preprint at arXiv https://doi.org/10.48550/arXiv.2104.04345 (2021).

  190. Vignac, C. & Frossard, P. Top-N: equivariant set and graph generation without exchangeability. Preprint at arXiv https://doi.org/10.48550/arXiv.2110.02096 (2021).

  191. Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O. & Dahl, G. E. Neural Message Passing for Quantum Chemistry. In Proc. 34th Int. Conference on Machine Learning Vol. 70 (eds Precup, D. & Teh, Y.) 1263–1272 (PMLR, 2017).

  192. Riniker, S. & Landrum, G. A. Better informed distance geometry: using what we know to improve conformation generation. J. Chem. Inf. Model 55, 2562–2574 (2015).

  193. Xu, M., Luo, S., Bengio, Y., Peng, J. & Tang, J. Learning neural generative dynamics for molecular conformation generation. Preprint at arXiv https://doi.org/10.48550/arXiv.2102.10240 (2021).

  194. Simm, G. N. & Hernández-Lobato, J. M. A generative model for molecular distance geometry. Preprint at arXiv https://doi.org/10.48550/arXiv.1909.11459 (2019).

  195. Mansimov, E., Mahmood, O., Kang, S. & Cho, K. Molecular geometry prediction using a deep generative graph neural network. Sci. Rep. 9, 20381 (2019).

  196. Zhu, J. et al. Direct molecular conformation generation. Preprint at arXiv https://doi.org/10.48550/arXiv.2202.01356 (2022).

  197. Köhler, J., Klein, L. & Noé, F. Equivariant flows: sampling configurations for multi-body systems with symmetric energies. Preprint at arXiv https://doi.org/10.48550/arXiv.1910.00753 (2019).

  198. Fuchs, F., Worrall, D., Fischer, V. & Welling, M. SE(3)-transformers: 3D roto-translation equivariant attention networks. Adv. Neural Inf. Process. Syst. 33, 1970–1981 (2020).

  199. Huang, W. et al. Equivariant graph mechanics networks with constraints. Preprint at arXiv https://doi.org/10.48550/arXiv.2203.06442 (2022).

  200. Gao, A. & Remsing, R. C. Self-consistent determination of long-range electrostatics in neural network potentials. Nat. Commun. 13, 1572 (2022).

  201. Imrie, F., Bradley, A. R., van der Schaar, M. & Deane, C. M. Deep generative models for 3D linker design. J. Chem. Inf. Model 60, 1983–1995 (2020).

  202. Huang, Y., Peng, X., Ma, J. & Zhang, M. 3DLinker: an E(3) equivariant variational autoencoder for molecular linker design. Preprint at arXiv https://doi.org/10.48550/arXiv.2205.07309 (2022).

  203. Liu, M., Luo, Y., Uchino, K., Maruhashi, K. & Ji, S. Generating 3D molecules for target protein binding. Preprint at arXiv https://doi.org/10.48550/arXiv.2204.09410 (2022).

  204. Masuda, T., Ragoza, M. & Koes, D. R. Generating 3D molecular structures conditional on a receptor binding site with deep generative models. Preprint at arXiv https://doi.org/10.48550/arXiv.2010.14442 (2020).

  205. Luo, S., Guan, J., Ma, J. & Peng, J. A 3D generative model for structure-based drug design. Adv. Neural Inf. Process. Syst. 34, 6229–6239 (2021).

  206. Peng, X. et al. Pocket2Mol: Efficient Molecular Sampling Based on 3D Protein Pockets. In Proc. 39th Int. Conference on Machine Learning Vol. 162 (eds Chaudhuri, K. et al.) 17644–17655 (PMLR, 2022).

  207. Jing, B., Eismann, S., Soni, P. N. & Dror, R. O. Equivariant graph neural networks for 3D macromolecular structure. Preprint at arXiv https://doi.org/10.48550/arXiv.2106.03843 (2021).

  208. Francoeur, P. G. et al. Three-dimensional convolutional neural networks and a cross-docked data set for structure-based drug design. J. Chem. Inf. Model. 60, 4200–4215 (2020).

  209. Koes, D. R., Baumgartner, M. P. & Camacho, C. J. Lessons learned in empirical scoring with smina from the CSAR 2011 benchmarking exercise. J. Chem. Inf. Model 53, 1893–1904 (2013).

  210. Hassan, N. M., Alhossary, A. A., Mu, Y. & Kwoh, C.-K. Protein–ligand blind docking using QuickVina-W with inter-process spatio-temporal integration. Sci. Rep. 7, 15451 (2017).

  211. Friesner, R. A. et al. Glide: a new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. J. Med. Chem. 47, 1739–1749 (2004).

  212. McNutt, A. T. et al. GNINA 1.0: molecular docking with deep learning. J. Cheminform. 13, 43 (2021).

  213. Stärk, H., Ganea, O., Pattanaik, L., Barzilay, R. & Jaakkola, T. EquiBind: Geometric Deep Learning for Drug Binding Structure Prediction. In Proc. 39th Int. Conference on Machine Learning Vol. 162 (eds Chaudhuri, K. et al.) 20503–20521 (PMLR, 2022).

  214. Lu, W. et al. TANKBind: Trigonometry-Aware Neural NetworKs for Drug-Protein Binding Structure Prediction. In Advances in Neural Information Processing Systems Vol. 35 (eds Koyejo, S. et al.) 7236–7249 (Curran Associates, 2022).

  215. Liu, Y. et al. CB-Dock: a web server for cavity detection-guided protein–ligand blind docking. Acta Pharmacol. Sin. 41, 138–144 (2020).

  216. Wang, R., Fang, X., Lu, Y. & Wang, S. The PDBbind database: collection of binding affinities for protein−ligand complexes with known three-dimensional structures. J. Med. Chem. 47, 2977–2980 (2004).

  217. Dunbar, J. et al. SAbDab: the structural antibody database. Nucleic Acids Res. 42, D1140–D1146 (2014).

  218. Miller, B. R. III et al. MMPBSA.py: an efficient program for end-state free energy calculations. J. Chem. Theory Comput. 8, 3314–3321 (2012).

  219. Mooij, W. T. & Verdonk, M. L. General and targeted statistical potentials for protein–ligand interactions. Proteins 61, 272–287 (2005).

  220. Dittrich, J., Schmidt, D., Pfleger, C. & Gohlke, H. Converging a knowledge-based scoring function: DrugScore2018. J. Chem. Inf. Model 59, 509–521 (2018).

  221. Pierce, B. & Weng, Z. ZRANK: reranking protein docking predictions with an optimized energy function. Proteins 67, 1078–1086 (2007).

  222. Pierce, B. & Weng, Z. A combination of rescoring and refinement significantly improves protein docking performance. Proteins 72, 270–279 (2008).

  223. Alford, R. F. et al. The Rosetta all-atom energy function for macromolecular modeling and design. J. Chem. Theory Comput. 13, 3031–3048 (2017).

  224. Grosdidier, S., Pons, C., Solernou, A. & Fernández-Recio, J. Prediction and scoring of docking poses with pyDock. Proteins 69, 852–858 (2007).

  225. Pons, C., Talavera, D., De La Cruz, X., Orozco, M. & Fernandez-Recio, J. Scoring by intermolecular pairwise propensities of exposed residues (SIPPER): a new efficient potential for protein−protein docking. J. Chem. Inf. Model 51, 370–377 (2011).

  226. Viswanath, S., Ravikant, D. & Elber, R. Improving ranking of models for protein complexes with side chain modeling and atomic potentials. Proteins 81, 592–606 (2013).

  227. Ravikant, D. & Elber, R. PIE—efficient filters and coarse grained potentials for unbound protein–protein docking. Proteins 78, 400–419 (2010).

  228. Andrusier, N., Nussinov, R. & Wolfson, H. J. FireDock: fast interaction refinement in molecular docking. Proteins 69, 139–159 (2007).

  229. Dubochet, J. et al. Cryo-electron microscopy of vitrified specimens. Q. Rev. Biophys. 21, 129–228 (1988).

  230. Frank, J. et al. SPIDER and WEB: processing and visualization of images in 3D electron microscopy and related fields. J. Struct. Biol. 116, 190–199 (1996).

  231. Ludtke, S. J., Baldwin, P. R. & Chiu, W. EMAN: semiautomated software for high-resolution single-particle reconstructions. J. Struct. Biol. 128, 82–97 (1999).

  232. Scheres, S. H. RELION: implementation of a Bayesian approach to cryo-EM structure determination. J. Struct. Biol. 180, 519–530 (2012).

  233. Nogales, E. & Scheres, S. H. Cryo-EM: a unique tool for the visualization of macromolecular complexity. Mol. Cell 58, 677–689 (2015).

  234. Fernandez-Leiro, R. & Scheres, S. H. Unravelling biological macromolecules with cryo-electron microscopy. Nature 537, 339–346 (2016).

  235. Merk, A. et al. Breaking cryo-EM resolution barriers to facilitate drug discovery. Cell 165, 1698–1707 (2016).

  236. Zhong, E. D., Bepler, T., Davis, J. H. & Berger, B. Reconstructing continuous distributions of 3D protein structure from cryo-EM images. Preprint at arXiv https://doi.org/10.48550/arXiv.1909.05215 (2019).

  237. Ronneberger, O., Fischer, P. & Brox, T. U-net: convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015) 18th Int. Conf. Proc. Part III 234–241 (Springer, 2015).

  238. Waibel, D. J. E. et al. SHAPR—an AI approach to predict 3D cell shapes from 2D microscopic images. iScience https://doi.org/10.1016/j.isci.2022.105298 (2022).

  239. Waibel, D. J., Atwell, S., Meier, M., Marr, C. & Rieck, B. Capturing shape information with multi-scale topological loss terms for 3D reconstruction. In Medical Image Computing and Computer Assisted Intervention (MICCAI 2022) 25th Int. Conf. Proc. Part IV 150–159 (Springer, 2022).

  240. Van Dijk, D. et al. Recovering gene interactions from single-cell data using data diffusion. Cell 174, 716–729.e27 (2018).

  241. Wagner, F., Yan, Y. & Yanai, I. K-nearest neighbor smoothing for high-throughput single-cell RNA-seq data. Preprint at bioRxiv https://doi.org/10.1101/217737 (2017).

  242. Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).

  243. Trieu, T. & Cheng, J. 3D genome structure modeling by Lorentzian objective function. Nucleic Acids Res. 45, 1049–1058 (2017).

  244. Trieu, T. & Cheng, J. MOGEN: a tool for reconstructing 3D models of genomes from chromosomal conformation capturing data. Bioinformatics 32, 1286–1292 (2016).

  245. Highsmith, M. & Cheng, J. VEHiCLE: a variationally encoded Hi-C loss enhancement algorithm for improving and generating Hi-C data. Sci. Rep. 11, 1–13 (2021).

  246. Wang, Y., Guo, Z. & Cheng, J. Single-cell Hi-C data enhancement with deep residual and generative adversarial networks. Bioinformatics 39, btad458 (2023).

  247. Taskiran, I. I., Spanier, K. I., Christiaens, V., Mauduit, D. & Aerts, S. Cell type directed design of synthetic enhancers. Preprint at bioRxiv https://doi.org/10.1101/2022.07.26.501466 (2022).

  248. Tang, G. et al. EMAN2: an extensible image processing suite for electron microscopy. J. Struct. Biol. 157, 38–46 (2007).

  249. Al-Azzawi, A. et al. DeepCryoPicker: fully automated deep neural network for single protein particle picking in cryo-EM. BMC Bioinform. 21, 1–38 (2020).

  250. Kawar, B., Elad, M., Ermon, S. & Song, J. Denoising diffusion restoration models. Adv. Neural Inf. Process. Syst. 35, 23593–23606 (2022).

  251. Evans, R. et al. Protein complex prediction with AlphaFold-Multimer. Preprint at bioRxiv https://doi.org/10.1101/2021.10.04.463034 (2021).

  252. Huang, C.-W., Lim, J. H. & Courville, A. C. A variational perspective on diffusion-based generative models and score matching. Adv. Neural Inf. Process. Syst. 34, 22863–22876 (2021).

  253. Kim, D., Shin, S., Song, K., Kang, W. & Moon, I.-C. Soft truncation: a universal training technique of score-based diffusion model for High Precision Score Estimation. Preprint at arXiv https://doi.org/10.48550/arXiv.2106.05527 (2021).

  254. Gu, S. et al. Vector quantized diffusion model for text-to-image synthesis. In Proc. IEEE/CVF Conf. on Computer Vision and Pattern Recognition 10696–10706 (IEEE, 2022).

  255. Tang, Z., Gu, S., Bao, J., Chen, D. & Wen, F. Improved vector quantized diffusion models. Preprint at arXiv https://doi.org/10.48550/arXiv.2205.16007 (2022).

  256. Poole, B., Jain, A., Barron, J. T. & Mildenhall, B. DreamFusion: text-to-3D using 2D diffusion. Preprint at arXiv https://doi.org/10.48550/arXiv.2209.14988 (2022).

  257. Hong, S., Lee, G., Jang, W. & Kim, S. Improving sample quality of diffusion models using self-attention guidance. Preprint at arXiv https://doi.org/10.48550/arXiv.2210.00939 (2022).

  258. Li, W. Automatic segmentation of liver tumor in CT images with deep convolutional neural networks. J. Comput. Commun. 3, 146 (2015).

  259. Menze, B. H. et al. The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Trans. Med. Imaging 34, 1993–2024 (2014).

  260. Cheng, J. et al. Superpixel classification based optic disc and optic cup segmentation for glaucoma screening. IEEE Trans. Med. Imaging 32, 1019–1032 (2013).

  261. Wang, S. et al. Central focused convolutional neural networks: developing a data-driven model for lung nodule segmentation. Med. Image Anal. 40, 172–183 (2017).

  262. Srinivasu, P. N. et al. Classification of skin disease using deep learning neural networks with MobileNet V2 and LSTM. Sensors 21, 2852 (2021).

  263. Swapna, G., Vinayakumar, R. & Soman, K. Diabetes detection using deep learning algorithms. ICT Express 4, 243–246 (2018).

  264. Das, A., Acharya, U. R., Panda, S. S. & Sabut, S. Deep learning based liver cancer detection using watershed transform and Gaussian mixture model techniques. Cogn. Syst. Res. 54, 165–175 (2019).

  265. Jo, T., Nho, K. & Saykin, A. J. Deep learning in Alzheimer’s disease: diagnostic classification and prognostic prediction using neuroimaging data. Front. Aging Neurosci. 11, 220 (2019).

  266. Arévalo, A., Niño, J., Hernández, G. & Sandoval, J. High-frequency trading strategy based on deep neural networks. In Int. Conf. on Intelligent Computing 424–436 (Springer, 2016).

  267. Bao, W., Yue, J. & Rao, Y. A deep learning framework for financial time series using stacked autoencoders and long-short term memory. PLoS One 12, e0180944 (2017).

  268. Xiao, Q., Li, K., Zhang, D. & Xu, W. Security risks in deep learning implementations. In 2018 IEEE Security and Privacy Workshops (SPW) 123–128 (IEEE, 2018).

  269. Halstead, M., Ahmadi, A., Smitt, C., Schmittmann, O. & McCool, C. Crop agnostic monitoring driven by deep learning. Front. Plant Sci. 12, 786702 (2021).

  270. Feng, A., Zhou, J., Vories, E. & Sudduth, K. A. Evaluation of cotton emergence using UAV-based imagery and deep learning. Comput. Electron. Agric. 177, 105711 (2020).

  271. Liu, J. & Wang, X. Plant diseases and pests detection based on deep learning: a review. Plant Methods 17, 22 (2021).

  272. Brown, T. et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33, 1877–1901 (2020).

  273. Nichol, A. et al. Glide: towards photorealistic image generation and editing with text-guided diffusion models. Preprint at arXiv https://doi.org/10.48550/arXiv.2112.10741 (2021).

Acknowledgements

The work was partly supported by the US National Institutes of Health (grants R01GM146340 (to J.C.), R01GM093123 (to J.C.) and R35GM126985 (to D.X.)) and the US National Science Foundation (grant DBI2308699 (to J.C.)).

Author information

Contributions

Z.G., J.L., Y.W. and M.C. collected data. J.C., D.X. and D.W. provided guidance on organizing the content. J.C. envisioned the developments for proteins, 3D genomics, single-cell Hi-C data analytics, cryo-EM, DNA design, peptide, proteomics and metabolomics. D.X. envisioned the developments in single-cell reconstruction and inference. Z.G., J.C., J.L., Y.W., D.X., D.W. and M.C. wrote and edited the manuscript.

Corresponding author

Correspondence to Jianlin Cheng.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Reviews Bioengineering thanks Bonnie Berger, Daniel Lazarev and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Guo, Z., Liu, J., Wang, Y. et al. Diffusion models in bioinformatics and computational biology. Nat Rev Bioeng 2, 136–154 (2024). https://doi.org/10.1038/s44222-023-00114-9
