Feedback GAN for DNA optimizes protein functions


Generative adversarial networks (GANs) represent an attractive and novel approach to generating realistic data, such as genes, proteins or drugs, in synthetic biology. Here, we apply GANs to generate synthetic DNA sequences encoding proteins of variable length. We propose a novel feedback-loop architecture, feedback GAN (FBGAN), to optimize the synthetic gene sequences for desired properties using an external function analyser. The proposed architecture also has the advantage that the analyser does not need to be differentiable. We apply the feedback-loop mechanism to two examples: generating synthetic genes coding for antimicrobial peptides, and optimizing synthetic genes for the secondary structure of their resulting peptides. A suite of metrics, calculated in silico, demonstrates that the GAN-generated proteins have desirable biophysical properties. The FBGAN architecture can also be used to optimize GAN-generated data points for useful properties in domains beyond genomics.
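The feedback loop described above can be illustrated with a toy sketch. This is not the authors' implementation: `toy_generator`, `toy_analyser` (a GC-content scorer), `feedback_step` and the threshold value are hypothetical stand-ins, and the update rule (generated sequences that score above a threshold replace the oldest training sequences) is an assumption about how such a loop might be wired. What the sketch shows is that the analyser is only ever called as a black-box scoring function, so no gradient is needed from it.

```python
import random
from collections import deque

BASES = "ACGT"


def toy_generator(rng, n, length=12):
    """Hypothetical stand-in for a trained GAN generator: random DNA strings."""
    return ["".join(rng.choice(BASES) for _ in range(length)) for _ in range(n)]


def toy_analyser(seq):
    """Hypothetical black-box analyser: fraction of G/C bases in the sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def feedback_step(training_set, generated, analyser, threshold):
    """Append generated sequences scoring above `threshold` to `training_set`.

    `training_set` is a deque with a fixed `maxlen`, so each append evicts
    the oldest sequence and the set size stays constant once it is full.
    Returns the number of accepted sequences.
    """
    accepted = [seq for seq in generated if analyser(seq) > threshold]
    training_set.extend(accepted)
    return len(accepted)


if __name__ == "__main__":
    rng = random.Random(0)
    training_set = deque(toy_generator(rng, 50), maxlen=50)
    for epoch in range(5):
        # A real run would first update the GAN on `training_set` here.
        generated = toy_generator(rng, 200)
        n_accepted = feedback_step(training_set, generated, toy_analyser, 0.7)
        mean_score = sum(map(toy_analyser, training_set)) / len(training_set)
        print(f"epoch {epoch}: accepted {n_accepted}, mean score {mean_score:.2f}")
```

Because the toy generator is never retrained, the number of accepted sequences will not grow over epochs. In a GAN setting, the gradual shift of the training set toward high-scoring sequences is what would steer the generator.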

Fig. 1: FBGAN architecture and training.
Fig. 2: t-SNE visualization of synthetic genes.
Fig. 3: AMP analyser predictions over training epochs.
Fig. 4: Sequence similarity for synthetic AMPs and known AMPs.
Fig. 5: α-helix lengths of known versus synthetic proteins.
Fig. 6: Sample α-helices from FBGAN.

Data availability

Demo, instructions and code for FBGAN are available at All of the data used in this paper are publicly available and can be accessed at the references cited22.


References

  1. Benner, S. & Sismour, M. Synthetic biology. Nat. Rev. Genet. 6, 533–543 (2005).

  2. Izadpanah, A. & Gallo, R. Antimicrobial peptides. J. Am. Acad. Dermatol. 52, 381–390 (2005).

  3. Papagianni, M. Ribosomally synthesized peptides with antimicrobial properties: biosynthesis, structure, function, and applications. Biotechnol. Adv. 21, 465–499 (2003).

  4. Rocklin, G. J. et al. Global analysis of protein folding using massively parallel design, synthesis, and testing. Science 357, 168–175 (2017).

  5. Abrusán, G. & Marsh, J. Alpha helices are more robust to mutations than beta strands. PLoS Comput. Biol. 12, e1005242 (2016).

  6. Segler, M., Kogej, T., Tyrchan, C. & Waller, M. Generating focused molecule libraries for drug discovery with recurrent neural networks. ACS Cent. Sci. 4, 120–131 (2018).

  7. Gupta, A. et al. Generative recurrent networks for de novo drug design. Mol. Inform. 37, 1700111 (2018).

  8. Muller, A. T., Hiss, J. A. & Schneider, G. Recurrent neural network model for constructive peptide design. J. Chem. Inf. Model. 58, 472–479 (2018).

  9. Olivecrona, M., Blaschke, T., Engkvist, O. & Chen, H. Molecular de novo design through deep reinforcement learning. J. Cheminform. 9, 48 (2017).

  10. Salimans, T. et al. Improved techniques for training GANs. Preprint at abs/1606.03498 (2016).

  11. Goldsborough, P., Pawlowski, N., Caicedo, J., Singh, S. & Carpenter, A. CytoGAN: generative modeling of cell images. Preprint at (2017).

  12. Esteban, C., Hyland, S. & Rätsch, G. Real-valued (medical) time series generation with recurrent conditional GANs. Preprint at (2017).

  13. Ghahramani, A., Watt, F. & Luscombe, N. Generative adversarial networks uncover epidermal regulators and predict single cell perturbations. Preprint at (2018).

  14. Osokin, A., Chessel, A., Carazo-Salas, R. & Vaggi, F. GANs for biological image synthesis. Preprint at (2017).

  15. Zhu, J. & Bento, J. Generative adversarial active learning. Preprint at (2017).

  16. Killoran, N., Lee, L., Delong, A., Duvenaud, D. & Frey, B. Generating and designing DNA with deep generative models. Preprint at (2017).

  17. Arjovsky, M., Chintala, S. & Bottou, L. Wasserstein generative adversarial networks. In Proc. 34th International Conference on Machine Learning (eds Precup, D. & Teh, Y. W.) 214–223 (PMLR, 2017).

  18. Goodfellow, I. et al. Generative adversarial nets. In Advances in Neural Information Processing Systems 27 (eds Ghahramani, Z. et al.) 2672–2680 (Curran Associates, 2014).

  19. Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V. & Courville, A. Improved training of Wasserstein GANs. Preprint at (2017).

  20. Apweiler, R. et al. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 46, 2699 (2018).

  21. Wang, G., Li, X. & Wang, Z. APD3: the antimicrobial peptide database as a tool for research and education. Nucleic Acids Res. 44, D1087–D1093 (2016).

  22. Muller, A., Gabernet, G., Hiss, J. & Schneider, G. modlAMP: Python for antimicrobial peptides. Bioinformatics 33, 2753–2755 (2017).

  23. Buchan, D., Minneci, F., Nugent, T., Bryson, K. & Jones, D. Scalable web services for the PSIPRED protein analysis workbench. Nucleic Acids Res. 41, W349–W357 (2013).

  24. Waghu, F., Barai, R., Gurung, P. & Idicula-Thomas, S. CAMPR3: a database on sequences, structures and signatures of antimicrobial peptides. Nucleic Acids Res. 44, D1094–D1097 (2016).

  25. Xu, D. & Zhang, Y. Ab initio protein structure assembly using continuous structure fragments and optimized knowledge based force field. Proteins 80, 1715–1735 (2012).

  26. Gupta, A. & Rush, A. Dilated convolutions for modeling long-distance genomic dependencies. Preprint at (2017).


Acknowledgements

The authors thank A. Kundaje for guidance when initiating the research on GANs and DNA. J.Z. is supported by a Chan-Zuckerberg Biohub Investigator grant and National Science Foundation (NSF) grant CRII 1657155.

Author information

Contributions

J.Z. conceived the objective of using GANs to generate genes and optimize protein functions; A.G. conceived of and implemented the feedback-loop architecture and conducted the experiments and analysis. Both authors wrote the manuscript.

Corresponding author

Correspondence to James Zou.

Ethics declarations

Competing interests

The authors declare no competing interests.

About this article

Cite this article

Gupta, A., Zou, J. Feedback GAN for DNA optimizes protein functions. Nat Mach Intell 1, 105–111 (2019).
