De novo protein design by deep network hallucination

Abstract

There has been considerable recent progress in protein structure prediction using deep neural networks to predict inter-residue distances from amino acid sequences (refs 1–3). Here we investigate whether the information captured by such networks is sufficiently rich to generate new folded proteins with sequences unrelated to those of the naturally occurring proteins used in training the models. We generate random amino acid sequences and input them into the trRosetta structure prediction network to predict starting residue–residue distance maps, which, as expected, are quite featureless. We then carry out Monte Carlo sampling in amino acid sequence space, optimizing the contrast (Kullback–Leibler divergence) between the inter-residue distance distributions predicted by the network and background distributions averaged over all proteins. Optimization from different random starting points resulted in novel proteins spanning a wide range of sequences and predicted structures. We obtained synthetic genes encoding 129 of the network-‘hallucinated’ sequences, and expressed and purified the proteins in Escherichia coli; 27 of the proteins yielded monodisperse species with circular dichroism spectra consistent with the hallucinated structures. We determined the three-dimensional structures of three of the hallucinated proteins, two by X-ray crystallography and one by NMR, and these closely matched the hallucinated models. Thus, deep networks trained to predict native protein structures from their sequences can be inverted to design new proteins, and such networks and methods should contribute alongside traditional physics-based models to the de novo design of proteins with new functions.
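The sequence-space optimization described above can be illustrated with a minimal sketch. It is not the trDesign code: the structure network is replaced by a hypothetical toy function (`toy_predict_distogram`) that returns per-pair probabilities over a handful of distance bins, the background is assumed to be uniform rather than averaged over proteins, and the move set (single-residue substitutions), the fixed Metropolis temperature and the step count are illustrative assumptions. The loss is the negative mean per-pair KL divergence, so lowering it increases the contrast with the background.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
N_BINS = 4  # toy number of distance bins; real distograms use many more


def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) for two discrete distributions over the same bins."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def toy_predict_distogram(seq):
    """HYPOTHETICAL stand-in for a structure network such as trRosetta.

    Returns {(i, j): probabilities over N_BINS} for every pair i < j. The values
    are an arbitrary deterministic function of the two residues and their
    separation, so the loop below runs; they carry no structural meaning.
    """
    dist = {}
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            weights = [1.0 + (ord(seq[i]) * (k + 1) + ord(seq[j]) * (j - i)) % 7
                       for k in range(N_BINS)]
            total = sum(weights)
            dist[(i, j)] = [w / total for w in weights]
    return dist


def hallucination_loss(seq, background):
    """Negative mean per-pair KL divergence from the background (lower is better)."""
    pred = toy_predict_distogram(seq)
    kls = [kl_divergence(pred[pair], background[pair]) for pair in pred]
    return -sum(kls) / len(kls)


def hallucinate(length, steps, temperature=0.01, seed=0):
    """Metropolis Monte Carlo over single-residue substitutions."""
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    # Uniform background (assumption; the paper uses distributions averaged over proteins).
    background = {(i, j): [1.0 / N_BINS] * N_BINS
                  for i in range(length) for j in range(i + 1, length)}
    loss = hallucination_loss(seq, background)
    for _ in range(steps):
        pos = rng.randrange(length)
        old = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)
        new_loss = hallucination_loss(seq, background)
        delta = new_loss - loss
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            loss = new_loss
        else:
            seq[pos] = old  # reject: undo the substitution
    return "".join(seq), loss
```

For example, `hallucinate(length=12, steps=200)` returns a 12-residue sequence and its final loss. Because the predictor is a placeholder, the resulting sequences carry no structural meaning; only the sampling and acceptance logic is meant to illustrate the approach.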

Fig. 1: Overview of protein hallucination approach.
Fig. 2: Overview of computational results.
Fig. 3: Experimental characterization of α-helical network-hallucinated proteins.
Fig. 4: Experimental characterization of network-hallucinated proteins with mixed α–β structures.
Fig. 5: Structural analysis of network-hallucinated proteins.

Data availability

The atomic coordinates of the crystal structures for designs 0217 and 0738_mod, as well as the NMR structure for design 0515, have been deposited in the RCSB Protein Data Bank with accession numbers 7K3H, 7M0Q and 7M5T, respectively. NMR chemical shifts, NOESY peak lists and spectral data have been deposited in the BioMagResBank (BMRB ID 30890). Amino acid sequences and structure models for all 2,000 designs described in the manuscript are freely available for download at https://files.ipd.uw.edu/pub/trRosetta/hallucinations2K.tar.gz. Amino acid sequences and 3D structures of the generated designs were compared to known protein sequences in UniProt (https://ftp.uniprot.org/pub/databases/uniprot/previous_releases/release-2017_12/uniref/) and to known structures in the Protein Data Bank (11 March 2020).

Code availability

The computer code used to generate the hallucinated proteins described in the manuscript has been made publicly available as part of the trDesign GitHub package (https://github.com/gjoni/trDesign); the corresponding structural models were generated with the trRosetta structure modelling script, available for free download at https://yanglab.nankai.edu.cn/trRosetta/download/. The Rosetta software suite was used for ab initio structure prediction calculations. Rosetta is freely available to academic users on GitHub and can be licensed for commercial use through the University of Washington CoMotion Express License Program.

References

  1. Xu, J. Distance-based protein folding powered by deep learning. Proc. Natl Acad. Sci. USA 116, 16856–16865 (2019).

  2. Senior, A. W. et al. Improved protein structure prediction using potentials from deep learning. Nature 577, 706–710 (2020).

  3. Yang, J. et al. Improved protein structure prediction using predicted interresidue orientations. Proc. Natl Acad. Sci. USA 117, 1496–1503 (2020).

  4. Biswas, S., Khimulya, G., Alley, E. C., Esvelt, K. M. & Church, G. M. Low-N protein engineering with data-efficient deep learning. Nat. Methods 18, 389–396 (2021).

  5. Madani, A. et al. ProGen: language modeling for protein generation. Preprint at https://arxiv.org/abs/2004.03497 (2020).

  6. Anand, N., Eguchi, R. & Huang, P. S. Fully differentiable full-atom protein backbone generation. In ICLR 2019 Workshop https://openreview.net/forum?id=SJxnVL8YOV (2019).

  7. Wang, J., Cao, H., Zhang, J. Z. H. & Qi, Y. Computational protein design with deep learning neural networks. Sci. Rep. 8, 6349 (2018).

  8. Ingraham, J., Garg, V. K., Barzilay, R. & Jaakkola, T. Generative models for graph-based protein design. In ICLR 2019 Workshop https://openreview.net/forum?id=SJgxrLLKOE (2019).

  9. Anand, N., Eguchi, R. R., Derry, A., Altman, R. B. & Huang, P.-S. Protein sequence design with a learned potential. Preprint at https://doi.org/10.1101/2020.01.06.895466 (2020).

  10. Strokach, A., Becerra, D., Corbi-Verge, C., Perez-Riba, A. & Kim, P. M. Fast and flexible protein design using deep graph neural networks. Cell Syst. 11, 402–411.e4 (2020).

  11. Karimi, M., Zhu, S., Cao, Y. & Shen, Y. De novo protein design for novel folds using guided conditional Wasserstein generative adversarial networks. J. Chem. Inf. Model. 60, 5667–5681 (2020).

  12. Davidsen, K. et al. Deep generative models for T cell receptor protein sequences. eLife 8, e46935 (2019).

  13. Costello, Z. & Martin, H. G. How to hallucinate functional proteins. Preprint at https://arxiv.org/abs/1903.00458 (2019).

  14. Eguchi, R. R., Anand, N., Choe, C. A. & Huang, P.-S. IG-VAE: generative modeling of immunoglobulin proteins by direct 3D coordinate generation. Preprint at https://doi.org/10.1101/2020.08.07.242347 (2020).

  15. Repecka, D. et al. Expanding functional protein sequence spaces using generative adversarial networks. Nat. Mach. Intell. 3, 324–333 (2021).

  16. Hawkins-Hooker, A. et al. Generating functional protein variants with variational autoencoders. PLoS Comput. Biol. 17, e1008736 (2021).

  17. Senior, A. W. et al. Protein structure prediction using multiple deep neural networks in the 13th Critical Assessment of Protein Structure Prediction (CASP13). Proteins 87, 1141–1148 (2019).

  18. Mordvintsev, A., Olah, C. & Tyka, M. Inceptionism: going deeper into neural networks. Google AI Blog https://ai.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html (2015).

  19. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).

  20. Rohl, C. A., Strauss, C. E. M., Misura, K. M. S. & Baker, D. Protein structure prediction using Rosetta. Methods Enzymol. 383, 66–93 (2004).

  21. Park, H. et al. Simultaneous optimization of biomolecular energy functions on features from small molecules and macromolecules. J. Chem. Theory Comput. 12, 6201–6212 (2016).

  22. Rossi, P. et al. A microscale protein NMR sample screening pipeline. J. Biomol. NMR 46, 11–22 (2010).

  23. Koga, N. et al. Principles for designing ideal protein structures. Nature 491, 222–227 (2012).

  24. Dou, J. et al. De novo design of a fluorescence-activating β-barrel. Nature 561, 485–491 (2018).

  25. Norn, C. et al. Protein sequence design by conformational landscape optimization. Proc. Natl Acad. Sci. USA 118, e2017228118 (2021).

  26. Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876 (2021).

  27. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).

  28. Wang, J. et al. Deep learning methods for designing proteins scaffolding functional sites. Preprint at https://doi.org/10.1101/2021.11.10.468128 (2021).

  29. Jendrusch, M., Korbel, J. O. & Sadiq, S. K. AlphaDesign: A de novo protein design framework based on AlphaFold. Preprint at https://doi.org/10.1101/2021.10.11.463937 (2021).

  30. Tischer, D. et al. Design of proteins presenting discontinuous functional sites using deep learning. Preprint at https://doi.org/10.1101/2020.11.29.402743 (2020).

  31. Zhang, Y. & Skolnick, J. TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 33, 2302–2309 (2005).

  32. Studier, F. W. Protein production by auto-induction in high density shaking cultures. Protein Expr. Purif. 41, 207–234 (2005).

  33. Pace, C. N., Vajdos, F., Fee, L., Grimsley, G. & Gray, T. How to measure and predict the molar absorption coefficient of a protein. Protein Sci. 4, 2411–2423 (1995).

  34. Acton, T. B. et al. Preparation of protein samples for NMR structure, function, and small-molecule screening studies. Methods Enzymol. 493, 21–60 (2011).

  35. Xiao, R. et al. The high-throughput protein sample production platform of the Northeast Structural Genomics Consortium. J. Struct. Biol. 172, 21–33 (2010).

  36. Jansson, M. et al. High-level production of uniformly 15N- and 13C-enriched fusion proteins in Escherichia coli. J. Biomol. NMR 7, 131–141 (1996).

  37. Ottiger, M., Delaglio, F. & Bax, A. Measurement of J and dipolar couplings from simplified two-dimensional NMR spectra. J. Magn. Reson. 131, 373–378 (1998).

  38. Delaglio, F. et al. NMRPipe: a multidimensional spectral processing system based on UNIX pipes. J. Biomol. NMR 6, 277–293 (1995).

  39. Lee, W., Tonelli, M. & Markley, J. L. NMRFAM-SPARKY: enhanced software for biomolecular NMR spectroscopy. Bioinformatics 31, 1325–1327 (2015).

  40. Favier, A. & Brutscher, B. NMRlib: user-friendly pulse sequence tools for Bruker NMR spectrometers. J. Biomol. NMR 73, 199–211 (2019).

  41. Hyberts, S. G., Milbradt, A. G., Wagner, A. B., Arthanari, H. & Wagner, G. Application of iterative soft thresholding for fast reconstruction of NMR data non-uniformly sampled with multidimensional Poisson gap scheduling. J. Biomol. NMR 52, 315–327 (2012).

  42. Ying, J., Delaglio, F., Torchia, D. A. & Bax, A. Sparse multidimensional iterative lineshape-enhanced (SMILE) reconstruction of both non-uniformly sampled and conventional NMR data. J. Biomol. NMR 68, 101–118 (2017).

  43. Lee, W. et al. I-PINE web server: an integrative probabilistic NMR assignment system for proteins. J. Biomol. NMR 73, 213–222 (2019).

  44. Moseley, H. N. B., Sahota, G. & Montelione, G. T. Assignment validation software suite for the evaluation and presentation of protein resonance assignment data. J. Biomol. NMR 28, 341–355 (2004).

  45. Shen, Y. & Bax, A. Protein backbone and sidechain torsion angles predicted from NMR chemical shifts using artificial neural networks. J. Biomol. NMR 56, 227–241 (2013).

  46. Güntert, P., Mumenthaler, C. & Wüthrich, K. Torsion angle dynamics for NMR structure calculation with the new program DYANA. J. Mol. Biol. 273, 283–298 (1997).

  47. Herrmann, T., Güntert, P. & Wüthrich, K. Protein NMR structure determination with automated NOE-identification in the NOESY spectra using the new software ATNOS. J. Biomol. NMR 24, 171–189 (2002).

  48. Huang, Y. J., Powers, R. & Montelione, G. T. Protein NMR recall, precision, and F-measure scores (RPF scores): structure quality assessment measures based on information retrieval statistics. J. Am. Chem. Soc. 127, 1665–1674 (2005).

  49. Huang, Y. J., Tejero, R., Powers, R. & Montelione, G. T. A topology-constrained distance network algorithm for protein structure determination from NOESY data. Proteins 62, 587–603 (2006).

  50. Brünger, A. T. et al. Crystallography & NMR system: A new software suite for macromolecular structure determination. Acta Crystallogr. D 54, 905–921 (1998).

  51. Bhattacharya, A., Tejero, R. & Montelione, G. T. Evaluating protein structures determined by structural genomics consortia. Proteins 66, 778–795 (2007).

  52. Otwinowski, Z. & Minor, W. Processing of X-ray diffraction data collected in oscillation mode. Methods Enzymol. 276, 307–326 (1997).

  53. McCoy, A. J. et al. Phaser crystallographic software. J. Appl. Crystallogr. 40, 658–674 (2007).

  54. DiMaio, F. et al. Improved low-resolution crystallographic refinement with Phenix and Rosetta. Nat. Methods 10, 1102–1104 (2013).

  55. Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. Features and development of Coot. Acta Crystallogr. D 66, 486–501 (2010).

  56. Liebschner, D. et al. Macromolecular structure determination using X-rays, neutrons and electrons: recent developments in Phenix. Acta Crystallogr. D 75, 861–877 (2019).

  57. Theobald, D. L. & Wuttke, D. S. Accurate structural correlations from maximum likelihood superpositions. PLoS Comput. Biol. 4, e43 (2008).

  58. The PyMOL Molecular Graphics System version 2.4 (Schrödinger, 2021).

  59. Zweckstetter, M. NMR: prediction of molecular alignment from structure using the PALES software. Nat. Protoc. 3, 679–690 (2008).

  60. Montelione, G. T. & Wagner, G. 2D Chemical exchange NMR spectroscopy by proton-detected heteronuclear correlation. J. Am. Chem. Soc. 111, 3096–3098 (1989).

Acknowledgements

We thank R. Xiao, G. Liu and A. Wu (Nexomics Biosciences) for assistance in initial NMR protein production; J. Aramini for assistance with NMR data collection for initial HSQC screening; R. Ballard and X. Li for mass spectrometry assistance; and R. Divine and R. Kibler for AKTA scripting. This work was funded by grants from the NSF (DBI 1937533 to D.B. and I.A., and MCB 2032259 to S.O.), the NIH (DP5OD026389 to S.O.), Open Philanthropy (C.C. and A.B.), Eric and Wendy Schmidt by recommendation of the Schmidt Futures program (F.D. and L.C.), the Audacious Project (A.K.), the Washington Research Foundation (S.J.P.) and Novo Nordisk Foundation grant NNF17OC0030446 (C.N.). This work was also supported in part by NIH grants R01 GM120574 and R35GM141818 (G.T.M.) and by the Howard Hughes Medical Institute (D.B. and T.M.C.). We also acknowledge computing resources provided by the Hyak supercomputer system funded by the STF at the University of Washington, and Rosetta@Home volunteers for ab initio structure prediction calculations. We thank the staff of the Northeastern Collaborative Access Team at the Advanced Photon Source for use of the beamline, supported by NIH grants P30GM124165 and S10OD021527 and DOE contract DE-AC02-06CH11357. We acknowledge the NMR Core Facility resources at Rensselaer Polytechnic Institute and thank S. McCallum for providing valuable support.

Author information

Corresponding author

Correspondence to David Baker.

Ethics declarations

Competing interests

G.T.M. is a founder of Nexomics Biosciences. The other authors declare no competing interests.

Additional information

Peer review information Nature thanks the anonymous reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Comparison of the hallucinated designs to proteins with known structure and of similar length (100 ± 10 aa) from the trRosetta training set.

a,b) Multidimensional scaling plots of the sequence (a) and structure (b) spaces covered by the 2,000 hallucinated proteins (blue dots) and by 1,110 proteins of similar length from the trRosetta training set (red dots). The subspaces spanned by hallucinated proteins and by natural proteins of similar size (100 ± 10 aa) are quite distinct; the network is not simply recapitulating native proteins of the same length. Soluble and structurally characterized hallucinations are marked by black and magenta dots, respectively. c,d) Distributions of pairwise structure (c) and sequence (d) similarities for hallucinated and natural proteins. The hallucinated proteins are more similar to each other (blue lines) than they are to natural proteins (grey lines). e) Sequence comparisons (gapless threading) of fragments of various sizes (15, 20, ..., 60 aa) from the hallucinated designs (blue) and from natural proteins of 100 ± 10 aa (red) against other proteins from the trRosetta training set. There is no apparent tendency for the trRosetta-based design procedure to “copy over” sequence fragments from training-set proteins into the hallucinated designs. f,g) Secondary structure content of the hallucinated designs and of natural 100-aa proteins from the training set. Hallucinations are more ideal than natural proteins: they have fewer loops and longer secondary structure elements.
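Gapless threading, as in panel e, can be sketched as sliding each fragment along every training-set sequence without insertions or deletions and recording the best match. This minimal sketch assumes the score is fractional sequence identity; the scoring actually used for the figure (for example, a substitution-matrix score) is not specified here, and the function names are hypothetical.

```python
def best_gapless_identity(fragment, targets):
    """Highest fraction of identical residues when `fragment` is slid, without gaps,
    along every target sequence; 0.0 if no target is long enough."""
    k = len(fragment)
    best = 0.0
    for target in targets:
        for start in range(len(target) - k + 1):
            window = target[start:start + k]
            matches = sum(a == b for a, b in zip(fragment, window))
            best = max(best, matches / k)
    return best


def fragment_identities(query, targets, frag_len):
    """Best gapless identity for every overlapping fragment of length frag_len."""
    return [best_gapless_identity(query[i:i + frag_len], targets)
            for i in range(len(query) - frag_len + 1)]
```

Running `fragment_identities` for each `frag_len` in `range(15, 61, 5)` would give one distribution per fragment size, mirroring the 15, 20, ..., 60 aa series in the caption. An exhaustive scan like this is quadratic in sequence length per target, which is acceptable for ~100-residue designs but would need indexing for a full training set.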

Extended Data Fig. 2 Additional data on the experimentally characterized all-α and mixed α–β network-hallucinated proteins.

a,e) Dendrograms of representative hallucinated protein designs clustered by TM-score; thermostable designs with CD spectra consistent with the target structure are labelled by their IDs. b,f) Three-dimensional models of the hallucinated designs. c,g) Predicted distance maps at the end of the hallucination trajectory. d,h) Temperature dependence of the CD signal at 220 nm over the 25–95 °C range.
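A dendrogram like those in panels a and e records the order in which designs merge under hierarchical clustering. The caption does not state the linkage criterion, so this sketch assumes average linkage on the distance 1 − TM-score. It also assumes a symmetric matrix of precomputed TM-scores; TM-align (ref. 31) normalizes by the length of one structure, so scores for designs of different lengths would first need to be symmetrized, for example by averaging the two normalizations.

```python
def average_linkage(tm):
    """Average-linkage agglomerative clustering on distance = 1 - TM-score.

    tm: symmetric square matrix (list of lists) of pairwise TM-scores.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where clusters are frozensets of design indices; this is the information
    a dendrogram draws.
    """
    clusters = [frozenset([i]) for i in range(len(tm))]

    def linkage(a, b):
        # Mean pairwise distance between members of the two clusters.
        return sum(1.0 - tm[i][j] for i in a for j in b) / (len(a) * len(b))

    merges = []
    while len(clusters) > 1:
        d, x, y = min((linkage(clusters[x], clusters[y]), x, y)
                      for x in range(len(clusters))
                      for y in range(x + 1, len(clusters)))
        a, b = clusters[x], clusters[y]
        merges.append((a, b, d))
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [a | b]
    return merges
```

With three designs where the first two share a TM-score of 0.8 and both score around 0.3 against the third, the first two merge at a distance of about 0.2 before the third joins them.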

Extended Data Fig. 3 Additional examples of thermostable hallucinations with CD spectra consistent with the target structure.

a,g) 3D structure models of the hallucinated designs. b,h) Predicted distance maps at the end of the hallucination trajectory. c,i) Rosetta ab initio folding funnels. d,j) Size-exclusion chromatography traces. e,k) Circular dichroism spectra at 25 °C (blue) and 95 °C (red). f,l) Temperature dependence of the circular dichroism signal at 220 nm over the 25–95 °C range.

Extended Data Fig. 4 Comparison of 0515 NMR structure to hallucinated model.

a) Superposition of the hallucinated model (blue) and the NMR medoid structure (gray) of 0515 reveals a backbone r.m.s.d. of 1.82 Å over 100 residues. b) Hallucinated model of 0515 colored by the Cα–Cα distance between the model and the NMR medoid structure after structural superposition, and the corresponding plot of per-residue Cα–Cα distance between the model and the NMR medoid structure.
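The r.m.s.d. values and per-residue Cα–Cα plots in these captions are computed after superposition. A minimal sketch of that post-superposition arithmetic follows, assuming the coordinates have already been optimally superposed and residue-matched; the superposition step itself (for example the Kabsch algorithm or a maximum-likelihood method such as Theseus, ref. 57) is not shown. The captions report backbone r.m.s.d., whereas this sketch uses Cα atoms only, which is a simplifying assumption.

```python
import math


def per_residue_deviation(model_ca, ref_ca):
    """Cα–Cα distance for each residue of two already-superposed structures.

    model_ca, ref_ca: residue-matched lists of (x, y, z) Cα coordinates in Å.
    """
    if len(model_ca) != len(ref_ca):
        raise ValueError("coordinate lists must be residue-matched")
    return [math.dist(a, b) for a, b in zip(model_ca, ref_ca)]


def rmsd(deviations):
    """Root-mean-square of the per-residue deviations."""
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))
```

The list returned by `per_residue_deviation` is what a per-residue plot would show, and `rmsd` condenses it into the single number quoted in the caption.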

Extended Data Fig. 5 Structural analysis of 0217 and comparison to hallucinated model.

a) Representative electron density (2Fo-Fc, 1σ) over the entire asymmetric unit (left) and core packing regions (right) of hallucination 0217. b) Both chains of the crystal structure colored by B-factor. c) Structural superposition of the chains observed in the asymmetric unit reveals a backbone r.m.s.d. of 2.8 Å over 91 residues. d) Crystal lattice contacts for chain A (green) and chain B (yellow) may explain the structural differences observed between chains. Circled regions highlight where chain A forms an ordered helix–loop–helix and chain B is disordered. e) Hallucinated model of 0217 colored by the Cα–Cα distance between the model and the crystal structure after structural superposition, with the corresponding plot of per-residue Cα–Cα distance between model and crystal structure. f) Structural superposition of the hallucinated model and chain B of the 0217 crystal structure (left), the 0217 model colored by Cα–Cα distance between hallucination and crystal structure (middle), and the per-residue Cα–Cα distance between hallucination and crystal structure (right).

Extended Data Fig. 6 Structural analysis, NMR characterization, and SEC analysis of hallucinated sequence 0417.

a) Hallucinated model with surface hydrophobics shown as sticks. b) [1H-15N]-SOFAST-HMQC spectra of hallucinated sequence 0417 before (red) and after (blue) buffer optimization. The spectrum before optimization (red) was obtained at a protein concentration of ~0.3 mM at 298 K in 20 mM Tris-HCl, pH 7.2, 100 mM NaCl; the spectrum after optimization (blue) was obtained at a protein concentration of ~0.3 mM at 323 K in 20 mM sodium phosphate, pH 6.5, 50 mM NaCl, 20% glycerol. The NMR data are consistent with a folded structure containing a mix of α and β secondary structure. Even under optimized conditions there is still evidence of exchange broadening (for example, weak Trp side-chain NεH signals), of resonances that appear only at high temperature and high glycerol concentration, and of some doubled resonances, all indicative of transient self-association. c) The size-exclusion chromatography trace of 0417 shows a small additional peak corresponding to a larger oligomeric species, which corroborates the NMR analysis.

Extended Data Fig. 7 Structural analysis of 0738_mod and comparison to hallucinated model 0738.

a) Representative electron density (2Fo-Fc, 1σ) over the entire asymmetric unit (left) and core packing regions (right) of hallucination 0738_mod. b) Both chains of the crystal structure colored by B-factor. c) Structural superposition of the hallucinated model and chain A of the 0738_mod crystal structure (left), the 0738_mod model colored by Cα–Cα distance between hallucination and crystal structure (middle), and the per-residue Cα–Cα distance between hallucination and crystal structure (right). d) Hallucinated model of 0738_mod colored by the Cα–Cα distance between the model and the crystal structure after structural superposition, with the corresponding plot of per-residue Cα–Cα distance between model and crystal structure.

Extended Data Fig. 8 NMR and biochemical analysis of hallucinated sequences 0515, 0738_mod, and 0217.

a) 1H-15N heteronuclear NOE (hetNOE) histograms for 0515 (82 non-overlapped peaks), 0738_mod (144 peaks) and 0217 (47 peaks), together with their average values. Steady-state 1H-15N heteronuclear NOEs were obtained from the ratio of cross-peak intensities (Isaturated/Iequilibrium) recorded with (Isaturated) and without (Iequilibrium) 3 s of proton saturation during the presaturation delay; the two experiments were recorded in an interleaved manner, split in TopSpin, processed identically using NMRPipe and peak-picked in SPARKY to obtain peak intensities. b) 1H-15N HSQC spectra of the corresponding proteins collected at 800 MHz and 298 K in 25 mM HEPES, pH 7.4, 50 mM NaCl buffer with 5% (v/v) D2O, in 5-mm Shigemi NMR tubes. These 15N-enriched protein samples were prepared at concentrations of 0.4 mM, 0.15 mM and 0.2 mM, respectively. c) SEC data demonstrating monodispersity of these proteins in solution, with predominantly monomer for 0515 and 0738_mod and predominantly dimer for 0217. SDS-PAGE data (not shown) show that each is >95% homogeneous, which, together with MALDI-TOF mass spectrometry, indicates that the observed spectral heterogeneity is not due to chemical heterogeneity. d) Ribbon diagrams of the corresponding monomeric or dimeric protein structures. These results show that the three designs have characteristic dynamics in solution. The average hetNOE for the homodimer 0217 is lower than for 0515 and 0738_mod, and 0217 shows fewer peaks than expected owing to exchange broadening. Although 0738_mod has a hetNOE distribution similar to that of monomeric 0515, it has more than twice the expected number of peaks, indicating at least two folded conformations (for all or part of the protein) in slow conformational exchange on the NMR timescale.
This was further validated by the appearance of new peaks in spectra at lower temperature (288 K) and of different peaks at higher temperatures (308 and 318 K), and confirmed by the detection of 15N ZZ-exchange cross peaks (ref. 60) at 318 K with 600 and 750 ms mixing times (Bruker pulse sequence hsqcetexf3gp; data not shown).
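The hetNOE values in panel a are per-peak intensity ratios between the saturated and equilibrium spectra. The arithmetic can be sketched as follows, assuming peak intensities have already been extracted (for example from SPARKY peak lists) into dictionaries keyed by peak label; the function names, peak labels and intensities here are hypothetical.

```python
def het_noe(saturated, equilibrium):
    """Steady-state 1H-15N hetNOE for each peak: I_saturated / I_equilibrium.

    saturated, equilibrium: dicts mapping peak labels to intensities from the
    interleaved spectra. Peaks missing from either spectrum are skipped.
    """
    shared = sorted(saturated.keys() & equilibrium.keys())
    return {peak: saturated[peak] / equilibrium[peak]
            for peak in shared if equilibrium[peak] != 0}


def mean_het_noe(ratios):
    """Average hetNOE over all quantified peaks."""
    return sum(ratios.values()) / len(ratios)
```

For instance, `het_noe({"G12": 80.0}, {"G12": 100.0})` gives `{"G12": 0.8}`. Skipping peaks absent from either spectrum mirrors the caption's use of non-overlapped peaks only, so the averages are taken over different peak counts for each protein.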

Extended Data Table 1 NMR refinement statistics and quality scores for 0515
Extended Data Table 2 Crystallographic data collection and refinement statistics

Supplementary information

Supplementary Information

This file contains a Supplementary Discussion, Supplementary Table 1 and Supplementary Figs 1–7.

Reporting Summary

Peer Review File

About this article

Cite this article

Anishchenko, I., Pellock, S.J., Chidyausiku, T.M. et al. De novo protein design by deep network hallucination. Nature 600, 547–552 (2021). https://doi.org/10.1038/s41586-021-04184-w

  • DOI: https://doi.org/10.1038/s41586-021-04184-w
