The application of machine learning algorithms in computational protein design has led to many recent achievements, such as the prediction of 3D structure from amino-acid sequences and the inverse design of protein sequences that fold into a desired 3D structure. However, de novo protein structure generation (that is, the generation of protein structures with desired properties) remains a difficult task. The enormous size of protein space, together with the fact that functional proteins constitute only a small portion of all possible protein molecules, makes it difficult to effectively identify the relationships between sequence, structure, and function (or properties). While deep generative models have facilitated protein structure generation, several challenges remain, such as generating full protein complexes, performing conditional sampling under diverse design constraints without retraining the model, and achieving reasonable scaling behavior as complexity increases. In a recent work, Gevorg Grigoryan and colleagues developed a deep generative model, based on a modified version of the diffusion models commonly used for images, to overcome these challenges, making it possible to generate physically reasonable and designable protein structures under various user-defined constraints.
The developed framework, Chroma, contains three key design elements for achieving high-quality protein generation. The first is a diffusion model that learns to reverse a correlated noise process so as to match the distance statistics of natural proteins. Second, inspired by a force calculation method from many-body physics, the authors designed a neural-network (NN) architecture that uses random long-range graph connections with connectivity statistics for updating molecular coordinates. Notably, this NN design allows the computation to scale sub-quadratically in the number of residues. Finally, Chroma incorporates a low-temperature sampling method that improves the quality of sampled backbones and provides increased flexibility in choosing constraints for protein design. The authors showed that Chroma can generate large protein molecules while adapting to many external constraints, including symmetries, shape, semantics, and even geometries shaped like letters of the Latin alphabet or Arabic numerals. More importantly, experimental validation demonstrated that the designed proteins have both structural accuracy and favorable properties. Overall, Chroma sheds light on the capabilities of generative protein modeling for effectively programming properties and functions in protein design.
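To make the scaling argument concrete, the sketch below builds a sparse residue graph in which each residue keeps a few nearest neighbors plus a few randomly drawn long-range partners. It is an illustration only and not Chroma's implementation: the function name `sparse_residue_graph`, the parameters `k_local` and `k_random`, and the inverse-cubic distance weighting are all assumptions made for this example. The point it shows is that the number of edges grows as N × (k_local + k_random) rather than N².

```python
import numpy as np

def sparse_residue_graph(coords, k_local=8, k_random=8, seed=0):
    """Return an (N, k_local + k_random) array of neighbor indices.

    Hypothetical helper for illustration; requires N > k_local + k_random.
    """
    rng = np.random.default_rng(seed)
    n = coords.shape[0]
    # Dense distance matrix, used here only for clarity of the sketch.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)       # push each residue's self-distance to the end
    order = np.argsort(dist, axis=1)     # nearest first, self last
    neighbors = np.empty((n, k_local + k_random), dtype=int)
    for i in range(n):
        local = order[i, :k_local]       # spatially closest residues
        rest = order[i, k_local:n - 1]   # everything else except self
        # Assumed weighting: long-range partners drawn with probability ~ 1/d^3.
        w = dist[i, rest] ** -3.0
        far = rng.choice(rest, size=k_random, replace=False, p=w / w.sum())
        neighbors[i] = np.concatenate([local, far])
    return neighbors

# Usage: 64 random "residue" positions, 16 edges per residue.
coords = np.random.default_rng(1).normal(size=(64, 3))
graph = sparse_residue_graph(coords)
print(graph.shape, "edges:", graph.size, "vs dense:", 64 * 63)
```

Message passing over such a graph touches a fixed number of edges per residue, which is what keeps the cost of a coordinate update below quadratic; a real implementation would also avoid the dense distance matrix used here for readability.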