FrameDiff: A generative AI to craft new protein structures

Generative AI imagines new protein structures.

The design of novel protein structures remains a challenge in protein engineering for applications across biomedicine and chemistry. It requires substantial domain knowledge and laborious experimental testing.

In a new study focused on generating protein backbones, MIT CSAIL researchers developed “FrameDiff,” a computational tool for creating novel protein structures beyond those nature has produced, with the goal of improving our capacity for protein engineering. To design new proteins independently of preexisting ones, the machine learning approach generates “frames” that align with the fundamental features of protein structures, enabling protein configurations that have not been seen before.

The technique offers a way to address human-made problems that evolve much faster than nature’s pace.

MIT CSAIL Ph.D. student Jason Yim, a lead author on a new paper about the work, said, “The aim, concerning this new capacity of generating synthetic protein structures, opens up a myriad of enhanced capabilities, such as better binders. This means engineering proteins that can attach to other molecules more efficiently and selectively, with widespread implications related to targeted drug delivery and biotechnology, where it could develop better biosensors. It could also have implications for biomedicine and beyond, offering possibilities such as developing more efficient photosynthesis proteins, creating more effective antibodies, and engineering nanoparticles for gene therapy.”

Proteins have complex structures made up of many atoms connected by chemical bonds. The “backbone,” which resembles the protein’s spine, consists of the atoms most important in determining the protein’s three-dimensional form. Every triplet of atoms along the backbone shares the same pattern of bonds and atom types. The researchers noticed this pattern and used it to build machine learning algorithms that draw on concepts from differential geometry and probability. This is where the frames come in: mathematically, each triplet can be modeled as a rigid body known as a “frame” (common in physics) that has a position and a rotation in 3D.
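To make the frame idea concrete, here is a minimal sketch of how one backbone triplet could be turned into a rotation and a position. It assumes the triplet is the N, CA and C atoms of a residue and uses a Gram-Schmidt construction, a common convention in structure modeling; the article does not state that FrameDiff builds its frames exactly this way. The function name frame_from_triplet and the sample coordinates are illustrative.

```python
import math

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def frame_from_triplet(n_xyz, ca_xyz, c_xyz):
    """Return (rotation, translation) for one residue.

    rotation is a list of three orthonormal axes (one per row);
    translation is the position of the central (CA) atom.
    Assumption: Gram-Schmidt on the CA->C and CA->N directions.
    """
    e1 = normalize(sub(c_xyz, ca_xyz))   # first axis points from CA to C
    v2 = sub(n_xyz, ca_xyz)              # direction from CA to N
    proj = dot(v2, e1)
    # Remove the part of v2 that lies along e1, then normalize.
    e2 = normalize([v - proj * e for v, e in zip(v2, e1)])
    e3 = cross(e1, e2)                   # third axis completes a right-handed set
    return [e1, e2, e3], list(ca_xyz)

# Illustrative coordinates in angstroms (not taken from the paper).
rotation, translation = frame_from_triplet(
    [-0.525, 1.363, 0.0], [0.0, 0.0, 0.0], [1.526, 0.0, 0.0]
)
```

Each residue then contributes one such (rotation, translation) pair, so a backbone of N residues is described by N frames rather than by a loose cloud of atoms.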

These frames give each triplet the information it needs to understand its spatial surroundings. A machine learning algorithm must then learn how to move each frame to build a protein backbone. By learning how to build existing proteins, the algorithm will hopefully generalize and be able to generate new proteins that have never been seen in nature.

Training a model to build proteins via “diffusion” involves adding noise that randomly shifts all the frames and blurs the original protein’s appearance. The algorithm must then move and rotate each frame until it resembles the original protein. Although the idea is straightforward, developing diffusion on frames requires techniques from stochastic calculus on Riemannian manifolds. On the theoretical side, the researchers created “SE(3) diffusion” for learning probability distributions that nontrivially connect the translation and rotation components of each frame.
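The noising step described above can be sketched for a single frame as follows. This is a simplified illustration, not the method from the paper: the rotation noise here is a random axis with a Gaussian-distributed angle, a stand-in for the proper distribution over rotations that SE(3) diffusion requires, and the names noise_frame, sigma_rot and sigma_trans are hypothetical.

```python
import math
import random

def rotation_from_axis_angle(axis, angle):
    """Rodrigues' formula: 3x3 rotation matrix for a unit axis and an angle."""
    x, y, z = axis
    c, s = math.cos(angle), math.sin(angle)
    k = 1.0 - c
    return [
        [c + x * x * k,     x * y * k - z * s, x * z * k + y * s],
        [y * x * k + z * s, c + y * y * k,     y * z * k - x * s],
        [z * x * k - y * s, z * y * k + x * s, c + z * z * k],
    ]

def matmul(a, b):
    return [[sum(a[i][m] * b[m][j] for m in range(3)) for j in range(3)]
            for i in range(3)]

def random_unit_vector(rng):
    v = [rng.gauss(0.0, 1.0) for _ in range(3)]
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def noise_frame(rotation, translation, sigma_rot, sigma_trans, rng):
    """Perturb one frame: apply a random rotation and shift its position.

    Assumption: Gaussian angle about a uniformly random axis, used only
    as an illustrative stand-in for the rotation noise in the paper.
    """
    axis = random_unit_vector(rng)
    angle = rng.gauss(0.0, sigma_rot)
    noisy_rotation = matmul(rotation_from_axis_angle(axis, angle), rotation)
    noisy_translation = [t + rng.gauss(0.0, sigma_trans) for t in translation]
    return noisy_rotation, noisy_translation

# Usage: noise an identity frame placed at the origin.
rng = random.Random(0)
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
noisy_rot, noisy_pos = noise_frame(identity, [0.0, 0.0, 0.0], 0.5, 1.0, rng)
```

A trained model would learn the reverse of this process: given noisy frames, predict how to rotate and translate each one back toward a plausible backbone.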

This new tool brings protein designers closer to solving crucial problems in biotechnology, including developing specific protein binders for accelerated vaccine design, engineering symmetric proteins for gene delivery, and building robust motif scaffolds for precise enzyme design.

Harvard University computational biologist Sergey Ovchinnikov said, “Discarding a pretrained structure prediction model [in FrameDiff] opens up possibilities for rapidly generating structures extending to large lengths. Even though it’s still preliminary work, it’s an encouraging stride in the right direction. As such, the vision of protein design, playing a pivotal role in addressing humanity’s most pressing challenges, seems increasingly within reach, thanks to the pioneering work of this MIT research team.”

Journal Reference:

  1. Jason Yim, Brian Trippe, Valentin De Bortoli et al. SE(3) diffusion model with application to protein backbone generation.
