Determining protein structures using machine learning

The new technique reveals many possible conformations that a protein may take.


Protein structure can be determined using cryo-electron microscopy (cryo-EM). Cryo-EM produces many images of protein samples frozen in a thin layer of ice. Using computer algorithms, these images are then combined into a 3D representation of the protein in a process called reconstruction.
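To make the relationship between the 2D images and the 3D structure concrete, here is a minimal sketch, assuming an idealized model in which each image is simply the 3D density summed along the viewing direction. The toy volume and the function names are hypothetical; real cryo-EM images also involve noise, unknown particle orientations, and microscope effects that this sketch ignores.

```python
import numpy as np

def make_toy_volume(size=16):
    """Build a toy 3D density: one bright rectangular block in an empty box."""
    volume = np.zeros((size, size, size))
    volume[4:8, 6:10, 2:12] = 1.0
    return volume

def project(volume, axis):
    """Sum the density along one axis to get an idealized 2D projection image."""
    return volume.sum(axis=axis)

volume = make_toy_volume()

# Three views of the same object along the three grid axes.
views = [project(volume, axis) for axis in range(3)]

for axis, view in enumerate(views):
    print(f"view along axis {axis}: shape={view.shape}, brightest pixel={view.max()}")
```

Reconstruction is the inverse problem: recovering the volume from many such views taken at orientations that are not known in advance.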

Now, MIT scientists have developed a machine-learning algorithm that identifies multiple possible structures that a protein can take. Their AI-based software reconstructs multiple structures and motions of the imaged protein.

The traditional representation of protein structure, electron-scattering intensities on a 3D lattice, was impractical for modeling multiple structures. Instead of using this conventional representation, the scientists introduced a new neural network architecture that can efficiently generate the full ensemble of structures in a single model.
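One way to picture the difference between a fixed 3D lattice and a single model that generates many structures is a function that takes a position in space plus a short "conformation code" and returns a density value. The sketch below is illustrative only and is not the authors' architecture: the layer sizes, the two-number code, and the random, untrained weights are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 3 spatial coordinates plus a 2-number conformation code.
COORD_DIM, LATENT_DIM, HIDDEN = 3, 2, 32

# Random, untrained weights; a real model would learn these from the images.
W1 = rng.normal(size=(COORD_DIM + LATENT_DIM, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(size=(HIDDEN, 1))
b2 = np.zeros(1)

def density(coords, z):
    """Evaluate the toy network at N points (N, 3) for one code z (LATENT_DIM,)."""
    z_tiled = np.broadcast_to(z, (coords.shape[0], LATENT_DIM))
    x = np.concatenate([coords, z_tiled], axis=1)
    hidden = np.maximum(x @ W1 + b1, 0.0)  # ReLU activation
    return (hidden @ W2 + b2).ravel()

def volume_for(z, size=8):
    """Sample the network on a size^3 grid to materialize one structure."""
    axis = np.linspace(-0.5, 0.5, size)
    grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
    return density(grid.reshape(-1, 3), z).reshape(size, size, size)

# The same weights yield a different volume for each conformation code.
vol_a = volume_for(np.array([0.0, 0.0]))
vol_b = volume_for(np.array([1.0, -1.0]))
```

The point of the design is that one set of weights stands in for a whole family of volumes, so adding another conformation means choosing another code rather than storing another lattice.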

Ellen Zhong, an MIT graduate student and the lead author of the paper, said, “With the broad representation power of neural networks, we can extract structural information from noisy images and visualize detailed movements of macromolecular machines.”

Using their method, scientists were able to identify protein motions from imaging datasets where only a single static 3D structure was originally identified. They also visualized large-scale flexible motions of the spliceosome, a protein complex that coordinates the splicing of the protein-coding sequences of transcribed RNA.

The researchers tested their new approach by analyzing structures that form during ribosome assembly. To study this process in detail, they blocked assembly at different points and then captured electron microscope images of the resulting structures.

At some points, blocking assembly led to the accumulation of just a single structure, suggesting that there is only one path for that step to occur. However, blocking other points resulted in many different structures, suggesting that assembly could occur in various ways.

Because some of these experiments generated so many different protein structures, traditional cryo-EM reconstruction tools did not work well to determine what those structures were.

Joseph Davis, the Whitehead Career Development Assistant Professor in MIT’s Department of Biology, said, “In general, it’s an extremely challenging problem to try to figure out how many states you have when you have a mixture of particles.”

In this new study, scientists demonstrated the power of the technique by identifying a new ribosomal state that hadn’t been seen before. Previous studies had suggested that as a ribosome is assembled, large structural elements, which are akin to the foundation for a building, form first. Only after this foundation is formed are the “active sites” of the ribosome, which read messenger RNA and synthesize proteins, added to the structure.

In the new study, the researchers found that in a small subset of ribosomes, around 1 percent, a structure that is normally added toward the end appears before the foundation is assembled. To account for that, Davis hypothesizes that it may be too costly for cells to guarantee that every single ribosome is assembled in the correct order.

Davis said, “The cells are likely evolved to find a balance between what they can tolerate, which is maybe a small percentage of these types of potentially deleterious structures, and what it would cost to remove them from the assembly pathway completely.”

The researchers are now using this method to study the coronavirus spike protein. The receptor-binding domain (RBD) of the spike protein has three subunits, each of which can point either up or down.
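As a small worked example of why this matters for reconstruction, the sketch below counts the arrangements implied by three subunits with two orientations each. It assumes, purely for illustration, that the subunits are independent and that "up" and "down" are the only two states; the variable names are hypothetical.

```python
from collections import Counter
from itertools import product

STATES = ("up", "down")
N_RBD = 3  # three subunits, as described in the text

# Every ordered assignment of up/down to the three subunits.
configurations = list(product(STATES, repeat=N_RBD))

# Group arrangements by how many subunits point up.
by_up_count = Counter(config.count("up") for config in configurations)

print(f"total arrangements: {len(configurations)}")
for n_up in sorted(by_up_count):
    print(f"{n_up} up: {by_up_count[n_up]} arrangement(s)")
```

Even this simplified two-state picture gives eight arrangements, and any continuous flexibility within each state adds further variation on top of that.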

Davis says, “For me, watching the pandemic unfold over the past year has emphasized how important front-line antiviral drugs will be in battling similar viruses, which are likely to emerge in the future. As we start to think about how one might develop small molecule compounds to force all of the RBDs into the ‘down’ state so that they can’t interact with human cells, understanding exactly what the ‘up’ state looks like and how much conformational flexibility there is will be informative for drug design. We hope our new technique can reveal these sorts of structural details.”

Journal Reference:
  1. Zhong, E.D., Bepler, T., Berger, B. et al. CryoDRGN: reconstruction of heterogeneous cryo-EM structures using neural networks. Nat Methods (2021). DOI: 10.1038/s41592-020-01049-4

