Predicting biological structures more accurately using machine learning

Scratching the surface in terms of scientific progress to be made.


Determining the 3D shapes of biological molecules is one of the hardest problems in modern biology and medical discovery. The task often costs millions of dollars, and even such massive efforts can fail.

Now, scientists at Stanford University have developed an approach that overcomes this problem by predicting accurate structures computationally, using new machine learning techniques.

In testing, the approach accurately predicted the 3D shapes of drug targets and other important biological molecules, even when only limited data was available. This makes it applicable to the types of molecules whose structures are most difficult to determine experimentally. Beyond making predictions, the algorithm helps scientists explain how different molecules work, with applications ranging from fundamental biological research to informed drug design.

Stanford University Ph.D. student Raphael Townshend said, “Structural biology, which is the study of the shapes of molecules, has this mantra that structure determines function.”

Stanford University Ph.D. student Stephan Eismann said, “Proteins are molecular machines that perform all sorts of functions. To execute their functions, proteins often bind to other proteins. If you know that a pair of proteins is implicated in disease and you know how they interact in 3D, you can try to target this interaction very specifically with a drug.”

Rather than specifying what makes a structural prediction more or less accurate, the researchers let the algorithm discover these molecular features for itself. They did this because they found that the conventional practice of supplying such knowledge can bias an algorithm toward certain features, keeping it from discovering other informative ones.

Eismann said, “The problem with these hand-crafted features in an algorithm is that the algorithm becomes biased towards what the person who picks these features thinks is important, and you might miss some information that you would need to do better.”

Townshend said, “The network learned to find fundamental concepts that are key to molecular structure formation, but without explicitly being told to. The exciting aspect is that the algorithm has clearly recovered things that we knew were important, but it has also recovered characteristics that we didn’t know about before.”

The scientists next applied their algorithm to another class of critical biological molecules, RNAs. They tested it on a series of ‘RNA Puzzles’ from a long-standing competition in their field. In every case, the tool outperformed all the other puzzle participants, and it did so without being explicitly designed for RNA structures.

Ron Dror, associate professor of computer science, said, “Most of the dramatic recent advances in machine learning have required a tremendous amount of data for training. The fact that this method succeeds given very little training data suggests that related methods could address unsolved problems in many fields where data is scarce.”

Townshend said, “Once you have this fundamental technology, then you’re increasing your level of understanding another step and can start asking the next set of questions. For example, you can start designing new molecules and medicines with this kind of information, which is an area that people are very excited about.”

Journal References:
  1. Stephan Eismann, Raphael J. L. Townshend et al. Hierarchical, rotation-equivariant neural networks to select structural models of protein complexes. DOI: 10.1002/prot.26033
  2. Raphael J. L. Townshend et al. Geometric deep learning of RNA structure. DOI: 10.1126/science.abe5650

