Material Science With Artificial Intelligence

A system that could pore through millions of research papers to extract “recipes” for producing materials.

A team of researchers at MIT, the University of Massachusetts at Amherst, and the University of California at Berkeley hope to close the materials-science automation gap, with a new artificial-intelligence system that would pore through research papers to deduce “recipes” for producing particular materials. Image: Chelsea Turner/MIT

Scientists at MIT and their collaborators have developed an artificial-intelligence system that pores through research papers to deduce “recipes” for producing particular materials. They hope it will help close the automation gap in materials science.

Elsa Olivetti, the Atlantic Richfield Assistant Professor of Energy Studies in MIT’s Department of Materials Science and Engineering (DMSE), said, “Computational materials scientists have made a lot of progress in the ‘what’ to make: what material to design based on desired properties. But because of that success, the bottleneck has shifted to, ‘Okay, now how do I make it?’”

The researchers envision a database that contains materials recipes extracted from millions of papers. A user would enter the name of a target material and any other criteria, and the system would pull up suggested recipes.

To realize that vision, the researchers developed a machine-learning system that can analyze a research paper and deduce which of its paragraphs contain materials recipes. It then classifies the words in those paragraphs according to their roles within the recipes: names of target materials, numeric quantities, names of pieces of equipment, operating conditions, descriptive adjectives, and the like.
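The two stages described above (first find the recipe paragraphs, then label the words inside them) can be sketched in a few lines of Python. This is only an illustration: the cue-word lists, the role names, and both function names are invented here, and simple keyword rules stand in for the learned models that the actual system uses.

```python
import re

# Hypothetical cue words; the real system learns such patterns from data.
RECIPE_CUES = {"heated", "mixed", "dissolved", "calcined", "stirred", "annealed"}
EQUIPMENT = {"furnace", "autoclave", "oven", "crucible"}


def is_recipe_paragraph(paragraph):
    """Stage 1: flag a paragraph that mentions at least two synthesis cues."""
    words = set(re.findall(r"[a-z]+", paragraph.lower()))
    return len(words & RECIPE_CUES) >= 2


def label_tokens(paragraph, target):
    """Stage 2: give every token a coarse role within the recipe."""
    labels = []
    # A token is either a number (500, 2.5) or a word that may contain digits (TiO2).
    for token in re.findall(r"\d+(?:\.\d+)?|[A-Za-z][A-Za-z0-9]*", paragraph):
        lower = token.lower()
        if token == target:
            role = "target"
        elif token[0].isdigit():
            role = "quantity"
        elif lower in EQUIPMENT:
            role = "equipment"
        elif lower in RECIPE_CUES:
            role = "operation"
        else:
            role = "other"
        labels.append((token, role))
    return labels


paragraph = "TiO2 powder was mixed with ethanol and heated in a furnace at 500 C for 2 h."
if is_recipe_paragraph(paragraph):
    for token, role in label_tokens(paragraph, target="TiO2"):
        print(token, role)
```

The example sentence is made up for this sketch. A learned model would replace both rule sets, but the flow of data from paragraph filter to word labeler would stay the same.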

The machine-learning system can also analyze the extracted data to infer general characteristics of the materials.

The system uses a combination of supervised and unsupervised machine-learning techniques. Supervised means that the training data fed to the system is first annotated by humans; the system tries to find correlations between the raw data and the annotations. Unsupervised means that the training data is unannotated and the system instead learns to cluster data together according to structural similarities.
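The difference between the two kinds of learning can be shown with a toy example. The sketch below is not the researchers' method: it uses invented synthesis temperatures, a nearest-centroid classifier as the supervised learner, and a two-cluster k-means as the unsupervised one. All function names and numbers are assumptions made for illustration.

```python
from statistics import mean


def fit_supervised(examples):
    """Supervised: learn one centroid per human-provided label."""
    by_label = {}
    for value, label in examples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}


def predict(centroids, value):
    """Assign the label whose centroid lies closest to the value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


def cluster_unsupervised(values, iterations=10):
    """Unsupervised: two-cluster 1-D k-means; no labels are used."""
    centers = [min(values), max(values)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return groups


# Invented temperatures (degrees C) with human annotations.
annotated = [(80, "low"), (120, "low"), (900, "high"), (1100, "high")]
centroids = fit_supervised(annotated)
print(predict(centroids, 150))

# The same kind of data without annotations: the groups emerge on their own.
print(cluster_unsupervised([90, 110, 130, 950, 1000, 1050]))
```

In the supervised half the labels "low" and "high" come from a person. In the unsupervised half the algorithm finds two groups but has no names for them.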

To further improve the system, the researchers used Word2vec, an algorithm developed at Google. Word2vec looks at the contexts in which words occur (their syntactic roles within sentences and the other words around them) and groups together words that tend to have similar contexts.
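The idea that words with similar contexts belong together can be illustrated without Word2vec itself. The sketch below swaps in a much simpler technique, plain co-occurrence counts compared by cosine similarity, in place of Word2vec's learned embeddings. The three sentences, the window size, and the function names are all assumptions made for this example.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences, window=2):
    """For each word, count the words appearing within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Invented sentences in the style of a synthesis section.
corpus = [
    "the powder was heated in a furnace",
    "the powder was annealed in a furnace",
    "the solution was stirred in a beaker",
]
vectors = context_vectors(corpus)

# "heated" and "annealed" share all their neighbors; "powder" shares few.
print(cosine(vectors["heated"], vectors["annealed"]))
print(cosine(vectors["heated"], vectors["powder"]))
```

Here "heated" and "annealed" score higher with each other than either does with "powder", because they appear between the same neighboring words.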

This allowed the researchers to greatly expand their training set, since the system could infer that a label attached to one word probably applies to other words grouped with it. To test the system’s accuracy, however, they had to rely on the labeled data. In those tests, the system identified the paragraphs that contained recipes with 99 percent accuracy and labeled the words within those paragraphs with 86 percent accuracy.
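The article does not say exactly how accuracy was computed. A common reading, assumed here, is the fraction of items whose predicted label matches the human annotation. The labels in this sketch are invented and the function name is hypothetical.

```python
def accuracy(predicted, gold):
    """Fraction of predictions that match the human-annotated (gold) labels."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Invented word-level labels for one recipe sentence.
gold = ["target", "other", "operation", "equipment", "quantity", "other", "quantity"]
predicted = ["target", "other", "operation", "other", "quantity", "other", "quantity"]
print(f"{accuracy(predicted, gold):.0%}")
```

The same function applies at both levels the article mentions: once over paragraphs (recipe or not) and once over the words inside recipe paragraphs.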

Ram Seshadri, the Fred and Linda R. Wudl Professor of Materials Science at the University of California at Santa Barbara, said, “This is landmark work. The authors have taken on the difficult and ambitious challenge of capturing, through AI methods, strategies employed for the preparation of new materials. The work demonstrates the power of machine learning, but it would be accurate to say that the eventual judge of success or failure would require convincing practitioners that the utility of such methods can enable them to abandon their more instinctual approaches.”