MIT AI system helps biologists build ML models

BioAutoMATED: A new AI tool to automate biology research.


Scientists at MIT have developed a system that generates artificial intelligence (AI) models tailored to biological research. The system gives researchers a way to apply AI to biological processes and phenomena without building models by hand, and it could accelerate discoveries across many research areas.

The team, led by Jim Collins, calls the system BioAutoMATED. It lets researchers build machine-learning models without machine-learning expertise. The goal was to address two problems: the difficulty science and engineering labs face in recruiting machine-learning researchers, and the time and effort required for model selection, dataset formatting, and fine-tuning.

The open-access paper detailing BioAutoMATED was published in Cell Systems, offering a promising approach to streamline and democratize machine-learning model development in biology.

“In your machine-learning project, how much time will you typically spend on data preparation and transformation?” asks a 2022 Google course on the Foundations of Machine Learning (ML). The choices offered are either “Less than half the project time” or “More than half the project time.” If you guessed the latter, you would be correct; Google states that it takes over 80 percent of project time to format the data, and that’s not even considering the time needed to frame the problem in machine-learning terms.

“It would take many weeks of effort to figure out the appropriate model for our dataset, and this is a prohibitive step for many folks that want to use machine learning and biology,” said Jacqueline Valeri, a fifth-year Ph.D. student of biological engineering in Collins’s lab who is the first co-author of the paper.

BioAutoMATED is an automated machine-learning system that handles model selection and development for biological datasets, substantially reducing the time and effort required. It addresses the particular challenges of working with biological sequences such as DNA, RNA, proteins, and glycans.

Unlike existing automated machine learning (AutoML) tools, which are primarily designed for text, BioAutoMATED takes advantage of the standardized nature of biological sequences and widens the model search space by combining multiple tools under one umbrella tool. This lets researchers apply machine learning to biological data, and pursue their investigations, more efficiently.
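To make the "standardized nature" of biological sequences concrete: DNA, for example, uses a small fixed alphabet, so any sequence can be turned into a numeric table in the same way. The sketch below shows one common encoding (one-hot) in plain Python. It is an illustration only, not BioAutoMATED's own code; the function name `one_hot_encode` and the choice to leave unknown letters as all-zero rows are assumptions made for this example.

```python
from typing import List

# Assumed alphabet for this illustration; RNA or protein alphabets
# could be passed in instead.
DNA_ALPHABET = "ACGT"


def one_hot_encode(seq: str, alphabet: str = DNA_ALPHABET) -> List[List[int]]:
    """Return one row per sequence position with a 1 in the column
    of the observed letter. Letters outside the alphabet (such as N)
    are left as all-zero rows."""
    column = {letter: i for i, letter in enumerate(alphabet)}
    rows = []
    for letter in seq.upper():
        row = [0] * len(alphabet)
        if letter in column:
            row[column[letter]] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in one_hot_encode("ACGTN"):
        print(row)
```

Because every sequence over the same alphabet yields rows of the same width, the encoding step does not need to be redesigned for each new dataset.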

BioAutoMATED offers a range of supervised machine learning (ML) models, including binary classification, multi-class classification, and regression models. It also aids in determining the necessary amount of data for appropriate model training. The tool explores models suited for smaller, sparser biological datasets and complex neural networks, providing an advantage to research groups with new and potentially challenging data for ML.
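One common way to judge how much data a model needs is a learning curve: train on progressively larger subsets and watch how held-out accuracy changes. The sketch below illustrates the idea on a toy binary classification task in plain Python. It is not BioAutoMATED's implementation; the nearest-neighbor classifier, the synthetic GC-rich labeling rule, and all function names are assumptions chosen to keep the example self-contained.

```python
import random
from typing import List, Sequence, Tuple

Example = Tuple[str, int]  # (sequence, binary label)


def make_toy_data(n: int, length: int = 12, seed: int = 1) -> List[Example]:
    """Generate random DNA sequences labeled 1 when more than half
    of the letters are G or C (a made-up rule for illustration)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        seq = "".join(rng.choice("ACGT") for _ in range(length))
        label = int(seq.count("G") + seq.count("C") > length // 2)
        data.append((seq, label))
    return data


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def predict_1nn(train: Sequence[Example], seq: str) -> int:
    """Return the label of the training sequence closest to seq."""
    return min(train, key=lambda pair: hamming(pair[0], seq))[1]


def learning_curve(
    train: Sequence[Example],
    test: Sequence[Example],
    fractions: Sequence[float] = (0.25, 0.5, 1.0),
    seed: int = 0,
) -> List[Tuple[int, float]]:
    """Return (training set size, test accuracy) for each fraction."""
    rng = random.Random(seed)
    shuffled = list(train)
    rng.shuffle(shuffled)
    curve = []
    for frac in fractions:
        n = max(1, int(len(shuffled) * frac))
        subset = shuffled[:n]
        correct = sum(predict_1nn(subset, s) == y for s, y in test)
        curve.append((n, correct / len(test)))
    return curve


if __name__ == "__main__":
    curve = learning_curve(make_toy_data(40), make_toy_data(20, seed=2))
    for size, accuracy in curve:
        print(f"{size} training sequences -> accuracy {accuracy:.2f}")
```

If accuracy is still climbing at the largest subset, collecting more data is likely to help; if it has flattened, a different model or representation may matter more than additional samples.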

By reducing the need for extensive digital infrastructure and ML expertise, BioAutoMATED aims to lower barriers and costs associated with conducting innovative experiments at the intersection of biology and ML. Researchers can use the tool to run initial experiments and assess the value of engaging a machine-learning expert to build alternative models for further exploration.

The open-source code of BioAutoMATED is readily available and easy to use, encouraging researchers to leverage and enhance it collaboratively. The aim is to establish it as a tool accessible to the entire biological research community, merging the rigor of biological practice with the rapid advancements of AI and ML. The senior author, Jim Collins, and other MIT contributors emphasize the potential of AutoML techniques to bridge these disciplines effectively.

The research received support from several organizations, including grants from the Defense Threat Reduction Agency, the Defense Advanced Research Projects Agency SD2 program, the Paul G. Allen Frontiers Group, and the Wyss Institute. Other fellowships, scholarships, and funding sources also contributed to this work, which is part of the Antibiotics-AI Project supported by the Audacious Project and other foundations and donors.

In conclusion, BioAutoMATED is a cutting-edge automated machine-learning tool designed specifically for explaining and designing biological sequences. This innovative tool bridges the gap between biology and machine learning by offering a user-friendly interface and a repertoire of supervised machine learning models tailored for biological data.

With automated data preprocessing, model selection, interpretation, and sequence design, BioAutoMATED could accelerate discoveries and deepen the understanding of biological sequences. Its open-source implementation and emphasis on ease of use are intended to encourage widespread adoption and collaborative development. By putting AI and ML in the hands of biologists, BioAutoMATED could facilitate breakthroughs in domains ranging from disease prediction to protein engineering.

Journal Reference:

  1. Jacqueline A. Valeri, Luis R. Soenksen et al., BioAutoMATED: An end-to-end automated machine learning tool for explaining and designing biological sequences. Cell Systems. DOI: 10.1016/j.cels.2023.05.00.