New machine-learning technique promises up to 40% speed boost on real-world datasets

It predicts the future to optimize how information gets stored.


As the amount of digital data continues to grow exponentially, efficient storage and management have become critical for businesses and organizations. Traditional storage methods, however, often fall short on scalability and cost-effectiveness. A new machine-learning method aims to ease part of that burden by making a core storage structure faster to maintain.

Machine learning is a subset of artificial intelligence that allows computers to learn and make predictions without explicit programming. This technology has already been widely adopted in various industries, including healthcare, finance, and marketing.

Now it is making its mark in data storage, where its ability to adapt to changing data and make better use of storage resources could prove valuable.

Researchers from Carnegie Mellon University and Williams College have introduced a groundbreaking machine-learning technique that helps computer systems predict future data patterns and optimize the way information is stored.

In the researchers' tests, the predictions gave up to a 40% speed boost on real-world datasets. The new method could lead to faster databases and more efficient data centers.

The researchers focused on a common data structure called a list labeling array, which stores information in sorted order inside a computer’s memory. Keeping data sorted helps computers find it quickly, much as alphabetizing a long list of names makes it easy to locate someone. However, maintaining the sorted order can be challenging as new data comes in.

Until now, computer systems could only prepare for the worst-case scenario by constantly moving data around to make room for new items, which can be slow and computationally expensive.
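To make that cost concrete, here is a minimal Python sketch of a sorted array that leaves empty cells between items. It is a toy illustration, not the researchers' code: the function name `insert_with_gaps`, the rule of placing a new item in the middle of a free gap, and the rule of shifting neighbors toward the nearest free cell are all assumptions chosen for simplicity.

```python
def insert_with_gaps(slots, value):
    """Insert `value` into `slots`, a list whose occupied cells are sorted
    left to right and whose empty cells hold None.
    Returns how many existing items had to be moved."""
    n = len(slots)
    # Index of the first occupied cell holding something larger than value.
    succ = next((i for i, v in enumerate(slots) if v is not None and v > value), n)
    # Index of the last occupied cell to the left of succ.
    pred = next((i for i in range(succ - 1, -1, -1) if slots[i] is not None), -1)

    if succ - pred > 1:
        # There is free room between the two neighbors: nothing moves.
        slots[(pred + succ) // 2] = value
        return 0

    # No room: shift items one cell toward a free cell on the right...
    right = next((i for i in range(succ, n) if slots[i] is None), None)
    if right is not None:
        slots[succ + 1:right + 1] = slots[succ:right]
        slots[succ] = value
        return right - succ

    # ...or, failing that, toward a free cell on the left.
    left = next((i for i in range(pred, -1, -1) if slots[i] is None), None)
    if left is None:
        raise OverflowError("array is full")
    slots[left:pred] = slots[left + 1:pred + 1]
    slots[pred] = value
    return pred - left


slots = [None] * 8
moves = [insert_with_gaps(slots, v) for v in (50, 20, 30, 25)]
print(moves)   # [0, 0, 0, 2]
print(slots)   # [None, 20, 25, 30, 50, None, None, None]
```

The first three items land in free gaps at no cost. The fourth, 25, arrives where no gap is left, so 30 and 50 each have to slide over by one cell. In a large array, a single insertion like this can displace many items.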

The new machine-learning method, by contrast, gives these data structures the power to predict. The computer analyzes patterns in recent data to forecast what may come next.

“This technique allows data systems to peek into the future and optimize themselves on the fly,” said Aidin Niaparast, study coauthor and Ph.D. student at the Tepper School of Business at Carnegie Mellon University. “We demonstrate a clear tradeoff – the better the predictions, the faster the performance. Even when predictions are wildly off, the speed is still faster than normal.”
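The tradeoff between prediction quality and speed can be illustrated with a small Python sketch. This is a hypothetical toy, not the algorithm from the paper: it assumes each incoming item arrives with a predicted final slot, and the names `neighbours`, `insert`, `run` and `predicted_slot` are invented for illustration. When a free gap exists, the item goes to the predicted slot (clamped into the gap); without a prediction it goes to the middle of the gap. When no gap exists, neighbors are shifted and the moves are counted.

```python
def neighbours(slots, value):
    """Indices of the occupied cells just left and right of where `value`
    belongs (-1 and len(slots) when there is no such cell)."""
    n = len(slots)
    succ = next((i for i, v in enumerate(slots) if v is not None and v > value), n)
    pred = next((i for i in range(succ - 1, -1, -1) if slots[i] is not None), -1)
    return pred, succ


def insert(slots, value, predicted_slot=None):
    """Insert `value`, keeping occupied cells sorted. Returns items moved."""
    pred, succ = neighbours(slots, value)
    if succ - pred > 1:  # at least one free cell between the neighbors
        if predicted_slot is None:
            target = (pred + succ) // 2  # no hint: split the gap
        else:
            # Use the hint, clamped so the sorted order is preserved.
            target = min(max(predicted_slot, pred + 1), succ - 1)
        slots[target] = value
        return 0
    # No gap: shift items one cell toward a free cell, right first, then left.
    right = next((i for i in range(succ, len(slots)) if slots[i] is None), None)
    if right is not None:
        slots[succ + 1:right + 1] = slots[succ:right]
        slots[succ] = value
        return right - succ
    left = next((i for i in range(pred, -1, -1) if slots[i] is None), None)
    if left is None:
        raise OverflowError("array is full")
    slots[left:pred] = slots[left + 1:pred + 1]
    slots[pred] = value
    return pred - left


def run(values, predictions=None):
    """Insert all values into an array with exactly len(values) cells.
    Returns (total items moved, final array)."""
    slots = [None] * len(values)
    total = 0
    for i, v in enumerate(values):
        hint = None if predictions is None else predictions[i]
        total += insert(slots, v, hint)
    return total, slots


values = [10, 20, 30, 40, 50, 60, 70, 80]
print(run(values)[0])                             # no predictions: 21 moves
print(run(values, [0, 1, 2, 3, 4, 5, 6, 7])[0])   # perfect predictions: 0 moves
print(run(values, [1, 1, 3, 3, 5, 5, 7, 7])[0])   # noisy predictions: 7 moves
```

On this small input, perfect predictions eliminate all data movement, slightly wrong predictions cost 7 moves, and no predictions cost 21. The sketch only shows the general shape of the tradeoff. It makes no attempt to reproduce the paper's algorithm, its guarantees for poor predictions, or its measured speedups.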

According to the researchers, their code is available for others to use as part of the supplementary material published alongside the paper.

The researchers believe that this work will pave the way for the use of machine learning predictions in computer system design. They state that structures like search trees, hash tables, and graphs could operate more intelligently and efficiently by predicting expected data patterns. The researchers also hope this inspires new ways to design algorithms and data management systems.

“Learned optimizations could lead to faster databases, improved data center efficiency, and smarter operating systems,” said Benjamin Moseley, an associate professor at the Tepper School and study co-author. “We’ve shown predictions can beat worst-case limits. But this is just the beginning – there is enormous untapped potential in this area.”

Journal reference:

  1. Samuel McCauley, Benjamin Moseley, Aidin Niaparast, Shikha Singh. Online List Labeling with Predictions. arXiv, 2023; DOI: 10.48550/arxiv.2305.10536