Using Wall Street secrets to reduce the cost of cloud infrastructure

“Risk-aware” traffic engineering could help service providers such as Microsoft, Amazon, and Google better utilize network infrastructure.


To keep up with the constant growth in demand, cloud providers spend vast sums expanding the capacity of their wide-area backbones and devote significant effort to using WAN capacity efficiently. A key challenge is striking a good balance between network utilization and availability, as these are inherently at odds; a highly utilized network might not be able to withstand unexpected traffic shifts resulting from link or node failures.

In a new study, MIT scientists in collaboration with Microsoft advocate a novel approach to this challenge that draws inspiration from financial risk theory: leverage empirical data to generate a probabilistic model of network failures and maximize bandwidth allocation to network users subject to an operator-specified availability target.

Scientists have developed a ‘risk-aware’ mathematical model that could improve the performance of cloud-computing networks across the globe.

Their model takes into account the failure probabilities of links between data centers around the world, much like predicting the volatility of stocks. It then runs an optimization engine to allocate traffic along optimal paths, minimizing loss while maximizing overall utilization of the network.
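To make the idea concrete, here is a minimal sketch in Python of how a traffic split can be scored against a probabilistic failure model. It is not the researchers' implementation: the function names are hypothetical, each path is assumed to be a single link, links are assumed to fail independently, capacity limits are ignored, and the sketch only evaluates given allocations rather than optimizing over them. The two risk measures, value at risk (VaR) and conditional value at risk (CVaR), are standard tools from financial risk theory.

```python
from itertools import product

def scenarios(fail_probs):
    """Yield (link_states, probability) for every combination of link states.

    Assumption: links fail independently of one another.
    """
    for up in product([True, False], repeat=len(fail_probs)):
        p = 1.0
        for is_up, q in zip(up, fail_probs):
            p *= (1.0 - q) if is_up else q
        yield up, p

def loss_distribution(allocation, fail_probs):
    """Return (fraction_of_traffic_lost, probability) for each scenario.

    Assumption: traffic placed on a failed link is dropped entirely.
    """
    total = sum(allocation)
    return [
        (sum(a for a, is_up in zip(allocation, up) if not is_up) / total, p)
        for up, p in scenarios(fail_probs)
    ]

def var_cvar(dist, beta):
    """Value at risk and conditional value at risk at confidence level beta."""
    dist = sorted(dist)
    var = dist[-1][0]
    cumulative = 0.0
    for loss, p in dist:
        cumulative += p
        if cumulative >= beta:
            var = loss  # smallest loss not exceeded with probability >= beta
            break
    # Rockafellar-Uryasev form: VaR plus the scaled expected excess over VaR.
    excess = sum(p * max(loss - var, 0.0) for loss, p in dist)
    return var, var + excess / (1.0 - beta)

# Hypothetical numbers: a reliable link and a riskier one.
fail_probs = [0.0005, 0.02]
even_split = var_cvar(loss_distribution([5, 5], fail_probs), beta=0.999)
risk_aware = var_cvar(loss_distribution([10, 0], fail_probs), beta=0.999)
print("even split (VaR, CVaR):", even_split)
print("risk-aware (VaR, CVaR):", risk_aware)
```

With these made-up probabilities, the even split can lose half the traffic within the 99.9 percent window, while steering everything onto the reliable link loses none within that window. A real system would also have to respect link capacities, which is why an optimization engine is needed.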

The scientists dubbed the model TeaVar, and it is expected to help major cloud-service providers, such as Microsoft, Amazon, and Google, better utilize their infrastructure. It guarantees that for a target percentage of time (say, 99.9 percent) the network can handle all data traffic, so there is no need to keep any links idle. During the remaining 0.1 percent of the time, the model also keeps the data dropped as low as possible.
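For a sense of scale, the short calculation below converts an availability target into the number of hours per year that fall outside it. The helper name is hypothetical and the figure is plain arithmetic, not a result from the study.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years

def uncovered_hours_per_year(availability_percent):
    """Hours per year that fall outside a given availability target."""
    return HOURS_PER_YEAR * (100.0 - availability_percent) / 100.0

# A 99.9 percent target leaves roughly 8.76 hours per year uncovered.
print(round(uncovered_hours_per_year(99.9), 2))
```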

Co-author Manya Ghobadi, the TIBCO Career Development Assistant Professor in the MIT Department of Electrical Engineering and Computer Science, said, “Having greater utilized infrastructure isn’t just good for cloud services, it’s also better for the world. Companies don’t have to purchase as much infrastructure to sell services to customers. Plus, being able to efficiently utilize data center resources can save enormous amounts of energy consumption by the cloud infrastructure. So, there are benefits both for the users and the environment at the same time.”

The researchers tested the model against other traffic engineering (TE) software on simulated traffic sent through networks from Google, IBM, AT&T, and others that span the world. They created various failure scenarios based on their probability of occurrence. Then, they sent simulated and real-world data demands through the network and cued their models to start allocating bandwidth.

The model kept reliable links working at near full capacity while steering data clear of riskier links. Compared with traditional approaches, the model ran three times as much data through the network while still ensuring all data got to its destination. The code is freely available on GitHub.

A paper describing the model and results will be presented at the ACM SIGCOMM conference this week.

