Simulation is widely used in system design to evaluate design options, since testing ideas in the real world can be both costly and risky. However, it’s impossible to capture every detail of a complex system in a simulation. Trace-driven simulation is therefore a widely used technique: researchers collect a small amount of real data, called a trace, and replay it while simulating the components they want to study.
Current trace-driven simulations assume that the interventions being simulated would not affect the validity of the traces. However, real-world traces are often biased by the choices algorithms make during trace collection.
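To make the bias concrete, here is a minimal Python sketch. It rests on a toy assumption that is not taken from the research: observed throughput equals a hidden network capacity scaled by an efficiency factor that depends on the bitrate the algorithm chose. The names `EFFICIENCY`, `CHUNK_MBIT`, `collect_trace`, and `download_times`, and all numbers, are hypothetical.

```python
# Toy model (an assumption for illustration, not the researchers' model):
# observed throughput = hidden capacity * efficiency of the chosen bitrate.
EFFICIENCY = {"low": 0.5, "high": 1.0}   # hypothetical: small chunks use the link less efficiently
CHUNK_MBIT = {"low": 2.0, "high": 8.0}   # hypothetical chunk sizes in megabits


def collect_trace(capacity_mbps, bitrate):
    """Record the throughput an algorithm observes while streaming at one bitrate."""
    return [capacity * EFFICIENCY[bitrate] for capacity in capacity_mbps]


def download_times(throughput_mbps, bitrate):
    """Seconds needed to download each chunk at the given throughput."""
    return [CHUNK_MBIT[bitrate] / t for t in throughput_mbps]


capacity = [10.0, 5.0, 10.0]                   # hidden network conditions, never logged
trace_low = collect_trace(capacity, "low")     # what the deployed algorithm logged
trace_high = collect_trace(capacity, "high")   # what a high-bitrate algorithm would have seen

# Naive replay: reuse the logged trace as if the bitrate choice had not shaped it.
predicted = download_times(trace_low, "high")
# Ground truth in this toy world: the throughput the new algorithm would really get.
actual = download_times(trace_high, "high")

print(predicted, actual)
```

In this toy world the replayed trace makes the high-bitrate option look twice as slow as it really is, because the logged throughput already reflects the old algorithm's choices.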
To overcome this issue, a team of researchers from the Massachusetts Institute of Technology (MIT) has developed a new technique that eliminates this source of bias in trace-driven simulation.
Called CausalSim, the new system could enable unbiased trace-driven simulations. It could help researchers design better algorithms for a variety of applications, including improving video quality on the internet and increasing the performance of data processing systems.
CausalSim relaxes the exogenous trace assumption, that is, the assumption that interventions leave the traces unchanged, by explicitly modeling the fact that interventions can affect trace data. According to the researchers, CausalSim, unlike existing simulators, correctly predicted which newly designed video streaming algorithm would perform best.
To simulate a new algorithm, CausalSim first estimates the latent factors at every time step of each trace. Then, it uses the estimated latent factors to predict the alternate evolution of the trace, actions, and observed variables of the component of interest, under the same latent conditions that were present when the trace was collected. This two-step process allows CausalSim to remove the bias in the trace data when simulating new algorithms.
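The two steps can be sketched in a few lines of Python. This is a simplified illustration, not the CausalSim implementation: it assumes the relationship between the hidden network capacity, the chosen bitrate, and the observed throughput is already known (the hypothetical `EFFICIENCY` table), whereas the real system would have to learn both the latent factors and that relationship from data. The function names and the threshold policy are likewise invented for the example.

```python
# Toy mechanism (an assumption): observed throughput = latent capacity * efficiency.
EFFICIENCY = {"low": 0.5, "high": 1.0}   # hypothetical effect of the bitrate choice


def estimate_latents(trace):
    """Step 1: infer the latent capacity at every time step of a logged trace."""
    return [observed / EFFICIENCY[bitrate] for bitrate, observed in trace]


def simulate(latents, policy):
    """Step 2: roll a new policy forward under the same latent conditions."""
    counterfactual = []
    last_observed = None
    for capacity in latents:
        bitrate = policy(last_observed)              # new algorithm's action
        observed = capacity * EFFICIENCY[bitrate]    # throughput it would have seen
        counterfactual.append((bitrate, observed))
        last_observed = observed
    return counterfactual


def threshold_policy(last_observed):
    """Hypothetical new algorithm: switch to the high bitrate once throughput looks good."""
    if last_observed is None or last_observed < 4.0:
        return "low"
    return "high"


# Trace logged by an algorithm that always chose the low bitrate: (bitrate, throughput in Mbps).
logged_trace = [("low", 5.0), ("low", 2.5), ("low", 5.0)]

latents = estimate_latents(logged_trace)
result = simulate(latents, threshold_policy)
print(latents)
print(result)
```

Because the second step works from the estimated latent capacity rather than from the logged throughput, the simulated throughput changes whenever the new policy picks a different bitrate than the one used during collection.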
Researchers used CausalSim to design an improved bitrate adaptation algorithm. It led them to select a new variant that had a stall rate – the amount of time a user spent rebuffering the video – that was nearly 1.4 times lower than a well-accepted competing algorithm, while achieving the same video quality.
On the other hand, an expert-designed trace-driven simulator predicted the opposite, indicating that this new variant should cause a stall rate that was nearly 1.3 times higher. The research team tested the algorithm on real-world video streaming and confirmed that CausalSim was correct.
“The gains we were getting in the new variant were very close to CausalSim’s prediction, while the expert simulator was way off. This is really exciting because this expert-designed simulator has been used in research for the past decade. If CausalSim can so clearly be better than this, who knows what we can do with it?” says Pouya Hamadanian, co-lead author of the paper.
Extensive evaluation of CausalSim on both real and synthetic datasets, including more than ten months of real data from the Puffer video streaming system, shows that it improves simulation accuracy, reducing errors by 53% and 61% on average compared to expert-designed and supervised learning baselines, respectively.
In future work, the MIT team wants to apply CausalSim to situations where randomized controlled trial data are not available. They also plan to explore how to design and monitor systems to make them more amenable to causal analysis.