Amy Ousterhout
UC Berkeley
amy.ousterhout@berkeley.edu
Bio
Amy Ousterhout is a postdoctoral researcher in the NetSys Lab at UC Berkeley. She received her PhD from MIT in 2019, advised by Hari Balakrishnan and Adam Belay. Before that, she received a bachelor of science in engineering in computer science from Princeton University. During her PhD she was supported by an NSF Fellowship, a Hertz Foundation Fellowship, and an Irwin Mark Jacobs and Joan Klein Jacobs Presidential Fellowship. Her research focuses on enabling high performance for networked applications in datacenters.
Efficient Scheduling Policies for Microsecond-Scale Tasks
As Moore’s Law slows, datacenter operators are increasingly concerned with using their limited CPU resources efficiently. To do so, they run multiple applications on the same server, so that batch applications can use the spare CPU cycles left by latency-critical applications as their load varies over time. At the same time, it is crucial to provide low latency for latency-critical applications. Operators typically face a tradeoff: granting more cores to latency-sensitive applications improves their latency but reduces overall CPU efficiency. We have found that in practice many existing systems do not navigate this tradeoff space effectively, sacrificing significant latency, CPU efficiency, or both. This is particularly true when latency-critical applications are composed of small tasks (e.g., 1 µs requests to a key-value store), despite significant effort in optimizing the implementations of these systems (e.g., their threading libraries and network stacks). In this work, we instead consider the policy choices made by different systems and ask: which policies for allocating cores across applications and for load balancing tasks across cores yield the best combination of latency and CPU efficiency for small tasks? We use simulations to tease apart the impact of these two policy choices on overall system performance and draw a few surprising conclusions. For example, in terms of load balancing, we conclude that work stealing is Pareto-dominant over popular alternatives such as power-of-two choices. In terms of core reallocations, we find that for such short tasks reallocating cores is actually not beneficial, and allocating a static number of cores would yield a Pareto improvement. Finally, we implement the policies that we find to be optimal by extending an existing system (Caladan) and demonstrate that we can improve CPU efficiency without sacrificing latency.
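
To make the two load-balancing policies concrete, below is a minimal sketch in Python of the core mechanism behind each: power-of-two-choices placement at task arrival versus work stealing by idle cores. This is an illustrative sketch only; the names, data structures, and usage example are assumptions and do not reflect the talk's simulator or Caladan's implementation.

# Minimal sketch (assumed names, not the talk's simulator or Caladan's code)
# contrasting the two load-balancing policies compared in the talk:
# power-of-two-choices placement at arrival vs. work stealing by idle cores.
import random
from collections import deque

class Core:
    def __init__(self, cid):
        self.cid = cid
        self.queue = deque()              # per-core run queue of pending tasks

def place_power_of_two(cores, task):
    # Power-of-two choices: sample two cores and enqueue on the shorter queue.
    a, b = random.sample(cores, 2)
    shorter = a if len(a.queue) <= len(b.queue) else b
    shorter.queue.append(task)

def next_task_work_stealing(cores, me):
    # Work stealing: run local work first; if the local queue is empty,
    # steal from the tail of a randomly chosen non-empty victim queue.
    if me.queue:
        return me.queue.popleft()
    victims = [c for c in cores if c is not me and c.queue]
    if not victims:
        return None
    return random.choice(victims).queue.pop()

# Tiny usage example: place a few tasks with power-of-two choices, then show
# core 0 obtaining work (from its own queue, or by stealing if it is idle).
cores = [Core(i) for i in range(4)]
for task_id in range(8):
    place_power_of_two(cores, task_id)
print(next_task_work_stealing(cores, cores[0]))

The sketch highlights the structural difference between the two policies: power-of-two choices makes its balancing decision once, when a task arrives, while work stealing defers the decision to the moment a core runs out of local work.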