Abstract
As big data analytics frameworks move towards higher degrees of parallelism and shorter task durations to provide lower latency, the resulting millions of scheduling decisions per second pose a great challenge to centralized schedulers. Increasing effort is therefore being devoted to distributed scheduling approaches that avoid the throughput limitation of centralized designs. Among these approaches, Sparrow is a leading design. However, because Sparrow relies on sample-based techniques, some tasks in subsequent jobs may be scheduled earlier than tasks in the head-of-line job. This scheduling disorder causes poor response times and unfairness. To address these problems, this paper proposes a simple algorithm called probe sharing: jobs that arrive at the same Sparrow scheduler share their probes, which ensures that all tasks in the head-of-line job are scheduled before those of subsequent jobs. We analyze probe sharing theoretically and prove that it improves on Sparrow. We have implemented probe sharing in Sparrow and show that it reduces scheduling delays by 2.2× and provides 100% fairness. Trace-driven simulations are also used to evaluate probe sharing when scaling to large clusters. In addition, the simplicity of probe sharing makes it applicable to many schedulers that use Sparrow's techniques (e.g., Hopper, Tarcil and Eagle).
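
The reordering problem and the idea behind probe sharing can be illustrated with a minimal toy model. This is only a sketch under stated assumptions, not the implementation evaluated in this paper: it assumes that a worker reaching a probe asks the scheduler for a task, and that with probe sharing the scheduler answers with a task from its earliest-arrived job that still has unlaunched tasks, regardless of which job the probe was sent for. The names `Job`, `Scheduler`, `on_worker_ready` and `replay` are hypothetical.

```python
from collections import deque


class Job:
    """A job with a queue of unlaunched tasks (hypothetical model)."""

    def __init__(self, job_id, num_tasks):
        self.job_id = job_id
        self.pending = deque(f"{job_id}-t{i}" for i in range(num_tasks))


class Scheduler:
    """Toy scheduler; jobs are kept in arrival order (dicts preserve insertion order)."""

    def __init__(self, share_probes):
        self.share_probes = share_probes
        self.jobs = {}

    def submit(self, job):
        self.jobs[job.job_id] = job

    def on_worker_ready(self, probed_job_id):
        """A worker reached a probe for `probed_job_id` and requests a task."""
        if self.share_probes:
            # Assumed probe-sharing rule: serve the head-of-line job first,
            # whichever job the probe originally belonged to.
            for job in self.jobs.values():
                if job.pending:
                    return job.pending.popleft()
            return None
        # Without sharing, a probe can only launch a task of its own job.
        job = self.jobs[probed_job_id]
        return job.pending.popleft() if job.pending else None


def replay(share_probes, probe_order):
    """Submit job A, then job B, and replay worker responses in `probe_order`."""
    scheduler = Scheduler(share_probes)
    scheduler.submit(Job("A", 2))
    scheduler.submit(Job("B", 2))
    return [scheduler.on_worker_ready(job_id) for job_id in probe_order]
```

In this toy model, if the workers holding job B's probes happen to become ready first (for example `replay(False, ["B", "A", "B", "A"])`), B's tasks launch ahead of some of A's tasks. With `share_probes=True` the same sequence of worker responses launches both of A's tasks before any of B's.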