2017 IEEE International Conference on Big Data (Big Data)

Abstract

Recently, in-memory big data processing frameworks such as Apache Spark and Ignite have emerged to accelerate workloads that reuse data frequently. With effective in-memory caching, these frameworks eliminate most of the I/O operations that would otherwise be required for communication between producer and consumer tasks. However, this performance benefit is nullified if the memory footprint exceeds the available memory, due to excessive spill and garbage collection (GC) operations. Two system parameters play an important role in fitting the working set in memory: the number of data partitions (Npartitions), which specifies task granularity, and the number of tasks per executor (Nthreads), which specifies the degree of parallelism in execution. Existing approaches to optimizing these parameters either do not take workload characteristics into account or optimize only one of the parameters in isolation, yielding suboptimal performance. This paper introduces WASP, a workload-aware task scheduler and partitioner that jointly optimizes both parameters at runtime. To find an optimal setting, WASP first analyzes the DAG structure of a given workload and uses an analytical model to predict optimal settings of Npartitions and Nthreads for all stages based on their computation types. Taking this as input, the WASP scheduler employs a hill-climbing algorithm to find an optimal Nthreads for each stage, maximizing concurrency while minimizing data spills and GCs. We prototype WASP on Spark and evaluate it using six workloads on three different parallel platforms. WASP improves performance by up to 3.22× and reduces the cluster operating cost on cloud by up to 40% over a baseline configured according to the Spark tuning guidelines, and it provides robust performance for both shuffle-heavy and shuffle-light workloads.
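To make the two knobs concrete, the sketch below shows how Npartitions and Nthreads map onto standard Spark settings and how a per-stage hill-climbing search over Nthreads might look. This is only an illustration under stated assumptions, not the authors' WASP implementation: it uses local-mode Spark so that local[n] bounds task concurrency, fixes Npartitions to an assumed value standing in for the analytical model's prediction, and times a synthetic shuffle stage as a stand-in for a real workload stage.

```scala
// Hedged illustration (not the WASP code from the paper): hill climbing over
// Nthreads for a fixed candidate Npartitions, using a synthetic shuffle stage.
import org.apache.spark.sql.SparkSession

object NthreadsHillClimbSketch {
  // Time one representative shuffle stage under a given task concurrency.
  def stageTimeSeconds(nThreads: Int, nPartitions: Int): Double = {
    val spark = SparkSession.builder()
      .master(s"local[$nThreads]")                          // Nthreads: concurrent tasks
      .appName(s"probe-nthreads-$nThreads")
      .config("spark.sql.shuffle.partitions", nPartitions)  // Npartitions: task granularity
      .getOrCreate()
    val t0 = System.nanoTime()
    spark.range(0L, 20000000L)
      .selectExpr("id % 1000 AS k", "id AS v")
      .groupBy("k").sum("v")
      .count()                                              // forces the shuffle to execute
    val secs = (System.nanoTime() - t0) / 1e9
    spark.stop()
    secs
  }

  def main(args: Array[String]): Unit = {
    val nPartitions = 200                                   // assumed model prediction
    val maxThreads  = Runtime.getRuntime.availableProcessors()

    // Hill climbing: keep increasing Nthreads while the measured stage time
    // improves; stop at the first degradation, where extra concurrency starts
    // to cause memory pressure, spills, and GC overhead.
    var best      = 1
    var bestTime  = stageTimeSeconds(best, nPartitions)
    var candidate = 2
    var improving = true
    while (improving && candidate <= maxThreads) {
      val t = stageTimeSeconds(candidate, nPartitions)
      if (t < bestTime) { best = candidate; bestTime = t; candidate *= 2 }
      else improving = false
    }
    println(f"Selected Nthreads = $best (stage time $bestTime%.2f s, Npartitions = $nPartitions)")
  }
}
```

In a cluster deployment the same idea would be expressed through executor settings (for example spark.executor.cores) and per-stage repartitioning rather than local[n], with the search repeated for each stage according to its computation type.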