Abstract
Previous major efforts on big data benchmarking either propose a large number of workloads (e.g., the recent comprehensive big data benchmark suite BigDataBench [4]), which imposes cognitive difficulty on workload characterization and incurs serious benchmarking costs, or select only a few workloads according to so-called popularity [1], which leads to partial or biased observations.