Abstract
As the number of cores per processor grows, there is a strong incentive to develop parallel workloads that take advantage of hardware parallelism. Compared with single-threaded applications, parallel workloads are harder to characterize because of thread interactions and resource stalls. This paper presents an accurate and scalable method for determining, at runtime, the optimal system operating points (i.e., number of threads and DVFS settings) for parallel workloads on multi-core processors, under a set of objective functions and constraints that target energy efficiency. Using an extensive training data set gathered for a wide range of parallel workloads on a commercial multi-core system, we construct multinomial logistic regression (MLR) models that estimate the optimal system settings as a function of workload characteristics. We use L