2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)

Abstract

On today's multi-socket systems, parallel performance is hampered by remote cache and memory accesses. There is much prior work on thread and data placement to curb remote accesses. However, the number of possible placements is large, and heuristic-based techniques examine only a fraction of the solution space. This paper presents a compositional model for analyzing the effect of thread and data placement choices. The model includes an analysis of cache coherence and (remote) memory access. It is compositional in the sense that the performance of every placement can be composed from the results of a single profiling pass. Based on this model, the paper further introduces a prototype tool called Tapas to optimize parallel programs for non-uniform memory access (NUMA) platforms.