Abstract
The choice of parallel programming model and language is a major factor in program performance and programmer productivity in HPC. However, their relative merits are usually judged on the basis of conventional wisdom and subjective beliefs. We present a quantitative approach for evaluating such hypotheses statistically against empirical data. We apply this approach to compare two languages representing the message-passing (MPI) and shared-memory (UPC) programming paradigms. We formulate hypothesis tests comparing the performance and productivity of these two models and evaluate them with data from observational studies of HPC programmers. We present and analyze several results, some of them statistically significant, that demonstrate the promise of empirical evaluation in HPC development.