2014 IEEE International Conference on Big Data (Big Data)

Abstract

Scientific datasets today are often far too large to fit into a single machine's memory or even onto a single disk. Partitioning multidimensional arrays across several machines or disks has become increasingly necessary. However, relatively little work has been done on partitioning unstructured grids composed of simplicial cells. Our previous work investigated partitioning unstructured grids at the disk level and its effect on overall system performance. In this paper, we build upon that work by investigating the effect of an in-core partitioning performed on top of the existing disk-level partitioning. The granularity of the in-core partitioning has a varying effect on overall system performance. Based on our test results, we propose a formula for choosing an effective partitioning for large unstructured grids to facilitate fast data retrieval. We also examine the performance benefits of declustering unstructured grids across several disks. Given this declustered dataset, we describe and explore a parallel data retrieval method that takes advantage of prior knowledge of a user access pattern. Our test results demonstrate significant performance gains.