Abstract
Disk I/O is a major bottleneck limiting the performance and scalability of data-intensive applications. A common way to address disk I/O bottlenecks is to use parallel storage systems and exploit the concurrent operation of independent storage components; however, achieving consistently high parallel I/O performance is challenging due to static configurations. Modern parallel storage systems, especially in the cloud, enterprise data centers, and scientific clusters, are commonly shared by various applications generating dynamic and coexisting data access patterns. Nonetheless, these systems generally use a one-layout-fits-all data placement strategy, which frequently results in suboptimal I/O parallelism. Guided by association rule mining, graph coloring, bin packing, and network flow techniques, this paper proposes a general framework for adaptive parallel storage systems, with the goal of continuously providing a high degree of I/O parallelism. Evaluation results indicate that the proposed framework is highly successful in adjusting to skewed parallel access patterns for both traditional storage arrays based on hard disk drives (HDDs) and all-flash arrays based on solid-state drives (SSDs). Beyond storage arrays, the proposed framework is sufficiently generic to be tailored to various other parallel storage scenarios, including but not limited to key-value stores, parallel/distributed file systems, and the internal parallelism of SSDs.