Abstract
The Hadoop Distributed File System (HDFS) is a popular distributed file system, widely used in commercial settings, that can store terabytes or even petabytes of data. Fast data reading and writing is a key requirement for HDFS. However, as data volumes increase sharply, the traditional HDFS, built on PC cluster platforms, can no longer read and write data fast enough. The GPU is a highly parallel computing unit whose computational and read/write throughput can be up to hundreds of times higher than that of a CPU. This paper therefore proposes an improved distributed file system that uses the GPU as an accelerator. First, the improved HDFS uses the GPU, instead of the CPU, to respond to data read and write requests. Second, the improved HDFS uses the GPU’s cache as a buffer for data reading and writing. Together, these two strategies significantly improve the performance of the distributed file system. Experimental results demonstrate the effectiveness of the improved approach.