Memory requirements of hadoop, spark, and MPI based big data applications on commodity server class architectures

Hosein Mohammadi Makrani; Houman Homayoun

doi:10.1109/IISWC.2017.8167763

Abstract

Emerging big data frameworks requires computational resources and memory subsystems that can naturally scale to manage massive amounts of diverse data. Given the large size and heterogeneity of the data, it is currently unclear whether big data frameworks such as Hadoop, Spark, and MPI will require high performance and large capacity memory to cope with this change and exactly what role main memory subsystems will play; particularly in terms of energy efficiency. The primary purpose of this study is to answer these questions through empirical analysis of different memory configurations available on commodity hardware and to assess the impact of these configurations on the performance and power of these well-established frameworks. Our results reveal that while for Hadoop there is no major demand for high-end DRAM, Spark and MPI iterative tasks (e.g. machine learning) are benefiting from a high-end DRAM; in particular high frequency and large numbers of channels. Among the configurable parameters, our results indicate that increasing the number of DRAM channels reduces DRAM power and improves the energy-efficiency across all three frameworks.

Memory requirements of hadoop, spark, and MPI based big data applications on commodity server class architectures

Authors

Abstract

Related Articles