2014 IEEE International Conference On Cluster Computing (CLUSTER)

Abstract

Increased complexity of computer architectures, consideration of power constraints, and expected failure rates of hardware components make the design and analysis of energy-efficient fault-tolerance schemes an increasingly challenging and important task. We develop run-time and study FTI, a multilevel checkpoint library, on an IBM Blue Gene/Q. We show that FTI has a low energy footprint and that, consequently, optimal checkpoint-interval values with respect to time and energy are similar.