Field-Programmable Custom Computing Machines, Annual IEEE Symposium on

Abstract

The device-level size and complexity of reconfigurable architectures make fault tolerance an important concern in system design. In this paper, we introduce a fully automated fault recovery system for networked systems that contain FPGAs. If a detected fault cannot be addressed locally, fault information is transferred to a reconfiguration server. The server recompiles the design to avoid the fault and returns a new FPGA configuration to the remote system, where computation is reinitiated. To illustrate the benefit of this approach, we have implemented a complete fault recovery system that requires no manual intervention. An important part of the system is a timing-driven incremental router for Xilinx Virtex devices. This router is directly interfaced to Xilinx JBits and uses no CAD tools from the standard Xilinx Alliance tool flow. Our completed system has been applied to three benchmark designs and exhibits complete fault recovery in up to 12× less time than the standard incremental Xilinx PAR flow.
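The idea of incremental rerouting around a fault can be illustrated with a minimal sketch. This is not the paper's router and does not use JBits; it assumes a hypothetical routing graph stored as a dictionary of delay-weighted edges, and it substitutes a plain Dijkstra search on delay for the timing-driven cost function. All names (`shortest_path`, `reroute_around_fault`, the node labels) are invented for illustration. Only nets that pass through the faulty resource are ripped up; every other route is left untouched, which is what keeps the recovery step cheap compared with a full recompilation.

```python
import heapq


def shortest_path(graph, src, dst, blocked):
    """Dijkstra over a delay-weighted graph, never entering blocked nodes."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            # Walk predecessor links back to the source.
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return path[::-1]
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, delay in graph.get(node, {}).items():
            if nxt in blocked:
                continue
            nd = d + delay
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    return None


def reroute_around_fault(graph, routes, faulty):
    """Rip up and reroute only the nets that use the faulty node.

    `routes` maps a net name to its current path (list of nodes).
    Assumption: routing resources are exclusive, so nodes used by
    untouched nets are treated as blocked during rerouting.
    """
    new_routes = {n: p for n, p in routes.items() if faulty not in p}
    occupied = {node for p in new_routes.values() for node in p} | {faulty}
    for name, path in routes.items():
        if faulty not in path:
            continue
        if faulty in (path[0], path[-1]):
            # A faulty endpoint would need re-placement, not just rerouting.
            raise RuntimeError(f"net {name}: endpoint is faulty")
        new_path = shortest_path(graph, path[0], path[-1], occupied)
        if new_path is None:
            raise RuntimeError(f"net {name}: no fault-free route found")
        new_routes[name] = new_path
        occupied.update(new_path)
    return new_routes


# Hypothetical four-net fabric fragment: node X develops a fault.
graph = {
    "A": {"X": 1.0, "Y": 2.0},
    "X": {"B": 1.0},
    "Y": {"B": 2.0},
    "C": {"Z": 1.0},
    "Z": {"D": 1.0},
}
routes = {"n1": ["A", "X", "B"], "n2": ["C", "Z", "D"]}
recovered = reroute_around_fault(graph, routes, "X")
```

In this sketch, net `n1` is moved onto the slower path through `Y`, while `n2` keeps its original route. In a system like the one described, a failure to find a route locally would be the point at which fault information is handed to the reconfiguration server.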