Abstract
Power efficiency - performance relative to power - is one of the most important concerns when designing RADAR processing systems. This paper analyzes power and performance trade-offs for a typical Space Time Adaptive Processing (STAP) application. We study STAP implementations for CUDA and OpenMP on two architectures, Intel Haswell Core I7-4770TE and NVIDIA Kayla with a GK208 GPU. We analyze the power and performance of STAP's computationally intensive kernels across the two hardware testbeds. We discuss an efficient parallel implementation for the Haswell CPU architecture. We also show the impact and trade-offs of GPU optimization techniques. The GPU architecture is able to process large size data sets without increase in power requirement. The use of shared memory has a significant impact on the power requirement for the GPU. Finally, we show that a balance between the use of shared memory and main memory access leads to an improved performance in a typical STAP application.