Optimized GPU implementation for dynamic programming in image data processing

Jing Ke; Tomasz Bednarz; Arcot Sowmya

doi:10.1109/PCCC.2016.7820646

Abstract

In High Performance Computing (HPC) and scientific computing, computing power is now increasingly delivered through parallelism on multi-core systems and heterogeneous architectures. Although many algorithms were originally proposed and implemented for sequential computing, parallel alternatives can offer more suitable, higher-performance solutions to the same problems. In this paper, three parallelization strategies are proposed and implemented for a dynamic programming based cloud smoothing application, using both shared memory and non-shared memory approaches. The experiments are performed on the NVIDIA GeForce GT750m and Tesla K20m, two GPU accelerators of the Kepler architecture. A detailed performance analysis is presented covering partition granularity at the block and thread levels, memory access efficiency, and computational complexity. The evaluations show that the parallel implementations approximate the expected results closely while achieving high efficiency, and that these strategies can be adopted in similar data analysis and processing applications.
