Abstract
Many image filtering operations provide ample parallelism, but progressive non-linear processing of images is among the hardest to parallelize due to long, sequential, and non-linear data dependency. A typical example of such an operation is error diffusion dithering, exemplified by the Floyd-Steinberg algorithm. In this paper, we present its parallelization on multicore CPUs using a block-based approach and on the GPU using a pixel based approach. We also present a hybrid approach in which the CPU and the GPU operate in parallel during the computation. High Performance Computing has traditionally been associated with high end CPUs and GPUs. Our focus is on everyday computers such as laptops and desktops, where significant compute power is available on the GPU as on the CPU. Our implementation can dither an 8K × 8K image on an off-the-shelf laptop with an Nvidia 8600M GPU in about 400 milliseconds when the sequential implementation on its CPU took about 4 seconds.