Abstract
Visual attention is the cognitive process of directing our gaze toward one aspect of the visual field while ignoring others. The mainstream approach to modeling focal visual attention involves identifying salient regions in an image and applying a search process to them. However, such inference schemes commonly fail to capture perceptual attractors accurately, require massive computational effort and, generally speaking, are not biologically plausible. This paper introduces a novel approach to the problem of visual search by framing it as an adaptive learning process. In particular, we devise an approximate optimal control framework, based on reinforcement learning, for actively searching a visual field. We apply the method to the problem of face detection and demonstrate that the technique is both accurate and scalable. Moreover, the foundations proposed here pave the way for extending the approach to other large-scale visual perception problems.