2007 11th IEEE International Conference on Computer Vision

Abstract

This paper presents a fast, accurate, and novel method for estimating the number of humans and their positions from background-differenced images obtained from a single camera, where inter-human occlusion is significant. The problem is challenging, first, because the state space formed by the number, positions, and articulations of people is large. Second, in spite of many advances in background maintenance and change detection, background differencing remains a noisy and imprecise process, and its output is far from ideal: holes, fill-ins, irregular boundaries, etc. pose additional challenges for our "mid-level" problem of segmenting that output to localize humans. We propose a novel example-based algorithm that directly maps a global shape feature, encoded by Fourier descriptors, to various configurations of humans. We use locally weighted averaging to interpolate the best candidate configuration. The inherent ambiguity resulting from the lack of depth and layer information in the background difference images is mitigated by the use of dynamic programming, which finds the trajectory in state space that best explains the evolution of the projected shapes. The key components of our solution are simple and fast. We demonstrate the accuracy and speed of our approach on real image sequences.
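As a rough illustration of two of the ingredients named above, the following Python sketch computes a Fourier descriptor for a closed foreground contour and then interpolates a configuration by locally weighted averaging over stored examples. It is not the authors' implementation: the function names, the choice of coefficient magnitudes with norm-based scaling, the Gaussian kernel, the `bandwidth` parameter, and the assumption that every example configuration is a fixed-length vector are all hypothetical choices made for this sketch.

```python
import cmath
import math


def fourier_descriptor(contour, n_coeffs=3):
    """Return normalized magnitudes of low-frequency Fourier coefficients.

    contour: list of (x, y) points sampled in order along a closed boundary.
    (Hypothetical helper; the paper's exact descriptor may differ.)
    """
    n = len(contour)
    z = [complex(x, y) for x, y in contour]

    def coeff(k):
        # k-th coefficient of the discrete Fourier transform of the contour.
        return sum(p * cmath.exp(-2j * math.pi * k * t / n)
                   for t, p in enumerate(z)) / n

    # Skip k = 0 (it only encodes the centroid, i.e. translation) and keep
    # magnitudes only, which discards rotation and start-point phase.
    ks = [k for m in range(1, n_coeffs + 1) for k in (m, -m)]
    mags = [abs(coeff(k)) for k in ks]
    # Dividing by the Euclidean norm removes the overall scale.
    norm = math.sqrt(sum(m * m for m in mags)) or 1.0
    return [m / norm for m in mags]


def locally_weighted_average(query, examples, bandwidth=0.1):
    """Interpolate a configuration from (descriptor, configuration) pairs.

    Assumes all configurations are vectors of the same length, and uses a
    Gaussian kernel on squared descriptor distance (an assumption).
    """
    d2 = [sum((q - d) ** 2 for q, d in zip(query, desc))
          for desc, _ in examples]
    nearest = min(d2)
    # Subtracting the smallest distance keeps at least one weight equal to 1,
    # so the weights cannot all underflow to zero.
    weights = [math.exp(-(d - nearest) / (2.0 * bandwidth ** 2)) for d in d2]
    total = sum(weights)
    dim = len(examples[0][1])
    return [sum(w * cfg[i] for w, (_, cfg) in zip(weights, examples)) / total
            for i in range(dim)]
```

In use, one would compute `fourier_descriptor` for the silhouette boundary in the current frame and pass it as `query`, with `examples` built offline from silhouettes whose human configurations are known. The temporal disambiguation by dynamic programming mentioned in the abstract is not covered by this sketch.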
