2016 IEEE Winter Conference on Applications of Computer Vision (WACV)

Abstract

Human activity recognition from full video sequences has been extensively studied. Recently, there has been increasing interest in early recognition, that is, recognition from partial observation. However, from a small fraction of the video, it may be difficult, if not impossible, to make a fine-grained prediction of the activity that is taking place. Therefore, we propose the first method to predict ongoing activities over a hierarchical label space. We approach this task as a sequence prediction problem in a recurrent neural network that predicts over a hierarchical label space of activities. To meet a prescribed target accuracy, our model learns to realize accuracy-specificity trade-offs over time: it starts with coarse labels and proceeds to more fine-grained recognition as more evidence becomes available. To study this task, we have collected a large video dataset of complex, long-duration activities. The activities are annotated in a hierarchical label space from coarse to fine. By directly training a sequence predictor over the hierarchical label space, our method outperforms several baselines, including prior work on accuracy-specificity trade-offs originally developed for object recognition.
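
The coarse-to-fine behaviour described above can be illustrated with a small sketch. This is not the paper's recurrent model, which learns the trade-off during training; it is a simple post-hoc thresholding rule over per-label probabilities, written only to show what "predict the most specific label that is still confident enough" means. The label names, the two-level hierarchy, the function names, and the use of a fixed confidence threshold as a stand-in for the target accuracy are all assumptions made for illustration.

```python
# Hypothetical two-level hierarchy: coarse label -> fine labels.
# Names are illustrative only and do not come from the paper's dataset.
HIERARCHY = {
    "sports": ["tennis", "basketball"],
    "cooking": ["baking", "grilling"],
}
ROOT = "activity"


def node_probabilities(leaf_probs):
    """Sum fine-label probabilities upward so every node has a score."""
    probs = dict(leaf_probs)
    for coarse, fines in HIERARCHY.items():
        probs[coarse] = sum(leaf_probs[f] for f in fines)
    probs[ROOT] = sum(leaf_probs.values())  # the root covers every activity
    return probs


def most_specific_label(leaf_probs, threshold):
    """Return the deepest label whose probability reaches the threshold.

    The threshold is an assumed stand-in for a target accuracy.
    """
    probs = node_probabilities(leaf_probs)
    levels = [
        [f for fines in HIERARCHY.values() for f in fines],  # fine labels
        list(HIERARCHY),                                      # coarse labels
        [ROOT],                                               # root
    ]
    for level in levels:
        best = max(level, key=probs.get)
        if probs[best] >= threshold:
            return best
    return ROOT  # fall back to the least specific label


# Assumed probabilities after seeing little, some, and most of a video.
very_early = {"tennis": 0.25, "basketball": 0.25, "baking": 0.25, "grilling": 0.25}
early = {"tennis": 0.40, "basketball": 0.35, "baking": 0.15, "grilling": 0.10}
late = {"tennis": 0.80, "basketball": 0.10, "baking": 0.05, "grilling": 0.05}

for name, p in [("very early", very_early), ("early", early), ("late", late)]:
    print(name, "->", most_specific_label(p, threshold=0.7))
```

With these assumed numbers, the rule backs off to the root label when the evidence is uninformative, commits to a coarse label once one branch dominates, and commits to a fine label only when a single activity is confident on its own.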