Abstract
We propose to use action, scene and object concepts as semantic attributes for classification of video events in InTheWild content, such as YouTube videos. We model events using a variety of complementary semantic attribute features developed in a semantic concept space. Our contribution is to systematically demonstrate the advantages of this concept-based event representation (CBER) in applications of video event classification and understanding. Specifically, CBER has better generalization capability, which enables to recognize events with a few training examples. In addition, CBER makes it possible to recognize a novel event without training examples (i.e., zero-shot learning). We further show our proposed enhanced event model can further improve the zero-shot learning. Furthermore, CBER provides a straightforward way for event recounting/understanding. We use the TRECVID Multimedia Event Detection (MED11) open source event definitions and datasets as our test bed and show results on over 1400 hours of videos.