2017 IEEE International Conference on Multimedia and Expo (ICME)

Abstract

Tracking an arbitrary object in video is one of the main challenges in computer vision and has been studied for decades. Traditional trackers based on hand-crafted features show poor discriminability under complex changes in object appearance. Recently, trackers based on convolutional neural networks (CNNs) have shown promising results by exploiting rich convolutional features. In this paper, we propose a novel DenseTracker based on a multi-task dense convolutional network. To learn a more compact and discriminative representation, we adopt a dense block structure to aggregate features from different layers. A multi-task loss is then designed to accurately predict object position and scale through joint learning of box regression and pair-wise similarity. Furthermore, DenseTracker is trained end-to-end on large-scale datasets, including ImageNet Video (VID) and ALOV300++. DenseTracker runs at 25 fps on a GPU and achieves state-of-the-art performance on the two public benchmarks OTB50 and VOT2016.
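The abstract describes a multi-task objective that couples bounding-box regression with a pair-wise similarity term. The paper's exact formulation is not given here, so the following is only an illustrative sketch of how such a joint loss is commonly assembled (smooth L1 for the box branch, binary cross-entropy for the pair-wise match score); the trade-off weight `lam` and all function names are assumptions, not taken from the paper.

```python
import math

def smooth_l1(pred, target):
    """Smooth L1 (Huber) loss, a common choice for box regression."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total / len(pred)

def pairwise_similarity_loss(score, label):
    """Binary cross-entropy on a pair-wise match score in (0, 1).

    label is 1 when the two patches show the same object, 0 otherwise.
    """
    eps = 1e-12  # guard against log(0)
    return -(label * math.log(score + eps)
             + (1 - label) * math.log(1.0 - score + eps))

def multitask_loss(box_pred, box_target, sim_score, sim_label, lam=1.0):
    """Joint objective: regression loss plus a weighted similarity loss.

    `lam` balances the two tasks; its value here is an assumption,
    not a figure from the paper.
    """
    return (smooth_l1(box_pred, box_target)
            + lam * pairwise_similarity_loss(sim_score, sim_label))

# Example: a predicted box close to its target, with a confident
# positive (same-object) pair score.
loss = multitask_loss([0.1, 0.2, 0.9, 0.8],
                      [0.0, 0.25, 1.0, 0.75],
                      sim_score=0.9, sim_label=1, lam=0.5)
```

Summing the two branch losses lets a single backward pass train both the localization and the matching heads, which is the usual motivation for this kind of joint learning.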
