Abstract
Online face tracking in unconstrained videos remains an unresolved problem. Challenges range from severe appearance variations during tracking to occlusion. We propose RFTD (Robust Face Tracking-by-Detection), a system that combines tracking and detection in a single framework to robustly track a face in unconstrained videos. RFTD is based on the idea that adaptive and stable algorithmic components can complement each other in the task of online tracking. An online Structured Output SVM (SO-SVM) is combined with an offline-trained face detector to break the self-learning loop typical of tracking. In turn, the face detector is supervised by a Deformable Part Model (DPM) landmark detector that assesses the reliability of the face detection output. Extensive evaluation shows that RFTD delivers consistently good tracking performance across different scenarios, namely a high mean success rate and the lowest standard deviation across benchmark videos.
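The interplay between the adaptive tracker and the stable detector can be illustrated with a minimal sketch. The abstract does not specify the fusion rule, so everything below is an assumption made for illustration: the function name `fuse_step`, the scalar `landmark_score`, and the threshold value are all hypothetical and are not taken from RFTD itself. The sketch only shows the general idea of trusting a detection when a landmark-based reliability check passes, and falling back to the tracker otherwise.

```python
from typing import Optional, Tuple

# A bounding box as (x, y, width, height); this representation is an assumption.
Box = Tuple[float, float, float, float]


def fuse_step(
    tracker_box: Box,
    detector_box: Optional[Box],
    landmark_score: float,
    landmark_threshold: float = 0.5,  # hypothetical value, not from the paper
) -> Tuple[Box, bool]:
    """Pick the output box for one frame.

    Returns the chosen box and a flag telling whether the detection was
    trusted. In a full system, a trusted detection could also serve as the
    sample used to update the online tracker, so that the tracker does not
    learn only from its own predictions.
    """
    detection_is_reliable = (
        detector_box is not None and landmark_score >= landmark_threshold
    )
    if detection_is_reliable:
        return detector_box, True
    # No detection, or the landmark check rejected it: keep the tracker output.
    return tracker_box, False


if __name__ == "__main__":
    tracked = (0.0, 0.0, 10.0, 10.0)
    detected = (1.0, 1.0, 10.0, 10.0)
    print(fuse_step(tracked, detected, landmark_score=0.9))
    print(fuse_step(tracked, detected, landmark_score=0.2))
    print(fuse_step(tracked, None, landmark_score=0.9))
```

In this sketch the landmark score acts purely as a gate; a real system might instead weight the two boxes or use the score to decide when to update the online model.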