2012 IEEE Conference on Computer Vision and Pattern Recognition

Abstract

In this work, we introduce a hierarchical matching framework with so-called side information for image classification based on the bag-of-words representation. Each image is expressed as a bag of orderless pairs, each of which includes a local feature vector encoded over a visual dictionary and its corresponding side information from priors or contexts. The side information is used for hierarchical clustering of the encoded local features. A hierarchical matching kernel is then derived as the weighted sum of the similarities over the encoded features pooled within clusters at different levels. Finally, the new kernel is integrated with popular machine learning algorithms for classification. This framework is quite general and flexible: other practical and powerful algorithms can be easily designed by using it as a template and utilizing particular side information for hierarchical clustering of the encoded local features. To tackle the latent spatial mismatch issues in spatial pyramid matching (SPM), we design two exemplar algorithms based on two types of side information: an object confidence map and a visual saliency map, derived from object detection priors and within-image contexts, respectively. Extensive experiments on the Caltech-UCSD Birds 200, Oxford Flowers 17 and 102, PASCAL VOC 2007, and PASCAL VOC 2010 databases show that these two exemplar algorithms achieve state-of-the-art performance.
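The kernel described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation, and several choices are assumptions made only for illustration: the side information is a scalar in [0, 1] per local feature, the hierarchical clustering is approximated by quantizing that scalar into 2^l bins at level l, pooling is max pooling, the per-cluster similarity is a dot product, and the level weights are arbitrary. The function names `pool_by_side_info` and `hierarchical_kernel` are hypothetical.

```python
import numpy as np

def pool_by_side_info(codes, side, levels=3):
    """Max-pool encoded features within side-information clusters.

    codes: (n_features, dict_size) array of encoded local features.
    side:  (n_features,) array of side information in [0, 1]
           (e.g. a confidence or saliency value; an assumption here).
    Returns a list with one (2**level, dict_size) array per level.
    """
    codes = np.asarray(codes, dtype=float)
    side = np.asarray(side, dtype=float)
    pooled = []
    for level in range(levels):
        n_bins = 2 ** level
        # Assumed stand-in for hierarchical clustering: uniform
        # quantization of the side information into n_bins clusters.
        bins = np.minimum((side * n_bins).astype(int), n_bins - 1)
        level_pool = np.zeros((n_bins, codes.shape[1]))
        for b in range(n_bins):
            members = codes[bins == b]
            if len(members) > 0:
                level_pool[b] = members.max(axis=0)  # empty clusters stay zero
        pooled.append(level_pool)
    return pooled

def hierarchical_kernel(pooled_a, pooled_b, weights):
    """Weighted sum over levels of cluster-wise dot-product similarities."""
    return float(sum(w * np.sum(pa * pb)
                     for w, pa, pb in zip(weights, pooled_a, pooled_b)))

# Two toy images with identical codes but swapped side information.
codes = np.array([[1.0, 0.0], [0.0, 1.0]])
image_a = pool_by_side_info(codes, np.array([0.1, 0.9]), levels=2)
image_b = pool_by_side_info(codes, np.array([0.9, 0.1]), levels=2)
weights = [0.5, 1.0]  # assumed: finer levels weighted more heavily

k_aa = hierarchical_kernel(image_a, image_a, weights)
k_ab = hierarchical_kernel(image_a, image_b, weights)
```

In this toy case the two images match at the coarsest level, where all features share one cluster, but not at the finer level, where the swapped side information places the same codes in different clusters. `k_ab` is therefore smaller than `k_aa`. A Gram matrix built from such a kernel could then be passed to any kernel-based classifier.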