2014 IEEE International Conference on Multimedia and Expo (ICME)

Abstract

The human visual system is thought to use features of intermediate complexity for scene representation. How the brain computationally represents such intermediate features is, however, still unclear. Here we tested two widely used computational models against human brain activity: the biologically plausible HMAX model and the Bag of Words (BoW) model from computer vision. Both models represent visual scenes with visual dictionaries, candidate features of intermediate complexity, and both have proven effective in automatic object and scene recognition. We analyzed where in the brain, and to what extent, human fMRI responses to natural scenes can be accounted for by the HMAX and BoW representations. Voxel-wise application of a distance-based variation partitioning method reveals that HMAX explains significant brain activity in early visual regions as well as in higher regions such as LO and TO, whereas BoW primarily explains brain activity in early visual areas. Notably, both HMAX and BoW explain the most brain activity in higher areas such as V4 and TO. These results suggest that visual dictionaries may provide a suitable computation for the representation of intermediate features in the brain.
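For readers unfamiliar with the BoW representation the abstract refers to, the sketch below shows the standard pipeline: cluster local descriptors into a visual dictionary, then encode each image as a histogram of nearest-codeword assignments. The descriptor type, dictionary size, and data here are purely illustrative stand-ins, not the paper's actual settings.

```python
# Minimal Bag-of-Words image representation. Descriptors are synthetic
# stand-ins for dense local features (e.g. SIFT-like vectors); the paper's
# exact pipeline is not reproduced here.
import numpy as np
from scipy.cluster.vq import kmeans2, vq

rng = np.random.default_rng(0)

# Stand-in for local descriptors pooled over a training image set:
# 5000 descriptors of dimension 128 (SIFT-sized, purely illustrative).
train_descriptors = rng.standard_normal((5000, 128))

# Learn a visual dictionary of K "visual words" by k-means clustering.
K = 64
codebook, _ = kmeans2(train_descriptors, K, minit="++", seed=0)

def bow_histogram(descriptors: np.ndarray) -> np.ndarray:
    """Encode one image as a normalized histogram of visual-word counts."""
    words, _ = vq(descriptors, codebook)       # nearest codeword per descriptor
    hist = np.bincount(words, minlength=K).astype(float)
    return hist / hist.sum()                   # L1-normalize

# Encode a new image's descriptors (again synthetic here).
image_descriptors = rng.standard_normal((300, 128))
print(bow_histogram(image_descriptors).shape)  # (64,)
```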
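The variation-partitioning analysis decomposes, per voxel, the response variance into parts uniquely explained by each model and a shared part. The paper uses a distance-based variant operating on stimulus distance matrices; the sketch below shows only the simpler ordinary-least-squares analogue of the same unique/shared decomposition, with hypothetical low-dimensional feature matrices standing in for the HMAX and BoW representations.

```python
# Sketch of variance partitioning for one voxel's responses via OLS R^2.
# This is the regression analogue, not the paper's distance-based method.
import numpy as np

def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """R^2 of an OLS fit of y on X (with intercept)."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid.var() / y.var()

def partition(X_hmax: np.ndarray, X_bow: np.ndarray, y: np.ndarray) -> dict:
    r_h = r_squared(X_hmax, y)                       # HMAX alone
    r_b = r_squared(X_bow, y)                        # BoW alone
    r_hb = r_squared(np.hstack([X_hmax, X_bow]), y)  # both together
    return {
        "unique_hmax": r_hb - r_b,   # variance only HMAX explains
        "unique_bow":  r_hb - r_h,   # variance only BoW explains
        "shared":      r_h + r_b - r_hb,
    }

# Synthetic demo: 100 stimuli, low-dimensional stand-in model features.
rng = np.random.default_rng(1)
X_hmax = rng.standard_normal((100, 5))
X_bow = rng.standard_normal((100, 5))
y = X_hmax @ rng.standard_normal(5) + 0.5 * rng.standard_normal(100)
print(partition(X_hmax, X_bow, y))
```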