Abstract
Inspired by the success of inverted indexing in text search, we justify the use of inverted file indexing for image content on the grounds of sparseness, which paves the way for scalable image content search systems. We use clustering to automatically generate a content vocabulary. To avoid generating cluster centers that are overcrowded in high-density areas of sparse data sets, we apply a cluster-merge procedure as a post-processing step. We further represent low-level image features with visual codewords, which not only makes inverted file indexing and search applicable to image content but also helps bridge the gap between low-level image features and high-level human visual perception. Experimental results confirm the effectiveness of our methods.
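As a rough illustration of the pipeline summarized above, the following minimal sketch merges nearby cluster centers, maps features to codewords, and builds an inverted file over them. Everything in it is an assumption made for illustration: the function names, the greedy distance-threshold merge rule (`min_dist`), the toy 2-D features, and the vote-count ranking are hypothetical and are not taken from the method described here.

```python
from collections import Counter, defaultdict
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def merge_close_centers(centers, min_dist):
    """Greedily merge centers closer than min_dist (hypothetical merge rule)."""
    merged = []  # each entry: [coordinate sums, number of merged centers]
    for c in centers:
        for entry in merged:
            mean = [s / entry[1] for s in entry[0]]
            if euclidean(mean, c) < min_dist:
                entry[0] = [s + x for s, x in zip(entry[0], c)]
                entry[1] += 1
                break
        else:
            merged.append([list(c), 1])
    return [tuple(s / n for s in total) for total, n in merged]


def nearest_codeword(feature, vocabulary):
    """Index of the vocabulary entry closest to the feature vector."""
    return min(range(len(vocabulary)),
               key=lambda i: euclidean(feature, vocabulary[i]))


def build_inverted_index(image_features, vocabulary):
    """Map each codeword to the set of image ids that contain it."""
    index = defaultdict(set)
    for image_id, features in image_features.items():
        for f in features:
            index[nearest_codeword(f, vocabulary)].add(image_id)
    return index


def search(query_features, index, vocabulary):
    """Rank images by the number of codewords shared with the query."""
    query_words = {nearest_codeword(f, vocabulary) for f in query_features}
    votes = Counter()
    for w in query_words:
        for image_id in index.get(w, ()):
            votes[image_id] += 1
    return votes.most_common()


# Toy data: the first two centers are overcrowded and get merged.
centers = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (9.0, 1.0)]
vocab = merge_close_centers(centers, min_dist=0.5)

images = {
    "a": [(0.2, 0.1), (5.1, 4.9)],
    "b": [(8.8, 1.2)],
    "c": [(4.9, 5.2), (9.1, 0.9)],
}
index = build_inverted_index(images, vocab)
results = search([(0.0, 0.1), (5.0, 5.0)], index, vocab)
print(results)  # [('a', 2), ('c', 1)]
```

Because each image contains only a small fraction of the vocabulary, a query touches only the posting lists of its own codewords, which is the sparseness property that makes an inverted file attractive.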