Abstract
A probabilistic framework for content-based image retrieval based on universal source coding is proposed. Using a multidimensional incremental parsing technique, which extends the Lempel-Ziv incremental parsing algorithm, a given image is parsed into a number of variable-size rectangular blocks, called parsed representations. To achieve semantically relevant pattern matching, we introduce a new similarity measure derived from the first- and second-order statistics of given image patches. Once the occurrence patterns of images in the corpus are analyzed, the term-document joint distribution is estimated by an aspect modeling technique under the assumption of latent aspects. To evaluate the proposed image retrieval framework based on the parsed representations, we implement a benchmark system based on fixed-shape block representations trained by vector quantization. In addition to these two systems, we include two existing content-based image retrieval systems in the performance evaluation. Experimental results on a database of 20,000 natural scene images demonstrate that the proposed image retrieval system significantly outperforms both the benchmark system and the two existing systems.
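For readers unfamiliar with incremental parsing, the following minimal sketch illustrates the classical one-dimensional Lempel-Ziv (LZ78-style) rule that the multidimensional technique extends: each new phrase is the shortest prefix of the remaining input that has not been seen before. It is not the multidimensional algorithm proposed here; the function name `lz78_parse` and the use of a string as input are assumptions made purely for illustration.

```python
def lz78_parse(sequence):
    """Split `sequence` into phrases by 1-D incremental parsing.

    Illustrative only: a hypothetical helper, not the paper's
    multidimensional parser for images.
    """
    dictionary = set()   # phrases seen so far
    phrases = []         # parsed output, in order
    current = ""         # phrase being grown symbol by symbol
    for symbol in sequence:
        current += symbol
        if current not in dictionary:
            # Shortest unseen prefix found: record it and start a new phrase.
            dictionary.add(current)
            phrases.append(current)
            current = ""
    if current:
        # Input ended inside a phrase that is already in the dictionary.
        phrases.append(current)
    return phrases


if __name__ == "__main__":
    print(lz78_parse("1011010100010"))
```

In the image setting described above, the role of the growing string would be played by a rectangular patch that is extended until it no longer matches any dictionary entry, which is how variable-size blocks arise.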