Abstract
In this article, we are interested in accelerating similarity search in high-dimensional vector spaces. The presented approach, called HiPeR, is based on a hierarchy of subspaces and indexes: it performs nearest-neighbor search across spaces of different dimensions, starting from the lowest dimensions and moving up to the highest ones, with the aim of minimizing the effects of the curse of dimensionality. HiPeR significantly accelerates exact retrieval, even when combined with the best indexes, and also allows for progressive retrieval, i.e. the possibility of delivering results to the user progressively, refining them until the user is satisfied. The hierarchy can be scanned according to several strategies. We propose and evaluate two heuristics: the first one assumes a priori knowledge of the data-set distribution, while the second chooses the most interesting levels at run time. HiPeR is evaluated for range queries on three real data-sets ranging from 500,000 to 4 million vectors.
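The abstract does not detail how HiPeR builds its subspaces or indexes, so the following is only a generic sketch of the underlying idea of filtering a range query from low to high dimensions. It assumes that each level is simply a prefix of the vector's coordinates (for instance after a transform that concentrates information in the leading dimensions) and uses a linear scan in place of a real index; the function names and the `levels` parameter are hypothetical. The sketch relies on one property: the Euclidean distance over a subset of coordinates never exceeds the full distance, so discarding a vector whose partial distance already exceeds the radius can never lose a true answer.

```python
import math
from typing import Iterator, List, Sequence, Tuple

Vector = Sequence[float]


def prefix_distance(a: Vector, b: Vector, d: int) -> float:
    """Euclidean distance restricted to the first d coordinates.

    Dropping coordinates only removes non-negative terms from the sum of
    squares, so this is a lower bound on the full-dimensional distance.
    """
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in range(d)))


def progressive_range_query(
    data: List[Vector], query: Vector, radius: float, levels: Sequence[int]
) -> Iterator[Tuple[int, List[int]]]:
    """Hypothetical level-by-level range query (not the authors' code).

    Yields (dimension, candidate ids) after each level, so a caller can show
    a shrinking candidate set progressively. If the last level uses every
    coordinate, its candidates are exactly the vectors within `radius`.
    """
    candidates = list(range(len(data)))
    for d in levels:
        # Keep only vectors whose partial distance is still within the radius.
        candidates = [
            i for i in candidates if prefix_distance(data[i], query, d) <= radius
        ]
        yield d, candidates


# Toy 4-dimensional data-set, with levels of 1, 2 and then all 4 dimensions.
data = [
    [0.0, 0.0, 0.0, 0.0],
    [0.4, 0.4, 0.4, 0.4],
    [0.1, 2.0, 0.0, 0.0],
    [0.2, 0.1, 0.9, 0.9],
    [3.0, 0.0, 0.0, 0.0],
]
query = [0.0, 0.0, 0.0, 0.0]

for dim, ids in progressive_range_query(data, query, radius=1.0, levels=[1, 2, 4]):
    print(dim, ids)
# prints:
# 1 [0, 1, 2, 3]
# 2 [0, 1, 3]
# 4 [0, 1]
```

Each level only evaluates the survivors of the previous one, which is where the savings come from when cheap low-dimensional levels already prune most of the data. Stopping before the last level gives an approximate superset of the answer, which corresponds to the progressive-retrieval use case described above.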