Abstract
Data management systems for "big science" often have tight memory and disk space constraints. In this paper, we introduce adaptive bitmap indexes, which conform to both space limits while dynamically adapting to the query load and offering excellent performance. So that adaptive bitmap indexes can use optimal bin boundaries, we show how to improve the scalability of optimal binning algorithms so that they can be used with real-world workloads. As the removal of false positives is the largest component of lookup time for a small-footprint bitmap index, we propose a novel way to materialize and drop auxiliary projection indexes, to eliminate the need to visit the data store to check for false positives. Our experiments with real-world data and queries show that adaptive bitmap indexes offer approximately 100-300% performance improvement (compared to standard binned bitmap indexes) at a cost of 5 MB of dedicated memory, under disk storage constraints that would cripple other indexes.