2008 IEEE 24th International Conference on Data Engineering

Abstract

We consider the problem of estimating the result of an aggregate query with a very low selectivity. Traditional sampling techniques can be ineffective for such a problem, since a small random sample is likely to miss most or even all of the records satisfying the restrictive selection predicate. Stratified sampling is useful in this situation, but a key problem in applying stratified sampling effectively is identifying which strata are important and developing a sampling plan that favors those strata in a robust fashion. We develop a solution to this problem that combines any prior knowledge or expectation about the stratification with information obtained from pilot sampling in a principled Bayesian framework.
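
To make the idea concrete, the following is a minimal sketch of one way prior knowledge and a pilot sample could be combined to build a sampling plan. It is not the method of the paper, whose details the abstract does not give. It assumes a Beta prior on each stratum's selectivity, updates it with pilot counts, and then applies a textbook Neyman-style allocation. The names `posterior_mean` and `allocate`, the strata, and all numbers are hypothetical.

```python
from math import sqrt


def posterior_mean(prior_a, prior_b, hits, trials):
    """Posterior mean selectivity for a Beta(prior_a, prior_b) prior
    after observing `hits` matching records in `trials` pilot draws."""
    return (prior_a + hits) / (prior_a + prior_b + trials)


def allocate(strata, budget):
    """Split `budget` samples across strata in proportion to
    N_h * sqrt(p_h * (1 - p_h)), a Neyman-style allocation.

    strata maps name -> (size N_h, prior_a, prior_b, pilot_hits, pilot_trials).
    Assumes prior_a > 0 and prior_b > 0, so every p_h lies strictly in (0, 1).
    Rounding means the counts may not sum exactly to `budget`.
    """
    weights = {}
    for name, (size, a, b, hits, trials) in strata.items():
        p = posterior_mean(a, b, hits, trials)
        weights[name] = size * sqrt(p * (1.0 - p))
    total = sum(weights.values())
    return {name: round(budget * w / total) for name, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical data: stratum B holds 10% of the records, but the
    # prior and the pilot both suggest that most matching records are there.
    strata = {
        "A": (90_000, 1, 99, 0, 50),
        "B": (10_000, 1, 9, 10, 50),
    }
    print(allocate(strata, budget=1000))
```

In this sketch the small stratum B receives far more than its proportional 10% share of the budget, which is the behavior the abstract describes as favoring the important strata. Because the prior keeps every posterior mean strictly between 0 and 1, a stratum with no pilot hits still receives some samples.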