19th International Conference on Scientific and Statistical Database Management (SSDBM 2007)
p. 23
Reservoir Sampling over Memory-Limited Stream Joins
Al-Kateb, The University of Vermont, USA
Lee, The University of Vermont, USA
Wang, The University of Vermont, USA
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/SSDBM.2007.40
Abstract
In stream join processing with limited memory, uniform random sampling is useful for approximate query evaluation. In this paper, we address the problem of reservoir sampling over memory-limited stream joins. We present two sampling algorithms, Reservoir Join-Sampling (RJS) and Progressive Reservoir Join-Sampling (PRJS). RJS is a straightforward design that applies fixed-size reservoir sampling to a join-sample (i.e., a random sample of a join output stream). Whenever the sample in the reservoir is used, RJS gives a uniform random sample of the original join output stream. With limited memory, however, the available memory may not be large enough even for the join buffer, which severely limits the reservoir size. PRJS alleviates this problem by increasing the reservoir size during join-sampling. This increase is possible because the memory requirement of the join-sampling algorithm decreases over time. A larger reservoir provides a closer representation of the original join output stream, but it reduces the probability that the sample is uniform. Through experiments, we examine this tradeoff and compare the two algorithms in terms of the aggregation error on the reservoir sample.
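For readers unfamiliar with the building block the abstract refers to, the following is a minimal sketch of classic fixed-size reservoir sampling (often called Algorithm R) applied to an arbitrary stream of tuples. It is not the RJS or PRJS algorithm from the paper: it does not model the join buffer, the join-sample, or a growing reservoir. The function name `reservoir_sample`, the parameter `k`, and the toy stream are assumptions made for illustration only.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return a uniform random sample of at most k items from an iterable.

    Classic fixed-size reservoir sampling (Algorithm R). Illustrative only;
    this is NOT the paper's RJS/PRJS, which also account for join memory.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) is kept with probability k / (i + 1),
            # replacing a uniformly chosen slot in the reservoir.
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Hypothetical stand-in for a join output stream: pairs of matching tuples.
toy_join_output = [(("r", n), ("s", n)) for n in range(100)]
sample = reservoir_sample(toy_join_output, k=5, rng=random.Random(0))
```

At any point in the stream, each item seen so far is in the reservoir with equal probability, which is the uniformity property the abstract attributes to a fixed-size reservoir.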
Additional Information
Citation:
Al-Kateb, Lee, Wang, "Reservoir Sampling over Memory-Limited Stream Joins," 19th International Conference on Scientific and Statistical Database Management (SSDBM 2007), p. 23, 2007.