4th IEEE International Conference on Cloud Computing Technology and Science Proceedings

Abstract

Due to the prevalent use of sensors and network monitoring tools, large volumes of data, or “big data,” now traverse enterprise data processing pipelines in a streaming fashion. While some companies prefer to deploy their data processing infrastructures and services as private clouds, others outsource these services entirely to public clouds. In either case, storing the data first for subsequent analysis incurs additional resource costs and delays the extraction of actionable information. As a result, enterprises increasingly employ data or event stream processing systems and want to extend them with complex online analytic and mining capabilities. In this paper, we present implementation details for performing both correlation analysis and association rule mining (ARM) over streams. Specifically, we implement the Pearson product-moment correlation for analytics and the Apriori and FPGrowth algorithms for stream mining inside a popular event stream processing engine called Esper. As a unique contribution, we conduct experiments and present performance results for these new tools with different tumbling and sliding time windows over two different stream types: one of moving bus trajectories and another of web logs from a music site. We find that while tumbling windows may be preferable for performance in certain applications, sliding windows can provide additional benefits for rule mining. We hope that our findings can shed light on the design of other cloud analytics systems.
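To make the distinction between the two window types concrete, the following is a minimal Python sketch of Pearson correlation computed over tumbling and sliding time windows. It is an illustration only, not the paper's Esper implementation: the function names (`pearson`, `time_windows`, `windowed_correlation`), the `(timestamp, x, y)` event layout, and the fixed-step approximation of a sliding window are all assumptions made for this example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / sqrt(var_x * var_y)


def time_windows(events, width, slide):
    """Yield the events in [start, start + width) for successive window starts.

    events: list of (timestamp, x, y) tuples sorted by timestamp (hypothetical layout).
    slide == width gives tumbling (non-overlapping) windows;
    slide < width gives overlapping windows that approximate a sliding window.
    """
    if not events:
        return
    start, last = events[0][0], events[-1][0]
    while start <= last:
        yield [e for e in events if start <= e[0] < start + width]
        start += slide


def windowed_correlation(events, width, slide):
    """Correlation of x and y per window, skipping windows with fewer than 2 events."""
    results = []
    for window in time_windows(events, width, slide):
        if len(window) >= 2:
            xs = [x for _, x, _ in window]
            ys = [y for _, _, y in window]
            results.append(pearson(xs, ys))
    return results


# Synthetic stream: y is a linear function of x, one event per time unit.
events = [(t, float(t), 2.0 * t + 1.0) for t in range(10)]
tumbling = windowed_correlation(events, width=5, slide=5)
sliding = windowed_correlation(events, width=5, slide=1)
```

With the same window width, the tumbling configuration evaluates each event once, while the sliding configuration re-evaluates overlapping events in several windows. That gives more frequent results for more computation, which is the kind of trade-off the experiments examine.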
