2015 31st IEEE International Conference on Data Engineering Workshops (ICDEW)

Abstract

Analytical processing of the varied information generated by social media operations requires queries that group and aggregate large volumes of detail data. Any advanced query processing method should take into account two dominant hardware trends: increasing main memory capacity and increasing parallel processing capacity, exposed as a growing number of cores per processor chip. We introduce a scalable in-memory method for data aggregation (SINCA), using clustered operators, which profits from these hardware trends. The method uses the concept of a microengine, a set of resources that can be utilized in parallel with great efficiency. The resulting parallelized aggregation algorithm has low overhead, handles high data volumes, and is suitable for both real-time and extract-transform-load scenarios. The core idea of the method is to use real-time histograms to partition the data for grouping. Because the data is already grouped during the partitioning phase, the group aggregation can be done very efficiently. Additionally, some of the grouped data can be cached for re-use in subsequent queries.
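
The core idea of partitioning by histogram before aggregating can be illustrated with a minimal Python sketch. This is not the SINCA implementation: the function names (`partition_by_histogram`, `aggregate_partition`, `group_sum`), the greedy load-balancing rule, and the choice of a sum aggregate are all assumptions made for illustration, and the partitions are processed sequentially here rather than by parallel microengines.

```python
from collections import Counter


def partition_by_histogram(rows, num_partitions):
    """Split (key, value) rows so that each key lands in exactly one partition.

    Hypothetical helper: a key-frequency histogram drives a greedy
    assignment that tries to balance the number of rows per partition.
    """
    # Pass 1: histogram of how many rows each grouping key has.
    histogram = Counter(key for key, _ in rows)

    # Assign the most frequent keys first, always to the lightest partition.
    loads = [0] * num_partitions
    key_to_partition = {}
    for key, count in histogram.most_common():
        target = loads.index(min(loads))
        key_to_partition[key] = target
        loads[target] += count

    # Pass 2: scatter the rows according to the assignment.
    partitions = [[] for _ in range(num_partitions)]
    for key, value in rows:
        partitions[key_to_partition[key]].append((key, value))
    return partitions


def aggregate_partition(partition):
    """Sum the values per key inside a single partition."""
    sums = {}
    for key, value in partition:
        sums[key] = sums.get(key, 0) + value
    return sums


def group_sum(rows, num_partitions=4):
    """Group-by-sum: partition first, then aggregate each partition."""
    result = {}
    for partition in partition_by_histogram(rows, num_partitions):
        # Key sets are disjoint across partitions, so merging never collides.
        result.update(aggregate_partition(partition))
    return result
```

Because no key appears in more than one partition, each partition could in principle be aggregated by a separate worker without locking or a final merge of partial sums, which is the property the partitioning phase is meant to provide.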
