Computing BIG data cubes with hybrid memory

Authors

Rodrigo Rocha Silva
Celso Massaki Hirata
Joubert de Castro Lima

Abstract

Nowadays, the data volumes under analysis are reaching critical sizes, challenging traditional data warehousing approaches. Cubing methods based on inverted indices, such as Frag-Cubing, are efficient alternatives to conventional approaches for computing OLAP data cubes over Big Data. However, as with other memory-based cube solutions, the efficiency of such methods is constrained by the available dynamic random-access memory (DRAM). In this paper, we implement and test the hybrid inverted cubing (HIC) method, which adopts a hybrid memory system with the main goal of computing and updating BIG data cubes (cubes with high dimensionality and a high number of tuples). HIC stores the most frequent attribute values in DRAM; the remaining attribute values are retained in external memory. Tests using a relation with 480 dimensions and 10^7 tuples show that HIC is three times slower than Frag-Cubing when computing a data cube, and approximately 13 times faster than Frag-Cubing when answering complex cube queries. A BIG data cube with 60 dimensions and 10^9 tuples was computed by HIC using 110 GB of RAM and 286 GB of external memory, while Frag-Cubing could not compute such a cube on the same machine.
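
The hybrid placement described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name HybridInvertedIndex, the hot_limit parameter, the one-pickle-file-per-value layout, and the frequency ranking rule are all assumptions made for illustration. It builds an inverted index from (dimension, value) pairs to tuple identifiers, keeps the most frequent entries in an in-memory dictionary, spills the rest to files, and answers a point query by intersecting the tuple-identifier lists.

```python
import os
import pickle
import tempfile
from collections import defaultdict


class HybridInvertedIndex:
    """Toy inverted index: frequent attribute values in RAM, the rest on disk.

    Hypothetical illustration only; names and storage layout are assumptions.
    """

    def __init__(self, rows, hot_limit, spill_dir):
        self.n_rows = len(rows)
        # Map (dimension, value) -> ascending list of tuple ids (TIDs).
        postings = defaultdict(list)
        for tid, row in enumerate(rows):
            for dim, value in enumerate(row):
                postings[(dim, value)].append(tid)

        # Rank entries by frequency; ties are broken by key for determinism.
        ranked = sorted(postings, key=lambda k: (-len(postings[k]), k))

        # The hot_limit most frequent entries stay in memory.
        self.hot = {k: postings[k] for k in ranked[:hot_limit]}

        # Every other entry is written to its own file (external memory).
        self.cold_paths = {}
        for i, key in enumerate(ranked[hot_limit:]):
            path = os.path.join(spill_dir, f"tids_{i}.pkl")
            with open(path, "wb") as f:
                pickle.dump(postings[key], f)
            self.cold_paths[key] = path

    def tids(self, dim, value):
        """Return the TID list for one attribute value, from RAM or disk."""
        key = (dim, value)
        if key in self.hot:
            return self.hot[key]
        path = self.cold_paths.get(key)
        if path is None:
            return []  # value never occurs in this dimension
        with open(path, "rb") as f:
            return pickle.load(f)

    def count(self, conditions):
        """Count tuples matching every dimension -> value condition."""
        result = None
        for dim, value in conditions.items():
            current = set(self.tids(dim, value))
            result = current if result is None else result & current
        return self.n_rows if result is None else len(result)


if __name__ == "__main__":
    rows = [("a", "x"), ("a", "y"), ("b", "x"), ("a", "x")]
    with tempfile.TemporaryDirectory() as spill_dir:
        index = HybridInvertedIndex(rows, hot_limit=2, spill_dir=spill_dir)
        print(index.count({0: "a", 1: "x"}))  # 2 (both lists in memory)
        print(index.count({0: "b", 1: "x"}))  # 1 (one list read from disk)
```

In this toy relation, the two most frequent values ("a" in dimension 0 and "x" in dimension 1) stay in memory, so queries over them never touch the disk, while queries involving rarer values pay for a file read.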

Keywords

Online analytical processing (OLAP), data cube, big data

Subject

Big Data with OLAP

Journal

Journal of Convergence Information Technology (Gyeongju), Vol. 11, pp. 13-30, January 2016
