CISUC

Abordagens para Cubo de Dados Massivos com Alta Cardinalidade Baseadas em Memória Principal e Memória Externa: HIC e bCubing

Authors

Abstract

Cubing methods based on inverted indices, such as Frag-Cubing, are efficient alternatives to conventional approaches of computing high-dimensional data cubes. methods approaches for data cubing based on inverted indices, such as Frag-Cubing, are efficient alternatives to conventional approaches for computing high-dimensional data cubes. However, similar to other memory-based cube solutions, the efficiency of such methods is constrained by available dynamic random-access memory (DRAM). In this work, we propose two initial approaches: qCube and H-Frag. qCube is a high dimension sequential range cube approach which implements Equal, Not Equal, Distinct, Sub-cube, Greater or Less than, Some, Between, Similar and Similar query operators over a high dimension data cube. H-Frag is an approach which adopts both main and external memories, but the end-user must configure its memory thresholds manually. Based on initial approaches, we develop two other contributions to improve hybrid memory systems utilization: HIC and bCubing. HIC describes a hybrid memory system to store cube partitions in external memory. The most frequent attribute values are stored in DRAM and the remaining attribute values are retained in hard disk drive storage. HIC introduces a new property, named critical cumulative frequency, to define which attribute values will be stored in RAM or in external memory. bCubing partitions a list of TIDs implementing the inverted index in two levels: one level in which the identifier is the block index (BIDs) and the second level in which the identifier is the tuple index (TIDs). The TIDs of attribute values are retained in hard disk drive storage. The BIDs are kept in main memory indexed by the attribute values. bCubing is able to calculate and maintain holistic measures accurately updated within high dimensional cubes with high number of tuples. Tests using a relation with 480 dimensions and 107 tuples show that bCubing is only 30% more time slower than Frag-Cubing when computing a data cube, and approximately 3 times faster than Frag-Cubing when answering complex cube queries. A massive cube with 60 dimensions and 109 tuples was computed by bCubing using 84 GB of RAM, while Frag-Cubing could not compute such a cube in a machine with 128 GB of RAM without operating system swaps. We also computed holistic measures in a high dimension data cube with bCubing and results demonstrated that bCubing spends, on average, 12% longer to calculate holistic measures than queries with COUNT measures. bCubing answered queries from a data cube with 1.2 billion tuples in up to 4 minutes, where a query Q has two sub-cub operators and one EQUAL operator. Query Q accurately calculated three holistic measures: standard deviation, median and mode.

Keywords

Online analytical processing (OLAP), data cube, high dimension

Subject

Massive Data Cubes Computation

PhD Thesis

Abordagens para Cubo de Dados Massivos com Alta Cardinalidade Baseadas em Memória Principal e Memória Externa: HIC e bCubing, November 2015

PDF File


Cited by

No citations found