SINGLE vs. MapReduce vs. Relational: Predicting Query Execution Time

Authors

Maryam Abbasi
Pedro Miguel Oliveira Martins
José Cecílio
Pedro Furtado

Abstract

Over the past decades, several new concepts have emerged to organize and query data in large Data Warehouse (DW) systems, all with the same primary objective: optimizing processing speed. More recently, with the rise of the Big Data concept, storage costs have dropped significantly and performance (random access) has increased, particularly with modern SSDs. This paper introduces and tests a storage alternative that goes against current data normalization premises, on the assumption that storage space is no longer a concern. By de-normalizing the entire data schema (transparently to the user), the paper proposes a new system concept, called SINGLE, in which query execution time is intended to be entirely predictable, independently of query complexity. The proposed data model also allows easy partitioning and distributed processing to enable parallel execution and boost performance, as happens in MapReduce. The TPC-H benchmark is used to evaluate storage space and query performance. Results show predictable performance when compared with approaches based on a normalized relational schema and with MapReduce-oriented approaches.
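The general idea of de-normalizing a schema into one wide table and scanning it in partitions can be illustrated with a minimal sketch. This is not the authors' implementation: the toy tables, the field names, and the helper functions (`denormalize`, `partition`, `scan_partition`, `merge`) are all hypothetical and chosen only for illustration. The sketch assumes that every query becomes a single sequential pass over pre-joined rows, so its cost depends on the row count rather than on the number of joins.

```python
from collections import defaultdict

# Hypothetical normalized tables (toy data, not TPC-H).
customers = {
    1: {"name": "Ana", "nation": "PT"},
    2: {"name": "Bo", "nation": "SE"},
}
orders = [
    {"order_id": 10, "cust_id": 1, "total": 50.0},
    {"order_id": 11, "cust_id": 2, "total": 20.0},
    {"order_id": 12, "cust_id": 1, "total": 30.0},
]


def denormalize(orders, customers):
    """Pre-join each order with its customer into one wide row."""
    return [{**o, **customers[o["cust_id"]]} for o in orders]


def partition(rows, n):
    """Split the rows round-robin into n partitions."""
    return [rows[i::n] for i in range(n)]


def scan_partition(rows):
    """Map step: one sequential pass, partial sum of totals per nation."""
    partial = defaultdict(float)
    for row in rows:
        partial[row["nation"]] += row["total"]
    return partial


def merge(partials):
    """Reduce step: combine the partial sums from all partitions."""
    totals = defaultdict(float)
    for partial in partials:
        for nation, value in partial.items():
            totals[nation] += value
    return dict(totals)


# The de-normalized table trades storage space (customer fields are
# repeated on every order row) for join-free queries.
single = denormalize(orders, customers)
revenue_by_nation = merge(scan_partition(p) for p in partition(single, 2))
```

In this sketch the partitions are scanned one after another; in a distributed setting each partition could be scanned on a separate node, with only the small partial results sent to the merge step.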

Keywords

Predictable, Query execution, Data warehouse, MapReduce, Normalization, De-normalization, Distributed, Relational

Subject

International Conference: Beyond Databases, Architectures and Structures

Conference

International Conference: Beyond Databases, Architectures and Structures, August 2018
