SINGLE vs. MapReduce vs. Relational: Predicting Query Execution Time
Authors
Abstract
Over the past decades, several new concepts have emerged to organize and query data in large Data Warehouse (DW) systems, all with the same primary objective: optimizing processing speed. More recently, with the rise of the Big Data concept, storage costs have dropped significantly and performance (random access) has increased, particularly with modern SSDs. This paper introduces and tests a storage alternative that goes against current data normalization premises, on the assumption that storage space is no longer a concern. By de-normalizing the entire data schema (transparently to the user), a new system concept called SINGLE is proposed, in which query execution time must be entirely predictable, independently of query complexity. The proposed data model also allows easy partitioning and distributed processing to enable parallel execution, boosting performance, as happens in MapReduce. The TPC-H benchmark is used to evaluate storage space and query performance. Results show predictable performance when compared with approaches based on a normalized relational schema and with MapReduce-oriented approaches.
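The idea of de-normalizing a schema so that every query becomes a partitionable scan can be illustrated with a minimal sketch. This is not the SINGLE implementation: the table names, fields, values, and the helper functions `denormalize`, `scan_partition`, and `query` are all hypothetical, and an in-memory list stands in for the storage layer.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical normalized tables (names and values are illustrative only).
customers = {
    1: {"name": "Alice", "nation": "PT"},
    2: {"name": "Bob", "nation": "ES"},
}
orders = [
    {"order_id": 10, "cust_id": 1, "total": 120.0},
    {"order_id": 11, "cust_id": 2, "total": 80.0},
    {"order_id": 12, "cust_id": 1, "total": 50.0},
]


def denormalize(orders, customers):
    """Join once at load time so every row carries all of its attributes."""
    return [{**o, **customers[o["cust_id"]]} for o in orders]


def scan_partition(rows, predicate, value_of):
    """Answer a query with one pass over a partition; no joins at query time."""
    return sum(value_of(r) for r in rows if predicate(r))


def query(flat_rows, predicate, value_of, n_partitions=2):
    """Split the flat table, scan each partition, then combine partial sums."""
    parts = [flat_rows[i::n_partitions] for i in range(n_partitions)]
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = pool.map(
            lambda p: scan_partition(p, predicate, value_of), parts
        )
        return sum(partials)


flat = denormalize(orders, customers)
# Total order value for customers in nation "PT": one scan, no join.
total_pt = query(flat, lambda r: r["nation"] == "PT", lambda r: r["total"])
```

In this sketch every query reads each row exactly once, so the work depends on the table size and the number of partitions rather than on how many tables the original schema would have required joining. The price is the duplicated customer attributes in every order row, which is the storage trade-off the abstract refers to.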
Keywords
Predictable, Query execution, Data warehouse, MapReduce, Normalization, De-normalization, Distributed, Relational
Subject
International Conference: Beyond Databases, Architectures and Structures
Conference
International Conference: Beyond Databases, Architectures and Structures, August 2018
DOI