CISUC

Experimental Evidence on Partitioning in Parallel Data Warehouses

Authors

Abstract

Parallelism can deliver major performance improvements in large data warehouses (DW) that face performance and scalability challenges. A simple, low-cost shared-nothing architecture with horizontally fully partitioned facts can significantly speed up data warehouse response time. However, the overheads of processing large replicated relations and of repartitioning data between nodes can significantly degrade speedup for many query patterns unless placement is designed to minimize them. In this paper we demonstrate these problems experimentally using the TPC-H performance evaluation benchmark and identify simple modifications that minimize these overheads. We experimentally analyze a simple, easy-to-apply partitioning and placement decision that achieves good performance improvements.

Subject

Data Warehousing

Conference

DOLAP 04 - Workshop of the International Conference on Information and Knowledge Management (CIKM), November 2004


Cited by

Year 2006 : 2 citations

 Diogo Tuler Forlani, Cristina Dutra de Aguiar Ciferri, Ricardo Rodrigues Ciferri, "Melhorando o Desempenho do Processamento de Consultas Drill-Across em Ambientes de Data Warehousing", Simpósio Brasileiro de Banco de Dados, Florianópolis, SC, Brasil, 2006.

 Evelin Giuliana Lima, Marina Teresa Pires Vieira, "Ferramenta para Geração de Modelo Dimensional para Data Warehouses", Simpósio Brasileiro de Banco de Dados, Florianópolis, SC, Brasil, 2006.