Hash-based Placement and Processing for Efficient Node Partitioned Query-intensive Databases
Authors
Abstract
This paper discusses efficient hash partitioning that uses workload access patterns to place and process relations in a cluster or distributed query-intensive
database environment. In such an environment, there is usually more than one
partitioning alternative for each relation. We present a method and algorithm to
determine the hash-partitioning attributes and placement of each relation. Using expected or
historical query workloads, our algorithm chooses, among the alternatives, a placement that
reduces repartitioning overhead. The paper includes a simulation study showing how our
strategy outperforms ad-hoc placement and previously proposed distributed database
strategies.