Controlling the Granularity of Automatic Parallel Programs
Authors
Abstract
Programming for concurrent platforms, such as multicore CPUs, is time-consuming and requires fine-tuning of the final program to match its parallel layout to the hardware architecture. Parallelizing a program consists of identifying parts of the code (tasks) that can execute concurrently and running them in different threads. Current approaches to automatic parallelization cannot achieve the same performance as manually parallelized programs. Existing tools are limited: they either parallelize everything possible or parallelize only the outer loops, which may miss potential parallelism that could improve the program. Some approaches control granularity during execution only, but without any relevant speedups. Automatic parallelizing compilers have shown little overall speedup without programmers manually guiding granularity. This work addresses the problem of obtaining performant programs from fully automated parallelization.
We propose a cost model to decide between different parallelization alternatives. Using static analysis, we estimate the execution time of tasks and parallelize them only if that time exceeds the overhead of spawning a task. Because the information available at compile time may not be enough to make that decision, we delay some decisions to runtime, when all variables are known. Thus, we use a hybrid approach that performs optimizations both at compile time and at runtime.
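To illustrate the kind of guard such a hybrid decision produces, consider the following minimal Java sketch. It uses the standard fork/join framework rather than the Æminium runtime, and the THRESHOLD cutoff is an illustrative value, not the output of our cost model.

    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.RecursiveTask;

    class Fib extends RecursiveTask<Long> {
        // Illustrative cutoff: below this size, spawning a task
        // costs more than the parallelism it buys.
        static final int THRESHOLD = 13;
        final int n;

        Fib(int n) { this.n = n; }

        @Override
        protected Long compute() {
            if (n < THRESHOLD) {
                return seq(n); // runtime granularity check: too small, run sequentially
            }
            Fib left = new Fib(n - 1);
            left.fork();                           // spawn a parallel task for one branch
            long right = new Fib(n - 2).compute(); // compute the other branch in place
            return right + left.join();
        }

        static long seq(int n) { return n < 2 ? n : seq(n - 1) + seq(n - 2); }

        public static void main(String[] args) {
            System.out.println(new ForkJoinPool().invoke(new Fib(30)));
        }
    }

In our approach, the compiler would insert such a guard automatically, with the threshold derived from the static cost estimate and the overhead of task spawning.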
Although we apply our model to the Java language on top of the Æminium runtime, our approach is modular and can be applied to any programming language with a task-based runtime for shared memory.
We have evaluated our approach on existing benchmark programs, in cases where a wrong granularity value would slow the programs down. We achieved speedups over versions without granularity control and over versions with runtime-only granularity control. We also generated programs with better performance than those of the state-of-the-art Java automatic parallelizing compiler. Finally, in some cases we outperformed the human programmer.