OpenMP scalability limits on large SMPs and how to extend them

  • OpenMP-Skalierbarkeitslimits auf großen SMP-Systemen und wie sie erweitert werden können

Schmidl, Dirk; Müller, Matthias S. (Thesis advisor); Bischof, Christian (Thesis advisor)

Aachen (2016)
Dissertation / PhD Thesis

Dissertation, RWTH Aachen University, 2016

Abstract

The most widely used node type in high-performance computing today is the 2-socket server. Thousands of such nodes are coupled into clusters via a fast interconnect, e.g. InfiniBand. For programming these clusters, the Message Passing Interface (MPI) has become the de-facto standard. However, MPI requires a very explicit expression of data layout and data transfer in a parallel program, which often means rewriting an application to parallelize it. An alternative to MPI is OpenMP, which allows a serial application to be parallelized incrementally by adding pragmas to compute-intensive regions of the code. This is often more feasible than rewriting the application with MPI. The disadvantage of OpenMP is that it requires shared memory and thus cannot be used across the nodes of a cluster. However, different hardware vendors offer large machines with memory shared among all cores of the system. Maintaining coherency between memory and all cores of such a system is a challenging task, so these machines have different characteristics than standard 2-socket servers, and a programmer must take these characteristics into account to achieve good performance.

In this work, I will investigate different large shared-memory machines to highlight these characteristics, and I will show how they can be handled in OpenMP programs. Where OpenMP is not able to handle a problem, I will present solutions in user space which could be added to OpenMP for better support of large systems. Furthermore, I will present a tools-guided workflow to optimize applications for such machines. I will investigate the ability of performance tools to highlight performance issues, and I will present improvements that enable such tools to handle OpenMP tasks. These improvements make it possible to investigate the efficiency of task-parallel execution, especially on large shared-memory machines. The workflow also contains a performance model to determine how well an application performs on a system and when to stop tuning it. Finally, I will present two application case studies in which user codes were optimized to reach good performance by applying the optimization techniques presented in this thesis.