Tutorial Speakers 2011
Jan Treibig, RRZE Erlangen, Germany
Marta Garcia, Barcelona Supercomputing Center, Spain
Rainer Keller, HLRS, Stuttgart, Germany
Hans Pabst, Intel, Germany


 Tutorials

Unleash the Power of Modern CPUs through Vectorization and Parallelization


Hans Pabst, Intel



Today's CPUs draw their computational power from both vectorization and parallelization: growing vector widths and increasing core counts must both be exploited to unleash the performance of modern processors. In this tutorial, we will give an overview of the evolution of SIMD vectorization and of multicore CPUs. We will introduce programming models that make it easy for programmers to fully exploit SIMD vectorization as well as multi-threading for optimal performance. We will focus on C/C++ programming and discuss the following programming models: Threading Building Blocks, Array Building Blocks, Cilk+, and SIMD pragmas.
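To give a flavor of the keyword- and pragma-based end of this spectrum, the following sketch (an illustration written for this summary, not material from the tutorial) strip-mines a loop so that an outer cilk_for distributes blocks of iterations across the cores while the Cilk Plus SIMD pragma asks the compiler to vectorize the inner loop. It assumes a compiler with Cilk Plus support, such as the Intel C/C++ compiler of that period; the exact pragma spelling may vary between compiler versions.

#include <cilk/cilk.h>   /* cilk_for keyword; needs a Cilk Plus capable compiler */
#include <stddef.h>

/* y[i] += a * x[i]: the outer cilk_for spreads blocks of iterations
 * across cores, the SIMD pragma asks the compiler to vectorize the
 * inner loop. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    const size_t block = 4096;   /* strip-mining block size, chosen arbitrarily */
    cilk_for (size_t b = 0; b < n; b += block) {
        size_t end = (b + block < n) ? b + block : n;
#pragma simd
        for (size_t i = b; i < end; ++i)
            y[i] += a * x[i];
    }
}

Threading Building Blocks and Array Building Blocks address the same two levels of parallelism through C++ libraries rather than keywords and pragmas.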

 
The SMPSs Programming Model

Marta Garcia, Barcelona Supercomputing Center, Spain
Rainer Keller, HLRS, Stuttgart, Germany




MPI/SMPSs is a hybrid parallel programming model for large-scale homogeneous or heterogeneous systems. The SMPSs component of the model enables incremental parallelization of sequential code while providing the necessary support for extracting task-level parallelism from applications at runtime, asynchronous execution, heterogeneity, modularity, and portability. SMPSs is based on directives, and its tasking model can be seen as an extension of the OpenMP tasking model in which the runtime dynamically computes the dependences between tasks. A pool of threads traverses the resulting task graph, executing tasks as soon as their dependences are satisfied by the completion of previous tasks. In hybrid MPI/SMPSs, this dataflow execution model within a process propagates to the MPI level, allowing an adaptive overlap of communication and computation.
The tutorial will describe the SMPSs model and how well it hybridizes with MPI. We will present results for several applications and show how supporting tools can be used to better understand, debug, and tune MPI/SMPSs programs.
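SMPSs expresses tasks as annotated functions whose arguments carry a directionality (input, output, inout) from which the runtime derives the dependence graph. The blocked sum below is a sketch written for this summary, not material from the tutorial; the pragma spelling follows the SMPSs directive style but should be checked against the SMPSs user guide from BSC.

#include <stddef.h>

#define BS 1024  /* block size, arbitrary value for this sketch */

/* A task is an annotated function; the directionality of its arguments
 * is what the runtime uses to build the task dependence graph. */
#pragma css task input(a, b) output(c)
void block_add(const float a[BS], const float b[BS], float c[BS])
{
    for (int i = 0; i < BS; ++i)
        c[i] = a[i] + b[i];
}

#pragma css task input(c) inout(sum)
void block_accumulate(const float c[BS], float sum[1])
{
    for (int i = 0; i < BS; ++i)
        sum[0] += c[i];
}

void blocked_sum(size_t nb, const float *A, const float *B, float *C, float *sum)
{
    for (size_t i = 0; i < nb; ++i) {
        /* Each call spawns a task: block_accumulate depends on the matching
         * block_add through C and on the previous accumulate through sum,
         * so the independent block_add tasks may run concurrently. */
        block_add(&A[i * BS], &B[i * BS], &C[i * BS]);
        block_accumulate(&C[i * BS], sum);
    }
#pragma css barrier  /* wait for all outstanding tasks to complete */
}

Compiled without the SMPSs tool chain, the pragmas are ignored and the code runs sequentially, which is what makes the incremental parallelization mentioned above possible.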




 Principles of Multicore Optimization

Jan Treibig, RRZE Erlangen, Germany



This tutorial will give an introduction to high-performance programming on current multicore architectures. The focus will be on specific features of processor and node designs such as simultaneous multithreading (SMT), shared caches, and ccNUMA characteristics, which must be considered in order to fully understand the relevant performance bottlenecks. With the help of simple performance models and tools, best practices are introduced that enable programmers to identify both optimization opportunities and "dead ends". All issues are backed by experiments and benchmark results. Finally, optimization techniques are presented that specifically exploit multicore characteristics.
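The ccNUMA point can be made concrete with a small sketch (illustrative only, not part of the tutorial material). Under the first-touch page placement used by common operating systems, a memory page ends up in the NUMA domain of the thread that writes it first, so arrays should be initialized with the same parallel access pattern that the compute loops use later; the vector triad below, using OpenMP for threading, shows the idea.

#include <stdlib.h>

/* ccNUMA-aware vector triad: initialize in parallel with the same static
 * schedule as the compute loop so that each thread's part of the arrays
 * is placed in its local NUMA domain by the first-touch policy. */
void triad(size_t n, int iters)
{
    double *a = malloc(n * sizeof *a);
    double *b = malloc(n * sizeof *b);
    double *c = malloc(n * sizeof *c);
    double *d = malloc(n * sizeof *d);

    /* Parallel first touch: page placement matches the later access pattern. */
#pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n; ++i) {
        a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; d[i] = 3.0;
    }

    for (int k = 0; k < iters; ++k) {
#pragma omp parallel for schedule(static)
        for (size_t i = 0; i < n; ++i)
            a[i] = b[i] + c[i] * d[i];
    }

    free(a); free(b); free(c); free(d);
}

Initializing the arrays serially instead would place all pages in a single NUMA domain and limit the triad loop to the memory bandwidth of that one domain, no matter how many threads run the computation.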