1. Research challenge
Despite decades of experience in parallel programming and in implementing runtime systems to support parallel execution, expressing parallelism productively and exploiting it efficiently on current multicore architectures are still far from straightforward tasks.
Many-/multi-core architectures currently include tens of cores and will include hundreds of them in the near future, with memory organizations that will probably depart from the easy-to-program shared-memory address space. In addition, resource specialization is considered the path to building power-efficient architectures, which leads to heterogeneous architectures. New programming paradigms, together with OS and hardware support, are necessary to program these 100+ core architectures effectively, in terms of both scalability and portability.
2. Objectives
The main objective of this cluster was to create a strong European research community on programming models and runtime environments for exascale systems based on future many-/multi-core architectures. During HiPEAC-2, the cluster worked with the following objectives in mind:
- Propose novel programming models (or evolutions of current ones) for future homogeneous/heterogeneous many-/multi-core chips, where expressiveness and productivity are equally important. Accelerators (SIMD units, GPUs, FPGAs, ...) should be considered. Introducing dataflow concepts is necessary to exploit the distant parallelism that is hidden by global synchronization mechanisms (e.g. barriers). Research compiler and runtime support for parallel programming models that hides the particularities of the target architecture (e.g. local memories and DMA transfers) from the programmer as much as possible.
- Propose novel programming models (or evolutions of current ones) for future exascale clustered architectures. PGAS (Partitioned Global Address Space) programming models are gaining momentum as a novel approach to improving the programmability and productivity of large-scale computing architectures. They offer the programming flexibility of a globally shared-memory model while exposing data locality at a much higher level of abstraction than the distributed-memory MPI model.
- Address the interoperability of standards (e.g. MPI/OpenMP, OpenMP/OpenCL, ...) in order to provide scalability and portability.
- Explore and propose new architectural features to support the programming model and its runtime implementation: thread creation, task off-loading, locality optimization, transactional memory, ...
- Evaluate and propose OS support for architectures including heterogeneous cores (CPU, SIMD, GPU), reconfigurable cores (FPGA), etc. Such support should guarantee fast task creation and synchronization in the many-/multi-core environment, provide efficient task scheduling and allocation mechanisms, manage threads with respect to power, temperature and reliability, and provide QoS in real-time systems.
- Promote the use and development of common tools to support our research activities as well as new benchmarks and methodologies that could serve as a basis for the evaluation of ideas proposed in our community.
- Form interdisciplinary (intra- and inter-cluster) research groups that could make project proposals on programmability and parallelism of homogeneous or heterogeneous multi-core and/or reconfigurable architectures, with a holistic approach that addresses issues related to the underlying hardware, the operating system and the system software.
- Contribute to a common HiPEAC 2012-2020 vision.
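The objective of introducing dataflow concepts rests on a simple timing argument, sketched below in Python purely for illustration. The model is hypothetical and assumes a two-phase computation in which each second-phase task B[i] consumes only the output of first-phase task A[i], one core per task and zero runtime overhead; the function names are illustrative. With a global barrier between the phases, every B task waits for the slowest A task; with dataflow (point-to-point) synchronization, each B task starts as soon as its own producer finishes.

```python
"""Hypothetical model: barrier vs. dataflow synchronization between two phases.

Assumptions (illustrative only): one core per task, no runtime overhead,
and task B[i] depends only on task A[i].
"""
from typing import Sequence


def _check(phase1: Sequence[float], phase2: Sequence[float]) -> None:
    if len(phase1) != len(phase2) or not phase1:
        raise ValueError("phases must be non-empty and of equal length")


def makespan_with_barrier(phase1: Sequence[float], phase2: Sequence[float]) -> float:
    """Global barrier: every B task starts only after *all* A tasks finish."""
    _check(phase1, phase2)
    return max(phase1) + max(phase2)


def makespan_with_dataflow(phase1: Sequence[float], phase2: Sequence[float]) -> float:
    """Point-to-point dependences: B[i] starts as soon as its producer A[i] ends."""
    _check(phase1, phase2)
    return max(a + b for a, b in zip(phase1, phase2))


if __name__ == "__main__":
    a_durations = [1.0, 4.0]  # illustrative task durations (arbitrary time units)
    b_durations = [4.0, 1.0]
    print(makespan_with_barrier(a_durations, b_durations))   # 8.0
    print(makespan_with_dataflow(a_durations, b_durations))  # 5.0
```

Since a + b <= max(A) + max(B) holds for every pair, the dataflow makespan can never exceed the barrier makespan in this model; the difference is the parallelism that the barrier hides.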
3. Research activities carried out during HiPEAC-2
3.1 Novel programming models and their runtime support
- Exploring and proposing novel paradigms based on the exploitation of parallelism at runtime, supported by program annotations. In this direction, the activity of the groups at BSC-UPC, U. of Siena, U. Cyprus and U. of Manchester was the incubator of the TERAFLUX project (FP7 IP 2010-13, Grant Agreement 249013). These groups are addressing the programmability challenges of many-core architectures by combining an underlying dataflow-based thread execution model with advanced programming models such as transactional memory. During 2011, BSC-UPC and Technion explored simple extensions to BSC-UPC's OmpSs programming model that introduce programmer-directed speculation mechanisms to increase performance, as a joint research activity supported with cluster funding (Integrating Speculation in StarSs).
- Proposing extensions to OpenMP for accelerator-based architectures and implementing a prototype for GPUs. Two groups were involved: U. Castellon and BSC-UPC. The collaboration resulted in two papers describing the main proposal:
- IWOMP 2009: a proposal to extend OpenMP to support the specification of task off-loading to accelerators (A Proposal to Extend the OpenMP Tasking Model for Heterogeneous Architectures).
- Euro-Par 2009: the implementation and evaluation of GPUSs (An Extension of the StarSs Programming Model for Platforms with Multiple GPUs).
- Exploring the use of FPGA-based accelerators to off-load tasks in OpenMP. This work was done in collaboration with the Reconfigurable Computing cluster and resulted in a paper presented at SAMOS 2009 (OpenMP extensions for FPGA Accelerators).
- The integration of OmpSs and OpenCL has been explored thanks to a mobility (Pipeline Scheduling of OpenCL Kernels on Multi-core Architectures) supported with cluster funding. This mobility was the seed for a strong collaboration between the programming models group at BSC-UPC and the group at Politecnico di Milano.
- Exploring the applicability of hybrid MPI/SMPSs in real applications, propagating the main characteristics of SMPSs to the global cluster level. This is also a way to leverage the large number of applications written in MPI today and to provide them with a smooth migration path. BSC-UPC, FORTH-ICS and U. of Castellon from HiPEAC are participating in the EU TEXT project, together with other research institutions and supercomputing centers.
- Exploring the use of MapReduce-like programming paradigms for clusters and distributed-memory multi-core architectures. Two groups were involved: FORTH-ICS and BSC-CNS. The work on MapReduce continued during 2011 with a collaboration between BSC-CNS and IBM, creating formal models that describe the behaviour of MapReduce applications and drive scheduling decisions, with special attention to energy consumption management in addition to performance management. The mobility associated with this collaboration was supported with cluster funding (MapReduce Modeling and Energy Management).
- Promoting common research activities around transactional memory (TM), including language, runtime and hardware support. Two groups were initially involved (U. Manchester and BSC-CNS), supported by a mobility (Investigating Runtime-Adaptable Software/Hardware Transactional Systems). During 2011, in collaboration with Onur Mutlu at CMU, the use of hardware transactional memory to support fault tolerance in runtime systems was investigated, leveraging the TM abort mechanism for error recovery. The mobility associated with this collaboration was supported with cluster funding (Cost-Effective Fault Tolerance).
- Exploring hardware acceleration for managing task dependencies at runtime. The groups at U. Delft and BSC collaborated on this topic, and the associated mobility was supported with cluster funding (Collaboration funding Hardware Dependency Resolution). In addition, the activity in this cluster and in the multi-core architectures cluster was the incubator of the ENCORE project (FP7 STREP 2010-12). The project is investigating appropriate hardware support for parallel programming, together with a runtime environment that ensures scalability, performance and cost-efficiency. It is also developing a runtime system that dynamically detects, manages and exploits parallelism, data locality and resources across several parallel architectures.
- Implementing automatic data block prefetching techniques in dataflow programming models. ARM and BSC-UPC have collaborated on porting the OmpSs runtime to the ARM Cortex architectures and on implementing data prefetching based on the data directionality clauses provided by the programmer and on information gathered at runtime. The mobility associated with this collaboration was supported with cluster funding (Implement automatic data block prefetch in OmpSs runtime for the ARMs Cortex-A9). This work was an initial step towards the Mont-Blanc EU project.
- Building a new thermal, power and performance simulation and modeling environment. This environment aims to cover both the microscopic (high-performance microarchitecture) and macroscopic (processing racks and complete networks of processing nodes) levels during the execution of typical high-performance computing applications. It thus enables the exploration of new programming models for supercomputers, low-power architectures, compiler tool chains, execution environments, operating systems and applications. Three research groups were involved: Barcelona Supercomputing Center (Spain), EPFL (Switzerland) and Complutense University of Madrid (Spain). The mobility associated with this activity was supported with cluster funding (Performance, Power and Thermal Simulation Frameworks for High-Performance Computing Systems).
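Several of the activities above (runtime management of task dependencies, hardware dependency resolution, and prefetching driven by data directionality clauses such as OmpSs's `in`, `out` and `inout`) rely on deriving a task graph from the data each task declares it reads and writes. The following minimal sketch illustrates that idea under stated assumptions: tasks are submitted in program order and identified by name, and data blocks are identified by strings. It is written in Python for readability; the names `build_task_graph` and `ready_tasks` are hypothetical and this is not the OmpSs runtime's implementation. A task that reads a block must follow the block's last writer; a task that writes a block must also follow the previous writer and all readers since that write.

```python
"""Hypothetical sketch: deriving task dependences from directionality clauses.

Each task is declared in program order with the data blocks it reads (in)
and writes (out); an inout block appears in both lists. Names and data
layout are illustrative, not the OmpSs runtime's API.
"""
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Task = Tuple[str, Sequence[str], Sequence[str]]  # (name, inputs, outputs)


def build_task_graph(tasks: Sequence[Task]) -> Set[Tuple[str, str]]:
    """Return dependence edges as (predecessor, successor) pairs."""
    last_writer: Dict[str, str] = {}
    readers: Dict[str, List[str]] = defaultdict(list)
    edges: Set[Tuple[str, str]] = set()
    for name, inputs, outputs in tasks:
        for block in inputs:
            if block in last_writer:  # read-after-write (true dependence)
                edges.add((last_writer[block], name))
        for block in outputs:
            if block in last_writer:  # write-after-write (output dependence)
                edges.add((last_writer[block], name))
            for reader in readers[block]:  # write-after-read (anti-dependence)
                if reader != name:
                    edges.add((reader, name))
        for block in inputs:
            readers[block].append(name)
        for block in outputs:
            # This task produces a new version of the block, with no readers yet.
            last_writer[block] = name
            readers[block] = []
    return edges


def ready_tasks(tasks: Sequence[Task], edges: Set[Tuple[str, str]]) -> List[str]:
    """Tasks without predecessors can be scheduled (and their inputs prefetched) at once."""
    has_predecessor = {succ for _, succ in edges}
    return [name for name, _, _ in tasks if name not in has_predecessor]


if __name__ == "__main__":
    example: List[Task] = [
        ("T1", [], ["a"]),          # out(a)
        ("T2", [], ["b"]),          # out(b)
        ("T3", ["a", "b"], ["c"]),  # in(a, b) out(c)
        ("T4", ["a"], ["a"]),       # inout(a)
    ]
    graph = build_task_graph(example)
    print(sorted(graph))  # [('T1', 'T3'), ('T1', 'T4'), ('T2', 'T3'), ('T3', 'T4')]
    print(ready_tasks(example, graph))  # ['T1', 'T2']
```

In the example, T1 and T2 are independent and ready immediately, while T4 (an `inout` on `a`) must wait for T3 to finish reading the old value of `a`. A runtime, or a hardware dependency-resolution unit, could use the same information to start prefetching the input blocks of a task as soon as it becomes ready.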
4. Other activities
- Organization of workshops on programming models and runtime support for multi-core systems:
- MULTIPROG-2008 in Goteborg (January 2008)
- BMW'08 in Barcelona (June 2008)
- MULTIPROG-2009 in Paphos (January 2009)
- MULTIPROG-2010 in Pisa (January 2010)
- BMW'10 in Barcelona (October 2010)
- MULTIPROG-2011 in Heraklion (January 2011)
- BMW'11 in Barcelona (November 2011)
- MULTIPROG-2012 in Paris (January 2012)
5. Information about cluster meetings
- Information about the kick-off meeting in Goteborg. January 2008.
- Information about the 2nd meeting in Barcelona. June 2008.
- Information about the 3rd meeting in Paris. November 2008.
- Information about the 4th meeting in Paphos. January 2009.
- Information about the 5th meeting in Munich. June 2009.
- Information about the 6th meeting in Wroclaw. October 2009.
- "Systematic Approaches for the Analysis/Optimization of Applications" session in Chamonix (joint session with the Applications Taskforce). April 2010.
- Inter-cluster session "Dusk of general-purpose many-cores, dawn of heterogeneous multi-cores: What are the design and programming issues?" in Edinburgh. May 2010.
- Information about the 7th meeting in Barcelona. October 2010.
- "Languages and Tools for Heterogeneous and GPU-based multicores" session in Barcelona (joint session with the compilation and binary translation and virtualization clusters, and the applications task force). November 2011.
Coordinating partner: BSC
Programming models and operating systems
