The PACE Project is developing an architecture-aware compiler environment. Rice University is the lead site, with active participants at ET International, Ohio State, Stanford, and Texas Instruments.

 
The PACE Runtime System

The purpose of the PACE Runtime System (RTS) is to measure the performance of program executions with three aims:

  • to help identify important program regions worthy of intensive optimization,
  • to provide data to support feedback-directed optimization, and
  • to provide a harness that supports measurement-driven online parameter selection. 

With each generation, microprocessor-based computer systems have become increasingly sophisticated with the aim of delivering higher performance.  With this sophistication comes complexity. Today, nodes in microprocessor-based systems are typically equipped with one or more multicore microprocessors.  Individual processor cores support additional levels of parallelism, typically including pipelined execution of multiple instructions, short vector operations, and simultaneous multithreading. In addition, microprocessors rely on deep multi-level memory hierarchies to reduce latency and improve data bandwidth to processor cores.

As the complexity of microprocessor-based systems has increased, it has become harder for applications to achieve a significant fraction of peak performance.  Attaining high performance requires careful management of resources at all levels. To date, the rapidly increasing complexity of microprocessor-based systems has outstripped the capability of compilers to map applications onto them effectively.  In addition, the memory subsystems in microprocessor-based systems are ill suited to data-intensive computations that voraciously consume data without significant spatial or temporal locality.  Achieving high performance with data-intensive applications on microprocessor-based systems is particularly difficult and often requires careful tailoring of an application to reduce the impedance mismatch between the application's needs and the target platform's capabilities.

To help compilers improve their ability to map applications onto modern microprocessor-based systems, the PACE RTS will collect detailed performance measurements of program executions to determine both where optimization is needed and what problems are the most important targets for optimization. With detailed insight into an application's performance shortcomings, the PACE compiler will be better equipped to select and employ optimizations that address them.

The RTS will include a harness to support online feedback-directed optimization. During compilation, the Platform-Aware Optimizer (PAO) may determine that certain parameters would benefit from runtime tuning. For instance, the best parameter settings for a tiled loop nest may depend upon the cache footprints of other threads running concurrently. To leverage RTS support for online tuning, the PAO will present the RTS with a closure that contains four items: (1) a tuple of initial parameter values (e.g., extents for each dimension of a data tile), (2) a specification of the bounds of the parameter space, (3) a generator function that will explore the parameter space and suggest new parameter tuples, and (4) a parameterized version of the user’s function that will be invoked with the current tuple of parameters. During execution, the RTS will use the provided closure to adjust parameter values and select the configuration that delivers the best performance. Information about the results of online tuning will be provided to PACE's machine learning tools for future use.
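
The closure described above can be illustrated with a small Python sketch. This is not the PACE RTS interface, which this page does not specify. The names TuningClosure, power_of_two_generator, timed_run, and tune are hypothetical, and the search strategy (trying power-of-two tile extents and keeping the fastest) is an assumption chosen only to make the four parts of the closure concrete.

```python
import itertools
import time
from dataclasses import dataclass
from typing import Callable, Iterator, Sequence, Tuple

Params = Tuple[int, ...]
Bounds = Sequence[Tuple[int, int]]  # inclusive (low, high) per parameter


@dataclass
class TuningClosure:
    """Hypothetical stand-in for the closure the PAO hands to the RTS."""
    initial: Params                                        # initial parameter tuple
    bounds: Bounds                                         # bounds of the parameter space
    generator: Callable[[Params, Bounds], Iterator[Params]]  # suggests new tuples
    kernel: Callable[..., None]                            # parameterized user function


def power_of_two_generator(initial: Params, bounds: Bounds) -> Iterator[Params]:
    """Assumed exploration strategy: every power-of-two combination in bounds."""
    axes = []
    for low, high in bounds:
        values, v = [], 1
        while v <= high:
            if v >= low:
                values.append(v)
            v *= 2
        axes.append(values)
    for candidate in itertools.product(*axes):
        if candidate != initial:
            yield candidate


def timed_run(kernel: Callable[..., None], params: Params) -> float:
    """Measure one invocation of the kernel with wall-clock time."""
    start = time.perf_counter()
    kernel(*params)
    return time.perf_counter() - start


def tune(closure: TuningClosure, measure=timed_run, max_trials: int = 16):
    """Run the kernel with suggested tuples; return the cheapest one seen."""
    best = closure.initial
    best_cost = measure(closure.kernel, best)
    trials = 1
    for candidate in closure.generator(closure.initial, closure.bounds):
        if trials >= max_trials:
            break
        # Skip suggestions that fall outside the declared parameter space.
        if not all(lo <= v <= hi for v, (lo, hi) in zip(candidate, closure.bounds)):
            continue
        cost = measure(closure.kernel, candidate)
        trials += 1
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost
```

A caller would build a TuningClosure around a tiled kernel that accepts its tile extents as arguments and pass it to tune. A real harness would presumably amortize measurement across ordinary invocations of the function rather than run dedicated trials, and could let the generator react to earlier measurements; both are left out here to keep the sketch short.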