An Overview of the PACE Compiler
The PACE compiler is a source-to-source optimizing compiler
that tailors application code for efficient execution on a specific target
system. It accepts as input a parallel program written in C with MPI and/or
OpenMP parallel extensions. It analyzes and transforms that code, with a focus
on improving locality and parallelism, and produces as output a C program
tailored to the performance parameters of the target system and to
the specific details of its processor cores and memory hierarchies.
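As an illustration of the kind of locality transformation described above, the sketch below shows a loop nest before and after tiling. It is a generic example, not PACE output: the function names and the choice of a matrix transpose are assumptions made for illustration, and the tile size stands in for a value that a compiler could choose from measured characteristics of the target memory hierarchy.

```c
#include <stddef.h>

/* Original loop nest: transpose an n x n row-major matrix a into b. */
void transpose_naive(size_t n, const double *a, double *b)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            b[j * n + i] = a[i * n + j];
}

/* Tiled version: the same computation, visited in tile x tile blocks so
 * that each block of a and b can stay in cache while it is in use.
 * The parameter tile is the target-dependent value (hypothetical here). */
void transpose_tiled(size_t n, size_t tile, const double *a, double *b)
{
    if (tile == 0)
        tile = 1;                      /* guard against a zero step */
    for (size_t ii = 0; ii < n; ii += tile) {
        for (size_t jj = 0; jj < n; jj += tile) {
            size_t imax = (ii + tile < n) ? ii + tile : n;
            size_t jmax = (jj + tile < n) ? jj + tile : n;
            for (size_t i = ii; i < imax; i++)
                for (size_t j = jj; j < jmax; j++)
                    b[j * n + i] = a[i * n + j];
        }
    }
}
```

Both versions compute the same result; only the iteration order differs, which is what makes the tile size a natural tuning parameter.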
The PACE compiler uses information from several sources.
- Characterization: The PACE characterization tools measure performance
characteristics of the combined hardware/software stack
of the target system. The PACE compiler uses those characteristic values both
to drive optimizing transformations and to estimate the performance of alternative
optimization strategies on the target system.
- Machine Learning: The PACE machine learning tools will provide suggestions to the
compiler to guide the optimization process. The specific nature of those
suggestions is a subject of ongoing research.
- Runtime System: The PACE runtime system will provide the compiler with measured
performance data from application executions. This data will include detailed
performance profile information. The runtime system will identify rate-limiting
resources for loop nests and functions that use a significant fraction of execution time.
- Optimization Plan: The PACE compiler will coordinate its internal components, in part,
by using an explicit but malleable optimization plan. The optimization plan is a concrete
representation of the individual transformations that the compiler will apply
to the code, along with any parameters to those transformations.
- Configuration File: The configuration file is provided by the system installer. It
contains critical facts about the target system and its software configuration
that the PACE tools, including the PACE compiler, need to interact with that
software.
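To make the idea of an explicit optimization plan concrete, the following is a minimal sketch of one way to represent a list of transformations with their parameters and to render that list as text. The type names, field names, and text format are hypothetical; this overview does not specify the actual PACE plan format.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_PARAMS 4

/* One named parameter of a transformation (hypothetical layout). */
typedef struct {
    const char *name;                 /* e.g. "tile_size" */
    long value;
} plan_param;

/* One step of the plan: a transformation applied to a code region. */
typedef struct {
    const char *transformation;       /* e.g. "tile" */
    const char *target;               /* loop nest or function name */
    int nparams;
    plan_param params[MAX_PARAMS];
} plan_step;

/* Append formatted text to buf; returns 0 on success, -1 if it does not fit. */
static int append(char *buf, size_t cap, size_t *used, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf + *used, cap - *used, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= cap - *used)
        return -1;
    *used += (size_t)n;
    return 0;
}

/* Render the plan as "name(target,param=value,...);...".
 * Returns the string length, or -1 if buf is too small. */
int plan_to_string(const plan_step *steps, int nsteps, char *buf, size_t cap)
{
    size_t used = 0;
    if (cap == 0)
        return -1;
    buf[0] = '\0';
    for (int i = 0; i < nsteps; i++) {
        if (append(buf, cap, &used, "%s%s(%s", i ? ";" : "",
                   steps[i].transformation, steps[i].target) != 0)
            return -1;
        for (int j = 0; j < steps[i].nparams; j++)
            if (append(buf, cap, &used, ",%s=%ld",
                       steps[i].params[j].name, steps[i].params[j].value) != 0)
                return -1;
        if (append(buf, cap, &used, ")") != 0)
            return -1;
    }
    return (int)used;
}
```

Under these assumptions, a plan with a tiling step and an unrolling step would render as a single line of text that other components can read, modify, or log.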
The C code that the compiler produces as output will
interact with the runtime system. The compiler embeds into the executable the
information that the runtime system needs. For example:
- The compiler creates a string representation of
the optimization plan so that the runtime system can retrieve that information
and associate it with the performance data, a necessary part of creating useful
information for feedback-directed optimization and the machine learning tools.
- The compiler inserts calls to the API for
runtime optimization. These calls register the locations of runtime
optimization parameters and suggested value ranges for them. The API allows the compiler and runtime
system to coordinate this behavior.
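The two examples above can be sketched as follows: a plan string stored in the generated code, and a registration call that tells the runtime where a tunable parameter lives and which values are suggested for it. Every name here (rt_register_param, rt_set_param, rt_plan_string, pace_plan_string, tile_size) is a hypothetical stand-in; the actual PACE runtime API is not described in this overview.

```c
#include <stddef.h>

/* Hypothetical record kept by the runtime for each registered parameter. */
typedef struct {
    const char *name;
    long *location;                   /* address of the tunable variable */
    long lo, hi;                      /* suggested value range, inclusive */
} rt_param;

#define RT_MAX_PARAMS 16
static rt_param rt_table[RT_MAX_PARAMS];
static int rt_count = 0;

/* Plan string embedded in the executable (format is an assumption). */
static const char pace_plan_string[] = "tile(loop_nest_1,tile_size=64)";

/* Let the runtime retrieve the plan to associate it with profile data. */
const char *rt_plan_string(void)
{
    return pace_plan_string;
}

/* Register a tunable parameter; returns its id, or -1 on error. */
int rt_register_param(const char *name, long *location, long lo, long hi)
{
    if (rt_count >= RT_MAX_PARAMS || location == NULL || lo > hi)
        return -1;
    rt_table[rt_count] = (rt_param){ name, location, lo, hi };
    return rt_count++;
}

/* Runtime side: change a parameter, but only within the suggested range. */
int rt_set_param(int id, long value)
{
    if (id < 0 || id >= rt_count)
        return -1;
    if (value < rt_table[id].lo || value > rt_table[id].hi)
        return -1;
    *rt_table[id].location = value;
    return 0;
}

/* Compiler-generated side (illustrative): one tunable and its registration. */
static long tile_size = 64;

int register_tunables(void)
{
    return rt_register_param("tile_size", &tile_size, 16, 256);
}
```

Keeping the range next to the address lets the runtime reject values the compiler did not consider reasonable, which is one possible way for the two components to coordinate.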
By embedding this information directly in the executable
code, PACE provides a simple solution to the storage of information needed for
feedback-driven optimization and for the application of machine learning to the
selection of optimization plans. It avoids the need for a centralized store,
like the central repository in the Rn
system of the mid-1980s. It also avoids the complication of creating a unique name
for each compilation, recording those names in a central repository, and ensuring
that each execution can contact that repository.