Resource Characterization in the PACE Project
Resource characterization plays a critical role in the PACE
Project's strategy for building an optimizing compiler that adapts and
tunes itself to new systems.
The PACE Compiler and the PACE Runtime System need access to
measurements of a variety of performance-related characteristics of the target
computing system. The goal of the PACE Resource Characterization subproject is
to produce those measured values.
These values include cache characteristics such as the size of each level of
the cache hierarchy, instruction costs and latencies, and the availability of
vector operations. The overarching design requirement of this subproject is
that we characterize only those architectural characteristics that have an
impact on the code produced by the PACE compiler. Thus, for example, while
it may be intellectually interesting to discover the length of a processor's
instruction pipeline, if the compiler cannot take advantage of that
information, then we do not spend time measuring it. On the other
hand, a characteristic such as the size of the first level of cache is
important because the compiler can use that statistic to guide optimizations
such as loop blocking. Our example
of pipeline length is instructive: knowing the pipeline length might
improve some optimizations (e.g., the pipeline length determines the cost
of a mispredicted branch, and that may be useful for instruction scheduling),
but the PACE compiler is limited by the requirement that it produce C code --
rather than native machine code -- as its output, which leaves such
machine-level decisions to the native C compiler.
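To illustrate how a measured first-level cache size could guide loop blocking, here is a minimal sketch in C. It is not taken from the PACE compiler: the names tile_edge and matmul_blocked are hypothetical, and the rule that three square tiles of doubles should fit in the cache is a common heuristic assumed here only for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: largest tile edge b such that three b-by-b tiles of
   doubles fit in cache_bytes. Always returns at least 1. */
size_t tile_edge(size_t cache_bytes)
{
    size_t b = 1;
    while (3 * (b + 1) * (b + 1) * sizeof(double) <= cache_bytes)
        b++;
    return b;
}

/* Blocked C += A * B for n-by-n row-major matrices, using tile edge b.
   The three outer loops walk over tiles; the three inner loops work
   inside one tile, so the data touched stays within the chosen budget. */
void matmul_blocked(size_t n, size_t b,
                    const double *A, const double *B, double *C)
{
    if (b == 0)
        b = 1;                       /* guard against a zero tile edge */
    for (size_t ii = 0; ii < n; ii += b)
        for (size_t kk = 0; kk < n; kk += b)
            for (size_t jj = 0; jj < n; jj += b)
                for (size_t i = ii; i < ii + b && i < n; i++)
                    for (size_t k = kk; k < kk + b && k < n; k++) {
                        double a = A[i * n + k];
                        for (size_t j = jj; j < jj + b && j < n; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
}
```

A caller would pass the characterized cache size (in bytes) to tile_edge and use the result as the b argument of matmul_blocked. Feeding in the usable capacity rather than the nominal one is the point of measuring it.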
One of the fundamental design goals of the PACE compiler is
that it must be able to adapt to many different architectures, both those
currently in production and those that have not yet been
designed. This means that
resource characterization cannot rely on existing technologies and interfaces.
For example, some commodity microprocessors have programming interfaces that
allow a piece of code to discover many of the characteristics for which we are
looking. The problem with relying
on these interfaces is that they are inherently idiosyncratic, both in the kind
of information offered and in the format of the calls and information
returned. Moreover, the information is
usually based on physical capacity rather than usable limits. For example, we
measured a significant slowdown when trying to use more than five megabytes of
a cache that can hold eight megabytes. The discrepancy arises because the cache
is shared among cores and processes, so although the hardware reports a
particular size, the practical capacity (i.e., the amount we can use before the
code slows down, which is the value the compiler actually needs to know) is a
very different value. Further, future architectures may well have different
interfaces and expose different levels of information. The goals of the
PACE project therefore require a more generic, universal solution.
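The distinction between reported and usable capacity can be made concrete with a short sketch in C. It assumes a set of timing measurements is already available, one per working-set size and sorted by increasing size. The struct sample type, the effective_capacity function, and the tolerance parameter are hypothetical names chosen for illustration, not part of the PACE interface.

```c
#include <stddef.h>

/* One measurement: working-set size in bytes and average time per access. */
struct sample {
    size_t bytes;
    double ns_per_access;
};

/* Return the largest working-set size whose access time stays within
   `tolerance` times the time of the smallest working set (for example,
   tolerance = 1.25 accepts up to a 25% slowdown). Samples must be sorted
   by increasing size. Returns 0 if there are no samples. */
size_t effective_capacity(const struct sample *s, size_t n, double tolerance)
{
    if (n == 0)
        return 0;

    double baseline = s[0].ns_per_access;
    size_t best = s[0].bytes;

    for (size_t i = 1; i < n; i++) {
        if (s[i].ns_per_access > baseline * tolerance)
            break;                   /* first clear slowdown: stop here */
        best = s[i].bytes;
    }
    return best;
}
```

Under this simple threshold rule, a cache that the hardware reports as eight megabytes would be characterized as five megabytes if access times rise sharply beyond that point. A production tool would likely need a more robust knee detector that tolerates measurement noise.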
As a result, our approach to resource characterization is
to use microbenchmarks: small pieces of code designed to expose
architectural behavior. Each microbenchmark
is tightly focused on discovering the characteristics of a specific
architectural feature. The result
is a library of microbenchmark codes, along with a simple interface that allows
the rest of the PACE tools to access the results of resource characterization.
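As an illustration of what such a microbenchmark might look like, the following sketch in C times a pointer chase through a randomly ordered cycle, a widely used technique for exposing memory-hierarchy latency. It is an assumption-laden example rather than one of the PACE microbenchmarks: the names build_cycle and chase are hypothetical, and the standard clock() function is used only because it is portable, not because it is the most precise timer.

```c
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Fill next[0..n-1] with a single random cycle over all n slots
   (Sattolo's algorithm), so that a chase visits every slot before it
   repeats. rand() is adequate for a sketch; for very large n a
   generator with a wider range would be a better choice. */
void build_cycle(size_t *next, size_t n)
{
    if (n == 0)
        return;
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;   /* 0 <= j < i */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
}

/* Follow the cycle for `steps` dependent loads starting at slot 0.
   Returns the average CPU seconds per load and stores the final slot in
   *end, which also keeps the compiler from discarding the loop. */
double chase(const size_t *next, size_t steps, size_t *end)
{
    size_t p = 0;
    clock_t t0 = clock();
    for (size_t i = 0; i < steps; i++)
        p = next[p];
    clock_t t1 = clock();

    *end = p;
    if (steps == 0)
        return 0.0;
    return (double)(t1 - t0) / CLOCKS_PER_SEC / (double)steps;
}
```

Running chase over cycles of increasing size, with enough steps for the timer to resolve, would yield one time-per-access value per working-set size. A jump in that value suggests the working set has outgrown a level of the cache hierarchy.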