Compiler Optimization Through Machine Learning
Consider a problem in the PACE context: given a program, a
target platform, and a compiler, predict a good compiler configuration,
i.e., a good sequence of optimizations that yields fast
execution of the program (or other desirable properties, such as low
memory usage). The sequence of optimizations that yields fast execution
(optimal performance, in general) depends on
- the characteristics of the program being compiled,
- the characteristics of the target system, and
- the characteristics of the compiler.
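To make the size of this prediction problem concrete, the following Python sketch frames configuration selection as a search over on/off settings for a handful of optimization flags. Everything here (the flag names, the ProgramFeatures record, the evaluate_configuration stand-in for an actual compile-and-time step, and its toy cost model) is a hypothetical illustration, not part of PACE; the point is only that even a few flags already induce an exponential number of candidate configurations to evaluate.

    # Illustrative sketch only: all names and the cost model are hypothetical.
    import itertools
    import random
    from dataclasses import dataclass

    # A "configuration" here is simply an on/off setting for each optimization flag.
    FLAGS = ["inline", "unroll", "vectorize", "tile", "hoist"]

    @dataclass
    class ProgramFeatures:
        """Static characteristics of the program being compiled (illustrative)."""
        loop_count: int
        avg_loop_depth: float
        memory_bound: bool

    def evaluate_configuration(config: dict, program: ProgramFeatures) -> float:
        """Stand-in for compiling the program with `config` and timing the result.
        In a real setting this would invoke the compiler and run the binary."""
        runtime = 10.0  # toy cost model so the example is runnable
        if config["unroll"] and program.loop_count > 4:
            runtime -= 1.5
        if config["vectorize"] and not program.memory_bound:
            runtime -= 2.0
        if config["inline"]:
            runtime -= 0.5
        return runtime + random.gauss(0, 0.1)  # measurement noise

    program = ProgramFeatures(loop_count=8, avg_loop_depth=2.5, memory_bound=False)

    # Exhaustive search already costs 2**len(FLAGS) evaluations for this tiny
    # flag set; real compilers expose far more parameters, hence the need for
    # learned models rather than brute force.
    best = min(
        (dict(zip(FLAGS, bits))
         for bits in itertools.product([False, True], repeat=len(FLAGS))),
        key=lambda cfg: evaluate_configuration(cfg, program),
    )
    print("best configuration found:", best)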
A human designer uses past experience to perform this
optimization, either by remembering and applying a good configuration of
compiler flag settings used for similar programs encountered before, or by
constructing a good configuration of settings based on trial runs of the
program of interest. Thus the success of the designer depends on his or her
ability to remember past experience, to distill, abstract, and generalize
knowledge from that experience, and to spot patterns in a complex
multidimensional space. This, in itself, is a formidable task.
Furthermore, all of this experience and knowledge may become irrelevant if the
target platform changes, and re-acquiring the relevant knowledge would require
massive effort. The extremely large parameter spaces that compiler optimization
tasks must explore to extract such knowledge make the problem intractable for
the human mind.
Automation is needed to effectively and efficiently characterize the
interactions between programs, platforms, and compilers, and their relationship
to observed performance in a complex system that evolves over time.
Machine Learning aims to develop models of such complex
relationships by learning from available data (past experience or
controlled experiments). The learned models facilitate the discovery of
complex patterns, and the recognition of patterns of known character, in huge,
unorganized, high-dimensional parameter spaces, thereby making the optimization
task tractable and aiding intelligent decision making.
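A minimal sketch of learning from past experience, under purely illustrative assumptions: prior compilations are stored as (program feature vector, best known configuration) pairs, and a new program is assigned the configuration recorded for its nearest neighbour in feature space. The feature values, flag names, and the nearest-neighbour choice are assumptions made for this example, not the PACE design.

    # Illustrative nearest-neighbour sketch; data and features are made up.
    import math

    # (program feature vector, best known flag configuration) pairs from past runs
    EXPERIENCE = [
        ((8, 2.5, 0.1), {"unroll": True,  "vectorize": True,  "inline": True}),
        ((1, 1.0, 0.9), {"unroll": False, "vectorize": False, "inline": True}),
        ((4, 3.0, 0.4), {"unroll": True,  "vectorize": False, "inline": False}),
    ]

    def distance(a, b):
        """Euclidean distance between two program feature vectors."""
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def predict_configuration(features):
        """Return the configuration recorded for the most similar past program."""
        _, config = min(EXPERIENCE, key=lambda pair: distance(pair[0], features))
        return config

    # A new, unseen program: many loops, moderately deep, not memory bound.
    print(predict_configuration((7, 2.0, 0.2)))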
The Machine Learning Group of the PACE effort is concerned
with developing techniques to learn from the complex multidimensional data
spaces that characterize the interactions among programs, target systems, and
compiler optimizations. The result of the learning -- the knowledge captured in
learned models of relevant optimization scenarios -- can then be deployed in a
variety of PACE-related tasks, such as program optimization (for speed, for
memory usage, etc.) or resource characterization.
Moreover, with certain machine learning techniques, the models deployed after
initial satisfactory training can continue to learn in a run-time environment.
This not only enables their use as oracles, but also allows ongoing improvement
of their knowledge based on feedback about optimization success. Machine
learning approaches can thus greatly advance the central objective of the PACE
project: to provide portable performance across a wide range of new and old
systems, and to reduce the time required to produce high-quality compilers for
new computer systems.
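One way such continuous run-time learning could look in practice is an epsilon-greedy scheme over a small set of candidate configurations, in which each measured execution incrementally refines the estimated benefit of the configuration that produced it. The configuration names, speedup numbers, and the bandit formulation below are illustrative assumptions, not the actual PACE mechanism.

    # Illustrative epsilon-greedy sketch of learning from run-time feedback.
    import random

    CONFIGS = ["O2", "O3", "O3 + unrolling", "O3 + vectorization"]
    estimates = {c: 0.0 for c in CONFIGS}   # running estimate of speedup per config
    counts = {c: 0 for c in CONFIGS}
    EPSILON = 0.1                           # fraction of runs spent exploring

    def choose_configuration():
        """Exploit the best current estimate most of the time; occasionally explore."""
        if random.random() < EPSILON:
            return random.choice(CONFIGS)
        return max(CONFIGS, key=lambda c: estimates[c])

    def record_feedback(config, observed_speedup):
        """Incrementally fold a new measurement into the running estimate."""
        counts[config] += 1
        estimates[config] += (observed_speedup - estimates[config]) / counts[config]

    # Simulated deployment loop; the invented means below stand in for real
    # measurements reported back from the runtime system.
    TRUE_MEANS = {"O2": 1.0, "O3": 1.2, "O3 + unrolling": 1.3, "O3 + vectorization": 1.25}
    for _ in range(100):
        cfg = choose_configuration()
        record_feedback(cfg, observed_speedup=random.gauss(TRUE_MEANS[cfg], 0.05))
    print(max(estimates, key=estimates.get), estimates)

In an actual deployment the simulated speedups would be replaced by execution times measured and fed back by the runtime environment, which is what allows the model to keep improving after its initial training.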
Machine Learning
for compiler optimization is a relatively new area with much unexplored
territory. This reflects the complexity of the associated challenges in both
compiler optimization and the applicable Machine Learning algorithms, as well
as the large opportunity to develop new technologies for automating code
optimization. The mission of the PACE Machine Learning Group is to respond to
these challenges and opportunities.