Adaptive Instruction Scheduling for Itanium


Benoit Hudson (bhudson@cs.cmu.edu)
Stephen Somogyi (ssomogyi@ece.cmu.edu)
Shobha Venkataraman (shobha@cs.cmu.edu)


Web Location: http://www.ece.cmu.edu/~ssomogyi/15745

Project Description

In several places, optimizing compilers want to solve NP-hard optimization problems. Typically, this is done by having the compiler writer decide on a heuristic approach that, in practice, gives good results (outputs fast code), while still compiling the code in a reasonable amount of time.

Our goal is to allow trading off time and code quality at compile time. That is, the compiler itself may decide to spend longer -- and get better code -- on one part of the program than on another. This allows two things: the user can decide to compile quickly during the development phase, and spend a long time to get very good code for deployment; and the compiler can use profile feedback to identify hot sections and optimize them well, while saving time on less critical sections of code.

Two approaches that handle this notion well are anytime algorithms from the AI community, and PTAS from the algorithms / operations research community. In an anytime algorithm, the algorithm quickly finds a feasible solution, then conducts a search to find a better solution. When it finds a better solution, it stores the solution, then resumes searching. At any time (hence the name), the algorithm can be stopped and asked for its best solution so far.

A PTAS (polynomial-time approximation scheme) provides the user with a knob to turn: the algorithm will give an answer provably within 1+epsilon of the optimum value, at a time cost of, O(n^{c+1/epsilon}) where the 'c' depends on the problem. That is, we might get a 2-approx -- the solution would provably be within a factor of two of optimal -- in linear time if 'c' were 1; and on the same problem we could get a 1.1-approximation in O(n^10) time. Many, but not all, NP-hard optimization problems admit a PTAS.

The project will investigate using a PTAS to perform instruction scheduling on the IA-64 Itanium processor. If there is no PTAS for the particular job-shop scheduling problem, we will instead use an anytime algorithm. We will use profile feedback methods to direct the compiler to spend more time optimizing hot segments, and less time on cold segments. Finally, we will compare the code quality and compile time for some benchmark programs from our implementation and a standard implementation (hopefully one already implemented).

Extensions include identifying other places in the compiler where we can formulate the compiler pass as a problem that admits a PTAS or some similar easy-to-turn time/performance knob (register allocation comes to mind). In the unlikely case that we get good results for instruction scheduling and have some extra time, we might even do the same work for those other areas.

A not unlikely result of this work is that the current heuristics are better for generating good code quickly than our technique will be; we can still claim our technique is useful if we can, by default, use the normal heuristics, and if the user wants to get very fast code, they can use our technique instead at a substantial increase in total compile time.

In the worst case, we can say that current general job-shop schedulers are insufficient to beat the quality tradeoff of traditional instruction scheduling without spending completely excessively too much time, meaning that we need either better general job-shop solvers, or a technique that
includes domain-based heuristics and hacks.


Schedule

Week
Activities
Status
1
(a) find the Itanium research compiler and documentation
(b) formulate scheduling on the Itanium as a particular kind of job-shop problem
done (Oct 31)
50%: it doesn't quite fit into standard frameworks, so we need to formulate it on our own.
2
(c) find a PTAS or formulate a good anytime search algorithm for that problem
(d) design an interface for describing the job-shop problem (one that deals only in terms of jobs, machines, and prerequisites)
20%: no PTAS exists; we must find our own algorithm.
50%: first draft done.
3
(e) write code in the compiler architecture to transfer from their IR to a job-shop problem
(f) write code to implement the job-shop algorithm, conforming to the interface we designed
4
(g) write a method to read a file of function names and epsilon values so the compiler will use those values to determine how much to optimize
(h) run the compiler on some toy test cases
(i) implement a set of scripts to quickly run and check test cases
5
(j) implement gathering profile information
(k) implement automatically using profile information to drive the choice of epsilon
6
(l) run experiments: SPEC or some other relatively standard benchmark.
(m) Compare with the Itanium compiler's default instruction scheduler

Milestone

See week 4: the milestone is being able to say "we can produce legal, optimized code on our toy cases." No guarantee that the code is anything but legal and scheduled by our scheduler.


Fallbacks


Literature


Resources