Treegion Scheduling for Wide Issue Processors

Treegion scheduling is superior to superblock scheduling on its
formation without profiling information and non-linear scope.
However, without profiling information we do not know which basic
blocks are important. This causes "dependence height scheduling"
and "exit count scheduling" do not do well on some cases. Anything
other than the profiling information can not guide the scheduling
very well because the real paths can not be told only according to
the structure of programs. The profiling information should be considered
whenever it is possible.

The speedup of treegion scheduling comes from the non-linear scope. It can
schedule multiple branches at the same time. Intuitively, it should be
better because if enough resources are available, the scheduling is limited
by data dependence. When we consider more than one path and some resources
are not used because of the limitation of data dependence, we can them to
compute another path. This is very helpful especially when the
possibilities of branches taken and not taken are close.

Treegion scheduling break the barrier of side exits. It means that we
can schedule a region with branches as long as no merge points are
included. If we could handle merge points successfully, we can find more
parallelism.

------
Dominator parallelism in treegion scheduling

A primary drawback of tail duplication is the introduction of redundant
operation. Dominator parallelism can eliminate the redundant. It depends
on the performance of dominator parallelism. Consider what we have done
here. First we duplicate the basic blocks. Then we try to find the same
operations among them. The second step may not work perfect. In our case,
we just check before the duplication if some operations in the
blocks duplicated can be moved to the common dominator.