Grant Wallace
CS 597d
Observation 7:
The Superblock: An Effective Technique for VLIW and Superscalar Compilation
[1]
It is interesting that just forming the superblocks,
without doing any optimizations on them, more than doubles the performance
(fig. 7). This gain must be coming from improved branch prediction, although
the authors never describe how branch prediction is implemented. The
gain isn’t coming from improved I-cache locality because ideal caches are
assumed. If the processor used a static branch predictor such as FTBNT
(forward taken, backward not taken) or BTFNT (backward taken, forward not
taken), it seems the performance should actually decrease compared to the
non-superblock code. This is because there is no increase in branch
prediction accuracy (whether the exit from a superblock is forward or
backward should be somewhat random), and there are more branches. Of course,
if the static prediction is always not-taken, then we should see performance
gains, since the side exits of a superblock are rarely taken.
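To make the argument above concrete, here is a minimal sketch of a toy model, not anything taken from the paper. It assumes a hypothetical superblock with four side exits, each taken 10% of the time, with half of the exits pointing forward and half backward. The names Branch, predicts_taken and mispredict_rate, and all of the probabilities, are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Branch:
    backward: bool   # True if the branch target precedes the branch in the layout
    p_taken: float   # assumed probability that the branch is taken at run time


def predicts_taken(policy: str, br: Branch) -> bool:
    """Return the static prediction for one branch under the given policy."""
    if policy == "not_taken":
        return False
    if policy == "BTFNT":      # backward taken, forward not taken
        return br.backward
    if policy == "FTBNT":      # forward taken, backward not taken
        return not br.backward
    raise ValueError(f"unknown policy: {policy}")


def mispredict_rate(policy: str, branches: list[Branch]) -> float:
    """Average misprediction probability, weighting every branch equally."""
    total = 0.0
    for br in branches:
        if predicts_taken(policy, br):
            total += 1.0 - br.p_taken   # wrong whenever the branch falls through
        else:
            total += br.p_taken         # wrong whenever the branch is taken
    return total / len(branches)


# Hypothetical superblock: side exits are rarely taken, direction is arbitrary.
exits = [
    Branch(backward=False, p_taken=0.1),
    Branch(backward=True, p_taken=0.1),
    Branch(backward=False, p_taken=0.1),
    Branch(backward=True, p_taken=0.1),
]

for policy in ("not_taken", "BTFNT", "FTBNT"):
    print(policy, round(mispredict_rate(policy, exits), 3))
```

Under these assumed numbers, the always-not-taken policy mispredicts only when a side exit is taken, while the direction-based policies are wrong on most of the exits whose direction happens to trigger a taken prediction. The model ignores loop back-edges and the relative execution frequency of each branch, so it only illustrates the direction of the effect, not its size.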
All the results are reported assuming there are
no restrictions on which instructions can be issued together (i.e., uniform
functional units). I would be interested in seeing results for a real
machine description, and in how much the performance degrades.
Strengths:
The results section (sec. 4) seems quite thorough.
The authors do a careful job of normalizing the results to well-known commercial
compilers and traditional optimizations. Additionally, the performance
issues involved with I-cache and D-cache limitations are addressed, and
compile-time cost and code size are also examined.
[1] Wen-mei W. Hwu, ... "The Superblock: An Effective Technique for
VLIW and Superscalar Compilation", The Journal of Supercomputing, Kluwer
Academic Publishers, 1993, pp. 229-248.