Grant Wallace
CS 597d
 
Observation 7:
 
The Superblock: An Effective Technique for VLIW and Superscalar Compilation [1]
 
    It is interesting that merely forming superblocks, without applying any optimizations to them, more than doubles performance (fig. 7). Because ideal caches are assumed, the gain cannot come from improved I-cache locality; it must be coming from improved branch prediction, although the authors never say anything about how branch prediction is implemented. If the processor used a static branch predictor such as FTBNT or BTFNT, it seems performance should actually decrease relative to the non-superblock code: prediction accuracy would not improve (whether an exit from a superblock branches forward or backward should be somewhat random), and there are more branches to predict. Of course, if the static prediction is always not-taken, then we should see performance gains, since superblock formation lays out the most likely path as the fall-through path and side exits are therefore rarely taken.
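    To make the static-prediction argument concrete, here is a minimal sketch of the three rules mentioned above, applied to a small, entirely hypothetical trace of superblock side exits. The paper does not describe its predictor, so the `Branch` record, the addresses, and the outcomes are illustrative assumptions; the trace simply encodes the two assumptions from the paragraph: exit directions are mixed, and exits are mostly not taken. FTBNT is read here as "forward taken, backward not taken".

```python
from dataclasses import dataclass

@dataclass
class Branch:
    pc: int       # address of the branch instruction (hypothetical)
    target: int   # address of the taken-path destination (hypothetical)
    taken: bool   # actual outcome recorded in the trace

def predict_btfnt(b: Branch) -> bool:
    # Backward taken, forward not taken: predict taken iff the target is earlier.
    return b.target < b.pc

def predict_ftbnt(b: Branch) -> bool:
    # Forward taken, backward not taken: the opposite direction-based rule.
    return b.target > b.pc

def predict_not_taken(b: Branch) -> bool:
    # Always predict fall-through, i.e. staying on the superblock's main path.
    return False

def accuracy(predict, trace) -> float:
    # Fraction of branches whose predicted direction matches the actual outcome.
    hits = sum(predict(b) == b.taken for b in trace)
    return hits / len(trace)

# Hypothetical side exits: mixed forward/backward targets, mostly not taken.
trace = [
    Branch(pc=100, target=200, taken=False),  # forward exit, fell through
    Branch(pc=104, target=40, taken=False),   # backward exit, fell through
    Branch(pc=108, target=300, taken=False),  # forward exit, fell through
    Branch(pc=112, target=20, taken=False),   # backward exit, fell through
    Branch(pc=116, target=400, taken=True),   # forward exit, actually taken
]

for name, rule in [("btfnt", predict_btfnt), ("ftbnt", predict_ftbnt),
                   ("not-taken", predict_not_taken)]:
    print(name, accuracy(rule, trace))
# prints: btfnt 0.4, ftbnt 0.6, not-taken 0.8 (one per line)
```

    On this made-up trace the direction-based rules hover around chance because exit direction is unrelated to outcome, while always-not-taken is right whenever the exit falls through. Real numbers would of course depend on the benchmarks and code layout.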
 
    All results are reported under the assumption that there are no restrictions on which instructions can be issued together (i.e., uniform functional units). I would be interested in seeing results that use a realistic machine description, and how much performance degrades as a result.
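    A rough way to see why the uniform-unit assumption may be optimistic is to compare issue cycles for the same set of operations with and without per-class unit limits. The sketch below is a simplification that assumes all operations are mutually independent (no dependence or latency constraints); the operation classes, the 4-issue width, and the single memory port are hypothetical choices, not the paper's machine model.

```python
from collections import Counter
from math import ceil

def cycles_uniform(ops, width):
    """Issue cycles when any op may use any of `width` slots (uniform units)."""
    return ceil(len(ops) / width)

def cycles_restricted(ops, width, units):
    """Lower bound on issue cycles when op class c gets only units[c] slots per cycle.

    Every class appearing in `ops` must have an entry in `units`.
    """
    counts = Counter(ops)
    # The busiest class, relative to its unit count, bounds the schedule length.
    per_class = max((ceil(n / units[c]) for c, n in counts.items()), default=0)
    # The total issue width still applies on top of the per-class limits.
    return max(ceil(len(ops) / width), per_class)

# Hypothetical block of 8 mutually independent operations.
ops = ["alu"] * 4 + ["mem"] * 4
# Hypothetical 4-issue machine with four ALUs but a single memory port.
units = {"alu": 4, "mem": 1, "branch": 1}

print(cycles_uniform(ops, 4))            # 2
print(cycles_restricted(ops, 4, units))  # 4
```

    Under these assumptions the memory-heavy block takes twice as many cycles once the memory port is modeled, which is the kind of degradation a real machine description could expose.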
 
Strengths:
    The results section (sec. 4) seems quite thorough. The authors do a careful job of normalizing the results against well-known commercial compilers and traditional optimizations. Additionally, they address the performance issues posed by I-cache and D-cache limitations, and they also examine compile time, cost, and code size.
 
[1] Wen-mei W. Hwu, ... "The Superblock: An Effective Technique for VLIW and Superscalar Compilation", The Journal of Supercomputing, Kluwer Academic Publishers, 1993, pp. 229-248.