Grant Wallace
CS 597d
 

Observation 10:

Effective Compiler Support for Predicated Execution Using the Hyperblock [1]

 
Strengths:
    Allows for selective combination of basic blocks into a single if-converted (predicated) block structure (Hyperblock). This can help improve performance by limiting the number of paths of execution which must be issued, and allow for better compiler optimizations (compared to combining all basic blocks into one predicated structure). Basic blocks for inclusion are given priority based on smaller size, greater execution frequency, and fewer hazardous instructions (procedure calls, unresolved memory accesses).
    Allows for speculative code motion using techniques like instruction promotion, instruction merging.
    Reduces branch mispredictions because there are fewer branches (if-converted).
    Makes more complete use of a processors issue width (as compared to superblocks), and so on larger issue processors it outperforms superblocks.
 
Weaknesses:
    Will perform worse than superblock formations on lower issue width processors because many of the instructions which must be issued are predicated false (essentially no-ops).
 
    It’s interesting that hyperblocks with all execution paths (IP) and hyperblocks with selected execution paths (PP) often don’t show significant performance variation on 2-issue processors. Sometimes it’s not until 8-issue that a marked difference is seen. I would have expected IP to perform worse on 2-issue processors as compared to PP, and then to slowly catch up as the processor width is increased.
 
    The node-splitting algorithm concentrates on finding merge points where one path has a large flow and small size, and the other path has a small flow and large size. This is the worst case because the large flow path must wait for the instruction issue of the large size path. So these merge points are targeted for node splitting.
 
    For basic block inclusion, the algorithm looks for bbs which are small, executed frequently, and have few hazardous instructions. Large bbs have more opportunity for optimizations even without inclusion. Hazardous instructions decrease the overall code motion and optimizations possible in that bb.
 
Question:
    Section 3.5 mentions compiler optimizations across multiple paths. What types of optimizations are these? Would instruction merging fall into this category?
 
 
[1] Scott A. Mahlke, ... "Effective Compiler Support for Predicated Execution Using the Hyperblock", Proceedings of the 25th International Symposium on Microarchitecture, December, 1992.