It seems that a good deal of the speedup from hyperblocks is achieved by
overlapping off-main-trace instructions with main-trace instructions to
fill otherwise idle cycles. With this in mind, we may be able to speed up
superblock code by pulling off-superblock code into the superblock. We
can first import instructions that do not penalize other paths, and then
resort to instructions that do penalize other paths, guided by some cost
estimate. This may significantly improve the performance of superblock
code and help non-predicated architectures.
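
The two-tier selection described above could be sketched roughly as
follows. This is only an illustration under stated assumptions, not an
implementation from any existing compiler: the Candidate record, its
benefit and penalty fields (both in estimated cycles), the idle_slots
budget, and the rule "import a penalizing instruction only if its benefit
exceeds its penalty" are all hypothetical stand-ins for whatever cost
estimate is eventually chosen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """An off-superblock instruction considered for import (hypothetical)."""
    name: str
    benefit: float  # assumed estimate: cycles saved on the off-trace path
    penalty: float  # assumed estimate: cycles added to other paths (0 = none)


def select_imports(candidates, idle_slots):
    """Choose instructions to pull into otherwise idle superblock slots."""
    # Tier 1: instructions that do not penalize any other path.
    free = [c for c in candidates if c.penalty == 0]
    free.sort(key=lambda c: -c.benefit)

    # Tier 2: penalizing instructions, kept only when the assumed cost
    # estimate says they pay off, ordered by best net gain first.
    costly = [c for c in candidates if c.penalty > 0 and c.benefit > c.penalty]
    costly.sort(key=lambda c: c.penalty - c.benefit)

    # Fill the idle slots with tier 1 first, then tier 2.
    return (free + costly)[:idle_slots]


if __name__ == "__main__":
    cands = [
        Candidate("load r1", benefit=2.0, penalty=0.0),
        Candidate("add r2", benefit=1.0, penalty=0.0),
        Candidate("store r3", benefit=3.0, penalty=1.0),
        Candidate("mul r4", benefit=1.0, penalty=2.0),
    ]
    print([c.name for c in select_imports(cands, idle_slots=3)])
    # → ['load r1', 'add r2', 'store r3']
```

Here "mul r4" is never imported because its assumed penalty outweighs its
benefit, and "store r3" is taken only after the harmless instructions have
been placed. A real scheduler would also have to respect dependences and
resource constraints, which this sketch ignores.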