It seems that much of the speedup from hyperblocks comes from overlapping
off-main-trace instructions with main-trace instructions, filling cycles that
would otherwise be idle. With this in mind, we may be able to speed up superblock
code by pulling instructions from off-superblock paths into the superblock's idle
slots. We can import instructions that do not penalize other paths first, and
only then resort to instructions that do penalize other paths, accepting each one
when a cost estimate says its gain outweighs its penalty. This may significantly
improve the performance of superblock code and help non-predicated architectures,
which cannot form hyperblocks because hyperblocks rely on predicated execution.
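A minimal sketch of the two-tier import heuristic, under several assumptions that
are ours and not part of the idea above: every candidate fills exactly one idle
issue slot, the per-candidate `benefit` (frequency-weighted cycles saved on the
candidate's home path) and `penalty` (cycles added to other paths) are already
estimated and independent of one another, and a simple greedy pass is good
enough. The names `Candidate` and `select_imports` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical off-superblock instruction that could be hoisted.

    benefit: assumed estimate of cycles saved on the instruction's home path,
             already weighted by that path's execution frequency.
    penalty: assumed estimate of cycles added to other paths if it is hoisted
             (0.0 means hoisting costs nothing elsewhere).
    """
    name: str
    benefit: float
    penalty: float


def select_imports(candidates, idle_slots):
    """Greedily choose instructions to pull into the superblock's idle slots.

    Assumes each candidate occupies exactly one idle slot.
    """
    # Tier 1: free imports, i.e. candidates that do not penalize other paths,
    # best benefit first.
    free = [c for c in candidates if c.penalty == 0 and c.benefit > 0]
    free.sort(key=lambda c: c.benefit, reverse=True)

    # Tier 2: costly imports, kept only if the estimated net gain is positive,
    # best net gain first.
    costly = [c for c in candidates if c.penalty > 0 and c.benefit > c.penalty]
    costly.sort(key=lambda c: c.benefit - c.penalty, reverse=True)

    chosen = []
    for c in free + costly:
        if len(chosen) >= idle_slots:
            break
        chosen.append(c)
    return chosen


if __name__ == "__main__":
    cands = [
        Candidate("add r4", benefit=3.0, penalty=0.0),
        Candidate("ld r5", benefit=5.0, penalty=4.0),
        Candidate("mul r6", benefit=2.0, penalty=0.0),
        Candidate("sub r7", benefit=1.0, penalty=2.0),
    ]
    # Both free candidates go first; "ld r5" is accepted only because
    # 5.0 - 4.0 > 0, while "sub r7" is rejected (net gain is negative).
    print([c.name for c in select_imports(cands, idle_slots=3)])
```

Ordering every zero-penalty candidate ahead of every penalized one mirrors the
"free first, then costly" preference; a real pass would also need to check that
hoisting is legal (no exceptions or side effects on paths that did not execute
the instruction) and re-estimate costs as slots fill up.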