Both papers report good preliminary results for the PEP schemes. Table 1 in August's paper made me wonder about the hardware cost of these approaches. In particular, I would like to see how this cost compares with the cost of the rest of a modern superscalar processor. The paper by Mahlke seemed to reflect a desire to reduce this overall cost. I'm sure, however, that even some additional hardware would not greatly detract from the gains of improved prediction. Accordingly, I feel that using more hardware to interpret or enhance the compiler's predictions could be of some benefit.
Another idea I would like to explore starts from the observation that the compiler has no knowledge of what the programmer expects a program to do at run time. The programmer could therefore give the compiler hints about branch behavior. This idea is clearly dangerous, since it gives the programmer the ability to feed the compiler misinformation.
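As a rough illustration of what such hints can look like, GCC and Clang already provide the builtin `__builtin_expect(exp, c)`, which tells the compiler that `exp` is expected to equal `c`. The sketch below is not taken from either paper. The `LIKELY`/`UNLIKELY` macros and the `sum_nonnegative` function are hypothetical names chosen for illustration, and the example assumes a GCC- or Clang-compatible compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical convenience macros around the GCC/Clang builtin
   __builtin_expect(exp, c). The !!(x) normalizes any nonzero value to 1
   so the comparison against the expected value is well defined. */
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Hypothetical example: sum the non-negative entries of an array.
   The programmer hints that negative entries are rare, so the compiler
   may lay out the "skip" path as the cold, not-taken path. */
long sum_nonnegative(const int *values, size_t n)
{
    assert(values != NULL || n == 0);  /* precondition on the input */

    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (UNLIKELY(values[i] < 0)) {
            continue;                  /* hinted as the rare path */
        }
        total += values[i];
    }
    return total;
}
```

Because `__builtin_expect` returns the value of its first argument unchanged, a wrong hint (for example, calling this on data that is mostly negative) still produces the correct sum. The cost of misinformation shows up only as mispredicted branches and poorer code layout, so the danger is to performance rather than to correctness.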