Compiler Synthesized Dynamic Branch Prediction

The idea of having the compiler insert additional instructions to implement dynamic branch prediction is very interesting.

Deriving the branch prediction from observed register values is a good idea. But neither the basic algorithm for branch prediction synthesis nor the practical prediction algorithm is efficient in its memory storage requirement. A machine with 32 registers is common, and the prediction of a given branch may clearly be independent of some of them. Could the synthesis take advantage of this? Likewise, different registers and different branches could be assigned different quantization methods to achieve better performance.
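The suggestion above could be sketched roughly as follows. This is only an illustration, not the paper's algorithm: it assumes a profile of (register values, outcome) samples for one branch, and the quantizer set and the names `QUANTIZERS`, `score`, and `select_predictor` are all hypothetical. For each register and quantizer it measures how well a small majority-vote table predicts the branch, then keeps the best pair, so registers the branch does not depend on are never stored.

```python
from collections import Counter, defaultdict

# Hypothetical quantizers: each maps a register value to a small bucket id.
QUANTIZERS = {
    "sign": lambda v: 0 if v < 0 else (1 if v == 0 else 2),
    "low2bits": lambda v: v & 3,
    "is_zero": lambda v: int(v == 0),
}

def score(samples, reg, quantize):
    """Profile accuracy of a majority-vote table indexed by quantize(regs[reg])."""
    buckets = defaultdict(Counter)
    for regs, taken in samples:
        buckets[quantize(regs[reg])][taken] += 1
    # Each bucket predicts its majority outcome, so it gets that many samples right.
    correct = sum(max(counts.values()) for counts in buckets.values())
    return correct / len(samples)

def select_predictor(samples, num_regs):
    """Return (accuracy, reg, quantizer_name) for the best pair on this profile."""
    best = None
    for reg in range(num_regs):
        for name, quantize in QUANTIZERS.items():
            candidate = (score(samples, reg, quantize), reg, name)
            if best is None or candidate[0] > best[0]:
                best = candidate
    return best

# Toy profile (made up): the branch is taken exactly when register 2 is zero.
samples = [([i, 7, i % 3], i % 3 == 0) for i in range(12)]
accuracy, reg, quantizer = select_predictor(samples, num_regs=3)
```

On this toy profile the search settles on register 2 and ignores the other two. A real selection would also have to weigh table size against accuracy, which this sketch does not do.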

The prediction distance and the limit on the complexity of the compiler-synthesized predictor are determined by the properties of the target architecture. How should suitable values for them be chosen in different circumstances (what is the trade-off)?

Could compiler synthesis do something other than examine register values? Could it combine some history information? For example, SAg: build a file of per-set branch history registers, and let the compiler group branches into different sets. The BPT can reside in a fast cache. It needs 2 cycles to get the direction prediction, but such a fetch can generally start early, so it seems there should not be too much trouble. On a wide-issue machine, there should be enough bubbles to absorb them.
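For reference, the SAg organization mentioned above can be sketched as a small simulator. This is a minimal sketch under stated assumptions, not a description of any real implementation: the class name `SAgPredictor`, the 2-bit saturating counters, and the table sizes are illustrative choices, and the set id is assumed to be assigned to each branch by the compiler.

```python
class SAgPredictor:
    """Per-set history registers indexing one shared table of 2-bit counters."""

    def __init__(self, num_sets, history_bits):
        self.mask = (1 << history_bits) - 1
        self.history = [0] * num_sets              # one history register per set
        self.pattern = [1] * (1 << history_bits)   # shared table, weakly not-taken

    def predict(self, set_id):
        # Counter values 2 and 3 mean "predict taken".
        return self.pattern[self.history[set_id]] >= 2

    def update(self, set_id, taken):
        h = self.history[set_id]
        c = self.pattern[h]
        # Saturating increment or decrement of the selected counter.
        self.pattern[h] = min(3, c + 1) if taken else max(0, c - 1)
        # Shift the outcome into this set's history register.
        self.history[set_id] = ((h << 1) | int(taken)) & self.mask

# Hypothetical usage: a branch placed in set 0 alternates taken / not taken.
predictor = SAgPredictor(num_sets=4, history_bits=2)
for i in range(20):                  # warm-up
    predictor.update(0, i % 2 == 0)

hits = 0
for i in range(20, 30):
    outcome = (i % 2 == 0)
    hits += predictor.predict(0) == outcome
    predictor.update(0, outcome)
```

After warm-up, the alternating branch is predicted from its set's history alone. Branches that the compiler places in the same set share one history register, so the grouping decides how much they interfere with each other.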

Combining different predictors will consume either more hardware or more static/dynamic code size.

Question:

What is an acceptable execution time for such compiler synthesis in practice?