The following observations are mainly for paper[1], with occasional references to paper[2].
for (i=0;i<100;i++) {
if(i%2==1)
{ ... }
else
{ ... }
... }
In the above case, it may not be easy for the compiler to separate the branch condition from the branch for many cycles. The value of the register for i, which can be known well in advance, does not really provide any information about the branch. The branch condition is not related to any other registers than the one holding i. In this case, no matter how we profile the program and generate value-based predictor for the branch using the method from paper[1], the predictor would perform the same bad, while a synthesized dynamic branch predictor based on the POP scheme or PEP scheme[2], or a hardware two-level adaptive branch predictor can do much better. It can also be noticed that all the programs used for evaluation in paper[1] are integer programs. Currently, it seems to be the case that floating-point performance is more critical to the evaluation of processor performance because most multimedia applications use floating point. It is desirable that the method can also work well for floating point programs. However, since floating point registers usually cover much larger value range than integer registers, it will be very hard to find a generic quantizing scheme to fit the register value based approach in paper[1].
Compiler-synthesized predictor can utilized the useful information gained
during compilation and profiling. Hardware predictors are inferior in terms
of these. However, the advantages of hardware predictors are that they
are fast and they perform very well in general. For synthesized predictors,
because they insert prediction instructions into the code, they increase
the number of instructions to be executed. In case that both synthesized
predictor and hardware predictor predict correctly, using synthesized version
is slower, as summarized below.
- S right, H right, S slow;
- S right, H wrong, S fast;
- S wrong, H right, S slow;
- S wrong, H wrong, S slow.
Therefore, it may be worth consideration that we mix synthesized predictor with hardware predictor based on profiling information(if profiling is useful in general). For the portion of code that synthesized predictor achieves higher prediction rate, use synthesized predictor, otherwise use hardware predictor.
[1] Scott. Mahlke and B. Natarajan. Branch Prediction Based on Universal Data Compression Algorithms. Proceedings of the 29th International Symposium on Microarchitecture, 1997
[2] D.~I. August, D.~A. Connors, J.~C. Gyllenhaal and W.~W. Hwu Architectural
Support for Compiler-Synthesized Dynamic Branch Prediction Strategies:
Rationale and Initial Results. The 3rd International Symposium on High-Performance
Computer Architecture, 1997