Observations for "Imporiving the Accuracy of Dynamic Branch Predication Using Branch Correlation", Lecture 2

The scheme described in reference [1] can be classified as a GAp scheme based on the classification system described in reference [2]. The motivation behind the scheme is to allow the prediction of a branch to be based on the outcome of other, possibly related, branches.

After considering the results in the paper I was surprised to see that increasing the number of branches in the history almost always increased prediction accuracy. I would think that the unrelated branches would pollute the branch predictors for a given branch, if the given branch was independent of some of the branches in the history window. After more careful consideration, it is possible, after a large number of executions of a sequence of branches, say in a loop, the set of predictors required to predict the given branch have been "synchronized". Suppose that we consider a history of 4 branches. 3 of these branches are correlated and the 4th one is not. Here, we would need eight predictors to properly segment the branch tree leading to the given branch, but since our branch history register has 4 bits, we use sixteen predictors. This means that we may need more branch executions before we have primed all the predictors when compared against using eight predictors.

Another possible cause of misprediction is branch intervals. If the number of branches between repeated executions of a given branch is relatively prime compared to the length of the branch history register, it will take many iterations before the same predictor is used twice, despite the fact that the N branches previous to the given branch had the same history.

It is very difficult for the hardware to know which branches are actually correlate and which aren't. A long chain of data dependencies may need to be analyzed in hardware to determine possible correlations. It is even more difficult for the hardware to know which prior branches actually influence the behavior of a given branch in such a way as to make it more predictable. Considering the correlation of such a branch may reduce prediction accuracy.

The compiler, on the other hand, has the resources and information required to perform much more sophisticated correlation analysis. A scheme where the compiler provides the correlation information and the hardware makes the predictions follows.

The architecture specifies a new set of registers, called branch history registers. These registers have a predefined width of M. For each branch history register, the hardware provides 2^{I+M} branch predictors. The ISA allows the compiler to specify one or more BHRs per branch instruction, with one BHR as the primary BHR. When a prediction for a branch is needed, the primary BHR is used to index which set of 2^I predictors is used to predict the branch. The lower I bits of the branch address can be used to decide which of the predictors within the set of 2^I predictors should be used for the final prediction. Once the outcome of a branch is known, non-speculatively, a set of predictors is updated based on the outcome of the branch and then a 1(taken) or 0(not-taken) is shifted into all the BHRs the compiler specified for that branch. The predictors that are updated are chosen in a similar way to the way in which a predictor is chosen for a prediction. The only difference is that all the BHRs are used to select predictors yielding a set of predictors to be updated instead of a single one.

In the scheme described above the compiler can tell the hardware which branches are correlated by assigning them the same BHR. Of course, this is only effective if the correlated branches are less than or equal to the number of bits in the BHR. Note that more bits does not necessarily make for a better predictor, but more branches than bits in the BHR does not cause incorrect behavior. The use of multiple BHRs allows the compiler to have one branch affect the results of many other branches. Care must be taken to update the BHRs properly for predicated unconditional branches. A 0 must be shifted into the BHR when the predicate is false.

References:

[1] S. T. Pan, K. So, and J.T. Rahme. Improving the Accuracy of Dynamic Branch Prediction Using Branch COrrelation. In Proceedings of ASPLOS, pages 76-84, Oct 1992.

[2] T. Y. Teh and Y. N. Patt. Alternative Implementations of Two-Level Adaptive Branch Prediction. In Proceedings of the 19th Annual International Symposium on Computer Architecture, pages 124-134, May 1992.