Branch Prediction, Instruction-Window Size, and Cache Size: Performance Tradeoffs and Sampling Techniques
Design parameters interact in complex ways in modern processors,
especially because out-of-order issue and decoupling buffers sometimes
allow latencies to be overlapped. Tradeoffs among
instruction-window size, branch-prediction accuracy, and instruction-
and data-cache size can shift as these parameters move through
different regions of the design space. For example, modeling
unrealistically sized caches can understate or overstate the benefits
of better branch prediction or a larger instruction window. Avoiding
such pitfalls requires understanding how all these parameters interact.
Because such methodological mistakes are common, this paper provides a
comprehensive set of SimpleScalar simulation results from SPECint95
programs, showing the interactions among these major structures. In
addition to presenting this database of simulation results, major
mechanisms driving the observed tradeoffs are described. The
paper also considers appropriate simulation techniques when sampling
full-length runs with the SPEC reference inputs.
In particular, the results show that branch mispredictions limit the
benefits of larger instruction windows, that better branch prediction
and better instruction cache behavior have synergistic effects, and
that larger instruction windows and larger data caches trade off and
have overlapping effects. In addition, simulations of only 50 million
instructions in length can yield representative results if these short
sample windows are carefully selected.
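One simple way to select such a sample window, sketched below purely as an illustration (it is not necessarily the selection criterion used in the paper), is to profile a lightweight statistic such as IPC over fixed-size intervals of the full run and then choose the contiguous 50M-instruction window whose mean is closest to the full-run mean. The function and data here are hypothetical.

```python
def pick_representative_window(interval_metric, window_intervals):
    """Return the start index of the contiguous window of
    `window_intervals` intervals whose mean metric (e.g., per-interval
    IPC) is closest to the mean over the full run."""
    n = len(interval_metric)
    full_mean = sum(interval_metric) / n
    best_start, best_err = 0, float("inf")
    window_sum = sum(interval_metric[:window_intervals])
    for start in range(n - window_intervals + 1):
        if start > 0:
            # Slide the window right by one interval.
            window_sum += interval_metric[start + window_intervals - 1]
            window_sum -= interval_metric[start - 1]
        err = abs(window_sum / window_intervals - full_mean)
        if err < best_err:
            best_start, best_err = start, err
    return best_start

# Illustrative per-interval IPC profile with a dip in the middle;
# the chosen 2-interval window avoids the unrepresentative dip.
ipc = [1.0, 1.1, 0.4, 0.5, 1.0, 1.1, 1.0]
start = pick_representative_window(ipc, 2)  # -> 1
```

The same idea extends to richer per-interval signatures (e.g., branch-misprediction or cache-miss rates) by replacing the scalar mean with a distance between vectors of statistics.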
This technical report has been published as:

K. Skadron, P. S. Ahuja, M. Martonosi, and D. W. Clark. "Branch
Prediction, Instruction-Window Size, and Cache Size: Performance
Tradeoffs and Simulation Techniques." IEEE Transactions on Computers,
48(11):1260-81, Nov. 1999.