Princeton University
Computer Science Dept.

Computer Science 597d
Advanced Topics in Computer Science: Synergistic Hardware-Compiler Architecture Design

David August

Fall 1999


Reading List

Some papers are available in Tina McCoy's office, room CS410.

For Lecture 2 (9/22/99) - Read All, Observe 1:

"Improving the Accuracy of Dynamic Branch Prediction Using Branch Correlation", Shien-Tai Pan, Kimming So, Joseph T. Rahmeh, Proceedings of the 5th International Conference on Architectural Support for Programming Languages and Operating Systems. Available from me or Tina McCoy.

"Alternative Implementations of Two-Level Adaptive Branch Prediction", Tse-Yu Yeh and Yale N. Patt, Proceedings of the 19th International Symposium on Computer Architecture. Available here.

"Alternative Implementations of Hybrid Branch Predictors", Po-Yung Chang, Eric Hao, Yale N. Patt, Proceedings of the 28th International Symposium on Computer Architecture. Available here.

For Further Exploration:

"Branch Prediction Based on Universal Data Compression Algorithms", Eitan Fedorovsky, Meir Feder, Shlomo Weiss, Proceedings of the 25th International Symposium on Computer Architecture.

"A Language for Describing Predictors and its Application to Automatic Synthesis", Joel Emer and Nikolas Gloy, Proceedings of the 24th International Symposium on Computer Architecture.

For Lecture 3 (9/27/99) - Read All, Observe 1:

"Compiler Synthesized Dynamic Branch Prediction", S. A. Mahlke and B. Natarajan, Proceedings of the 29th International Symposium on Microarchitecture. Available here.

"Architectural Support for Compiler-Synthesized Dynamic Branch Prediction Strategies: Rationale and Initial Results", D. I. August, D. A. Connors, J. C. Gyllenhaal, and W. W. Hwu, The 3rd International Symposium on High-Performance Computer Architecture. Available here.

For Lecture 4 (9/29/99) - Read All, Observe 1:

"Programs follow paths", T. Ball and J. R. Larus, Microsoft Research Technical Report MSR-TR-99-01. Available here.

"Edge Profiling versus Path Profiling: The Showdown", T. Ball, P. Mataga, and M. Sagiv, Proceedings of the 25th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. Available here.

"Efficient path profiling", T. Ball and J. R. Larus, Proceedings of 29th Annual International Symposium on Microarchitecture. Available here.

For Lecture 5 (10/4/99) - Read All, Observe 1:

"Exploiting Hardware Performance Counters with Flow and Context Sensitive Profiling", G. Ammons, T. Ball, and J. Larus, Proceedings of the 1997 ACM SIGPLAN Conference on Programming Language Design and Implementation. Available here.

"Using branch handling hardware to support profile-driven optimization", T. M. Conte, B. A. Patel, and J. S. Cox, Proceedings of the 27th Annual International Symposium on Microarchitecture. Available here.

T. M. Conte, K. N. Menezes, and M. A. Hirsch, "Accurate and practical profile-driven compilation using the profile buffer", Proceedings of the 29th Annual International Symposium on Microarchitecture. Available here.

For Lecture 7 (10/13/99) - Read All, Observe 1:

B. R. Rau, D. W. L. Yen, W. Yen, R. A. Towle, "The Cydra 5 Departmental Supercomputer - Design Philosophies, Decisions, and Trade-offs", IEEE Computer, 22(1):12-35, January 1989.

W. W. Hwu and Y. N. Patt, "Checkpoint Repair for High-Performance Out-of-Order Execution Machines", IEEE Transactions on Computers, C-36:1496-1514, December 1987.

R. M. Tomasulo, "An Efficient Algorithm for Exploiting Multiple Arithmetic Units", IBM Journal of Research and Development, 11:25-33, January 1967.

For Lecture 8 (10/18/99) - Read All, Observe 1:

Mahlke, Chen, Hwu, Rau, Schlansker, "Sentinel Scheduling for VLIW and Superscalar Processors", Proceedings of the Fifth Int'l Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-V), Boston, MA, Oct. 12-15, 1992, pp. 238-247. Available here.

Gyllenhaal, Hwu, and Rau, "Optimization of Machine Descriptions for Efficient Use", Proceedings of the 29th International Symposium on Microarchitecture, December 1996. Available here.

For Lecture 9 (10/20/99) - Read All, Observe 1:

Wen-mei W. Hwu, ... "The Superblock: An Effective Technique for VLIW and Superscalar Compilation", The Journal of Supercomputing, Kluwer Academic Publishers, 1993, pp. 229-248. Available here.

J. Fisher, "Trace Scheduling: A Technique for Global Microcode Compaction", IEEE Transactions on Computers, Vol. C-30, No. 7, July 1981. Now available outside 407.

W. A. Havanki, S. Banerjia and T. M. Conte, "Treegion scheduling for wide-issue processors", Proceedings of the 4th International Symposium on High-Performance Computer Architecture (HPCA-4), Feb. 1998. Available here.

For Lecture 10 (10/25/99) - Read All, Observe 1:

B. R. Rau, M. S. Schlansker, and P. P. Tirumalai. "Code generation schema for modulo scheduled loops." In Proceedings of the 25th Annual International Symposium on Microarchitecture, pages 158-169, December 1992. Available here.

B. R. Rau. "Iterative modulo scheduling: An algorithm for software pipelining loops." In Proceedings of the 27th International Symposium on Microarchitecture, pages 63-74, December 1994. Available here.

D. M. Lavery and W. W. Hwu. "Modulo Scheduling of Loops in Control-Intensive Non-Numeric Programs." In Proceedings of the 29th ACM/IEEE International Symposium on Microarchitecture, pp. 126-137. Available here.

For Lecture 11 (10/27/99) - Read All, Observe 1:

Brian L. Deitrich, Wen-mei W. Hwu, "Speculative Hedge: Regulating Compile-Time Speculation Against Profile Variations", Proceedings of the 29th International Symposium on Microarchitecture, December 2-4, 1996. Available here.

James Larus, "Whole Program Paths", Proceedings of the SIGPLAN 1999 Conference on Programming Language Design and Implementation (PLDI 99), May 1999. Available here.

For Lecture 13 (11/24/99) - Read All, Observe 1:

Scott A. Mahlke, ... "Effective Compiler Support for Predicated Execution Using the Hyperblock", Proceedings of the 25th International Symposium on Microarchitecture, December, 1992. Available here.

Scott A. Mahlke, ... "A Comparison of Full and Partial Predicated Execution Support for ILP Processors", Proceedings of the 22nd International Symposium on Computer Architecture, June, 1995. Available here.

For Lecture 15 (12/1/99) - Read and Observe:

David I. August, Wen-mei W. Hwu, and Scott A. Mahlke, "The Partial Reverse If-Conversion Framework for Balancing Control Flow and Predication", International Journal of Parallel Programming, Vol. 27, No. 5, October, 1999, pages 348-423. Available here.

For Lecture 16 (12/6/99) - Read and Observe:

David I. August, ... "The Program Decision Logic Approach to Predicated Execution", Proceedings of the 26th International Symposium on Computer Architecture, May, 1999. Available here.

For Lecture 17 (12/8/99) - Read 2 and Observe 1:

D. M. Gallagher, ... "Dynamic Memory Disambiguation Using the Memory Conflict Buffer", Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems, San Jose, California, October, 1994, pp. 183-195. Available here.

David I. August, ... "Integrated Predicated and Speculative Execution in the IMPACT EPIC Architecture", Proceedings of the 25th International Symposium on Computer Architecture, May, 1998. Available here.

"IA-64 Application Developers Architecture Guide", May 1999. Available here. About 500 pages or so....