The Cydra 5 Departmental Supercomputer

The Cydra 5's Memory Latency Register got me thinking about how this
sort of problem is dealt with in microprocessors today. Are there
similar mechanisms in any of today's CPUs, and if so, how do they work
and how does the compiler make use of them? Memory access times must
surely differ from one processor to another. Take a slow 266 MHz
Pentium II and a 450 MHz Pentium II: their systems have different bus
speeds and (somewhat) faster memory, so memory access times can still
differ between them. How should a compiler optimize for that, and how
have hardware designers chosen to deal with it?

This also brings me to another question, about caches and
optimization. When optimizing for cache performance, which cache
should one target: the largest, the most common, or the smallest? And
how have these decisions been made in practice?
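To make the cache-size question concrete, here is a minimal sketch of loop tiling (blocking), one common way compilers and programmers tune code for a cache. The tile size is the knob that gets chosen for one particular cache size. The names `matmul_naive`, `matmul_tiled` and `tile_for_cache` are hypothetical, and the sizing rule in `tile_for_cache` (keep three `tile x tile` blocks of doubles resident at once) is an assumed rule of thumb, not a documented rule from any real compiler.

```c
#include <stddef.h>

/* Naive i-j-k multiply: c = a * b for n x n row-major matrices. */
void matmul_naive(size_t n, const double *a, const double *b, double *c) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

/* Tiled multiply: same result, but it walks tile x tile sub-blocks so the
   blocks of a, b and c being reused can stay resident in the cache.
   'tile' is a tuning parameter chosen for a particular cache size. */
void matmul_tiled(size_t n, size_t tile, const double *a, const double *b,
                  double *c) {
    for (size_t i = 0; i < n * n; i++)
        c[i] = 0.0;
    for (size_t ii = 0; ii < n; ii += tile)
        for (size_t kk = 0; kk < n; kk += tile)
            for (size_t jj = 0; jj < n; jj += tile) {
                /* Clamp block bounds so n need not be a multiple of tile. */
                size_t i_end = ii + tile < n ? ii + tile : n;
                size_t k_end = kk + tile < n ? kk + tile : n;
                size_t j_end = jj + tile < n ? jj + tile : n;
                for (size_t i = ii; i < i_end; i++)
                    for (size_t k = kk; k < k_end; k++) {
                        double aik = a[i * n + k];
                        for (size_t j = jj; j < j_end; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
            }
}

/* Assumed rule of thumb: largest tile such that three tile x tile blocks
   of doubles (one each from a, b and c) fit in cache_bytes. */
size_t tile_for_cache(size_t cache_bytes) {
    size_t t = 1;
    while (3 * (t + 1) * (t + 1) * sizeof(double) <= cache_bytes)
        t++;
    return t;
}
```

Under this rule a 16 KiB cache gives a tile of 26 and a 256 KiB cache gives 104. Those sizes are illustrative only, and they show the trade-off the question raises: a tile tuned for the larger cache may overflow a smaller one, while a tile tuned for the smallest cache leaves some reuse unexploited in a larger one.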