

















## Locality/Caching Example: Matrix Mult

## Matrix multiplication

- Matrix = two-dimensional array
- Multiply n-by-n matrices A and B
- Store product in matrix C

## Performance depends upon

- Effective use of caching (as implemented by the system)
- Good locality (as implemented by you)
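As a rough sketch of what "good locality" can mean here, the two functions below compute the same product C = A × B with different loop orders. The function names `matmul_ijk` and `matmul_ikj`, and the choice to store each n-by-n matrix as a flat row-major array of `double`, are assumptions made for illustration, not taken from the slides. In the i-j-k version the inner loop reads B with stride n (down a column); in the i-k-j version the inner loop reads B and writes C with stride 1 (along a row), which tends to use each cache line more fully.

```c
#include <assert.h>
#include <stddef.h>

/* C = A * B for n-by-n row-major matrices, i-j-k loop order.
   The inner loop reads b[k*n + j] with stride n (walking down a
   column of B), so consecutive accesses land in different rows. */
void matmul_ijk(size_t n, const double *a, const double *b, double *c)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Same product, i-k-j loop order.
   The inner loop reads b[k*n + j] and updates c[i*n + j] with
   stride 1 (walking along rows), which gives better spatial locality. */
void matmul_ikj(size_t n, const double *a, const double *b, double *c)
{
    assert(n > 0);
    for (size_t idx = 0; idx < n * n; idx++)
        c[idx] = 0.0;               /* C accumulates, so clear it first */
    for (size_t i = 0; i < n; i++) {
        for (size_t k = 0; k < n; k++) {
            double r = a[i * n + k]; /* reused across the whole inner loop */
            for (size_t j = 0; j < n; j++)
                c[i * n + j] += r * b[k * n + j];
        }
    }
}
```

Both functions produce the same result; any speed difference depends on the matrix size, the compiler, and the cache sizes of the machine, so it is worth timing on the target system rather than assuming a particular ratio.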























## Storage Hierarchy & Caching Issues

Issue: Block size?

- Slow data transfer between levels k and k+1
  - => use large block sizes at level k (do data transfer less often)
- Fast data transfer between levels k and k+1
  - => use small block sizes at level k (reduce risk of cache miss)
- Lower in pyramid => slower data transfer => larger block sizes

| Device              | Block Size       |
|---------------------|------------------|
| Register            | 8 bytes          |
| L1/L2/L3 cache line | 64 bytes         |
| Main memory page    | 4KB (4096 bytes) |
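To make the block sizes above concrete, the following sketch does the arithmetic for 8-byte array elements. The constants repeat the sizes listed for registers, cache lines, and pages; the helper names `elems_per_block` and `blocks_touched` are hypothetical, and the second helper assumes the array starts on a block boundary.

```c
#include <assert.h>
#include <stddef.h>

/* Block sizes as listed in the table (bytes). */
enum { REGISTER_BYTES = 8, CACHE_LINE_BYTES = 64, PAGE_BYTES = 4096 };

/* How many elements of elem_bytes fit in one block of block_bytes. */
size_t elems_per_block(size_t block_bytes, size_t elem_bytes)
{
    assert(elem_bytes > 0);
    return block_bytes / elem_bytes;
}

/* How many blocks are transferred when n contiguous elements are read,
   assuming (for simplicity) the first element is block-aligned. */
size_t blocks_touched(size_t n, size_t elem_bytes, size_t block_bytes)
{
    assert(block_bytes > 0);
    return (n * elem_bytes + block_bytes - 1) / block_bytes; /* round up */
}
```

Under these assumptions, one cache line holds 8 such elements and one page holds 512, so a sequential scan of 1000 elements touches 125 cache lines but only 2 pages: the slower the level, the fewer (and larger) the transfers.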

Issue: Who manages the cache?

| Device | Managed by |
|--------|------------|
| Registers (cache of L1/L2/L3 cache and main memory) | Compiler, using complex code-analysis techniques; assembly language programmer |
| L1/L2/L3 cache (cache of main memory) | Hardware, using simple algorithms |
| Main memory (cache of local secondary storage) | Hardware and OS, using virtual memory concept with complex algorithms (since accessing disk is expensive) |
| Local secondary storage (cache of remote secondary storage) | End user, by deciding which files to download |





















































## Additional Benefits of Virtual Memory

Dynamic memory allocation

- User processes can request additional memory from the heap
  - E.g., using malloc() to allocate, and free() to deallocate
- OS allocates *contiguous* virtual memory pages...
  - ... and scatters them anywhere in physical memory
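A minimal sketch of requesting heap memory with `malloc()` and releasing it with `free()` might look like the following. The helper name `make_sequence` is hypothetical. From the program's point of view the returned array is contiguous in virtual memory (`&p[i + 1]` is always `&p[i] + 1`); where the underlying pages sit in physical memory is up to the OS and is not visible to the program.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocate an array of n ints on the heap and
   fill it with 0, 1, ..., n-1. Returns NULL if the allocation fails.
   The caller owns the memory and must release it with free(). */
int *make_sequence(size_t n)
{
    assert(n > 0);
    int *p = malloc(n * sizeof *p);  /* request n ints from the heap */
    if (p == NULL)
        return NULL;                 /* allocation can fail; report it */
    for (size_t i = 0; i < n; i++)
        p[i] = (int)i;               /* contiguous virtual addresses */
    return p;
}
```

A caller would check the result against `NULL`, use the array, and then call `free()` on the pointer exactly once.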





