# Memory Management

# Goals of this Lecture

- Help you learn about:
  - The memory hierarchy
  - Spatial and temporal locality of reference
  - Caching, at multiple levels
  - Virtual memory
  - ... and thereby ...
  - How the hardware and OS give application programs:
    - The illusion of a large contiguous address space
    - Protection against each other

Virtual memory is one of the most important concepts in systems programming

# **Motivation for Memory Hierarchy**

- Faster storage technologies are more costly
  - Cost more money per byte
  - Have lower storage capacity
  - Require more power and generate more heat
- The gap between processing and memory is widening
  - Processors have been getting faster and faster
  - Main memory speed is not improving as dramatically
- Well-written programs tend to exhibit good locality
  - Across time: repeatedly referencing the same variables
  - Across space: often accessing other variables located nearby

Want the *speed* of fast storage at the *cost* and *capacity* of slow storage. Key idea: memory hierarchy!



## Registers

- Usually reside directly on the processor chip
- Essentially no latency, referenced directly in instructions
- Low capacity (e.g., 32-512 bytes)

## Main memory

- Access takes around 100 clock cycles
- Constant access time for any memory location
- Modest capacity (e.g., 512 MB-4 GB typical)

## Disk

- Around 100,000 times slower than main memory
- Faster when accessing many bytes in a row
- High capacity (e.g., 200-500 GB typical)
- Now starting to see solid-state disks
  - Higher I/O rates, no mechanical limits

# Widening Processor/Memory Gap

- Gap in speed increasing from 1986 to 2000
  - CPU speed improved ~55% per year
  - Main memory speed improved only ~10% per year
- Main memory as major performance bottleneck
  - Many programs stall waiting for reads and writes to finish
- Changes in the memory hierarchy
  - Increasing the number of registers
    - 8 integer registers in the x86 vs. 128 in the Itanium
  - Adding caches between registers and main memory
    - On-chip level-1 cache and on/off-chip level-2 cache
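
As a rough worked example of how quickly the gap widens, assume the two improvement rates compound annually over the 14 years from 1986 to 2000 (the compounding model is an assumption, not something the slide states). The processor-to-memory speed ratio then grows by a factor of about:

$$
\left(\frac{1.55}{1.10}\right)^{14} \approx 1.409^{14} \approx 120
$$

Under that assumption, a memory access that cost one processor cycle in 1986 would cost on the order of a hundred cycles by 2000.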







## Two kinds of locality

- Temporal locality: recently referenced items are likely to be referenced again in the near future
- Spatial locality: items with nearby addresses tend to be referenced close together in time

## Locality example

    sum = 0;
    for (i = 0; i < n; i++)
        sum += a[i];
    return sum;

- Program data
  - Temporal: the variable sum
  - Spatial: variable a[i+1] accessed soon after a[i]
- Instructions
  - Temporal: cycle through the for-loop repeatedly
  - Spatial: reference instructions in sequence
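
The same idea extends to two-dimensional arrays. The sketch below is illustrative only: the names `sum_by_rows`, `sum_by_cols`, `ROWS`, and `COLS` are hypothetical and do not come from the lecture. Both functions compute the same sum, but the row-by-row version touches adjacent addresses on consecutive iterations, while the column-by-column version jumps `COLS` elements each time and so tends to show poorer spatial locality.

```c
#include <stddef.h>

#define ROWS 4   /* illustrative sizes, not from the lecture */
#define COLS 8

/* Row-by-row: consecutive accesses touch adjacent addresses,
   because C stores 2-D arrays in row-major order. */
long sum_by_rows(int a[ROWS][COLS])
{
    long sum = 0;
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            sum += a[i][j];
    return sum;
}

/* Column-by-column: consecutive accesses are COLS ints apart. */
long sum_by_cols(int a[ROWS][COLS])
{
    long sum = 0;
    for (size_t j = 0; j < COLS; j++)
        for (size_t i = 0; i < ROWS; i++)
            sum += a[i][j];
    return sum;
}
```

On arrays this small the difference is unlikely to be measurable; it tends to matter once the array is much larger than the cache.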

# Locality Makes Caching Effective

#### Cache

- Smaller, faster storage device that acts as a staging area
  - ... for a *subset* of the data in a larger, slower device
- Caching and the memory hierarchy
  - Storage device at level k is a cache for level k+1
  - Registers as cache of L1/L2 cache and main memory
  - Main memory as a cache for the disk
  - Disk as a cache of files from remote storage
- Locality of access is the key
  - Most accesses satisfied by first few (faster) levels
  - Very few accesses go to the last few (slower) levels
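
To see why it matters that most accesses are satisfied by the faster levels, here is a minimal sketch of a two-level cost model. The weighted-average formula is a common simplification and an assumption here, and the function and parameter names are hypothetical.

```c
/* Average cost of one access, in units of the fast level's access time.
   hit_rate:  fraction of accesses satisfied by the fast level (0.0 to 1.0)
   hit_cost:  cost of an access that hits in the fast level
   miss_cost: cost of an access that must go to the slower level */
double average_access_cost(double hit_rate, double hit_cost, double miss_cost)
{
    return hit_rate * hit_cost + (1.0 - hit_rate) * miss_cost;
}
```

For example, treating main memory as a cache for a disk that is 100,000 times slower, `average_access_cost(0.99, 1.0, 100000.0)` comes to roughly 1,000 units. Even a 1% miss rate dominates the average, which is why good locality is essential.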





# **Cache Block Sizes**



- Fixed-sized blocks are easier to manage (common case)
- Variable-sized blocks make more efficient use of storage

#### Block size

- Depends on access times at the level k+1 device
- Larger block sizes further down in the hierarchy
  - E.g., disk seek times are slow, so disk pages are larger

#### Examples

- CPU registers: 4-byte words
- L1/L2 cache: 32-byte blocks
- Main memory: 4 KB pages
- Disk: entire files
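
With fixed-size blocks, any address splits into a block (or page) number and an offset within it. The sketch below uses the 32-byte and 4 KB sizes from the examples above; the helper names `unit_number` and `unit_offset` are hypothetical.

```c
#include <stdint.h>

#define BLOCK_SIZE 32u     /* L1/L2 cache block size from the examples above */
#define PAGE_SIZE  4096u   /* main-memory page size from the examples above */

/* Which block (or page) an address falls in. */
uint32_t unit_number(uint32_t addr, uint32_t unit_size)
{
    return addr / unit_size;
}

/* Byte position of the address within that block (or page). */
uint32_t unit_offset(uint32_t addr, uint32_t unit_size)
{
    return addr % unit_size;
}
```

For instance, address `0x08048123` lies in page `0x08048` at offset `0x123`, and at offset 3 within its 32-byte cache block. Because both sizes are powers of two, the division and remainder amount to taking the high and low bits of the address.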







### Evicting a block from the cache

- New block must be brought into the cache
  - Must choose a "victim" to evict
- Optimal eviction policy
  - Evict a block that is never accessed again
  - Evict the block accessed the furthest in the future
  - Impossible to implement without knowledge of the future
- Using the past to predict the future
  - Evict the "least recently used" (LRU) block
  - Assuming it is not likely to be used again soon
- But, LRU is often expensive to implement
  - Need to keep track of access times
  - So, simpler approximations of LRU are used
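
A minimal sketch of exact LRU, assuming a tiny four-slot cache and a logical clock that ticks on every access. The struct layout and the names `lru_cache`, `lru_init`, and `lru_access` are hypothetical. Note the bookkeeping it needs on every single access, which is the cost the slide refers to.

```c
#include <assert.h>

#define NSLOTS 4        /* illustrative cache capacity */
#define EMPTY  (-1)

struct lru_cache {
    int  block[NSLOTS];      /* block number held in each slot, or EMPTY */
    long last_used[NSLOTS];  /* logical time of the most recent access */
    long clock;              /* incremented on every access */
};

void lru_init(struct lru_cache *c)
{
    for (int i = 0; i < NSLOTS; i++) {
        c->block[i] = EMPTY;
        c->last_used[i] = 0;
    }
    c->clock = 0;
}

/* Access a block. Returns 1 on a hit, 0 on a miss. On a miss the block
   replaces an empty slot if one exists, else the least recently used one. */
int lru_access(struct lru_cache *c, int blk)
{
    int victim = 0;
    assert(blk >= 0);
    c->clock++;
    for (int i = 0; i < NSLOTS; i++) {
        if (c->block[i] == blk) {          /* hit: refresh its timestamp */
            c->last_used[i] = c->clock;
            return 1;
        }
        /* Empty slots have last_used == 0, so they are chosen first. */
        if (c->last_used[i] < c->last_used[victim])
            victim = i;
    }
    c->block[victim] = blk;                /* miss: evict and replace */
    c->last_used[victim] = c->clock;
    return 0;
}
```

After accessing blocks 1, 2, 3, 4 and then 1 again, bringing in block 5 evicts block 2, since block 1 was refreshed more recently.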

# Who Manages the Cache?

#### Registers

- Cache of L1/L2 cache and main memory
- Managed explicitly by the compiler
  - By determining which data are brought in and out of registers
  - Using relatively sophisticated code-analysis techniques

#### L1/L2 cache

- Cache of main memory
- Managed by the hardware
  - Using relatively simple mechanisms (e.g., "i mod 4")

#### Main memory

- Cache of the disk
- Managed (in modern times) by the operating system
  - Using relatively sophisticated mechanisms (e.g., LRU-like)
  - Since reading from disk is extremely time consuming
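
One way to read the "i mod 4" example is as direct-mapped placement: memory block i may live only in slot i mod 4 of a four-slot cache, so no eviction decision is needed. That reading is an assumption, and the names `dm_cache` and `dm_access` below are hypothetical. The sketch assumes non-negative block numbers.

```c
#define NSLOTS 4   /* matches the "i mod 4" example; real caches are larger */

struct dm_cache {
    int valid[NSLOTS];  /* 1 if the slot currently holds a block */
    int block[NSLOTS];  /* which memory block the slot holds */
};

/* Look up memory block i. Returns 1 on a hit, 0 on a miss.
   On a miss, block i overwrites whatever was in slot i mod 4. */
int dm_access(struct dm_cache *c, int i)
{
    int slot = i % NSLOTS;
    if (c->valid[slot] && c->block[slot] == i)
        return 1;
    c->valid[slot] = 1;
    c->block[slot] = i;
    return 0;
}
```

The simplicity has a price: blocks 0 and 4 map to the same slot, so alternating between them misses every time even if the other three slots are empty.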





# Making Good Use of Memory and Disk

#### Good use of the disk

- Read and write data in large "pages"
  - ... to amortize the cost of "seeking" on the disk
  - E.g., page size of 4 KB

#### Good use of main memory

- Even though the address space is large
  - ... programs usually access only small portions at a time
- Keep the "working set" in main memory
  - Demand paging: only bring in a page when needed
  - Page replacement: selecting a good page to swap out
- Goal: avoid thrashing
  - Continually swapping between memory and disk
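
Thrashing can be illustrated with a small simulation. This is a sketch under stated assumptions: pages are loaded only on demand, replacement is exact LRU (real systems typically approximate it), and the name `count_page_faults` and the bound `MAX_FRAMES` are hypothetical.

```c
#include <stddef.h>

#define MAX_FRAMES 16   /* illustrative upper bound on physical frames */

/* Count page faults for a sequence of page references, using demand
   paging with LRU replacement over nframes physical frames
   (1 <= nframes <= MAX_FRAMES). */
size_t count_page_faults(const int *refs, size_t n, size_t nframes)
{
    int frames[MAX_FRAMES];   /* resident pages, most recently used first */
    size_t used = 0, faults = 0;

    for (size_t r = 0; r < n; r++) {
        size_t pos = 0;
        while (pos < used && frames[pos] != refs[r])
            pos++;
        if (pos == used) {            /* page fault: page is not resident */
            faults++;
            if (used < nframes)
                used++;               /* a free frame is available */
            pos = used - 1;           /* otherwise reuse the LRU frame (last) */
        }
        /* Move the referenced page to the front (most recently used). */
        for (; pos > 0; pos--)
            frames[pos] = frames[pos - 1];
        frames[0] = refs[r];
    }
    return faults;
}
```

For a program cycling through pages 0, 1, 2, 3 three times, four frames hold the whole working set and only the first 4 references fault. With three frames, all 12 references fault, because each page is evicted just before it is needed again.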




















# VM as a Tool for Memory Protection




#### Memory protection

- Prevent a process from unauthorized reading or writing of memory
- User process should not be able to
  - Modify the read-only text section in its own address space
  - Read or write operating-system code and data structures
  - Read or write the private memory of other processes

#### Hardware support

- Permission bits in page-table entries (e.g., read-only)
- Separate identifier for each process (i.e., process-id)
- Switching between *unprivileged* mode (for user processes) and *privileged* mode (for the operating system)
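
The permission check can be sketched as part of address translation. Everything below is a simplified illustration: the single-level table, the `PTE_*` flag names, the `pte` struct, and the `translate` function are hypothetical and do not reflect any particular processor's page-table format.

```c
#include <stdint.h>

#define PAGE_SIZE 4096u           /* 4 KB pages, as used earlier */
#define NPAGES    8u              /* tiny illustrative address space */

enum { PTE_VALID = 1, PTE_WRITE = 2, PTE_USER = 4 };  /* hypothetical bits */

struct pte {
    uint32_t frame;   /* physical frame number */
    unsigned flags;   /* combination of the PTE_* bits above */
};

/* Translate a virtual address. Returns 0 and stores the physical address
   in *paddr on success; returns -1 if the access must be refused (where
   real hardware would raise a fault for the OS to handle). */
int translate(const struct pte table[NPAGES], uint32_t vaddr,
              int is_write, int user_mode, uint32_t *paddr)
{
    uint32_t vpn = vaddr / PAGE_SIZE;
    uint32_t offset = vaddr % PAGE_SIZE;

    if (vpn >= NPAGES) return -1;
    const struct pte *e = &table[vpn];
    if (!(e->flags & PTE_VALID)) return -1;               /* not mapped */
    if (is_write && !(e->flags & PTE_WRITE)) return -1;   /* read-only page */
    if (user_mode && !(e->flags & PTE_USER)) return -1;   /* OS-only page */

    *paddr = e->frame * PAGE_SIZE + offset;
    return 0;
}
```

With a text page marked valid and user-accessible but not writable, a user-mode read succeeds while a user-mode write is refused. A page without the user bit is reachable only in privileged mode. Isolation between processes follows from each process having its own table.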



















# VM as a Tool for Memory Management

#### Simplifying linking

- Same memory layout for each process
  - E.g., text section always starts at 0x08048000
  - E.g., stack always grows down from 0xbfffffff
- Linker can be independent of physical location of code

#### Simplifying sharing

- User processes can share some code and data
  - E.g., single physical copy of stdio library code (like printf)
  - Mapped into the virtual address space of each process

#### Simplifying memory allocation

- User processes can request additional memory from the heap
  - E.g., using malloc() to allocate, and free() to deallocate
- OS allocates *contiguous* virtual pages...
  - ... and scatters them *anywhere* in physical memory
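
From the program's side, heap allocation looks like the minimal sketch below. The helper name `make_squares` is hypothetical. The returned block is contiguous in the virtual address space, so ordinary array indexing works, regardless of where the OS places the underlying physical pages.

```c
#include <stdlib.h>

/* Allocate an array of n ints on the heap and fill it with squares.
   Returns NULL if the allocation fails. The caller must free() it. */
int *make_squares(size_t n)
{
    int *p = malloc(n * sizeof *p);
    if (p == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        p[i] = (int)(i * i);
    return p;
}
```

A caller would write `int *sq = make_squares(5);`, check the result against `NULL`, use `sq[0]` through `sq[4]`, and finish with `free(sq);`.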

# Summary

- Memory hierarchy
  - Memory devices of different speed, size, and cost
  - Registers, on-chip cache, off-chip cache, main memory, disk, tape
  - Locality of memory accesses making caching effective
- Virtual memory
  - Separate virtual address space for each process
  - Provides caching, memory protection, and memory management
  - Implemented via cooperation of the address-translation hardware and the OS (when page faults occur)

# In Dynamic Memory Management lectures:

- Dynamic memory allocation on the heap
- Management by user-space software (e.g., malloc() and free())