### TransForm: Formally Specifying Transistency Models and Synthesizing Enhanced Litmus Tests

Naorin Hossain Princeton University

January 22, 2021

Naorin Hossain, Caroline Trippel, Margaret Martonosi. "TransForm: Formally Specifying Transistency Models and Synthesizing Enhanced Litmus Tests", 47th International Symposium on Computer Architecture (ISCA '20).





















Memory Consistency Models (MCMs) specify observable behaviors for concurrent programs

 MCMs specify rules for legal values that can be returned when software loads from memory on a shared memory system

Problem: don't reason about program executions impacted bya. sshared virtual memory (VM) state with existing MCM constructsa. sshared virtual memory (VM) state with existing MCM constructsb. st flag = 1;if (r1 != 1) goto c;<br/>d. ld r2 = data;b. st data = 1;c. ld r1 = flag;<br/>d. ld r2 = data;

Litmus test: small diagnostic

**This work: Memory Transistency Models (MTMs)** – the superset of MCMs that additionally capture VM-aware ordering specifications

memory fences where needed)

### Prior work

#### Leslie Lamport defined first MCM: sequential consistency [Lamport 1979]





# TransForm introduces constructs for ISA-level MTM specification and ELT synthesis

- Formal MTM vocabulary captures system- and hardware-level VM events and interactions with user-facing program instructions
- Enables ISA-level MTM specification
- Enables automated *enhanced litmus test (ELT)* synthesis



Allows for verification against formally specified MTM

**TransForm** 

our work

prior work

MTM vocabulary

MCM vocabulary

hardware operations

system operations

**ISA-specific MTM** 

ELT synthesis engine

used for



Background on ISA-level MCM vocabulary

Prior Work

My Work

# Approach to defining vocabulary for formally reasoning about MTMs

- MCMs can be defined axiomatically
  - Axiomatic MCM specifications use sets of **relations** that can describe user-facing program executions
  - MCM relations describe user-facing event executions of programs with one-to-one V-to-P address mappings
- MTMs are *superset* of MCMs
  - Axiomatic MTM specifications can use MCM relations but require additional relations to similarly describe transistency events and V-to-P address mappings that can have synonyms and be modified



# ISA-level MCM relations can describe programs and their *candidate executions*

**Candidate executions** – set of possible executions of a program and their outcomes

| riogram                                                      | <u>Instructions</u> |                                                                      |
|--------------------------------------------------------------|---------------------|----------------------------------------------------------------------|
| Core 0                                                       |                     | $, R_2, R_3 \}$                                                      |
| a. st data = 1;                                              | c.  d r  = i  ag;   | {W <sub>0</sub> , W <sub>1</sub> , R <sub>2</sub> , R <sub>3</sub> } |
| b. st flag = 1;                                              | d. ld r2 = data;    | $x, W_1 \rightarrow y, R_2 \rightarrow y, R_3 \rightarrow x$         |
| program order (po)                                           |                     |                                                                      |
| $\mathbf{po} = \{W_0 \rightarrow W_1, R_2 \rightarrow R_3\}$ |                     |                                                                      |

#### <u>Graph</u>



#### **Candidate execution**

Program

| Core 0               | Core 1               |
|----------------------|----------------------|
| W <sub>0</sub> x = 1 | R <sub>2</sub> y = 1 |
| W <sub>1</sub> y = 1 | $R_3 x = 0$          |

Communication (com) relationsreads from (rf)rf =  $\{W_1 \rightarrow R_2\}$ coherence order (co)co =  $\{\}$ from reads (fr)fr =  $\{R_3 \rightarrow W_0\}$ 

Numerical subscripts serve as instruction ID.

Accessed data (outcome) symbolically represented by com relations

[Shasha & Snir, 1988] [Alglave et al., 2014]

# ISA-level MCM relations can describe programs and their *candidate executions*

**Candidate executions** – set of possible executions of a program and their outcomes

#### **Program**

| Core 0           | Core 1           |
|------------------|------------------|
| W <sub>0</sub> x | R <sub>2</sub> y |
| W <sub>1</sub> y | R <sub>3</sub> x |

Instructions

Event = { $W_0$ ,  $W_1$ ,  $R_2$ ,  $R_3$ } MemoryEvent = { $W_0$ ,  $W_1$ ,  $R_2$ ,  $R_3$ } Location = {x, y} address { $W_0 \rightarrow x$ ,  $W_1 \rightarrow y$ ,  $R_2 \rightarrow y$ ,  $R_3 \rightarrow x$ } program order (po) po = { $W_0 \rightarrow W_1$ ,  $R_2 \rightarrow R_3$ }

#### <u>Graph</u>



#### **Candidate execution**

| Core 0               | Core 1               |
|----------------------|----------------------|
| W <sub>0</sub> x = 1 | R <sub>2</sub> y = 1 |
| W <sub>1</sub> y = 1 | $R_3 x = 0$          |

Communication (com) relationsreads from (rf)rf =  $\{W_1 \rightarrow R_2\}$ coherence order (co)co =  $\{\}$ from reads (fr)fr =  $\{R_3 \rightarrow W_0\}$ 

Numerical subscripts serve as instruction ID.

Accessed data (outcome) symbolically represented by com relations

[Shasha & Snir, 1988] [Alglave et al., 2014]

# MCM specifications place constraints on permitted execution behaviors

Axiomatic MCM specifications use MCM relations to describe axioms that constrain candidate execution behaviors

Intel x86 processors use the **total store order (TSO)** memory model (**x86-TSO**) [Owens et al., 2009]: strict sequential memory access orderings but relaxed Store->Load orderings to allow for store buffering

Causality - axiom for x86-TSO: acyclic(rfe + co + fr + ppo + fence)



# MCM specifications place constraints on permitted execution behaviors

Axiomatic MCM specifications use MCM relations to describe axioms that constrain candidate execution behaviors

Intel x86 processors use the **total store order (TSO)** memory model (**x86-TSO**) [Owens et al., 2009]: strict sequential memory access orderings but relaxed Store->Load orderings to allow for store buffering

Causality - axiom for x86-TSO: acyclic(rfe + co + fr + ppo + fence)





Prior Work

My

22

# V-to-P address mappings need to be stored and accessed during memory events

V-to-P address mappings are stored in **page tables**.

**Page table entries (PTEs)** hold address mapping and status bits (permissions, access, dirty).

Page tables are usually structured hierarchically.

When address translation is needed, a **page table walk** traverses the page table levels to find the desired address mapping.



A. Bhattacharjee and D. Lustig, "Architectural and operating system support for virtual memory", *Synthesis Lectures on Computer Architecture*, 2017.

# V-to-P address mappings are cached in the *translation lookaside buffer* (TLB)



### V-to-P address mappings can be changed by OS

Operating system (OS) may change address mappings in page tables.

Corresponding TLB entries must be invalidated on *each core* to prevent stale mapping accesses.



### V-to-P address mappings can be changed by OS

Operating system (OS) may change address mappings in page tables.

Corresponding TLB entries must be invalidated on *each core* to prevent stale mapping accesses.

New page table walk needed to load new mapping into TLB.



# Virtual memory events TransForm needs to support

| Hardware-level events                                                              | System-level events                                                                       |
|------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------|
| Page table walk<br>Loads TLB entries on memory access                              | Address mapping changes<br>V-to-P address mapping must be<br>modifiable like data         |
| PTE status bit updates<br>TransForm supports dirty bit<br>updates on memory stores | TLB entry invalidations<br>May be invoked on multiple cores<br>by address mapping changes |

# Outline TransForm MTM vocabulary MCM vocabulary hardware operations

used for

system operations

- Background on ISA-level MCM vocabulary
- Background on virtual memory systems
- Novel ISA-level MTM vocabulary
- Automating synthesis of ELTs
- Case Study: a estimated MTM for x86

ELT synthesis engine

• Future Work & Conclusions

**ISA-specific MTM** 



My Work

suite of ELTs

#### TransForm supports page table walks (PTW) and dirty bit updates

#### **Ghost instructions**

**PTW**: loads translation lookaside buffer (TLB) entry

**dirty bit update**: modifies dirty bit in PTE



### TransForm supports page table walks (PTW) and dirty bit updates

#### **Ghost instructions**

**PTW**: loads translation lookaside buffer (TLB) entry

**dirty bit update**: modifies dirty bit in PTE

ghost - relates user-facing MemoryEvent to
invoked ghost instructions (numerical subscripts)



### TransForm supports page table walks (PTW) and dirty bit updates

#### **Ghost instructions**

**PTW**: loads translation lookaside buffer (TLB) entry

**dirty bit update**: modifies dirty bit in PTE

ghost - relates user-facing MemoryEvent to
invoked ghost instructions (numerical subscripts)

rf\_ptw - relates PTW to user-facing
MemoryEvents that access loaded TLB entry



#### TransForm supports page table walks (PTW) and dirty bit updates

#### **Ghost instructions**

**PTW**: loads translation lookaside buffer (TLB) entry

**dirty bit update**: modifies dirty bit in PTE

ghost - relates user-facing MemoryEvent to
invoked ghost instructions (numerical subscripts)

rf\_ptw - relates PTW to user-facing
MemoryEvents that access loaded TLB entry



#### TransForm supports page table walks (PTW) and dirty bit updates

#### **Ghost instructions**

**PTW**: loads translation lookaside buffer (TLB) entry

**dirty bit update**: modifies dirty bit in PTE

ghost - relates user-facing MemoryEvent to
invoked ghost instructions (numerical subscripts)

rf\_ptw - relates PTW to user-facing
MemoryEvents that access loaded TLB entry



### MTM Vocabulary: System-level events

TransForm supports address remappings via PTE Writes and TLB entry invalidations

#### **Support instructions**

**PTE Write**: changes address mapping stored in a PTE for some VA v



### MTM Vocabulary: System-level events

TransForm supports address remappings via PTE Writes and TLB entry invalidations

#### **Support instructions**

**PTE Write**: changes address mapping stored in a PTE for some VA v

**INVLPG**: invalidates TLB entry (named after x86 instruction)

remap – relates PTE Writes to invoked INVLPGs



### MTM Vocabulary: System-level events

TransForm supports address remappings via PTE Writes and TLB entry invalidations

#### **Support instructions**

**PTE Write**: changes address mapping stored in a PTE for some VA v

**INVLPG**: invalidates TLB entry (named after x86 instruction)

remap – relates PTE Writes to invoked INVLPGs

**rf\_pa** – relates PTE Write for VA v  $\rightarrow$  PA p to user-facing MemoryEvents accessing PA p via VA v **fr\_pa** – relates user-facing MemoryEvents accessing PA p via VA v to PTE Writes for VA v'  $\rightarrow$  PA p **co\_pa** and **fr\_va** follow similarly



### MTM Vocabulary: System-level events

#### TransForm supports address remappings via PTE Writes and TLB entry invalidations

#### **Support instructions**

**PTE Write**: changes address mapping stored in a PTE for some VA v

**INVLPG**: invalidates TLB entry (named after x86 instruction)

remap – relates PTE Writes to invoked INVLPGs

**rf\_pa** – relates PTE Write for VA v  $\rightarrow$  PA p to user-facing MemoryEvents accessing PA p via VA v **fr\_pa** – relates user-facing MemoryEvents accessing PA p via VA v to PTE Writes for VA v'  $\rightarrow$  PA p **co\_pa** and **fr\_va** follow similarly



### MTM Vocabulary: System-level events

TransForm supports address remappings via PTE Writes and TLB entry invalidations

These new com relations can be used to derive same PA accesses.

**rf\_pa** – relates PTE Write for VA v → PA p to user-facing MemoryEvents accessing PA p via VA v **fr\_pa** – relates user-facing MemoryEvents accessing PA p via VA v to PTE Writes for VA v' → PA p **co\_pa** and **fr\_va** follow similarly



### MTM Vocabulary: System-level events

TransForm supports address remappings via PTE Writes and TLB entry invalidations

These new com relations can be used to derive same PA accesses.



**rf\_pa** – relates PTE Write for VA v  $\rightarrow$  PA p to user-facing MemoryEvents accessing PA p via VA v **fr\_pa** – relates user-facing MemoryEvents accessing PA p via VA v to PTE Writes for VA v'  $\rightarrow$  PA p **co\_pa** and **fr\_va** follow similarly

Forbidden!

### MTM Vocabulary: Putting it all together

Program executions with transistency events and relations can get quite complex but they allow us to capture these additional interactions that can occur and impact the program's execution.





- Background on ISA-level MCM vocabulary

Prior

### MCMs can be verified with litmus tests. MTMs can be verified with *enhanced* litmus tests.

- Litmus tests: small diagnostic programs for validating MCM behaviors
  - Executions and their outcomes can be deemed **permitted** or **forbidden** by MCM specification
- Enhanced litmus tests (ELTs): litmus tests enhanced with system- and hardware-level events that facilitate address translation



### From Specification to Test Synthesis



- ELTs can be described with MTM relations and support verification against an MTM spec
  - Goals:
    - Automated
    - Interesting and minimal ("Spanning set")
    - Deduplicated
    - Comprehensive (to a bound)

TransForm's synthesis engine starts by synthesizing all possible candidate executions up to a bound



## Candidate executions are pruned for interesting ELT behaviors and checked for minimality



# Unique ELTs are found by deduplicating synthesized ELTs with a post-processing script





Background on ISA-level MCM vocabulary

Prior

**x86t\_elt** predicates are composed of x86-TSO axioms and new transistency-specific axioms

- **x86t\_elt**: an approximate x86 transistency model based on prior work and publicly available documentation
- x86-TSO: sc\_per\_loc, rmw\_atomicity, causality invlpg (required) acylic[fr\_va + remap + ^po]
   x86-TSO: sc\_per\_loc, rmw\_atomicity, causality tlb\_causality (auxiliary) acyclic[ptw\_source + com]



# A per-axiom suite was synthesized for each **x86t\_elt** axiom



#### 140 total unique ELTs!

The synthesized x86t\_elt suite consisted of all relevant ELTs from COATCheck and more

- All 22 relevant ELTs from COATCheck synthesized
  - 7 ELTs synthesized verbatim  $\rightarrow$  map to 4 ELT programs in x86t\_elt suite
  - 15 ELTs can be reduced to a minimal ELT that is synthesized
- 4 ELTs from COATCheck, 136 new ELTs



Background on ISA-level MCM vocabulary

Prior Work

My Work

### Ongoing and Future Work

- ► Empirical MTM testing
- ➢Validate x86t\_elt and verify x86 processors using synthesized ELTs
- Specify other MTMs (e.g., RISC-V)
- Model additional transistency interactions (e.g., permission bit updates)
- Formally reason about transistency and security

### Conclusions

- **TransForm**: framework for formal specification of MTMs and ELT synthesis
- Enables modern ISAs to have a formal specification that includes VM
- Offers systems programmers and hardware designers a stronger opportunity for verification of full systems
- Available at: <u>https://github.com/naorinh/TransForm</u>

## TransForm: Formally Specifying Transistency Models and Synthesizing Enhanced Litmus Tests

Naorin Hossain Princeton University

January 22, 2021

https://github.com/naorinh/TransForm

## Backup Slides

Prior work developed a tool for verifying correctness of hardware MTM *implementations* 

• Enhanced litmus tests (ELTs): litmus tests enhanced with hardwareand system-level interactions that facilitate the VM abstraction



- 1. Formal MTM verification at *microarchitectural*-level
- 2. Lacked formal MTM specification
- 3. Hand-generated ELT suites

Communication relations can be extended to apply to virtual-to-physical address mappings

- rf\_pa: maps a PTE write mapping VA v → PA p to instructions that access VA v → PA p.
- fr\_pa: maps instructions that access VA v → PA p to PTE writes that map VA v' → PA p.
- **co\_pa**: maps PTE writes that change mappings for different VAs to the same PA.
- **fr\_va**: maps instructions that access VA v  $\rightarrow$  PA p to PTE writes that map VA v  $\rightarrow$  PA p'.



Tests with instructions that do not contribute to forbidden outcomes violate minimality criterion

**Minimality criterion**: any **relaxation** on the test results in satisfying the transistency predicates

**Relaxations** for transistency purposes are the removal of instructions and the following dependent instructions:

- 1. Invoked ghost instructions
- 2. Invoked INVLPGs
- 3. Dependent RMW operations

ELT execution should have a forbidden outcome that becomes legal under *every* possible *isolated* relaxation.

- x86-TSO causality axiom: acyclic(rfe + co + fr + ppo + fence)
- Removal of W<sub>4</sub> as relaxation does not result in satisfying causality