#### Better I/O Through Byte-Addressable, Persistent Memory

Jeremy Condit, Ed Nightingale, <u>Chris Frost</u>, Engin Ipek, Ben Lee, Doug Burger, Derrick Coetzee

> Microsoft<sup>®</sup> Research



#### DRAM

- + Fast
- + Byte-addressable
- − Volatile

#### Disk / Flash



- + Non-volatile
- − Slow
- − Block-addressable

<u>Byte-addressable, Persistent RAM</u>



- + Fast
- + Byte-addressable
- + Non-volatile


## Phase Change Memory

- Most promising form of BPRAM
- "Melting memory chips in mass production" (*Nature*, 9/25/09)




How do we build fast, reliable systems with BPRAM?

<u>This talk</u>: BPFS, a file system for BPRAM

<u>Result</u>: Improved performance and reliability

#### Goal

New guarantees for applications

- File system operations will commit atomically and in program order
- Your data is durable as soon as the cache is flushed

New mechanism: short-circuit shadow paging



#### **Design Principles**

1. Eliminate the DRAM buffer cache; use the L1/L2 cache instead



2. Put BPRAM on the memory bus





# Outline

- Intro
- File System
- Hardware Support
- Evaluation
- Conclusion

#### BPRAM in the PC



- BPRAM and DRAM are addressable by the CPU
- Physical address space is partitioned
- BPRAM data may be cached in L1/L2


## **BPFS: A BPRAM File System**

- Guarantees that all file operations execute <u>atomically</u> and <u>in program order</u>
- Despite guarantees, significant <u>performance improvements</u> over NTFS on the same media
- Short-circuit shadow paging often allows <u>atomic, in-place updates</u>


- Disk: Use journaling or shadow paging
- BPRAM: Use short-circuit shadow paging

#### Journaling

- Write to journal, then write to file system
- Reliable, but all data is written twice
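The cost of writing everything twice can be seen in a toy model. The sketch below is not any real file system's journal; `JournaledStore`, `write`, and `checkpoint` are hypothetical names, and the "medium" is just a Python byte array with a counter for bytes written.

```python
class JournaledStore:
    """Toy model of journaling over a byte array (illustrative only)."""

    def __init__(self, size):
        self.data = bytearray(size)   # the "file system" image
        self.journal = []             # pending (offset, payload) records
        self.bytes_written = 0        # total bytes sent to the medium

    def write(self, offset, payload):
        # Step 1: record the intended update in the journal.
        self.journal.append((offset, bytes(payload)))
        self.bytes_written += len(payload)

    def checkpoint(self):
        # Step 2: copy each journal record to its final location.
        for offset, payload in self.journal:
            self.data[offset:offset + len(payload)] = payload
            self.bytes_written += len(payload)
        self.journal.clear()


store = JournaledStore(64)
store.write(8, b"hello")
store.checkpoint()
print(store.bytes_written)  # 10: five payload bytes, written twice
```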













#### Shadow Paging

- Any change requires bubbling to the FS root
- Small writes require large copying overhead
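A minimal sketch of why a small write is expensive under plain shadow paging, assuming the file system is a tree of blocks reached from a single root pointer. `Node`, `cow_update`, and `count_copied` are hypothetical helpers for illustration, not BPFS code.

```python
class Node:
    """A tree block: interior nodes hold children, leaves hold data."""

    def __init__(self, children=None, data=None):
        self.children = children
        self.data = data


def cow_update(node, path, new_data):
    """Return a copy of `node` with the leaf at `path` replaced.

    Every node on the path is copied; untouched subtrees are shared.
    """
    if not path:
        return Node(data=new_data)
    children = list(node.children)
    children[path[0]] = cow_update(children[path[0]], path[1:], new_data)
    return Node(children=children)


def count_copied(old, new):
    """Count nodes in `new` that are not shared with `old`."""
    if old is new:
        return 0
    copied = 1
    if new.children:
        for o, n in zip(old.children, new.children):
            copied += count_copied(o, n)
    return copied


old_root = Node(children=[
    Node(children=[Node(data=b"a"), Node(data=b"b")]),
    Node(children=[Node(data=b"c")]),
])
new_root = cow_update(old_root, [0, 1], b"B")  # change a single leaf
copied = count_copied(old_root, new_root)
root = new_root                                # commit: swap the root pointer
print(copied)  # 3: the leaf, its parent, and the root
```

Changing one leaf copies a block at every level of the tree, which is the overhead the optimizations that follow try to avoid.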

#### Short-Circuit Shadow Paging

- Inspired by shadow paging
  - Optimization: In-place update when possible



# <u>Opt. 1</u>: In-Place Writes

- Aligned 64-bit writes are performed in place
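The in-place rule can be sketched as a simple check on offset and length. This is an illustrative model only: `can_update_in_place` and `write` are hypothetical names, the copy-on-write fallback is an assumption about the general path, and a Python slice assignment merely stands in for a single atomic 64-bit hardware store.

```python
import struct

WORD = 8  # bytes in a 64-bit word


def can_update_in_place(offset, length):
    """True when the write covers exactly one aligned 64-bit word."""
    return length == WORD and offset % WORD == 0


def write(block, offset, payload):
    """Apply a write; returning the same object means it was done in place."""
    if can_update_in_place(offset, len(payload)):
        block[offset:offset + WORD] = payload  # models one atomic 64-bit store
        return block
    copy = bytearray(block)                    # otherwise write to a fresh copy
    copy[offset:offset + len(payload)] = payload
    return copy


block = bytearray(16)
same = write(block, 8, struct.pack("<Q", 7))   # aligned, 8 bytes: in place
other = write(block, 3, b"xy")                 # unaligned: goes to a copy
print(same is block, other is block)           # True False
```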



# <u>Opt. 2</u>: Exploit Data-Metadata Invariants

• Appends committed by updating file size
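A minimal sketch of the append case, assuming readers never look past the recorded file size. The `File` class and its `crash_before_commit` flag are hypothetical and exist only to show that bytes written beyond the size stay invisible until the size field is updated.

```python
class File:
    """Toy file whose visible contents are bounded by `size`."""

    def __init__(self, capacity):
        self.blocks = bytearray(capacity)  # allocated storage
        self.size = 0                      # committed length; readers stop here

    def append(self, payload, crash_before_commit=False):
        end = self.size + len(payload)
        # Step 1: write the new bytes past the committed size.
        self.blocks[self.size:end] = payload
        if crash_before_commit:
            return                         # size unchanged: append never happened
        # Step 2: commit with one size update (models a single 64-bit store).
        self.size = end

    def read(self):
        return bytes(self.blocks[:self.size])


f = File(32)
f.append(b"abc")
f.append(b"XYZ", crash_before_commit=True)  # data written, never committed
print(f.read())                             # b'abc'
f.append(b"def")
print(f.read())                             # b'abcdef'
```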






#### **BPFS** Example

- Cross-directory rename bubbles to common ancestor





















































# **Enforcing Ordering and Atomicity**

- Ordering
  - <u>Solution</u>: **Epoch barriers** to declare constraints
  - Faster than write-through
  - Important hardware primitive (cf. SCSI TCQ)
- Atomicity
  - <u>Solution</u>: Capacitor on DIMM
  - Simple and cheap!
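The ordering rule behind epoch barriers can be modeled in software. The sketch below is not the hardware design; `EpochCache` and its methods are hypothetical, and it assumes only that a dirty cache line may reach BPRAM after every dirty line from an earlier epoch has been written back.

```python
class EpochCache:
    """Toy write-back cache that tags dirty lines with an epoch number."""

    def __init__(self):
        self.epoch = 0
        self.dirty = {}        # address -> (epoch, value), not yet persistent
        self.persistent = {}   # address -> value, the BPRAM contents
        self.log = []          # order in which addresses reached BPRAM

    def barrier(self):
        # Declare that later stores must persist after earlier ones.
        self.epoch += 1

    def store(self, addr, value):
        # Overwriting a line from an older epoch forces that line out first.
        if addr in self.dirty and self.dirty[addr][0] < self.epoch:
            self.evict(addr)
        self.dirty[addr] = (self.epoch, value)

    def evict(self, addr):
        target_epoch, _ = self.dirty[addr]
        # Write back every dirty line from an earlier epoch, oldest first.
        for a in sorted(self.dirty, key=lambda k: self.dirty[k][0]):
            if self.dirty[a][0] < target_epoch:
                self._write_back(a)
        self._write_back(addr)

    def _write_back(self, addr):
        _, value = self.dirty.pop(addr)
        self.persistent[addr] = value
        self.log.append(addr)


cache = EpochCache()
cache.store("data", 1)     # epoch 0
cache.barrier()
cache.store("commit", 1)   # epoch 1
cache.evict("commit")      # forces "data" out first
print(cache.log)           # ['data', 'commit']
```

Unlike write-through, stores stay in the cache until an eviction happens; the barrier constrains only the order in which they leave.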























































#### Multiprocessor (MP) systems work too (see paper)


# Methodology

- Built and evaluated BPFS in Windows
- Three parts:
  - <u>Experimental</u>: BPFS vs. NTFS on DRAM
  - <u>Simulation</u>: Epoch barrier evaluation
  - <u>Analytical</u>: BPFS on PCM

#### Microbenchmarks



#### **BPFS Throughput On PCM**




#### Conclusions

- BPRAM changes the trade-offs for storage
  - Use a consistency technique designed for the medium
- Short-circuit shadow paging:
  - improves performance
  - improves reliability

Bonus: PCM chips on display at poster session!