# Scalable Many-Core Memory Systems Lecture 3, Topic 2: Emerging Technologies and Hybrid Memories

Prof. Onur Mutlu, http://www.ece.cmu.edu/~omutlu, onur@cmu.edu

HiPEAC ACACES Summer School 2013, July 17, 2013



### What Will You Learn in This Course?

- Scalable Many-Core Memory Systems
   July 15-19, 2013
- Topic 1: Main memory basics, DRAM scaling
- Topic 2: Emerging memory technologies and hybrid memories
- Topic 3: Main memory interference and QoS
- Topic 4 (unlikely): Cache management
- Topic 5 (unlikely): Interconnects
- Major Overview Reading:
  - Mutlu, "Memory Scaling: A Systems Architecture Perspective," IMW 2013.

# Readings and Videos

### Course Information

- Website for Course Slides and Papers
  - http://users.ece.cmu.edu/~omutlu/acaces2013-memory.html
  - http://users.ece.cmu.edu/~omutlu
  - Lecture notes and readings are uploaded
- My Contact Information
  - Onur Mutlu
  - onur@cmu.edu
  - http://users.ece.cmu.edu/~omutlu
  - +1-512-658-0891 (my cell phone)
  - Find me during breaks and/or email *any* time.

### Memory Lecture Videos

- Memory Hierarchy (and Introduction to Caches)
  - http://www.youtube.com/watch?v=JBdfZ5i21cs&list=PL5PHm2jkkXmidJOd59REog9jDnPDTG6IJ&index=22
- Main Memory
  - http://www.youtube.com/watch?v=ZLCy3pG7Rc0&list=PL5PHm2jkkXmidJOd59REog9jDnPDTG6IJ&index=25
- Memory Controllers, Memory Scheduling, Memory QoS
  - http://www.youtube.com/watch?v=ZSotvL3WXmA&list=PL5PHm2jkkXmidJOd59REog9jDnPDTG6IJ&index=26
  - http://www.youtube.com/watch?v=1xe2w3_NzmI&list=PL5PHm2jkkXmidJOd59REog9jDnPDTG6IJ&index=27
- Emerging Memory Technologies
  - http://www.youtube.com/watch?v=LzfOghMKyA0&list=PL5PHm2jkkXmidJOd59REog9jDnPDTG6IJ&index=35
- Multiprocessor Correctness and Cache Coherence
  - http://www.youtube.com/watch?v=U-VZKMgItDM&list=PL5PHm2jkkXmidJOd59REog9jDnPDTG6IJ&index=32

# Readings for Topic 1 (DRAM Scaling)

- Lee et al., "Tiered-Latency DRAM: A Low Latency and Low Cost DRAM Architecture," HPCA 2013.
- Liu et al., "RAIDR: Retention-Aware Intelligent DRAM Refresh," ISCA 2012.
- Kim et al., "A Case for Exploiting Subarray-Level Parallelism in DRAM," ISCA 2012.
- Liu et al., "An Experimental Study of Data Retention Behavior in Modern DRAM Devices," ISCA 2013.
- Seshadri et al., "RowClone: Fast and Efficient In-DRAM Copy and Initialization of Bulk Data," CMU CS Tech Report 2013.
- David et al., "Memory Power Management via Dynamic Voltage/Frequency Scaling," ICAC 2011.
- Ipek et al., "Self Optimizing Memory Controllers: A Reinforcement Learning Approach," ISCA 2008.

### Readings for Topic 2 (Emerging Technologies)

- Lee, Ipek, Mutlu, Burger, "Architecting Phase Change Memory as a Scalable DRAM Alternative," ISCA 2009, CACM 2010, Top Picks 2010.
- Qureshi et al., "Scalable high performance main memory system using phase-change memory technology," ISCA 2009.
- Meza et al., "Enabling Efficient and Scalable Hybrid Memories," IEEE Comp. Arch. Letters 2012.
- Yoon et al., "Row Buffer Locality Aware Caching Policies for Hybrid Memories," ICCD 2012 Best Paper Award.
- Meza et al., "A Case for Efficient Hardware-Software Cooperative Management of Storage and Memory," WEED 2013.
- Kultursay et al., "Evaluating STT-RAM as an Energy-Efficient Main Memory Alternative," ISPASS 2013.

# Readings for Topic 3 (Memory QoS)

- Moscibroda and Mutlu, "Memory Performance Attacks," USENIX Security 2007.
- Mutlu and Moscibroda, "Stall-Time Fair Memory Access Scheduling," MICRO 2007.
- Mutlu and Moscibroda, "Parallelism-Aware Batch Scheduling," ISCA 2008, IEEE Micro 2009.
- Kim et al., "ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers," HPCA 2010.
- Kim et al., "Thread Cluster Memory Scheduling," MICRO 2010, IEEE Micro 2011.
- Muralidhara et al., "Memory Channel Partitioning," MICRO 2011.
- Ausavarungnirun et al., "Staged Memory Scheduling," ISCA 2012.
- Subramanian et al., "MISE: Providing Performance Predictability and Improving Fairness in Shared Main Memory Systems," HPCA 2013.
- Das et al., "Application-to-Core Mapping Policies to Reduce Memory System Interference in Multi-Core Systems," HPCA 2013.

# Readings for Topic 3 (Memory QoS), Continued

- Ebrahimi et al., "Fairness via Source Throttling," ASPLOS 2010, ACM TOCS 2012.
- Lee et al., "Prefetch-Aware DRAM Controllers," MICRO 2008, IEEE TC 2011.
- Ebrahimi et al., "Parallel Application Memory Scheduling," MICRO 2011.
- Ebrahimi et al., "Prefetch-Aware Shared Resource Management for Multi-Core Systems," ISCA 2011.

## Readings in Flash Memory

- Yu Cai, Gulay Yalcin, Onur Mutlu, Erich F. Haratsch, Adrian Cristal, Osman Unsal, and Ken Mai, "Error Analysis and Retention-Aware Error Management for NAND Flash Memory," Intel Technology Journal (ITJ) Special Issue on Memory Resiliency, Vol. 17, No. 1, May 2013.
- Yu Cai, Erich F. Haratsch, Onur Mutlu, and Ken Mai, "Threshold Voltage Distribution in MLC NAND Flash Memory: Characterization, Analysis and Modeling," Proceedings of the Design, Automation, and Test in Europe Conference (DATE), Grenoble, France, March 2013.
- Yu Cai, Gulay Yalcin, Onur Mutlu, Erich F. Haratsch, Adrian Cristal, Osman Unsal, and Ken Mai, "Flash Correct-and-Refresh: Retention-Aware Error Management for Increased Flash Memory Lifetime," Proceedings of the 30th IEEE International Conference on Computer Design (ICCD), Montreal, Quebec, Canada, September 2012.
- Yu Cai, Erich F. Haratsch, Onur Mutlu, and Ken Mai, "Error Patterns in MLC NAND Flash Memory: Measurement, Characterization, and Analysis," Proceedings of the Design, Automation, and Test in Europe Conference (DATE), Dresden, Germany, March 2012.

### Online Lectures and More Information

- Online Computer Architecture Lectures
  - http://www.youtube.com/playlist?list=PL5PHm2jkkXmidJOd59REog9jDnPDTG6IJ
- Online Computer Architecture Courses
  - Intro: http://www.ece.cmu.edu/~ece447/s13/doku.php
  - Advanced: http://www.ece.cmu.edu/~ece740/f11/doku.php
  - Advanced: http://www.ece.cmu.edu/~ece742/doku.php
- Recent Research Papers
  - http://users.ece.cmu.edu/~omutlu/projects.htm
  - http://scholar.google.com/citations?user=7XyGUGkAAAJ&hl=en

# Emerging Memory Technologies



- Major Trends Affecting Main Memory
- Requirements from an Ideal Main Memory System
- Opportunity: Emerging Memory Technologies
- Conclusions
- Discussion

## Major Trends Affecting Main Memory (I)

- Need for main memory capacity and bandwidth increasing
- Main memory energy/power is a key system design concern
- DRAM technology scaling is ending

### Trends: Problems with DRAM as Main Memory

- Need for main memory capacity and bandwidth increasing
  - DRAM capacity hard to scale
- Main memory energy/power is a key system design concern
  - DRAM consumes high power due to leakage and refresh
- DRAM technology scaling is ending
  - DRAM capacity, cost, and energy/power hard to scale




### Requirements from an Ideal Memory System

### Traditional

- Enough capacity
- Low cost
- High system performance (high bandwidth, low latency)

### New

- Technology scalability: lower cost, higher capacity, lower energy
- Energy (and power) efficiency
- QoS support and configurability (for consolidation)

### Requirements from an Ideal Memory System

### Traditional

- Higher capacity
- Continuous low cost
- High system performance (higher bandwidth, low latency)

#### New

- Technology scalability: lower cost, higher capacity, lower energy
- Energy (and power) efficiency
- QoS support and configurability (for consolidation)

### Emerging, resistive memory technologies (NVM) can help

### Review: Solutions to the DRAM Scaling Problem

- Two potential solutions
  - Tolerate DRAM (by taking a fresh look at it)
  - Enable emerging memory technologies to eliminate/minimize DRAM
- Do both
  - Hybrid memory systems

### Solution 1: Tolerate DRAM

- Overcome DRAM shortcomings with
  - System-DRAM co-design
  - Novel DRAM architectures, interface, functions
  - Better waste management (efficient utilization)
- Key issues to tackle
  - Reduce refresh energy
  - Improve bandwidth and latency
  - Reduce waste
  - Enable reliability at low cost
- Liu, Jaiyen, Veras, Mutlu, "RAIDR: Retention-Aware Intelligent DRAM Refresh," ISCA 2012.
- Kim, Seshadri, Lee+, "A Case for Exploiting Subarray-Level Parallelism in DRAM," ISCA 2012.
- Lee+, "Tiered-Latency DRAM: A Low Latency and Low Cost DRAM Architecture," HPCA 2013.
- Liu+, "An Experimental Study of Data Retention Behavior in Modern DRAM Devices" ISCA'13.
- Seshadri+, "RowClone: Fast and Efficient In-DRAM Copy and Initialization of Bulk Data," 2013.

# Solution 2: Emerging Memory Technologies

- Some emerging resistive memory technologies seem more scalable than DRAM (and they are non-volatile)
- Example: Phase Change Memory
  - Expected to scale to 9nm (2022 [ITRS])
  - Expected to be denser than DRAM: can store multiple bits/cell
- But, emerging technologies have shortcomings as well
  - Can they be enabled to replace/augment/surpass DRAM?
- Lee, Ipek, Mutlu, Burger, "Architecting Phase Change Memory as a Scalable DRAM Alternative," ISCA 2009, CACM 2010, Top Picks 2010.
- Meza, Chang, Yoon, Mutlu, Ranganathan, "Enabling Efficient and Scalable Hybrid Memories," IEEE Comp. Arch. Letters 2012.
- Yoon, Meza et al., "Row Buffer Locality Aware Caching Policies for Hybrid Memories," ICCD 2012 Best Paper Award.

## Hybrid Memory Systems



#### Hardware/software manage data allocation and movement to achieve the best of multiple technologies

- Meza+, "Enabling Efficient and Scalable Hybrid Memories," IEEE Comp. Arch. Letters, 2012.
- Yoon, Meza et al., "Row Buffer Locality Aware Caching Policies for Hybrid Memories," ICCD 2012 Best Paper Award.




# The Promise of Emerging Technologies

#### Likely need to replace/augment DRAM with a technology that is

- Technology scalable
- And at least similarly efficient, high performance, and fault-tolerant
  - or can be architected to be so

- Some emerging resistive memory technologies appear promising
  - Phase Change Memory (PCM)?
  - Spin Torque Transfer Magnetic Memory (STT-MRAM)?
  - Memristors?
  - And, maybe there are other ones
  - Can they be enabled to replace/augment/surpass DRAM?



- Major Trends Affecting Main Memory
- Requirements from an Ideal Main Memory System
- Opportunity: Emerging Memory Technologies
  - Background
  - PCM (or Technology X) as DRAM Replacement
  - Hybrid Memory Systems
- Conclusions
- Discussion

### Charge vs. Resistive Memories

- Charge Memory (e.g., DRAM, Flash)
  - Write data by capturing charge Q
  - Read data by detecting voltage V

- Resistive Memory (e.g., PCM, STT-MRAM, memristors)
  - Write data by pulsing current dQ/dt
  - Read data by detecting resistance R

### Limits of Charge Memory

- Difficult charge placement and control
  - Flash: floating gate charge
  - DRAM: capacitor charge, transistor leakage
- Reliable sensing becomes difficult as charge storage unit size reduces



# Emerging Resistive Memory Technologies

#### PCM

- Inject current to change material phase
- Resistance determined by phase

### STT-MRAM

- Inject current to change magnet polarity
- Resistance determined by polarity

### Memristors

- Inject current to change atomic structure
- Resistance determined by atom distance

### What is Phase Change Memory?

- Phase change material (chalcogenide glass) exists in two states:
  - Amorphous: Low optical reflectivity and high electrical resistivity
  - Crystalline: High optical reflectivity and low electrical resistivity



- PCM is resistive memory: High resistance (0), Low resistance (1)
- PCM cell can be switched between states reliably and quickly

## How Does PCM Work?

- Write: change phase via current injection
  - SET: sustained current to heat cell above $T_{cryst}$
  - RESET: cell heated above $T_{melt}$ and quenched
- Read: detect phase via material resistance
  - amorphous/crystalline
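
The write and read behavior above can be sketched as a toy cell model. This is only an illustration: the resistance values, the read threshold, and the names `PCMCell`, `apply_set`, and `apply_reset` are assumptions, not device parameters from the lecture. The bit encoding follows the earlier slide (high resistance is 0, low resistance is 1).

```python
from dataclasses import dataclass

# Hypothetical resistance values, chosen only to illustrate the idea.
R_CRYSTALLINE_OHMS = 1e4   # low resistance  -> logical 1
R_AMORPHOUS_OHMS = 1e6     # high resistance -> logical 0
R_THRESHOLD_OHMS = 1e5     # read reference placed between the two states


@dataclass
class PCMCell:
    resistance_ohms: float = R_AMORPHOUS_OHMS
    writes: int = 0  # each SET or RESET wears the cell

    def apply_set(self) -> None:
        """SET: sustained heating above T_cryst leaves the cell crystalline."""
        self.resistance_ohms = R_CRYSTALLINE_OHMS
        self.writes += 1

    def apply_reset(self) -> None:
        """RESET: heating above T_melt, then quenching, leaves it amorphous."""
        self.resistance_ohms = R_AMORPHOUS_OHMS
        self.writes += 1

    def write(self, bit: int) -> None:
        if bit:
            self.apply_set()
        else:
            self.apply_reset()

    def read(self) -> int:
        """Read: sense the resistance; low resistance means 1."""
        return 1 if self.resistance_ohms < R_THRESHOLD_OHMS else 0
```

Reads do not change the phase, so only `write` increments the wear counter in this sketch.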





Photo Courtesy: Bipin Rajendran, IBM

Slide Courtesy: Moinuddin Qureshi, IBM

## Opportunity: PCM Advantages

### Scales better than DRAM, Flash

- Requires current pulses, which scale linearly with feature size
- Expected to scale to 9nm (2022 [ITRS])
- Prototyped at 20nm (Raoux+, IBM JRD 2008)

### Can be denser than DRAM

- Can store multiple bits per cell due to large resistance range
- Prototypes with 2 bits/cell in ISSCC'08, 4 bits/cell by 2012
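
A rough sketch of why a large resistance range allows multiple bits per cell: with b bits per cell, the range is divided into 2^b levels, and a read picks the nearest level. The resistance range, the log-spaced levels, and the function names below are assumptions for illustration only; real multi-level sensing is more involved.

```python
import math

R_MIN_OHMS = 1e4   # assumed fully crystalline resistance
R_MAX_OHMS = 1e6   # assumed fully amorphous resistance


def level_resistance(value: int, bits: int) -> float:
    """Target resistance for `value`, using 2**bits log-spaced levels.

    Value 0 maps to the highest resistance, matching "high resistance (0)".
    """
    levels = 2 ** bits
    frac = 1.0 - value / (levels - 1)
    return R_MIN_OHMS * (R_MAX_OHMS / R_MIN_OHMS) ** frac


def sense(resistance_ohms: float, bits: int) -> int:
    """Recover the stored value by choosing the nearest level in log-resistance."""
    levels = 2 ** bits
    frac = math.log(resistance_ohms / R_MIN_OHMS) / math.log(R_MAX_OHMS / R_MIN_OHMS)
    value = round((1.0 - frac) * (levels - 1))
    return min(levels - 1, max(0, value))
```

In this toy model, a hypothetical 20% resistance error is still read correctly with 2 bits/cell but lands on a neighboring level with 4 bits/cell, which hints at why denser cells leave smaller sensing margins.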

### Non-volatile

- Retain data for >10 years at 85°C

### No refresh needed, low idle power

### Phase Change Memory Properties

- Surveyed prototypes from 2003-2008 (ITRS, IEDM, VLSI, ISSCC)
- Derived PCM parameters for F=90nm

Lee, Ipek, Mutlu, Burger, "Architecting Phase Change Memory as a Scalable DRAM Alternative," ISCA 2009.

Table 1. Technology survey (published prototypes).

| Parameter*                   | Horii <sup>6</sup> | Ahn <sup>12</sup> | Bedeschi <sup>13</sup> | Oh <sup>14</sup> | Pellizzer <sup>15</sup> | Chen <sup>5</sup>  | Kang <sup>16</sup> | Bedeschi <sup>9</sup> | Lee <sup>10</sup> | Lee <sup>2</sup> |
|------------------------------|--------------------|-------------------|------------------------|------------------|-------------------------|--------------------|--------------------|-----------------------|-------------------|------------------|
| Year                         | 2003               | 2004              | 2004                   | 2005             | 2006                    | 2006               | 2006               | 2008                  | 2008              | **               |
| Process, F (nm)              | **                 | 120               | 180                    | 120              | 90                      | **                 | 100                | 90                    | 90                | 90               |
| Array size (Mbytes)          | **                 | 64                | 8                      | 64               | **                      | **                 | 256                | 256                   | 512               | **               |
| Material                     | GST, N-d           | GST, N-d          | GST                    | GST              | GST                     | GS, N-d            | GST                | GST                   | GST               | GST, N-d         |
| Cell size (µm<sup>2</sup>)   | **                 | 0.290             | 0.290                  | **               | 0.097                   | 60 nm<sup>2</sup>  | 0.166              | 0.097                 | 0.047             | 0.065 to 0.097   |
| Cell size, F<sup>2</sup>     |                    | 20.1              | 9.0                    | **               | 12.0                    |                    | 16.6               | 12.0                  | 5.8               | 9.0 to 12.0      |
| Access device                | **                 | **                | BJT                    | FET              | BJT                     | **                 | FET                | BJT                   | Diode             | BJT              |
| Read time (ns)               | **                 | 70                | 48                     | 68               | **                      | **                 | 62                 |                       | 55                | 48               |
| Read current (µA)            | **                 | **                | 40                     | **               | **                      | **                 | **                 |                       | **                | 40               |
| Read voltage (V)             | **                 | 3.0               | 1.0                    | 1.8              | 1.6                     | **                 | 1.8                |                       | 1.8               | 1.0              |
| Read power (µW)              | **                 | **                | 40                     | **               | **                      | **                 | **                 |                       | **                | 40               |
| Read energy (pJ)             | **                 | **                | 2.0                    | **               | **                      | **                 | **                 |                       | **                | 2.0              |
| Set time (ns)                | 100                | 150               | 150                    | 180              | **                      | 80                 | 300                |                       | 400               | 150              |
| Set current (µA)             | 200                | **                | 300                    | 200              | **                      | 55                 | **                 |                       | **                | 150              |
| Set voltage (V)              | **                 | **                | 2.0                    | **               | **                      | 1.25               | **                 |                       | **                | 1.2              |
| Set power (µW)               | **                 | **                | 300                    | **               | **                      | 34.4               | **                 |                       | **                | 90               |
| Set energy (pJ)              | **                 | **                | 45                     | **               | **                      | 2.8                | **                 |                       | **                | 13.5             |
| Reset time (ns)              | 50                 | 10                | 40                     | 10               | **                      | 60                 | 50                 |                       | 50                | 40               |
| Reset current (µA)           | 600                | 600               | 600                    | 600              | 400                     | 90                 | 600                | 300                   | 600               | 300              |
| Reset voltage (V)            | **                 | **                | 2.7                    | **               | 1.8                     | 1.6                | **                 | 1.6                   | **                | 1.6              |
| Reset power (µW)             | **                 | **                | 1620                   | **               | **                      | 80.4               | **                 |                       | **                | 480              |
| Reset energy (pJ)            | **                 | **                | 64.8                   | **               | **                      | 4.8                | **                 | **                    | **                | 19.2             |
| Write endurance              | 10<sup>7</sup>     | 10<sup>9</sup>    | 10<sup>6</sup>         | **               | 10<sup>8</sup>          | 10<sup>4</sup>     | **                 | 10<sup>5</sup>        | 10<sup>5</sup>    | 10<sup>8</sup>   |

\* BJT: bipolar junction transistor; FET: field-effect transistor; GST: Ge<sub>2</sub>Sb<sub>2</sub>Te<sub>5</sub>; MLC: multilevel cells; N-d: nitrogen doped. \*\* This information is not available in the publication cited.
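
As a quick sanity check on Table 1, the energy rows in the last column (Lee, reference 2) appear consistent with energy = power × pulse time. The sketch below recomputes them; the dictionary layout and function name are illustrative, and the numbers are copied from that column.

```python
# Values copied from the last column of Table 1 (Lee, reference 2).
# Power in microwatts, time in nanoseconds, published energy in picojoules.
LEE_DERIVED = {
    "read":  {"power_uw": 40.0,  "time_ns": 48.0,  "energy_pj": 2.0},
    "set":   {"power_uw": 90.0,  "time_ns": 150.0, "energy_pj": 13.5},
    "reset": {"power_uw": 480.0, "time_ns": 40.0,  "energy_pj": 19.2},
}


def energy_pj(power_uw: float, time_ns: float) -> float:
    """E = P * t. One microwatt for one nanosecond is 1e-15 J, i.e. 1e-3 pJ."""
    return power_uw * time_ns * 1e-3


for op, row in LEE_DERIVED.items():
    computed = energy_pj(row["power_uw"], row["time_ns"])
    print(f"{op:5s}: computed {computed:.2f} pJ, table {row['energy_pj']} pJ")
```

The read value comes out at 1.92 pJ, which the table rounds to 2.0 pJ; set and reset match exactly.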

## Phase Change Memory Properties: Latency

Latency comparable to, but slower than DRAM



### Phase Change Memory Properties

- Dynamic Energy
  - 40 µA Rd, 150 µA Wr
  - 2-43x DRAM, 1x NAND Flash
- Endurance
  - Writes induce phase change at 650°C
  - Contacts degrade from thermal expansion/contraction
  - 10<sup>8</sup> writes per cell
  - 10<sup>-8</sup>x DRAM, 10<sup>3</sup>x NAND Flash
- Cell Size
  - 9-12F<sup>2</sup> using BJT, single-level cells
  - 1.5x DRAM, 2-3x NAND
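
To get a feel for what 10<sup>8</sup> writes per cell means at the system level, here is a back-of-the-envelope bound. It assumes perfectly even wear across all cells, which real workloads do not provide, so it is an optimistic upper limit. The 8 GiB capacity and 1 GB/s sustained write rate are made-up inputs, and the function name is illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def ideal_lifetime_years(endurance_writes: float,
                         capacity_bytes: float,
                         write_bytes_per_second: float) -> float:
    """Upper bound on lifetime when writes are spread evenly over all cells."""
    total_writable_bytes = endurance_writes * capacity_bytes
    return total_writable_bytes / write_bytes_per_second / SECONDS_PER_YEAR


# 1e8 writes per cell is from the slide; capacity and write rate are assumptions.
years = ideal_lifetime_years(1e8, 8 * 2**30, 1e9)
print(f"ideal lifetime: {years:.1f} years")
```

Because the bound scales inversely with write traffic and assumes uniform wear, write-heavy or skewed workloads can fall far below it, which is why write reduction and wear management matter for PCM.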

### Phase Change Memory: Pros and Cons

- Pros over DRAM
  - Better technology scaling
  - Non-volatility
  - Low idle power (no refresh)
- Cons
  - Higher latencies: ~4-15x DRAM (especially write)
  - Higher active energy: ~2-50x DRAM (especially write)
  - Lower endurance (a cell dies after $\sim 10^8$ writes)
- Challenges in enabling PCM as DRAM replacement/helper:
  - Mitigate PCM shortcomings
  - Find the right way to place PCM in the system
  - Ensure secure and fault-tolerant PCM operation

### PCM-based Main Memory: Research Challenges

- Where to place PCM in the memory hierarchy?
  - Hybrid OS controlled PCM-DRAM
  - Hybrid OS controlled PCM and hardware-controlled DRAM
  - Pure PCM main memory
- How to mitigate shortcomings of PCM?
- How to minimize amount of DRAM in the system?
- How to take advantage of (byte-addressable and fast) nonvolatile main memory?
- Can we design specific-NVM-technology-agnostic techniques?

## PCM-based Main Memory (I)

How should PCM-based (main) memory be organized?



Hybrid PCM+DRAM [Qureshi+ ISCA'09, Dhiman+ DAC'09, Meza+ IEEE CAL'12]:

- How to partition/migrate data between PCM and DRAM

### Hybrid Memory Systems: Challenges

- Partitioning
  - Should DRAM be a cache or main memory, or configurable?
  - What fraction? How many controllers?
- Data allocation/movement (energy, performance, lifetime)
  - Who manages allocation/movement?
  - What are good control algorithms?
  - How do we prevent degradation of service due to wearout?
- Design of cache hierarchy, memory controllers, OS
  - Mitigate PCM shortcomings, exploit PCM advantages
- Design of PCM/DRAM chips and modules
  - Rethink the design of PCM/DRAM with new requirements

## PCM-based Main Memory (II)

How should PCM-based (main) memory be organized?



Pure PCM main memory [Lee et al., ISCA'09, Top Picks'10]:

- How to redesign entire hierarchy (and cores) to overcome PCM shortcomings



## Aside: STT-RAM Basics

- Magnetic Tunnel Junction (MTJ)
  - Reference layer: Fixed
  - Free layer: Parallel or anti-parallel
- Cell
  - Access transistor, bit/sense lines
- Read and Write
  - Read: Apply a small voltage across bitline and senseline; read the current.
  - Write: Push large current through MTJ.
    - Direction of current determines new orientation of the free layer.
- Kultursay et al., "Evaluating STT-RAM as an Energy-Efficient Main Memory Alternative," ISPASS 2013





### Aside: STT MRAM: Pros and Cons

#### Pros over DRAM

- Better technology scaling
- Non-volatility
- Low idle power (no refresh)

### Cons

- Higher write latency
- Higher write energy
- Reliability?
- Another level of freedom
  - Can trade off non-volatility for lower write latency/energy (by reducing the size of the MTJ)




# An Initial Study: Replace DRAM with PCM

- Lee, Ipek, Mutlu, Burger, "Architecting Phase Change Memory as a Scalable DRAM Alternative," ISCA 2009.
  - Surveyed prototypes from 2003-2008 (e.g. IEDM, VLSI, ISSCC)
  - Derived "average" PCM parameters for F=90nm



### Results: Naïve Replacement of DRAM with PCM

- Replace DRAM with PCM in a 4-core, 4MB L2 system
- PCM organized the same as DRAM: row buffers, banks, peripherals
- 1.6x delay, 2.2x energy, 500-hour average lifetime





 Lee, Ipek, Mutlu, Burger, "Architecting Phase Change Memory as a Scalable DRAM Alternative," ISCA 2009.

# Architecting PCM to Mitigate Shortcomings

- Idea 1: Use multiple narrow row buffers in each PCM chip
  - → Reduces array reads/writes → better endurance, latency, energy
- Idea 2: Write into array at cache block or word granularity
  - → Reduces unnecessary wear
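
The two ideas can be sketched together as a small bank model: several narrow row buffers managed in LRU order, each with one dirty bit per cache block, so that an eviction writes back only the blocks that were actually modified. This is a simplified illustration, not the organization evaluated in the paper; the buffer width, block size, LRU policy, and class names are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List

BLOCK_BYTES = 64          # assumed cache block size
BLOCKS_PER_BUFFER = 8     # assumed width of one narrow row buffer (512 B)


@dataclass
class RowBuffer:
    row: int
    dirty: List[bool] = field(default_factory=lambda: [False] * BLOCKS_PER_BUFFER)


class PCMBank:
    def __init__(self, num_buffers: int = 4) -> None:
        self.num_buffers = num_buffers
        self.buffers: Dict[int, RowBuffer] = {}  # row -> buffer, oldest first
        self.array_reads = 0           # rows fetched from the PCM array
        self.array_block_writes = 0    # cache blocks written to the PCM array

    def _open(self, row: int) -> RowBuffer:
        if row in self.buffers:
            buf = self.buffers.pop(row)          # hit: move to most recent
        else:
            if len(self.buffers) == self.num_buffers:
                self._evict(next(iter(self.buffers)))  # least recently used
            self.array_reads += 1                # miss: read row from array
            buf = RowBuffer(row)
        self.buffers[row] = buf
        return buf

    def _evict(self, row: int) -> None:
        buf = self.buffers.pop(row)
        # Partial write: only dirty blocks go back to the array.
        self.array_block_writes += sum(buf.dirty)

    def read(self, row: int, block: int) -> None:
        self._open(row)

    def write(self, row: int, block: int) -> None:
        self._open(row).dirty[block] = True

    def flush(self) -> None:
        for row in list(self.buffers):
            self._evict(row)
```

With a single buffer, alternating between rows forces repeated array reads; with several buffers the same access stream hits more often. In both cases, writing back only dirty blocks keeps array writes proportional to the data actually modified rather than to the buffer width.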



## Results: Architected PCM as Main Memory

- 1.2x delay, 1.0x energy, 5.6-year average lifetime
- Scaling improves energy, endurance, density



- Caveat 1: Worst-case lifetime is much shorter (no guarantees)
- Caveat 2: Intensive applications see large performance and energy hits
- Caveat 3: Optimistic PCM parameters?




## Hybrid Memory Systems



#### Hardware/software manage data allocation and movement to achieve the best of multiple technologies

Meza, Chang, Yoon, Mutlu, Ranganathan, "Enabling Efficient and Scalable Hybrid Memories," IEEE Comp. Arch. Letters, 2012.

### One Option: DRAM as a Cache for PCM

- PCM is main memory; DRAM caches memory rows/blocks
  - Benefits: Reduced latency on DRAM cache hit; write filtering
- Memory controller hardware manages the DRAM cache
  - Benefit: Eliminates system software overhead
- Three issues:
  - What data should be placed in DRAM versus kept in PCM?
  - What is the granularity of data movement?
  - How to design a low-cost hardware-managed DRAM cache?
- Two idea directions:
  - Locality-aware data placement [Yoon+, ICCD 2012]
  - Cheap tag stores and dynamic granularity [Meza+, IEEE CAL 2012]
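
One possible reading of "locality-aware data placement" is sketched below: rows that keep missing in the PCM row buffer pay the slow PCM array latency repeatedly, so they are the ones promoted into the DRAM cache, while rows that mostly hit in the row buffer stay in PCM. This is a toy policy for illustration, not the mechanism from the cited paper; the promotion threshold, the single open row, the absence of DRAM eviction, and all names are assumptions.

```python
from collections import defaultdict
from typing import Dict, Optional, Set

MISS_THRESHOLD = 2   # assumed: promote a row after this many PCM row-buffer misses


class HybridController:
    def __init__(self, dram_rows: int) -> None:
        self.dram_rows = dram_rows
        self.dram: Set[int] = set()               # rows currently cached in DRAM
        self.open_pcm_row: Optional[int] = None   # row held in the PCM row buffer
        self.pcm_misses: Dict[int, int] = defaultdict(int)
        self.stats = {"dram_hit": 0, "pcm_row_hit": 0, "pcm_row_miss": 0}

    def access(self, row: int) -> str:
        if row in self.dram:
            self.stats["dram_hit"] += 1
            return "dram_hit"
        if row == self.open_pcm_row:
            # Row-buffer hit: served from the buffer, no PCM array access.
            self.stats["pcm_row_hit"] += 1
            return "pcm_row_hit"
        # Row-buffer miss: pays the PCM array latency.
        self.stats["pcm_row_miss"] += 1
        self.open_pcm_row = row
        self.pcm_misses[row] += 1
        if self.pcm_misses[row] >= MISS_THRESHOLD and len(self.dram) < self.dram_rows:
            self.dram.add(row)   # promote; this sketch has no eviction policy
        return "pcm_row_miss"


ctrl = HybridController(dram_rows=2)
# Row 0 is streamed (high row-buffer locality); rows 1 and 2 ping-pong.
for r in [0, 0, 0, 0, 1, 2, 1, 2, 1, 2]:
    ctrl.access(r)
print(ctrl.stats, sorted(ctrl.dram))
```

In this trace the streamed row never enters DRAM because its accesses are already served from the PCM row buffer, while the two conflicting rows are promoted after their second miss and hit in DRAM afterwards.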