Towards Improved Power Management in Cloud GPUs

Abstract

As modern server GPUs are increasingly power-intensive, better power management mechanisms can significantly reduce power consumption, capital costs, and carbon emissions in large cloud datacenters. This letter uses diverse datacenter workloads to study the power management capabilities of modern GPUs. We find that current GPU management mechanisms have limited compatibility and monitoring support under cloud virtualization. Their implementations of Dynamic Voltage and Frequency Scaling (DVFS) and power capping are also sub-optimal, imprecise, and non-intuitive. Consequently, efficient GPU power management is not widely deployed in clouds today. To address these issues, we make actionable recommendations for GPU vendors and researchers.

Publication
IEEE Computer Architecture Letters. 2023.
Akshitha Sriraman
Assistant Professor

I am an Assistant Professor in the Department of Electrical and Computer Engineering at Carnegie Mellon University. My research bridges computer architecture and software systems, with a focus on making datacenter-scale web systems more efficient, sustainable, and equitable (via solutions that span the systems stack).