NVIDIA DCGM Exporter Dashboard

Created 5/2/2020
Updated 5/6/2020
Revision 1
Grafana Version >=6.7.3
Datasources Prometheus

Introduction

This dashboard displays GPU metrics collected by NVIDIA dcgm-exporter. The exporter exposes a metrics endpoint, which is registered with Prometheus as a separate scrape target via a ServiceMonitor.

Quickstart

helm install stable/prometheus-operator --generate-name \
    --set "prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false"

kubectl create -f https://raw.githubusercontent.com/NVIDIA/gpu-monitoring-tools/2.0.0-rc.8/dcgm-exporter.yaml
kubectl create -f https://raw.githubusercontent.com/NVIDIA/gpu-monitoring-tools/2.0.0-rc.8/service-monitor.yaml
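
Both manifests above are pinned to the same release tag. As a minimal sketch, the tag can be kept in one place so the two URLs stay in sync when it changes. The helper name dcgm_manifest_url and the variable DCGM_TAG are hypothetical and not part of the dcgm-exporter tooling; the kubectl lines are left commented out because they modify the cluster in the current kubectl context.

```shell
# Hypothetical helper (not part of dcgm-exporter): build the raw manifest
# URL for a given release tag and manifest file name.
dcgm_manifest_url() {
    # $1 = release tag (e.g. 2.0.0-rc.8), $2 = manifest file name
    printf 'https://raw.githubusercontent.com/NVIDIA/gpu-monitoring-tools/%s/%s\n' "$1" "$2"
}

# The tag used in the quickstart commands.
DCGM_TAG="2.0.0-rc.8"

# Uncomment to create both resources from the pinned tag:
# kubectl create -f "$(dcgm_manifest_url "$DCGM_TAG" dcgm-exporter.yaml)"
# kubectl create -f "$(dcgm_manifest_url "$DCGM_TAG" service-monitor.yaml)"
```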

More information is available in the dcgm-exporter README.

Used Metrics 7

  • DCGM_FI_DEV_GPU_TEMP
  • DCGM_FI_DEV_POWER_USAGE
  • DCGM_FI_DEV_SM_CLOCK
  • DCGM_FI_DEV_MEM_CLOCK
  • DCGM_FI_DEV_GPU_UTIL
  • DCGM_FI_DEV_MEM_COPY_UTIL
  • DCGM_FI_DEV_FB_USED
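
If a panel stays empty, one way to narrow it down is to check whether the exporter actually emits all seven metrics listed above. The following is a rough sketch, not an official tool: the function name missing_dcgm_metrics is hypothetical, and it assumes the input is Prometheus text exposition format, where each sample line begins with the metric name followed by "{" or a space.

```shell
# Hypothetical helper: read Prometheus exposition-format text on stdin and
# print each of the seven dashboard metrics that has no sample line.
# The body runs in a subshell so its variables do not leak to the caller.
missing_dcgm_metrics() (
    input="$(cat)"
    for metric in \
        DCGM_FI_DEV_GPU_TEMP \
        DCGM_FI_DEV_POWER_USAGE \
        DCGM_FI_DEV_SM_CLOCK \
        DCGM_FI_DEV_MEM_CLOCK \
        DCGM_FI_DEV_GPU_UTIL \
        DCGM_FI_DEV_MEM_COPY_UTIL \
        DCGM_FI_DEV_FB_USED
    do
        # "# HELP" and "# TYPE" comment lines start with "#", so they never match.
        if ! printf '%s\n' "$input" | grep -Eq "^${metric}[{ ]"; then
            echo "$metric"
        fi
    done
)
```

A possible usage, assuming the exporter has been made reachable on localhost port 9400 (for example with kubectl port-forward; both the port and the forwarding step are assumptions to verify against the dcgm-exporter README): curl -s localhost:9400/metrics | missing_dcgm_metrics. Empty output means all seven metrics are present.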