Better NVIDIA DCGM Dashboard
12/12/2024
1
Host Metrics
>=11.3.0
Prometheus
This dashboard is based on the original DCGM-Exporter dashboard by NVIDIA, but comes with an improved layout and a few additional visualizations.
Changes over upstream dashboard
- Better layout, thinner lines
- Uses the `Hostname` label for the host variable instead of `instance`
- Legend labels are prefixed with hostnames
- Displays cumulative energy draw over last 1h and last 24h
- Larger range on total GPU power gauge (you should adjust this to your total max wattage)
- Displays GPU memory usage as percentage in addition to absolute values
- Power, GPU, and memory utilization graphs use stacked y axes
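The cumulative energy and memory-percentage panels come down to simple arithmetic on the exported values. The following is a minimal Python sketch of that arithmetic, not the dashboard's actual queries. It assumes that DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION is a counter in millijoules since boot and that DCGM_FI_DEV_FB_USED and DCGM_FI_DEV_FB_FREE are reported in MiB, as described in the dcgm-exporter metric help text. The function names are hypothetical, and the percentage ignores any reserved framebuffer memory.

```python
MJ_PER_WH = 3_600_000  # 1 Wh = 3600 J = 3,600,000 mJ


def energy_wh(counter_start_mj: float, counter_end_mj: float) -> float:
    """Energy drawn between two samples of the energy counter, in watt-hours."""
    delta_mj = counter_end_mj - counter_start_mj
    if delta_mj < 0:
        # The counter went backwards (for example after a reboot);
        # count only what has accumulated since the reset.
        delta_mj = counter_end_mj
    return delta_mj / MJ_PER_WH


def fb_used_percent(fb_used_mib: float, fb_free_mib: float) -> float:
    """GPU memory usage as a percentage of used + free framebuffer memory."""
    total = fb_used_mib + fb_free_mib
    if total <= 0:
        return 0.0
    return 100.0 * fb_used_mib / total


if __name__ == "__main__":
    # A GPU averaging 300 W for one hour adds 1,080,000,000 mJ to the counter.
    print(energy_wh(1_000_000, 1_081_000_000))  # energy over the window in Wh
    print(fb_used_percent(20480, 61440))        # share of framebuffer in use
```

Dividing the watt-hour figure by 1000 gives kWh, which is the more convenient unit for a 24h window on a multi-GPU host.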
Used Metrics (8)
DCGM_FI_DEV_GPU_TEMP
DCGM_FI_DEV_POWER_USAGE
DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION
DCGM_FI_DEV_FB_USED
DCGM_FI_DEV_FB_FREE
DCGM_FI_DEV_GPU_UTIL
DCGM_FI_PROF_PIPE_TENSOR_ACTIVE
DCGM_FI_DEV_SM_CLOCK
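
As an illustration of how these metrics can be combined with the Hostname label, here is a small Python sketch that builds PromQL expressions for energy drawn over a window and for memory usage as a percentage. The expressions are assumptions written for this example and may differ from the queries shipped in the dashboard. The helper names and the host value are hypothetical, and the division by 3600000 assumes the energy counter is in millijoules.

```python
def energy_wh_query(host: str, window: str = "1h") -> str:
    """PromQL for energy drawn by all GPUs of one host over `window`, in Wh."""
    selector = f'DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION{{Hostname="{host}"}}'
    # increase() gives the counter growth over the window (assumed mJ);
    # dividing by 3,600,000 converts mJ to Wh.
    return f"sum(increase({selector}[{window}])) / 3600000"


def fb_used_percent_query(host: str) -> str:
    """PromQL for per-GPU framebuffer usage as a percentage of used + free."""
    used = f'DCGM_FI_DEV_FB_USED{{Hostname="{host}"}}'
    free = f'DCGM_FI_DEV_FB_FREE{{Hostname="{host}"}}'
    return f"100 * {used} / ({used} + {free})"


if __name__ == "__main__":
    # "gpu-node-1" is a placeholder host name.
    print(energy_wh_query("gpu-node-1", "24h"))
    print(fb_used_percent_query("gpu-node-1"))
```

In a Grafana panel the host would normally come from the dashboard variable rather than a literal string; the literal is used here only to keep the example self-contained.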