AI Compute - GPU Resource Monitoring Overview - 20241127 (AI智算-GPU资源监控概览-20241127)
1,016
11/27/2024
11/30/2024
3
Grafana >=10.3.3
Prometheus
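Assuming the `>=10.3.3` constraint above is the minimum Grafana version this dashboard requires, you can check whether an installed version satisfies it by comparing the numeric version components. The sketch below is minimal and has clear limits. It handles only plain `MAJOR.MINOR.PATCH` strings. It drops pre-release and build suffixes (such as `-beta1`) instead of ranking them. The function names are hypothetical and do not come from any Grafana tool.

```python
import re


def parse_version(version: str) -> tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH' (optionally prefixed with 'v') into an int tuple.

    Any '-prerelease' or '+build' suffix is dropped, so '10.4.0-beta1'
    compares equal to '10.4.0' (a simplification, not full semver ordering).
    """
    core = re.split(r"[-+]", version.strip().lstrip("v"), maxsplit=1)[0]
    parts = [int(p) for p in core.split(".")]
    parts += [0] * (3 - len(parts))  # treat '10.3' as '10.3.0'
    return tuple(parts[:3])


def satisfies_min(installed: str, constraint: str = ">=10.3.3") -> bool:
    """Return True if `installed` meets a single '>=X.Y.Z' constraint."""
    if not constraint.startswith(">="):
        raise ValueError(f"only '>=' constraints are supported: {constraint!r}")
    # Tuples compare element by element, giving the usual version ordering.
    return parse_version(installed) >= parse_version(constraint[2:])


if __name__ == "__main__":
    for v in ["10.3.3", "10.2.9", "11.0.0", "10.4.0-beta1"]:
        print(v, satisfies_min(v))
```

Tuple comparison keeps the check short. A real deployment would more likely rely on a dedicated version-parsing library.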
Used Metrics: 15
DCGM_FI_DEV_GPU_UTIL
DCGM_FI_DEV_GPU_TEMP
DCGM_FI_DEV_POWER_USAGE
host_ip
kube_pod_info
kube_node_info
Hostname
node
container_cpu_usage_seconds_total
kube_node_status_allocatable
container_memory_working_set_bytes
DCGM_FI_DEV_MEM_CLOCK
DCGM_FI_DEV_SM_CLOCK
DCGM_FI_DEV_FB_USED
DCGM_FI_DEV_FB_FREE
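The sketch below illustrates how a dashboard like this might combine some of the listed series. This page does not show the actual panel queries. The sketch does two things:

- It derives per-GPU framebuffer (GPU memory) utilization from `DCGM_FI_DEV_FB_USED` and `DCGM_FI_DEV_FB_FREE`.
- It turns the monotonic counter `container_cpu_usage_seconds_total` into a per-second rate and compares that rate with a node's allocatable CPU, as reported by `kube_node_status_allocatable`.

It runs on hand-written sample values, not a live Prometheus server. The sample data, GPU keys, and helper names are hypothetical.

```python
# Hypothetical sample values keyed by GPU index (MiB), standing in for
# DCGM_FI_DEV_FB_USED / DCGM_FI_DEV_FB_FREE as scraped by Prometheus.
FB_USED = {"0": 30000.0, "1": 0.0}
FB_FREE = {"0": 10000.0, "1": 40000.0}


def fb_utilization(used: dict[str, float], free: dict[str, float]) -> dict[str, float]:
    """Framebuffer utilization per GPU in percent: used / (used + free) * 100.

    Mirrors the PromQL pattern
      DCGM_FI_DEV_FB_USED / (DCGM_FI_DEV_FB_USED + DCGM_FI_DEV_FB_FREE) * 100
    and ignores any memory reserved outside these two fields.
    """
    result = {}
    for gpu, u in used.items():
        total = u + free.get(gpu, 0.0)
        result[gpu] = 100.0 * u / total if total > 0 else 0.0
    return result


def counter_rate(samples: list[tuple[float, float]]) -> float:
    """Average per-second increase of a monotonic counter.

    `samples` is a time-sorted list of (unix_timestamp, counter_value) pairs.
    A drop in value is treated as a counter reset (the new value counts as the
    increase since the reset), similar to PromQL rate(), but without rate()'s
    extrapolation to the edges of the range window.
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


if __name__ == "__main__":
    print(fb_utilization(FB_USED, FB_FREE))  # {'0': 75.0, '1': 0.0}
    # Hypothetical container_cpu_usage_seconds_total samples: 120 CPU-seconds over 60 s.
    cores_used = counter_rate([(0.0, 100.0), (60.0, 220.0)])
    allocatable_cores = 8.0  # hypothetical kube_node_status_allocatable{resource="cpu"}
    print(f"{100 * cores_used / allocatable_cores:.1f}% of allocatable CPU")  # 25.0% of allocatable CPU
```

In Prometheus itself, the same ideas would normally be expressed directly in PromQL, for example `rate(container_cpu_usage_seconds_total[5m])`. Computing them client-side, as above, only makes the arithmetic explicit.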