GPU Nodes v2
Based on https://grafana.com/grafana/dashboards/9957
- List only GPU nodes
- Fix PCIe/IB metrics
- Fix disk/PCIe/Ethernet/IB units
- Fix some titles
- Fix Omni-Path traffic bandwidth
Tip: Network Traffic does not include RDMA traffic. InfiniBand Traffic includes both RDMA and non-RDMA traffic over InfiniBand and Omni-Path.
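To make the tip concrete, here is a minimal sketch of how the two traffic panels could draw on different counters from the metrics used by this dashboard. The helper `rate_expr`, the `instance` label, the `gpu01:9100` target, and the 5m window are all assumptions for illustration. The dashboard's real panel queries are not shown here and may differ.

```python
# Hypothetical helper: builds a PromQL expression for the summed per-second
# byte rate of a counter on one node. The "instance" label and the 5m window
# are assumptions, not taken from the dashboard's actual panel queries.
def rate_expr(metric: str, instance: str, window: str = "5m") -> str:
    return f'sum(rate({metric}{{instance="{instance}"}}[{window}]))'


NODE = "gpu01:9100"  # hypothetical node exporter target

# Network Traffic: interface counters, which do not include RDMA traffic.
network_rx = rate_expr("node_network_receive_bytes_total", NODE)
network_tx = rate_expr("node_network_transmit_bytes_total", NODE)

# InfiniBand Traffic: port counters, which cover RDMA and non-RDMA traffic.
ib_rx = rate_expr("node_infiniband_port_data_received_bytes_total", NODE)
ib_tx = rate_expr("node_infiniband_port_data_transmitted_bytes_total", NODE)

for expr in (network_rx, network_tx, ib_rx, ib_tx):
    print(expr)
```

Comparing the two pairs for the same node gives a rough sense of how much traffic bypasses the regular network counters.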
Used Metrics (33)
- node_load1
- node_load5
- node_load15
- dcgm_power_usage
- dcgm_nvlink_bandwidth_total
- dcgm_nv
- node_memory_MemTotal_bytes
- node_memory_MemFree_bytes
- node_memory_Buffers_bytes
- node_memory_Cached_bytes
- dcgm_gpu_temp
- dcgm_pcie_tx_throughput
- dcgm_pcie_rx_throughput
- node_filesystem_avail_bytes
- node_filesystem_size_bytes
- dcgm_gpu_utilization
- node_network_receive_bytes_total
- node_network_receive_bytes
- node_network_transmit_bytes_total
- node_network_transmit_bytes
- node_disk_written_bytes_total
- node_disk_sectors_written
- node_disk_read_bytes_total
- node_disk_sectors_read
- dcgm_mem_copy_utilization
- node_infiniband_port_data_transmitted_bytes_total
- node_infiniband_port_data_received_bytes_total
- node_uname_info
- dcgm_fb_used
- dcgm_fb_free
- node_hwmon_power_average_watt
- dcgm_sm_clock
- dcgm_memory_clock