SLURM Dashboard
This dashboard can be used to visualize the status of a Linux cluster managed through SLURM.
SLURM is a scalable cluster management and job scheduling system for Linux clusters.
In order to use this dashboard you need to install the SLURM exporter for Prometheus.
The latest version of the dashboard should be used only with the most recent version of the SLURM exporter.
The following metrics will be displayed:
- State of CPUs/GPUs
- State of the Nodes
- Status of the Jobs: also includes information about Running/Pending/Suspended jobs per Account/User
- Scheduler Information
- Share Information
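As a rough illustration of how a CPU state panel could be derived from the exporter's gauges, the sketch below parses a few lines of Prometheus text exposition output and computes the share of allocated CPUs. The sample values are invented for the example, the helper names `parse_samples` and `cpu_allocation_percent` are hypothetical, and the parser assumes samples without timestamps; the dashboard's actual panel queries may differ.

```python
def parse_samples(text):
    """Parse samples from Prometheus text exposition format into a dict.

    Assumes each sample line is '<name> <value>' with no trailing timestamp.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


def cpu_allocation_percent(samples):
    """Return allocated CPUs as a percentage of all CPUs (0.0 if none)."""
    total = samples.get("slurm_cpus_total", 0.0)
    if total == 0:
        return 0.0
    return 100.0 * samples.get("slurm_cpus_alloc", 0.0) / total


# Made-up sample values, for illustration only.
SAMPLE = """\
# TYPE slurm_cpus_total gauge
slurm_cpus_total 128
slurm_cpus_alloc 96
slurm_cpus_idle 32
"""

if __name__ == "__main__":
    print(cpu_allocation_percent(parse_samples(SAMPLE)))  # 75.0
```

The same ratio could presumably be expressed directly in a panel query by dividing `slurm_cpus_alloc` by `slurm_cpus_total`.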
Used Metrics (44)
slurm_account_fairshare
slurm_nodes_alloc
slurm_nodes_comp
slurm_nodes_mix
slurm_nodes_idle
slurm_nodes_down
slurm_nodes_drain
slurm_nodes_maint
slurm_nodes_resv
slurm_nodes_err
slurm_nodes_fail
slurm_queue_completing
slurm_queue_running
slurm_queue_pending
slurm_queue_completed
slurm_queue_timeout
slurm_queue_failed
slurm_queue_node_fail
slurm_queue_suspended
slurm_queue_cancelled
slurm_queue_preempted
slurm_partition_jobs_pending
slurm_account_jobs_running
slurm_account_jobs_pending
slurm_user_jobs_running
slurm_user_jobs_pending
slurm_account_cpus_running
slurm_user_cpus_running
slurm_cpus_total
slurm_cpus_alloc
slurm_cpus_idle
slurm_partition_cpus_allocated
slurm_partition_cpus_idle
slurm_scheduler_threads
slurm_scheduler_queue_size
slurm_scheduler_dbd_queue_size
slurm_scheduler_last_cycle
slurm_scheduler_mean_cycle
slurm_scheduler_backfill_last_cycle
slurm_scheduler_backfill_mean_cycle
slurm_scheduler_backfill_depth_mean
slurm_scheduler_backfilled_heterogeneous_total
slurm_scheduler_backfilled_jobs_since_start_total
slurm_scheduler_backfilled_jobs_since_cycle_total
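
Because the dashboard depends on the metrics listed above, one way to check an exporter version before importing the dashboard is to compare its scrape output against the expected names. The following is a minimal sketch under stated assumptions: `REQUIRED` holds only a small subset of the list for brevity, the function names are hypothetical, and the sample text (including the `account` label) is invented for illustration rather than taken from a real exporter.

```python
import re

# A small subset of the metric names listed above; extend as needed.
REQUIRED = {
    "slurm_cpus_total",
    "slurm_nodes_idle",
    "slurm_queue_pending",
    "slurm_scheduler_threads",
}

# Metric names in the Prometheus text format start with a letter, '_' or ':'.
_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def exposed_metric_names(text):
    """Collect the metric names that appear in exposition-format text."""
    names = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        match = _NAME_RE.match(line)
        if match:
            names.add(match.group(0))
    return names


def missing_metrics(text, required=REQUIRED):
    """Return the required metric names absent from the scrape output."""
    return sorted(set(required) - exposed_metric_names(text))


# Invented scrape output, for illustration only.
SAMPLE = """\
# TYPE slurm_cpus_total gauge
slurm_cpus_total 128
slurm_queue_pending 7
slurm_account_fairshare{account="physics"} 0.5
"""

if __name__ == "__main__":
    print(missing_metrics(SAMPLE))
```

In practice the text passed to `missing_metrics` would come from the exporter's metrics endpoint; an empty result suggests the exporter exposes every name in `REQUIRED`.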