Metrics Overview Cloud
65,180

Created 3/8/2022
Updated 11/4/2025
Revision 40
Categories
AWS, Azure, Host Metrics, Web Servers
Grafana Version >=11.4.0-79828
Datasources
Prometheus

Description

This dashboard gives a consolidated view of service performance and resource utilization, combining latency, error, throughput, and resource metrics to show operational health at a glance. It presents the latency distribution as percentiles, throughput via requests_per_second, and CPU and memory usage (e.g., cpu_used, mem_used, cpu_reserved, mem_reserved) to help identify bottlenecks and capacity needs. It also tracks deployment and infrastructure state with panels such as Progressing Deployments, Replica Count, and Egress, so that rollout problems and external connectivity issues can be spotted quickly. Key features include percentile-based latency panels, error categorization by response_class, and a mix of billable and used resource metrics for cost-aware scaling.
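As an illustration of how a latency percentile can be estimated from cumulative histogram buckets such as those exposed by request_duration_ms_bucket, here is a minimal Python sketch. It assumes Prometheus-style cumulative buckets (an upper bound paired with a running count, ending in a +Inf bucket) and uses linear interpolation within a bucket, similar in spirit to Prometheus's histogram_quantile. The bucket bounds and counts below are made up for the example, and the dashboard's actual panel queries are not shown in this listing, so treat this as a sketch rather than the dashboard's own method.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile (0 <= q <= 1) from cumulative buckets.

    `buckets` is a list of (upper_bound_ms, cumulative_count) pairs and is
    assumed to include a final +Inf bucket holding the total count.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be in [0, 1]")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations, so no meaningful quantile

    rank = q * total  # position of the target observation
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Target falls in the +Inf bucket: report the highest
                # finite bound, since nothing more precise is known.
                return prev_bound
            if count == prev_count:
                return bound
            # Linear interpolation inside the matching bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical sample data: (upper bound in ms, cumulative request count).
sample = [(100, 50), (250, 90), (500, 99), (math.inf, 100)]
p50 = estimate_quantile(0.50, sample)
p95 = estimate_quantile(0.95, sample)
print(f"p50 ~ {p50:.1f} ms, p95 ~ {p95:.1f} ms")
```

The estimate is only as precise as the bucket boundaries allow: a percentile that lands in a wide bucket is interpolated across its whole width, so bucket layout matters when reading percentile panels.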

Source Grafana.com

Used Metrics 28

  • agent_peers_count

  • agent_rx_bytes_total

  • agent_services_count

  • agent_tx_bytes_total

  • container_restarts

  • cpu_billable

  • cpu_reserved

  • cpu_used

  • cron_executions

  • domain_warnings

  • egress

  • load_balancer

  • logs_storage_mb

  • mem_billable

  • mem_reserved

  • mem_used

  • percentile

  • replica_count

  • request_duration_ms_bucket

  • requests_per_second

  • response_class

  • threat_detection_alerts

  • threat_detection_forward_total

  • tracing_storage_mb

  • volume_set_capacity_billable

  • workload_progress_failure

  • workload_ready_replicas

  • workload_rescheduled_replicas
