Cilium v1.9 Agent Metrics

Created 12/8/2020
Updated 12/8/2020
Revision 1
Grafana Version >=7.0.1
Datasources
Prometheus

Description

This dashboard monitors the health and performance of Cilium agents. It aggregates process and system metrics to show resource usage, API latency, and BPF activity, both per node and across all nodes. Key panels track cilium_process_cpu_seconds_total for CPU usage, cilium_agent_api_process_time_seconds_sum and cilium_agent_api_process_time_seconds_count for API latency, and cilium_bpf_maps_virtual_memory_max_bytes together with cilium_bpf_progs_virtual_memory_max_bytes for BPF memory pressure. Other panels cover open file descriptors, BPF system call latency, and BPF map operations, which helps identify bottlenecks and resource-exhaustion risks quickly.
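Latency metrics exposed as a _sum/_count pair, such as cilium_agent_api_process_time_seconds_sum and cilium_agent_api_process_time_seconds_count, are usually turned into an average by dividing the growth of the sum by the growth of the count over the same window. The sketch below illustrates that arithmetic in Python. The Sample class, the average_latency helper, and the PromQL string with its 1m window are assumptions made for illustration; this listing does not show the dashboard's actual panel queries.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sample:
    """One scrape of a *_sum / *_count counter pair (hypothetical container)."""
    total_seconds: float  # value of ..._seconds_sum at scrape time
    observations: float   # value of ..._seconds_count at scrape time


def average_latency(earlier: Sample, later: Sample) -> Optional[float]:
    """Mean seconds per observation between two scrapes.

    Returns None when there are no new observations or when a counter
    went backwards (for example after an agent restart).
    """
    delta_count = later.observations - earlier.observations
    delta_sum = later.total_seconds - earlier.total_seconds
    if delta_count <= 0 or delta_sum < 0:
        return None
    return delta_sum / delta_count


# Assumed PromQL equivalent of the calculation above (not taken from the dashboard).
ASSUMED_QUERY = (
    "rate(cilium_agent_api_process_time_seconds_sum[1m])"
    " / rate(cilium_agent_api_process_time_seconds_count[1m])"
)

if __name__ == "__main__":
    first = Sample(total_seconds=10.0, observations=100.0)
    second = Sample(total_seconds=13.0, observations=160.0)
    print(average_latency(first, second))
```

The same division applies to the other _sum/_count pairs in the metric list below, for example the BPF syscall, controller run, and kvstore operation durations.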

Source Grafana.com

Used Metrics 56

  • cilium_agent_api_process_time_seconds_count

  • cilium_agent_api_process_time_seconds_sum

  • cilium_bpf_map_ops_total

  • cilium_bpf_maps_virtual_memory_max_bytes

  • cilium_bpf_progs_virtual_memory_max_bytes

  • cilium_bpf_syscall_duration_seconds_count

  • cilium_bpf_syscall_duration_seconds_sum

  • cilium_controllers_failing

  • cilium_controllers_runs_duration_seconds_count

  • cilium_controllers_runs_duration_seconds_sum

  • cilium_controllers_runs_total

  • cilium_datapath_conntrack_gc_entries

  • cilium_datapath_errors_total

  • cilium_drop_bytes_total

  • cilium_drop_count_total

  • cilium_endpoint_regeneration_time_stats_seconds_bucket

  • cilium_endpoint_regenerations_total

  • cilium_endpoint_state

  • cilium_errors_warnings_total

  • cilium_forward_bytes_total

  • cilium_forward_count_total

  • cilium_ip_addresses

  • cilium_k_client_api_calls_total

  • cilium_k_client_api_latency_time_seconds_count

  • cilium_k_client_api_latency_time_seconds_sum

  • cilium_kubernetes_events_received_total

  • cilium_kubernetes_events_total

  • cilium_kvstore_events_queue_seconds_count

  • cilium_kvstore_operations_duration_seconds_count

  • cilium_kvstore_operations_duration_seconds_sum

  • cilium_nodes_all_events_received_total

  • cilium_nodes_all_num

  • cilium_policy

  • cilium_policy_endpoint_enforcement_status

  • cilium_policy_import_errors_total

  • cilium_policy_l7_denied_total

  • cilium_policy_l7_forwarded_total

  • cilium_policy_l7_parse_errors_total

  • cilium_policy_l7_received_total

  • cilium_policy_max_revision

  • cilium_process_cpu_seconds_total

  • cilium_process_open_fds

  • cilium_process_resident_memory_bytes

  • cilium_process_virtual_memory_bytes

  • cilium_proxy_redirects

  • cilium_proxy_upstream_reply_seconds_count

  • cilium_proxy_upstream_reply_seconds_sum

  • cilium_services_events_total

  • cilium_triggers_policy_update_call_duration_seconds_count

  • cilium_triggers_policy_update_call_duration_seconds_sum

  • cilium_triggers_policy_update_folds

  • cilium_triggers_policy_update_total

  • cilium_unreachable_health_endpoints

  • cilium_unreachable_nodes

  • kvstore_operations_total

  • topk (a PromQL aggregation operator, not a metric)
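As a rough illustration of what a topk query does with series like these, the Python sketch below keeps only the k entries with the largest values. The node names and drop rates are invented sample data, and the topk function here is a hypothetical stand-in, not Prometheus code.

```python
import heapq
from typing import Dict, List, Tuple


def topk(k: int, series: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return the k (label, value) pairs with the largest values, largest first."""
    return heapq.nlargest(k, series.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical per-node drop rates (packets per second).
    drops_per_second = {"node-a": 0.2, "node-b": 7.5, "node-c": 1.1, "node-d": 3.0}
    for node, rate in topk(2, drops_per_second):
        print(node, rate)
```

Limiting a panel to the highest-valued series in this way keeps per-node graphs readable on clusters with many nodes.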
