Thanos Store Grpc Error Rate warning
Thanos Store {{$labels.job}} is failing to handle {{$value | humanize}}% of requests.
(
  sum by (job) (rate(grpc_server_handled_total{grpc_code=~"Unknown|ResourceExhausted|Internal|Unavailable|DataLoss|DeadlineExceeded", job=~".*thanos-store.*"}[5m]))
/
  sum by (job) (rate(grpc_server_started_total{job=~".*thanos-store.*"}[5m]))
* 100 > 5
)
The rule computes, per Thanos Store job, the percentage of gRPC requests that returned an error code (Unknown, ResourceExhausted, Internal, Unavailable, DataLoss, or DeadlineExceeded) over the past 5 minutes: error_rate = sum(rate(grpc_server_handled_total with those codes)) / sum(rate(grpc_server_started_total)) * 100. The alert fires when this error rate exceeds 5% for any job matching .*thanos-store.*.
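To make the arithmetic concrete, here is a minimal Python sketch of the same calculation for a single job. The function names (error_rate_percent, should_fire) and the sample rates are hypothetical and are not part of Thanos or Prometheus; the sketch only mirrors the ratio and the 5% threshold described above.

```python
# Error codes counted by the rule (taken from the grpc_code matcher).
ERROR_CODES = {
    "Unknown", "ResourceExhausted", "Internal",
    "Unavailable", "DataLoss", "DeadlineExceeded",
}
THRESHOLD_PERCENT = 5.0  # the "> 5" in the rule


def error_rate_percent(handled_rate_by_code, started_rate):
    """Return the percentage of started requests handled with an error code.

    handled_rate_by_code: per-second rate of handled requests, keyed by gRPC code
    started_rate: per-second rate of started requests for the same job
    """
    if started_rate <= 0:
        # Simplification for this sketch: no traffic is treated as "no data".
        return None
    error_rate = sum(
        rate for code, rate in handled_rate_by_code.items() if code in ERROR_CODES
    )
    return error_rate / started_rate * 100


def should_fire(percent):
    """The alert condition: error percentage strictly above the threshold."""
    return percent is not None and percent > THRESHOLD_PERCENT


# Made-up sample: 80 req/s started, 5 req/s of them ended in an error code.
sample = {"OK": 75.0, "Unavailable": 3.0, "Internal": 2.0}
pct = error_rate_percent(sample, 80.0)
print(pct, should_fire(pct))  # prints "6.25 True"
```

With these sample values the job would match the alert condition, since 6.25% is above 5%. At 2 error req/s out of 80 the result is 2.5% and the condition does not hold.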