Thanos Query gRPC Server Error Rate
warning

Description Thanos Query {{$labels.job}} is failing to handle {{$value | humanize}}% of requests.
Query for 5m
(
  sum by (job) (rate({grpc_code=~"Unknown|ResourceExhausted|Internal|Unavailable|DataLoss|DeadlineExceeded", job=~".*thanos-query.*"}[5m]))
/
  sum by (job) (rate({job=~".*thanos-query.*"}[5m]))
* 100 > 5
)
    
Query Explanation

For each Thanos Query job, the rule divides the rate of gRPC requests that ended with an error code (Unknown, ResourceExhausted, Internal, Unavailable, DataLoss, DeadlineExceeded) by the rate of all gRPC requests over the last 5 minutes, and multiplies the result by 100 to get a percentage. The alert fires when that error percentage stays above 5% for 5 minutes. When triggered, it reports the job name and the error percentage.
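
The arithmetic behind the rule can be sketched in Python. This is only an illustration of the ratio and threshold check, not how Prometheus evaluates the expression: the function names, job names, and rate values are hypothetical, and the per-job rates stand in for the two `sum by (job) (rate(...[5m]))` results.

```python
# Minimal sketch of the error-percentage check, under stated assumptions:
# inputs are per-second request rates already aggregated by job.
ERROR_THRESHOLD_PERCENT = 5.0  # the "> 5" in the rule


def error_percentages(error_rates, total_rates):
    """Return {job: error %} for jobs that appear on both sides."""
    result = {}
    for job, errors in error_rates.items():
        total = total_rates.get(job)
        # Simplification: jobs with no matching total, or with zero
        # traffic, are skipped here instead of producing NaN or Inf.
        if not total:
            continue
        result[job] = errors / total * 100
    return result


def firing_jobs(error_rates, total_rates, threshold=ERROR_THRESHOLD_PERCENT):
    """Return only the jobs whose error percentage exceeds the threshold."""
    percentages = error_percentages(error_rates, total_rates)
    return {job: pct for job, pct in percentages.items() if pct > threshold}


if __name__ == "__main__":
    # Hypothetical rates: 3 errors/s out of 32 req/s, 1 error/s out of 64 req/s.
    errors = {"thanos-query": 3.0, "thanos-query-frontend": 1.0}
    totals = {"thanos-query": 32.0, "thanos-query-frontend": 64.0}
    print(firing_jobs(errors, totals))  # {'thanos-query': 9.375}
```

With these made-up numbers, only the first job crosses the 5% threshold (9.375%), while the second stays below it (1.5625%). The rule additionally requires the condition to hold for 5 minutes before the alert fires, which this sketch does not model.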
