Promtail request errors
critical

Description The {{ $labels.job }} {{ $labels.route }} is experiencing {{ printf "%.2f" $value }}% errors.
Query for 5m
>>>
	
				
					100 * 
				
			
				
					
					
						sum
					
				
			
				
					(
				
			
				
					
					
						rate
					
				
			
				
					(
				
			
				
					
				
			
				
					{status_code=~"5..|failed"}[1m])) by (namespace, job, route, instance) / 
				
			
				
					
					
						sum
					
				
			
				
					(
				
			
				
					
					
						rate
					
				
			
				
					(
				
			
				
					
				
			
				
					[1m])) by (namespace, job, route, instance) > 10
				
			
    
Query Explanation

The rule calculates the percentage of Promtail requests that resulted in a 5xx status code or a "failed" status over the past minute (using rate to get per‑second request counts), groups this by namespace, job, route, and instance, and triggers when that error rate exceeds 10 %. In other words, if more than 10 % of a given job/route’s requests are errors in the last minute, the alert fires.

Get Alert
Download
Copy to Clipboard