Thanos Receive High Replication Failures
warning

Description Thanos Receive {{$labels.job}} is failing to replicate {{$value | humanize}}% of requests.
Query for 5m
>>>
	
				
					
				
			
				
					 > 1 and ((sum by (job) (
				
			
				
					
					
						rate
					
				
			
				
					(
				
			
				
					
				
			
				
					{result="error", job=~".*thanos-receive.*"}[5m])) / sum by (job) (
				
			
				
					
					
						rate
					
				
			
				
					(
				
			
				
					
				
			
				
					{job=~".*thanos-receive.*"}[5m]))) > (max by (job) (
				
			
				
					
					
						floor
					
				
			
				
					((
				
			
				
					
				
			
				
					{job=~".*thanos-receive.*"}+1)/ 2)) / max by (job) (
				
			
				
					
				
			
				
					{job=~".*thanos-receive.*"}))) * 100
				
			
    
Query Explanation

The rule fires when a Thanos Receive instance (matching .*thanos-receive.*) has a replication factor greater than 1 and the percentage of replication attempts that end in error over the last 5 minutes exceeds the tolerated failure threshold, which is calculated as half (rounded down) of the replication factor divided by the total number of hash‑ring nodes, expressed as a percent. In short, it alerts if error‑rate % > (⌊(replication_factor+1)/2⌋ / hashring_nodes) × 100, indicating too many replication failures for that job.

Get Alert
Download
Copy to Clipboard