Latency Sneaks Up On You
The relationship between system utilization and latency is dangerously non-linear: small increases in utilization near capacity cause explosive increases in queue depth and tail latency.
"High-percentile latency is a bad way to measure efficiency, but a good (leading) indicator of pending overload." -- Marc Brooker
Consider the simplest possible system: a single server with a queue. The expected number of items in the system is rho/(1-rho), where rho is the utilization ratio (arrival rate divided by completion rate). At 50% utilization, you have on average 1 item in the system. At 90%, you have 9. At 99%, you have 99. The curve is a hockey stick that goes vertical as utilization approaches 1.0. This is not a theoretical curiosity; it is the fundamental dynamic that governs latency in real systems.
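The formula can be checked directly. This is a minimal sketch; the function name is illustrative, and the formula assumed is the standard M/M/1 expected-occupancy result rho/(1-rho):

```python
def expected_in_system(rho: float) -> float:
    """Expected number of items in the system (M/M/1) at utilization rho."""
    if not 0 <= rho < 1:
        raise ValueError("utilization must be in [0, 1)")
    return rho / (1 - rho)

for rho in (0.5, 0.9, 0.99):
    print(f"rho={rho:.2f} -> {expected_in_system(rho):.0f} in system")
# rho=0.50 -> 1 in system
# rho=0.90 -> 9 in system
# rho=0.99 -> 99 in system
```

Note how the last step, from 90% to 99% utilization, multiplies queue occupancy by eleven even though the system is only 10% "more loaded."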
This creates a recurring trap. Engineers do efficiency work that increases the server's processing rate. Tail latencies drop dramatically because utilization decreases, pushing the system left on the curve. Everyone celebrates. Then growth arrives, or the fleet is right-sized to "realize the efficiency gains," and utilization creeps back up. Latencies return to where they were, or worse. People conclude the efficiency work was wasted, when in fact the problem is that they harvested the capacity buffer that was absorbing queueing delays.
The practical implication is that high-percentile latency is a leading indicator of trouble, not a measure of code quality. If your p99 is climbing, your system is telling you it is running out of slack to absorb bursts. Mean latency, counterintuitively, is a better measure of raw efficiency because it is less sensitive to utilization-driven queueing effects. And implicit queues are everywhere: threads waiting on locks, async tasks waiting for I/O, connection pools. Each one is a hidden instance of this same non-linear dynamic.
Takeaway: Utilization and latency have a non-linear relationship; treat rising tail latency as an early warning of approaching overload, not just a performance metric to optimize.
See also: Tail Latency Dominates User Experience | Leading Indicators Beat Lagging Ones | Goodput Matters More Than Throughput