A team I worked with spent six months cutting their CI pipeline from 27 minutes to 8. Real engineering work, well executed. The number on the dashboard was more than three times better.
Shipping speed didn’t change.
What changed: the PR review queue got longer. Where before a PR sat in CI for 27 minutes and then in review for a day, now it sat in CI for 8 minutes and then in review for three days. The constraint had moved one step over, where nobody was watching.
The build was never the bottleneck. Optimizing a step that isn’t the bottleneck is the most common mistake I see in technology decisions.
The visible delay is not the constraint
Goldratt made the point forty years ago in The Goal: every system has a constraint, and the throughput of the whole system is determined by that single constraint. Speed up anything else and you push more work into the constraint’s queue.
This is obvious in physical systems. If your factory can only paint 100 widgets an hour, building 200 upstream doesn’t ship more — it builds a stockpile in front of the paint booth.
It’s much less obvious in software because the inventory is invisible: PRs in queues, tickets in backlogs, decisions waiting for someone to make them. Because the inventory is invisible, the constraint is invisible. So we optimize what we can see — build times, query times, deploy times — and the system doesn’t get faster.
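The paint-booth arithmetic fits in a few lines. This is a minimal sketch, assuming a strictly serial line where every stage runs flat out at its hourly capacity; the function names and the eight-hour shift are illustrative and not taken from The Goal.

```python
def bottleneck(capacity_per_hour):
    """Return (stage, rate) for the slowest stage of a serial pipeline.

    In a serial pipeline the slowest stage sets the rate of the whole system.
    """
    stage = min(capacity_per_hour, key=capacity_per_hour.get)
    return stage, capacity_per_hour[stage]


def stockpile_after(hours, upstream_rate, constraint_rate):
    """Units piled up in front of the constraint after `hours` of running."""
    return max(0, upstream_rate - constraint_rate) * hours


# Numbers from the factory example: build 200 an hour, paint 100 an hour.
factory = {"build": 200, "paint": 100}

stage, shipped_per_hour = bottleneck(factory)
print(stage, shipped_per_hour)  # paint 100

# Assumption: one eight-hour shift with both stages running at full capacity.
print(stockpile_after(8, factory["build"], factory["paint"]))  # 800
```

Raising `build` to 400 leaves the first line of output unchanged and only makes the second number bigger. That stockpile is the software queue you never see.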
Where the constraint usually lives
In ten years of technology projects, I have almost never found the constraint inside a step. It’s almost always between steps.
The slow build isn’t the constraint — the wait for someone to approve the PR is. The slow query isn’t the constraint — the decision about whether to change the schema is. The slow deploy isn’t the constraint — the change advisory board that meets twice a week is. The AI coding tool isn’t the constraint — the under-specified ticket is.
Engineers look for constraints inside the work they do because that’s where they have control and visibility. The real constraints sit at handoffs, in queues, in approvals. Places where work waits for a human or for an agreement.
Why we keep picking the wrong step
The visible step is measurable — you can benchmark it, graph it, watch it improve. Nobody publishes a dashboard of average days a PR sits in review. The visible step also has a tool you can buy; there’s a whole market for faster CI, faster databases, faster CDNs. And it’s politically safe: nobody objects to a faster build, while optimizing the real constraint means asking someone to change how they work. “We need to upgrade the CI runners” is a budget conversation. Budget conversations are easier.
How to find the actual constraint
Stop measuring work time. Start measuring wait time.
Track any unit of work end to end. Break the timeline into states, not activities — not “coding, reviewing, deploying” but “waiting in backlog, in development, waiting for review, in review, waiting for QA, deployed.” Add up the time in each state. The state with the longest cumulative time is your constraint. It will almost never be an active state. It will be a waiting state.
flowchart LR
A["<b>Write code</b><br/>2h"] --> W1["<b>Wait for review</b><br/>3 days"] --> B["<b>Review</b><br/>10m"] --> W2["<b>Wait for QA</b><br/>2 days"] --> C["<b>QA</b><br/>30m"] --> W3["<b>Wait for deploy window</b><br/>1 day"] --> D["<b>Deploy</b><br/>8m"]
The active work in that timeline takes under three hours. The wait time takes six days. Every speedup applied to an active step is rounding-error work against a six-day queue. The first time a team puts numbers like this on a whiteboard, the conversation changes — the bottleneck is usually a person, a meeting cadence, or an approval that nobody had named as a bottleneck. The team had been optimizing the wrong thing for months because nobody had measured the right thing.
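Here is one way to do that accounting, sketched in Python. It assumes you can export a time-ordered list of state transitions for a work item; the timestamps, the state names, and the `time_in_state` helper are made up for illustration and chosen to match the timeline above.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def time_in_state(transitions):
    """Sum the time spent in each state.

    `transitions` is a time-ordered list of (entered_at, state) pairs.
    The final pair marks the terminal state and accrues no time.
    """
    totals = defaultdict(timedelta)
    for (entered, state), (left, _) in zip(transitions, transitions[1:]):
        totals[state] += left - entered
    return dict(totals)


# Hypothetical event log for one PR, matching the timeline above.
transitions = [
    (datetime(2024, 1, 1, 9, 0), "write code"),
    (datetime(2024, 1, 1, 11, 0), "wait for review"),
    (datetime(2024, 1, 4, 11, 0), "review"),
    (datetime(2024, 1, 4, 11, 10), "wait for QA"),
    (datetime(2024, 1, 6, 11, 10), "QA"),
    (datetime(2024, 1, 6, 11, 40), "wait for deploy window"),
    (datetime(2024, 1, 7, 11, 40), "deploy"),
    (datetime(2024, 1, 7, 11, 48), "deployed"),
]

totals = time_in_state(transitions)

# Assumption: waiting states are the ones whose name starts with "wait".
waiting = sum((d for s, d in totals.items() if s.startswith("wait")), timedelta())
active = sum(totals.values(), timedelta()) - waiting
constraint = max(totals, key=totals.get)

print(constraint)                            # wait for review
print(waiting, "|", active)                  # 6 days, 0:00:00 | 2:48:00
print(f"{active / (active + waiting):.1%}")  # 1.9%
```

For a real team you would feed every work item from the last quarter through `time_in_state` and sum the results per state. One PR is an anecdote; the cumulative totals are the diagnosis.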
The constraint moves
Once you find and fix the constraint, the constraint moves. If CI was the bottleneck and you cut it from 27 minutes to 8, now review is the bottleneck. Fix review, now QA is. The system always has a constraint. The work is never done.
Most organizations want to fund a project that “fixes” performance — a six-month initiative with a clean end. The honest version is that constraint management is a permanent discipline. Teams that internalize this stop running performance as projects and start running it as a continuous practice: measure lead time at the system level, watch where the queue is sitting, direct attention there.
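The loop can be sketched as a toy model. The durations below are the ones from the timeline earlier, in minutes; the assumption that each fix cuts the targeted state to a tenth of its former time is invented purely to make the movement visible, and `run_improvement_loop` is a hypothetical helper.

```python
def find_constraint(minutes_in_state):
    """The state where work spends the most time."""
    return max(minutes_in_state, key=minutes_in_state.get)


def run_improvement_loop(minutes_in_state, rounds):
    """Repeatedly fix the current constraint and record where it moves.

    Returns a list of (constraint, total_lead_time) pairs, one per round,
    captured before that round's fix is applied.
    """
    state = dict(minutes_in_state)  # work on a copy, leave the input alone
    history = []
    for _ in range(rounds):
        constraint = find_constraint(state)
        history.append((constraint, sum(state.values())))
        # Assumption: each fix cuts the constrained state to a tenth.
        state[constraint] //= 10
    return history


timeline = {
    "write code": 120,
    "wait for review": 3 * 24 * 60,
    "review": 10,
    "wait for QA": 2 * 24 * 60,
    "QA": 30,
    "wait for deploy window": 24 * 60,
    "deploy": 8,
}

history = run_improvement_loop(timeline, rounds=3)
for constraint, lead_time in history:
    print(f"{lead_time:>5} min total, constraint: {constraint}")
```

Each round names a different state. Lead time drops every time, and there is still a constraint at the end of every round, which is the argument for treating this as a practice and not a project.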
What this means for architecture
A microservices migration sold on “deploy independence” only delivers if deploy coordination was your actual constraint. A caching layer added to a system gated by a downstream API gives you faster reads, but user-perceived speed is still set by the slowest path that waits on that API. An AI coding assistant in an organization where the constraint is specification quality won’t speed up delivery; it will produce more code, faster, against the same vague requirements. A new message queue introduced to “decouple services” only helps if synchronous coupling was the constraint.
The discipline is to ask, before any architecture decision: if we make this step instant, does the system get faster? If no, the step isn’t the constraint, and the change won’t deliver what’s promised.
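That question can be asked with numbers before anyone writes a proposal. A rough sketch, assuming the single-item timeline from earlier and using end-to-end lead time as a stand-in for "the system gets faster"; the helper name is hypothetical, and a real analysis would use totals across many work items.

```python
def lead_time_saved_if_instant(minutes_in_state, step):
    """Fraction of end-to-end lead time removed if `step` took zero time."""
    total = sum(minutes_in_state.values())
    return minutes_in_state[step] / total


# Durations in minutes, taken from the timeline earlier in the article.
timeline = {
    "write code": 120,
    "wait for review": 3 * 24 * 60,
    "review": 10,
    "wait for QA": 2 * 24 * 60,
    "QA": 30,
    "wait for deploy window": 24 * 60,
    "deploy": 8,
}

for step in ("deploy", "write code", "wait for review"):
    saved = lead_time_saved_if_instant(timeline, step)
    print(f"{step}: {saved:.1%}")
# deploy: 0.1%
# write code: 1.4%
# wait for review: 49.0%
```

An infinitely fast deploy buys a tenth of a percent. An infinitely fast coding step, which is the best case for any tool that helps you type, buys about one and a half. Only the queue moves the total.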
Most technology decisions are made by smart people doing careful work on the wrong problem. The careful work isn’t the issue. The diagnosis upstream of it is.
Find the constraint. Optimize the constraint. Watch it move. Find the next one. The teams that compound speed over time take this loop seriously. The teams that don’t keep delivering impressive isolated wins that never quite add up.
The visible step almost always isn’t the constraint. Finding the one that is — by measuring wait time instead of work time — is unglamorous, political, and where the leverage actually lives.