CodeNSM

Walking the floor at scale: what tech leads actually monitor, and why your dashboards miss it

2026-05-24 · 7 min read · by Think North

Toyota's managers had a discipline called the gemba walk: go to where the work happens and look, because the truth is on the floor, not in the reports. Your best tech lead does the same thing without naming it. They wander the codebase daily — a PR here, a log tail there, a suspicious latency graph at lunch — and maintain, in their heads, a live model of where the system is strong and where it is quietly rotting.

Then the company grows, the codebase triples, and the walk becomes physically impossible. What replaces it, almost always, is a dashboard of commits, PRs merged, story points, DORA-style delivery metrics. All defensible numbers. None of them is the walk. And here is the uncomfortable part: nobody notices the walk stopped, because the dashboard is still updating.

The five questions your velocity chart cannot answer

Watch a strong tech lead closely and their daily monitoring reduces to a handful of questions no velocity chart answers:

  • Where is the traffic actually going? Production load is savagely concentrated. In CodeNSM's own fleet telemetry, a small minority of functions reliably carries the overwhelming majority of calls, a concentration strong enough that we have pre-registered it as a formal research hypothesis. The lead knows, roughly, which functions those are, and reads every diff that touches them differently.
  • What is fragile and load-bearing? Not "what is the worst code" — every codebase has quarantined horrors that hurt nobody. The dangerous set is the intersection: brittle code at high-traffic desks. That intersection is the lead's true worry list, and it exists nowhere in Jira.
  • What is drifting? The endpoint that sat at 40 ms for a year and is now at 65 ms. The third-party integration whose error rate has doubled since the vendor's last "upgrade." Individually sub-alarm-threshold; collectively, the plot of the next incident.
  • What has gone quiet? Code that stopped being called is either a successful deprecation or a broken caller, and only context says which. Dormant code is also payroll: it gets maintained, migrated and security-patched while contributing nothing.
  • Who actually understands what? The map from people to the code they can safely change — the real one, not the CODEOWNERS file. Peter Naur argued in "Programming as Theory Building" that this mental model is the program, and that it dies when the people leave.
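
Once per-function telemetry exists, the first four questions reduce to simple comparisons. The following is a minimal sketch, not CodeNSM's method. It assumes you already collect recent and baseline call counts and p95 latencies for each function. It also assumes some 0-to-1 fragility score, and leaves open how that score is built (churn, complexity, test gaps). The `FunctionStats` record, the helper names, the fleet data and every threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FunctionStats:
    """Hypothetical per-function telemetry record; field names are illustrative."""
    name: str
    calls_baseline: int     # calls per day, averaged over a long baseline window
    calls_recent: int       # calls per day, averaged over the most recent window
    p95_ms_baseline: float  # p95 latency over the baseline window
    p95_ms_recent: float    # p95 latency over the recent window
    fragility: float        # assumed 0..1 score (churn, complexity, test gaps...)


def top_share(stats, fraction=0.1):
    """Share of recent calls carried by the busiest `fraction` of functions."""
    counts = sorted((s.calls_recent for s in stats), reverse=True)
    k = max(1, round(len(counts) * fraction))
    total = sum(counts)
    return sum(counts[:k]) / total if total else 0.0


def fragile_and_load_bearing(stats, fragility_min=0.7, traffic_quantile=0.9):
    """Brittle functions whose recent traffic is in the top band (floor-index quantile)."""
    if not stats:
        return []
    counts = sorted(s.calls_recent for s in stats)
    cutoff = counts[int(traffic_quantile * (len(counts) - 1))]
    return [s.name for s in stats
            if s.fragility >= fragility_min and s.calls_recent >= cutoff]


def drifting(stats, ratio=1.5):
    """Functions whose p95 latency grew by `ratio` or more against their own baseline."""
    return [s.name for s in stats
            if s.p95_ms_baseline > 0 and s.p95_ms_recent / s.p95_ms_baseline >= ratio]


def gone_quiet(stats):
    """Functions that were called during the baseline window but not recently."""
    return [s.name for s in stats if s.calls_baseline > 0 and s.calls_recent == 0]


fleet = [  # hypothetical fleet
    FunctionStats("checkout.route", 90_000, 95_000, 40.0, 65.0, 0.8),
    FunctionStats("auth.gate", 60_000, 61_000, 12.0, 12.5, 0.2),
    FunctionStats("payments.retry", 5_000, 5_200, 120.0, 125.0, 0.9),
    FunctionStats("search.suggest", 1_400, 1_500, 30.0, 31.0, 0.3),
    FunctionStats("reports.legacy_export", 300, 0, 800.0, 800.0, 0.95),
]

print(f"{top_share(fleet, 0.2):.0%}")   # 58%
print(fragile_and_load_bearing(fleet))  # ['checkout.route']
print(drifting(fleet))                  # ['checkout.route']
print(gone_quiet(fleet))                # ['reports.legacy_export']
```

Drift is judged against each function's own baseline. That is why `checkout.route` is flagged for moving from 40 ms to 65 ms, even though 65 ms might be unremarkable elsewhere in the fleet. The fragile but idle legacy export surfaces only as dormant; it never reaches the worry list.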

Why the standard dashboards are structurally blind to this

Commit-and-velocity dashboards measure the production of code. The walk monitors the behaviour of code in production. These are different objects, and conflating them is the central measurement error of engineering management. The researchers behind the SPACE framework (Forsgren, Storey and colleagues) made the point bluntly: no single metric captures developer productivity. Activity counts are seductive precisely because they are easy to collect, and misleading when read on their own. Meanwhile, Xia and colleagues' large-scale field study found developers spending roughly 58 percent of their time simply comprehending existing code. That effort is invisible to every activity metric, yet it is precisely what the lead's mental map exists to reduce.

Even DORA's own delivery metrics, genuinely valuable for what they measure, describe the pipe, not the water. Deployment frequency tells you how often changes reach production. It does not tell you that the payment-retry function three layers down is one bad deploy away from being your Q3 story.

Velocity dashboards measure how fast you are producing code. The tech lead's walk measures whether the code already produced is still earning its keep. Companies track the first obsessively and the second not at all.

You can automate the walk without automating the walker

The judgment a tech lead applies on the walk is largely role-conditioned pattern matching: this function is a router, so its latency budget is tiny; that one wraps a vendor API, so retries and error taxonomy matter more than elegance; this one is the auth gate, so any change is a security change. It is expertise, but it is legible expertise — which means the observational layer beneath it can be automated, deterministically, without pretending to replace the human who acts on it.
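
The legibility claim can be made concrete. Once each function carries a role label, the review concerns in the three examples above follow mechanically from that label. This is a minimal sketch that assumes some classifier has already assigned the roles. The `Role` values, the rule table and the function names in the example diff are hypothetical. They are taken from the examples in this paragraph, not from CodeNSM's ten archetypes.

```python
from enum import Enum


class Role(Enum):
    """Illustrative roles from the three examples in the text (not CodeNSM's archetypes)."""
    ROUTER = "router"
    VENDOR_WRAPPER = "vendor_wrapper"
    AUTH_GATE = "auth_gate"
    OTHER = "other"


# Hypothetical rule table: what a reviewer should check when a function in each role changes.
REVIEW_RULES = {
    Role.ROUTER: ["latency budget is tiny: benchmark the hot path"],
    Role.VENDOR_WRAPPER: ["retry policy and backoff", "error taxonomy for vendor failures"],
    Role.AUTH_GATE: ["treat as a security change: request security review"],
    Role.OTHER: [],
}


def review_checklist(changed):
    """Map each changed function (name -> Role) to its role's checks; roles with none are omitted."""
    return {name: REVIEW_RULES[role] for name, role in changed.items() if REVIEW_RULES[role]}


diff = {  # hypothetical functions touched by one pull request
    "api.dispatch": Role.ROUTER,
    "billing.vendor_client": Role.VENDOR_WRAPPER,
    "auth.verify_session": Role.AUTH_GATE,
    "utils.slugify": Role.OTHER,
}
for fn, checks in review_checklist(diff).items():
    print(f"{fn}: {'; '.join(checks)}")
```

The rules are kept as a plain lookup table on purpose. The output is deterministic and auditable, and the human still decides what to do with each flag.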

That is the design brief CodeNSM was built to. It classifies every function by its job (we use ten workplace archetypes). It tracks each one's live state (Fit, Snoozing, Growing fat, Stressed or Injury-prone) from real production behaviour. And it surfaces exactly the walk's worry list: traffic concentration, the fragile-and-load-bearing intersection, drift against each function's own baseline, dormancy, and the people-to-code map from git history. The lead stops being the company's only instrument and becomes its best analyst.
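
The post names the five states but not the rules behind them. The sketch below is therefore a guess at the shape of such a classifier, not CodeNSM's logic. Purely for illustration, it assumes four mappings:

- Snoozing means no recent calls.
- Injury-prone means an elevated error rate.
- Stressed means p95 latency well above the function's own baseline.
- Growing fat means the function's size has grown sharply.

The `Window` record, every threshold and the priority order are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    FIT = "Fit"
    SNOOZING = "Snoozing"
    GROWING_FAT = "Growing fat"
    STRESSED = "Stressed"
    INJURY_PRONE = "Injury-prone"


@dataclass
class Window:
    """Hypothetical recent-vs-baseline measurements for one function."""
    calls_recent: int
    p95_ratio: float   # recent p95 latency / baseline p95 latency
    error_rate: float  # recent errors / recent calls
    loc_growth: float  # current lines of code / lines of code at baseline


def classify(w, *, stress_ratio=1.5, error_max=0.02, growth_max=1.3):
    """Assign one state; checks run in an assumed priority order and the first match wins."""
    if w.calls_recent == 0:
        return State.SNOOZING
    if w.error_rate > error_max:
        return State.INJURY_PRONE
    if w.p95_ratio >= stress_ratio:
        return State.STRESSED
    if w.loc_growth >= growth_max:
        return State.GROWING_FAT
    return State.FIT


# The 40 ms -> 65 ms endpoint from earlier, with a healthy error rate:
endpoint = Window(calls_recent=95_000, p95_ratio=65 / 40, error_rate=0.001, loc_growth=1.05)
print(classify(endpoint).value)  # Stressed
```

A first-match rule order keeps the result explainable, because exactly one threshold decides each state. A real system could reasonably tune thresholds per archetype, since a router's latency budget is not a report generator's.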

A two-question test for whether you have already outgrown the walk

Try this. Ask your tech lead to name the ten functions where an injected failure would do the most business damage. Then ask the same question about a service they don't personally work in. The first answer will be excellent. The silence after the second is the size of your blind spot, and it grows with every hire. Adam Tornhill's "Your Code as a Crime Scene" showed how much of this map can be mined from version control alone. Production runtime behaviour supplies what version control cannot see: not just where code changes, but whether anything actually calls it.
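
Combining the version-control half with the runtime half takes only a few lines. This is a minimal illustration, not CodeNSM's implementation, and it makes three simplifications:

- It works at file rather than function granularity.
- It assumes runtime call counts are already aggregated per file path (the hypothetical `calls` dict).
- It treats "only one person has ever changed this file" as a crude stand-in for a knowledge-map blind spot.

`git log --name-only` and the `%an` (author name) placeholder are standard git. The `AUTHOR:` prefix is just a hypothetical parsing convenience, and all helper names and sample data are illustrative.

```python
import subprocess
from collections import Counter, defaultdict

MARKER = "AUTHOR:"  # hypothetical prefix so author lines can't be mistaken for paths


def git_log_text(repo="."):
    """Run git log, emitting one marked author line per commit followed by its changed paths."""
    return subprocess.run(
        ["git", "-C", repo, "log", f"--format={MARKER}%an", "--name-only"],
        capture_output=True, text=True, check=True,
    ).stdout


def parse_history(log_text):
    """Return (change count per path, set of authors per path) from git_log_text output."""
    changes = Counter()
    authors = defaultdict(set)
    current = None
    for line in log_text.splitlines():
        if line.startswith(MARKER):
            current = line[len(MARKER):]
        elif line.strip() and current is not None:
            changes[line] += 1
            authors[line].add(current)
    return changes, authors


def blind_spots(changes, authors, calls, top=10):
    """Among the `top` busiest paths by runtime calls, those only one person has ever changed."""
    busiest = sorted(calls, key=calls.get, reverse=True)[:top]
    return [(path, calls[path], changes.get(path, 0), sorted(authors.get(path, set())))
            for path in busiest
            if calls[path] > 0 and len(authors.get(path, set())) <= 1]


sample = """AUTHOR:Alice

payments/retry.py
api/router.py
AUTHOR:Bob

api/router.py
AUTHOR:Alice

payments/retry.py
"""
changes, authors = parse_history(sample)
calls = {"api/router.py": 95_000, "payments/retry.py": 5_200, "reports/export.py": 0}
for row in blind_spots(changes, authors, calls):
    print(row)  # ('payments/retry.py', 5200, 2, ['Alice'])
```

Against a real repository, `parse_history(git_log_text("path/to/repo"))` would replace the sample string. The busy router is excluded because two people have changed it. The retry module, busy and known to one person, is exactly the kind of answer the second question tends to miss.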

The floor got too big to walk. The answer is not to stop walking. It is to give the walker instruments — so the walk becomes a review of exceptions rather than a search for them, and the lead's scarce judgment lands where the telemetry says it is needed rather than where habit takes them.

References

  1. Forsgren, N., Storey, M.-A. et al. (2021). The SPACE of Developer Productivity. ACM Queue.
  2. Xia, X. et al. (2018). Measuring Program Comprehension: A Large-Scale Field Study with Professionals. IEEE TSE.
  3. Google Cloud. DORA (DevOps Research and Assessment) research program.
  4. Naur, P. (1985). Programming as Theory Building.
  5. Tornhill, A. (2024). Your Code as a Crime Scene, 2nd ed.

See your own codebase as an office.

One pip install and every function reports for duty — archetype, live state, debt tier, and a single Code-Health North-Star. Free plan, no card.
