PART 3 OF 5 · SOLUTION-AWARE

The hidden cost of hand-rolling an MCP server

The demo takes an afternoon. Then come auth, keys, rate limits, versioning, observability and billing — and you realize you didn't build an MCP server, you started building a gateway you never staffed.

2026-06-307 minwritten for · Engineering leadswritten for · Platform engineerswritten for · Founders

Every hand-rolled MCP server starts the same way, and the start is genuinely delightful. You pull in an SDK, wrap three endpoints as tools, run it locally, and watch an agent call your API in plain English. It works. It feels like the whole problem. You demo it in standup and someone says "ship it." That afternoon is real, and it's why teams underestimate everything that comes after it.

The demo is the first 10%

The reference SDKs make the happy path short on purpose. Anthropic's quickstart walks you from nothing to a working server in a single sitting, and community frameworks like FastMCP advertise "the fast, Pythonic way to build MCP servers" — decorate a function, get a tool. [1] That's not marketing dishonesty; it's accurate about the part they cover. The trouble is that the part they cover is the part that was never hard. The hard part is everything that turns a local toy into something you'd let a customer's agent hit in production.

The gateway you accidentally start building

Watch what the backlog does the week after the demo. Each item sounds reasonable in isolation. Together they are a product.

  • Auth and identity. The local demo trusted whoever ran it. Production needs to know which caller this is — a human's assistant, an internal service, a partner — and MCP's own authorization guidance now leans on OAuth 2.1 flows, which you get to implement and keep current as the spec evolves. [2]
  • Keys. Different callers need different credentials, scopes, and the ability to be issued and revoked without a redeploy. That's a key-management surface with a database behind it.
  • Rate limits. An agent in a loop can hammer a tool far faster than a human ever would. Without per-key limits, one misbehaving agent degrades everyone, and you learn about it from an incident, not a dashboard.
  • Versioning. The MCP spec itself is versioned and moving — the specification carries dated revisions, and servers are expected to negotiate versions with clients. [2] You are now on a treadmill of protocol updates on top of your own API's versions.
  • Observability. When an agent says "your tool failed," you need to see the call, the arguments, the latency and the error — per tool, per caller. Vendors in this space, like Moesif, frame API observability as table stakes precisely because you can't operate what you can't see. [3]
  • Monetization. The moment agent access has value, someone asks how to charge for it — metering, tiers, quotas, overage. That's a billing system.

None of these are MCP-specific inventions. They are the standard concerns of any API gateway and any API-monetization layer — the same list Zuplo and Moesif built entire companies around for ordinary APIs. [4] You didn't set out to build a gateway. You set out to expose three tools. But the requirements are identical, so you back into the same architecture, one "reasonable" ticket at a time.

Nobody decides to build an API gateway. You decide to add auth, then keys, then limits, then billing — and one morning you notice you've been operating a gateway for six weeks with a team of one.

The costs that don't show up in the estimate

The estimate you gave was for the demo. The costs that blow it up are the ones that don't have a ticket: the on-call rotation for a new production service, the security review when the auth surface expands, the migration when the spec revises, the support load when a customer's agent behaves in a way you didn't anticipate. These are ongoing, not one-time. A hand-rolled server isn't a project you finish; it's a service you now run, forever, with all the operational weight that implies.

Why this is worth naming out loud

Being solution-aware doesn't mean "buy something." It means seeing the true shape of the thing you're considering building. If you have a platform team that already runs gateways, monetization and observability, wrapping MCP onto that stack is a sensible extension of muscle you already have. If you don't — if this would be net-new infrastructure owned by whoever drew the short straw — then "let's just hand-roll it" is a decision to start a second product nobody scoped.

That's the honest fork. The demo proved the capability is close. The backlog proves the operation is not. In the next post we'll put the real options side by side — codegen, gateway, and managed MCP — and be candid about which team size each one actually fits, so the choice is made with eyes open instead of one ticket at a time.

REFERENCES

  1. FastMCP — The fast, Pythonic way to build MCP servers.
  2. Model Context Protocol — Authorization specification (2025-06-18).
  3. Moesif — API analytics & monetization blog.
  4. Zuplo — API gateway & management blog.
Turn the APIs you already have into an MCP agents can call. Register a platform →