Observability without an enterprise budget: a stack for small teams

Logs, metrics and traces aren't a big-company luxury. Here's the cheap, lightweight stack we use to understand what happens in production.

"Observability is for companies with an SRE team." It's the most common excuse for going to production blind. The truth is that in 2026 a small team can have serious logs, metrics and traces for very little money and half a day of setup. Here's the stack we use on our projects, and the philosophy behind it.

The three pillars, without the mystique

Logs: what happened, with context. Structured (JSON), not free-form strings.
Metrics: numbers aggregated over time (latency, throughput, error rate). Little space, lots of information.
Traces: the journey of a single request across your services. The pillar small teams most underrate, and the one that solves "impossible" bugs.
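The metrics pillar gets its "little space, lots of information" from aggregation. Below is a minimal TypeScript sketch of one common approach: a latency histogram with fixed buckets. Memory is one counter per bucket, whether you record ten requests or ten million. The `LatencyHistogram` class and the bucket bounds are hypothetical. In practice you'd use the histogram instrument from the OpenTelemetry SDK or a Prometheus client.

```typescript
// Hypothetical fixed-bucket latency histogram (not a real SDK class).
// Memory is one counter per bucket, however many requests are recorded.
class LatencyHistogram {
  private readonly bounds: number[]; // upper bounds in ms, ascending
  private readonly counts: number[]; // one slot per bound, plus an overflow slot
  private total = 0;
  private sumMs = 0;

  constructor(bounds: number[]) {
    this.bounds = bounds.slice().sort((a, b) => a - b);
    this.counts = [];
    for (let i = 0; i <= this.bounds.length; i++) this.counts.push(0);
  }

  record(ms: number): void {
    // Walk to the first bucket whose upper bound covers the value.
    let i = 0;
    while (i < this.bounds.length && ms > this.bounds[i]) i++;
    // i === bounds.length means "above every bound" (the overflow bucket).
    this.counts[i]++;
    this.total++;
    this.sumMs += ms;
  }

  snapshot(): { counts: number[]; total: number; meanMs: number } {
    return {
      counts: this.counts.slice(),
      total: this.total,
      meanMs: this.total === 0 ? 0 : this.sumMs / this.total,
    };
  }
}

// Usage: storage stays at five counters (four bounds plus overflow),
// no matter how many more requests arrive.
const latency = new LatencyHistogram([50, 100, 250, 1000]);
for (const ms of [20, 80, 80, 300, 5000]) latency.record(ms);
console.log(latency.snapshot());
```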

The standard that changed the rules: OpenTelemetry

The reason a small team can do serious observability today is OpenTelemetry. It's the open standard for instrumenting code: you instrument once with its SDKs, then send the data wherever you want. No more lock-in to a single vendor: switching backends means changing exporter configuration, not application code. On an SME budget, that's the freedom to pick the cheap option today and switch tomorrow without redoing everything.
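To make "instrument once, send anywhere" concrete, here is a toy TypeScript model of the separation OpenTelemetry enforces. Every name in it (`TelemetrySink`, `Recorder`, `handleCheckout`) is hypothetical, and this is not the OpenTelemetry API. In the real SDK, the swappable piece is the exporter (commonly OTLP), which you point at whichever backend you choose.

```typescript
// Toy model of "instrument once, export anywhere". All names are
// hypothetical; this is NOT the OpenTelemetry API.
interface FinishedSpan {
  traceId: string;
  name: string;
  durationMs: number;
  error: boolean;
}

// The backend-specific part: the only piece you replace when switching vendors.
interface TelemetrySink {
  export(spans: FinishedSpan[]): void;
}

class ConsoleSink implements TelemetrySink {
  export(spans: FinishedSpan[]): void {
    for (const s of spans) console.log(JSON.stringify(s));
  }
}

class InMemorySink implements TelemetrySink {
  readonly received: FinishedSpan[] = [];
  export(spans: FinishedSpan[]): void {
    for (const s of spans) this.received.push(s);
  }
}

// What application code talks to; it never imports a vendor SDK.
class Recorder {
  private buffer: FinishedSpan[] = [];
  private readonly sink: TelemetrySink;

  constructor(sink: TelemetrySink) {
    this.sink = sink;
  }

  record(span: FinishedSpan): void {
    this.buffer.push(span);
  }

  flush(): void {
    this.sink.export(this.buffer);
    this.buffer = [];
  }
}

// Business code: identical whichever sink is plugged in.
function handleCheckout(recorder: Recorder, traceId: string): void {
  recorder.record({ traceId, name: "checkout", durationMs: 42, error: false });
}

// Usage: the same handleCheckout feeds two different "backends".
const memory = new InMemorySink();
const recorder = new Recorder(memory);
handleCheckout(recorder, "4bf92f3577b34da6a3ce929d0e0e4736");
recorder.flush();

const consoleRecorder = new Recorder(new ConsoleSink());
handleCheckout(consoleRecorder, "4bf92f3577b34da6a3ce929d0e0e4736");
consoleRecorder.flush(); // prints the span as one JSON line
```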

The stack we use

Instrumentation

The OpenTelemetry SDK runs inside the app. For a Next.js project, auto-instrumentation covers HTTP requests, external calls and queries from the supported database clients with almost no code. We add manual spans only on business-critical paths.

Structured logs

JSON with consistent fields: timestamp, level, traceId, and userId where privacy law allows it. The traceId in the log is the trick that links an error to its full trace. Without it, you have two separate systems that don't talk to each other.
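Here is a minimal sketch of such a log line. It assumes a hand-rolled `formatLog` helper (hypothetical) rather than any particular logging library. In a real OpenTelemetry setup you'd take the traceId from the active span, for example `trace.getActiveSpan()?.spanContext().traceId` from `@opentelemetry/api`. Here it's passed in explicitly so the example runs on its own.

```typescript
type Level = "debug" | "info" | "warn" | "error";

interface LogContext {
  traceId?: string; // links this line to its full trace
  userId?: string; // only where privacy law allows it
  [key: string]: unknown;
}

// One JSON object per line, with the same field names everywhere.
// Reserved fields are written last so context can't overwrite them.
function formatLog(
  level: Level,
  message: string,
  ctx: LogContext = {},
  now: Date = new Date(),
): string {
  return JSON.stringify({ ...ctx, timestamp: now.toISOString(), level, message });
}

// Usage: in production the traceId would come from the active span.
console.log(
  formatLog("error", "payment failed", {
    traceId: "4bf92f3577b34da6a3ce929d0e0e4736",
    userId: "u-42",
  }),
);
```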

Backend

Here the small team has choices. Mature self-hosted options (the Grafana stack: Loki for logs, Tempo for traces, Prometheus for metrics) run on a cheap VM. Alternatively, the free or low-cost tiers of managed backends are plenty for SME volumes. The rule either way: cost grows with the volume of data you ingest, so control what you send.
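Because the bill follows volume, it's worth estimating volume before picking a tier. Below is a back-of-envelope sketch in TypeScript. The inputs (1M requests a day, 20 spans per request, about 500 bytes per span) are hypothetical placeholders, not measurements. Substitute your own numbers and multiply by your provider's price per GB.

```typescript
// Back-of-envelope daily trace ingestion. All inputs are hypothetical
// placeholders; replace them with numbers measured on your own app.
interface VolumeInput {
  requestsPerDay: number;
  spansPerRequest: number;
  bytesPerSpan: number; // average serialized span size
  sampleRatio: number; // fraction of traces actually sent, 0..1
}

function gbPerDay(v: VolumeInput): number {
  const bytes = v.requestsPerDay * v.spansPerRequest * v.bytesPerSpan * v.sampleRatio;
  return bytes / 1e9; // decimal gigabytes (10^9 bytes)
}

const base: VolumeInput = {
  requestsPerDay: 1_000_000,
  spansPerRequest: 20,
  bytesPerSpan: 500,
  sampleRatio: 1,
};

const unsampled = gbPerDay(base); // 10 GB/day, about 300 GB/month
const sampled = gbPerDay({ ...base, sampleRatio: 0.1 }); // roughly 1 GB/day
console.log(`unsampled: ${unsampled.toFixed(1)} GB/day, 10% sample: ${sampled.toFixed(1)} GB/day`);
```

The 10% case ignores error traces. If you keep all of those too, the effective ratio is slightly higher.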

The cost trap: sampling

Observability's steep bill isn't the tool: it's the data volume. A busy app generates millions of spans, and sending them all blows up the bill. The answer is sampling: keep 100% of traces with errors and a sample of normal requests (5-10% is enough to see the patterns). That way you see every failed request and pay for a fraction of the volume.
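One way to express that rule in TypeScript, as a sketch under stated assumptions. Keeping every error trace means deciding after the request has finished, which is called tail sampling. In an OpenTelemetry setup this usually lives in the Collector's `tail_sampling` processor (from the contrib distribution) rather than in app code. The SDK's samplers decide when a request starts, before anyone knows it will fail. The `keepTrace` function is hypothetical. It assumes trace IDs are randomly generated, so their last 32 bits behave like a uniform random number.

```typescript
// Sketch of "keep every error, sample the rest". It needs to know whether
// the request failed, so it runs on completed traces (tail sampling).
interface CompletedTrace {
  traceId: string; // 32 lowercase hex chars (W3C Trace Context format)
  hasError: boolean;
}

// Deterministic: a given traceId always gets the same decision.
// Assumes trace IDs are random, so the low 32 bits are evenly spread.
function keepTrace(trace: CompletedTrace, normalRatio: number): boolean {
  if (trace.hasError) return true; // 100% of failing requests
  const bucket = parseInt(trace.traceId.slice(-8), 16); // last 8 hex digits = 32 bits
  return bucket < normalRatio * 0x100000000; // 0x100000000 = 2^32
}

// Usage: keep all errors plus about 10% of normal traffic.
const traces: CompletedTrace[] = [
  { traceId: "4bf92f3577b34da6a3ce929d0e0e4736", hasError: true },
  { traceId: "0af7651916cd43dd8448eb211c80319c", hasError: false },
];
const kept = traces.filter((t) => keepTrace(t, 0.1));
console.log(kept.map((t) => t.traceId));
```

Hashing the trace ID instead of calling `Math.random()` keeps the decision reproducible: evaluating the same trace twice always gives the same answer.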

Useful alerts, not noise

The classic mistake is to alert on everything until nobody reads the alerts anymore. We start from the few alerts that matter: error rate above a threshold, p95 latency outside its target range, resource saturation. An alert that requires no action is noise, and noise trains the team to ignore alerts. Better three reliable alerts than thirty ignored ones.
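The three starting alerts can be sketched as plain functions over one time window. In practice these rules belong in your backend's alerting (Prometheus alerting rules or Grafana alerts, for example), not in application code. The TypeScript below only makes the logic explicit. The thresholds and the window shape are illustrative assumptions, as is the use of CPU utilization as the saturation signal.

```typescript
// The three starting alerts, evaluated over one time window.
// Thresholds and the CPU saturation signal are illustrative assumptions.
interface WindowStats {
  requests: number;
  errors: number;
  latenciesMs: number[];
  cpuUtilization: number; // 0..1
}

interface Thresholds {
  maxErrorRate: number; // e.g. 0.02 for 2%
  maxP95Ms: number;
  maxCpu: number; // e.g. 0.85
}

// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) return 0;
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

function firingAlerts(stats: WindowStats, limits: Thresholds): string[] {
  const firing: string[] = [];
  const errorRate = stats.requests === 0 ? 0 : stats.errors / stats.requests;
  if (errorRate > limits.maxErrorRate) firing.push("error-rate");
  if (percentile(stats.latenciesMs, 95) > limits.maxP95Ms) firing.push("p95-latency");
  if (stats.cpuUtilization > limits.maxCpu) firing.push("saturation");
  return firing;
}

// Usage: 3% errors and a p95 of 95 ms breach their limits; CPU does not.
const latencies: number[] = [];
for (let ms = 1; ms <= 100; ms++) latencies.push(ms);
console.log(
  firingAlerts(
    { requests: 1000, errors: 30, latenciesMs: latencies, cpuUtilization: 0.5 },
    { maxErrorRate: 0.02, maxP95Ms: 90, maxCpu: 0.85 },
  ),
);
```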

Verdict

In 2026 there's no excuse left for shipping to production without observability. OpenTelemetry made the tooling free and portable; the remaining cost is mostly data volume, and you control that with sampling. The return is huge: the first serious incident you resolve in ten minutes instead of an afternoon already pays back the half-day of setup. It's not a big-company thing; it's basic hygiene.