Traceway hands-on: a full observability stack on a 512 MB box

By Seba Kubisz · 10 min read · Developer Tool

Vultr runs a free-tier program: apply for it and you get a small VPS for a year — one vCPU, 512 MB of RAM, 10 GB of disk. It is the cheapest box money can't buy. I put Traceway on it.

It ran. Logs, traces, metrics and errors all flowing into one dashboard; about 24 MB resident at idle; healthy 1.6 seconds after docker run. Under a sustained burst of synthetic load it climbed to ~75 MB resident, never dropped an ingest, and never came close to the wall. As "a full observability stack on a free box" goes, it works.

The catch is that the version of Traceway most people see — the one the marketing leads with, git clone && docker compose up -d, "self-hosted in 90 seconds" — would never have fit on that box. That command builds and runs ClickHouse, PostgreSQL, a Go backend and a SvelteKit frontend, on your hardware. The one that fits is a different, less-advertised mode. The gap between those two is most of the story — that, and what it's like to bet an indie app's observability on a five-month-old project shipped by what is effectively one person.

Two backends wearing one name

Traceway is OpenTelemetry-native: point an OTLP exporter at it and traces, metrics and logs flow in with no Traceway-specific code. Underneath, there are two storage backends, chosen at compile time by a Go build tag.

The default — and what the 90-second docker compose up deploys — is ClickHouse for telemetry plus PostgreSQL for relational data. The other mode is a plain go build with no tag: SQLite for both, with the frontend embedded in the binary. That's Dockerfile.sqlite, a single Alpine container that weighs about 20 MB and needs no external databases; it's also available as an in-process library you can run inside a Go program, though the docs flag that variant as development-only. Either way, the blobs — session-replay recordings, source maps, AI traces — always go to the filesystem or any S3-compatible bucket, never into the telemetry store, which is the sensible call: those are write-once, read-by-id, no querying needed.
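
To make the split concrete, here is roughly what the two builds look like from the command line. The tag-less go build and the Dockerfile.sqlite filename come from the docs; the image tag and the ClickHouse build-tag spelling below are placeholders rather than the repo's actual names.

    # SQLite mode: a plain, tag-less build puts telemetry and relational data in SQLite,
    # with the frontend embedded in the binary
    # (run from the backend's main package directory; the repo layout isn't shown here)
    go build

    # or build the single ~20 MB Alpine image from the repo's Dockerfile.sqlite
    docker build -f Dockerfile.sqlite -t traceway:sqlite .

    # default mode: a compile-time tag switches storage to ClickHouse + PostgreSQL
    # ("clickhouse" is a placeholder; the repo defines the real tag name)
    go build -tags clickhouse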

So the "no ClickHouse" story is real, but it's a deployment choice the docs half-bury rather than the architecture. The default architecture still has ClickHouse in it. And there's a reason SQLite isn't the default everywhere: columnar storage is what keeps log search and aggregations fast as data grows, and the SQLite mode doesn't have it.

One more thing to know before you wire anything up. Traceway calls itself "OpenTelemetry-native, no proprietary SDK," and that's the direction of travel — but right now there are framework middlewares (Gin, Chi, Fiber and friends) that use a proprietary connection string, and session replay is rrweb segments posted to a proprietary endpoint, not OTLP at all. The migration to OTel-first for everything is in flight; the open issue tracker spells it out. For backend traces, metrics and logs today, plain OTLP works fine and that's what I used.

The hands-on

One thing to be clear about up front: everything below is Traceway in SQLite mode — the single small container. The default ClickHouse + Postgres deployment is the heavier, recommended path, built to keep analytics fast at scale, and I didn't run it. This is the lightweight mode on cheap hardware, not Traceway at scale.

Installing it on the small box

There is no published Docker image. Not on Docker Hub, not on GitHub Container Registry, and the CI doesn't build one. So "docker compose up -d" means: clone the repo, then your machine builds the SvelteKit app and the Go binary and the Alpine image, and pulls ClickHouse and Postgres. On a 512 MB / 10 GB box that build won't fit — not enough memory for the frontend build, not enough disk for the layers. So I built the SQLite image on a laptop and shipped it over with docker save | gzip | ssh … docker load. After that, docker run with a /data volume, and it was answering /health 1.6 seconds later. The "90-second install" the website headlines is the docker compose up path — the ClickHouse stack — not this; and the docs don't offer a "pull a prebuilt image instead" route for someone on a small box.
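
For anyone repeating that workaround, its shape is below. The image tag, host, volume path and port are placeholders (only the /data volume and the /health endpoint are pinned down above), so substitute whatever your build and Traceway's docs actually use.

    # on a laptop with room to build: build the SQLite image, then ship it over SSH
    docker build -f Dockerfile.sqlite -t traceway:sqlite .
    docker save traceway:sqlite | gzip | ssh user@small-box 'gunzip | docker load'

    # on the 512 MB box: run it with a persistent /data volume
    # (the host port is a placeholder; use whichever port the image actually exposes)
    docker run -d --name traceway \
      -v /srv/traceway:/data \
      -p 8080:8080 \
      traceway:sqlite

    # it answered /health within a couple of seconds in the hands-on above
    curl -fsS http://localhost:8080/health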

For a tool whose entire pitch is frictionless self-hosting, "no prebuilt image, build it yourself" is unusual — Sentry, Plausible, Umami, Grafana and Gitea all publish images you docker pull or reference in a compose file. It's a young-project gap, not a sinister one. It also means no digest-pinning and no signed-image supply-chain story yet — though a new contributor's open pull request (#136, mid-May 2026) wires up GHCR publishing with Cosign signatures, with the maintainer directing it to cover all the documented deployment variants including the SQLite single-container. If it lands as scoped, the gap closes across the board.

Instrumenting it

The "select your framework for tailored setup" picker covers six Go variants, seven JavaScript/TypeScript ones, plus Symfony, Flutter and Android — and nothing for Python, Ruby, Java, .NET, Rust or Elixir. That's less of a wall than it looks. Traceway is OpenTelemetry-native, so anything with a standard OTel SDK — which is all of those languages — connects with an endpoint URL and an auth header, the same way I wired up the synthetic load, and that's arguably a more future-proof way to instrument an app than any vendor SDK. What's missing for those languages is the hand-held onboarding, plus the fact that the "pick your framework" page doesn't mention the OTel path at all, so it can read as unsupported when it isn't. It tracks with the maintainer being a Go developer building a Go-backed product, and with the project's stated plan to lean on OTel rather than maintain per-language SDKs.

A smaller surprise: the dashboard hides the Sessions tab unless the project's framework is a front-end one. Instrument a full-stack app under a single "Go" project and your session-replay data still lands, but the tab to look at it doesn't appear until you create the project with a front-end framework. Discoverability wrinkle, easily worked around once you know.

Loading it

I drove synthetic OTLP traffic at the box: HTTP-request traces with a couple of child spans each, roughly ten log records per request, about a tenth of requests carrying an exception, and periodic gauge metrics — at three rates. Quiet (10 requests/minute) was nothing. Steady (100 requests/minute, three minutes) was a non-event: resident memory went from 38 to 45 MB, swap stayed untouched, CPU hovered near zero, every OTLP post returned 200, nothing was dropped. The burst regime — about 1,000 requests/minute, ~17 per second, held for two and a half minutes — was the real test, and it held: resident memory peaked around 75 MB and stayed there afterward (Go keeps the heap it grew), the box still had ~220 MB free, swap usage stayed negligible — about twenty megabytes of a 1.4 GB pool — every post a 200, no back-pressure, no errors in the logs. For an indie app's actual traffic, the SQLite minimal image on a free 512 MB box does not break a sweat.
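
If you want to approximate that burst profile yourself, telemetrygen, the small OTLP load tool from the OpenTelemetry Collector project, will do it. Treat the sketch below as an approximation of the shape of that load rather than a reproduction of my setup: the flag spellings vary between versions, and the endpoint is a placeholder.

    # roughly the burst regime: ~17 traces/second held for two and a half minutes
    # (flag names are from a recent telemetrygen; check --help for your version)
    telemetrygen traces \
      --otlp-endpoint traceway.example.com:4318 --otlp-http --otlp-insecure \
      --rate 17 --duration 150s

    # and a parallel log stream, about ten records per request
    telemetrygen logs \
      --otlp-endpoint traceway.example.com:4318 --otlp-http --otlp-insecure \
      --rate 170 --duration 150s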

Growing it

Then I bulk-loaded the telemetry database to about a million log records, plus roughly 150,000 spans, 150,000 endpoint rows and 20,000 exception occurrences — call it a year-plus of a modest app's telemetry. The file came out at 545 MB. SQLite doesn't compress, so that's about 0.4 KB per row, raw; the website's "1M daily events compresses to ~2 GB/month" claim is the ClickHouse path, which does compress — on the SQLite path you should expect materially more disk per event.

The recent-data views, which are what the dashboard shows you most of the time, stayed snappy at a million rows: the logs list, a "last hour" count, grouped exceptions, a metrics aggregate — all under a third of a second, and the schema is indexed sensibly enough that the query planner uses those indexes. But the full-dataset queries dragged. "How many logs do I have" took four seconds. The endpoint-impact ranking — which the dashboard homepage renders — took 5.7 seconds (and dropped to half a second after a manual ANALYZE, which Traceway never runs, so the planner is working from stale statistics in practice). A substring log search across a million rows — body LIKE '%cache%' — took over three minutes on a cold cache, three and a half seconds warm. Resident memory crept up to ~154 MB and the box's free headroom shrank to ~170 MB while those queries ran. Nothing fell over — but the ceiling is visible, and it's the one you'd expect: the SQLite mode trades columnar storage for zero dependencies, and full-table aggregations and substring scans over a million rows are exactly what columnar storage buys back. This isn't Traceway falling short of a claim — the SQLite build isn't pitched as a same-performance replacement for the ClickHouse deployment — it's the line past which you'd want that deployment, which is built for precisely this.
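
If you run the SQLite mode and hit that stale-statistics behaviour, the workaround the numbers above imply is to run ANALYZE by hand. The database path and table name below are guesses (only the body column is confirmed above), so adjust them to whatever the /data volume actually contains.

    # refresh SQLite's planner statistics; Traceway doesn't run this itself
    # (path and table name are assumptions)
    sqlite3 /data/traceway.db 'ANALYZE;'

    # the aggregate-style queries (the endpoint-impact ranking) are what benefit;
    # a substring scan like the one above is a full-table read either way
    sqlite3 /data/traceway.db "SELECT count(*) FROM logs WHERE body LIKE '%cache%';"

A periodic ANALYZE from cron would keep the planner honest at near-zero cost.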

The sustainability picture

The license. Traceway says "MIT — no asterisks" on the website, in the README badge, and in the comparison table — and for its first five months that wasn't quite true. There was no LICENSE file in the repository; the README's MIT badge linked to a /blob/main/LICENSE that 404'd; GitHub detected no license. Without a license file the legal default is all-rights-reserved, whatever the prose says — so the MIT grant wasn't actually in force, and the code wasn't forkable the way the pitch implied. When I flagged it in mid-May 2026, the maintainer added a proper MIT LICENSE file within hours. So the gap is closed — but it's worth knowing it existed, and the speed of the fix is a small point in the project's favour on responsiveness.

Who's behind it. One person, effectively: one human contributor with 500-plus commits, plus the occasional outside commit and a release bot. The maintainer is actively trying to widen that base — a community Discord opened in mid-May 2026, a public ask for collaborators on r/golang back in February, and a new contributor brought onto the repo in mid-May around the Cosign-signing pull request — though that's a recruitment posture, not co-maintainership yet. The repository is five months old, and the codebase carries the markers of AI-assisted development — a CLAUDE.md, a skills/ directory, and the kind of output one person can't produce by hand. That's two separate concerns: a bus factor of one, and a throughput-versus-review mismatch — a Go backend, a SvelteKit frontend, around twenty framework integrations, mobile clients, an AI-observability pillar, SSO, and always-on session replay, all shipped in five months, with pull requests reviewed and merged by the person who wrote them. The one signal that is genuinely reassuring is commit cadence: continuous since December 2025, triple digits most months. (The GitHub "release" count is noisier than it looks — most of the back-version tags were stamped within a four-minute window.)

The funding. Traceway Cloud — the same MIT code, run by them — is the model. Five tiers, every one with the full feature set; the only axis is monthly event volume: free at 10,000 events, $12.99 at 100,000, $24.99 at 1 million, $499.99 at 200 million, "contact us" above. Two things stand out. The 10,000-events free tier is a kick-the-tires tier — a moderately busy app burns through it in hours — and there's nothing between 1 million and 200 million, so a mid-volume app jumps straight to $500. On the other side of the ledger, the billing is honestly described — "no overage charges; we notify you before the bill rises" — which is the opposite of the viral-Datadog-invoice story, and the "event" is coarse-grained (an HTTP request, an issue, or a task run; spans and individual log lines aren't billed separately), which is more generous than Sentry's accounting.

What's actually at stake. Observability data is low-lock-in: it's recent telemetry on a 30-day default retention, not years of irreplaceable notes or photos. If Traceway stalls or vanishes, you lose dashboards and recent data, not your history — the downside is bounded, and a switch costs you a fresh setup, not a migration. The realistic near-term risk is the development-gap kind — one AI-paced developer, no second pair of eyes — more than the data-loss kind. And with the LICENSE file now in place, the code is genuinely forkable if it comes to that, which puts a floor under the worst case.

Where it fits, and where it doesn't

Run the SQLite mode if you want unified logs, traces, metrics, errors and session replay in one self-hosted system instead of wiring six tools together, and you're starting small — small enough that a free 512 MB box is on the table. It carries an indie app's volume without complaint; it's a 20 MB container plus a /data volume; and you don't need ClickHouse, Postgres or a collector to run it. If you'd rather not pay a Sentry bill, that's a point in its favour. The guided setup is Go-and-JS-first, but anything with a standard OpenTelemetry SDK connects fine, so the language list isn't really a gate.

Where the lightweight mode runs out of road: around a million rows on a 512 MB / one-vCPU box, where the full-dataset views — the homepage endpoint ranking, a substring log search — go from milliseconds to seconds-or-minutes, and memory headroom tightens. If your telemetry is going to grow like that, the SQLite container isn't where you want to live long-term. Traceway's answer is the default ClickHouse + Postgres deployment, which is built for exactly that — but it's a heavier setup, a different operational beast, and I didn't run it, so treat "it scales with ClickHouse" as the design intent rather than a tested fact. If you're planning for scale, evaluate that path on its own terms — SigNoz and ClickStack (ex-HyperDX) are the other ClickHouse-backed OSS stacks in the same shape, and worth weighing alongside it.

And some cautions apply whichever way you run it: there's no published Docker image, so you build it yourself; it's five months old; and it's effectively one AI-paced developer with no second pair of eyes.

And the broadest framing, the same one that applies to observability for indies generally: most small apps don't need unified observability yet. SSH plus grep plus a free error-tracking SaaS plus an uptime monitor covers the day-to-day, and each heavier pillar should earn its container. Traceway earns it when a second host enters the picture, when you have multi-service traces to follow, or when session replay genuinely pays for itself on a consumer-facing app. Until then it's a good thing to know exists, not a thing to deploy.

Verdict

Traceway is the right shape. The build-tag storage abstraction is a clever way to be both "one tiny container" and "scales with ClickHouse" from one codebase; the blob-on-disk-or-S3 split for replay is the right design; and "a full observability stack you can run on a free VPS" is a genuinely useful thing that mostly works. It is also five months old, one AI-paced developer, Go-and-JS-first in its onboarding, released without a prebuilt image, and — in the lightweight SQLite mode — slow on analytics once the data grows. None of that is disqualifying; all of it is the profile of a promising early-stage project, and worth knowing before you commit.

Use it if you fit the niche and you treat it as "useful now, re-evaluate in a year." If you're putting an indie app's observability on it, keep an eye on two things: whether prebuilt images ship, and whether a second maintainer joins.

Try Traceway →