- v0.4.0 feature
See every byte
- Request + response bodies captured for every gateway and MCP call
- PII redaction at capture; per-column TTL splits hot bodies from cold metadata
- S3-compatible body offload — AWS S3, MinIO, Ceph, or the bundled RustFS
- Bloom-filter substring search across captured bodies
- audit:read_bodies permission gates the body viewer
- MCP downstream SSE — clients can subscribe to tools/call event timelines
- Per-model routing with Auto / Manual modes + drag-to-redistribute traffic bar
- Model-level kill switch — pause one model without the whole provider
- Pre-call budget check rejects requests when caps are already exhausted
- allowed_models enforced uniformly on Anthropic Messages + OpenAI Responses
Full-body audit capture#
ThinkWatch positioned itself as a bastion for AI traffic — but every audit row was metadata-only. Token counts, latency, cost. No record of what the user actually asked or what the model actually answered. Compliance teams couldn’t replay incidents, attest “PII X was sent”, or even tell two near-identical requests apart.
v0.4.0 closes the gap.
gateway_logs now persists request_body + response_body (with byte counts and a body_capture_status dimension); mcp_logs persists the parallel tool_arguments + tool_result. ZSTD(6) compression keeps cold storage tractable; six dynamic config toggles (capture toggles per field + a global body_max_bytes) let operators dial scope without a redeploy. Cache-hit responses now emit a full audit row instead of disappearing into a hole.
Per-column TTL + S3 offload#
Hot evidence and cold metadata have very different retention shapes. Per-column TTL lets you keep the last 30 days of bodies and several years of token/cost rows in the same table, without paying ClickHouse for the cross.
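The offload path covered in this section stores oversized bodies in object storage under a date-prefixed key. A minimal sketch of that decision, assuming a hypothetical `row_id` as the final key segment (the changelog elides that part of the key) and a hypothetical `offload_ref` helper; none of these names come from ThinkWatch itself:

```python
from datetime import datetime, timezone
from typing import Optional


def body_object_key(table: str, row_id: str, captured_at: datetime) -> str:
    """Build a key shaped like bodies/<table>/yyyy/mm/dd/<row_id>.

    row_id is a hypothetical stand-in for the elided tail of the key.
    captured_at is expected to be timezone-aware.
    """
    day = captured_at.astimezone(timezone.utc)
    return f"bodies/{table}/{day:%Y/%m/%d}/{row_id}"


def offload_ref(body: bytes, body_max_bytes: int, bucket: str,
                table: str, row_id: str, captured_at: datetime) -> Optional[str]:
    """Return an s3:// reference for oversized bodies, or None to keep the body inline."""
    if len(body) <= body_max_bytes:
        return None
    return f"s3://{bucket}/{body_object_key(table, row_id, captured_at)}"
```

Because the date sits at the front of the key, a bucket lifecycle rule scoped to the bodies/ prefix can expire objects by age without consulting the database.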
For payloads that exceed body_max_bytes (200k-context prompts, MCP file-read tools returning whole files, base64 images), bodies offload to S3 instead of being truncated. Backend is config-driven via six env vars (S3_ENDPOINT_URL, S3_BUCKET, etc.) and works against AWS S3, MinIO, Ceph, or the bundled RustFS container shipped in the dev and prod Docker Compose files. Object keys carry a date prefix (bodies/<table>/yyyy/mm/dd/...) so bucket lifecycle rules can age them off independently. The body-viewer endpoint transparently dereferences s3:// cells, so the console doesn’t know offload exists.
Bloom-filter substring search#
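To make the token bloom filter in this section concrete, here is a rough Python model of the idea: split text into alphanumeric tokens, set a few bits per token, and use the bit field to rule out blocks that cannot contain a search term. This is an illustration only; the hashing scheme and the `TokenBloom` class are assumptions and do not reproduce ClickHouse's internal implementation.

```python
import hashlib
import re


class TokenBloom:
    """Toy token bloom filter: size in bytes and hash count mirror the (512, 3) shape."""

    def __init__(self, size_bytes: int = 512, num_hashes: int = 3):
        self.num_bits = size_bytes * 8
        self.num_hashes = num_hashes
        self.field = bytearray(size_bytes)

    def _positions(self, token: str):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{token}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add_text(self, text: str) -> None:
        # Tokens are maximal runs of ASCII letters and digits.
        for token in re.findall(r"[A-Za-z0-9]+", text):
            for pos in self._positions(token):
                self.field[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, token: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.field[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(token))
```

A query for "passwd" would only read blocks whose filter answers True; a False answer is definitive, while a True answer may occasionally be a false positive that the real scan then rejects.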
Captured-body columns ship with a tokenbf_v1(512, 3, 0) index. Auditors hunting for “which conversations mentioned API key XYZ” or “which tool calls touched /etc/passwd” no longer scan every granule: the bloom filter rejects the vast majority of rows up front. Indexes are sized at ~50 KB per granule, paying for themselves the first time someone needs to answer a real compliance question.
A new audit:read_bodies permission gates the body viewer in the console and the underlying /api/admin/{gateway,mcp}/logs/{id}/body endpoint, so the raw payload is preserved as legal record of intent while still requiring a separate grant to read.
MCP downstream streaming#
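The Accept-header negotiation described in this section can be sketched as follows. The event dictionaries and the `render_downstream` helper are hypothetical; only the `text/event-stream` media type and the SSE frame layout (`event:` and `data:` lines followed by a blank line) are standard.

```python
import json
from typing import Dict, List


def wants_event_stream(accept_header: str) -> bool:
    """True when the client lists text/event-stream among its accepted media types."""
    media_types = [part.split(";")[0].strip().lower()
                   for part in accept_header.split(",")]
    return "text/event-stream" in media_types


def render_downstream(accept_header: str, upstream_events: List[Dict]) -> str:
    """Stream every upstream event as SSE, or fall back to the final JSON only."""
    if wants_event_stream(accept_header):
        return "".join(
            f"event: {e['event']}\ndata: {json.dumps(e['data'])}\n\n"
            for e in upstream_events
        )
    # Buffered shape: JSON-only clients see just the last (final) payload.
    return json.dumps(upstream_events[-1]["data"])
```

In both branches the full `upstream_events` list is available to the caller, which matches the note that the audit pipeline receives the whole timeline regardless of what the client asked for.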
tools/call was buffered-only on the client side: even when the upstream emitted progress notifications, the gateway swallowed the timeline and returned a single JSON. v0.4.0 negotiates on the Accept header: clients including text/event-stream receive each upstream event (notifications/progress, partials, final response) as a discrete SSE event downstream. JSON-only clients keep the buffered shape for backward compatibility. The audit pipeline still gets the full event timeline either way.
Per-model routing#
Model-route configuration is now organised around the admin’s actual mental model: Auto (latency-cost, balanced, or latency-only) or Manual (a drag-to-redistribute traffic bar across peers). The priority-tier failover concept is gone, replaced with nginx-style flat peers + circuit-breaker bypass. Each model can override the global strategy; the dashboard surfaces routing-projection so the UI can show “your manual config will cost $X; auto would cost $Y” before you commit.
Three new endpoints back the UI:
- PATCH /api/admin/model-routes/batch-weights: the drag bar’s commit endpoint
- GET /api/admin/models/{id}/routing-projection: current vs. auto split + projected $/1M tokens
- GET /api/admin/models/{id}/route-history: 60 one-minute p50/p95 buckets for the latency sparkline
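One plausible way a drag-to-redistribute bar could keep the split summing to 100% is to pin the dragged peer and rescale the others in proportion to their previous shares. The sketch below assumes exactly that behaviour; the `redistribute` function and the percentage representation are illustrative and not taken from the ThinkWatch codebase.

```python
from typing import Dict


def redistribute(weights: Dict[str, float], peer: str, new_weight: float) -> Dict[str, float]:
    """Set one peer's share (0-100) and rescale the remaining peers proportionally."""
    if not 0 <= new_weight <= 100:
        raise ValueError("new_weight must be between 0 and 100")
    others = {name: w for name, w in weights.items() if name != peer}
    remaining = 100 - new_weight
    total_others = sum(others.values())
    result = {peer: new_weight}
    for name, w in others.items():
        # If every other peer was at zero, split the remainder evenly.
        share = w / total_others if total_others else 1 / len(others)
        result[name] = remaining * share
    return result
```

The resulting mapping is the kind of payload a batch commit endpoint would need: every peer's weight in one request, so the split never passes through an inconsistent intermediate state.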
The decision log captures provider chosen + reason + any fallback for every request, and every gateway_logs row stamps the actual upstream_model so post-mortems are unambiguous.
Pre-call budget enforcement#
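The ordering change this section describes (check the cap before calling upstream, debit afterwards) can be sketched in a few lines. The `Budget` dataclass, `BudgetExhausted` exception, and `handle_request` wrapper are hypothetical names; only the `check_budget` stage name comes from the release notes.

```python
from dataclasses import dataclass
from typing import Callable


class BudgetExhausted(Exception):
    """Raised before the upstream call when the cap is already spent."""


@dataclass
class Budget:
    cap_usd: float
    spent_usd: float = 0.0


def check_budget(budget: Budget) -> None:
    # Pre-call gate: reject when nothing is left, before any tokens are burned.
    if budget.spent_usd >= budget.cap_usd:
        raise BudgetExhausted("budget cap already exhausted")


def handle_request(budget: Budget, call_upstream: Callable[[], float]) -> float:
    """Run the gate, call upstream, then apply the post-call debit."""
    check_budget(budget)
    cost_usd = call_upstream()
    budget.spent_usd += cost_usd
    return cost_usd
```

This gate only covers the steady state. Several requests that pass the check at the same moment can still overshoot the cap together, which is the concurrent-burst race the crossing alerts are left to handle.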
Budget caps used to fire post-flight: a runaway client whose monthly cap was already at zero could still fire requests, and only the post-call debit would notice, by which point the upstream had already burned tokens. v0.4.0 adds a check_budget lifecycle stage that runs before the upstream call. Steady-state “your cap is exhausted, stop sending” is now blocked at the gate; the existing crossing alerts continue to cover the concurrent-burst race.
allowed_models on every API surface#
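Because all three request formats named in this section carry the target model in a top-level `model` field of the JSON body, one shared check can serve every surface. A minimal sketch, assuming (this is not stated in the release notes) that a key with no allowlist configured is represented as `None` and is unrestricted:

```python
from typing import Dict, Optional, Set


def enforce_allowed_models(body: Dict, allowed_models: Optional[Set[str]]) -> str:
    """Return the requested model, or raise if the API key may not use it.

    allowed_models=None is an assumed convention meaning "no allowlist configured".
    """
    model = body.get("model")
    if not isinstance(model, str):
        raise ValueError("request body has no model field")
    if allowed_models is not None and model not in allowed_models:
        raise PermissionError(f"model {model!r} is not in this key's allowed_models")
    return model
```

Placing the check in one function that every surface calls is what keeps a newly added API format from bypassing the allowlist by accident.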
The per-API-key allowed_models allowlist is now enforced uniformly on OpenAI Chat Completions, Anthropic Messages, and OpenAI Responses. The same enforcement is wired into the shared lifecycle pipeline, so future surfaces inherit it automatically: no API surface bypasses the model allowlist.
Under the hood#
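The shared lifecycle pipeline in this section is a Rust trait in the project; the Python sketch below only models the shape of the idea, with a `Protocol` standing in for the trait. The `call_upstream` method, the two gateway classes, and `run_lifecycle` are hypothetical; the stage names are the ones listed in the release notes.

```python
from typing import Callable, Dict, List, Protocol


class Surface(Protocol):
    """Stand-in for the common trait both gateways implement."""
    name: str

    def call_upstream(self, request: Dict) -> Dict: ...


class AiGateway:
    name = "ai"

    def call_upstream(self, request: Dict) -> Dict:
        return {"tokens": 42}


class McpGateway:
    name = "mcp"

    def call_upstream(self, request: Dict) -> Dict:
        return {"tool_result": "ok"}


def run_lifecycle(surface: Surface, request: Dict,
                  pre_call: List[Callable[[Dict], None]],
                  record_usage: Callable[[Dict, Dict], None]) -> Dict:
    """Apply the same ordered stages to any surface; a stage rejects by raising."""
    for stage in pre_call:  # e.g. check_limits, check_access, check_budget
        stage(request)
    response = surface.call_upstream(request)
    record_usage(request, response)
    return response
```

With the stage list owned by `run_lifecycle` rather than by each proxy, buffered, streaming, and short-circuit paths cannot drift apart in which checks they run.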
A shared lifecycle pipeline unifies the AI gateway and MCP gateway request paths behind a common Surface trait. check_limits, check_access, check_budget, record_usage are now stages applied symmetrically on both surfaces, eliminating the historical drift between buffered, streaming, and short-circuit branches. The MCP proxy.rs and gateway proxy.rs both got split into submodules, and the 1667-line dashboard component on the frontend was carved into focused subcomponents.
- v0.3.0 feature
MCP, the per-user way
- Per-user upstream credentials — every developer authenticates as themselves to GitHub, Linear, Slack
- One-paste OAuth onboarding via Dynamic Client Registration
- MCP Store with bilingual templates (Linear OAuth seeded out of the box)
- Step-by-step registration wizard + auth-mode-aware edit form
- Three-tier upstream subject resolution (JWT + userinfo + discovery)
- Per-credential Test Connection on /connections
- Per-user tool catalogs — different users see different tools based on upstream permissions
- MCP response cache scoped per (user, account_label) — no cross-user leakage
- Security hardening from review — SSRF, cache, rate limits, audit
Per-user upstream credentials#
The MCP gateway no longer pretends every user is the same upstream service account. Each developer authenticates as themselves — via OAuth or PAT — to GitHub, Linear, Slack, and any other MCP-enabled service. The upstream audit trail finally works end-to-end: tickets are assigned to real people, GitHub issues are created by the engineer who actually filed them. Per-key account overrides let one user maintain multiple identities (personal + work GitHub) and pick which one a given API key uses.
One-paste OAuth onboarding#
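The probe classification this section mentions, where a 401 or 403 on an anonymous probe means "needs auth" rather than "broken", could look roughly like the following. Only `auth_required` is a status named in the release notes; the other return values and the `classify_probe` function are illustrative assumptions.

```python
from typing import Optional


def classify_probe(status_code: Optional[int]) -> str:
    """Map an anonymous probe's HTTP status to a registration state.

    status_code=None models a probe that never got an HTTP response.
    """
    if status_code is None:
        return "unreachable"        # hypothetical label
    if status_code in (401, 403):
        return "auth_required"      # named in the release notes
    if 200 <= status_code < 300:
        return "ok"                 # hypothetical label
    return "failed"                 # hypothetical label
```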
Paste an MCP server URL into the registration wizard. ThinkWatch handles Dynamic Client Registration with the upstream OAuth server, runs the auth probe, captures the OAuth client credentials at install time, and walks you through the consent screen. 401/403 from anonymous probes is correctly classified as auth_required (with an amber status indicator on the catalog tile) rather than a hard failure, so partially-protected servers register cleanly.
MCP Store#
A bilingual template registry is now built into the gateway. Templates ship with sensible defaults, the necessary OAuth scopes, and end-user-facing notes. The Linear OAuth template is seeded out of the box; more popular services follow. Display labels disambiguate multi-install templates so a personal GitHub install and a work GitHub install appear as distinct tiles instead of two indistinguishable entries.
Step-by-step registration wizard#
MCP server registration was reworked into a guided wizard with auth-mode-aware screens. The edit form refuses to save fields that don’t belong to the current auth mode, and tool-call / install errors are now surfaced in plain language with actionable next steps instead of raw JSON-RPC error codes.
Three-tier upstream subject resolution#
When an upstream MCP server uses OAuth, ThinkWatch resolves the upstream user identity through a three-tier strategy: parse the JWT if present, fall back to the OAuth userinfo endpoint, fall back again to issuer discovery. The resolved subject becomes the cache and audit key, so two users who share an MCP server stay strictly separated downstream.
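The fallback chain above can be sketched as three resolvers tried in order. In this sketch the userinfo and discovery tiers are passed in as hypothetical callables rather than real HTTP calls, and the JWT tier only decodes the payload to read `sub`; it does not verify the signature, which a real deployment would need to consider.

```python
import base64
import json
from typing import Callable, Optional

Resolver = Callable[[str], Optional[str]]


def subject_from_jwt(token: str) -> Optional[str]:
    """Tier 1: read the sub claim if the token looks like a JWT (no signature check)."""
    parts = token.split(".")
    if len(parts) != 3:
        return None
    try:
        padded = parts[1] + "=" * (-len(parts[1]) % 4)
        claims = json.loads(base64.urlsafe_b64decode(padded))
    except ValueError:
        # Covers bad base64, invalid UTF-8, and malformed JSON.
        return None
    return claims.get("sub") if isinstance(claims, dict) else None


def resolve_subject(token: str, from_userinfo: Resolver,
                    from_discovery: Resolver) -> Optional[str]:
    """Try JWT parsing, then the userinfo endpoint, then issuer discovery."""
    return subject_from_jwt(token) or from_userinfo(token) or from_discovery(token)
```

Whatever tier succeeds, the returned subject is the value that would then key the cache and audit records, so an opaque token and a JWT for the same upstream user should ideally resolve to the same string.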
Per-credential Test Connection#
The /connections page surfaces a per-credential Test Connection button. Verify your OAuth/PAT actually works against the upstream server before committing to it; the result is structured (auth_ok / auth_required / unreachable / tool_call_ok) rather than a single green/red dot, so you know exactly which step is broken.
Security hardening#
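The cache scoping this section describes comes down to putting the user and account label into the cache key, so identical tool calls from different identities can never collide. A minimal sketch with a hypothetical `cache_key` function; the exact fields ThinkWatch hashes are not documented here and are assumed.

```python
import hashlib
import json
from typing import Dict


def cache_key(user_id: str, account_label: str, server: str,
              tool: str, arguments: Dict) -> str:
    """Derive a response-cache key that is unique per (user, account_label)."""
    payload = json.dumps(
        [user_id, account_label, server, tool, arguments],
        sort_keys=True,            # argument order must not change the key
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode()).hexdigest()
```

Serialising the fields as a JSON list, rather than concatenating strings, avoids ambiguous joins such as ("ab", "c") and ("a", "bc") producing the same key.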
Following an internal security review, the MCP path now has SSRF protection on probe URLs, the response cache is scoped per (user, account_label) so OAuth/PAT data never leaks across users, rate limits are applied at every gateway hop, audit records are emitted on tool-call boundaries, and admin foot-gun guards prevent the most common misconfigurations (verifying static tokens at paste time, blocking obviously-wrong credential combinations).
- v0.2.0 feature
Teams, spending controls, and a programmable API
- Teams — isolate keys, budgets, and analytics per business unit
- Rate limits & budget caps with fail-closed enforcement and spend alerts
- RBAC v2 — custom roles, permission history, one-click import/export
- Tokens moved to HttpOnly cookies — XSS can no longer steal sessions
- Full management API with OpenAPI docs and API-key auth
- Real-time dashboard powered by WebSocket
Teams#
Users and API keys can now be organized into teams. Each team gets its own isolated view of the analytics dashboard, its own user list, and its own scope when assigning API keys. A structured scope picker makes member assignment straightforward.
Spending controls#
A new limits engine enforces rate limits and budget caps at every layer — per user, per API key, per provider, and per MCP server. Limits are configured through a panel embedded directly in each edit dialog. When a budget is exhausted the gateway fails closed — no silent overruns. Budget alert thresholds let you set a warning before the hard cap is hit. Streaming requests are metered correctly; weighted token costs are supported for models with asymmetric pricing.
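As an illustration of weighted token costs and alert thresholds, the sketch below charges output tokens at a different weight from input tokens and reports whether a budget is fine, past its warning threshold, or exhausted. The `Pricing` dataclass, the weights, and the state labels are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_weight: float    # cost units per input token
    output_weight: float   # cost units per output token


def weighted_tokens(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Combine both directions into one metered quantity."""
    return input_tokens * pricing.input_weight + output_tokens * pricing.output_weight


def budget_state(spent: float, cap: float, alert_fraction: float) -> str:
    """Classify a budget: 'exhausted' fails closed, 'alert' only warns."""
    if spent >= cap:
        return "exhausted"
    if spent >= cap * alert_fraction:
        return "alert"
    return "ok"
```

For a model whose output tokens cost four times its input tokens, `Pricing(1.0, 4.0)` makes 1,000 input plus 500 output tokens count as 3,000 units against the cap.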
RBAC v2#
The role system has been completely reworked. Built-in and custom roles share a unified table. Custom roles are fully editable in a CodeMirror editor, support cloning from an existing role as a starting point, and track a full permission history so you can see who changed what and when. Roles can be exported and imported as JSON for environment parity. Role assignments are now scope-aware — a role granted at the team level applies only within that team.
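Scope-aware assignment can be pictured as each role grant carrying an optional team: a grant with no team applies everywhere, while a team-level grant applies only inside that team. The `RoleAssignment` dataclass, `has_permission` helper, and the permission strings below are hypothetical and only model that rule.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class RoleAssignment:
    permissions: FrozenSet[str]
    team_id: Optional[str] = None   # None models a globally scoped grant


def has_permission(assignments: Iterable[RoleAssignment],
                   permission: str, team_id: str) -> bool:
    """True if any grant covering this team includes the permission."""
    return any(
        permission in a.permissions and a.team_id in (None, team_id)
        for a in assignments
    )
```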
Security hardening#
Access and refresh tokens have been migrated from localStorage to HttpOnly cookies, so the JavaScript-visible session surface is now zero. The refresh endpoint binds each token to the originating client IP, so a stolen cookie cannot be replayed from a different network. The WebSocket dashboard connection now uses a one-shot ticket rather than a long-lived credential.
Management API#
The gateway now exposes a programmable management API authenticated by API key. A full OpenAPI specification is served alongside the gateway so you can integrate key lifecycle and provisioning into your own tooling or CI pipelines without touching the console.
- v0.1.0 release
Public preview
- Multi-format AI API gateway (OpenAI, Anthropic, Gemini, Bedrock)
- MCP gateway with namespace isolation and tool-level RBAC
- ClickHouse-powered audit logs with multi-channel forwarding
- First-run setup wizard and built-in configuration guide
ThinkWatch is now in public preview. This first tagged release ships the dual-port architecture (gateway :3000, console :3001), virtual API key lifecycle management, sliding-window rate limiting, circuit breakers, and the unified log explorer.
The Helm chart and distroless container images are available alongside Docker Compose for self-hosted deployments.