ThinkWatch Deployment Guide#

1. Prerequisites#

Hardware Requirements#

Resource	Minimum	Recommended
CPU	2 cores	4+ cores
RAM	4 GB	8+ GB
Disk	20 GB SSD	50+ GB SSD
Network	100 Mbps	1 Gbps

The server process itself is lightweight (Rust binary, ~50 MB RSS typical). Most resource consumption comes from PostgreSQL, Redis, and ClickHouse.

Software Requirements#

For development:

Software	Version	Purpose
Rust	Edition 2024 (see `rust-toolchain.toml`)	Build the server
Node.js	20+	Build the web UI
pnpm	9+	Web UI package manager
Docker	24+	Run infrastructure services
Docker Compose	v2+	Orchestrate dev services

For production (Docker-only):

Software	Version
Docker	24+
Docker Compose	v2+

2. Development Setup#

2.1 Clone and Install Dependencies#

git clone <repository-url> ThinkWatch
cd ThinkWatch

# Install Rust toolchain (reads from rust-toolchain.toml)
rustup show

# Install web UI dependencies
cd web && pnpm install && cd ..

2.2 Start Infrastructure Services#

make infra

This starts:

PostgreSQL on port 5432 (user: postgres, password: postgres, db: think_watch)
Redis on port 6379
ClickHouse (audit log storage) on port 8123 (HTTP) / 9000 (native TCP)
Zitadel (OIDC SSO) on port 8080

Wait for all services to become healthy:

make infra-down   # to stop services

2.3 Start the Server#

# From the project root
cargo run

The server reads configuration from environment variables (with sensible development defaults). It will:

Connect to PostgreSQL and run migrations automatically.
Connect to Redis.
Start the gateway on http://localhost:3000.
Start the console on http://localhost:3001.

2.4 Start the Web UI (Development)#

cd web
pnpm dev

The Vite dev server starts on http://localhost:5173 and proxies API requests to the console server at localhost:3001.

2.5 Creating the First Admin User (Setup Wizard)#

On a fresh database, ThinkWatch provides a setup wizard accessible through the Web UI or the API. The setup wizard creates the first admin user and optionally configures the first AI provider in a single step.

Via the Web UI:

Open http://localhost:5173 in your browser. If the system has not been initialized, the UI will automatically show the setup wizard.

Via the API:

Check whether setup is needed:

curl http://localhost:3001/api/setup/status
# Returns: {"initialized": false, "needs_setup": true}

Initialize the system:

curl -X POST http://localhost:3001/api/setup/initialize \
  -H "Content-Type: application/json" \
  -d '{
    "admin": {
      "email": "admin@example.com",
      "display_name": "Admin",
      "password": "your-secure-password"
    },
    "provider": {
      "name": "openai-prod",
      "display_name": "OpenAI",
      "provider_type": "openai",
      "base_url": "https://api.openai.com/v1",
      "api_key": "sk-..."
    }
  }'

The provider field is optional. You can add providers later through the admin UI.

Note: The setup endpoint is rate-limited (5 requests per minute) and is disabled after the first admin user is created. See the security documentation for details.

Legacy method (manual):

Alternatively, register via POST /api/auth/register and then assign the super_admin role directly in PostgreSQL:

INSERT INTO user_roles (user_id, role_id, scope)
SELECT u.id, r.id, 'global'
FROM users u, roles r
WHERE u.email = 'admin@example.com' AND r.name = 'super_admin';

Subsequent users can be promoted to admin roles through the web UI by the super admin.

2.6 Configuring Zitadel SSO (Development)#

The dev compose file starts a Zitadel instance at http://localhost:8080.

Log in to Zitadel at http://localhost:8080 with username admin and password Admin1234!.
Create a new project and an OIDC application of type “Web” with:
- Redirect URI: http://localhost:3001/api/auth/oidc/callback
- Post-logout redirect: http://localhost:5173
Copy the client ID and client secret.
Set environment variables before starting the server:

export OIDC_ISSUER_URL=http://localhost:8080
export OIDC_CLIENT_ID=<your-client-id>
export OIDC_CLIENT_SECRET=<your-client-secret>
export OIDC_REDIRECT_URL=http://localhost:3001/api/auth/oidc/callback

Or add them to a .env file in the project root (loaded automatically by dotenvy).

3. Docker Compose Production Deployment#

3.1 Create Production Environment File#

Create a .env.production file with real secrets. Never commit this file to version control.

# Database
DB_USER=thinkwatch
DB_PASSWORD=<generate-a-strong-password>
DB_NAME=think_watch

# Authentication
JWT_SECRET=<generate-a-64-char-hex-string>
ENCRYPTION_KEY=<generate-a-64-char-hex-string>

# Ports
GATEWAY_PORT=3000
WEB_PORT=80

# CORS — set to your actual console domain
CORS_ORIGINS=https://console.yourdomain.com

# ClickHouse
CLICKHOUSE_USER=default
CLICKHOUSE_PASSWORD=<generate-a-strong-password>

# Redis
REDIS_PASSWORD=<generate-a-strong-password>

# OIDC SSO (optional)
OIDC_ISSUER_URL=https://auth.yourdomain.com
OIDC_CLIENT_ID=<client-id>
OIDC_CLIENT_SECRET=<client-secret>
OIDC_REDIRECT_URL=https://console.yourdomain.com/api/auth/oidc/callback

# Logging
RUST_LOG=info,think_watch=info

3.2 Generate Secure Secrets#

# JWT_SECRET (64 hex characters = 256 bits)
openssl rand -hex 32

# ENCRYPTION_KEY (64 hex characters = 256 bits, used for AES-256-GCM)
openssl rand -hex 32

# Database and Redis passwords
openssl rand -base64 24

3.3 Deploy#

Pull the pre-built images from GitHub Container Registry and start all services:

docker compose -f deploy/docker-compose.yml --env-file .env.production pull
docker compose -f deploy/docker-compose.yml --env-file .env.production up -d

To pin a specific release instead of latest:

IMAGE_TAG=<git-sha> docker compose -f deploy/docker-compose.yml --env-file .env.production up -d

This starts:

server — The ThinkWatch Rust binary (gateway on port 3000, console on port 3001 internal only)
web — nginx serving the built React SPA (port 80)
postgres — PostgreSQL 18 with persistent volume
redis — Redis 8 with persistent volume
clickhouse — ClickHouse columnar database for audit logs

3.4 Verify Health#

# Gateway health
curl http://localhost:3000/health

# Console health (from within the Docker network, or via the web container)
docker compose -f deploy/docker-compose.yml exec server curl http://localhost:3001/api/health

3.5 ClickHouse Configuration#

ClickHouse stores audit logs in a columnar format optimized for analytical queries. Configure the connection via these environment variables in .env.production:

# ClickHouse connection
CLICKHOUSE_URL=http://clickhouse:8123
CLICKHOUSE_DB=think_watch
CLICKHOUSE_USER=default
CLICKHOUSE_PASSWORD=<your-clickhouse-password>

Variable	Default	Description
`CLICKHOUSE_URL`	`http://clickhouse:8123`	ClickHouse HTTP interface URL.
`CLICKHOUSE_DB`	`think_watch`	Database name for audit log tables.
`CLICKHOUSE_USER`	`default`	ClickHouse user for authentication.
`CLICKHOUSE_PASSWORD`	—	ClickHouse password.

ThinkWatch creates six tables in ClickHouse:

audit_logs — Security audit trail (90-day TTL)
gateway_logs — AI API request logs (90-day TTL)
mcp_logs — MCP tool invocation logs (90-day TTL)
platform_logs — Platform management operations (90-day TTL)
access_logs — HTTP access logs (30-day TTL)
app_logs — Application runtime logs (30-day TTL)

Tables are automatically created on first startup via deploy/clickhouse/init.sql.

3.6 Set Up a Reverse Proxy#

In production, place a reverse proxy in front of the Docker services to handle TLS termination. Only the gateway port (3000) and the web UI port (80) need to be reachable by end users.

Important: The console port (3001) should NOT be exposed to the public internet. Access it only through an internal network, VPN, or via the web container’s nginx that reverse-proxies /api/* to port 3001.

Example: nginx reverse proxy#

# Gateway — public-facing
server {
    listen 443 ssl;
    server_name gateway.yourdomain.com;

    ssl_certificate     /etc/letsencrypt/live/gateway.yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/gateway.yourdomain.com/privkey.pem;

    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Required for SSE streaming
        proxy_buffering off;
        proxy_cache off;
        proxy_read_timeout 300s;
    }
}

# Console Web UI — restricted access
server {
    listen 443 ssl;
    server_name console.yourdomain.com;

    ssl_certificate     /etc/letsencrypt/live/console.yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/console.yourdomain.com/privkey.pem;

    # Optionally restrict by IP
    # allow 10.0.0.0/8;
    # deny all;

    location / {
        proxy_pass http://127.0.0.1:80;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

Example: traefik (via Docker labels)#

Add labels to the server and web services in your compose override to configure Traefik routing and TLS with Let’s Encrypt automatically.

4. Kubernetes Deployment#

A Helm chart is provided at deploy/helm/think-watch/.

4.1 Images#

Pre-built images are published automatically to GitHub Container Registry on every push to main:

ghcr.io/thinkwatch/think-watch-server:latest
ghcr.io/thinkwatch/think-watch-server:<git-sha>

ghcr.io/thinkwatch/think-watch-web:latest
ghcr.io/thinkwatch/think-watch-web:<git-sha>

No authentication is needed — the packages are public.

4.2 Install with Helm#

helm install think-watch deploy/helm/think-watch \
  --set secrets.jwtSecret=$(openssl rand -hex 32) \
  --set secrets.encryptionKey=$(openssl rand -hex 32) \
  --set secrets.databaseUrl="postgres://thinkwatch:password@postgres:5432/think_watch" \
  --set secrets.redisUrl="redis://:password@redis:6379" \
  --set config.corsOrigins="https://console.internal.example.com"

To deploy a specific image tag:

helm install think-watch deploy/helm/think-watch \
  --set image.server.tag=<git-sha> \
  --set image.web.tag=<git-sha> \
  ...

4.3 Configure Ingress#

The Helm chart supports dual-host Ingress. Enable and configure in values.yaml or via --set:

ingress:
  enabled: true
  className: nginx
  gateway:
    host: gateway.yourdomain.com        # Public-facing
    tls:
      - secretName: gateway-tls
        hosts:
          - gateway.yourdomain.com
  console:
    host: console.internal.yourdomain.com  # Internal only
    tls:
      - secretName: console-tls
        hosts:
          - console.internal.yourdomain.com

For the console Ingress, use an internal ingress class or add annotations to restrict access:

ingress:
  console:
    annotations:
      nginx.ingress.kubernetes.io/whitelist-source-range: "10.0.0.0/8,172.16.0.0/12"

4.4 External Secrets#

For production, use the External Secrets Operator instead of passing secrets via --set:

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: think-watch
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend  # or aws-secrets-manager, etc.
    kind: SecretStore
  target:
    name: think-watch-secrets
  data:
    - secretKey: jwt-secret
      remoteRef:
        key: think-watch/jwt-secret
    - secretKey: encryption-key
      remoteRef:
        key: think-watch/encryption-key
    - secretKey: database-url
      remoteRef:
        key: think-watch/database-url
    - secretKey: redis-url
      remoteRef:
        key: think-watch/redis-url

4.5 Horizontal Pod Autoscaling#

Enable HPA in values.yaml:

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70

The server is stateless (all state is in PostgreSQL and Redis), so it scales horizontally without issue.

5. SSL/TLS#

Using Let’s Encrypt with a Reverse Proxy#

The recommended approach is to terminate TLS at the reverse proxy layer. ThinkWatch itself serves plain HTTP on ports 3000 and 3001.

With certbot (standalone nginx):

certbot certonly --nginx -d gateway.yourdomain.com -d console.yourdomain.com

With Kubernetes cert-manager:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@yourdomain.com
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
      - http01:
          ingress:
            class: nginx

Then reference the issuer in your Ingress annotations:

annotations:
  cert-manager.io/cluster-issuer: letsencrypt-prod

Configuring CORS for HTTPS#

When the console is served over HTTPS, update the CORS_ORIGINS environment variable:

CORS_ORIGINS=https://console.yourdomain.com

Multiple origins can be comma-separated if needed.

6. Backup and Recovery#

6.1 PostgreSQL#

PostgreSQL contains all critical application state: users, API keys, provider configurations, usage records, and audit logs.

Scheduled pg_dump (simple):

# Daily backup
pg_dump -h localhost -U thinkwatch -d think_watch -Fc > backup_$(date +%Y%m%d).dump

# Restore
pg_restore -h localhost -U thinkwatch -d think_watch -c backup_20260401.dump

WAL archiving (point-in-time recovery):

For production deployments, configure PostgreSQL WAL archiving to an S3 bucket or network storage. This allows restoring to any point in time, not just the last dump. Use tools like pgBackRest or wal-g for automated WAL management.

Managed databases: If using a managed PostgreSQL service (AWS RDS, Google Cloud SQL, etc.), automated backups and point-in-time recovery are typically included.

6.2 Redis#

Redis stores ephemeral data (rate limit counters, session state, OIDC flow state). Losing Redis data is not catastrophic — rate limits reset and users need to log in again.

That said, to preserve rate limit state across restarts, enable Redis persistence:

# redis.conf
appendonly yes
appendfsync everysec

The Docker Compose files mount a persistent volume for Redis data by default.

6.3 ClickHouse#

ClickHouse stores audit logs in columnar tables. The Docker Compose files mount a persistent volume for ClickHouse data.

For disaster recovery:

Back up the ClickHouse data volume, or use clickhouse-client to export data.
Since audit logs are also written to PostgreSQL (audit_logs table), ClickHouse data can be re-inserted from the primary database if needed.

7. Monitoring#

7.1 Health Endpoints#

Endpoint	Port	Purpose
`GET /health`	3000	Gateway basic health check
`GET /health/live`	3000	Liveness probe (is the process alive?)
`GET /health/ready`	3000	Readiness probe (returns 503 if PG or Redis is down)
`GET /api/health`	3001	Console detailed health (latency, pool stats, uptime)
`GET /metrics`	3000	Prometheus metrics (unauthenticated)

Recommended probe configuration for Kubernetes:

livenessProbe:
  httpGet:
    path: /health/live
    port: 3000
  initialDelaySeconds: 5
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /health/ready
    port: 3000
  initialDelaySeconds: 10
  periodSeconds: 5

Use /health/live for liveness probes (restart the pod if the process is stuck) and /health/ready for readiness probes (remove the pod from the load balancer if dependencies are down).

7.2 Structured Logging#

ThinkWatch uses tracing with configurable output via the RUST_LOG environment variable:

# Default: info level
RUST_LOG=info,think_watch=info

# Debug for ThinkWatch crates, info for dependencies
RUST_LOG=info,think_watch=debug

# Trace-level for detailed request/response debugging
RUST_LOG=info,think_watch=trace

# JSON-formatted logs (configured in tracing-subscriber setup)

Logs are written to stdout and can be collected by any log aggregation system (Datadog, Loki, CloudWatch, etc.).

7.3 Audit Log Forwarding#

ThinkWatch supports real-time forwarding of audit events to external systems via four channels:

Channel	Protocol	Use Case
UDP Syslog	RFC 5424	Traditional SIEM (Splunk, Graylog)
TCP Syslog	RFC 5424	Reliable transport for SIEM
Kafka	REST Proxy (JSON)	Data lakes, stream processing pipelines
HTTP Webhook	POST JSON	Custom alerting, Slack/Teams integration

Configure via the admin console at Admin > Log Forwarding:

Multiple channels can run in parallel
Each channel can be independently enabled/disabled
Built-in connection testing and delivery statistics
Automatic retry with error counting

Audit events are always written to PostgreSQL and ClickHouse (if configured). Forwarding channels provide additional delivery to external systems.

7.4 Prometheus Metrics#

ThinkWatch exposes Prometheus-compatible metrics at GET /metrics on the gateway port (3000). This endpoint is unauthenticated to allow standard Prometheus scraping.

Available metrics include:

Request latency histograms (http_request_duration_seconds) — by endpoint, method, and status
Token throughput counters (llm_tokens_total) — prompt and completion tokens by model
Active connection gauges (http_active_connections) — current number of open connections
Error rate counters (http_requests_errors_total) — by endpoint and error type

Prometheus scrape configuration:

scrape_configs:
  - job_name: 'thinkwatch'
    scrape_interval: 15s
    static_configs:
      - targets: ['thinkwatch:3000']
    metrics_path: /metrics

Security: The /metrics endpoint is unauthenticated. In production, restrict access via network policies or a reverse proxy to prevent exposing internal metrics to untrusted networks.

7.5 Background Tasks#

ThinkWatch runs several background tasks that operate automatically:

Task	Interval	Description
API key lifecycle management	Periodic	Disables keys that exceed their inactivity timeout
Data retention purge	Daily	Permanently deletes soft-deleted records older than `data_retention_days`
Key expiry notifications	Daily	Logs warnings for keys expiring within 7 days

These tasks run within the server process and do not require external schedulers (e.g., cron). They are automatically active when the server starts.

8. Client Configuration Guide#

ThinkWatch includes a built-in Configuration Guide page in the web console at /gateway/guide. This page provides copy-paste setup instructions for connecting various AI client tools to your gateway and auto-detects the gateway URL.

Supported Client Tools#

The Configuration Guide provides ready-to-use instructions for:

Claude Code — set ANTHROPIC_BASE_URL to your gateway’s /v1 endpoint (uses the Anthropic Messages API at /v1/messages)
Cursor — configure the OpenAI-compatible endpoint in Cursor settings (uses /v1/chat/completions)
Continue — configure the provider in ~/.continue/config.json (uses /v1/chat/completions)
Cline — configure the API base URL in Cline settings (uses /v1/chat/completions)
OpenAI SDK — set OPENAI_BASE_URL and OPENAI_API_KEY environment variables (uses /v1/chat/completions or /v1/responses)
Anthropic SDK — set ANTHROPIC_BASE_URL and ANTHROPIC_API_KEY environment variables (uses /v1/messages)
cURL — example commands for all three API formats

Gateway API Endpoints#

The gateway (port 3000) serves three API formats on a single port:

Endpoint	Format	Typical Clients
`POST /v1/chat/completions`	OpenAI Chat Completions	Cursor, Continue, Cline, OpenAI SDK
`POST /v1/messages`	Anthropic Messages API	Claude Code, Anthropic SDK
`POST /v1/responses`	OpenAI Responses API	OpenAI SDK (2025 format)
`GET /v1/models`	OpenAI Models list	All clients

All endpoints accept tw- API keys via the Authorization: Bearer header.

9. Upgrading#

9.1 Database Migrations#

Migrations are run automatically when the server starts. The sqlx migrate system tracks which migrations have been applied in a _sqlx_migrations table and only runs new ones.

Important: Always back up your database before upgrading to a new version that includes schema changes.

9.2 Rolling Updates (Kubernetes)#

The server is stateless, so rolling updates work out of the box:

# Update the image tag
helm upgrade think-watch deploy/helm/think-watch \
  --set image.server.tag=0.2.0 \
  --reuse-values

Kubernetes will perform a rolling update, starting new pods before terminating old ones. The first new pod to start will run any pending database migrations.

Tip: Set maxSurge: 1 and maxUnavailable: 0 in your deployment strategy to ensure zero downtime during upgrades.

9.3 Docker Compose Updates#

# Pull new images or rebuild
docker compose -f deploy/docker-compose.yml --env-file .env.production build

# Restart with new images (database migrations run on startup)
docker compose -f deploy/docker-compose.yml --env-file .env.production up -d

9.4 Breaking Changes Policy#

Patch versions (0.1.x): Bug fixes only. No migration changes. Safe to upgrade without review.
Minor versions (0.x.0): May include new migrations that add tables or columns. Always backward-compatible. Review the changelog.
Major versions (x.0.0): May include breaking API changes, destructive migrations, or configuration changes. Read the upgrade guide carefully before proceeding.

Check the project changelog or release notes for migration details before upgrading.