
Observability and Alerting

Observability determines whether you catch latency and reliability regressions early.

Signals to Track

Minimum baseline:

session setup success/failure
handshake error rate
direct vs relay ratio
p95/p99 latency indicators
session drop/disconnect rates

Linux host focus:

portal preflight failures
PipeWire/session backend failures
host encoder/capture startup failures
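
As a rough illustration of how the baseline signals above might be derived from raw session records, here is a minimal Python sketch. The `SessionRecord` fields, the `"direct"`/`"relay"` path labels, and the `summarize` helper are hypothetical names chosen for this example, not part of any product API; the percentile uses the nearest-rank method, which is only one of several common definitions.

```python
import math
from dataclasses import dataclass


@dataclass
class SessionRecord:
    """Hypothetical per-session record; field names are assumptions."""
    setup_ok: bool      # did session setup succeed?
    path: str           # "direct" or "relay" (assumed labels)
    latency_ms: float   # representative latency sample for the session
    dropped: bool       # did the session end in a drop/disconnect?


def percentile(values, pct):
    """Nearest-rank percentile; pct is in (0, 100]."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(records):
    """Compute setup failure rate, relay ratio, drop rate, and p95/p99."""
    if not records:
        raise ValueError("no records")
    established = [r for r in records if r.setup_ok]
    latencies = [r.latency_ms for r in established]
    n = len(established)
    return {
        "setup_failure_rate": 1 - n / len(records),
        "relay_ratio": sum(r.path == "relay" for r in established) / n if n else 0.0,
        "drop_rate": sum(r.dropped for r in established) / n if n else 0.0,
        "p95_ms": percentile(latencies, 95) if latencies else None,
        "p99_ms": percentile(latencies, 99) if latencies else None,
    }
```

In practice these values would usually come from a metrics backend rather than in-process lists; the sketch only shows what each signal measures.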

Log Strategy

Use structured logs and include:

timestamp with timezone
component (gateway, relay, host, client)
session/correlation identifiers when available
clear error category and action context

Do not log:

secrets/tokens/private key material
sensitive payload data
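
The logging rules above could be sketched as follows. This is a minimal example using only the Python standard library; the field names (`error_category`, `action`, `session_id`), the `SENSITIVE_KEYS` set, and the `make_log_line` helper are assumptions made for illustration. Note that the redaction here is shallow and key-based, so it would not catch secrets embedded in free-text values.

```python
import json
from datetime import datetime, timezone

# Components named in the log strategy above.
COMPONENTS = {"gateway", "relay", "host", "client"}

# Hypothetical list of context keys that must never reach the logs.
SENSITIVE_KEYS = {"token", "secret", "password", "private_key",
                  "authorization", "payload"}


def redact(context):
    """Replace values of sensitive top-level keys with a placeholder."""
    return {
        key: "[REDACTED]" if key.lower() in SENSITIVE_KEYS else value
        for key, value in context.items()
    }


def make_log_line(component, category, action,
                  session_id=None, context=None, now=None):
    """Build one structured (JSON) log line with a timezone-aware timestamp."""
    if component not in COMPONENTS:
        raise ValueError(f"unknown component: {component}")
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    record = {
        "timestamp": timestamp,      # ISO 8601 with UTC offset
        "component": component,
        "error_category": category,
        "action": action,
    }
    if session_id is not None:       # include identifiers when available
        record["session_id"] = session_id
    if context:
        record["context"] = redact(context)
    return json.dumps(record, sort_keys=True)
```

A call such as `make_log_line("gateway", "auth_failure", "validate_token", session_id="s-123", context={"token": "abc"})` would emit the session identifier but replace the token value with `[REDACTED]`.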

Alert Baseline

Create alerts for:

gateway health endpoint failure
auth failure surge
handshake failure surge
relay registration/heartbeat anomalies
sudden direct-to-relay ratio shift

Tune thresholds per environment tier; a threshold that suits production may be too noisy or too lax elsewhere.
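
One possible way to express surge and ratio-shift checks with per-tier thresholds is sketched below. The tier names, the numeric values, and the `TierThresholds` structure are illustrative assumptions, not recommended settings; real thresholds should come from your own baselines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierThresholds:
    """Hypothetical per-tier alert settings (values are placeholders)."""
    surge_factor: float       # current/baseline rate that counts as a surge
    min_failure_rate: float   # ignore surges below this absolute rate
    relay_ratio_shift: float  # absolute relay-ratio change that alerts


THRESHOLDS = {
    "production": TierThresholds(surge_factor=2.0, min_failure_rate=0.01,
                                 relay_ratio_shift=0.10),
    "staging": TierThresholds(surge_factor=3.0, min_failure_rate=0.05,
                              relay_ratio_shift=0.20),
}


def is_surge(baseline_rate, current_rate, thresholds):
    """True when a failure rate (auth, handshake, ...) has surged."""
    if current_rate < thresholds.min_failure_rate:
        return False  # too small in absolute terms to alert on
    if baseline_rate == 0:
        return True   # any meaningful rate over a zero baseline is a surge
    return current_rate / baseline_rate >= thresholds.surge_factor


def relay_ratio_shifted(baseline_ratio, current_ratio, thresholds):
    """True when the share of relayed sessions moved more than allowed."""
    return abs(current_ratio - baseline_ratio) >= thresholds.relay_ratio_shift
```

Combining a relative factor with an absolute floor is a design choice meant to avoid alerting on surges that are large in ratio but negligible in volume.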

Dashboard Baseline

At minimum, maintain dashboards for:

control-plane health and request volume
session quality and adaptation trends
relay utilization and error states
Linux host runtime health markers

Release Observability Checklist

Before rollout:

baseline current metrics
define rollback thresholds
enable focused alerting during rollout window

After rollout:

compare latency and failure deltas
verify that relay usage has not increased unnoticed (check the direct vs relay ratio)
confirm error distribution stability
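
The before/after comparison in this checklist might look like the following sketch. The metric names, the threshold values in the usage note, and both helper functions are hypothetical. Error-distribution stability is measured here with total variation distance, which is one reasonable choice among several.

```python
def compare_rollout(before, after, rollback_thresholds):
    """Return per-metric deltas and the metrics whose increase exceeds
    the rollback threshold defined before the rollout."""
    deltas = {name: after[name] - before[name] for name in rollback_thresholds}
    breaches = sorted(
        name for name, limit in rollback_thresholds.items()
        if deltas[name] > limit
    )
    return deltas, breaches


def distribution_shift(before_counts, after_counts):
    """Total variation distance between two error-category distributions.

    0.0 means identical proportions; 1.0 means completely disjoint.
    """
    total_before = sum(before_counts.values())
    total_after = sum(after_counts.values())
    if total_before == 0 or total_after == 0:
        raise ValueError("both windows need at least one error sample")
    categories = set(before_counts) | set(after_counts)
    return 0.5 * sum(
        abs(before_counts.get(c, 0) / total_before
            - after_counts.get(c, 0) / total_after)
        for c in categories
    )
```

For example, with a baseline of `{"p95_ms": 80.0, "relay_ratio": 0.25}`, post-rollout values of `{"p95_ms": 95.0, "relay_ratio": 0.40}`, and thresholds of `{"p95_ms": 20.0, "relay_ratio": 0.10}`, only `relay_ratio` would be reported as a breach.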

Troubleshooting Correlation Tips

When an issue appears:

align client, host, and control-plane timestamps
compare before/after deployment windows
isolate whether the change is network-, runtime-, or control-plane-induced
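
Aligning timestamps across components can be sketched as below, assuming each component emits ISO 8601 timestamps with an explicit UTC offset (as the log strategy recommends). The event tuple layout and the helper names are assumptions for this example. Offsets are written as `+00:00` rather than `Z`, because `datetime.fromisoformat` only accepts the `Z` suffix from Python 3.11 onward.

```python
from datetime import datetime, timezone


def to_utc(iso_timestamp):
    """Parse an ISO 8601 timestamp and normalize it to UTC."""
    parsed = datetime.fromisoformat(iso_timestamp)
    if parsed.tzinfo is None:
        raise ValueError(f"timestamp lacks a timezone: {iso_timestamp}")
    return parsed.astimezone(timezone.utc)


def merged_timeline(events):
    """Merge (component, iso_timestamp, message) tuples from client, host,
    and control plane into one list ordered by UTC time."""
    normalized = [(to_utc(ts), component, message)
                  for component, ts, message in events]
    return sorted(normalized, key=lambda event: event[0])


def split_by_deploy(timeline, deployed_at):
    """Split a merged timeline into events before and at/after a deployment."""
    cutoff = to_utc(deployed_at)
    before = [event for event in timeline if event[0] < cutoff]
    after = [event for event in timeline if event[0] >= cutoff]
    return before, after
```

Rejecting timestamps without a timezone, rather than guessing one, keeps a misconfigured component from silently skewing the merged order.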
