Distributed Tracing

Passage records a trace for every player connection — a timeline of everything that happened from the moment a client connected until they were transferred to a backend server. These traces, together with the metrics Passage exports, give you a complete picture of your network’s health and player experience.

Prerequisite: Tracing and metrics must be enabled in configuration before anything appears in your observability stack. See Monitoring and Observability for setup instructions.

Each player connection produces a single trace. In your tracing tool (Grafana Tempo, Jaeger, etc.) it appears as a timeline bar labelled passage with a duration equal to the full connection time — typically a few hundred milliseconds for a successful transfer.

The trace is broken into phases that reflect the Minecraft protocol:

| Phase | What it represents |
| --- | --- |
| Status check | Responding to a server-list ping (no login involved) |
| Authentication | Verifying the player’s identity with the configured auth adapter |
| Configuration | Delivering resource packs and keep-alive packets before the transfer |
| Transfer | Sending the backend address and closing the connection |

Slow or failed connections show up as unusually long or error-marked traces. A spike in authentication duration, for example, points directly to a slow or unreachable auth adapter.

Every trace and metric Passage emits is tagged with the following labels so you can filter by environment and version in your dashboards:

| Label | Value |
| --- | --- |
| Service name | passage |
| Service namespace | scrayosnet |
| Service version | Passage version (e.g. 0.3.0) |
| Environment | Your otel.environment config value (e.g. production) |

These metrics give you a real-time view of traffic flowing through Passage.

| Metric | What it measures |
| --- | --- |
| listener_requests | Total incoming connections. The decision label splits this into accepted (processed normally) and rejected (dropped by the rate limiter or a proxy protocol error). |
| open_connections | How many player connections are currently being handled. |
| connection_duration | How long connections take from start to finish, in seconds. Watch the p95/p99 here: a rise indicates something is slowing down the authentication or discovery phase. |
| transfer_connections | Connections grouped by type: status (server-list pings), login (new player logins), or transfer (reconnecting players using a transfer cookie). |
| rate_limiter_size | The number of IPs currently tracked by the rate limiter. This should stay small during normal operation and reset itself automatically. A high value may indicate a connection flood. |
| client_locales | Distribution of player client languages. Useful for knowing which languages to prioritize for localized disconnect messages. |
| client_view_distances | Distribution of view distances reported by clients during login. |

When system_observer_interval is configured, Passage also reports host-level metrics. These help correlate player-facing issues with resource pressure on the host:

| Metric | What it measures |
| --- | --- |
| cpu_usage | Overall CPU usage of the host (0–100%). |
| total_memory / used_memory / free_memory / available_memory | System RAM in bytes. |
| total_swap / used_swap / free_swap | Swap space in bytes. |

When Passage creates a session cookie for a player, it embeds the current trace ID into the cookie. This means your backend servers can attach their own spans to the same trace, giving you an unbroken timeline from Passage all the way through your backend network.

The session cookie (passage:session) includes an extra field containing a traceparent value:

{
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "server_address": "play.example.com",
  "server_port": 25565,
  "extra": {
    "traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
  }
}

The traceparent follows the W3C Trace Context standard and contains the trace ID that links the player’s entire journey. A backend server that reads this value and passes it to its own OpenTelemetry SDK will appear as a connected child in the same trace — no separate correlation step needed.

This enables scenarios like:

  • Viewing a single trace that covers Passage authentication, resource pack delivery, and the player’s first few seconds on a lobby server
  • Searching your trace backend by trace ID to find every service that touched a specific player connection
  • Setting alerts on end-to-end latency rather than per-service latency

The authentication cookie (passage:authentication) also has an extra field but it does not carry trace context — only the session cookie does.
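
As an illustration of what a backend has to do with the session cookie, the following Python sketch extracts and validates the traceparent value. It assumes the cookie payload has already been received, verified, and decoded to the JSON shown above; how Passage encodes the cookie on the wire is not covered here. The names `TraceParent`, `parse_traceparent`, and `traceparent_from_session_cookie` are hypothetical helpers, not part of Passage.

```python
import json
import re
from typing import NamedTuple, Optional

# W3C Trace Context layout: version-traceid-parentid-flags, all lowercase hex.
_TRACEPARENT_RE = re.compile(
    r"(?P<version>[0-9a-f]{2})-(?P<trace_id>[0-9a-f]{32})-"
    r"(?P<parent_id>[0-9a-f]{16})-(?P<flags>[0-9a-f]{2})"
)


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(value: str) -> Optional[TraceParent]:
    """Return the parsed value, or None if it is malformed."""
    match = _TRACEPARENT_RE.fullmatch(value.strip())
    if match is None:
        return None
    # The spec forbids version "ff" and all-zero trace or parent IDs.
    if match["version"] == "ff":
        return None
    if set(match["trace_id"]) == {"0"} or set(match["parent_id"]) == {"0"}:
        return None
    # Bit 0 of the flags byte is the "sampled" flag.
    sampled = bool(int(match["flags"], 16) & 0x01)
    return TraceParent(match["version"], match["trace_id"], match["parent_id"], sampled)


def traceparent_from_session_cookie(payload: str) -> Optional[TraceParent]:
    """Read extra.traceparent from an already-decoded session cookie (assumed JSON)."""
    cookie = json.loads(payload)
    extra = cookie.get("extra")
    if not isinstance(extra, dict):
        return None
    value = extra.get("traceparent")
    if not isinstance(value, str):
        return None
    return parse_traceparent(value)
```

In a real backend you would normally hand the raw traceparent string to your OpenTelemetry SDK's W3C Trace Context propagator instead of parsing it by hand; the manual parser is mainly useful for logging the trace ID or for rejecting malformed values early.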

Useful dashboard panels built from these metrics:

  • Traffic overview: listener_requests total rate, split by decision. Shows overall throughput and rejection rate over time.
  • Connection latency: connection_duration histogram (p50, p95, p99). The single most useful signal for player-facing performance.
  • Connection types: transfer_connections by state. The ratio of transfer to login shows how effectively auth cookies are working; more transfers mean fewer Mojang API calls.
  • Active connections: open_connections as a live gauge. Pair with connection_duration to spot overload.
  • Host health: cpu_usage and used_memory alongside connection metrics to catch resource-pressure incidents.

Conditions worth alerting on:

  • listener_requests{decision="rejected"} rate above baseline: possible flood or upstream misconfiguration
  • connection_duration p99 above ~2 seconds: adapter is slow or unreachable
  • open_connections growing without new listener_requests: connections are stalling
Every player connection produces a trace. For high-traffic networks this volume can be expensive to store. Most tracing pipelines support head-based sampling: configure your OTel Collector or tracing backend to keep only a percentage of traces (5–10% is usually enough for latency analysis). Error traces should be kept regardless of the sampling rate, which in practice requires tail-based sampling, because a head-based sampler decides before it knows whether the trace will contain an error.

If you use an OTel Collector between Passage and your backend, the probabilistic sampler processor is the simplest option:

processors:
  probabilistic_sampler:
    sampling_percentage: 10

Passage itself does not perform sampling — all traces are exported and sampling decisions are left to the pipeline.
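
To see why percentage-based sampling in the pipeline keeps whole traces together, here is a minimal Python sketch of a deterministic decision derived from the trace ID. It is not the Collector's actual algorithm; it only illustrates the idea, and it assumes the low 64 bits of the trace ID are uniformly random. The function name `keep_trace` is hypothetical.

```python
def keep_trace(trace_id_hex: str, sampling_percentage: float) -> bool:
    """Deterministically keep roughly sampling_percentage percent of trace IDs."""
    if not 0.0 <= sampling_percentage <= 100.0:
        raise ValueError("sampling_percentage must be between 0 and 100")
    # Treat the low 64 bits of the 128-bit trace ID as a uniform random value.
    low_bits = int(trace_id_hex, 16) & ((1 << 64) - 1)
    # Keep the trace if that value falls below the configured fraction of the range.
    threshold = int(sampling_percentage / 100.0 * (1 << 64))
    return low_bits < threshold
```

Because the decision depends only on the trace ID, every service that sees the same trace ID reaches the same verdict, so spans added by backend servers are kept or dropped together with the Passage spans.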