
What is a Trace?

A hundred events pass. A number rises.
Somewhere in the system, something waits.
But where?
incident report
Users report that checkout is slow.
Your system has four services: gateway, api, orders, and database. Let's investigate with the monitoring tools you have.
metrics dashboard
requests / min: 842
latency p99: 1.2s
Request count looks normal. But latency is high.
The metric tells you something is slow, but not which service.
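To see why a p99 figure cannot name a service, it may help to look at how such a number is typically computed. The sketch below uses the nearest-rank method on made-up end-to-end latencies; the sample values and the `percentile` helper are assumptions for illustration, not the dashboard's actual data or implementation.

```python
def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


# Hypothetical end-to-end checkout latencies in seconds (invented values).
latencies = [0.2] * 98 + [1.2, 1.3]

p50 = percentile(latencies, 50)
p99 = percentile(latencies, 99)
print(f"p50 = {p50}s, p99 = {p99}s")
```

Each sample is the total time for one request, so the percentile is a single number for the whole path. Nothing in it records how that time was split between gateway, api, orders, and database.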

Looking at the metrics dashboard: what does it tell you?

The metric found a problem but can't point to the cause. Let's check the other tool, logs: the messages each service writes as it handles requests.
log viewer - last 5 seconds
14:32:01.204 [gateway] incoming POST /checkout
14:32:01.218 [api] validating cart items
14:32:01.307 [database] query inventory_check: OK
14:32:01.412 [orders] reserve_stock called
14:32:01.884 [database] write order_record: OK
14:32:02.003 [orders] confirmation generated
14:32:02.117 [api] response sent: 200 OK
14:32:02.130 [gateway] POST /checkout completed

The logs show what each service did. Can you now pinpoint which service is the bottleneck?
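One way to explore this is to measure the time between consecutive log lines. The sketch below parses the eight lines from the log viewer above; the `parse` and `gaps_ms` helpers are hypothetical names introduced here, and the approach assumes all lines belong to the same request, which the logs themselves do not state.

```python
from datetime import datetime, timedelta

# The eight lines shown in the log viewer.
LOG = """\
14:32:01.204 [gateway] incoming POST /checkout
14:32:01.218 [api] validating cart items
14:32:01.307 [database] query inventory_check: OK
14:32:01.412 [orders] reserve_stock called
14:32:01.884 [database] write order_record: OK
14:32:02.003 [orders] confirmation generated
14:32:02.117 [api] response sent: 200 OK
14:32:02.130 [gateway] POST /checkout completed
"""


def parse(line):
    """Split 'HH:MM:SS.mmm [service] message' into its three parts."""
    stamp, rest = line.split(" ", 1)
    service, message = rest.split("] ", 1)
    return datetime.strptime(stamp, "%H:%M:%S.%f"), service.lstrip("["), message


def gaps_ms(log_text):
    """Return (gap in ms, service, message) for every line after the first."""
    entries = [parse(line) for line in log_text.strip().splitlines()]
    result = []
    for (t0, _, _), (t1, service, message) in zip(entries, entries[1:]):
        gap = (t1 - t0) // timedelta(milliseconds=1)
        result.append((gap, service, message))
    return result


for gap, service, message in gaps_ms(LOG):
    print(f"+{gap:>4} ms  [{service}] {message}")
```

The largest gap sits between `reserve_stock called` and `write order_record: OK`, but the timestamps alone do not say whether that time was spent inside orders or inside database. With many requests interleaved in the same log, even this pairing of lines would be a guess.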

following one request
gateway → api → orders → database
timing breakdown:
gateway: 0 – 1200ms
api: 14 – 950ms
orders: 100 – 800ms
database: 140 – 360ms
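The timing breakdown can be turned into time spent inside each service. The sketch below is a minimal model, not a real tracing library: the `Span` class and `self_times` function are hypothetical, and it assumes each service is called only by the one listed before it (gateway calls api, api calls orders, orders calls database), so a service's own time is its duration minus the duration of the call it makes.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """One service's share of a single request, in ms from request start."""
    service: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self):
        return self.end_ms - self.start_ms


# Values from the timing breakdown; list order is the assumed call chain.
SPANS = [
    Span("gateway", 0, 1200),
    Span("api", 14, 950),
    Span("orders", 100, 800),
    Span("database", 140, 360),
]


def self_times(spans):
    """Duration of each span minus the duration of the span it calls."""
    result = {}
    for i, span in enumerate(spans):
        child_ms = spans[i + 1].duration_ms if i + 1 < len(spans) else 0
        result[span.service] = span.duration_ms - child_ms
    return result


for service, ms in self_times(SPANS).items():
    print(f"{service:<9}{ms:>5} ms")
```

Under that nesting assumption, the four self times add up to the full 1200 ms, and the largest share belongs to orders rather than to database, which is the kind of attribution the metric and the logs could not give.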

What you just did, following one request through multiple services, is called a...

trace

A trace follows a single request from beginning to end.

What metrics and logs could not reveal alone, the trace shows.
