Correlation
One signal sees the shadow. Another finds the shape. A third reveals the cause.
Apart, they are clues. Together, they are the answer.
request_duration p99: 2.1s (threshold exceeded)
service: orders
window: 10:40 – 10:45 UTC
threshold: 500ms
current: 2100ms, 4.2x the threshold
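As a quick check of the arithmetic on this card, here is a minimal Python sketch. The function name `threshold_ratio` is hypothetical. Only the 500ms threshold and the 2100ms reading come from the card.

```python
def threshold_ratio(current_ms: int, threshold_ms: int) -> float:
    """Return the current value as a multiple of the threshold."""
    if threshold_ms <= 0:
        raise ValueError("threshold_ms must be positive")
    return current_ms / threshold_ms


# Values taken from the metric card: p99 = 2100ms, threshold = 500ms.
ratio = threshold_ratio(2100, 500)
exceeded = ratio > 1.0
print(f"{ratio:.1f}x the threshold, exceeded={exceeded}")
```

A ratio above 1.0 means the alert fires. The ratio alone still says nothing about which request was slow or why.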
Something is slow. The metric tells you THAT something is wrong, but not WHY.
Which signal would help you find the specific slow request?
↓
same time window, same service
trace_id: a3f8b2c1d4e5 · service: orders · 10:42:01 UTC
The trace shows WHERE the time was spent: almost all of it, 1980ms, in the database call. But WHY is the database slow?
Which signal would tell you what happened inside the database?
10:42:01.980 ERROR lock timeout waiting for table 'orders'
10:42:01.980 concurrent_locks=47 wait_time=1980ms table=orders
trace_id: a3f8b2c1d4e5 · severity: ERROR
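The structured fields in the second log line can be pulled apart with a few lines of code. This is a sketch that assumes the line is a space-separated list of key=value pairs, as it appears above. `parse_fields` is a hypothetical helper, not part of any logging library.

```python
def parse_fields(line):
    """Split a space-separated key=value log line into a dict of strings."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that contain no '='
            fields[key] = value
    return fields


fields = parse_fields("concurrent_locks=47 wait_time=1980ms table=orders")

# Strip the "ms" unit so the wait time can be compared numerically.
raw_wait = fields["wait_time"]
wait_ms = int(raw_wait[:-2]) if raw_wait.endswith("ms") else int(raw_wait)

print(fields["table"], fields["concurrent_locks"], wait_ms)
```

The 1980ms wait in the log matches the time the trace attributes to the database call, which is what ties the two signals together.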
Metric: tells you THAT (p99 = 2.1s, 10:40 – 10:45)
Trace: shows you WHERE (database: 1980ms, trace_id: a3f8b2)
Log: tells you WHY (lock timeout, 47 locks, trace_id: a3f8b2)
Metrics tell you something is wrong. Traces show you where. Logs tell you why.
Correlation is the thread that connects them.
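The walk from metric to trace to log can be sketched in Python. This is an illustration under stated assumptions, not how any particular observability backend works: the dictionary layout, the `correlate` function, the "handler" span, and the second trace are all hypothetical. The service, time window, trace ID, 1980ms database time, and log message come from the example above.

```python
from datetime import time

# The alert: which service, and which time window (UTC).
alert = {"service": "orders", "start": time(10, 40), "end": time(10, 45)}

# Traces with per-span durations in milliseconds.
traces = [
    {"trace_id": "a3f8b2c1d4e5", "service": "orders", "at": time(10, 42, 1),
     "spans": {"database": 1980, "handler": 120}},   # "handler" is hypothetical
    {"trace_id": "000000000000", "service": "orders", "at": time(10, 43, 0),
     "spans": {"database": 15, "handler": 30}},      # hypothetical fast request
]

logs = [
    {"trace_id": "a3f8b2c1d4e5", "severity": "ERROR",
     "message": "lock timeout waiting for table 'orders'"},
]


def correlate(alert, traces, logs):
    """Follow an alert to its slowest trace, slowest span, and matching logs."""
    # Step 1: same service, same time window as the metric alert.
    in_window = [
        t for t in traces
        if t["service"] == alert["service"]
        and alert["start"] <= t["at"] <= alert["end"]
    ]
    if not in_window:
        return None
    # Step 2: the slowest trace, then the span that took the most time (WHERE).
    slowest = max(in_window, key=lambda t: sum(t["spans"].values()))
    where = max(slowest["spans"], key=slowest["spans"].get)
    # Step 3: logs that carry the same trace_id (WHY).
    why = [entry["message"] for entry in logs
           if entry["trace_id"] == slowest["trace_id"]]
    return {"trace_id": slowest["trace_id"], "where": where, "why": why}


result = correlate(alert, traces, logs)
print(result)
```

The time window and service name link the metric to the trace, and the shared trace_id links the trace to the log. Without those common fields, each lookup would be a separate search.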