The Full Picture
The system failed on a Tuesday.
But this time, the developer did not ask "When did this begin?"
They opened three windows and found the answer themselves.
PRODUCTION ALERT: orders error rate > 5%
orders · error_rate = 12.4% · threshold = 5%
14:37 UTC
An alert fires. Something is wrong. You have three tools at your disposal. Where do you start?
Choose your first investigation step.
There is no wrong answer. Each reveals a different piece.
14:37:14 gateway ERROR 502 bad gateway route=/checkout
Errors are scattered across six services. Without knowing the request path, it is hard to tell which error is the cause and which is a symptom.
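The idea above can be sketched in a few lines. This is a hypothetical illustration, not real telemetry from the incident: the log entries, trace ID, and `depth` field are invented to show why a shared request path turns scattered errors into a chain.

```python
from collections import defaultdict

# Hypothetical log entries. Without a shared trace ID, each service's
# error looks independent; with one, the request path becomes visible.
logs = [
    {"trace_id": "a1", "service": "gateway", "depth": 0, "level": "ERROR", "msg": "502 bad gateway"},
    {"trace_id": "a1", "service": "api",     "depth": 1, "level": "ERROR", "msg": "upstream timeout"},
    {"trace_id": "a1", "service": "orders",  "depth": 2, "level": "ERROR", "msg": "db call failed"},
]

def deepest_error(entries):
    """Within one trace, the error furthest down the call chain is the
    likeliest cause; every error above it is usually a symptom."""
    errors = [e for e in entries if e["level"] == "ERROR"]
    return max(errors, key=lambda e: e["depth"])["service"]

# Group log lines by trace so each request can be read as one story.
by_trace = defaultdict(list)
for entry in logs:
    by_trace[entry["trace_id"]].append(entry)

print(deepest_error(by_trace["a1"]))  # orders
```

The heuristic is simple: errors propagate upward, so the deepest failing service in a trace is the best place to start digging.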
One signal gave you a piece. Not the whole answer.
What do you check next?
Choose your second signal.
You need at least two signals to narrow down the problem.
Two signals, two perspectives. One showed the shape, the other added context.
The last signal will complete the picture.
Check the final signal.
You know what is left.
You have seen all three signals. Errors appear in gateway, api, and orders. Logs mention payment and inventory too.
But which service is the actual root cause?
Click the service where the failure originates.
Think about what the trace showed you.
Root cause identified
The database connection pool is exhausted.
Every service above it in the call chain (orders, api, gateway) fails as a consequence.
The log warnings from payment and inventory are symptoms of the same bottleneck.
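The cascade can be sketched with a toy connection pool. Everything here is assumed for illustration: the pool size, the timeout, and the `handle_order` function are invented, not taken from the actual system.

```python
import queue

class ConnectionPool:
    """Toy fixed-size pool: acquire() hands out a free connection,
    or raises after a short wait when every slot is taken."""
    def __init__(self, size):
        self._free = queue.Queue()
        for i in range(size):
            self._free.put(f"conn-{i}")

    def acquire(self, timeout=0.01):
        try:
            return self._free.get(timeout=timeout)
        except queue.Empty:
            raise TimeoutError("pool exhausted")

pool = ConnectionPool(size=2)

# Two slow queries hold every connection and never release them.
held = [pool.acquire(), pool.acquire()]

def handle_order():
    conn = pool.acquire()  # blocks, then times out: the root cause
    pool._free.put(conn)   # (never reached while the pool is drained)

# orders fails first; api and gateway only relay that failure upward.
try:
    handle_order()
except TimeoutError as exc:
    print("orders:", exc)               # root cause
    print("api: upstream call failed")  # symptom
    print("gateway: 502 bad gateway")   # symptom
```

Nothing is wrong with api or gateway themselves; freeing the pool (or ending the slow queries holding it) clears every error above it at once.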