Member post originally published on Last9’s blog by Nishant Modak

Observability is often a misunderstood and misused term. It has come to mean nothing and everything at this point. Read more on how Observability can be viewed from the lens of a Radar and a Black Box.

An illustration showing two persons in mission control centre

Observability is often a misunderstood and misused term. It has come to mean nothing and everything at this point.

You can simplify thinking about software observability if you think about it in 2 parts 

1.  Radar

2.  Black-box

Radar

Radar systems are real-time. They must be able to detect any anomalies, understand related and unrelated signals, handle out-of-order data or loss of signal, and absolutely cannot have a blip. They enable operators to have a live view of their systems and respond immediately.

Definition of Radar

Radar definition
Definition of Radar

In software observability parlance, metrics & events lend themselves amazingly well to radars (monitoring). They are easy to instrument, store and fast to retrieve and alert. They are must-haves for any business to keep the lights on.

There is ONE purpose of radar systems: To avoid mishaps in the first place.

Black Box

What about black boxes? Before jumping on to ‘flight recorders’ – I realize everyone imagines them to be black, which they are absolutely not. They are fluorescent in color to ensure they can be traced back easily in case of a mishap!

Honeywell flight recorder

These black boxes are great at collecting all the data. They can record all signals, a timeline of events, actual user journeys, environments, configurations, and things you never knew existed. Have a large enough storage capacity that spans at least the last three flights!

Definition of Black Box

Black box definition
Black Box

In software observability parlance, logs & traces are best suited for understanding the sequence of events, user journeys, and code paths. They need lots of storage and are available with a latency of minutes. They are excellent at debugging, performing an RCA, and helping software engineering teams improve the system’s design.

The answers from a black box can, at times, mean going back to the drawing board to change the design of the planes itself!

There is one primary purpose of flight recorder systems: To provide all possible details to perform a post-fact or post-incident analysis. RCA.

💡Both are essential systems and lend themselves to keeping the lights on vs. ensuring that design can be continuously improved. You should not use one for another!

If you liked this post, you would like my take on understanding the model of failures and its connection to Software Reliability.