If you like big graphs and you cannot lie, you’re probably familiar with Grafana. :)
The particular data points that are of interest depend entirely on what you want to use them for. Some data helps you figure out when an outage happened and the impact it had on other parts of your system, while other data helps you proactively find issues before they become a big outage.
For example, collecting metrics about the CPU, RAM and disk usage of the server, along with its network load, would give an insight into your cluster health. If these numbers stay under their thresholds, we likely have enough resources for our workload. If they exceed them, we would have to reduce usage by killing unnecessary processes or deleting old log files, or increase the capacity of the server to handle the load.
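To make the threshold idea concrete, here is a minimal sketch in Python. The threshold values and the `breached_metrics` helper are our own illustrative assumptions, not anything Grafana ships; in practice you would express the same check as an alert rule on your dashboard panels.

```python
from typing import Dict, List

# Hypothetical thresholds (percent used); tune these for your own workload.
THRESHOLDS: Dict[str, float] = {"cpu": 85.0, "ram": 90.0, "disk": 80.0}


def breached_metrics(usage: Dict[str, float],
                     thresholds: Dict[str, float] = THRESHOLDS) -> List[str]:
    """Return the names of metrics whose usage exceeds its threshold."""
    return sorted(
        name for name, limit in thresholds.items()
        # A metric missing from the sample is treated as 0% used.
        if usage.get(name, 0.0) > limit
    )


if __name__ == "__main__":
    # Made-up sample readings for illustration only.
    sample = {"cpu": 42.0, "ram": 93.5, "disk": 81.0}
    print(breached_metrics(sample))
```

Anything this function returns is a candidate for the clean-up or capacity actions described above.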
Another widely used signal is your HTTP status codes, which can give you a direct indication of whether your customers are facing a product usability problem.
For example, the number of 5xx HTTP status codes in a particular period of time may give us an insight into the severity of the customer impact.
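As a rough sketch of that calculation, the Python snippet below computes the share of 5xx responses in a time window. The `server_error_rate` function and the `(timestamp, status)` input shape are assumptions made for illustration; your data source would normally do this aggregation in its own query language.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple


def server_error_rate(requests: Iterable[Tuple[datetime, int]],
                      window_end: datetime,
                      window: timedelta) -> float:
    """Fraction of requests in (window_end - window, window_end] that returned 5xx."""
    start = window_end - window
    in_window = [status for ts, status in requests if start < ts <= window_end]
    if not in_window:
        # No traffic in the window: report 0.0 rather than dividing by zero.
        return 0.0
    errors = sum(1 for status in in_window if 500 <= status <= 599)
    return errors / len(in_window)
```

A ratio is used here rather than a raw count so that the number stays comparable between quiet and busy periods; a raw count works just as well if your traffic is steady.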
But as is the case with any analysis, clear conclusions and root causes are easier to draw out when you have more relevant data. However, there’s also such a thing as too much data. That being said, wouldn’t it be useful if your data could transform itself into something that makes sense as soon as you look at it?
Grafana allows you to query and visualize data from anywhere. It turns your monitoring metrics into beautiful graphs, so the data you collect is easier to read at a glance.
While Grafana allows for a few channels of alerting, the platform is not meant to replace your IT alerting tool. Its primary focus is to serve as a data visualization tool. Like we’ve mentioned in a few of our other posts, it’s very important for you to make sure your data is actionable. You can do this by integrating Grafana, like you would any monitoring tool, with your incident management platform and configuring it to notify you about customer-impacting alerts.
In this particular post, we’ll be going through how you can use your Grafana data to set off alert triggers in Squadcast.
Side Note: At Squadcast, we use Grafana and absolutely love it! ❤️
We understand that you will want all of your data in one place. However, it’s important for your incident management tool to know whether an engineer needs to be pinged in the middle of their meal for this alert.
A few things you can do to ensure that this integration works well for you:
Turbocharge your observability data in Grafana by making it actionable! If you have other best practices to share or just need help with the integration set-up, feel free to drop a line to our Support Team.