“We can't fix something which we can't observe.” Whether it's a steam engine or a complex microservice-based cloud deployment, good observability makes troubleshooting easier. A clear view of your system lets you recognise problems early and resolve them before they escalate. Getting the right data at the right time, with the associated context, is a game changer for anyone who wants better system stability.
In this blog post, we have collated a list of the best observability tools in DevOps across log aggregation, APM, time series databases, distributed tracing and metrics collection. While this is not an in-depth look at the strengths and weaknesses of these tools, it is a good starting point for your journey to better observability.
The list contains a mix of on-premise, hybrid and SaaS platforms. Some of the tools featured here are open source products or are built on top of other open source software.
First up, we look at some log aggregation tools:
Fluentd is an open source data collection tool. It is used to collect and process data from event and application logs, acting as a centralised layer that consolidates different log inputs and outputs.
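To make the idea of a layer that consolidates inputs and outputs more concrete, here is a minimal Python sketch of tag-based routing. It is not Fluentd's API or configuration syntax: the `Router` class, the tag names and the in-memory lists standing in for outputs are all hypothetical, and the glob matching from `fnmatch` is a simplification of how a real collector matches tags.

```python
from fnmatch import fnmatchcase


class Router:
    """Hypothetical router: deliver each tagged event to every matching output."""

    def __init__(self):
        self.routes = []  # list of (tag_pattern, sink) pairs

    def add_output(self, tag_pattern, sink):
        # A sink is just a list here; a real output would be a file, a DB, etc.
        self.routes.append((tag_pattern, sink))

    def emit(self, tag, record):
        # Return how many outputs received the event.
        delivered = 0
        for pattern, sink in self.routes:
            if fnmatchcase(tag, pattern):
                sink.append({"tag": tag, **record})
                delivered += 1
        return delivered


app_store, error_store = [], []
router = Router()
router.add_output("app.*", app_store)      # everything from the application
router.add_output("*.error", error_store)  # errors from any source

router.emit("app.access", {"path": "/login", "status": 200})
router.emit("app.error", {"message": "timeout"})
router.emit("db.error", {"message": "deadlock"})
```

Each source only needs to know its tag, and each destination only needs a pattern, which is what lets a single layer sit between many inputs and many outputs.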
Features:
ELK is a stack of three popular open source projects: Elasticsearch, Logstash and Kibana. It allows you to collect logs from your applications, then search, analyse and visualise them for better monitoring and troubleshooting.
Features:
Graylog is another centralised log aggregation tool that allows real-time search across large amounts of data. It is built on Elasticsearch and MongoDB, and also functions as a repository for capturing and storing machine data. Graylog has paid plans for enterprises.
Features:
Loggly is a SaaS solution for processing log data. Its log tracking tools help you monitor and analyse the logs generated by your infrastructure. Since it is a SaaS product, you can start using it without installing any additional hardware or software. Loggly has freemium and paid plans.
Features:
Next up, here are some APM (Application Performance Monitoring) tools.
Opsview is a highly scalable monitoring platform used by enterprises. Opsview Cloud gives its users a unified view of their organisation's IT infrastructure and helps uncover opportunities for automation. Opsview is suitable for small to medium businesses as well. It is a paid tool with a free demo available.
Features:
Zenoss offers monitoring services for IT infrastructure. It is agentless: a collector gathers system information and sends it to a central server for analysis. Zenoss captures data in real time and places it in context. Zenoss is a paid tool.
Features:
Next, here are some of the top distributed tracing tools for monitoring microservice-based applications.
Wavefront (Tanzu Observability) offers insight into your cloud platforms with detailed metrics, traces, logs and relevant analytics. It has a host of integrations with major cloud hosting and incident management platforms.
Features:
Lightstep provides visibility into complex deployments. This includes analysis of redundancies and automatic root cause analysis from the collected data. It can also automatically detect changes in your infrastructure. Lightstep has paid as well as freemium versions.
Features:
OpenTelemetry is an open source, vendor-neutral set of tools, APIs and SDKs with broad support for most languages and frameworks. It lets you collect telemetry data from your applications and send it to other tools for analysis.
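As a rough illustration of the kind of telemetry a tracing library collects, the sketch below records nested spans that share a trace ID. It uses only the Python standard library and is not the OpenTelemetry API: the `span` helper, the field names and the `finished_spans` list (standing in for an exporter) are assumptions made for this example, and the simple stack only works in a single thread.

```python
import time
import uuid
from contextlib import contextmanager

finished_spans = []  # stand-in for an exporter that would ship spans elsewhere
_active = []         # stack of currently open spans (single-threaded sketch)


@contextmanager
def span(name):
    """Hypothetical helper: time a unit of work and link it to its parent."""
    parent = _active[-1] if _active else None
    current = {
        "name": name,
        # Child spans inherit the trace ID; a root span starts a new trace.
        "trace_id": parent["trace_id"] if parent else uuid.uuid4().hex,
        "span_id": uuid.uuid4().hex[:16],
        "parent_id": parent["span_id"] if parent else None,
        "start": time.perf_counter(),
    }
    _active.append(current)
    try:
        yield current
    finally:
        _active.pop()
        current["duration_s"] = time.perf_counter() - current["start"]
        finished_spans.append(current)


with span("checkout") as root:
    with span("charge-card") as child:
        pass  # the traced work would happen here
```

The shared `trace_id` and the `parent_id` link are what allow a backend to reassemble one request's path across several services.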
Features:
Next up are some time series databases.
DataStax offers a database platform built on Apache Cassandra (NoSQL). Cassandra is widely used for storing time series data and is often preferred because it scales easily.
Features:
Warp 10 is a time series database with its own analytics language and engine (WarpScript). It can be used to collect, store and analyse data, and is used for aggregating and analysing sensor data in IoT and other applications that rely on time-sensitive data. Its GTS (Geo Time Series) data model, which attaches location to timestamped values, makes it a popular choice for IoT.
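To show what geo-located time series data can look like, here is a small Python sketch of sensor readings that carry a timestamp, a position and a value, along with a per-minute average. This is not WarpScript and not Warp 10's storage format: the `GeoPoint` class, the `bucketize_mean` function and the microsecond timestamps are assumptions chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class GeoPoint:
    """Hypothetical data point: a value with a time and a place."""
    ts: int      # timestamp in microseconds (assumed unit)
    lat: float
    lon: float
    value: float


def bucketize_mean(points, span_us):
    """Average the values that fall into each fixed-width time bucket."""
    buckets = defaultdict(list)
    for p in points:
        bucket_start = p.ts - p.ts % span_us
        buckets[bucket_start].append(p.value)
    return {start: mean(vals) for start, vals in sorted(buckets.items())}


readings = [
    GeoPoint(0, 48.39, -4.49, 20.0),
    GeoPoint(30_000_000, 48.39, -4.49, 22.0),
    GeoPoint(60_000_000, 48.40, -4.48, 25.0),
]
per_minute = bucketize_mean(readings, 60_000_000)
```

Here the first two readings land in the same one-minute bucket and are averaged, while the third starts a new bucket.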
Features:
Lastly, here are some preferred tools used for metrics collection.
Logstash is a lightweight, open source, server-side data processing pipeline for collecting, transforming and forwarding data from a number of sources to their target destination. It ingests, converts and transmits data dynamically, regardless of format or complexity. Logstash also has tight integration with Elasticsearch.
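The ingest, convert and transmit flow described above can be sketched as three chained stages. This is a plain Python illustration, not Logstash configuration or plugin code: the function names, the `key=value` input format and the JSON output are all assumptions made for this example.

```python
import json


def read_lines(lines):
    """Input stage: yield one raw event per non-empty line."""
    for line in lines:
        if line.strip():
            yield {"message": line.strip()}


def parse_kv(events):
    """Filter stage: lift 'key=value' pairs out of the message into fields."""
    for event in events:
        pairs = (p.split("=", 1) for p in event["message"].split() if "=" in p)
        yield {**event, **dict(pairs)}


def to_json(events):
    """Output stage: serialise each event as it would be shipped downstream."""
    for event in events:
        yield json.dumps(event, sort_keys=True)


raw = ["level=info path=/health", "", "level=error path=/pay code=502"]
shipped = list(to_json(parse_kv(read_lines(raw))))
```

Because each stage is a generator, events flow through one at a time, and a stage can be swapped out without touching the others.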
Features:
Kafka is an open source distributed event streaming platform with support for high-performance data pipelines, streaming analytics, data integration and more. It is widely used for mission-critical applications thanks to its zero message loss capabilities, and by organisations in the insurance, banking, manufacturing and telecom industries.
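One way to picture an event streaming platform is as an append-only log that each group of consumers reads at its own pace. The Python sketch below is a single-process simplification, not a Kafka client: the `TopicLog` class and its `produce`, `poll` and `commit` methods are hypothetical names, and there is no partitioning, replication or persistence.

```python
class TopicLog:
    """Hypothetical in-memory log with one read position per consumer group."""

    def __init__(self):
        self._records = []
        self._offsets = {}  # consumer group -> next offset to read

    def produce(self, value):
        # Records are only ever appended; the return value is the new offset.
        self._records.append(value)
        return len(self._records) - 1

    def poll(self, group, max_records=10):
        # Reading does not remove records, so other groups can still see them.
        start = self._offsets.get(group, 0)
        return self._records[start:start + max_records]

    def commit(self, group, count):
        # Advance the group's position only after it has processed the batch.
        self._offsets[group] = self._offsets.get(group, 0) + count


log = TopicLog()
for event in ("order-created", "order-paid", "order-shipped"):
    log.produce(event)

batch = log.poll("billing", max_records=2)
log.commit("billing", len(batch))
remaining = log.poll("billing")
replay = log.poll("analytics")  # a new group starts from the beginning
```

Keeping records in the log and tracking positions separately is what lets a consumer that crashes before committing pick up the same batch again.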
Features:
Sentry is a well-known application and client-side performance monitoring tool that gives cross-functional visibility into an application's health and performance. It supports the software development lifecycle by notifying developers of issues, along with stack traces and the trail of events that led up to them.
Features:
Google Stackdriver, now known as Google Cloud's operations suite, is used for monitoring, observing, improving and troubleshooting application and system performance in a Google Cloud environment. It also has a freemium version so you can try out its functions and capabilities.
Features:
Amazon CloudWatch is one of the prominent observability tools. It provides monitoring and management services with actionable data insights for applications, infrastructure and services running on AWS, on-premises or in hybrid environments. It can be used as a single platform that gathers logs and data on all of your performance metrics.
Features:
Elastic Observability is designed to provide granular insights and context about the behaviour of applications running in your infrastructure. It brings logs, uptime data, metrics, user experience data, application traces and synthetics together in a single stack. Users can search, monitor and analyse this data in real time across the environment.
Features:
SolarWinds AppOptics is a simple and powerful solution for APM and infrastructure monitoring. It is a cost-effective option for cloud-native and hybrid IT environments. SolarWinds also offers a wide range of other products, such as IT service management, network, systems, database and IT security management solutions.
Features:
Dynatrace is an automated, intelligent observability tool that helps speed up the move to cloud infrastructure. It is designed to tame complexity across the cloud architecture by providing this observability in a single platform.
Features:
You can never have enough visibility into your infrastructure. With the advent of microservices architecture, observability tools must rise to the challenge of discovering and analysing dependencies.
As stated earlier, this is not an exhaustive list of either the available tools or their features. Before choosing an observability tool, it is important to identify the kind of metrics you need to observe and to understand how you can make this data more actionable. You can also visit the respective websites to learn more about each tool and how it can help you.
Regardless of the kind of platform you are running, we are sure that the tools listed here will be useful to you. On similar lines, for a more detailed look at the top monitoring tools used by DevOps/SREs, head over to this blog.