Infrastructure monitoring using kube-prometheus operator

June 1, 2021

In this tutorial by Squadcast, you will learn how to set up infrastructure monitoring for your Kubernetes cluster using the kube-prometheus operator, display metrics with Grafana, and configure alerting with Alertmanager.

Infrastructure Monitoring

One of the key principles of running clusters in production is Monitoring.

You must be aware of the resource allocation and limits of each component that is a part of the cluster.

It is of crucial importance to have insight and observability into your cluster, be it Kubernetes, bare metal, virtual machines, or any other platform.

A good monitoring solution paired with a set of metrics and alerting will provide a safe environment for your workloads.

It’s safe to say that monitoring a Kubernetes cluster comes in two parts - infrastructure and workload monitoring.

The first part covers the actual infrastructure that supports your workloads - the nodes or instances that host your applications. By collecting and observing their metrics, you get a clear picture of each node's health, usage, and capacity.

The other part that needs to be covered is the workloads and microservices that you deploy on the cluster.

These live both in your applications and in the Kubernetes ecosystem - the pods and containers.

Monitoring just the instances is not enough: Kubernetes abstracts and adds additional layers for container management, and these also need to be taken into account.

The Pods are entities of their own, each with different resource requirements, limits, and usage.
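
As a quick refresher, resource requests and limits are declared per container in the Pod spec. The manifest below is a minimal, illustrative sketch - the name, image, and values are made up, not part of the kube-prometheus setup:

apiVersion: v1
kind: Pod
metadata:
  name: sample-app
spec:
  containers:
  - name: app
    image: nginx:1.21        # example image
    resources:
      requests:              # what the scheduler reserves for the container
        cpu: 100m
        memory: 128Mi
      limits:                # hard ceiling enforced at runtime
        cpu: 500m
        memory: 256Mi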

Prometheus Operator

Before moving on to installing the monitoring stack, let's take a brief look at what Kubernetes Operators and Custom Resources are.

In one of our previous blogs, we explained in detail what Kubernetes Operators and Custom Resources are and how they are used.

Kubernetes operators take the Kubernetes controller pattern that manages native Kubernetes resources (Pods, Deployments, Namespaces, Secrets, etc.) and let you apply it to your own custom resources.

Custom resources are Kubernetes objects that you define via CRDs (Custom Resource Definitions). Once a CRD is defined, you can create custom resources based on the definition and they are stored by Kubernetes. And you can interact with them through the Kubernetes API or kubectl, just like existing resources.
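
For illustration only (this is not part of the kube-prometheus install), a minimal CRD and a matching custom resource could look like the sketch below; the Backup kind and the example.com group are made-up names:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: backups.example.com
spec:
  group: example.com
  scope: Namespaced
  names:
    kind: Backup
    plural: backups
    singular: backup
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              schedule:
                type: string
---
# A custom resource of the new kind, stored and served by Kubernetes like any other object
apiVersion: example.com/v1
kind: Backup
metadata:
  name: nightly-backup
spec:
  schedule: "0 2 * * *"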

As you’ve read, both of these are extensions to the Kubernetes API and are not available by default, unlike resources of the Deployment or StatefulSet kind.

The Prometheus Operator will manage and configure a Prometheus cluster for you. Bear in mind that it deploys only the core components.

In this tutorial, however, you will deploy the more complete option - the kube-prometheus operator.

The kube-prometheus operator will deploy all the core components plus exporters, extra configurations, dashboards, and everything else required to get your cluster monitoring up to speed.

These configurations can then be easily modified to suit your needs.

You can follow the official link if you wish to compare the differences between the operator deployment options.

Note: Manually deploying the Prometheus stack is still an option. However, you will need to deploy every component separately. This creates operational toil and will require a lot of manual steps until all services are configured and connected properly.

Installing the kube-prometheus operator

Although Kubernetes Operators may sound complex and scary, fear not, their installation is a breeze!

The people contributing to the Prometheus Operator project made its install straightforward.

Just a couple of commands need to be executed and you will have your monitoring set up in the cluster.

Let’s start.

To install the kube-prometheus operator, first clone the repository containing all the necessary files with this command:

$ git clone https://github.com/prometheus-operator/kube-prometheus.git

Now, from inside the kube-prometheus folder, apply the manifests located in the manifests/setup directory:

$ kubectl apply -f manifests/setup

This does a few things. First, it creates the monitoring namespace that will contain all deployments for the monitoring stack.

Second, it will create all the necessary RBAC roles, role bindings, and service accounts that are required for the monitoring services to have proper privileges for access and metrics gathering.

And finally, it will create the aforementioned custom resources and custom resource definitions for the Prometheus Operator, and deploy them.

With the previous command you deployed the files for the Operator itself, but not for its services.

Wait a bit until the prometheus-operator pod is up and running.

You can check on it using:

$ kubectl -n monitoring get pods
NAME                                   READY   STATUS    RESTARTS   AGE
prometheus-operator-7775c66ccf-74fn5   2/2     Running   0          58s

Next, deploy the rest of the services:

$ kubectl apply -f manifests/

Running the above command will deploy:

  • Prometheus with High Availability
  • Alertmanager with High Availability
  • Grafana
  • Node exporters
  • Blackbox exporter
  • Prometheus adapter
  • Kube-state-metrics
  • And all other supporting services and configurations for the monitoring stack

After a couple of minutes, all the pods should get into a running state. You can verify again using the kubectl get pods command.

You can now port-forward and open the Prometheus, Grafana, and Alertmanager services locally:

Prometheus

$ kubectl -n monitoring port-forward svc/prometheus-k8s 9090

Grafana

$ kubectl -n monitoring port-forward svc/grafana 3000

Alertmanager

$ kubectl -n monitoring port-forward svc/alertmanager-main 9093

It’s best to check and open all of them, just to verify that everything works as expected.

Note: The default user and password for Grafana are admin/admin, after which you will be prompted to create a new password.

If you are greeted by the services user interface, that means the install was successful and you can now explore, and configure the services to fit your needs.

[Image: Grafana dashboard]

Before moving on, let’s understand how the services are interconnected and what role they serve in the monitoring stack.

Service Interaction

Consider the following diagram:

[Image: Service interaction - kube-prometheus operator]

The high-level overview of the stack goes like this:

Prometheus will periodically scrape (or pull) metrics via the configured exporters and metrics servers on the nodes using HTTP.

The exporters don’t send data to Prometheus; instead, Prometheus pulls it from the endpoints the exporters expose.

The scraped metrics get time-stamped and stored as time-series data, which is written to a persistent volume for storage and later analysis.

A Config Map contains the rules and configurations for Prometheus.

This configuration contains the rules, alerts, scraping jobs, and targets that Prometheus needs to do proper monitoring.

Prometheus uses its own query language, PromQL, with which alerts can be defined and data queried.

Example of a PromQL query:

api_http_requests_total{method="POST", handler="/messages"}
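
PromQL also supports functions and range vectors; for example, a hypothetical query for the per-second request rate over the last five minutes would look like this:

rate(api_http_requests_total{handler="/messages"}[5m])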

To visualize and display the data stored by Prometheus, Grafana comes into the picture.

Grafana connects to Prometheus, sets it as the data source, fetches this data, and displays it through a custom dashboard.

Alertmanager is used to notify us of any alerts. Rules set in Prometheus get evaluated, and once a violation occurs Prometheus pushes this alert notification to Alertmanager.

Once Alertmanager receives this notification, based on its own rules for routing and grouping, it will then send it over to the configured Receivers.

Similar to Prometheus, the Alertmanager rules also get stored separately either in Config Maps or Secrets.

These receivers can be the Squadcast platform, Slack, email, or any other configurable incident receiver.

Let’s examine this Prometheus alerting rule:

- name: kubernetes-system-kubelet
  rules:
  - alert: KubeNodeNotReady
    annotations:
      description: '{{ $labels.node }} has been unready for more than 15 minutes.'
      runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/kubenodenotready
      summary: Node is not ready.
    expr: |
      kube_node_status_condition{job="kube-state-metrics",condition="Ready",status="true"} == 0
    for: 15m
    labels:
      severity: warning

  • name - The name of the rule group; you can have multiple alerts under the same group
  • rules - Configured rules under the rule group
  • alert - Name of the actual Alert and how it will be displayed once it is firing
  • annotations - Under annotations, you can set more details and summaries about the alert firing
  • expr - The expression for the alert in PromQL language that gets evaluated
  • for - How long Prometheus waits from when the alert condition is first met until the alert fires. If the condition holds for 15 minutes or longer, Prometheus will send a notification to Alertmanager
  • labels - additional labeling that you can attach to the alert

Using the Go templating language, you can further detail your alert and give a clearer description.
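
As an illustrative sketch (not one of the default kube-prometheus rules), annotations can pull in label values and the measured value of the expression:

annotations:
  summary: 'High request latency on {{ $labels.instance }}'
  description: 'Request latency on {{ $labels.instance }} (job {{ $labels.job }}) is currently {{ $value | humanize }} seconds.'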

Adding a custom alerting rule

Now that you’ve learned more about how the whole Prometheus setup operates, you can modify its configuration and add your own custom alerting rules.

As explained above, Prometheus gets its configuration data from a Config Map.

If you describe the prometheus-k8s StatefulSet, you will see the prometheus-k8s-rulefiles-0 Config Map that contains the config rules.

$ kubectl -n monitoring describe statefulsets prometheus-k8s
[other output truncated]
    prometheus-k8s-rulefiles-0:
      Type:      ConfigMap (a volume populated by a ConfigMap)
      Name:      prometheus-k8s-rulefiles-0
      Optional:  false

You can now edit the config map and add your custom alert.

Keep in mind that the config map will have a lot of lines!

It’s best to save it and edit it offline:

$ kubectl -n monitoring get configmap prometheus-k8s-rulefiles-0 -o yaml > prometheus-k8s-rulefiles.yaml

The configuration inside is split into separate YAML files, and each file contains a group of alerts based on a common alert target.

In this case, the example will cover adding an alerting rule under a new group.

But if you like, you can add it under one of the existing alert groups.

Add the following sample alert under the data block:

custom-alert-rules.yaml: |
  groups:
  - name: custom.rules
    rules:
    - alert: AlertTestJobFailed
      expr: kube_job_status_failed{job_name="alert-test"} > 0
      for: 0m
      labels:
        severity: warning
      annotations:
        summary: Alert Test Job failed (instance {{ $labels.instance }})
        description: "Job {{$labels.namespace}}/{{$labels.exported_job}} failed to complete\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"

The alert will fire if our Kubernetes job fails to execute, which you will test in a moment.

Now comes the tricky part.

Since the owner of the Config Map is the Prometheus Operator, you will need to remove the ownership and other managed metadata.

Otherwise, any modifications to the Config Map will simply be reverted!

Under the ConfigMap's metadata field, remove everything except the name and namespace; leave those two as they are.

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-k8s-rulefiles-0
  namespace: monitoring

[other data truncated]

You can also find the link to the edited config map here.

Finally, replace the existing Config Map with the new one:

$ kubectl replace -f prometheus-k8s-rulefiles.yaml
configmap/prometheus-k8s-rulefiles-0 replaced

Prometheus will automatically reload the Config Map. After waiting a bit, check the UI to verify that the alerting rule was created:

$ kubectl -n monitoring port-forward svc/prometheus-k8s 9090
[Image: Prometheus dashboard]

You can now run the test job to check if the alert will fire correctly.

$ kubectl create job alert-test --image=busybox -- sh -c "sleep 10; exit 139"

The job will create a pod that will sleep for ten seconds and exit with status 139, thus failing the job and triggering the alert.
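
While waiting for the alert, you can also watch the job itself fail (pods created by a job carry a job-name label):

$ kubectl get job alert-test
$ kubectl get pods -l job-name=alert-test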

[Image: Prometheus Kubernetes jobs]

Setting up Slack Webhook

Since the monitoring stack is now working, it’s a good idea to configure a receiver and see how alert notifications can be made easier for humans to read.

Creating a Slack workspace and channel for testing purposes is easy.

If you do not already have a Slack account you can sign up and create one here.

Once logged in, on the left side, in the Channels tab click on the plus sign and create a separate channel for alerts.

[Image: Prometheus Slack webhook]

Now, to configure a webhook, click on Browse Slack, then Apps, and search for webhooks in the search bar.

Make sure to select Incoming Webhooks - this is important.

Incoming means that data will be sent to Slack, not received from it.

[Image: Prometheus Slack webhook]

Once you click on Add, you will be redirected to the Slack webhook webpage.

Just confirm with Add to Slack, and choose the previously created alert channel - in our case, #prometheus_alerts.

Once that is done, the next page will give you the webhook URL.

That looks something like:

https://hooks.slack.com/services/XXXXXXXXXX/XXXXX/XXXXX

Keep this URL a secret, and treat it like a password!

Anyone who gets their hands on this URL can send anything to your channel.

At the bottom, there will be an example with curl that you can use to send a request and verify if the webhook integration works.
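
Slack's generated example looks roughly like the following (shown here with the placeholder URL from above; substitute your real webhook URL):

$ curl -X POST -H 'Content-type: application/json' --data '{"text":"Hello from the webhook test!"}' https://hooks.slack.com/services/XXXXXXXXXX/XXXXX/XXXXX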

Going back to your channel, if you see the famous ghost emoji, the webhook configuration is successful.

[Image: Prometheus alerts]

Note: If you receive an SSL certificate error when running the test request, add the -k option to curl.

$ curl -k -X POST {rest of the command}

Configuring Alertmanager

With the Slack channel configured with a webhook, what's left is to integrate it into the Alertmanager configuration.

In other, non-operator deployments, the configuration is usually stored inside a Config Map.

However, when deployed through the Prometheus Operator, the configuration is defined inside a Kubernetes Secret.

This keeps its contents base64-encoded rather than in plain text, although it also means a few additional steps are required whenever the configuration changes.

You have two options for editing the Alertmanager configuration:

  1. You can modify the existing secret that Alertmanager uses to store its configuration.
  2. You can replace that secret with a new one that contains your updated configuration.

The first option is a bit tedious, since you will need to decode and re-encode the contents every time you want to change the configuration.

With the second option, you can define and store your configuration in plain YAML format. Whenever a new configuration is needed, you can regenerate the Alertmanager secret directly from that file.

First, you will need to grab the default configuration that’s deployed with Alertmanager.

You will use that as a base for further editing.

You can output the Alertmanager secret with:

$ kubectl -n monitoring get secret alertmanager-main -o yaml

The secret will look messy at first glance.

What you are interested in is the alertmanager.yaml key under the data field.

That’s the configuration that Alertmanager uses.

For the sake of clarity, the other fields are truncated below.

data:
  alertmanager.yaml: Imdsb2JhbCI6CiAgInJlc29sdmVfdGltZW91dCI6ICI1bSIKImluaGliaXRfcnVsZXMiOgotICJlcXVhbCI6CiAgLSAibmFtZXNwYWNlIgogIC0gImFsZXJ0bmFtZSIKICAic291cmNlX21hdGNoIjoKICAgICJzZXZlcml0eSI6ICJjcml0aWNhbCIKICAidGFyZ2V0X21hdGNoX3JlIjoKICAgICJzZXZlcml0eSI6ICJ3YXJuaW5nfGluZm8iCi0gImVxdWFsIjoKICAtICJuYW1lc3BhY2UiCiAgLSAiYWxlcnRuYW1lIgogICJzb3VyY2VfbWF0Y2giOgogICAgInNldmVyaXR5IjogIndhcm5pbmciCiAgInRhcmdldF9tYXRjaF9yZSI6CiAgICAic2V2ZXJpdHkiOiAiaW5mbyIKInJlY2VpdmVycyI6Ci0gIm5hbWUiOiAiRGVmYXVsdCIKLSAibmFtZSI6ICJXYXRjaGRvZyIKLSAibmFtZSI6ICJDcml0aWNhbCIKInJvdXRlIjoKICAiZ3JvdXBfYnkiOgogIC0gIm5hbWVzcGFjZSIKICAiZ3JvdXBfaW50ZXJ2YWwiOiAiNW0iCiAgImdyb3VwX3dhaXQiOiAiMzBzIgogICJyZWNlaXZlciI6ICJEZWZhdWx0IgogICJyZXBlYXRfaW50ZXJ2YWwiOiAiMTJoIgogICJyb3V0ZXMiOgogIC0gIm1hdGNoIjoKICAgICAgImFsZXJ0bmFtZSI6ICJXYXRjaGRvZyIKICAgICJyZWNlaXZlciI6ICJXYXRjaGRvZyIKICAtICJtYXRjaCI6CiAgICAgICJzZXZlcml0eSI6ICJjcml0aWNhbCIKICAgICJyZWNlaXZlciI6ICJDcml0aWNhbCI=

In this state, you can’t do much with it, since the stored data is base64-encoded.

You can decode it using one of the free online decoders available on the web, or you can do it through the terminal:

$ echo "Imdsb2JhbCI6CiAgInJlc29sdmVfdGltZW91dCI6ICI1bSIKImluaGliaXRfcnVsZXMiOgotICJlcXVhbCI6CiAgLSAibmFtZXNwYWNlIgogIC0gImFsZXJ0bmFtZSIKICAic291cmNlX21hdGNoIjoKICAgICJzZXZlcml0eSI6ICJjcml0aWNhbCIKICAidGFyZ2V0X21hdGNoX3JlIjoKICAgICJzZXZlcml0eSI6ICJ3YXJuaW5nfGluZm8iCi0gImVxdWFsIjoKICAtICJuYW1lc3BhY2UiCiAgLSAiYWxlcnRuYW1lIgogICJzb3VyY2VfbWF0Y2giOgogICAgInNldmVyaXR5IjogIndhcm5pbmciCiAgInRhcmdldF9tYXRjaF9yZSI6CiAgICAic2V2ZXJpdHkiOiAiaW5mbyIKInJlY2VpdmVycyI6Ci0gIm5hbWUiOiAiRGVmYXVsdCIKLSAibmFtZSI6ICJXYXRjaGRvZyIKLSAibmFtZSI6ICJDcml0aWNhbCIKInJvdXRlIjoKICAiZ3JvdXBfYnkiOgogIC0gIm5hbWVzcGFjZSIKICAiZ3JvdXBfaW50ZXJ2YWwiOiAiNW0iCiAgImdyb3VwX3dhaXQiOiAiMzBzIgogICJyZWNlaXZlciI6ICJEZWZhdWx0IgogICJyZXBlYXRfaW50ZXJ2YWwiOiAiMTJoIgogICJyb3V0ZXMiOgogIC0gIm1hdGNoIjoKICAgICAgImFsZXJ0bmFtZSI6ICJXYXRjaGRvZyIKICAgICJyZWNlaXZlciI6ICJXYXRjaGRvZyIKICAtICJtYXRjaCI6CiAgICAgICJzZXZlcml0eSI6ICJjcml0aWNhbCIKICAgICJyZWNlaXZlciI6ICJDcml0aWNhbCI=" | base64 -d

The above command will echo the contents and pipe them through base64 with the -d option for decoding.
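
Alternatively, you can skip the copy and paste and decode the value in one go using kubectl's jsonpath output (note the escaped dot in the key name):

$ kubectl -n monitoring get secret alertmanager-main -o jsonpath='{.data.alertmanager\.yaml}' | base64 -d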

The final, cleaned up configuration will look like this:

"global":
  "resolve_timeout": "5m"
"inhibit_rules":
- "equal":
  - "namespace"
  - "alertname"
  "source_match":
    "severity": "critical"
  "target_match_re":
    "severity": "warning|info"
- "equal":
  - "namespace"
  - "alertname"
  "source_match":
    "severity": "warning"
  "target_match_re":
    "severity": "info"
"receivers":
- "name": "Default"
- "name": "Watchdog"
- "name": "Critical"
"route":
  "group_by":
  - "namespace"
  "group_interval": "5m"
  "group_wait": "30s"
  "receiver": "Default"
  "repeat_interval": "12h"
  "routes":
  - "match":
      "alertname": "Watchdog"
    "receiver": "Watchdog"
  - "match":
      "severity": "critical"
    "receiver": "Critical"

Now to add the Slack webhook, a couple of edits will be needed to the configuration.

On the following link from the official docs, you can see what options are available.

Copy the configuration output from above, to a new file named alertmanager.yaml.

In that file, you will need to add three separate configs for Alertmanager in order to send alerts on Slack.

    1. Take the Slack webhook URL created previously and add it under the global config as the slack_api_url value.

"global":
  "resolve_timeout": "5m"
  "slack_api_url": "https://hooks.slack.com/services/xxxxxxxxxxxx/xxxxxxxxxx/x"

[other data truncated]

    2. Under the route section, replace the default receiver Default with slack.

"receiver": "slack"

[other data truncated]

    3. Create a separate receiver for Slack alerts. Under channel, add the name of the alert channel you created earlier.

"receivers":
- "name": "slack"
  "slack_configs":
  - "channel": "#prometheus_alerts"
    "send_resolved": "true"
    "text": " \nsummary: {{ .CommonAnnotations.summary }}\ndescription: {{ .CommonAnnotations.description }}"

[other data truncated]
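
Putting the three edits together, the resulting alertmanager.yaml should look roughly like the following sketch (the webhook URL is a placeholder, and the default inhibit_rules from above are omitted here for brevity):

"global":
  "resolve_timeout": "5m"
  "slack_api_url": "https://hooks.slack.com/services/xxxxxxxxxxxx/xxxxxxxxxx/x"
"receivers":
- "name": "slack"
  "slack_configs":
  - "channel": "#prometheus_alerts"
    "send_resolved": true
    "text": " \nsummary: {{ .CommonAnnotations.summary }}\ndescription: {{ .CommonAnnotations.description }}"
- "name": "Default"
- "name": "Watchdog"
- "name": "Critical"
"route":
  "group_by":
  - "namespace"
  "group_interval": "5m"
  "group_wait": "30s"
  "receiver": "slack"
  "repeat_interval": "12h"
  "routes":
  - "match":
      "alertname": "Watchdog"
    "receiver": "Watchdog"
  - "match":
      "severity": "critical"
    "receiver": "Critical"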

Victory is close!

To add this new configuration as a new secret, you must first delete the existing one:

$ kubectl -n monitoring delete secret alertmanager-main
secret "alertmanager-main" deleted

Now create a new secret with the exact same name, using the alertmanager.yaml file:

$ kubectl -n monitoring create secret generic alertmanager-main --from-file=alertmanager.yaml
secret/alertmanager-main created

Finally, be patient and wait a bit for Alertmanager to load the new configuration.
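
To double-check that the new configuration was picked up, you can re-open the Alertmanager UI (its Status page shows the active configuration), or look at the pod logs; the pod and container names below follow the usual kube-prometheus naming and may differ in your cluster, and the exact log wording varies by Alertmanager version:

$ kubectl -n monitoring port-forward svc/alertmanager-main 9093
$ kubectl -n monitoring logs alertmanager-main-0 -c alertmanager | grep -i "configuration"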

If all is configured properly, you should now see some alerts on your Slack alert channel.

[Image: Alertmanager Slack integration]

The Alertmanager options are nearly endless - there is a lot of tweaking and fine-tuning that can be done.

You can set the intervals, grouping methods, different receivers, and alert severities; you can even configure the alerting messages however you like.

Above is just a sample to showcase the basic configuration options.

Recap

Let’s recap what you’ve learned in this tutorial:

  • Learned more about infrastructure monitoring
  • Got familiar with Kubernetes Operators
  • Installed the kube-prometheus operator as a monitoring solution on your cluster
  • Learned how the Prometheus components communicate with each other
  • Added your own custom alerting rule
  • Configured Alertmanager to send alerts to a Slack channel using webhooks

From the team at Squadcast, we encourage you to keep on learning!

Written by: Kristijan Mitevski
June 1, 2021 (last updated May 2, 2024)
Tags: Monitoring, Kubernetes, Best Practices