How to Install Prometheus on Kubernetes: A Step-by-Step Tutorial

April 20, 2023

As one of the most popular open-source Kubernetes monitoring solutions, Prometheus leverages a multidimensional data model of time-stamped metric data and labels. The platform uses a pull-based architecture to collect metrics from various targets. It stores the metrics in a time-series database and provides the powerful PromQL query language for efficient analysis and data visualization.

Despite these capabilities, how effectively Prometheus delivers observability for a Kubernetes cluster depends on several key considerations:

  1. Choosing the appropriate installation method.
  2. Configuring Prometheus components appropriately.
  3. Adopting the best practices for scaling, maintaining, and securing a Prometheus deployment.

This article walks you through each step to install and configure Prometheus in detail. 

Key Prometheus configuration parameters

Configuration parameter | Purpose
Scrape Configurations | Essential for monitoring and collecting relevant metrics by defining how often each target should be scraped and which data should be collected
Relabeling | Used for the classification and filtering of targets and metrics by rewriting their label sets
Resource Management | For optimal consumption of resources to ensure Prometheus runs smoothly and does not cause resource contention issues within the operating cluster
Cluster Federation | Federating multiple Prometheus servers provides a centralized view of metrics data by linking clusters together and sharing metric data
Prometheus Metrics Exposition | Allows the export and collection of metrics in a format that Prometheus can parse and store in its time-series database
Alerting Rules | Essential for setting up alert conditions based on PromQL expressions, sending notifications when those conditions are met, and ensuring the timely detection of potential issues
StorageClass Provisioning | Enables Prometheus to leverage the Kubernetes platform for dynamic provisioning of persistent volumes and efficient allocation of data storage

Prometheus Installation Options 

Prometheus supports multiple installation options that can be chosen depending upon the complexity of the deployment, the need for customization, and the availability of resources to manage and maintain the installation. Installation options include:

Operator-based installation

You can leverage a Kubernetes operator to simplify the installation and management of Prometheus by abstracting recurrent configuration tasks through a high-level interface. Although operators require additional technical expertise to set up and manage, they also offer significant benefits, including automatic backups, scaling, and self-healing, making them ideal for large-scale production environments. 

Manifest-based installation

You can create YAML files that define the desired state of the Prometheus deployment, including the configuration of core components such as the Prometheus server, alert manager, and exporters. This installation option requires a greater degree of manual effort to maintain and update the deployment, as any changes to the configuration or components require YAML file changes and redeployment. However, the approach is beneficial for simpler deployments or for SREs who prefer granular control over the deployment configuration.

Helm chart-based installation

You can leverage the Helm package manager for easier configuration of Prometheus and underlying Kubernetes resources. Compared to manifest-based installation, Helm charts provide more customization options, allowing for fine-grained control over the Prometheus deployment. The Helm templating engine also simplifies the upgrade and management process for complex implementations that require intricate configuration of dependency management, versioning, and rollback support.

Helm chart vs. manifest-based installation

When choosing between Helm charts and manifests for installing Prometheus on Kubernetes, there are several important factors to consider:

Factor | Manifests | Helm Charts
Configuration | Through manual YAML specification | Allows configuration through templates
Customization | Limited | Offers advanced customization options
Dependency management | None | Built-in dependency management and versioning
Installation process | Effort-intensive for complex setups | Simplifies installation for complex deployments
Maintenance | Requires extensive manual effort | Easier to upgrade and manage multiple installations simultaneously

In the following sections, we go through the steps to install and configure Prometheus on a Kubernetes cluster using Helm. 

Installing Prometheus on Kubernetes using Helm 

Before installing Prometheus, it is recommended to plan the following:

  1. Metrics you want to monitor.
  2. Metric data storage location.
  3. Visualization tools you want to use. 

As Prometheus can consume significant CPU and memory, it is also recommended to check that your Kubernetes cluster has enough spare resources to support the installation and subsequent configuration steps.

Prerequisites

  • An operating Kubernetes cluster 
  • Access to the kubectl CLI
  • Helm 3 installed (instructions to install Helm for your operating system can be found here: https://helm.sh/docs/intro/install/)

Step 1: Creating the namespace

As the first step, create a namespace that defines a logical boundary to isolate resources of the Prometheus setup from other services of your Kubernetes cluster. 

For this demo, we create the namespace darwin and use it for the Prometheus deployment.

kubectl create namespace darwin

Check for output:

namespace/darwin created

Step 2: Installing Prometheus using Helm

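Add the prometheus-community chart repository (needed once per machine), then install the kube-prometheus-stack chart using Helm:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack --namespace darwin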

Check for output:

NAME: prometheus
LAST DEPLOYED: Mon Feb 14 10:47:11 2023
NAMESPACE: darwin
STATUS: deployed
REVISION: 1
TEST SUITE: None

To verify the installation, run the command:

kubectl get pods -n darwin

The output lists the pods that are running in the darwin namespace. Pod names and counts depend on the chart version and release name, so your output will likely differ from this sample:

NAME                                     READY   STATUS    RESTARTS   AGE
prometheus-prometheus-0                  3/3     Running   0          2m
prometheus-alertmanager-0                2/2     Running   0          2m
prometheus-node-exporter-4q4lc           1/1     Running   0          2m
prometheus-pushgateway-c8f8f88bb-dk7c2   1/1     Running   0          2m
prometheus-server-exporter-8wz6h         1/1     Running   0          2m

In our case, the pods are:

  • prometheus-prometheus-0: The primary pod on which Prometheus server is deployed.
  • prometheus-alertmanager-0: The Alertmanager pod, which is used to manage alerts and send notifications.
  • prometheus-node-exporter-4q4lc: The node exporter pod, which collects metrics from the underlying host system and exposes them for the Prometheus server to scrape.
  • prometheus-pushgateway-c8f8f88bb-dk7c2: The Pushgateway pod to collect metrics from batch jobs or other non-service sources.
  • prometheus-server-exporter-8wz6h: The server exporter pod to collect metrics from other services running in the same Kubernetes cluster.

Step 3: Port-Forwarding to access the Prometheus UI (Optional)

This step is optional and should be used only when you want to interact with the Prometheus server running in a Kubernetes cluster from a local machine. 

Once you have the names of the pods running the different services, create a port-forward from your local machine to the primary pod (on which the Prometheus server is deployed) to access the Prometheus UI. Substitute the pod name from your own output if it differs from the sample.

To achieve this, use the command below.

kubectl port-forward -n darwin prometheus-prometheus-0 9091:9090

The above command forwards traffic from the 9091 port of your local machine to the prometheus-prometheus-0 pod. After running this command, you can access the Prometheus UI by navigating to http://localhost:9091 in your web browser. 

Quick note: The port-forward command keeps running in the foreground until you interrupt it manually (e.g., by pressing Ctrl-C). Once port-forwarding stops, you can no longer access the Prometheus UI until you start a new port-forward.

Alternatively, to keep the port-forward running in the background, append & to the command:

kubectl port-forward -n darwin prometheus-prometheus-0 9091:9090 &

Although port-forwarding is convenient for testing access to the UI from a local machine, it is not suitable for exposing Prometheus to the wider network. Instead, use a Kubernetes service (covered in step 5 below) to expose the pods as a network service for external clients.

Step 4: Creating the ConfigMap

ConfigMaps in Kubernetes store the configuration data of your cluster workloads outside of the container image. They allow you to manage configuration files independently from the application code. For a Prometheus instance, a ConfigMap holds the configuration file prometheus.yml, which specifies the scrape targets, scrape settings, alert rule files, and other settings of the Prometheus server.

To create a ConfigMap named prometheus-config in the darwin namespace that contains the configuration file prometheus.yml, use the command:

kubectl create configmap prometheus-config --namespace darwin --from-file=prometheus.yml

Which returns the output:

configmap/prometheus-config created

Alternatively, define the ConfigMap declaratively in a manifest file (prometheus-config.yaml in this example) with specifications similar to:

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: darwin
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      scrape_timeout: 10s
      evaluation_interval: 15s
    scrape_configs:
      - job_name: 'darwin-service'
        scrape_interval: 5s
        static_configs:
          - targets: ['darwin-service:8080']

In this case, the global scrape interval is set to 15 seconds, and one job (darwin-service) overrides it to scrape its single target (darwin-service:8080) every 5 seconds.

Quick note: Be sure each entry in the targets field contains only the address of the service you want to scrape (an IP or hostname, normally with its port), not a URL scheme or path.

Apply the configuration with the command:

kubectl apply -f prometheus-config.yaml
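
A note on how this fits the chart installed in Step 2: kube-prometheus-stack runs Prometheus through the Prometheus Operator, which generates the server configuration itself, so a standalone ConfigMap like the one above is only read by a Prometheus instance that explicitly mounts it. A minimal sketch of one alternative, assuming you keep your overrides in a values file (custom-values.yaml is a hypothetical name), is to add the same job through the chart's prometheus.prometheusSpec.additionalScrapeConfigs value:

```yaml
# custom-values.yaml (hypothetical file name)
# Adds the darwin-service job from the ConfigMap example to the
# Prometheus instance managed by kube-prometheus-stack.
prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
      - job_name: 'darwin-service'
        scrape_interval: 5s
        static_configs:
          - targets: ['darwin-service:8080']
```

The file would then be passed to Helm with the -f flag, for example: helm upgrade prometheus prometheus-community/kube-prometheus-stack --namespace darwin -f custom-values.yaml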

Step 5: Creating the Prometheus service

Once you create the Prometheus deployment, the next step is to expose the Prometheus server within the Kubernetes cluster and allow other pods and services (within or external to the cluster) to communicate with it. In our example, we create a Kubernetes service object to include a stable IP address and port number that other pods and services can use to access the Prometheus server. 

Create a service named darwin-prometheus-service with specifications similar to:

apiVersion: v1
kind: Service
metadata:
  name: darwin-prometheus-service
  namespace: darwin
spec:
  type: NodePort
  selector:
    app: darwin-prometheus
  ports:
    - name: web
      port: 9090
      targetPort: 9090
      nodePort: 30000
  externalIPs:
    - 10.0.0.100

Quick note: The specification above creates a service named darwin-prometheus-service of type NodePort in the darwin namespace and forwards traffic to the pods that carry the app: darwin-prometheus label. The service listens on port 9090 and forwards to targetPort 9090 on the pods. From outside the cluster, it is reachable on port 30000 of any node's IP address, and on port 9090 of the external IP 10.0.0.100 (provided that address routes to a cluster node).

Instead of using NodePort, you can also use other service types, such as ClusterIP (default), LoadBalancer, or ExternalName, based on your use case. Details of the various service types can be found in the Kubernetes documentation.

Apply the service file using the command:

kubectl apply -f darwin-prometheus-service.yaml

Verify that the service has been created using the following command:

kubectl get services -n darwin

The output returns details of darwin-prometheus-service with a NodePort and the port number that you assign to it.

NAME                        TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
darwin-prometheus-service   NodePort   10.101.23.123   10.0.0.100    9090:30000/TCP   5m

Configuring Prometheus 

The Prometheus Helm chart ships with a values.yaml file of default settings. During installation, Helm reads these values (plus any overrides you supply) and renders the Kubernetes manifests for deploying Prometheus, from which the prometheus.yml configuration file is generated.
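
As a minimal sketch of what such overrides look like, the fragment below sets the retention period and the default scrape and evaluation intervals through the kube-prometheus-stack chart's prometheus.prometheusSpec values. The file name custom-values.yaml and the chosen values are assumptions for illustration; check the chart's own values.yaml for the keys your chart version supports.

```yaml
# custom-values.yaml (hypothetical file name)
prometheus:
  prometheusSpec:
    # How long Prometheus keeps samples before deleting them.
    retention: 15d
    # Defaults applied when a scrape job or rule group sets no interval of its own.
    scrapeInterval: 30s
    evaluationInterval: 30s
```

Overrides like these are supplied with the -f flag of helm install or helm upgrade; any key you omit keeps the chart default.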

In the following steps, we use the prometheus.yml file to configure key touchpoints of the Prometheus deployment. 

Scrape Configurations

Prometheus periodically scrapes metrics from target endpoints, including the kube-state-metrics service. To modify Prometheus scrape configurations, you can modify the prometheus.yml configuration file to specify scrape targets and related parameters.

For example:

global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'darwin-service-1'
    scrape_interval: 5s
    static_configs:
      - targets: ['darwin-service-1:80']
    relabel_configs:
      - source_labels: [job]
        target_label: job
        replacement: 'darwin-new-service'

  - job_name: 'darwin-service-2'
    scrape_interval: 10s
    static_configs:
      - targets: ['darwin-service-2:80']

In this example, two scrape jobs are defined, each with its own job_name and static_configs block. The first job scrapes a single target, darwin-service-1:80, every 5 seconds, while the second scrapes a single target, darwin-service-2:80, every 10 seconds. Both override the global 15-second interval.
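
When Prometheus is managed by the Prometheus Operator (as it is with the kube-prometheus-stack chart), scrape targets are usually declared as ServiceMonitor resources rather than edited into prometheus.yml directly. The sketch below is a rough equivalent of the first job above. The Service label (app: darwin-service-1) and port name (http) are hypothetical, and the release: prometheus label assumes the chart's default behavior of selecting ServiceMonitors labeled with the Helm release name.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: darwin-service-1
  namespace: darwin
  labels:
    release: prometheus        # assumption: matches the Helm release name from Step 2
spec:
  selector:
    matchLabels:
      app: darwin-service-1    # hypothetical label on the Service to scrape
  endpoints:
    - port: http               # hypothetical named port on that Service
      path: /metrics
      interval: 5s
```

The Operator watches for resources like this and regenerates the Prometheus scrape configuration, so no manual reload is needed.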

Relabeling

Relabeling allows you to rewrite label sets. Entries under relabel_configs are applied to a target's labels before it is scraped, while entries under metric_relabel_configs are applied to the scraped samples before they are stored in the time-series database. This is useful for modifying labels to match your naming conventions or adding additional metadata to the scraped data.

For instance, to change the job label of darwin-service-1 so that its metrics are stored under a new value, say darwin-new-service, add a relabel_configs block to the job in the prometheus.yml configuration file as follows.

global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'darwin-service-1'
    scrape_interval: 5s
    static_configs:
      - targets: ['darwin-service-1:80']
    relabel_configs:
      - source_labels: [job]
        target_label: job
        replacement: 'darwin-new-service'

  - job_name: 'darwin-service-2'
    scrape_interval: 10s
    static_configs:
      - targets: ['darwin-service-2:80']

Additional details of internal relabeling and supported actions of the relabel_config block can be found on the Grafana blog.
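
Relabeling can also filter what gets stored. The following is a minimal sketch using metric_relabel_configs, which Prometheus applies to scraped samples before ingestion. The metric name pattern (go_gc_.*) and the label name (request_id) are hypothetical choices for illustration only.

```yaml
scrape_configs:
  - job_name: 'darwin-service-2'
    scrape_interval: 10s
    static_configs:
      - targets: ['darwin-service-2:80']
    metric_relabel_configs:
      # Drop every series whose metric name starts with go_gc_.
      - source_labels: [__name__]
        regex: 'go_gc_.*'
        action: drop
      # Remove the request_id label from all remaining series.
      - regex: 'request_id'
        action: labeldrop
```

Dropping unneeded series and high-cardinality labels at this stage reduces both storage use and query cost.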

Resource Management

In a default Prometheus deployment, containers run without resource requests or limits, which can lead to resource contention and suboptimal performance of the operating cluster. Resource requests and limits are Kubernetes container settings rather than Prometheus settings, so they cannot be set per scrape job in the prometheus.yml configuration file. Instead, define them on the containers of each component, either in the pod spec or, for a Helm installation, through the chart values. For example, with the kube-prometheus-stack chart:

prometheus:
  prometheusSpec:
    resources:
      requests:
        memory: 2Gi
        cpu: 1
      limits:
        memory: 4Gi
        cpu: 2

alertmanager:
  alertmanagerSpec:
    resources:
      requests:
        memory: 1Gi
        cpu: 0.5
      limits:
        memory: 2Gi
        cpu: 1

In this case, resources are allocated as below:

Component | Minimum allocation (requests) | Consumption limit (limits)
Prometheus server | 2 GiB of memory and 1 CPU | 4 GiB of memory and 2 CPUs
Alertmanager | 1 GiB of memory and 0.5 CPU | 2 GiB of memory and 1 CPU

Cluster Federation

To increase the scalability and availability of your Prometheus deployment, you can configure a federated cluster setup, where multiple Prometheus instances share information and query each other to consolidate metrics data. This can be useful in large-scale deployments where you have multiple clusters with Prometheus servers and intend to analyze metrics data for all of them.

Configuring cluster federation involves the following steps:

  1. Make sure each source Prometheus server is reachable by the central server; every Prometheus server exposes its current series on the built-in /federate endpoint.
  2. Configure a central Prometheus server to scrape the /federate endpoint of each source server, using match[] parameters to select the series to pull.
  3. Configure the central Prometheus server to store the aggregated data in its database and expose it further by configuring a Prometheus service.
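
Prometheus's built-in federation works by having the central server scrape the /federate endpoint of each source server. The following is a minimal sketch of such a scrape job for the central server's prometheus.yml; the target host names (source-prometheus-1 and source-prometheus-2) and the match[] selectors are assumptions chosen for illustration.

```yaml
scrape_configs:
  - job_name: 'federate'
    scrape_interval: 15s
    # Keep the labels set by the source servers instead of overwriting them.
    honor_labels: true
    metrics_path: '/federate'
    params:
      # Only series matching at least one selector are pulled.
      'match[]':
        - '{job="darwin-service-1"}'
        - '{__name__=~"job:.*"}'   # e.g. aggregated recording rules
    static_configs:
      - targets:
          - 'source-prometheus-1:9090'   # hypothetical source servers
          - 'source-prometheus-2:9090'
```

Federating only aggregated or selected series, rather than everything, keeps the central server's load and storage manageable.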

Exposing Prometheus Metrics

Applications typically expose metrics to Prometheus over an HTTP endpoint (by convention, /metrics), while short-lived batch jobs can push theirs to a Pushgateway. For the Prometheus server to collect and monitor specific application or service metrics, the application must expose them in a format that Prometheus can parse.

The steps to expose Prometheus metrics include:

  1. Define metrics in the application code using Prometheus client libraries.
  2. Instrument the application (or write an exporter script) to record and export those metrics.
  3. Expose a metrics endpoint that returns the metrics in a format that can be parsed by Prometheus.
  4. Configure the prometheus.yml file to scrape the metrics by specifying the endpoint address and path.
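
The last step can be sketched as follows, assuming a hypothetical application reachable inside the cluster as darwin-app on port 8000 that serves its metrics at /metrics over plain HTTP:

```yaml
scrape_configs:
  - job_name: 'darwin-app'
    # /metrics and http are the Prometheus defaults; they are written out
    # here only to show where a non-default path or scheme would go.
    metrics_path: /metrics
    scheme: http
    static_configs:
      - targets: ['darwin-app:8000']   # hypothetical host and port
```

Once the configuration is reloaded, the target should appear on the Targets page of the Prometheus UI with a state of UP.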

Alerting

Alerting rules define the conditions under which alerts are triggered. As an essential part of monitoring and reliability engineering, alerts can be routed as notifications to various channels, such as email, Slack, or Squadcast, to help detect and resolve issues before they become critical.

global:
  scrape_interval: 15s
  evaluation_interval: 1m

rule_files:
  - /etc/prometheus/rules/*.rules

# alerting is a top-level block, not part of an individual scrape job
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

scrape_configs:
  - job_name: 'darwin-service-1'
    scrape_interval: 5s
    static_configs:
      - targets: ['darwin-service-1:80']
    relabel_configs:
      - source_labels: [job]
        target_label: job
        replacement: 'darwin-new-service'

  - job_name: 'darwin-service-2'
    scrape_interval: 10s
    static_configs:
      - targets: ['darwin-service-2:80']

In this case, the rule_files field uses a glob pattern to load every file ending in .rules from the /etc/prometheus/rules directory. These files contain the alert rules, which define the conditions under which alerts are triggered. Triggered alerts get sent to the specified Alertmanager targets, which you can further configure to send notifications to various channels, such as email or the Squadcast platform.
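
For illustration, a rule file matched by that rule_files pattern might look like the sketch below. The file name, group name, alert name, and the 5-minute threshold are all hypothetical; the up metric itself is generated by Prometheus for every scrape target.

```yaml
# /etc/prometheus/rules/darwin.rules (hypothetical file name)
groups:
  - name: darwin-availability
    rules:
      - alert: DarwinServiceDown
        # up is 0 when the most recent scrape of a target failed.
        expr: up{job="darwin-service-2"} == 0
        # The condition must hold for 5 minutes before the alert fires.
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Target {{ $labels.instance }} of job {{ $labels.job }} is down"
          description: "Prometheus has been unable to scrape this target for more than 5 minutes."
```

The for clause avoids paging on a single failed scrape, and the severity label gives Alertmanager something to route on.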

StorageClass Provisioning

StorageClass provisioning is an important aspect of Prometheus configuration. A StorageClass determines how the persistent volumes that hold metrics data are provisioned, through a set of properties such as:

  • The type of storage (e.g., block or file)
  • Access mode
  • Capacity
  • Volume plugin to use

It is important to note that unlike other configuration settings covered in the steps above, StorageClass provisioning is a Kubernetes configuration, and is done by configuring the values.yaml file.

Provisioning a StorageClass for Prometheus involves the following steps:

  1. Define a StorageClass resource in a YAML file and apply it to your Kubernetes cluster to create the StorageClass.
  2. Configure Prometheus to use the StorageClass for its data storage. You can do this by setting the storageClassName property in the Prometheus values.yaml file to ensure that PVCs created by Prometheus use the specified StorageClass for data storage.
  3. Deploy Prometheus with the specified storageClassName value that sets the provisioned StorageClass to be used to store the metrics data.
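
The steps above can be sketched as two YAML documents, shown together here but kept in separate files: a StorageClass manifest applied with kubectl, and a values fragment passed to Helm. The StorageClass name, the AWS EBS CSI provisioner, the gp3 volume type, and the 50Gi size are assumptions for illustration; use the provisioner and parameters that match your own cluster.

```yaml
# File 1: storageclass.yaml (hypothetical name), applied with kubectl apply -f
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: darwin-prometheus-storage
provisioner: ebs.csi.aws.com          # assumption: cluster uses the AWS EBS CSI driver
parameters:
  type: gp3
reclaimPolicy: Retain                 # keep the volume if the claim is deleted
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
---
# File 2: fragment of the Helm values file for kube-prometheus-stack
prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: darwin-prometheus-storage
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
```

With this in place, the PersistentVolumeClaim created for the Prometheus pod requests a volume from the named StorageClass instead of the cluster default.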

Best practices for installing Prometheus on Kubernetes

Some important points to watch out for are given below.

Appropriately size persistent storage

As Prometheus stores time-series data over time, it can expand to consume a significant amount of storage. To ensure appropriate sizing of persistent storage for your Prometheus installation, ensure the following:

  • Determine the appropriate size of the persistent volume to use based on the expected data ingestion rate, retention period, and data cardinality.
  • Prometheus compresses samples on disk by default; to reduce storage further, consider a shorter retention period, lower label cardinality, or long-term storage that supports downsampling.
  • For cloud instances, leverage dynamic provisioning for persistent volumes.
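
As a rough sizing aid, the Prometheus storage documentation gives a rule-of-thumb formula for capacity planning. The worked example below assumes an ingestion rate of 100,000 samples per second, 15 days of retention, and 2 bytes per sample (the documentation cites an average of roughly 1 to 2 bytes per sample). All three numbers are illustrative assumptions, not measurements.

```latex
\begin{align*}
\text{needed\_disk\_space}
  &= \text{retention\_time\_seconds} \times \text{ingested\_samples\_per\_second} \times \text{bytes\_per\_sample} \\
  &= (15 \times 86{,}400\ \text{s}) \times 100{,}000\ \text{samples/s} \times 2\ \text{bytes/sample} \\
  &= 1{,}296{,}000 \times 100{,}000 \times 2\ \text{bytes} \\
  &\approx 2.6 \times 10^{11}\ \text{bytes} \approx 259\ \text{GB}
\end{align*}
```

In practice, it is sensible to add headroom on top of such an estimate, since ingestion rate and cardinality tend to grow with the cluster.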

Configure resource requests and limits

Efficient resource management prevents Prometheus from consuming too many or too few resources of a Kubernetes cluster. As a recommended practice, you should enforce resource requests and limits according to the expected workload of Prometheus. This helps to ensure optimal performance of the Prometheus instance and prevents cost escalation by limiting resource usage.

Use network policies to restrict general access

To prevent Prometheus from becoming a target of cyberattacks, it is crucial to restrict public access to it. With network policies, you can define how traffic is allowed to flow between pods in a Kubernetes cluster. When applying network policies at the component level, ensure the following:

  • Identify the components, including the Prometheus server, Alertmanager, and exporters, that need to communicate with each other or with other services of the cluster.
  • Create policies that only allow traffic between the identified components and services while denying all other traffic.
  • Apply network policies only to the appropriate namespaces of the cluster.
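
A minimal sketch of such a policy is shown below. It assumes the Prometheus pods carry the app: darwin-prometheus label used by the service in Step 5, that only pods in the darwin namespace should reach the UI and API on port 9090, and that the cluster's network plugin enforces NetworkPolicy resources. The policy name is hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: darwin-prometheus-ingress
  namespace: darwin
spec:
  # Applies to the Prometheus server pods only.
  podSelector:
    matchLabels:
      app: darwin-prometheus
  policyTypes:
    - Ingress
  ingress:
    # Allow traffic from pods in the darwin namespace on port 9090;
    # all other inbound traffic to the selected pods is denied.
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: darwin
      ports:
        - protocol: TCP
          port: 9090
```

Because this policy restricts ingress only, Prometheus can still reach out to scrape its targets. Traffic arriving through a NodePort from outside the cluster may also be blocked by it, so test external access after applying.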

Use dedicated service accounts for Prometheus components to access cluster resources

For robust security, it is best to use dedicated service accounts for Prometheus components to access Kubernetes resources. Make sure to grant these accounts the necessary permissions to access the required resources by binding a cluster role based on the component’s scope. With this, you can limit the access of Kubernetes resources to only what is actually needed for Prometheus.
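
As an illustration, the manifests below sketch a dedicated service account with read-only access to the resources Prometheus typically needs for service discovery and scraping. The names (darwin-prometheus, darwin-prometheus-read) are hypothetical, and the resource list is an assumption that you should narrow to what your setup actually uses.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: darwin-prometheus
  namespace: darwin
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: darwin-prometheus-read
rules:
  # Read-only access for discovering and scraping targets.
  - apiGroups: [""]
    resources: ["nodes", "nodes/metrics", "services", "endpoints", "pods"]
    verbs: ["get", "list", "watch"]
  # Allows scraping the API server's own /metrics endpoint.
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: darwin-prometheus-read
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: darwin-prometheus-read
subjects:
  - kind: ServiceAccount
    name: darwin-prometheus
    namespace: darwin
```

If Prometheus only needs to monitor a single namespace, a Role and RoleBinding in that namespace would limit access further than a cluster-wide role.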

Use ConfigMaps to store configuration information

ConfigMaps store Prometheus configuration information as key-value pairs of string data. Using ConfigMaps to centrally store configuration information makes it easier to update Prometheus configurations as needed without having to modify individual Kubernetes deployment files. Some recommended steps to achieve this include:

  • Create a separate ConfigMap for each configuration file to support easier management and modification.
  • Use the same naming convention for ConfigMaps and Prometheus resources.
  • Use a ConfigMap volume to mount the configuration files (without needing to rebuild the Prometheus container).
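
For a manifest-based deployment, the last point can be sketched as below: the prometheus-config ConfigMap from Step 4 is mounted as a volume so that prometheus.yml appears under /etc/prometheus inside the container. The deployment name, labels, and the unpinned prom/prometheus image are assumptions for illustration; in practice, pin a specific image version.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: darwin-prometheus
  namespace: darwin
spec:
  replicas: 1
  selector:
    matchLabels:
      app: darwin-prometheus
  template:
    metadata:
      labels:
        app: darwin-prometheus      # matches the selector of the service in Step 5
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus    # assumption: pin a version tag for real use
          args:
            - --config.file=/etc/prometheus/prometheus.yml
          ports:
            - containerPort: 9090
          volumeMounts:
            # Each key of the ConfigMap becomes a file in this directory.
            - name: config
              mountPath: /etc/prometheus
      volumes:
        - name: config
          configMap:
            name: prometheus-config
```

Updating the ConfigMap then changes the mounted file without rebuilding the image, although Prometheus still has to reload its configuration to pick up the change.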

Conclusion 

Application performance in a Kubernetes cluster typically depends on the performance of containers, pods, and services. Monitoring the core components of a Kubernetes cluster is therefore an essential aspect of reliability engineering that helps gain proactive insights into cluster health, workloads, and underlying infrastructure.

For a distributed ecosystem of containerized applications and related dependencies, monitoring Kubernetes using Prometheus can be a complex undertaking. It is crucial to adopt configuration best practices that ensure core Kubernetes components expose metrics securely. Prometheus can then scrape them in real time for rapid analysis and visualization.

Written By:
Squadcast Community
Vishal Padghan
Last Updated:
September 17, 2024
Share this post:
How to Install Prometheus on Kubernetes: A Step-by-Step Tutorial

Learn how to install Prometheus on Kubernetes for observability with considerations, installation options, and touchpoints.

Table of Contents:

    As one of the most popular open-source Kubernetes monitoring solutions, Prometheus leverages a multidimensional data model of time-stamped metric data and labels. The platform uses a pull-based architecture to collect metrics from various targets. It stores the metrics in a time-series database and provides the powerful PromQL query language for efficient analysis and data visualization.

    Despite its powerful capabilities, there are several key considerations that determine the observability efficiency of a Kubernetes cluster through Prometheus. These considerations include

    1. Choosing the appropriate installation method.
    2. Configuring Prometheus components appropriately.
    3. Adopting the best practices for scaling, maintaining, and securing a Prometheus deployment.

    This article walks you through each step to install and configure Prometheus in detail. 

    Key Prometheus configuration parameters

    Configuration parameter Purpose
    Scrape Configurations Essential for monitoring and collecting relevant metrics by defining how often the target should be scraped and which data should be collected
    Relabeling Used for the classification and filtering of targets and metrics by rewriting their label sets
    Resource Management For optimal consumption of resources to ensure Prometheus runs smoothly and does not cause resource contention issues within the operating cluster
    Cluster Federation Federating multiple Prometheus clusters helps provide a centralized view of metrics data by linking clusters together and sharing metric data
    Prometheus Metrics Exposition Allows the export and collection of metrics in a format that conforms to Prometheus’ time-series database
    Alerting Rules Essential for setting up alert conditions based on Prometheus language expressions, triggering notifications when alerts are triggered and ensuring the timely detection of any potential issues
    StorageClass Provisioning Enables Prometheus to leverage the Kubernetes platform for dynamic provisioning of persistent volumes and efficient allocation of data storage

    Prometheus Installation Options 

    Prometheus supports multiple installation options that can be chosen depending upon the complexity of the deployment, the need for customization, and the availability of resources to manage and maintain the installation. Installation options include:

    Operator-based installation

    You can leverage a Kubernetes operator to simplify the installation and management of Prometheus by abstracting recurrent configuration tasks through a high-level interface. Although operators require additional technical expertise to set up and manage, they also offer significant benefits, including automatic backups, scaling, and self-healing, making them ideal for large-scale production environments. 

    Manifest-based installation

    You can create YAML files that define the desired state of the Prometheus deployment, including the configuration of core components such as the Prometheus server, alert manager, and exporters. This installation option requires a greater degree of manual effort to maintain and update the deployment, as any changes to the configuration or components require YAML file changes and redeployment. However, the approach is beneficial for simpler deployments or for SREs who prefer granular control over the deployment configuration.

    Helm chart-based installation

    You can leverage the Helm package manager for easier configuration of Prometheus and underlying Kubernetes resources. Compared to manifest-based installation, Helm charts provide more customization options, allowing for fine-grained control over the Prometheus deployment. The Helm templating engine also simplifies the upgrade and management process for complex implementations that require intricate configuration of dependency management, versioning, and rollback support.

    Helm chart vs. manifest-based installation

    When choosing between Helm charts and manifests for installing Prometheus on Kubernetes, there are several important factors to consider:

      Manifests Helm Charts
    Configuration Through manual YAML specification Allows configuration through templates
    Customization Limited Offers advanced customization options
    Dependency management None Built-in dependency management and versioning
    Installation process Effort-intensive for complex setups Simplifies installation for complex deployments
    Maintenance Requires extensive manual effort Easier to upgrade and manage multiple installations simultaneously

    In the following sections, we go through the steps to install and configure Prometheus on a Kubernetes cluster using Helm. 

    Installing Prometheus on Kubernetes using Helm 

    Before installing Prometheus, it is recommended to plan the following:

    1. Metrics you want to monitor.
    2. Metric data storage location.
    3. Visualization tools you want to use. 

    Because Prometheus can consume significant CPU and memory, it is also recommended to check that your Kubernetes cluster has enough spare capacity to support the installation and the configuration steps that follow.

    Prerequisites

    • An operating Kubernetes cluster 
    • Access to the kubectl CLI
    • Helm 3 installed (installation instructions for your operating system: https://helm.sh/docs/intro/install/)

    Step 1: Creating the namespace

    As the first step, create a namespace that defines a logical boundary to isolate resources of the Prometheus setup from other services of your Kubernetes cluster. 

    For this demo, we create the namespace darwin and use it for the Prometheus deployment.

    kubectl create namespace darwin
    

    Check for output:

    namespace/darwin created
    

    Step 2: Installing Prometheus using Helm

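    The kube-prometheus-stack chart is published in the prometheus-community repository. If you have not added that repository yet, add it and refresh the local chart index, then install Prometheus using Helm:

    helm repo add prometheus-community https://prometheus-community.github.io/helm-charts && helm repo update
    helm install prometheus prometheus-community/kube-prometheus-stack --namespace darwin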
    

    Check for output:

    NAME: prometheus
    LAST DEPLOYED: Mon Feb 14 10:47:11 2023
    NAMESPACE: darwin
    STATUS: deployed
    REVISION: 1
    TEST SUITE: None
    

    To verify the installation, run the command:

    kubectl get pods -n darwin
    

    The output lists the pods running in the darwin namespace. The listing below is illustrative; the exact pod names and the set of components depend on the chart and its version:

    NAME                                   READY   STATUS    RESTARTS   AGE
    prometheus-prometheus-0                3/3     Running   0          2m
    prometheus-alertmanager-0              2/2     Running   0          2m
    prometheus-node-exporter-4q4lc         1/1     Running   0          2m
    prometheus-pushgateway-c8f8f88bb-dk7c2  1/1     Running   0          2m
    prometheus-server-exporter-8wz6h       1/1     Running   0          2m
    

    In our case, the pods are:

    • prometheus-prometheus-0: The primary pod on which Prometheus server is deployed.
    • prometheus-alertmanager-0: The Alertmanager pod, which is used to manage alerts and send notifications.
    • prometheus-node-exporter-4q4lc: The node exporter pod, which collects metrics from the underlying host system and exposes them for the Prometheus server to scrape.
    • prometheus-pushgateway-c8f8f88bb-dk7c2: The Pushgateway pod to collect metrics from batch jobs or other non-service sources.
    • prometheus-server-exporter-8wz6h: The server exporter pod to collect metrics from other services running in the same Kubernetes cluster.

    Step 3: Port-Forwarding to access the Prometheus UI (Optional)

    This step is optional and should be used only when you want to interact with the Prometheus server running in a Kubernetes cluster from a local machine. 

    Once you have the details of the pods running different services, create a port-forward from your local device to the primary pod (on which the Prometheus server is deployed) for accessing the Prometheus UI. 

    To achieve this, use the command below.

    kubectl port-forward -n darwin prometheus-prometheus-0 9091:9090
    

    The above command forwards traffic from port 9091 of your local machine to port 9090 of the prometheus-prometheus-0 pod (substitute the pod name from your own kubectl get pods output). After running this command, you can access the Prometheus UI by navigating to http://localhost:9091 in your web browser.

    Quick note: The port-forward command will continue running in the foreground until you interrupt it manually (e.g., by pressing Ctrl-C). If you stop port-forwarding, you can no longer access the Prometheus UI until you re-initiate a new port-forward.

    Alternatively, to keep the port-forward running in the background, append & to the command:

    kubectl port-forward -n darwin prometheus-prometheus-0 9091:9090 &
    

    Although port-forwarding is helpful for testing access to the UI locally, it is not suitable for exposing Prometheus to the wider network. Instead, use a Kubernetes service (covered in step 5 below) to expose the pods as a network service for other clients.

    Step 4: Creating the ConfigMap

    ConfigMaps in Kubernetes store the configuration data of your cluster workloads outside of the container image. They allow you to manage configuration files independently from the application code. For a Prometheus instance, a ConfigMap holds the configuration file prometheus.yml, which specifies scrape targets, scrape settings, alert rule files, and other settings of the Prometheus server.

    To create a ConfigMap named prometheus-config in the darwin namespace that contains the configuration file prometheus.yml, use the command:

    kubectl create configmap prometheus-config --namespace darwin --from-file=prometheus.yml
    

    Which returns the output:

    configmap/prometheus-config created
    

    To manage the ConfigMap declaratively instead, define it in a manifest file named prometheus-config.yaml with specifications similar to:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: prometheus-config
      namespace: darwin
    data:
      prometheus.yml: |
        global:
          scrape_interval: 15s
          scrape_timeout: 10s
          evaluation_interval: 15s
        scrape_configs:
          - job_name: 'darwin-service'
            scrape_interval: 5s
            static_configs:
              - targets: ['darwin-service:8080']
    

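    In this case, the global scrape interval is 15 seconds, and the darwin-service job overrides it to scrape its single target (darwin-service:8080) every 5 seconds.

    Quick note: Each entry in the targets field takes the form host:port, where host is the IP address or hostname of the service you want to scrape. Do not include a URL scheme or path; those are set separately with the scheme and metrics_path fields.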

    Apply the configuration with the command:

    kubectl apply -f prometheus-config.yaml
    

    Step 5: Creating the Prometheus service

    Once you create the Prometheus deployment, the next step is to expose the Prometheus server so that other pods and services (within or external to the cluster) can communicate with it. In our example, we create a Kubernetes service object that provides a stable IP address and port number for reaching the Prometheus server.

    Create a service named darwin-prometheus-service with specifications similar to:

    apiVersion: v1
    kind: Service
    metadata:
      name: darwin-prometheus-service
      namespace: darwin
    spec:
      type: NodePort
      selector:
        app: darwin-prometheus
      ports:
        - name: web
          port: 9090
          targetPort: 9090
          nodePort: 30000
      externalIPs:
        - 10.0.0.100
    

    Quick note: The specification above creates a service named darwin-prometheus-service in the darwin namespace with the NodePort type and routes traffic to pods labeled app: darwin-prometheus, so the selector must match the labels on your Prometheus pods. The service listens on port 9090 with a targetPort of 9090, and is reachable from outside the cluster on port 30000 of any node, or on port 9090 of the external IP address 10.0.0.100.

    Instead of using NodePort, you can also use other service types, like ClusterIP (default), LoadBalancer, or ExternalName, based on your use case. Details of the various service types can be found in the Kubernetes documentation.

    Apply the service file using the command:

    kubectl apply -f darwin-prometheus-service.yaml
    

    Verify that the service has been created using the following command:

    kubectl get services -n darwin
    

    The output lists darwin-prometheus-service with the NodePort type and the node port that you assigned to it:

    NAME                        TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
    darwin-prometheus-service   NodePort   10.101.23.123   10.0.0.100    9090:30000/TCP   5m
    

    Configuring Prometheus 

    The Prometheus Helm chart ships with a values.yaml file of default settings. During installation, Helm reads values.yaml (plus any overrides you supply) and renders the Kubernetes manifests for deploying Prometheus. The prometheus.yml configuration that the server loads is generated from those settings rather than written by hand. With kube-prometheus-stack, the Prometheus Operator generates this file, so changes are normally supplied through Helm values instead of editing prometheus.yml in place.

    In the following steps, we use prometheus.yml syntax to configure key touchpoints of the Prometheus deployment.
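
    As a minimal sketch of how a scrape job written in prometheus.yml syntax can reach a Helm-managed server, the fragment below uses the prometheus.prometheusSpec.additionalScrapeConfigs setting of the kube-prometheus-stack chart. The file name custom-values.yaml is a hypothetical choice, and darwin-service:8080 is the illustrative target from Step 4.

    ```yaml
    # custom-values.yaml (hypothetical file name)
    prometheus:
      prometheusSpec:
        # Each entry uses the same syntax as an item under scrape_configs in prometheus.yml
        additionalScrapeConfigs:
          - job_name: 'darwin-service'
            scrape_interval: 5s
            static_configs:
              - targets: ['darwin-service:8080']
    ```

    Passing the file to Helm with the -f flag, for example helm upgrade prometheus prometheus-community/kube-prometheus-stack --namespace darwin -f custom-values.yaml, applies the change to the existing release.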

    Scrape Configurations

    Prometheus periodically scrapes metrics from target endpoints, including the kube-state-metrics service. To modify Prometheus scrape configurations, you can modify the prometheus.yml configuration file to specify scrape targets and related parameters.

    For example:

    global:
      scrape_interval: 15s
    scrape_configs:
      - job_name: 'darwin-service-1'
        scrape_interval: 5s
        static_configs:
          - targets: ['darwin-service-1:80']
        relabel_configs:
          - source_labels: [job]
            target_label: job
            replacement: 'darwin-new-service'
    
      - job_name: 'darwin-service-2'
        scrape_interval: 10s
        static_configs:
          - targets: ['darwin-service-2:80']
    

    In this example, two scrape jobs are defined under scrape_configs, each with its own job_name and static_configs block. The first job scrapes a single target, darwin-service-1:80, every 5 seconds, while the second job scrapes a single target, darwin-service-2:80, every 10 seconds. Both override the global 15-second interval.

    Relabeling

    Relabeling allows you to rewrite label sets before data is stored in the time-series database. The relabel_configs block rewrites the labels of a target before it is scraped, while metric_relabel_configs rewrites or filters the scraped samples before ingestion. This is useful for modifying labels to match your naming conventions or adding additional metadata to the scraped data.

    For instance, to change the job label of darwin-service-1 so that its metrics are stored under a new value, say darwin-new-service, add a relabel_configs block to the prometheus.yml configuration file as follows.

    global:
      scrape_interval: 15s
    scrape_configs:
      - job_name: 'darwin-service-1'
        scrape_interval: 5s
        static_configs:
          - targets: ['darwin-service-1:80']
        relabel_configs:
          - source_labels: [job]
            target_label: job
            replacement: 'darwin-new-service'
    
      - job_name: 'darwin-service-2'
        scrape_interval: 10s
        static_configs:
          - targets: ['darwin-service-2:80']
    

    Additional details of internal relabeling and supported actions of the relabel_config block can be found on the Grafana blog.
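
    To illustrate the two relabeling stages side by side, here is a minimal sketch that attaches a static label to a target and drops a group of metrics before ingestion. The environment label, its staging value, and the go_gc_ metric prefix are assumptions chosen for illustration.

    ```yaml
    scrape_configs:
      - job_name: 'darwin-service-1'
        static_configs:
          - targets: ['darwin-service-1:80']
        relabel_configs:
          # Target relabeling: attach a static label to every series from this job
          - target_label: environment
            replacement: 'staging'        # hypothetical value
        metric_relabel_configs:
          # Metric relabeling: drop series whose metric name starts with go_gc_
          - source_labels: [__name__]
            regex: 'go_gc_.*'
            action: drop
    ```

    Dropping series in metric_relabel_configs still costs a scrape, but the discarded samples never reach storage, which helps keep cardinality under control.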

    Resource Management

    Resource requests and limits are set on the Kubernetes container spec, not in prometheus.yml. If containers are deployed without requests and limits, Prometheus can be starved of resources or crowd out other workloads in the operating cluster. Instead, define requests and limits explicitly for each component. With the Helm chart used in this tutorial, you can set them per component in values.yaml:

    prometheus:
      prometheusSpec:
        resources:
          requests:
            memory: 2Gi
            cpu: 1
          limits:
            memory: 4Gi
            cpu: 2
    alertmanager:
      alertmanagerSpec:
        resources:
          requests:
            memory: 1Gi
            cpu: 500m
          limits:
            memory: 2Gi
            cpu: 1
    

    In this case, containers are allocated as below:

    Component | Minimum allocation (requests) | Consumption limit (limits)
    --- | --- | ---
    Prometheus server | 2 GiB of memory and 1 CPU | 4 GiB of memory and 2 CPUs
    Alertmanager | 1 GiB of memory and 0.5 CPU | 2 GiB of memory and 1 CPU

    Cluster Federation

    To increase the scalability and availability of your Prometheus deployment, you can configure a federated cluster setup, where multiple Prometheus instances share information and query each other to consolidate metrics data. This can be useful in large-scale deployments where you have multiple clusters with Prometheus servers and intend to analyze metrics data for all of them.

    Configuring cluster federation involves the following steps:

    1. Make sure each Prometheus server is reachable from the central server. Every Prometheus server exposes its current series through the built-in /federate HTTP endpoint.
    2. Configure a central Prometheus server to scrape the /federate endpoint of each server, selecting the series to pull with match[] parameters.
    3. Configure the central Prometheus server to store the aggregated data in its database and expose it further by configuring a Prometheus service.
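
    The steps above can be sketched as a scrape job on the central server. This is a minimal example, assuming two hypothetical source servers (prometheus-cluster-a.example.com and prometheus-cluster-b.example.com); the match[] selectors are illustrative and should be narrowed to the series you actually need.

    ```yaml
    scrape_configs:
      - job_name: 'federate'
        scrape_interval: 15s
        # Keep the labels from the source servers instead of overwriting them
        honor_labels: true
        metrics_path: '/federate'
        params:
          'match[]':
            - '{job="darwin-service-1"}'
            - '{__name__=~"job:.*"}'      # aggregated recording rules, if any exist
        static_configs:
          - targets:
              - 'prometheus-cluster-a.example.com:9090'   # hypothetical
              - 'prometheus-cluster-b.example.com:9090'   # hypothetical
    ```

    Pulling only selected or pre-aggregated series keeps the central server's load manageable; federating every series from every cluster tends to scale poorly.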

    Exposing Prometheus Metrics

    Applications typically expose metrics to Prometheus through an HTTP endpoint (conventionally /metrics), while short-lived batch jobs can push metrics to a Pushgateway that Prometheus then scrapes. For the Prometheus server to collect and monitor specific application or service metrics, the application must expose them in a format that Prometheus can parse.

    The steps to expose Prometheus metrics include:

    1. Define metrics in the application code using Prometheus client libraries.
    2. Instrument the application, or write an exporter, to record and export those metrics.
    3. Expose a metrics endpoint that returns the metrics in a format Prometheus can parse.
    4. Configure the prometheus.yml file to scrape the metrics by specifying the endpoint URL.
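
    For the last step, a scrape job might look like the sketch below. The job name darwin-app and the target darwin-app:8000 are hypothetical; scheme and metrics_path are shown with their default values to make clear how the endpoint URL (here http://darwin-app:8000/metrics) is assembled.

    ```yaml
    scrape_configs:
      - job_name: 'darwin-app'              # hypothetical job name
        scheme: http                        # default
        metrics_path: /metrics              # default
        scrape_interval: 15s
        static_configs:
          - targets: ['darwin-app:8000']    # hypothetical host:port of the application
    ```

    The endpoint is expected to return plain-text lines in the Prometheus exposition format, such as http_requests_total{method="post",code="200"} 1027.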

    Alerting

    Alerting rules define the conditions under which alerts are triggered. Alerting is an essential part of monitoring and reliability engineering: you can route notifications to channels such as email, Slack, or Squadcast to help detect and resolve issues before they become critical.

    global:
      scrape_interval: 15s
      evaluation_interval: 1m
    
    # Files containing alerting rules, evaluated every evaluation_interval
    rule_files:
      - /etc/prometheus/rules/*.rules
    
    # Alertmanager instances that receive triggered alerts (a top-level block)
    alerting:
      alertmanagers:
        - static_configs:
            - targets:
                - alertmanager:9093
    
    scrape_configs:
      - job_name: 'darwin-service-1'
        scrape_interval: 5s
        static_configs:
          - targets: ['darwin-service-1:80']
        relabel_configs:
          - source_labels: [job]
            target_label: job
            replacement: 'darwin-new-service'
    
      - job_name: 'darwin-service-2'
        scrape_interval: 10s
        static_configs:
          - targets: ['darwin-service-2:80']
    

    In this case, the rule_files field uses a glob pattern to load alert rule files from /etc/prometheus/rules, which define the conditions under which alerts are triggered. Triggered alerts are sent to the Alertmanager targets listed under the alerting block, which you can further configure to send notifications to various channels, such as email or the Squadcast platform.
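
    A rule file matched by that pattern could look like the following minimal sketch. The file name darwin.rules, the group and alert names, the 5-minute hold period, and the severity label are all assumptions for illustration; the expression uses the built-in up metric, which is 0 when a scrape fails.

    ```yaml
    # /etc/prometheus/rules/darwin.rules (hypothetical file name)
    groups:
      - name: darwin-service-alerts
        rules:
          - alert: DarwinServiceDown
            # Fires when the target has failed its scrapes for 5 minutes in a row
            expr: up{job="darwin-service-2"} == 0
            for: 5m
            labels:
              severity: critical
            annotations:
              summary: 'Target {{ $labels.instance }} is down'
              description: 'Prometheus has been unable to scrape {{ $labels.instance }} for 5 minutes.'
    ```

    The for clause avoids paging on a single failed scrape; the alert stays pending until the condition has held for the full duration.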

    StorageClass Provisioning

    StorageClass provisioning is an important aspect of Prometheus configuration. It determines how the volumes that hold metrics data are created, according to a set of properties such as:

    • The type of storage (e.g., block or file)
    • Access mode
    • Capacity
    • Volume plugin to use

    It is important to note that unlike other configuration settings covered in the steps above, StorageClass provisioning is a Kubernetes configuration, and is done by configuring the values.yaml file.

    Provisioning a StorageClass for Prometheus involves the following steps:

    1. Define a StorageClass resource in a YAML file and apply it to your Kubernetes cluster to create the StorageClass.
    2. Configure Prometheus to use the StorageClass for its data storage. You can do this by setting the storageClassName property in the Prometheus values.yaml file to ensure that PVCs created by Prometheus use the specified StorageClass for data storage.
    3. Deploy Prometheus with the specified storageClassName value that sets the provisioned StorageClass to be used to store the metrics data.
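
    The steps above can be sketched as follows. The StorageClass assumes a cluster running the AWS EBS CSI driver (provisioner ebs.csi.aws.com with gp3 volumes); substitute the provisioner and parameters your cluster supports. The name prometheus-ssd and the 50Gi size are hypothetical.

    ```yaml
    # storageclass.yaml: applied with kubectl apply -f storageclass.yaml
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: prometheus-ssd                  # hypothetical name
    provisioner: ebs.csi.aws.com            # assumption: AWS EBS CSI driver
    parameters:
      type: gp3
    reclaimPolicy: Retain
    volumeBindingMode: WaitForFirstConsumer
    allowVolumeExpansion: true
    ```

    The matching values.yaml override for the kube-prometheus-stack chart then references the class by name:

    ```yaml
    prometheus:
      prometheusSpec:
        storageSpec:
          volumeClaimTemplate:
            spec:
              storageClassName: prometheus-ssd
              accessModes: ['ReadWriteOnce']
              resources:
                requests:
                  storage: 50Gi             # hypothetical size
    ```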

    Best practices for installing Prometheus on Kubernetes

    Some important points to watch out for are given below.

    Appropriately size persistent storage

    As Prometheus stores time-series data over time, it can expand to consume a significant amount of storage. To ensure appropriate sizing of persistent storage for your Prometheus installation, ensure the following:

    • Determine the appropriate size of the persistent volume to use based on the expected data ingestion rate, retention period, and data cardinality.
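    • Prometheus already compresses samples on disk. To reduce storage further, consider a shorter retention period, dropping unneeded high-cardinality series, or downsampling in a long-term storage layer, since Prometheus does not downsample natively.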
    • For cloud instances, leverage dynamic provisioning for persistent volumes.
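
    As a rough worked example of the sizing step, the Prometheus documentation estimates disk usage as retention time in seconds multiplied by ingested samples per second multiplied by bytes per sample. The numbers in the comments below are assumptions to replace with your own measurements; the values keys are those of the kube-prometheus-stack chart.

    ```yaml
    # Rough estimate (assumed numbers):
    #   retention        = 15 days = 15 * 86,400 s = 1,296,000 s
    #   ingestion rate   = 100,000 samples per second
    #   bytes per sample = about 2 (Prometheus documents roughly 1-2 bytes on average)
    #   disk needed      = 1,296,000 * 100,000 * 2 bytes = 259.2 GB (about 241 GiB)
    prometheus:
      prometheusSpec:
        retention: 15d
        storageSpec:
          volumeClaimTemplate:
            spec:
              accessModes: ['ReadWriteOnce']
              resources:
                requests:
                  storage: 300Gi            # estimate plus headroom
    ```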

    Configure resource requests and limits

    Efficient resource management prevents Prometheus from consuming too many or too few resources of a Kubernetes cluster. As a recommended practice, you should enforce resource requests and limits according to the expected workload of Prometheus. This helps to ensure optimal performance of the Prometheus instance and prevents cost escalation by limiting resource usage.

    Use network policies to restrict general access

    To prevent Prometheus from becoming a target of cyberattacks, it is crucial to restrict public access to it. With network policies, you can define how traffic is allowed to flow between pods in a Kubernetes cluster. When applying network policies at the component level, ensure the following:

    • Identify and list the components, including the Prometheus server, Alertmanager, and exporters, that need to communicate with each other or with other services of the cluster.
    • Create policies that only allow traffic between the listed components and services while denying all other traffic.
    • Apply network policies only to the appropriate namespaces of the cluster.
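
    As a minimal sketch of such a policy, the manifest below only admits traffic to the Prometheus pods on port 9090 from pods in the darwin namespace. It assumes the pods carry the app: darwin-prometheus label used by the service in Step 5, that the cluster's network plugin enforces NetworkPolicy, and the policy name is hypothetical.

    ```yaml
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: darwin-prometheus-ingress       # hypothetical name
      namespace: darwin
    spec:
      # Pods this policy protects; must match your Prometheus pod labels
      podSelector:
        matchLabels:
          app: darwin-prometheus
      policyTypes:
        - Ingress
      ingress:
        - from:
            # Allow only pods from the darwin namespace
            - namespaceSelector:
                matchLabels:
                  kubernetes.io/metadata.name: darwin
          ports:
            - protocol: TCP
              port: 9090
    ```

    Once a pod is selected by an Ingress policy, any traffic not explicitly allowed is denied, so additional from entries are needed for every legitimate client, such as a dashboard running in another namespace.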

    Use dedicated service accounts for Prometheus components to access cluster resources

    For robust security, it is best to use dedicated service accounts for Prometheus components to access Kubernetes resources. Make sure to grant these accounts the necessary permissions to access the required resources by binding a cluster role based on the component’s scope. With this, you can limit the access of Kubernetes resources to only what is actually needed for Prometheus.
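
    A minimal sketch of this setup is shown below, assuming a hypothetical service account named darwin-prometheus and a read-only cluster role. The listed resources are the ones commonly needed for Kubernetes service discovery; trim them to what your deployment actually scrapes.

    ```yaml
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: darwin-prometheus               # hypothetical name
      namespace: darwin
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: darwin-prometheus-read          # hypothetical name
    rules:
      # Read-only access for discovering and scraping targets
      - apiGroups: ['']
        resources: ['nodes', 'nodes/metrics', 'services', 'endpoints', 'pods']
        verbs: ['get', 'list', 'watch']
      - nonResourceURLs: ['/metrics']
        verbs: ['get']
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: darwin-prometheus-read
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: darwin-prometheus-read
    subjects:
      - kind: ServiceAccount
        name: darwin-prometheus
        namespace: darwin
    ```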

    Use ConfigMaps to store configuration information

    A ConfigMap stores Prometheus configuration information as key-value pairs, where each value can hold an entire configuration file as string data. Using ConfigMaps to centrally store configuration information makes it easier to update Prometheus configurations as needed without having to modify individual Kubernetes deployment files. Some recommended steps to achieve this include:

    • Create a separate ConfigMap for each configuration file to support easier management and modification.
    • Use the same naming convention for ConfigMaps and Prometheus resources.
    • Use a ConfigMap volume to mount the configuration files (without needing to rebuild the Prometheus container).
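
    The last point can be sketched with a manually managed Deployment that mounts the prometheus-config ConfigMap from Step 4. The Deployment name, the app: darwin-prometheus label (matching the service selector in Step 5), and the image tag are assumptions; a Helm-managed installation wires up its configuration on its own.

    ```yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: darwin-prometheus               # hypothetical name
      namespace: darwin
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: darwin-prometheus
      template:
        metadata:
          labels:
            app: darwin-prometheus
        spec:
          containers:
            - name: prometheus
              image: prom/prometheus:v2.45.0    # assumed tag; pin the version you have tested
              args:
                - '--config.file=/etc/prometheus/prometheus.yml'
              ports:
                - containerPort: 9090
              volumeMounts:
                # The prometheus.yml key of the ConfigMap appears as a file in this directory
                - name: config
                  mountPath: /etc/prometheus
          volumes:
            - name: config
              configMap:
                name: prometheus-config
    ```

    Because the file comes from the mounted volume, updating the ConfigMap changes the configuration without rebuilding the container image; Prometheus still needs a reload or restart to pick up the new file.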

    Conclusion 

    Application performance in a Kubernetes cluster typically depends on the performance of containers, pods, and services. Monitoring the core components of a Kubernetes cluster is therefore an essential aspect of reliability engineering that provides proactive insight into cluster health, workloads, and underlying infrastructure.

    For a distributed ecosystem of containerized applications and related dependencies, monitoring Kubernetes using Prometheus can be a complex undertaking. It is crucial to adopt configuration best practices that ensure core Kubernetes components expose metrics securely. Prometheus can then scrape them in real time for rapid analysis and visualization.
