Top Open Source projects for SREs and DevOps

November 13, 2020

The path to becoming a successful SRE lies in continuous learning. There are plenty of great open source projects out there for SREs and DevOps engineers, each with new and exciting implementations and often tackling unique challenges. These open source projects do the heavy lifting so you can do your job more easily.

In this blog post we look at some of the top, most sought-after open source projects in the areas of monitoring, deployment and maintenance. Among them are projects that simulate network traffic and projects that let you model unpredictable (chaotic) events so you can build dependable systems.

And, while you are at it, we thought we could help a little more by adding some essential DevOps and SRE reading suggestions for all you tech folks out there.

We hope this keeps you good company.

Cloudprober

Cloudprober is an active monitoring tool that helps you spot malfunctions before your customers do. It uses an "active" monitoring model: it proactively runs probes to check that your components are operating as intended, for instance to verify that your frontends can reach your backends, or that your on-premise systems can actually reach your in-cloud VMs. Because this kind of monitoring exercises your systems' interfaces independently of how they are implemented, it lets you quickly pin down what is broken in your system.

Features:

  • Native integration with the open source monitoring stack of Prometheus and Grafana. Cloudprober can also export probe results to other monitoring systems.
  • Automatic target discovery for cloud targets. Out-of-the-box support is provided for GCE and Kubernetes; other cloud services can be configured easily.
  • Strong emphasis on ease of deployment. Cloudprober is written entirely in Go and compiled into a static binary, so it can be deployed quickly as a Docker container. Thanks to automatic target discovery, most updates normally require no re-deployment or reconfiguration of Cloudprober.
  • The Cloudprober Docker image is small, containing only the statically compiled binary, and it needs very little CPU and RAM to run even a large number of probes.
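To make the active monitoring model concrete, here is a minimal Python sketch (standard library only) of what an HTTP probe does: send a request to a target, record success and latency, and keep per-target counters that a monitoring system could collect. It is not Cloudprober's implementation or configuration format; `probe_http`, `run_probes` and `ProbeResult` are hypothetical names, and real Cloudprober probes are configured declaratively and cover more probe types than HTTP.

```python
import time
import urllib.request
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical sketch of an active HTTP probe; this is NOT Cloudprober's code or config.
# Environment proxies are bypassed so the probe measures the direct network path.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


@dataclass
class ProbeResult:
    target: str
    success: bool
    latency_s: float
    error: str = ""


def probe_http(url: str, timeout_s: float = 2.0) -> ProbeResult:
    """Send one HTTP GET to `url`; a 2xx/3xx response counts as success."""
    start = time.monotonic()
    try:
        with _OPENER.open(url, timeout=timeout_s) as resp:
            return ProbeResult(url, 200 <= resp.status < 400, time.monotonic() - start)
    except OSError as exc:  # URLError, HTTPError and socket timeouts are all OSError subclasses
        return ProbeResult(url, False, time.monotonic() - start, str(exc))


def run_probes(urls: List[str], rounds: int = 3, interval_s: float = 0.0) -> Dict[str, Dict[str, int]]:
    """Probe each target `rounds` times, keeping total/success counters per target."""
    counters = {u: {"total": 0, "success": 0} for u in urls}
    for _ in range(rounds):
        for url in urls:
            result = probe_http(url)
            counters[url]["total"] += 1
            counters[url]["success"] += int(result.success)
        time.sleep(interval_s)
    return counters
```

A real prober would run this loop continuously on a schedule and export the counters (for example, to Prometheus) instead of returning them; the success ratio over time is then what you alert on.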

Cloud Operations Sandbox (Alpha)

Cloud Operations Sandbox is an open source platform that lets practitioners learn Google's Site Reliability Engineering practices and apply them to their own cloud systems using Google Cloud's operations suite (formerly Stackdriver). It is based on Hipster Shop, a cloud-native microservices demo application. Note: this requires a Google Cloud account.

Features:

  • Demo Service - an application built on a modern, cloud-native microservice architecture.
  • One-click deployment - a script handles the work of deploying the service to Google Cloud Platform.
  • Load Generator - a component that produces simulated traffic against the demo service.

Version Checker for Kubernetes

Version Checker is a Kubernetes utility that lets you observe the current versions of the images running in your cluster, alongside the latest versions available upstream. It can also display the current image versions in table format on a Grafana dashboard.

Features:

  • Multiple self-hosted registries can be set up at once
  • Exposes version information as Prometheus metrics
  • Support for registries such as ACR, Docker Hub and ECR
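The core comparison behind a tool like this can be sketched as follows: for each running image, find the newest tag available in the registry and export a 0/1 gauge saying whether the running tag is the latest. This is a hypothetical Python sketch, not version-checker's code; it assumes plain `MAJOR.MINOR.PATCH` tags, uses an in-memory dictionary as a stand-in for registry lookups, and the metric name `image_is_latest_version` is illustrative only.

```python
import re
from typing import Dict, List, Tuple

_SEMVER = re.compile(r"v?(\d+)\.(\d+)\.(\d+)")


def parse_semver(tag: str) -> Tuple[int, ...]:
    """Parse tags like 'v1.2.3' or '1.2.3' into a comparable tuple."""
    m = _SEMVER.fullmatch(tag)
    if not m:
        raise ValueError(f"not a semver tag: {tag!r}")
    return tuple(int(part) for part in m.groups())


def latest_tag(tags: List[str]) -> str:
    """Highest semver tag; non-semver tags such as 'latest' are ignored."""
    valid = [t for t in tags if _SEMVER.fullmatch(t)]
    return max(valid, key=parse_semver)  # raises ValueError if no semver tags exist


def render_metrics(running: Dict[str, str], registry: Dict[str, List[str]]) -> str:
    """Render one gauge sample per image in Prometheus text format (1 = up to date)."""
    lines = [
        "# HELP image_is_latest_version 1 if the running tag is the latest available tag.",
        "# TYPE image_is_latest_version gauge",
    ]
    for image, current in sorted(running.items()):
        latest = latest_tag(registry[image])
        is_latest = int(parse_semver(current) >= parse_semver(latest))
        # Label values are not escaped here; that is fine for simple tags only.
        lines.append(
            f'image_is_latest_version{{image="{image}",current_version="{current}",'
            f'latest_version="{latest}"}} {is_latest}'
        )
    return "\n".join(lines) + "\n"
```

A Grafana table panel over such a gauge, filtered to `0`, is one way to list outdated images.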

Istio

Istio is an open platform for connecting, managing and securing microservices. It provides a uniform way to integrate microservices, manage traffic flow between them, enforce policies and aggregate telemetry data. Istio's control plane provides an abstraction layer over the underlying cluster management platform, such as Kubernetes.

Features:

  • Automatic load balancing for HTTP, gRPC, WebSocket, and TCP traffic.
  • Fine-grained control of traffic behavior with rich routing rules, retries, failovers, and fault injection.
  • A pluggable policy layer and configuration API supporting access controls, rate limits and quotas.
  • Automatic metrics, logs, and traces for all traffic within a cluster, including cluster ingress and egress.
  • Secure service-to-service communication in a cluster with strong identity-based authentication and authorization.
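In Istio, traffic splitting and retries are declared in configuration (for example, weighted routes and a retry policy on a VirtualService) and enforced by the Envoy sidecar proxies, so application code does not change. The Python sketch below only illustrates the behaviour such rules produce: each request goes to a destination chosen in proportion to its weight, and a failed attempt is re-routed and retried. The function names are hypothetical and this is not an Istio API.

```python
import random
from typing import Callable, List, Tuple

Route = Tuple[str, int]  # (destination subset, weight)


def pick_destination(routes: List[Route], rng: random.Random) -> str:
    """Pick a subset with probability proportional to its weight."""
    names = [name for name, _ in routes]
    weights = [weight for _, weight in routes]
    return rng.choices(names, weights=weights, k=1)[0]


def call_with_retries(
    send: Callable[[str], bool],
    routes: List[Route],
    rng: random.Random,
    attempts: int = 3,
) -> Tuple[bool, int]:
    """Route one request, re-routing and retrying on failure.

    Returns (succeeded, number of attempts used).
    """
    for attempt in range(1, attempts + 1):
        if send(pick_destination(routes, rng)):
            return True, attempt
    return False, attempts
```

Passing a `send` function that always fails for one subset is a crude stand-in for fault injection: it shows how retries can mask a bad canary when the split still sends most traffic to a healthy version.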

Checkov

Checkov is a static code analysis tool for Infrastructure-as-Code. It scans cloud infrastructure provisioned with Terraform, CloudFormation, Kubernetes, Serverless or ARM templates, and detects security and compliance misconfigurations.

Features:

  • More than 400 built-in policies cover security and compliance best practices for AWS, Azure and Google Cloud.
  • Scans Terraform provider settings to monitor the creation, maintenance and updating of IaaS, PaaS or SaaS resources managed through Terraform.
  • Detects AWS credentials in EC2 user data, Lambda environment variables and Terraform providers.
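To illustrate what a static IaC policy check does, here is a hypothetical Python sketch of a tiny rule engine over already-parsed resource blocks. It is not Checkov's API (Checkov ships its own parsers and check classes); the rule IDs are made up, the nested dictionaries stand in for parsed Terraform, and the credential pattern only matches the common `AKIA`/`ASIA` AWS access key ID format.

```python
import re
from typing import Callable, Dict, List, NamedTuple

# Hypothetical rule IDs for illustration; they are not real Checkov check IDs.
AWS_ACCESS_KEY_RE = re.compile(r"\b(AKIA|ASIA)[0-9A-Z]{16}\b")


class Finding(NamedTuple):
    rule_id: str
    resource: str
    message: str


def check_s3_not_public(rtype: str, name: str, conf: Dict) -> List[Finding]:
    """Flag S3 buckets whose ACL grants public access."""
    if rtype == "aws_s3_bucket" and conf.get("acl") in ("public-read", "public-read-write"):
        return [Finding("DEMO_S3_PUBLIC", f"{rtype}.{name}", "bucket ACL allows public access")]
    return []


def check_no_keys_in_user_data(rtype: str, name: str, conf: Dict) -> List[Finding]:
    """Flag EC2 instances whose user data embeds something shaped like an AWS access key ID."""
    if rtype == "aws_instance" and AWS_ACCESS_KEY_RE.search(conf.get("user_data") or ""):
        return [Finding("DEMO_EC2_SECRET", f"{rtype}.{name}", "AWS access key found in user_data")]
    return []


RULES: List[Callable[[str, str, Dict], List[Finding]]] = [
    check_s3_not_public,
    check_no_keys_in_user_data,
]


def scan(resources: Dict[str, Dict[str, Dict]]) -> List[Finding]:
    """Run every rule over {resource_type: {name: config}} and collect findings."""
    findings: List[Finding] = []
    for rtype, by_name in resources.items():
        for name, conf in by_name.items():
            for rule in RULES:
                findings.extend(rule(rtype, name, conf))
    return findings
```

Because the checks run on code rather than on live infrastructure, they can fail a pull request before a misconfigured bucket or leaked key ever reaches an account.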

Litmus

Cloud-Native Chaos Engineering

Litmus is a toolkit for cloud-native chaos engineering. It provides tools to orchestrate chaos on Kubernetes, helping SREs find weaknesses in their deployments. SREs use Litmus to run chaos experiments first in staging and eventually in production to discover bugs and vulnerabilities. Fixing these weaknesses improves system resilience.

Features:

  • Developers can run chaos experiments during application development as an extension of unit or integration testing.
  • CI pipeline builders can run chaos as a pipeline stage to find bugs when the application is subjected to failure paths.
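Running chaos as a pipeline stage usually comes down to a steady-state check: verify the system is healthy, inject a fault, check that it stays healthy, then roll the fault back. The Python sketch below shows only that gating logic with hypothetical callables; Litmus itself defines experiments as Kubernetes custom resources and runs them inside the cluster, so treat this as an illustration of the idea rather than of Litmus's interface.

```python
from typing import Callable


def run_chaos_stage(
    steady_state: Callable[[], bool],
    inject_fault: Callable[[], None],
    revert_fault: Callable[[], None],
) -> bool:
    """Return True if the steady-state check holds during and after the fault.

    All three callables are hypothetical hooks, e.g. an SLO query,
    a pod deletion and a rollback.
    """
    if not steady_state():
        # Chaos on an already unhealthy system tells you nothing; abort instead.
        raise RuntimeError("system not healthy before chaos; aborting experiment")
    inject_fault()
    try:
        held_during = steady_state()
    finally:
        revert_fault()  # always roll back, even if the check itself raises
    return held_during and steady_state()
```

In a CI job, the stage would exit non-zero when `run_chaos_stage` returns `False`, failing the build before the change is promoted.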

Locust

Locust is an easy-to-use, scriptable and flexible performance testing tool. You define the behaviour of your users in standard Python code instead of using a clunky UI or a domain-specific language. This makes Locust extensible and developer-friendly.

Features:

  • Locust is distributed & scalable - easily supporting hundreds or thousands of users.
  • Web-based UI that shows progress in real-time.
  • Can test any system with a little tinkering.
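In Locust you describe user behaviour by subclassing `HttpUser` and marking methods with `@task`; Locust then spawns many such users and reports statistics. The standard-library sketch below is not Locust, but it shows the same underlying idea without extra dependencies: several concurrent virtual users repeat a task with a random think time between calls, and the harness reports the request count plus median and 95th-percentile latency. All names here are hypothetical.

```python
import random
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def virtual_user(
    task: Callable[[], None], iterations: int, min_wait: float, max_wait: float, seed: int
) -> List[float]:
    """One simulated user: run `task` repeatedly and record each call's latency."""
    rng = random.Random(seed)
    latencies: List[float] = []
    for _ in range(iterations):
        start = time.perf_counter()
        task()
        latencies.append(time.perf_counter() - start)
        time.sleep(rng.uniform(min_wait, max_wait))  # "think time" between requests
    return latencies


def run_load_test(
    task: Callable[[], None],
    users: int = 10,
    iterations: int = 5,
    min_wait: float = 0.0,
    max_wait: float = 0.01,
) -> Dict[str, float]:
    """Run `users` concurrent virtual users and summarise latencies (needs >= 2 samples)."""
    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [
            pool.submit(virtual_user, task, iterations, min_wait, max_wait, seed)
            for seed in range(users)
        ]
        samples = [lat for fut in futures for lat in fut.result()]
    return {
        "requests": len(samples),
        "median_s": statistics.median(samples),
        "p95_s": statistics.quantiles(samples, n=20)[-1],  # 19th of 20 cut points
    }
```

In a real test `task` would issue an HTTP request against the system under test; threads are enough for a sketch, while Locust uses gevent and can distribute users over several machines.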

Prometheus

Prometheus, a Cloud Native Computing Foundation project, is a systems and service monitoring system. It collects metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts when specified conditions are observed.

Features:

  • A multi-dimensional data model (time series defined by metric name and set of key/value dimensions)
  • Targets are discovered via service discovery or static configuration
  • No dependency on distributed storage; single server nodes are autonomous
  • PromQL, a powerful and flexible query language to leverage this dimensionality
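In Prometheus's multi-dimensional data model, a time series is identified by its metric name plus its full set of labels, and PromQL aggregates across those labels. The toy Python store below illustrates that idea, with `sum_by` roughly mimicking the PromQL query `sum by (job) (http_requests_total)` evaluated at each series' latest sample. It is a hypothetical illustration, not how Prometheus stores or queries data.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

# A series is identified by its metric name plus its full label set.
SeriesKey = Tuple[str, FrozenSet[Tuple[str, str]]]


class TinyTSDB:
    """Toy in-memory store illustrating the data model; not Prometheus's storage engine."""

    def __init__(self) -> None:
        self.samples: Dict[SeriesKey, List[Tuple[float, float]]] = defaultdict(list)

    def append(self, name: str, labels: Dict[str, str], ts: float, value: float) -> None:
        """Add a (timestamp, value) sample to the series named by `name` + `labels`."""
        key = (name, frozenset(labels.items()))
        self.samples[key].append((ts, value))

    def sum_by(self, name: str, by: str) -> Dict[str, float]:
        """Roughly `sum by (<by>) (<name>)`: sum each series' latest value per label value."""
        out: Dict[str, float] = defaultdict(float)
        for (metric, labels), points in self.samples.items():
            if metric != name or not points:
                continue
            group = dict(labels).get(by, "")
            out[group] += points[-1][1]
        return dict(out)
```

Each distinct label combination creates a new series, which is why high-cardinality labels such as user IDs are usually avoided in Prometheus.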

Kube-Monkey

Kube-monkey is an implementation of Netflix's Chaos Monkey for Kubernetes clusters. It randomly deletes Kubernetes pods in the cluster, encouraging and validating the development of failure-resilient services.

Features:

  • Kube-monkey works on an opt-in model: it only schedules terminations for Kubernetes (k8s) apps that have explicitly agreed to have their pods terminated by kube-monkey.
  • Highly customisable scheduling features based on your requirements
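The opt-in model and randomised scheduling can be sketched like this: only deployments carrying an opt-in label are eligible, and each eligible one is picked on a given day with probability 1/MTBF (mean time between failures, in days). This is a hypothetical Python sketch, not kube-monkey's scheduler; the label keys are modeled on the ones kube-monkey documents, but treat the exact names and semantics as assumptions and check the project's README before relying on them.

```python
import random
from typing import Dict, List

# Assumed label keys, modeled on kube-monkey's opt-in labels; verify against its docs.
ENABLED_LABEL = "kube-monkey/enabled"
MTBF_LABEL = "kube-monkey/mtbf"


def daily_schedule(deployments: Dict[str, Dict[str, str]], rng: random.Random) -> List[str]:
    """Return the opted-in deployments whose pods will be terminated today.

    Only deployments labelled ENABLED_LABEL=enabled are eligible; each one is
    picked with probability 1/mtbf, where mtbf is the mean days between kills.
    """
    victims: List[str] = []
    for name, labels in sorted(deployments.items()):
        if labels.get(ENABLED_LABEL) != "enabled":
            continue  # not opted in: never touched
        mtbf_days = max(1, int(labels.get(MTBF_LABEL, "1")))
        if rng.random() < 1.0 / mtbf_days:
            victims.append(name)
    return victims
```

Keeping eligibility in labels on the workload itself means each team decides whether, and how often, its service takes part.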

PowerfulSeal

PowerfulSeal injects failures into Kubernetes clusters, helping you detect problems as early as possible. It lets you write scenarios that describe complete chaos experiments.

Features:

  • Compatible with Kubernetes, OpenStack, AWS, Azure, GCP and local machines
  • Connects with Prometheus and Datadog for metrics collection
  • Multiple modes to suit different use cases
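A chaos scenario of the kind described above can be thought of as a small pipeline: match candidate targets, filter them down (for example, to a random sample), then apply an action such as killing a pod. The Python sketch below shows that pipeline over plain dictionaries; it is hypothetical, does not use PowerfulSeal's policy format, and the `action` callback stands in for a real pod deletion.

```python
import random
from typing import Callable, Dict, List

Pod = Dict[str, str]  # hypothetical pod record, e.g. {"name": ..., "namespace": ...}


def run_scenario(
    pods: List[Pod],
    match: Callable[[Pod], bool],
    sample_size: int,
    action: Callable[[Pod], None],
    rng: random.Random,
) -> List[Pod]:
    """Match candidate pods, filter to a random sample, then apply `action` to each."""
    matched = [p for p in pods if match(p)]
    chosen = rng.sample(matched, k=min(sample_size, len(matched)))
    for pod in chosen:
        action(pod)
    return chosen
```

Separating matching, filtering and acting makes each experiment easy to narrow (one namespace, one pod at a time) before widening its blast radius.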

The great benefit of open source tools is their extensibility: you can add features to a tool when needed to better fit your own architecture. These open source projects also have extensive support documentation and a community of users. As microservice architectures continue to spread across the cloud computing space, reliable tools to monitor and troubleshoot them are likely to become part of every developer's arsenal.

You can also find more such awesome DevOps and SRE open source projects here. Meanwhile, we’d love to hear from you on other projects/tools that should make this list! Leave us a comment or reach out over a DM via Twitter and let us know your thoughts.

Written By:
Nir Sharma
SRE
DevOps
Copyright © Squadcast Inc. 2017-2024

Last Updated: November 20, 2024