Squadcast has helped us effectively classify alerts and respond to them based on the priority and severity of the incidents. Besides being able to clearly differentiate between alerts coming in from different services and for different clients, we also have more visibility into matters that require an urgent response.
Redux is a globally recognized provider of Managed Amazon Web Services (AWS) and Managed Teams. As part of managed AWS services, they provide architecture design, migration assistance, monitoring, and management services so their customers can use AWS cloud services to their fullest potential.
To tackle their over-dependence on email-based alerting, Redux was looking for an incident management platform that could integrate well with their existing monitoring stack & ChatOps tools. Their incident response process relied on manual alert routing, so they wanted a tool that could streamline alerting, with the primary goal of reducing alert fatigue.
Extreme levels of alert noise & alert fatigue: Redux’s engineers would receive 300-400 email alerts within a few hours whenever an incident occurred (or a service breached a threshold).
Streamlined alerting process & reduced alert noise: By integrating their monitoring stack (CloudWatch, DataDog, etc.) with Squadcast and configuring Deduplication and Routing rules, they now have more control over which alerts get reported and which users they are routed to.
Inability to triage alerts based on priority & severity: Since they relied primarily on email for alert notifications, they had little clarity or insight into an alert’s priority and severity.
More clarity with Tagging rules: By automatically adding ‘Tags’ to incoming incidents, their engineers could use the information shown in Squadcast’s GUI to determine the priority & severity of incident alerts.
Lack of ownership amidst increasing alerts: Alerts were acknowledged & forwarded via email, which made it difficult to manage incidents and generate reports.
Improved accountability & better Postmortems: Squadcast gave their engineers a centralized place to acknowledge & resolve alert notifications. They could also conduct better Postmortems and present meaningful reports during QBR discussions.
Inability to classify alert notifications based on clients & client-specific needs: Prior to Squadcast, their alerting process did not support alert classification. Alerts for all clients would be delivered to their email inbox and/or to the same Slack channel.
Advanced alert classification based on client-specific needs: By routing alerts via Squadcast, they could separate alerts based on different clients. They could also classify alerts based on different services, route them to client-specific Slack channels, & mitigate them with increased transparency & ownership.
Handling the infrastructure of multiple clients & monitoring specific services of certain clients became easier and more efficient.
Classifying incidents based on client-specific needs and service-specific severity made notifications context-rich and reduced the number of alerts needing attention.
The centralized dashboard, which lets engineers acknowledge and resolve incidents, is helping Redux’s engineers be more accountable for incidents.
Besides being able to conduct better Postmortems and RCAs, generating monthly/quarterly reports for clients has become simple and straightforward.