As a fast-paced, growing startup, we found that Squadcast helped us with automated alerting, with proper deduplication in place, reducing the bandwidth spent on figuring out issue owners and enabling us to route alerts correctly without fatigue. Coming from a monitoring infrastructure background, the journey from an alert to RCA seemed a huge gap in process management, and Squadcast is tackling that quite well.
Udaan, one of the fastest-growing startups to achieve unicorn status in 2018, is a network-centric B2B platform designed specifically for small & medium businesses. Serving a large number of retailers across India, Udaan has become the platform of choice for B2B transactions across verticals like electronics, home appliances, lifestyle products, and many more.
To keep things running smoothly, Udaan depends on a multitude of microservices. As their platform grows, user expectations grow in sync. This is reflected in the ever-growing number of requests they handle per second, which can run into the thousands or more at peak load. Even a few minutes of downtime can have a significant impact on their revenue.
Udaan was on the lookout for an SRE platform that could handle multiple integrations and track changes in real time across their complex architecture, which included a web of interconnected microservices and mobile applications. As Udaan’s infrastructure scaled in complexity, they needed an SRE platform that was up to the challenge and could make the lives of on-call engineers easier.
Ineffective On-call Management: Because alerting, monitoring, and issue support were owned by different teams, there was a lag between incident discovery and acknowledgement whenever an outage occurred, which increased MTTR.
Better and Easier On-call Scheduling: Automated scheduling enabled quick and accurately routed on-call notifications.
Absence of Escalation Policies: Udaan’s existing incident management process left no room to set up escalation policies for incidents that went unacknowledged, which increased MTTA.
Escalation Policies: With Squadcast’s highly configurable escalation policies, Udaan can alert specific support engineers depending on the severity of the problem, reducing MTTA drastically.
Alert Fatigue: As Udaan’s infrastructure grew, support engineers faced on-call fatigue because multiple alerts were sent out for the same incident.
Deduplication of Alerts: By automatically combining alerts for the same incident, Squadcast dramatically reduced the number of unnecessary alerts.
Manual Alert Forwarding: Udaan had a manual alerting system in which support engineers would attempt to call the concerned department. This quickly became impractical with their rapidly growing infrastructure.
Alert Routing: With smarter routing and diverse notification modes, time to acknowledge dropped, since engineers are alerted only on relevant events.
Analytics: There were no metrics in place to measure reliability and team productivity.
Discoverability of Metrics: The Squadcast dashboard provides a quick snapshot of MTTA and MTTR, showing at a glance the volume of incidents and how quickly the team is acknowledging and resolving them.
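For readers unfamiliar with the two metrics, here is a minimal sketch of how MTTA and MTTR can be computed from incident timestamps. The incident records are made-up sample data, the helper name `mean_minutes` is hypothetical, and MTTR is assumed here to mean "mean time to resolve", measured from the moment an incident is triggered. This illustrates the arithmetic only and is not Squadcast's implementation.

```python
from datetime import datetime
from statistics import mean

# Hypothetical sample data: (triggered, acknowledged, resolved) per incident.
incidents = [
    (datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 4), datetime(2024, 1, 1, 10, 30)),
    (datetime(2024, 1, 2, 15, 0), datetime(2024, 1, 2, 15, 2), datetime(2024, 1, 2, 15, 50)),
]

def mean_minutes(deltas):
    """Average a list of timedeltas and return the result in minutes."""
    return mean(d.total_seconds() for d in deltas) / 60

# MTTA: average time from trigger to acknowledgement.
mtta = mean_minutes([ack - trig for trig, ack, _ in incidents])
# MTTR: average time from trigger to resolution (assumed definition).
mttr = mean_minutes([res - trig for trig, _, res in incidents])

print(f"MTTA: {mtta:.1f} min, MTTR: {mttr:.1f} min")
```

With this sample data the acknowledgement delays are 4 and 2 minutes and the resolution times are 30 and 50 minutes, so MTTA is 3 minutes and MTTR is 40 minutes.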
The deduplication feature within the platform ensures that on-call engineers are not bombarded with similar and/or acknowledged alerts. All alerts from a specific outage are now automatically grouped and routed to the right responder (event tagging).
A feature-rich dashboard allows for a granular view of incidents and responses. Thanks to this, Udaan was able to keep better track of their SLAs. The number of service outages came down by 215% after adoption!
The team is now able to reach the right folks faster with our intuitive and customizable on-call scheduling, conditional routing, and flexible escalation policies.
Squadcast’s stellar customer support service helped Udaan to stay on top of critical incidents in real time.
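The alert grouping described above can be pictured with a small sketch. It assumes alerts are plain dictionaries with hypothetical `service`, `check`, and `ts` (seconds) fields, and that alerts sharing a service and check within a fixed time window belong to the same incident. The function name, field names, and the 300-second window are illustrative assumptions, not Squadcast's actual deduplication rules.

```python
def deduplicate(alerts, window_seconds=300):
    """Group alerts with the same (service, check) key that arrive within
    window_seconds of the first alert of the current incident for that key."""
    incidents = []
    open_by_key = {}  # most recent incident per dedup key
    for alert in sorted(alerts, key=lambda a: a["ts"]):
        key = (alert["service"], alert["check"])
        incident = open_by_key.get(key)
        if incident is not None and alert["ts"] - incident["first_ts"] <= window_seconds:
            # Same key, inside the window: fold into the existing incident.
            incident["count"] += 1
        else:
            # New key, or the window has expired: open a new incident.
            incident = {"key": key, "first_ts": alert["ts"], "count": 1}
            open_by_key[key] = incident
            incidents.append(incident)
    return incidents

# Hypothetical alert stream: four alerts collapse into three incidents.
alerts = [
    {"service": "payments", "check": "latency", "ts": 0},
    {"service": "payments", "check": "latency", "ts": 60},
    {"service": "orders", "check": "5xx", "ts": 90},
    {"service": "payments", "check": "latency", "ts": 1000},
]
grouped = deduplicate(alerts)
```

In this sketch the on-call engineer would be notified once per incident rather than once per alert; the `count` field records how many raw alerts each incident absorbed.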
Udaan is now confident that they can go beyond conventional IT alerting and incident management processes by using Squadcast’s SRE-focused features to enhance their incident response capabilities.