Squadcast helps us effectively monitor our services' performance and alerts us so we can take immediate action in case of abnormal behavior.
Kovai is a leading Enterprise and SaaS company, boasting a robust portfolio that includes four products: BizTalk360, Serverless360, Document360, and Churn360. Their commitment to reliability is crucial, given the stringent Service Level Agreements (SLAs) they promise to their global clientele of Enterprise customers across their product portfolio.
The Challenge:
With about a million monthly visitors to its websites and an average of 50 million API hits, Document360 had to guarantee reliability and scalability to meet its SLAs. Any downtime not only impacts revenue but also inundates the Engineering and Customer Support teams with support tickets.
To bolster system resilience, Document360 implemented robust internal systems and Reliable Incident Management with Squadcast. With a team of skilled software engineers and technical architects, they sought to optimize service reliability through performance tuning, excellent deployment practices, and comprehensive Root Cause Analysis (RCA), among other measures.
Prior to Squadcast, Document360 grappled with rudimentary alerting, a lack of formalized On-Call management, inefficient Incident Response, insufficient incident insights, and poor visibility of service health. With Squadcast, they reformed their approach as follows:
Rudimentary Alerting: In the past, alerts were sent to engineers by email through a third-party monitoring vendor. Email alone was inadequate for routing critical, time-sensitive alerts.
Timely Critical Alert Aggregation: After implementing Squadcast, critical alerts are now directly routed to Document360's engineering team through various channels: emails, push notifications, calls, and Microsoft Teams.
In addition to this, Squadcast's Incident Dashboard provides a detailed, visual representation of aggregated alerts, allowing for a comprehensive overview and better management.
These enhancements have led to a significant improvement in Mean Time to Acknowledge (MTTA), which has been reduced by 55%.
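As a rough illustration of how an MTTA figure like this can be computed, the sketch below averages the time between an incident being triggered and being acknowledged, then compares two periods. The timestamps and the helper names `mtta_minutes` and `percent_reduction` are hypothetical and chosen only for illustration; they are not Document360's data or part of Squadcast's API.

```python
from datetime import datetime


def mtta_minutes(incidents):
    """Mean Time to Acknowledge, in minutes.

    `incidents` is a list of (triggered_at, acknowledged_at) pairs,
    each an ISO 8601 timestamp string.
    """
    deltas = [
        (datetime.fromisoformat(acked) - datetime.fromisoformat(triggered)).total_seconds() / 60
        for triggered, acked in incidents
    ]
    return sum(deltas) / len(deltas)


def percent_reduction(before, after):
    """Relative reduction from `before` to `after`, as a percentage."""
    return (before - after) / before * 100


# Hypothetical sample data, not taken from the case study.
before_period = [
    ("2023-03-01T10:00:00", "2023-03-01T10:25:00"),  # acknowledged after 25 min
    ("2023-03-02T11:00:00", "2023-03-02T11:15:00"),  # acknowledged after 15 min
]
after_period = [
    ("2023-06-01T10:00:00", "2023-06-01T10:10:00"),  # acknowledged after 10 min
    ("2023-06-02T11:00:00", "2023-06-02T11:08:00"),  # acknowledged after 8 min
]

mtta_before = mtta_minutes(before_period)
mtta_after = mtta_minutes(after_period)
reduction = percent_reduction(mtta_before, mtta_after)
```

With these made-up numbers, the average drops from 20 minutes to 9 minutes, which is a reduction of about 55%.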
Unstructured On-Call Management: Document360 previously faced a challenge with their Incident Management process, where alerts were not specifically directed to the appropriate personnel. This lack of structured routing resulted in the entire team working on all incoming alerts, leading to inefficiencies and lack of clear responsibility for incident resolution.
Streamlined On-Call Management: By implementing Schedules and Escalation Policies, alerts are now systematically directed to the appropriate team members based on their specific roles and at the right time, ensuring more accountability and eliminating redundant work. This targeted approach has improved the productivity of Document360’s team by 20%.
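The general idea behind an escalation policy can be sketched as a list of levels, each notified once an incident has stayed unacknowledged for a given time. This is a simplified model for illustration only: the `EscalationLevel` type, the responder names, and the delays are assumptions, and it does not represent how Squadcast implements or configures Escalation Policies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationLevel:
    after_minutes: int  # minutes unacknowledged before this level is notified
    responder: str      # hypothetical responder or rotation name


def responders_to_notify(levels, minutes_unacknowledged):
    """Return the responders that should have been notified so far, in escalation order."""
    ordered = sorted(levels, key=lambda level: level.after_minutes)
    return [
        level.responder
        for level in ordered
        if level.after_minutes <= minutes_unacknowledged
    ]


# Hypothetical policy: primary on-call first, then secondary, then a manager.
policy = [
    EscalationLevel(after_minutes=0, responder="primary-on-call"),
    EscalationLevel(after_minutes=10, responder="secondary-on-call"),
    EscalationLevel(after_minutes=20, responder="engineering-manager"),
]

notified_at_start = responders_to_notify(policy, 0)
notified_after_12 = responders_to_notify(policy, 12)
notified_after_25 = responders_to_notify(policy, 25)
```

In this model only one person is paged at first, and others are pulled in only if the alert stays unacknowledged, which is what removes the "everyone works on every alert" pattern described above.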
Lack of Efficient Incident Response: Prior to Squadcast, alerts were not routed to their collaboration tool, Microsoft Teams. This limitation significantly slowed down the response time to critical alerts, reducing the overall efficiency of their Incident Management process.
Flexible Incident Response: By leveraging Outgoing Webhooks and Squadcast’s robust integration with Microsoft Teams, alerts are now routed from the monitoring platform directly to Microsoft Teams, where the engineering team primarily operates, resulting in faster and more efficient Incident Response.
Insufficient Incident Insight: Despite a strong culture of performing Root Cause Analysis (RCA) for every incident, the Document360 engineering team lacked comprehensive insights into each incident. This limited understanding hindered their ability to recognize and respond effectively to recurring patterns and issues.
Incident Analytics for Enhanced RCAs: By leveraging Squadcast’s Analytics capabilities, the Document360 engineering team can now identify patterns in incidents with more depth and precision. For instance, they detected a consistent trend of more incidents being reported every Tuesday. An RCA revealed that one of their customers was running penetration tests on that day, causing a significant spike in API hits and resulting in service slowdowns. Having identified this pattern, they implemented targeted performance fixes and prevented these incidents from recurring.
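The kind of weekday pattern described here can be surfaced with a simple count of incidents per day of the week. The sketch below is a minimal illustration using hypothetical timestamps; it is not Squadcast's Analytics feature, and the function names are assumptions.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def incidents_per_weekday(timestamps):
    """Count incidents by weekday name from ISO 8601 timestamp strings."""
    counts = Counter()
    for ts in timestamps:
        # datetime.weekday() returns 0 for Monday through 6 for Sunday.
        counts[WEEKDAYS[datetime.fromisoformat(ts).weekday()]] += 1
    return counts


def busiest_weekday(timestamps):
    """Return the weekday name with the most incidents."""
    return incidents_per_weekday(timestamps).most_common(1)[0][0]


# Hypothetical incident creation times, not taken from the case study.
sample_incidents = [
    "2023-05-02T10:15:00",  # Tuesday
    "2023-05-02T14:40:00",  # Tuesday
    "2023-05-04T09:05:00",  # Thursday
    "2023-05-09T11:30:00",  # Tuesday
    "2023-05-12T16:20:00",  # Friday
]

counts = incidents_per_weekday(sample_incidents)
peak_day = busiest_weekday(sample_incidents)
```

A skew toward one day, as in this sample, is only a starting point: it tells the team where to direct the RCA, not what the cause is.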
Lack of Visualization of Services: Before Squadcast, there was no formal platform to keep track of ongoing issues, and the team had no visibility into service health.
Enhanced Service Visibility: By integrating with Squadcast’s Service Catalog, Document360 has significantly improved their service visibility. The Service Catalog provides a real-time and accurate snapshot of service health. This improvement has enhanced their understanding of service performance.
Implementing a formalized On-Call Management process has had a significant impact on Incident Response times. Now Document360's engineers receive critical alerts promptly via multiple channels, improving their MTTA by 55%.
Document360 streamlined their Incident Response with Squadcast's Scheduling and Escalation Policies to ensure faster resolution, better accountability, and a 20% improvement in productivity.
More reliable services and faster resolution times have had a positive impact on customer satisfaction. Incidents are now typically addressed and resolved by the Engineering team before customers notice, improving the CSAT score by 30%.
By implementing Reliable Incident Management with Squadcast and reinforcing it with robust internal processes, Document360 was able to raise their Service Level Agreements (SLAs) from 99% to an impressive 99.9%.
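To put the move from 99% to 99.9% in perspective, the sketch below converts an availability target into an allowed downtime budget. It assumes the SLA figures refer to availability measured over a 30-day period, which the case study does not state explicitly; the function name is hypothetical.

```python
def allowed_downtime_minutes(availability_percent, period_days=30):
    """Maximum downtime, in minutes, that still meets the availability target."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - availability_percent) / 100


# Assumed 30-day measurement window.
budget_at_99 = allowed_downtime_minutes(99.0)     # roughly 432 minutes (about 7.2 hours)
budget_at_99_9 = allowed_downtime_minutes(99.9)   # roughly 43.2 minutes
```

Under that assumption, adding one more "nine" shrinks the monthly downtime budget by a factor of ten, which is why faster acknowledgement and targeted routing matter for meeting the stricter commitment.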