Incident Response in the time of Remote Work

March 26, 2020

In light of rising COVID-19 cases around the globe, many countries have announced lockdowns. Some companies have adapted easily, but many are struggling to define appropriate Work From Home (WFH) policies.

Operations-heavy companies have been hit harder than most. Operations demand a great deal of coordination, communication and responsiveness, all of which are tricky to achieve when your team suddenly goes remote.

How does this affect Incident Response and On-call Teams?

One clear result of quarantine and social distancing is how heavily we now depend on the digital world. Operations teams and on-call engineers are therefore under added pressure to keep IT infrastructure and applications in top shape, which makes it essential to stay connected through multiple modes of communication. After all, incident response is about reaching the right person at the right time and communicating effectively, not just within the on-call team but also with external stakeholders.

Incident alerting and management tools are accessible from anywhere, but they are only useful if your incident management practices are sound and complement the tool.

Handling on-call on a normal day at work can be stressful, and when you are working remotely, a good communication protocol becomes all the more crucial. It’s never too late to tweak your incident response processes so that your incident management team stays on the same page and your systems and services stay reliable. Here are some ways to set yourself up for success.

Preparing for Remote Incident Management

As a company focused on streamlining incident response, we follow a few practices that keep us remote-ready.

  • Incident Communication

    The cornerstone of any good incident response process is communication.

    Document more: To reduce the risk of misinformation or communication gaps, write more and write better. A written record of information and the associated activity gives you something to go back to when necessary. When in doubt, add a few more details.

    Use a central Slack Channel: If you love ChatOps or rely heavily on Slack for incident management, route all your incidents into one dedicated channel. You may still need separate channels to discuss specific issues and outages, but the central channel acts as an index and spares you the chaos of hunting for a specific incident and its status.
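As a minimal sketch, assuming you use Slack incoming webhooks (which accept a JSON body with a "text" field), an alerting script could post a one-line index entry for each incident to the central channel and point readers to the incident's own channel. The Incident fields, the "#inc-1234" channel naming and the webhook URL placeholder are hypothetical, not part of any particular tool's schema.

```python
import json
import urllib.request
from dataclasses import dataclass


@dataclass
class Incident:
    # Hypothetical fields for illustration; map them from your own alerting tool.
    incident_id: str
    title: str
    severity: str
    status: str
    detail_channel: str  # per-incident channel, e.g. "#inc-1234" (hypothetical naming)


def build_index_message(incident: Incident) -> dict:
    """Build a one-line index entry for the central incident channel."""
    text = (
        f"[{incident.severity}] {incident.incident_id}: {incident.title} "
        f"| status: {incident.status} | details: {incident.detail_channel}"
    )
    # Slack incoming webhooks accept a JSON body with a "text" field.
    return {"text": text}


def post_to_slack(webhook_url: str, payload: dict) -> int:
    """POST the payload to a Slack incoming webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


if __name__ == "__main__":
    incident = Incident("INC-1234", "Checkout API latency", "SEV2", "investigating", "#inc-1234")
    print(build_index_message(incident)["text"])
    # Replace the placeholder with your own webhook URL before sending:
    # post_to_slack("https://hooks.slack.com/services/<your-webhook-path>", build_index_message(incident))
```

Keeping the index entry to a single line with a status field means the central channel stays skimmable, while the detailed discussion lives in the per-incident channel.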

    Virtual War Room: It goes without saying that collaboration is key to reducing your MTTR (mean time to resolution). You can mimic a traditional war-room huddle by using a video conferencing tool or chat platform with the incident response team. Squadcast’s virtual war room lets you chat and bring in other team members, SMEs, stakeholders and business-facing teams so that everyone’s goals stay aligned.

    Publish the Meeting Link: Create a virtual meeting room dedicated to firefighting and keep it open throughout your on-call rotation. Add the meeting ID to the incident details, or pin it in your Slack channel or whichever communication tool you use.

    We typically use Zoom to keep an open meeting room that anyone with the meeting ID can join. Other video conferencing tools should let you do the same.

    Talking is faster than typing, so it can be tempting to call someone with every question; save calls for situations that genuinely need them. No one likes to be constantly interrupted.

    Be transparent: Communication can take a major hit in work-from-home teams. You may believe you are sharing everything you know, yet leave out the context others need to understand it.

    To avoid this, loop in all the relevant teams while you handle an incident. Also, remember to update your status page with any new information as soon as the incident is resolved.

    This gives everyone visibility into the ongoing activity and the severity of the issue, and provides a single place to discuss and share updates. With everyone informed, you don’t have to context-switch just to draft the right message for external teams or customers.

  • Incident Response

    Once your communication processes are right, incident response gets simpler: you can focus on firefighting without worrying about anything else.

    Assign Roles: You move fastest when you know exactly what you need to do. Assigning roles to your incident response team gives you that clarity, and it spreads out work that would otherwise fall entirely on the one person working on a fix.

    For teams of just one or two people, a checklist of what to do when an incident hits can go a long way. It removes any doubt about which work is still pending.
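The role-versus-checklist idea above can be sketched as follows. This is a minimal illustration, assuming a hypothetical set of role names (only the scribe is named in this post) and a hypothetical checklist; adapt both to your own process.

```python
from typing import Dict, List

# Hypothetical role names; this post itself only mentions a scribe by name.
ROLES = ["incident_commander", "communications_lead", "scribe", "subject_matter_expert"]

# Hypothetical checklist for teams of one or two people.
SMALL_TEAM_CHECKLIST = [
    "Acknowledge the alert",
    "Post the incident in the central channel",
    "Note each action and its time as you go",
    "Update stakeholders and the status page",
    "Schedule the postmortem",
]


def assign_roles(responders: List[str]) -> Dict[str, str]:
    """Give every role an owner, cycling through responders if there are fewer people than roles."""
    if not responders:
        raise ValueError("at least one responder is required")
    return {role: responders[i % len(responders)] for i, role in enumerate(ROLES)}


def response_plan(responders: List[str]) -> dict:
    """Small teams work from a shared checklist; larger teams get explicit roles."""
    if not responders:
        raise ValueError("at least one responder is required")
    if len(responders) <= 2:
        return {"owners": list(responders), "checklist": list(SMALL_TEAM_CHECKLIST)}
    return {"roles": assign_roles(responders)}


if __name__ == "__main__":
    print(response_plan(["ana"]))
    print(response_plan(["ana", "ben", "chen", "dev"]))
```

Cycling through responders guarantees that every role has an owner even when the team is short-handed, which is the point of assigning roles in the first place.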

    Timeline of Incident Activity: A scribe is usually expected to keep a record of all incident-related activity, because it’s never a good idea to trust your memory in a high-stress situation. That record gives you the information you need to analyze the incident, write better postmortems and build an effective playbook ahead of the next incident. At Squadcast, we use our automated timelines to review resolution activity when conducting postmortems.
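If you don’t have automated timelines, a scribe’s log can be as simple as the sketch below. It is an illustrative, hypothetical structure (not Squadcast’s timeline API), assuming every timestamp is timezone-aware so that entries from remote responders in different time zones sort correctly.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class TimelineEntry:
    at: datetime  # must be timezone-aware
    author: str
    note: str


@dataclass
class IncidentTimeline:
    """Append-only log that a scribe (or a bot) keeps during an incident."""
    incident_id: str
    entries: List[TimelineEntry] = field(default_factory=list)

    def record(self, author: str, note: str, at: Optional[datetime] = None) -> None:
        # Default to "now" in UTC so entries from responders in different time zones line up.
        self.entries.append(TimelineEntry(at or datetime.now(timezone.utc), author, note))

    def render(self) -> str:
        """Render entries in chronological order, even if some were recorded late."""
        lines = [f"Timeline for {self.incident_id}"]
        for entry in sorted(self.entries, key=lambda e: e.at):
            stamp = entry.at.astimezone(timezone.utc)
            lines.append(f"{stamp:%Y-%m-%d %H:%M} UTC  {entry.author}: {entry.note}")
        return "\n".join(lines)


if __name__ == "__main__":
    timeline = IncidentTimeline("INC-1234")
    timeline.record("ben", "Alert acknowledged")
    timeline.record("ana", "Rolled back the latest deploy")
    print(timeline.render())
```

Sorting at render time, rather than trusting insertion order, lets a responder add an entry they forgot to log earlier without breaking the chronology the postmortem relies on.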

    Set up an automated on-call rotation: If you haven’t set one up yet, expect the motivation of the engineers who currently handle on-call to decline steadily. Without a rotation, the stress of incidents is likely to fall on just one or a few people.

    Knowing in advance when you’ll be on call is a load off your mind, and rotations help you distribute the load fairly across the team.

    Remember, being on-call is everyone’s responsibility.
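On-call tools generate rotations for you, but the core idea behind the rotation described above is small enough to sketch. This minimal round-robin version assumes a fixed roster and equal-length shifts, and ignores overrides, holidays and time-zone hand-offs; the names and dates in it are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple


def on_call_for(roster: List[str], rotation_start: datetime, shift: timedelta, when: datetime) -> str:
    """Return who is on call at `when` in a simple round-robin rotation."""
    if not roster:
        raise ValueError("roster must not be empty")
    if when < rotation_start:
        raise ValueError("`when` is before the rotation starts")
    # timedelta // timedelta gives the number of whole shifts that have elapsed.
    shifts_elapsed = (when - rotation_start) // shift
    return roster[shifts_elapsed % len(roster)]


def schedule(roster: List[str], rotation_start: datetime, shift: timedelta, count: int) -> List[Tuple[datetime, str]]:
    """List the first `count` shift start times and owners so everyone knows their turn in advance."""
    return [(rotation_start + i * shift, roster[i % len(roster)]) for i in range(count)]


if __name__ == "__main__":
    start = datetime(2020, 3, 30, 9, 0, tzinfo=timezone.utc)
    for begins, person in schedule(["ana", "ben", "chen"], start, timedelta(days=7), 4):
        print(f"{begins:%Y-%m-%d %H:%M} UTC  {person}")
```

Publishing the output of schedule is what gives everyone the advance notice mentioned above; on_call_for is the lookup an alert router would use to decide whom to page.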

  • Incident Resolution

    Always Create Runbooks (with fallback options): Build a knowledge base of incident resolution information that responders can refer to when similar incidents hit your service. That way, you don’t spend time figuring out the same incident all over again.

    Runbooks are especially useful for people who are new to on-call or new to your organization, since newcomers benefit most from having more information at hand.
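A runbook knowledge base with fallback options can start as a lookup keyed by alert name, as in the minimal sketch below. The alert name, steps and the generic default runbook are hypothetical examples; the design choice worth copying is that an unknown alert still returns something actionable instead of nothing.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Runbook:
    title: str
    steps: List[str]
    fallback: List[str] = field(default_factory=list)  # what to try if the main steps don't resolve it


# Hypothetical knowledge base keyed by alert name.
RUNBOOKS: Dict[str, Runbook] = {
    "disk_usage_high": Runbook(
        title="Disk usage above threshold",
        steps=["Identify the largest directories", "Rotate or compress old logs"],
        fallback=["Expand the volume", "Escalate to the storage owner"],
    ),
}

# Hypothetical default so responders are never left empty-handed.
GENERIC_RUNBOOK = Runbook(
    title="No runbook yet",
    steps=["Page the service owner", "Open a war room and start the timeline"],
    fallback=["Escalate to the next on-call tier"],
)


def find_runbook(alert_name: str) -> Runbook:
    """Return the runbook for an alert, or the generic one if none exists yet."""
    return RUNBOOKS.get(alert_name, GENERIC_RUNBOOK)


def render_runbook(runbook: Runbook) -> str:
    """Format numbered steps followed by the fallback options."""
    lines = [runbook.title]
    lines += [f"{i}. {step}" for i, step in enumerate(runbook.steps, start=1)]
    if runbook.fallback:
        lines.append("If that doesn't work:")
        lines += [f"  - {step}" for step in runbook.fallback]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_runbook(find_runbook("disk_usage_high")))
```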

    Blameless Postmortems: Postmortems and post-incident reviews are another great source of information. Many organizations never finish a postmortem simply because it’s a long, tedious and sometimes stressful process. But the best way to keep an incident from recurring is to analyze why it happened in the first place and make that analysis available to the entire team. In Squadcast, you can create postmortems from within the app, and anyone on your team can view them.

    The sudden, unexpected shift to remote work introduces new risks. While each organization needs to account for its own circumstances, the practices above are a step in the right direction toward keeping operations both productive and proactive.

Written By: Prakya Vasudevan
Last Updated: October 4, 2024