Anti-patterns in Incident Response that you should unlearn

August 2, 2022

It is important to invest time and effort in understanding why a system performs the way it does and how we can improve it. Companies tend to stick with practices that have produced good results, but ignoring anti-patterns can be far worse than choosing rigid processes. In this blog, we will explore common anti-patterns in incident response and why you should unlearn them.

Common Anti-patterns in Incident Response

Just get everyone on the call

Alerting everyone each time an incident is detected is rarely the best practice. Sometimes notifying everyone is easier or adds value. For example:

  • Organizations have smaller teams, and it is easier to notify the entire team.
  • The issue is critical and getting everyone on board is a better option.

This practice does not hold up as teams scale. You end up notifying people who have nothing to do with the incident. This can lead to alert fatigue, where people get used to tuning out notifications and then miss the incidents that do need their attention.

On-call rotations and targeted alerting help route incidents efficiently and prevent burnout.
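As a rough illustration of targeted alerting, the sketch below pages only the person currently on call for the affected service instead of the whole team. The service names, responder names, weekly rotation length, and start date are all assumptions made up for this example; a real setup would pull them from your on-call tool.

```python
from datetime import date

# Hypothetical service-to-rotation mapping; all names are illustrative only.
ROTATIONS = {
    "payments": ["asha", "ben", "chen"],
    "search": ["dana", "eli"],
}

ROTATION_START = date(2022, 8, 1)  # assumed day on which week 0 of every rotation began


def on_call_for(service: str, today: date) -> str:
    """Return the one person on call for `service` in the week containing `today`."""
    members = ROTATIONS[service]
    weeks_elapsed = (today - ROTATION_START).days // 7
    return members[weeks_elapsed % len(members)]


def route_alert(service: str, today: date) -> list[str]:
    """Notify only the current on-call responder for the affected service."""
    return [on_call_for(service, today)]
```

Calling route_alert("payments", date(2022, 8, 8)) returns a single name rather than the full payments team, so everyone else stays undisturbed unless the incident is escalated.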

Using up bandwidth to give status updates

Responders deal with critical incidents where stakeholders expect constant status updates. Updates are valuable because they keep everyone in the loop and may surface more solutions. With minor incidents, teams can resolve the issue quickly and then pass updates on to the people concerned. With critical incidents, however, teams may end up spending more effort on sending updates than on resolving the incident, which can compromise the resolution process.

To address this, assign a dedicated person to handle communication and provide timely updates to stakeholders.

Progress follows Chaos

There is a perception that critical incidents inevitably involve people rushing around amid a lot of discussion, chaos, and panic. This is not always true. When multiple people are responding to an incident, it is critical that they collaborate and keep everyone in sync on the actions being taken. Chaos and panic can worsen the situation and should be avoided by defining clear roles and responsibilities. Teams should have an incident commander who makes decisions and authorizes changes that can impact the outcome. Teams can also use chat rooms to give updates and keep a record of what was done. With these processes in place, teams can communicate effectively and prevent chaos and panic.

Incident Severity and policy discussion during an Incident Call

Debating the severity of an incident while the incident call is under way wastes time that should be spent resolving it. It is important to define unambiguous severity levels in advance, because responses, plans, and policies are chosen based on severity. Ideally, the rules should be technically driven, clear, and automated so that every incident arrives with a pre-defined severity level.
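A minimal sketch of what automated, technically driven severity rules could look like is shown below. The signal fields, the thresholds, and the SEV1 to SEV4 labels are assumptions chosen for illustration, not values taken from this article; the point is only that the rules are written down ahead of time and evaluated by code rather than debated on the call.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    """Measurements attached to an alert; the field names are hypothetical."""
    error_rate: float       # fraction of failed requests, 0.0 to 1.0
    customer_facing: bool   # whether the affected service serves end users


# Assumed thresholds, ordered from most to least severe. The first match wins.
SEVERITY_RULES = [
    ("SEV1", lambda s: s.customer_facing and s.error_rate >= 0.25),
    ("SEV2", lambda s: s.customer_facing and s.error_rate >= 0.05),
    ("SEV3", lambda s: s.error_rate >= 0.05),
]


def classify(signal: Signal) -> str:
    """Return the first matching severity, or SEV4 if no rule matches."""
    for level, matches in SEVERITY_RULES:
        if matches(signal):
            return level
    return "SEV4"
```

Because the rules are an ordered list, changing a threshold is a reviewed change made between incidents, not a discussion held during one.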

Training and drills should be conducted to educate teams on how to handle these situations better.

Not escalating incidents to the right responders

Teams fail to inform the right responders when they have no mechanism for associating incidents with the people responsible for them. Finding the right person then involves going back and forth, which slows down the response. The right people can also be missed when multiple teams are involved and team structures are complex. Every team should have an identifiable, reachable contact, and there should be a clear, well-oiled mechanism for routing alerts and escalations to the right responders.
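One way to make that mechanism explicit is an ownership map in which every service has an ordered escalation chain. The sketch below assumes hypothetical service names, responder labels, and a 10-minute acknowledgement timeout; it is an illustration of the idea, not the behaviour of any particular tool.

```python
from typing import Optional

# Hypothetical ownership map: each service has an ordered escalation chain.
ESCALATION_CHAINS = {
    "checkout-api": ["payments-primary", "payments-secondary", "payments-manager"],
    "search-index": ["search-primary", "search-manager"],
}

ACK_TIMEOUT_MINUTES = 10  # assumed time each responder has to acknowledge


def responder_to_page(service: str, minutes_unacknowledged: int) -> Optional[str]:
    """Return who should be paged now, or None if the service has no owner.

    The chain advances one step for every ACK_TIMEOUT_MINUTES without an
    acknowledgement and stays on the last entry once the chain is exhausted.
    """
    chain = ESCALATION_CHAINS.get(service)
    if not chain:
        return None
    step = minutes_unacknowledged // ACK_TIMEOUT_MINUTES
    return chain[min(step, len(chain) - 1)]
```

A None result is itself useful: it flags a service that nobody owns, which is best discovered before an incident rather than during one.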

Postmortem Failures

Postmortems are important for incident response because they help you learn from the events that happened in the past and help you plan your future actions.

Postmortems fail for various reasons:

  • Some teams are frequently stressed by deadlines and unplanned incidents, so once an incident is resolved, no postmortem is done.
  • Sometimes postmortems turn into blame games. A good postmortem happens only when people are open to discussing problems honestly. If people are afraid of being blamed, the postmortem loses its purpose, which is to find solutions to problems.
  • In some cases, postmortems are done just because the process demands it and not to find answers.

Without postmortems, you fail to recognize what’s working and where you can improve. Most importantly, they help you avoid making the same mistakes in the future. Hence, postmortems should be an integral part of the incident response process and must be done sincerely.

(Check out Squadcast's Postmortem Templates)

Inflexible Policies and Processes

Organizations find comfort in practices that have produced successful results and like to continue with them. However, some events cannot be anticipated, and established solutions do not always work. Flexible policies and processes help you adapt to changing requirements and find the right solutions when needed. This does not mean being reckless: introduce sensible changes, but don't be afraid to make them. Some changes will slow things down in the short term but promise faster and better results in the long run.

Putting on multiple hats

Incidents are confusing at the best of times. People taking on different roles without telling anyone only adds to the confusion. In high-pressure situations, people are expected to act quickly, often with limited information coming in and little clarity on who needs to do what. This only makes the situation worse. Hence, it is important to define clear roles and responsibilities for people. As an individual, you should also keep others involved and informed when your role or your actions change.
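As a small illustration, a role assignment can be checked before or during an incident so that no role is left empty and nobody is quietly wearing two hats. The role names below are an assumed set chosen for this sketch; use whatever roles your own process defines.

```python
from collections import Counter

# Assumed set of roles that must be filled for every incident.
REQUIRED_ROLES = ("incident_commander", "communications_lead", "ops_lead")


def validate_roles(assignments: dict[str, str]) -> list[str]:
    """Return a list of problems with a role -> person mapping; empty means OK."""
    problems = []
    for role in REQUIRED_ROLES:
        if role not in assignments:
            problems.append(f"unfilled role: {role}")
    counts = Counter(assignments.values())
    for person in sorted(counts):
        if counts[person] > 1:
            problems.append(f"{person} holds {counts[person]} roles")
    return problems
```

For example, an assignment where the same person is both incident commander and communications lead, with no ops lead named, is reported with two problems instead of being discovered mid-incident.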

Conclusion

Incident response is a field where we constantly look for process and stability, but ignoring anti-patterns can be far worse than settling on rigid processes.

Incident response teams need to identify these issues early so they can save time, prevent frustration, and reduce rework in the long run. Hence, it is very important to unlearn anti-patterns and adopt new processes that help accelerate incident response.

Written By:
Vishal Padghan

Last Updated:
May 2, 2024