Keeping your teams and customers in the loop during downtime

August 12, 2020

Communication is key. This is true in all aspects of life, and it especially applies to managing critical incidents in your company. Communicating effectively with your customers maintains goodwill and keeps them using your product; failing to keep them informed can result in angry customers and lost business. Building and maintaining good communication channels within your company and with your customers is key to ensuring your product continues to be patronised. Done properly, when (inevitable) outages occur, the impact, both technical and emotional, is limited.

In a technical sense, proper communication channels help technical teams who may be unaware of how each other operates to work together efficiently and resolve problems quickly. In an emotional sense, business teams, management, and customers will be happy to know that their data and technology are in the hands of people who know what they’re doing. These stakeholders are more comfortable when they are included in the process, have their concerns proactively acknowledged, and are treated as equals by technical teams.

Why audience-appropriate communication is important

Many “techies” enjoy communicating in complex, domain-specific language when managing their services, and this rarely poses a problem in the day-to-day of their jobs. However, when a service-impacting incident is underway, it’s not just you and your team members who are fixing it: everyone is. In the same way spectators at a football game cheer on their home team, your managers and customers are there to provide you with the support you need to resolve issues. But when you speak to them in complex language that is difficult to understand, you unintentionally gate-keep, which prevents those same people from assisting you because they just don’t understand what you’re saying. Equally, product managers and marketing have a tendency to “sanitise” public communication to customers; by the time information makes its way into customers’ inboxes, it’s functionally useless.

Consider the following: “Our internet service provider has published incorrect routing information, which means that everything on our internal network does not know how to reach the internet. We can fix it by temporarily overriding the incorrect routing information, but our ISP will need to correct their configuration.” This explanation uses clear, common phrases to explain the issue: internet service provider, internal network, internet, routing information. It contains no overtly technical information, and it also explains how the issue will be resolved. Ironically, I have rarely seen communication this clear from technical staff. Usually, a manager comes along with a question like “why is the network down?”, and answers like the following are given:

  • “The router is broken, I’m fixing it.”
  • “One of our vendors misconfigured their BGP and has taken down our network.”
  • “We’re receiving a strange route from our extranet.”

The first response lacks any clarifying information and adds no context to the question. The last two are so technical that anyone without domain-specific knowledge, including people in other technical teams, will not understand them. This example was adapted from the outage review Cloudflare published for their outage on the 17th of July 2020. I’m a Cloudflare customer, and the way they clearly communicated their understanding of the issue, the steps they took to resolve it, and their post-mortem gave me the confidence to continue being their customer. If they had responded with “the service is out, we’re looking into it”, I would have moved to a better provider. This is usually what happens when simplistic, pointless updates are given: people lose trust in the service provider’s ability to actually do their job.

Hot tip: customers already know the service is out. They don’t need confirmation that the outage is occurring; they need assurance that the service is being fixed.

These pointless updates are usually caused by technical teams who provide little to no context to product managers and external communications teams. In turn, those teams have nothing more to pass on to customers. Conversely, it is possible to be “too communicative”, whereby you notify customers of outages affecting individual pieces of infrastructure that are redundant or that will not impact the customer in a material way.
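
One way to make the “too communicative” boundary concrete is a small rule that decides whether an infrastructure event warrants a customer notification at all. The sketch below is illustrative only: `InfraEvent`, its fields, and `should_notify_customers` are hypothetical names invented for this example, not part of any tool mentioned in this article, and real impact assessment will usually need more nuance than two flags.

```python
from dataclasses import dataclass


@dataclass
class InfraEvent:
    """Hypothetical description of an infrastructure failure."""
    component: str
    covered_by_redundancy: bool  # a redundant peer is still serving traffic
    customer_impacting: bool     # customers can observe degraded service


def should_notify_customers(event: InfraEvent) -> bool:
    """Notify only when customers are materially affected.

    Failures absorbed by redundancy, or with no material customer
    impact, stay in internal channels.
    """
    return event.customer_impacting and not event.covered_by_redundancy


# A failed replica whose peer took over: internal news only.
replica_failure = InfraEvent("db-replica-2", covered_by_redundancy=True,
                             customer_impacting=False)
# A routing failure that cuts customers off: notify.
routing_failure = InfraEvent("edge-router", covered_by_redundancy=False,
                             customer_impacting=True)
```

With a rule like this, the failed replica is logged internally while the routing failure is published to customers.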

There are a few keys to communicating with your customers that will ensure that they remain customers:

  • Honesty: if you are honest with customers, they will trust you.
  • Clarity: if you are clear with your customers, they will understand you.
  • Teamwork: if your company works together during an outage, your customers will be happier. You can only work together with appropriate communication.
  • Timeliness: customers need information so that they can resolve downstream issues with their services. The quicker you get information to them, the faster they can limit the impact on their own customers.

These four behaviours ultimately result in a better experience overall, and they develop and enhance the relationship you have with your customers. They change the dynamic from “us vs. them” to “we are in this together”.

Using different methods to communicate effectively

Now that we understand what we need to communicate, we need to know how to convey that information to those who need it. During an outage, there are four methods of communication, each building on the last, that provide each audience with the information they need to do their jobs.

Direct Communication

Whether it happens in person, via chat, or on a conference call, this communication takes place directly between the people “on the ground” fixing the issues. It should be highly technical, giving technical staff the information they need to fix technical problems.

War Room

A War Room is a place where technical and non-technical staff come together to provide updates and discuss a critical outage. Generally, updates should only be provided when the status of an incident changes (e.g. the cause is discovered or service restoration is beginning). An Incident Commander (IC) should also provide non-technical updates in the incident notes to ensure that appropriate communications can be drafted for all stakeholders.
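
The “only update when the status changes” rule can be sketched as a small filter over an incident timeline. This is a minimal illustration under assumed names: the `(status, note)` pairs, the status labels, and `status_change_updates` are hypothetical and do not describe how any particular incident tool works.

```python
from typing import Iterable, Iterator, Tuple


def status_change_updates(events: Iterable[Tuple[str, str]]) -> Iterator[str]:
    """Yield an update only when the status differs from the previous one.

    Each event is an illustrative (status, note) pair.
    """
    last_status = None
    for status, note in events:
        if status != last_status:
            yield f"[{status}] {note}"
            last_status = status


# Hypothetical timeline: the second entry repeats the current status,
# so it is not broadcast to the War Room.
timeline = [
    ("investigating", "We are looking into connectivity problems."),
    ("investigating", "Still checking router logs."),
    ("identified", "Our ISP has published incorrect routing information."),
    ("restoring", "We are temporarily overriding the incorrect routes."),
]
updates = list(status_change_updates(timeline))
```

Here four timeline entries produce three updates, one per change of status.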

Status Page

A public Status Page should be made available to all customers and potential customers of your service. This is vital as it keeps your service transparent, and it also provides a central place for your customers to find information in the case of an outage. The information here should be non-technical in nature and provide customers with what they need to make critical decisions regarding their own services.

Postmortem

Within 48 hours of an incident being resolved, you should provide a full postmortem of the incident on your blog and/or status page. This gives your customers a full understanding of why the incident occurred, how it was resolved, and what actions can be taken to limit similar outages in the future. It is a blend of non-technical and technical content, with a business summary at the start followed by a technical analysis. Customers should also be invited to ask questions about the incident on social media channels to ensure that any concerns they have are addressed.
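
The 48-hour window is easy to track mechanically. The sketch below is a minimal example, assuming timezone-aware timestamps; `postmortem_due` and `is_postmortem_overdue` are hypothetical helper names, and the timestamps are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# The 48-hour publication window suggested above.
POSTMORTEM_WINDOW = timedelta(hours=48)


def postmortem_due(resolved_at: datetime) -> datetime:
    """Return the latest time the postmortem should be published."""
    return resolved_at + POSTMORTEM_WINDOW


def is_postmortem_overdue(resolved_at: datetime, now: datetime) -> bool:
    """True once the publication window has passed."""
    return now > postmortem_due(resolved_at)


# Illustrative resolution time (UTC); not taken from a real incident record.
resolved = datetime(2020, 7, 17, 22, 0, tzinfo=timezone.utc)
due = postmortem_due(resolved)
```

A reminder job could call `is_postmortem_overdue` with the current time and nudge the incident owner once it returns `True`.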

Using Squadcast to keep your customers in the loop

Squadcast has many features that can help you keep your customers engaged and informed during an outage. Implementing them in your incident management process is simple, and they integrate with your existing Squadcast usage.

Incident Notes (previously War Room) is an excellent tool for keeping everyone up to date. Use it to drive inter-team communication by having an Incident Commander (IC) who can translate between technical and non-technical staff. Ensure that all communications are addressed to teams or individuals so that nothing is missed. Finally, be sure to send your account managers, support staff, and management to War Rooms for updates; your IC should be providing non-technical updates as the status of your incident changes.

StatusPage is Squadcast’s tool for providing public updates to your customers. StatusPage allows you to provide updates from within Squadcast’s Incident Page, reducing the need for your team to jump between tools to provide customer updates. Users can simply select the option to Update the StatusPage, provide a status and message for the incident and publish it to customers. Having such an easily accessible solution for support staff means that communication processes can be augmented without adding burden or extra work. It’s all conveniently located in one central place.

The Incidents Page should be your one-stop shop for all information pertaining to an incident. Your postmortem should derive all of its information from this page, and staff should be encouraged to record both technical and non-technical updates within the incident. By doing this, technical staff can be easily removed from the external communications process (which they probably find boring), and communications staff know they can rely on the information they obtain from the incident.

Start your journey today

Making your organisation more transparent is not always an easy process, but using some of the tips and tools we’ve provided in this article will give you an idea of how to begin. The core message is that you need to make communication a cultural pillar of your organisation. Don’t just write a procedure that says that “staff should communicate with each other”; encourage communication in every part of your organisation. When outages occur, get everyone on a phone call. Have your communications teams sit with technical staff to understand how the business runs. Encourage customers to follow up with your team for information following an outage. There are many things you can do to get started, but the most important thing is that you do something!

Written by: Adam Hammond
Last updated: November 20, 2024