Mark Henderson from Stack Overflow shares his experience on being an SRE

July 11, 2019

How did you become an SRE?

I started off in Australia with a typical IT career: doing retail, call center and help desk work at a variety of companies, including Cisco. When I graduated from university, I told my employer at the time that I was planning on leaving and getting a “real” job. They referred me to the person who designed their help desk software, and I worked for him for around 9 years. I started by doing application development and eventually built out a small datacenter. I was the sole employee when I started, but just one of a larger team when I left.

I moved to New York City to join Stack Overflow in 2015, which was my first job that had “SRE” in the title. I went from being the sole systems administrator in a very small company to being a part of one of the most efficient SRE teams in the world. I got to learn from some of the best in the industry: George Beech, Tom Limoncelli, Kyle Brandt, Nick Craver, and many others. I’ve worked on virtually every part of Stack Overflow: the public infrastructure that serves over 40 million developers with less than one rack of hardware, the logging infrastructure that ingests and analyses over half a terabyte of logs every day, and the CI/CD pipelines that keep Stack Overflow updated and in check.

Currently, I work on the Azure infrastructure and tooling for Stack Overflow Enterprise, a totally private version of Stack Overflow that we can run for you, for the proprietary code questions that you can’t ask on the internet.

What's the most challenging part of your job?

SREs love to work on SLAs, SLOs, monitoring, and metrics. “Measure everything” is one of the tenets of SRE work - but it’s just one. It’s hard to get out of the mindset of just measuring everything and to start looking at the other things SREs should be doing, such as reducing organisational silos. It’s very easy to fall into the trap of only working on easily monitored metrics (such as a latency budget) instead of the intangible ones (such as increasing cross-team collaboration).

What processes, tools and techniques can't you live without?

There is no single tool that I can’t live without. Ask 5 SREs what their toolsets are and you’ll get 5 different answers. The fact is that the tools we use are secondary to the goals we’re trying to achieve. However, one thing that is not negotiable for me is having a quiet space to work. Right now I work from home, which for me is wonderful. But even when I worked in an office, having a private office with a door that closes was worth everything. Having a private space means not having to fight against the cacophony of an open-plan office or the dull drabness of a cubicle. Either working from home or having an office with a door that closes is non-negotiable for me. Particularly in western culture, we need to stop thinking “Private office == more status or higher rank” and start thinking “Private office == ability to focus”, and give people access to the work environment that makes them the most productive.

Any productivity hacks that you would give to new SREs?

Use your calendar to its full potential. This isn’t really an SRE-specific hack, just generally good life advice. It makes scheduling things so much easier with your coworkers. Don’t be afraid to schedule a meeting with yourself on your calendar to give yourself some actual work time if you start to get overwhelmed with meetings.

Put your personal items on the calendar too - even which recycling bin goes out onto the street on which day (mark them as private if you don’t want your coworkers to see the details). If you have coworkers in other time zones, add those time zones to your calendar so you can see at a glance what time it is there.

To give some actual SRE advice: validate that your SLOs are meaningful. If you find out that your SLO was pulled out of thin air, then perhaps your error budgets or latency budgets are needlessly strict. Find out what they should actually be and you might find out that your job becomes a lot easier.
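As a rough illustration of why the SLO number matters, the sketch below turns an availability target into the downtime it allows. It assumes a simple time-based availability SLO over a 30-day window; the function name and the example targets are hypothetical and are not taken from Stack Overflow's own SLOs.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed downtime, in minutes, for an availability SLO over a window.

    `slo` is a fraction such as 0.999. Hypothetical helper for illustration only.
    """
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be a fraction in (0, 1]")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo) * total_minutes


if __name__ == "__main__":
    # Each extra "nine" cuts the budget by a factor of ten.
    for target in (0.999, 0.9999):
        budget = error_budget_minutes(target)
        print(f"{target:.2%} over 30 days -> {budget:.1f} minutes of budget")
```

Under these assumptions, going from 99.9% to 99.99% shrinks the monthly budget from roughly 43 minutes to roughly 4, which is why a target pulled out of thin air can make the job much harder than it needs to be.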

What are some of the things people get wrong about this role?

SRE should not be a silo on its own. SRE is not just a drop-in replacement for traditional systems administration. It is not a replacement for DevOps. SRE should encompass the pillars of DevOps: sharing responsibilities with developers, working in small batches, and not placing blame.

If you have an existing systems administration team, you can’t just rebrand them as “SRE”, hand them a copy of the Google SRE book and let them loose. Migrating from traditional systems administration to the SRE mindset requires organisational change. Doing SRE successfully means that your SREs need to start breaking down walls with the development teams that they are supporting: working with the devs and, just as importantly, having the devs work with the SREs. It’s not an overnight change, and although there are many wrong ways to do SRE, there is no one right way.

Secondly, you do not need to be an amazing developer to do SRE. A large part of SRE is automation - but that does not mean you have to write everything from scratch yourself. If you understand how to read a JSON file, you can write a Terraform configuration. If you understand how to write a PowerShell cmdlet, then you can implement Octopus Deploy. You have to be willing to do some coding, but you do not need to be a world-class developer.

Is there any book, video, talk, or tech that has inspired you lately, and why?

It sounds like nepotism, but I really enjoy the talk given by Tom Limoncelli (who happens to be my manager) on “DevOps Where You Wouldn't Have Expected”. He thinks critically about the DevOps principles that we use so often in SRE and applies them to places outside of systems administration and SRE - such as new employee onboarding.

Of course, there’s always the venerable The Phoenix Project. It really is a must-read for anyone working in SRE.

On a more recent front, “Retrospectives for Humans (a Crash Course)” by Courtney Eckhardt was probably my favourite talk at SRECon 2019 Asia/Pacific (the most recent conference I attended). She gives an excellent perspective on retrospective/post-mortem analysis by comparing it to a real-world disaster investigation and the language that was used to describe the cause of the accident. Because of the nature of the English language, the real cause of the disaster was never actually found. By being aware of the faults in our tools (in this case study, the tool was the English language), we can reach much more useful conclusions.

Follow the journey of more such inspiring SREs from around the globe through our SRE Speak Series.

Written By:
Prakya Vasudevan
Bipin VK
Copyright © Squadcast Inc. 2017-2024

Last Updated:
November 20, 2024
Mark Henderson has been a Site Reliability Engineer at Stack Overflow since 2015. Before this he worked as the sole systems administrator at a small software company in Sydney, Australia. These days, he lives in South Australia and works from home with his wife and two children.
