Hrushikesh shares his journey into SRE and his thoughts on the future of this space

March 5, 2020

1. How did you become an SRE?

When I joined Vuclip in 2015, I was involved with a project that worked on designing a completely new platform for the product. During this time, I got more involved with automation, infrastructure planning and monitoring. This is when I felt that I was not just working with two teams but with two completely different mindsets. 

I was later approached by the Head of Operations to join his team. He was on a mission to implement SRE as a culture within the organization. This was when I was first introduced to the term SRE. 

We then worked together to take small, incremental steps towards building an SRE culture: implementing monitoring tools, finding tasks to automate and automating them, and templatizing services, among other things.

2. What's the most challenging part of your job? 

The most challenging part of SRE is getting people to understand that there is an issue with the way things are done today in the ops world. This cultural shift can happen only when they understand the power of automation and how it can make processes more reliable.

3. What processes, tools and techniques can't you live without?

If you are planning to move into the SRE space, some things to keep in mind are:

  • Separate mission-critical functionality from value-added functionality.
  • Validate all the mission-critical services and raise clear risk items with the product / project managers and their respective engineering teams.
  • Define realistic uptime goals and map out potential risks by collaborating with the product / project managers and engineering teams.

4. What according to you is the future of SRE?

In my view, SRE solves many of the issues that traditional operations thinking creates for companies, such as:

Addressing the silos between development and operations: Business and product teams think that reliability is the responsibility of the developers and operations. Developers think that reliability is the responsibility of the operations team. The operations team thinks that the developers should also be responsible for the reliability of the systems they build. There’s a lot of hassle around just making changes to the system when its reliability is at stake.

SRE as a culture eliminates this problem by equipping the product and engineering teams with tools and techniques and by making reliability part of the product requirements. When reliability is part of the product requirements, engineers take more responsibility for the code they ship.

In traditional operations, downtime, reliability and quality of service are usually based entirely on assumptions. SRE helps quantify uptime and provides a process to link it with feature releases. Solving these issues reduces downtime and lets organizations focus on scaling their businesses instead of just maintaining them.

Clear ownership, reliability and the ability to be more process-driven are a few of the reasons I see SRE catching on even with smaller businesses and startups.

5. Any productivity hacks that you would give to new SREs?

Segregate the functionalities of your product into minimum viable products (MVPs). Also make sure you have fallbacks for any functionalities you can’t do without.

Make sure you have tight SLOs (for example, 99.99% uptime) for the services that your MVPs depend on, to ensure that your SLAs are not breached.
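To make a figure like 99.99% concrete, it helps to convert it into the amount of downtime it actually allows. The sketch below is only an illustration: the function names, the 30-day window and the assumption that dependencies fail independently are choices made for this example, not something prescribed in the interview.

```python
def allowed_downtime_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Minutes of downtime an uptime SLO permits over a window (30 days assumed)."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (1 - slo_percent / 100)


def composite_availability(*dependency_slos_percent: float) -> float:
    """Availability (in percent) of a flow that needs every dependency up at once.

    Assumes the dependencies fail independently, which is a simplification.
    """
    availability = 1.0
    for slo in dependency_slos_percent:
        availability *= slo / 100
    return availability * 100


if __name__ == "__main__":
    # 99.99% over 30 days leaves roughly 4.32 minutes of downtime.
    print(round(allowed_downtime_minutes(99.99), 2))
    # Three dependencies at 99.99% each give the flow slightly less than 99.99%.
    print(round(composite_availability(99.99, 99.99, 99.99), 4))
```

The second function shows why the services an MVP depends on need SLOs at least as tight as the promise made to customers: every extra dependency in the critical path lowers the availability of the flow as a whole.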

Designing fallbacks should be an integral part of designing and developing your service. 
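As a rough sketch of what designing a fallback into a service can look like, the helper below tries a primary call and falls back to a degraded answer when it fails. All names here (with_fallback, fetch_live_catalog, cached_catalog) are hypothetical and exist only for this example.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def with_fallback(primary: Callable[[], T], fallback: Callable[[], T]) -> T:
    """Return the primary result, or the fallback result if the primary raises."""
    try:
        return primary()
    except Exception:
        # In a real service you would also record the failure for monitoring.
        return fallback()


def fetch_live_catalog() -> list:
    """Hypothetical primary call; here it simulates an outage."""
    raise ConnectionError("catalog service unavailable")


def cached_catalog() -> list:
    """Hypothetical fallback: a stale but usable copy."""
    return ["item-1", "item-2"]


if __name__ == "__main__":
    # The caller still gets a usable answer while the primary is down.
    print(with_fallback(fetch_live_catalog, cached_catalog))
```

The point of the sketch is that the fallback is chosen at design time, alongside the primary path, rather than improvised during an incident.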

For your value-added services, the flow should be separated from the MVP flows. For example, site load metrics and logging services should be completely isolated from the main application, so that any downtime in the logging service does not affect site load.
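One possible way to get this kind of isolation inside an application is to hand log and metric events to a bounded, non-blocking buffer that a separate worker drains. The class below is a minimal sketch under that assumption; BestEffortEmitter and its methods are hypothetical names, and a real setup might isolate at the process or network level instead.

```python
import queue
from typing import Callable


class BestEffortEmitter:
    """Buffers value-added events (logs, metrics) without ever blocking the caller."""

    def __init__(self, max_pending: int = 1000) -> None:
        self._pending = queue.Queue(maxsize=max_pending)
        self.dropped = 0

    def emit(self, event: dict) -> bool:
        """Called from the main request flow: never blocks and never raises."""
        try:
            self._pending.put_nowait(event)
            return True
        except queue.Full:
            # The logging backend is slow or down: drop the event, keep serving.
            self.dropped += 1
            return False

    def drain(self, send: Callable[[dict], None]) -> int:
        """Called from a separate worker; send() failures stay inside this method."""
        sent = 0
        while True:
            try:
                event = self._pending.get_nowait()
            except queue.Empty:
                return sent
            try:
                send(event)
                sent += 1
            except Exception:
                self.dropped += 1
```

With this shape, a slow or failing logging backend costs some dropped events, but the request path that calls emit() is not delayed by it.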

6. What are some of the things people get wrong about this role?

There is a common notion that if you know how to automate things, you are automatically an SRE. Automation as a skill is not limited to just SREs.

Automation as an approach to solving technical problems has been popular with many engineering and DevOps teams. Site reliability best practices define a methodology for most of these actions and use it to avoid a linear increase in operations work as systems scale with business growth. And this can be learnt. Anyone with a mindset to create scalable systems reliably can be an SRE.

The other myth in this space is that only the SRE team is responsible for the uptime of a product or service. However, site reliability engineering provides a way to collaborate with product and development teams and ensure that reliability is kept in mind from the start of the design and development phase.

Follow the journey of more such inspiring SREs from around the globe through our SRE Speak Series.

Written By:
Prakya Vasudevan