Nishant Singh shares his thoughts on being an SRE

August 5, 2020

How did you become an SRE?

I have had a keen interest in computers since my early school days. The first language I learned in school was C++, which got me fascinated with exploring computing. I started by fiddling with my Windows PC using C and C++, then got a taste of shell scripting, which took me into the world of basic computer hacking and networking. I began writing exploits and logic bombs, which mostly ended up crashing my own system. During my early days at college, I developed a taste for backend and distributed systems.

Cut to a few years later, I got my first job - an internship with a security company that dealt with AWS and a multi-tenant system for its customers. My boss’s boss, who interviewed me, had revolutionary ideas for the company as a whole, and I was at the center of it all. I saw the whole shift towards the DevOps mindset in this organization and understood how essential it is for a company to keep technology aligned with the business. As I put in the time to learn more about the craft, I learned about SRE practices from Google’s way of running its production systems. All of this fueled my zeal to find the right people and place to work with, which led me to LinkedIn.

What's the most challenging part of your job?

The technology world is vast, and SRE makes up a large part of it. There is an abundance of great problems to solve, and the solutions are just as interesting. One can’t set a fixed duration for picking all of this up. Today, there is no foolproof course that lets you graduate feeling like a rockstar SRE. It helps a lot to learn on the job and keep grinding towards becoming cloud agnostic, learning more about application development, maintenance, and scaling infrastructure on any cloud.

Another interesting challenge is constantly making sure that the production environment remains stable. In most cases, even though the SRE is responsible for the service’s reliability, it is the application developers who own the actual application logic. The downside is that you may miss minor details that change with every release, since you don’t directly contribute to the code. Ultimately, it comes down to the SRE to learn the application and business logic, which in turn helps you pitch ideas during the design phase of application development.

What processes, tools, and techniques can't you live without?

Automation is one of the core processes that plays a central role in my life. Right now, I depend on Python for most things. Apart from that, I spend a good amount of time with Terraform, Ansible, Azure, and Kubernetes (K8s) as part of the daily grind.

I believe that having a good monitoring stack backed by an effective logging system is a huge add-on. Oftentimes, teams do not invest much in logging systems because of the operational overhead of maintaining them. However, if logging systems are configured correctly, they can reduce the MTTR (mean time to recovery) quite significantly.
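As a rough illustration of what "configured correctly" could look like, here is a minimal sketch of structured logging using only Python's standard library. It is not taken from the interview: the `JsonFormatter` class, the `build_logger` helper, and the `service`, `region`, and `request_id` fields are all hypothetical names chosen for the example. The idea is that one JSON object per line, carrying consistent fields, is easier to search and correlate during an incident than free-form text.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    # Hypothetical context fields; a real service would pick its own.
    CONTEXT_FIELDS = ("service", "region", "request_id")

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Values passed via `extra=` are set as attributes on the record.
        for key in self.CONTEXT_FIELDS:
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload, sort_keys=True)


def build_logger(name: str, stream=None) -> logging.Logger:
    """Return a logger that writes JSON lines to `stream` (stderr by default)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger("checkout")
    log.error(
        "upstream timeout",
        extra={"service": "checkout", "region": "eu-west", "request_id": "abc123"},
    )
```

With fields like these in every line, a query such as "all errors for one `request_id`" becomes a simple filter in whatever log backend the team runs, which is one plausible way logging shortens time to recovery.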

What is your most memorable on-call story as an SRE?

The craziest on-call I was a part of involved a misbehaving NIC, triggered by a configuration issue on the top-of-rack switch. This disrupted the service for an entire region.

We then narrowed the problem down to an issue in the network configuration (and no, it wasn’t DNS) that caused the application's communication issue. The interesting part about this story was that the issue was never observed during the usual manual debugging. This experience taught me that it can help to think from a machine’s perspective while debugging, however hard that may be.

What, according to you, is the future of SRE?

I think the role of SRE will get more specialized and streamlined in the future, with newer applications and technologies. A few technologies to keep an eye on are machine learning, neural networks, image processing, etc. The future will require the SRE function to adopt more skills than just software engineering and operational knowledge.

The world is moving to a more skill-based economy. This means an SRE will be expected to match the skills of a domain expert and help them architect their stack more efficiently, on top of what is already expected today.

Any productivity hacks that you would give to new SREs?

Prioritize your stuff at the beginning of each day. I generally have a list of tasks that I write on a piece of paper to keep as a reminder of things that I need to finish that day. This helps me focus on the important tasks at hand.

The other thing you must learn is working with multiple displays and organizing your terminal. Choose a tool like tmux, or anything else you prefer, especially if you spend most of your time SSH’ing into boxes. Also, stick with an IDE of your choice that you find really fun and effective to use.

What are some of the things people get wrong about this role?

SRE definitely involves writing code, and at times a lot of it, depending on the team you are a part of. A common misconception is that the role is more focused on operations than on actually writing code. A good SRE culture strikes the right balance between writing code and doing operational work.

At first, for someone coming from a pure operations background, code looks daunting. But I think one should focus on the logic of solving the problem rather than on how to do something in a specific language. Languages come and go; it's the logic underneath that helps you get to a solution in any language you write code in.

What are some of the best practices you’ve picked up along the way?

Some of the best practices I have learned are:

  • Always think first and code later.
  • Think about edge cases while designing solutions.
  • Think about the end-user and understand how your application is consumed.
  • Always write simple code and advocate this to your teammates as well. Not every person on the team will be at the same skill level, but the one thing you leave behind when you move to a different role or company is the code you write. This can either haunt the next person or let them sleep peacefully. Always make sure you write code that achieves the latter.
  • Keep learning as much as you can, even if you aren’t certain about picking something up completely. It’s absolutely okay to revisit abandoned side projects after a year if that helps.
  • Above all, WRITE everything down! Document as much as you can, because it takes a hell of a lot of effort to maintain services and not everyone on the team will have the same level of confidence with a particular service.
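To make the points about edge cases and simple code a little more concrete, here is a small hypothetical sketch. The `availability` function and its rules are assumptions made for illustration, not something described in the interview. It computes a success ratio for a service and decides explicitly what happens in the awkward cases: no traffic at all, negative counts, or more successes than requests.

```python
def availability(successful: int, total: int) -> float:
    """Return the fraction of successful requests, in the range 0.0 to 1.0.

    Edge cases are handled explicitly instead of being left to chance:
    - negative counts are rejected as invalid input
    - more successes than requests is rejected as inconsistent data
    - zero traffic is reported as 1.0 (an assumption: no requests failed)
    """
    if successful < 0 or total < 0:
        raise ValueError("request counts cannot be negative")
    if successful > total:
        raise ValueError("successful requests cannot exceed total requests")
    if total == 0:
        # Avoid ZeroDivisionError; treating "no traffic" as fully
        # available is a design choice that a team may decide differently.
        return 1.0
    return successful / total


if __name__ == "__main__":
    print(availability(999, 1000))
    print(availability(0, 0))
```

Whether an idle service should count as 100% available is exactly the kind of question that is cheaper to settle at design time than during an incident review.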

Is there any book, video, talk, or tech that has inspired you lately, and why?

Apart from the regular tech books, one book I recently read is “The Phoenix Project”. It follows the journey of a company and the challenges involved in building and maintaining its numerous departments, such as Security, IT, and engineering.

The book makes a convincing case for a DevOps cultural shift. It does this by taking you through a fictional working environment with a focus on breaking down silos. For anyone who is new to the role, I definitely recommend reading this book. Another great book is “Clean Code” by Robert C. Martin, aka Uncle Bob, which I highly recommend for learning the basics of writing clean, production-grade code.

What, according to you, makes a good SRE?

I think a good SRE is someone who is systematic in their approach to solving a problem, but a great SRE is someone who also stays focused while solving it. Reaching the latter takes time, usually years of experience.

The initial years are spent mostly fighting fires and getting panic attacks. However, all of these hard experiences will push you to develop the skill to do it efficiently.

Along with this, it is important for an SRE to be a team player and to inspire everyone on the team to do the ‘right’ things even in tough situations.

Follow the journey of more such inspiring SREs from around the globe through our SRE Speak Series.

Written By: Squadcast Community