1. How did you become an SRE?
I have loved computers since I was young. In 10th grade, I started building PCs and troubleshooting them, installing a lot of software and games on both Windows and Linux, which drew me toward systems administration. In college, I managed all of the college's infrastructure and discovered my love for troubleshooting and maintaining systems.
As I started my professional journey, I was tasked with many automation projects as part of my DevOps role. All this experience propelled me onto the path of becoming an SRE.
2. What's the most challenging part of your job?
For me, it's expecting the unexpected. To quote John Wilkes of Google: “Computer components are very reliable but once you have a lot of them they fail all the time.” Maintaining that illusion of stability for external users, even when you have thousands of failure domains internally, is probably the hardest thing for an SRE to achieve. So when there is a failure, detecting it quickly, building automated recovery, and finding a resilient fix are what keep me busy.
3. What processes, tools, and techniques can't you live without?
Currently, I rely on Atom as my IDE and Ansible for pushing configs around. I am always looking for good articles and open source projects on the web. I try to run them in my own dev environments and, where possible, contribute to them as well. I have been using Kubernetes a lot lately, both at work and in my personal projects, so it is what I am currently most focused on.
4. What, according to you, is the future of SRE?
SREs are here to stay. Any organization that cares about its users and systems wants to build a culture around SRE. Finding and hiring a good SRE is hard because, apart from being technically strong, the person has to respect the process and have a sense of ownership of and passion for the systems they manage. In my opinion, an SRE's job is never-ending: as your business scales, your users scale, your systems scale, and so do your failure domains. To keep all of that running reliably, you need someone with incredible passion and grit who is determined to learn new skills and is curious about everything that happens in the tech world.
5. Any productivity hacks that you would give to new SREs?
6. What are some of the things people get wrong about this role?
SRE has become a buzzword in the tech world. Everyone wants to be an SRE, assuming it's just like system administration, and companies are enabling this by renaming SysAdmin jobs to SRE. But the reality is that an SRE is a software engineer who knows their way around the infrastructure running the software. An SRE understands how that software will behave when the infrastructure underneath it fails, and how to bring it back up after a failure.
With this knowledge, SREs can build reliability into software at the design level. Software designed with reliability and failure domains in mind holds up better against all sorts of expected and unexpected failures, at both the service and infrastructure level.