I ended up leaving higher education for financial reasons. I needed a job to support myself at 19, so I applied to a startup, took a series of tests, and was offered an engineering role. That launched a career spanning various engineering roles over the years, for which I am incredibly grateful.
Fast forward to about two years ago: I was in a test automation role and worked closely with a Senior DevOps Engineer, learning everything I could from them. They ended up leaving for Apple, and I put an unhealthy amount of time into understanding day-to-day DevOps activities. Shortly after their departure, I got pulled onto a new project and worked myself into the SRE role that the organization badly needed.
Understanding that you, as an individual, cannot possibly know everything. A lot of people outside the engineering industry assume that you just sit at a keyboard and magically type memorized commands into the terminal, but I don't think that's ever the case. The most challenging part of any of my engineering roles was, at first, learning how to learn and then, once I had grasped that, quickly applying that knowledge in a business environment with real deadlines.
At the moment, I’m fond of Oh-My-Zsh, kubectx, K9s, Vim, cURL, Go, and Python. I hear kustomize is great and want to dive into it (shout-out to Marcel Dempers for doing a video on this).
Ignoring the principle of least privilege early in a project and granting too much access to a Kubernetes cluster. My awful justification was that it was an MVP project. An engineer deployed manually via kubectl apply -f, and I ended up spending time tracking down an unknown change that gifted us some really unpleasant errors and turned into a late evening.
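As a hedged illustration of what least privilege could have looked like here, the sketch below builds a namespaced Kubernetes RBAC Role that only allows the read verbs get, list, and watch, plus a RoleBinding that grants it to a group. The namespace "mvp", group "mvp-developers", and role name "mvp-readonly" are hypothetical. Assuming no other bindings grant write access, a member of that group who runs kubectl apply -f against this namespace should be refused, so changes would have to flow through whatever identity does hold write access (for example a CI pipeline, which is an assumption, not something the source describes).

```python
import json

RBAC_GROUP = "rbac.authorization.k8s.io"


def read_only_role(name: str, namespace: str, resources: list) -> dict:
    """Namespaced Role that may only read the given core-group resources."""
    return {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [
            {
                "apiGroups": [""],  # "" is the core API group (pods, services, ...)
                "resources": resources,
                # Read-only verbs only: no create/update/patch/delete.
                "verbs": ["get", "list", "watch"],
            }
        ],
    }


def role_binding(role_name: str, namespace: str, group: str) -> dict:
    """Bind the Role to a group of users within the same namespace."""
    return {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{role_name}-binding", "namespace": namespace},
        "subjects": [{"kind": "Group", "name": group, "apiGroup": RBAC_GROUP}],
        "roleRef": {"kind": "Role", "name": role_name, "apiGroup": RBAC_GROUP},
    }


if __name__ == "__main__":
    # Hypothetical names: namespace "mvp", group "mvp-developers".
    role = read_only_role("mvp-readonly", "mvp", ["pods", "services", "configmaps"])
    binding = role_binding("mvp-readonly", "mvp", "mvp-developers")
    # A v1 List keeps both objects in one JSON document for kubectl apply -f.
    manifest = {"apiVersion": "v1", "kind": "List", "items": [role, binding]}
    print(json.dumps(manifest, indent=2))
```

Generating the manifest from code is just one option; the same objects are more commonly written by hand as YAML. The point is the narrow verbs list, which is cheap to set up even on an MVP.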
From a k8s perspective, I feel like a lot of that will get abstracted away and we will end up with some Heroku-esque platform, so much of the day-to-day work will shift toward uptime-focused tasks. I read what Kelsey Hightower writes on this topic and take note.
Regarding the SRE role, we will likely see it continue in its current state for a few years or, for some teams, morph into the role it was intended to be. Some organizations adopt recruiter- or FAANG-driven titles, and the workload shifts based on the context of the company. The SDET role was infamous for this.
*waves magic wand*
“You’re all SREs now”
*continues doing the same work*
Years ago, there was an argument in the test community, drawn from readings of “How Google Tests Software”, that test automation would consume the role and everyone would become an SDET. However, I didn’t see that fully happen. So many companies and organizations work at varying levels of maturity that a change made by a leading organization can take years to travel across other shops. I spoke with a company a few months ago, and they were still in the process of building an SRE team and ultimately trying to define what the average workload would look like in their context. Not everyone has Google workloads, so we end up with hybrid roles, kind of like what I’m doing now due to team size constraints.
Work on something to understand it, and from there, see whether it can be automated or addressed with another approach. When I worked in testing, I would walk through a scenario just like a user would and automate that walk-through with Selenium and pytest for anything business-critical. I apply the same idea to SRE by automating what I think is realistic.
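Carried over to SRE work, the "walk through it by hand, then automate it" idea might look like the minimal sketch below. It swaps Selenium for a plain HTTP request using only the Python standard library, so it checks what a user would hit (status code and latency) rather than driving a browser. The function name check_endpoint, the thresholds, and the example URL are hypothetical choices for illustration.

```python
import http.client
import time
import urllib.request


def check_endpoint(url: str, timeout_s: float = 5.0, max_latency_s: float = 1.0):
    """Request `url` the way a client would and return (passed, seconds).

    Passes only if the server answers HTTP 200 within `max_latency_s`.
    Connection errors, timeouts, and 4xx/5xx responses (which urllib raises
    as HTTPError, a subclass of OSError) all count as failures.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            status = resp.status
            resp.read()  # read the body, like a real client would
    except (OSError, http.client.HTTPException):
        return False, time.monotonic() - start
    elapsed = time.monotonic() - start
    return status == 200 and elapsed <= max_latency_s, elapsed


if __name__ == "__main__":
    # Hypothetical endpoint: swap in a business-critical URL you walked through by hand.
    passed, seconds = check_endpoint("https://example.com/")
    print(f"passed={passed} latency={seconds:.3f}s")
```

Inside a pytest suite, the same call could back a test such as assert check_endpoint(url)[0], which keeps the check next to the existing test tooling while leaving browser-level flows to Selenium.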
That it’s strictly an ops role. In DevOps, if I deploy something or build out any associated infrastructure, I need to take an SRE approach and own it. My “customers” can be internal, and we should still have an SLO, observability, etc. for whatever that deployment might be.
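To make "we should still have an SLO" concrete for an internal service, here is a hedged sketch of the usual error-budget arithmetic. It assumes a request-based availability SLI (successful requests divided by total requests), which is one common choice but not something the interview specifies; the name error_budget and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    availability: float      # measured fraction of successful requests
    allowed_failures: float  # failures the SLO tolerates in this window
    remaining: float         # fraction of the budget left (negative = overspent)


def error_budget(total: int, failed: int, slo_target: float) -> ErrorBudget:
    """Compare a request-based availability SLI against an SLO target such as 0.999."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be between 0 and 1 (exclusive)")
    availability = 1 - failed / total
    allowed = (1 - slo_target) * total
    return ErrorBudget(availability, allowed, 1 - failed / allowed)


if __name__ == "__main__":
    # Hypothetical numbers for an internal service over one measurement window.
    report = error_budget(total=1_000_000, failed=250, slo_target=0.999)
    print(f"availability={report.availability:.5f} "
          f"allowed_failures={report.allowed_failures:.0f} "
          f"budget_remaining={report.remaining:.0%}")
```

With a 99.9% target over one million requests, 1,000 failures are allowed; 250 failures therefore leave about 75% of the budget, which is the kind of number an internal "customer" can reason about.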
Implement something within a realistic timeframe and iterate on it. I get that not everyone has that luxury, but this is what works for me. Push your organization to address technical debt in some way every sprint.
I always enjoy seeing new work from Julia Evans (b0rk) over at https://wizardzines.com/. I grew up playing in punk bands and would always read DIY zines picked up at merch tables, so it was nostalgic seeing this approach to technology.
Having the ability to stay cool. It’s easier said than done, but when systems go down, you have to try to keep a level head while also watching the clock, and that can be incredibly scary. If you don’t already, try running a Game Day exercise or keeping runbooks at the ready.
Trying not to panic also applies to the information overload we face as engineers in this industry: new tooling, languages, etc., all while trying to balance life on top of that. At the end of the day, we are all human and can’t forget to take care of ourselves too.
Follow the journey of more such inspiring SREs from around the globe through our SRE Speak Series.