Introduction
Arild Jensen is an SRE manager based in Los Angeles, California. Currently, he is focused on incident management (both response and root cause analysis) as well as site resiliency, isolation, and recovery for Upwork, the world’s largest online talent solution that enables businesses to find and work with highly skilled independent professionals.
How did you become an SRE?
It’s been a journey for me. I started off with a PC my dad brought home when I was a teenager. After college, I knew I didn’t want to write code full time, so I began doing system administration, first for a small startup, then for an established company, expanding my skill set along the way. Then the whole DevOps movement gained traction and I used that to address some very real pain points where I worked. After I joined Upwork, we needed to establish a real site reliability team, which I’ve been deeply involved in these last few years. At the end of the day, it’s all about improving yourself, your team, and your organization. That is the fun part and that is very much what SRE is all about.
What’s the most challenging part of your job?
When you’re in a technical environment (and probably most other places, I imagine) it’s always easier to focus on the day-to-day details, as it’s typically the technical minutiae that eat up most of the minutes of your day. To make a real difference you have to rise above that, see the big picture, and start pushing everyone in the right direction. Rather than let work come to you, you have to seek out the projects and initiatives that truly make a difference. Doing that when you are constantly interrupted is a real challenge that takes some effort to overcome. But at the same time, it is definitely doable and comes with some very significant rewards.
Any productivity hacks that you would give to new SREs?
Whatever you do, make sure you truly own your work. It’s easy when you’re junior to let others tell you what to work on and how to do it. The key to being productive, especially when you’re an SRE and need to influence others in their work, is to work with others, not for them. Do that, and work suddenly becomes negotiable: you can start digging into what people really want from you, what they actually need done, and how you can use that to change how they work to improve site reliability.
What are some of the things people get wrong about this role?
This is not just a technical role. You may spend most of your day on technical matters, but the core of good SRE is the continuous improvement of your site’s reliability. The big improvements always come from changing how people work. When others understand that, they turbocharge you and truly good things happen.
What are some of the best practices you’ve picked up along the way?
Don’t trust people, including yourself. Always base your decisions on hard data, avoid “manual” changes like the plague (use tools that require everything to be in code), and have peer review of your pull requests before they make it to production. And perhaps even more important, implement a “no-blame” culture. Once people see your commitment to that, it is amazing how open and honest they are about all the gaps and shortcomings you didn’t know about, which in turn allows you to address them.
Is there any book, video, talk, or tech that has inspired you lately, and why?
I’m a huge fan of Gene Kim’s “The Phoenix Project” and am eagerly reading the follow-up, “The Unicorn Project.” They are great, fun reads about what everyone in SRE is struggling with every day. The descriptions of the battles are very much what you encounter in the real world, and the books highlight how to address them. Understanding that is absolutely key to site reliability engineering.
Follow the journey of more such inspiring SREs from around the globe through our SRE Speak Series.