Building an effective DevOps team is one of the best and most challenging decisions a growing software development group can make. That’s because a quality DevOps engineer requires a unique set of hard and soft skills. For example, a DevOps engineer needs to understand CI/CD pipelines and how to deal with the complexities of communication across organizational silos.
Your DevOps engineers will need diverse experiences to be effective. They will need application code knowledge and fluency in a programming or scripting language like Python. They must have experience with containerization technologies like Docker and Kubernetes and continuous integration tools like GitHub Actions or Jenkins. When things go bump in the night, that team will be the first online and the last to sleep, leading the troubleshooting conversation to resolution.
The first few DevOps hires set the cultural tone for those who come after, so knowing who to look for and how to recruit and keep them engaged is essential. To help organizations build a DevOps team the right way, this article explores the topic in detail, including key DevOps hiring and retention best practices.
Key DevOps team-building concepts
The table below summarizes key DevOps team-building concepts this article will explore in detail.
Building a DevOps team improves speed and efficiency
Building a DevOps team is a business decision, so it’s important to understand what business benefit an effective DevOps team delivers. At a high level, we can answer that with the words: speed and efficiency.
With the explosion of interest in DevOps principles, products, and practitioners, it is easy to forget that it simply means Development and Operations. It is equal parts protecting your developers from doing slow work in an unfamiliar domain and running that software more efficiently. Having dedicated people to do this job and making process improvements per their recommendations explains the difference between good and great technical teams.
The skills that involve speeding up deployments, helping your developers deliver updates incrementally and rapidly, keeping the service stable, and reducing failures don’t necessarily overlap with the primary skills of application developers. For example, not every developer wants or needs to know about Docker for application deployment. It turns out to be a good division of labor to have a specialized practitioner for these skills that crosses the boundary between the application code and the environment the code is built and deployed in. You need capable developers and qualified Ops and SRE personnel for the most effective teamwork.
DevOps encompasses a set of practices and tools that enable seamless collaboration between development and operations teams. The core functions in the DevOps process include planning (using tools like Jira), coding (with IDEs such as Visual Studio), building (utilizing platforms like GitHub), testing (with tools like Postman and JMeter), releasing (through automation servers like Jenkins), deploying (using container orchestration platforms such as Kubernetes), operating (with infrastructure-as-code tools like Terraform and configuration management tools like Ansible), and monitoring (using dashboarding tools like Grafana and incident management tools like Squadcast). When reviewing resumes for DevOps roles, look for candidates who demonstrate proficiency in combining these tools and technologies, since that combination indicates the well-rounded skill set needed to implement DevOps practices successfully.
Skills needed for success as a DevOps engineer
The engineers you hire to increase your organization’s speed and efficiency will need experience with at least five skills that sometimes draw on disparate experiences.
Programming fluency
Look for evidence that your candidate is competent in one or more programming languages. A candidate with only one programming language listed could still be a good employee, but probably not in DevOps. For the first year or more with the organization, you will be throwing new challenges at them, and you will need them to be nimble and learn fast. Having a few languages they can code in is a positive indication of that.
They will need to read and write in a common programming language and demonstrate the ability to learn from others. Look for fluency in a language already used by your developers or a similar language. If you use Python for your application, look for somebody who knows Java, C++, Ruby, or JavaScript. If you use Scala, identify Haskell or Clojure somewhere on the resume. Inquire about their experience with their professed favorite and ask how they learned a second programming language.
Expect terminology like Kubernetes, containerization, or virtualization to make an appearance. Docker and OCI are big names, but they incrementally improve on what came before, and the skills you learn while building a Packer image for deployment on bare metal are largely transferable. This is particularly true for candidates with on-premises experience: they have likely not been afforded the same opportunities to use cloud offerings or Kubernetes, but they have the underlying skills and experience needed to excel.
Virtualization
Every project built in a programming language on the list above comes with a set of dependencies that must be included in the deployment package for that project. For example, a Java web project typically includes not just the source code that developers write, but also an extensive suite of direct and transitive dependencies for working with SQL databases, JSON serialization, password storage, third-party services, HTTP services, and many others. A simple Spring Boot application will include 50-100 JAR files for these needs. Those dependencies need a straightforward way to be packaged and installed reliably. For Java, this is typically Maven or Gradle.
For the dependencies directly tied to your source code and the other system components, like OS packages, databases, or caches, the DevOps engineers will need knowledge of a containerization or virtualization system like Docker. The operating system being virtualized is typically Linux or Windows. Inquire about the Linux or Unix distribution they have the deepest experience with and ask what the challenges were.
CI/CD
Linux tools glue together the software build process, and using them from a continuous delivery tool like Jenkins or GitHub Actions makes their invocation reliable. Have candidates tell you about an experience improving the performance or reliability of a CI/CD software pipeline. They should know the details of a version control system, like Git or Subversion, and of a CI/CD tool like Jenkins, GitLab CI/CD, Azure DevOps, or CircleCI. They should have confidence in the steps involved, like compiling code, running tests, and packaging artifacts. They should understand how to run automated tests for your application, including unit, integration, and acceptance tests, even if they can’t write those tests. They should know how to store the artifacts of the build process, like WARs, ZIPs, or Docker images. They should be able to configure the necessary deployment steps, such as provisioning infrastructure, deploying artifacts, and updating configurations.
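To make the shape of those steps concrete, here is a minimal sketch of a fail-fast pipeline runner. It is an illustration only, not how Jenkins or GitHub Actions work internally. The stage names and the placeholder commands (which just print a message) are assumptions for the sketch; a real pipeline would call your build tool, test runner, and packager.

```python
"""Minimal sketch of a fail-fast pipeline runner (hypothetical stages)."""
import subprocess
import sys
from typing import List, Tuple

# Each stage is (name, command). The commands are placeholders that only
# print a message; they stand in for a real compiler, test runner, etc.
STAGES: List[Tuple[str, List[str]]] = [
    ("compile", [sys.executable, "-c", "print('compiling')"]),
    ("test", [sys.executable, "-c", "print('running tests')"]),
    ("package", [sys.executable, "-c", "print('packaging artifact')"]),
]


def run_pipeline(stages: List[Tuple[str, List[str]]]) -> List[str]:
    """Run stages in order and stop at the first non-zero exit code.

    Returns the names of the stages that completed successfully.
    """
    completed = []
    for name, command in stages:
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            print(f"stage {name!r} failed with exit code {result.returncode}")
            break
        completed.append(name)
    return completed


if __name__ == "__main__":
    print(run_pipeline(STAGES))
```

The key property to listen for when a candidate describes their pipeline is the same one this sketch has: a failed stage stops everything after it, so a broken build never reaches packaging or deployment.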
DevOps engineers should ensure that environments are provisioned and configured correctly and use infrastructure-as-code tools like Terraform or CloudFormation to manage the infrastructure. Tools like Chef and Terraform serve different purposes in the infrastructure-as-code (IaC) landscape. Chef is a configuration management tool, primarily used to install, configure, and manage software on already provisioned machines. Terraform and Pulumi, on the other hand, are infrastructure provisioning tools that declare and manage the infrastructure itself, such as virtual machines, networks, and storage.
In the context of a DevOps Engineer role, IaC skills are crucial for automating infrastructure provisioning, configuration management, and application deployment. A DevOps Engineer would leverage tools like Terraform or Pulumi to define and manage the infrastructure required for the application, ensuring consistency, scalability, and the ability to spin up or tear down environments quickly. They would also use configuration management tools like Chef to automate software component installation, configuration, and management on the provisioned infrastructure.
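The declarative idea behind provisioning tools can be illustrated with a toy example: describe the desired state, compare it with what actually exists, and derive the actions needed to close the gap. The sketch below is not Terraform's or Pulumi's actual algorithm, and the resource names and attributes are made up for illustration.

```python
"""Toy illustration of declarative provisioning: diff desired vs. actual."""
from typing import Dict, List, Tuple

Resource = Dict[str, str]


def plan(desired: Dict[str, Resource],
         actual: Dict[str, Resource]) -> List[Tuple[str, str]]:
    """Return (action, resource_name) pairs that make actual match desired."""
    actions = []
    for name in sorted(desired):
        if name not in actual:
            actions.append(("create", name))
        elif desired[name] != actual[name]:
            actions.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions


# Hypothetical resources, for illustration only.
desired = {
    "web-vm": {"size": "medium"},
    "app-network": {"cidr": "10.0.0.0/16"},
}
actual = {
    "web-vm": {"size": "small"},
    "old-bucket": {"region": "us-east-1"},
}

for action, name in plan(desired, actual):
    print(action, name)
```

Running the plan a second time against a state that already matches produces no actions, which is what lets a team spin environments up and down repeatedly without drift. A good interview follow-up is to ask the candidate what happens when someone changes a resource by hand outside the tool.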
Incident management & post-mortem analysis
When a deployment fails, your DevOps team will rally to respond, repair, and recover. Managing a fair on-call process will involve an unattended, low-noise communications channel that automated systems can use to reliably notify a human. Candidates must commit to on-call hours in their first conversations with you and demonstrate knowledge of monitoring and incident response tools.
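At its simplest, routing an alert to a human means knowing who is on call right now. The sketch below shows a weekly round-robin rotation; the engineer names and start date are hypothetical, and a real scheduling tool would also handle overrides, time zones, and escalation.

```python
"""Sketch of a weekly round-robin on-call rotation (hypothetical names)."""
from datetime import date
from typing import List


def on_call_for(day: date, engineers: List[str], rotation_start: date) -> str:
    """Return who is on call on `day`, rotating weekly from `rotation_start`."""
    if not engineers:
        raise ValueError("the rotation needs at least one engineer")
    if day < rotation_start:
        raise ValueError("day is before the rotation started")
    weeks_elapsed = (day - rotation_start).days // 7
    return engineers[weeks_elapsed % len(engineers)]


engineers = ["alice", "bob", "carol"]  # made-up team
start = date(2024, 1, 1)               # rotation begins on a Monday

for day in (date(2024, 1, 3), date(2024, 1, 8), date(2024, 1, 22)):
    print(day.isoformat(), on_call_for(day, engineers, start))
```

Because the rotation is a pure function of the date, every engineer can see weeks in advance when they are on call and when they are not, which is a large part of what makes a schedule feel fair.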
After recovering the application, the same team will work to make your software more reliable. In collaborative post-mortem meetings, they will identify technical or process changes to lower the likelihood and severity of that class of incident in the future. Ask your interviewees what’s included in a post-mortem analysis and about an improvement they spearheaded from an analysis.
Observability and testing
A successful DevOps engineer should also have a strong foundation in setting up and maintaining observability tools, including metrics, traces, and logs. These tools provide crucial insights into the systems' health, performance, and reliability, enabling proactive issue detection and faster resolution. Additionally, the engineer should be well-versed in running automated tests, such as load and performance testing, to ensure the system can handle expected traffic and to identify potential bottlenecks or performance issues before they impact users in production. This combination of observability and testing skills allows DevOps teams to continuously monitor, optimize, and improve their systems, ultimately leading to a more stable and efficient environment.
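Load-test results are usually summarized as latency percentiles rather than averages, because a handful of slow requests can hide behind a healthy mean. Here is a minimal sketch using the nearest-rank method; the sample latencies are made-up data, and dedicated tools such as JMeter report these figures for you.

```python
"""Sketch: summarize load-test latencies with nearest-rank percentiles."""
import math
from typing import List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Made-up request latencies in milliseconds, with one slow outlier.
latencies_ms = [12, 15, 11, 240, 14, 13, 16, 18, 17, 19]

for pct in (50, 90, 99):
    print(f"p{pct}: {percentile(latencies_ms, pct)} ms")
```

In this made-up data the median looks healthy while the 99th percentile exposes the outlier, which is the kind of bottleneck a load test is meant to surface before users do.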
Screening a DevOps candidate
The sample resume below showcases a DevOps Engineer with n+ years of experience in architecting, automating, and optimizing mission-critical deployments over a large infrastructure. The candidate possesses proficiency in Configuration Management tools and developing CI/CD pipelines, along with skills in various programming languages, development tools, and an understanding of key DevOps concepts like infrastructure as code (IaC).
As you work through the screening and interview process, you should be looking to assess the candidate’s prior experience, technical aptitude, and cultural fit. Typically, you will set up a 1-hour screen with the candidate first. These screenings are designed to weed out poor fits before investing more of your time and your team’s time in interviewing that person.
Your goal in this first hour is to assess whether the experiences you identified from the resume appear valid and to probe for the intangibles you only get from speaking to them. Asking the right questions is an essential best practice in the DevOps hiring process. Here are some good questions to ask across four key domains.
Questions to assess DevOps technical aptitude
- Describe your experience with continuous integration and continuous deployment (CI/CD) pipelines. How have you implemented and optimized them in your previous roles?
- Explain the difference between containerization and virtualization. When would you choose one over the other?
- How do you ensure the security of your infrastructure and applications in a DevOps environment?
- Walk us through your process for troubleshooting a production issue. How do you identify the root cause and minimize downtime?
Questions to assess prior DevOps experience
- Share an example of a complex project you worked on that required collaboration between development and operations teams. What was your role, and how did you ensure successful delivery?
- Describe a time when you had to scale a system to handle increased traffic or data volume. What strategies and tools did you employ?
- Have you worked with cloud platforms like AWS, Azure, or Google Cloud? Explain your experience deploying and managing applications in a cloud environment.
- Tell us about a time when you automated a manual process. What was the process, and how did automation improve efficiency or reduce errors?
Questions to validate cultural fit
- In your opinion, what are the key elements of a successful DevOps culture? How have you contributed to fostering such a culture in your previous teams?
- Describe a time when you had a disagreement with a colleague. How did you handle the situation, and what was the outcome?
- How do you stay up-to-date with the latest trends and technologies in the DevOps field? Share some resources or communities you follow.
- In a rapidly evolving project with shifting priorities, how do you adapt and ensure that you and your team remain productive and motivated?
Coding questions for DevOps engineer
Depending on your needs, you may also want to see some code. Because you’re not hiring them to be application developers, you’re not looking for brilliant mastery of an algorithm, the solution to a deep puzzle, or a recitation of big-O notation. You’re looking for concise, readable code that accomplishes a straightforward goal. Some examples might be coding a simple web application with a single endpoint, writing a Kubernetes deployment, or building a Dockerfile.
For example:
Please write a Dockerfile for a simple Node.js web application. The application should:
- Use the latest stable version of Node.js
- Copy the application files into the container
- Install the necessary dependencies
- Expose the application on port 3000
- Start the application using the command npm start
The purpose of this exercise is to assess the candidate's understanding of containerization principles and their ability to create a clear, concise Dockerfile.
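For interviewers who want a reference point, the sketch below embeds one possible answer and a rough checklist that flags obviously missing instructions. The `node:lts` base image tag and the `/app` working directory are choices made for this sketch, not requirements of the exercise, and the checklist is deliberately crude: a valid answer that uses `ENTRYPOINT` instead of `CMD`, for example, would be flagged and still deserve a human read.

```python
"""Sketch: one possible answer to the Dockerfile exercise, plus a checklist."""
from typing import List

# One possible answer. The base image tag and working directory are
# assumptions for this sketch; other choices can be equally valid.
REFERENCE_DOCKERFILE = """\
FROM node:lts
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
EXPOSE 3000
CMD ["npm", "start"]
"""

# Instructions the five requirements are expected to map onto.
REQUIRED_INSTRUCTIONS = ["FROM", "COPY", "RUN", "EXPOSE", "CMD"]


def missing_instructions(dockerfile_text: str) -> List[str]:
    """Return required instructions that never start a line in the text."""
    used = set()
    for line in dockerfile_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            used.add(stripped.split()[0].upper())
    return [name for name in REQUIRED_INSTRUCTIONS if name not in used]


print(missing_instructions(REFERENCE_DOCKERFILE))
```

Copying `package*.json` and installing dependencies before copying the rest of the source is a common layer-caching habit; whether a candidate does this unprompted, and can explain why, tells you more than the checklist does.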
Alternatively, a Kubernetes question along similar lines:
Create a Kubernetes deployment YAML file for a simple web application with the following requirements:
- Deploy 3 replicas of the application
- Use the myapp:v1 container image
- Expose the application on port 80
- Configure the deployment to perform a rolling update with a maximum surge of 1 and a maximum unavailable of 1
- Include resource requests and limits for CPU and memory
This exercise evaluates the candidate's knowledge of Kubernetes deployments, their ability to create a well-structured YAML file, and their understanding of key deployment configuration options.
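As a reference for reviewers, the sketch below builds one possible answer as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The deployment name, the `app` label, and the specific CPU and memory values are assumptions for illustration; the exercise only asks that requests and limits be present.

```python
"""Sketch: build the Deployment from the exercise as a dict and emit JSON."""
import json


def build_deployment(name: str = "myapp", image: str = "myapp:v1",
                     replicas: int = 3) -> dict:
    """Return a Deployment manifest matching the exercise requirements."""
    labels = {"app": name}  # label key and value are assumptions
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "strategy": {
                "type": "RollingUpdate",
                "rollingUpdate": {"maxSurge": 1, "maxUnavailable": 1},
            },
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": 80}],
                            # Illustrative values only; size these for
                            # the real workload.
                            "resources": {
                                "requests": {"cpu": "100m", "memory": "128Mi"},
                                "limits": {"cpu": "500m", "memory": "256Mi"},
                            },
                        }
                    ]
                },
            },
        },
    }


print(json.dumps(build_deployment(), indent=2))
```

Things to check in a candidate's version: the selector labels match the pod template labels, the rolling update settings sit under `strategy`, and the candidate notices that `containerPort` alone does not make the application reachable from outside the cluster without a Service.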
Additional interviews
After the initial screen, if you move forward with the candidate, there will usually be a couple more rounds of interviews, both technical and cultural.
On the technical side, refrain from asking the same programming questions you ask your application developer interviewees. They will only marginally apply to the work the DevOps team does. DevOps is much more than “Did the program give me the right answer?” A whiteboard problem where the candidate talks about the different components of a system and outlines the relationships between them will be more informative for you and more fun for them. You’re also looking for their ability to triage system failures effectively, so ask what happens when a particular component fails.
On the cultural side, introduce them to the people they would be working with. Every DevOps engineer knows that there will be a slew of responsibilities ready for them when they join, but clue them in on some of your organization's challenges and opportunities. See what they think. If they’re excited or interested in those challenges, that’s a good sign. An excited candidate will ask you good questions about those challenges to demonstrate interest. They are interviewing you and getting a feeling for the hurdles ahead as much as you are interviewing them. In addition, opportunities for your existing team to meet the potential hire and start developing a relationship with them are invaluable. Your team has to like the candidate.
Four best practices for improving DevOps retention and avoiding burnout
If you’re lucky, you’ll sign a great candidate. Now, you want to keep them around for as long as possible, and there are tools and processes at your disposal to make their life the best it can be at work.
- Create a fair on-call schedule
  - Use software to manage on-call schedules
  - Automatically link the scheduling tool with your notification system to route emergencies to the right person
  - Ensure that every engineer has sufficient time off and knows when that time will be
- Conduct regular team retrospectives
  - Tie internal team retrospectives to the sprint cadence
  - Encourage engineers to speak about successes, recognize their peers, and discuss improvements
  - Spend considerable time discussing tweaks and changes when things go wrong
  - Track action items over time and regularly check in on past items
  - Be cautious of retrospectives that end without learning and proposed adjustments
- Hold blameless post-mortem meetings
  - Conduct post-mortem meetings after an incident has been mitigated and recovered from
  - Involve engineering teams and business partners impacted by the incident
  - Have your DevOps team lead the meeting to document details of the issue and create follow-up items
  - Prioritize incremental improvements over large overhauls
  - Balance additional tasks against the developer time needed to achieve them
- Follow through on retrospective action items
  - Prioritize and implement the follow-up items from team retrospectives and post-mortem meetings
  - Demonstrate that you value your engineers' opinions by acting on their suggestions
  - Affirm your engineers as trusted caretakers by improving their work lives based on their feedback
All of these processes can be automated by the Squadcast platform. Building timelines of the incident becomes painless as opposed to the usual fight with a Google Doc table. Building out the roles in a post-mortem process is clear, and mitigating risk and increasing team happiness is a lot easier when you can focus your energy on the meeting and not the tool.
By following these best practices, you can create a positive work environment that fosters employee retention, prevents burnout, and encourages continuous improvement within your DevOps team.
Conclusion
The buzzwords change, but the breadth of experience needed to connect development and operations effectively stays mostly the same. After reading this, you should know what to look for in a good candidate, how to interview them effectively, and how to keep them for many years.
During the hiring process, look for candidates with a diverse skill set and assess their technical aptitude, prior experience, and cultural fit through carefully crafted questions and coding exercises. Once you've made the hire, focus on retaining your DevOps engineers by creating a fair on-call schedule, conducting regular team retrospectives, holding blameless post-mortem meetings, and following through on retrospective action items. This will drive success in your organization.
The hiring process, especially for a specialized role like this, will likely take more time than you anticipate. Fill your first position with somebody you are excited to onboard and who has demonstrated enough capability in programming fluency, virtualization, CI/CD, incident management, and observability and testing. Don’t settle for “good enough” for a full-time, in-house team member. The penalty for a false positive is even more significant in DevOps than it would be for an application developer, since DevOps personnel regularly need to handle sensitive data, credentials, and production incidents.
The reward at the end is a DevOps team that multiplies the productivity of everyone in the engineering team and the organization as a whole. Done right, very few other kinds of hires within software have as significant an impact over time.