Inside the DevOps mindset: 6 traits of a great Site Reliability Engineer

Site Reliability Engineering (SRE) is a relatively new field, first introduced by Google, that has become increasingly critical in today's rapidly evolving technological landscape. In January 2022, LinkedIn ranked SRE at number 21 in its list of jobs with the highest global demand throughout the past five years. Pretty impressive for such a niche role!

SREs, and by extension DevOps specialists, are responsible for ensuring that systems are reliable, scalable, and efficient, with a focus on automation and the elimination of manual grind. Studies have shown that these capabilities reduce companies’ downtime by 50%, and that businesses with an SRE function are 2.5 times more likely to meet their service level agreements (SLAs) than those without one.
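To put the SLA point in concrete terms, here is a minimal sketch of how an availability target translates into permitted downtime. It assumes the SLA is expressed as an availability percentage over a 30-day window; the function name, the window, and the example targets are illustrative assumptions, not figures from the studies mentioned above.

```python
def allowed_downtime_minutes(availability_target: float, period_days: int = 30) -> float:
    """Return the minutes of downtime an availability target permits over a period.

    availability_target is a fraction, e.g. 0.999 for "three nines".
    Both the function name and the 30-day default are illustrative assumptions.
    """
    if not 0.0 < availability_target <= 1.0:
        raise ValueError("availability_target must be in the range (0, 1]")
    total_minutes = period_days * 24 * 60
    # The share of the period that may be spent down is 1 minus the target.
    return total_minutes * (1.0 - availability_target)


if __name__ == "__main__":
    # Hypothetical targets, chosen only to show how quickly the budget shrinks.
    for target in (0.99, 0.999, 0.9999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target:.2%} over 30 days allows about {minutes:.1f} minutes of downtime")
```

Each extra "nine" cuts the downtime allowance by a factor of ten, which is one reason SRE teams lean so heavily on automation rather than manual intervention.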

So, what does it take to be a great Site Reliability Engineer like Craig Sebenik, David Blank-Edelman, and Kurt Andersen? If you’re keen to become (or hire) a top SRE professional, here are six key traits that’ll help:

(1) Strong technical skills

It might seem obvious, but first and foremost, great SREs must have strong technical skills. This includes deep knowledge of areas such as microservices, serverless, cloud (Azure, GCP, AWS, etc.), containerization, cloud native, IaC, GitOps, IoT, ML, AI, and DevSecOps, and/or programming languages like Golang and Python. You should also be familiar with software engineering practices like version control, continuous integration, and continuous delivery.

In addition to technical skills, you must have a curious and methodical approach to problem-solving and be comfortable working in a fast-paced, dynamic environment.

(2) Continuous learning and improvement

Tech never stands still, so neither should you. Do you have a thirst for continuous learning and development? As technology continues to evolve, a growth mindset is an important trait for anyone working in DevOps. You’ll need to stay up to date with emerging trends and technologies, attend industry events and conferences, and be willing to experiment. When working on the cutting edge of tech, don’t be afraid to fail fast.

Ambitious SREs should also seek out opportunities for personal and professional development, including mentorship and training programs. As Kim Moir shared in a Developer to Manager interview: “The year before I became a manager, I started thinking less about distributed systems and more about distributed teams. I would look at the team I worked on and think ‘I wonder how engaged they are in their work?’, ‘How can we celebrate our successes as a distributed team?’ and ‘How could we share our skills more effectively among team members?’” If you can also show an interest in managing others, there are endless opportunities in this fast-moving sector.

(3) Strategy, analysis & business acumen

As the SRE motto says: “Hope is not a strategy”. The very best Site Reliability Engineers and DevOps specialists not only collaborate with the rest of the business, but can also step away from their daily activities to look at the bigger picture. To do well in DevOps, you need a good understanding of your business and its objectives. So, as an SRE, it’s important to analyse the business as a whole, to mitigate risk, and to build up an understanding of how your work can support the business’s goals.

Do you understand the ‘why’ behind what you’re doing and the impact your work is having on the business? The very best professionals take a keen interest in the effect their projects have on business success and play an important role in giving the business a competitive advantage.

(4) Proactivity and problem-solving

The strongest SRE candidates have a proactive, problem-solving mindset. They can anticipate potential issues before they arise and design systems that are resilient and scalable. When issues do occur, a strong SRE can quickly identify and resolve them, then implement long-term solutions to prevent similar issues from recurring. And as Betsy Beyer says in Site Reliability Engineering: How Google Runs Production Systems, the solution might not always be obvious: “When standard operating procedures break down, they’ll need to be able to improvise fully”.

(5) Attention to detail

The very best SREs have exceptional attention to detail, which helps them keep systems running smoothly and efficiently. They need to be able to identify and mitigate potential points of failure, troubleshoot issues quickly, and monitor system health in real time. This requires a meticulous approach to execution and a willingness to dig deep to get to the root cause of an issue.

(6) Collaboration and communication

Last, but most certainly not least, great SREs must also possess strong collaboration and communication skills. This involves not just working effectively with others in SRE and DevOps, but also with developers, product managers, data scientists, and other stakeholders across an organisation. If you’re an SRE, you must be able to clearly articulate technical issues and recommendations to non-technical stakeholders, as well as provide regular updates on system health and performance. Mastering this soft skill will set you apart from your peers instantly.

And don’t forget collaboration outside of your organisation too. Being involved with the wider tech community is really helpful for individuals wanting to excel in SRE and DevOps. Attending relevant tech events or even working to become a thought leader in your niche can be really valuable for your career.


To recap, the best SREs possess a unique combination of technical skills and soft skills. You’ll play a critical role in keeping systems reliable and efficient and, in many cases, be responsible for shaping entire departments and organisations. The role’s evolving, though. Two decades on from Google introducing the role, SRE and DevOps continue to grow and develop as companies’ stacks become increasingly complex and require more expertise to mitigate risk and improve user experience.

Are you a tech specialist looking to move into Site Reliability Engineering? Or an experienced professional looking to take the next step in your career? Maybe you’re an employer looking to build a great DevOps team with the key traits listed above. Whatever your needs, Apollo can help. We have relationships with lots of great tech experts and the exciting, forward-thinking companies that hire them. Drop us a line to see how we could work together.
