What is SRE?
Site reliability engineering (SRE) is a set of principles that incorporates aspects of software engineering into IT operations. It takes tasks that would typically have been done manually by operations teams and gives them to engineers to solve using software and automation. This helps to create a bridge between development and operations teams.
The concept of SRE was created by Google back in 2003. Since then, it has been adopted by thousands of organizations all over the world. Today, it helps organizations to find a balance between releasing new features and making sure that existing features are reliable for users.
Organizations can help to define and improve reliability by using the SRE pyramid – also known as the Dickerson pyramid. The SRE pyramid outlines seven key principles of service reliability: monitoring, incident response, postmortems and root cause analysis, testing, capacity planning, development, and product. The health of a service can be categorized by looking at how many levels of the pyramid have been fulfilled.
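As a rough illustration of categorizing service health by pyramid level, here is a minimal Python sketch. The function name `levels_fulfilled` and the rule that a level only counts when every level beneath it is also in place are assumptions made for this example, not part of the pyramid's definition.

```python
# Levels of the SRE (Dickerson) pyramid, ordered from the base upward.
PYRAMID_LEVELS = [
    "monitoring",
    "incident response",
    "postmortems and root cause analysis",
    "testing",
    "capacity planning",
    "development",
    "product",
]


def levels_fulfilled(fulfilled: set[str]) -> int:
    """Count consecutive fulfilled levels, starting from the base.

    Assumption: a level only counts if every level below it is also
    fulfilled, because each layer builds on the one beneath it.
    """
    count = 0
    for level in PYRAMID_LEVELS:
        if level not in fulfilled:
            break
        count += 1
    return count


# Hypothetical service: testing exists, but there is no postmortem process.
service = {"monitoring", "incident response", "testing"}
print(levels_fulfilled(service))  # prints 2
```

In this sketch the service scores 2 rather than 3, because the gap at the postmortem level stops the count before testing is reached.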
What is DevOps?
DevOps is a set of practices that combines software development and IT operations, and it is frequently used in conjunction with Agile software development. It focuses on shortening the development lifecycle of a system and providing continuous delivery.
The DevOps movement began around 2007. Unlike previous software development models which focused on different teams working apart from each other, DevOps often encourages development and operations teams to merge together into a single team.
DevOps can result in faster releases of products, features, and bug fixes, increased collaboration between teams, increased quality and reliability, and improved security.
How is SRE related to DevOps?
Despite their differences, SRE and DevOps are closely related. Both SREs and DevOps engineers are considered specialized advisors that utilize automation and collaboration to create resilient and reliable software, and both approaches are needed to launch and maintain large products.
DevOps allows companies to deliver high-quality services at a high speed. However, while such rapid delivery has its benefits, it also inevitably causes issues. For instance, teams can miss vulnerabilities or fail to test thoroughly enough for bugs, which can lead to a less reliable service. This is where SREs can step in to improve reliability. For this reason, some organizations even consider SRE to be a specific implementation of DevOps.
The 5 main differences between DevOps and SRE
There has been a lot of debate about the differences between DevOps and SRE. Here’s a summary of some of the main differences between the roles:
1. DevOps focuses on what needs to be done; SRE focuses on how it can be done
DevOps engineers focus their efforts on the development pipeline to figure out what needs to be done. They focus on developing new products quickly, and on maintaining existing products. SREs, on the other hand, focus on how things can be done. This means that they focus on characteristics such as resilience, robustness, scaling, and uptime.
2. They solve different problems
DevOps teams are typically sent to help solve one-off problems such as failed deployments. SREs won’t usually be sent to fix a problem unless it causes an outage, or it is a recurring issue that affects system availability and reliability.
3. They focus on different metrics
DevOps engineers typically care more about deployment frequency and deployment failure rate. SREs, on the other hand, focus on error budgets, service level indicators (SLIs), service level agreements (SLAs), and service level objectives (SLOs).
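To make these metrics concrete, the sketch below shows one common way to derive an error budget from an availability SLO, alongside a simple deployment failure rate. The function names, the 30-day window, and the example figures are all assumptions chosen for illustration. Real teams define these metrics in different ways.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed downtime, in minutes, for an availability SLO over a window."""
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes


def budget_remaining(slo_target: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget_minutes(slo_target, window_days)
    return (budget - downtime_minutes) / budget


def deployment_failure_rate(failed: int, total: int) -> float:
    """Share of deployments that failed; 0.0 when nothing was deployed."""
    return failed / total if total else 0.0


# Hypothetical figures: a 99.9% SLO with 10.8 minutes of downtime so far,
# and 3 failed deployments out of 40.
print(round(error_budget_minutes(0.999), 1))    # about 43.2 minutes per 30 days
print(round(budget_remaining(0.999, 10.8), 2))  # about 0.75 of the budget left
print(deployment_failure_rate(3, 40))
```

The error budget gives both sides a shared number. While budget remains, releases can continue. Once it is spent, a team might choose to prioritize reliability work over new features.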
4. They care about different things
Perhaps the biggest difference between DevOps and SREs is what they care about. DevOps focuses on creating a cultural and philosophical shift. SRE is much more practical, and focuses instead on practices and metrics that improve collaboration, and allow for better service delivery.
5. They have a different team structure
DevOps teams are made up of a variety of different roles, including developers, engineers, and QA experts. SRE teams are made up of SREs who have a background in development and operations.
SRE vs DevOps jobs: what differentiates the two roles?
People within the industry have a wide range of opinions about what differentiates the role of a DevOps engineer from the role of an SRE:
"DevOps vs SRE:
DevOps is the people, process, and technology that drive business value through software delivery.
Site Reliability Engineering are software engineers designing an operations function.
Both are about driving better software related outcomes."
Tiffany (@TiffanyJachja), May 19, 2021
One of the most obvious differences between the job of an SRE and the job of a DevOps engineer is that DevOps engineers are more operations-focused, whereas SREs are more development-focused.
DevOps is often used as a way to break down silos within an organization, which can sometimes be challenging. DevOps engineers are always working on new technology and working to improve the software development lifecycle. They are often required to pick up new technology quickly, and often learn on the job.
"#DevOps vs. #SRE vs. #GitOps
DevOps: Principles enabling IT organizations to achieve high quality and velocity.
SRE: Practical two team implementation of DevOps principles to balance features and reliability.
GitOps: Opinionated, Git based automation workflow for DevOps teams."
Philipp Strube (@pst418), July 6, 2020
The job of an SRE, on the other hand, is much more focused and will typically vary less between different organizations. However, the job can be stressful, especially when things go wrong. Hours can be long, and schedules can be unpredictable.
Which is better, DevOps or SRE?
Neither of these roles is objectively ‘better’ than the other. But if you’re trying to decide whether to become an SRE or a DevOps engineer, there are a few things to be aware of that could impact your decision.
Right now, there are many advantages to being a DevOps engineer. The role of a DevOps engineer is better known than the role of an SRE, which means that more companies are searching for DevOps engineers. This also means that there is more recognition of the role amongst people who aren’t in the tech industry. DevOps engineers also need fewer programming skills, and they’re less likely to be on call than SREs.
While there are currently fewer job openings for SREs, there are likely to be more long-term job opportunities over the next few years. SREs typically have a higher salary than DevOps engineers, and the role is considered a more prestigious position. According to Payscale, the average salary for a DevOps engineer is $98,411. Meanwhile, the average salary for an SRE is $118,573, which is roughly 20.5% higher.
When do organizations need DevOps and SRE?
Organizations typically need DevOps when they want to shift the focus away from short-term, cost-focused models, and towards the faster development of new products and seamless maintenance of existing products.
They typically need SREs when they have to fulfill strict reliability needs, and meet strict performance standards. For instance, a company like Netflix would meet these criteria, given that a lapse of even a few seconds can cause significant issues for users who are streaming video content.
Organizations also often hire SREs if they have a complex technology stack where there are lots of things that could potentially go wrong.
What does it mean to be reliable?
Measuring reliability within an organization can be challenging. Service level indicators (SLIs) are designed to make the process of measuring reliability easier. Some of the SLIs that can help organizations to view their performance metrics and measure reliability include request latency, error rate, traffic, availability, durability, and saturation.
How often organizations measure their SLIs depends largely on the service level agreements (SLAs) that they have with their users, as well as the service level objectives (SLOs) of their organization. SLIs, SLAs, and SLOs are also discussed at length in Google’s SRE handbook.
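Two of the SLIs listed above, error rate and request latency, can be sketched in a few lines of Python. Everything here is an assumption for illustration: the `Request` record, the choice to count only 5xx responses as errors, the nearest-rank percentile method, and the thresholds passed to `meets_slo`.

```python
from dataclasses import dataclass


@dataclass
class Request:
    latency_ms: float
    status: int  # HTTP status code


def error_rate(requests: list[Request]) -> float:
    """Share of requests that failed with a server error (5xx)."""
    if not requests:
        return 0.0
    errors = sum(1 for r in requests if r.status >= 500)
    return errors / len(requests)


def latency_percentile(requests: list[Request], pct: int) -> float:
    """Nearest-rank percentile of request latency, in milliseconds."""
    if not requests:
        raise ValueError("no requests to measure")
    ordered = sorted(r.latency_ms for r in requests)
    # Integer ceiling of pct * n / 100, which avoids float rounding issues.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def meets_slo(requests: list[Request], max_error_rate: float, p95_ms: float) -> bool:
    """Check both SLIs against hypothetical SLO thresholds."""
    return (error_rate(requests) <= max_error_rate
            and latency_percentile(requests, 95) <= p95_ms)


# Hypothetical sample: 19 fast successes and one slow server error.
sample = [Request(100 + 10 * i, 200) for i in range(19)] + [Request(900, 503)]
print(error_rate(sample))              # 1 failure out of 20 requests
print(latency_percentile(sample, 95))  # 95th percentile latency in ms
print(meets_slo(sample, max_error_rate=0.01, p95_ms=300))
```

In this sample the latency target is met, but the single failure pushes the error rate above the 1% threshold, so the SLO check fails.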
What tools do SREs use?
- Datadog: a monitoring platform for cloud applications. It can be used to monitor servers, tools, and databases.
- Reliably: a platform that allows reliability teams to create objectives, aggregate a reliability score, and set alerts so you can operate with greater predictability and less anxiety.
- Gremlin: a chaos engineering platform that allows teams to identify critical failures, reduce the resolution time for incidents, and frequently test disaster recovery mechanisms.