Reliably helps Site Reliability Engineers and Development Teams shoulder the burden of building predictable, reliable systems.
SLOs and error budgets are a great way to start discussing what everyone should be doing next.
Is it the right moment to experiment on new features, or to scale up your infrastructure? With the right data, it's easier to make the right move.
Happy users are vital to your business. By defining what is meaningful to them, you're able to prioritize where you invest.
Reliably is always where you are, never in your way. Our CLI integrates nicely with your favorite CI/CD pipeline, to provide you with useful information when you need it.
Because SLOs are declared as YAML manifests, you can edit them, version them, and share them the same way you work with the rest of your codebase.
You take pride in shipping beautiful and robust code. But how do you run it? The reliably scan command makes sure your Kubernetes manifests and clusters follow best practices and provides you with solutions when they don't.
Reports of SLO status in your continuous delivery pipeline mean you can use them as guardrails to decide when to deploy that new feature, or when it's better to work on making your application more robust.
The Reliably CLI helps developers and everyone who cares about reliability introduce best practices into their current workflow, in a non-disruptive way.
reliably slo report
                                      Current   Objective / Time Window   Type           Trend
Service #1: http-api
✅ 99% availability over 1 hour       100.00%   99% / 1h0m0s              Availability   ✓ ✓ ✓ ✓
❌ 99% of requests under 300ms        73.91%    99% / 1d                  Latency        ✕ ✕ ✕ ✕
Service #2: products-api
✅ 99% availability over 1 day        100.00%   99% / 1d                  Availability   ✓ ✓ ✓ ✓
reliably scan kubernetes --format table
Results:
■ manifests/deployment.yaml  Kubernetes:Deployment  K8S-DPL-0007  Setting a high cpu request may render pod scheduling difficult or starve other pods
■ manifests/deployment.yaml  Kubernetes:Deployment  K8S-DPL-0009  Not setting a cpu requests means the pod will be allowed to consume the entire available CPU (unless the cluster has set a global limit)
■ manifests/deployment.yaml  Kubernetes:Deployment  K8S-DPL-0013  A rollout strategy can reduce the risk of downtime
■ manifests/deployment.yaml  Kubernetes:Deployment  K8S-DPL-0014  Without the 'minReadySeconds' property set, pods are considered available from the first time the readiness probe is valid. Settings this value indicates how long it the pod should be ready for before being considered available.
■ manifests/deployment.yaml  Kubernetes:Deployment  K8S-DPL-0001  You should specify a number of replicas
■ manifests/pod.yaml  Kubernetes:Pod  K8S-POD-0001  You should not use the default 'latest' image tag. It causes ambiguity and leads to the cluster not pulling the new image
■ manifests/pod.yaml  Kubernetes:Pod  K8S-POD-0003  Only images from an approved registry can be run
■ manifests/deployment.yaml  Kubernetes:Deployment  K8S-DPL-0012  Image pull policy should usually not be set to 'Always'
■ test-manifest.yaml:92:1  Kubernetes:PodSecurityPolicy  K8S-PSP-0001  Enabling privileged can lead to unwanted escalation from the container's process
■ test-manifest.yaml:92:1  Kubernetes:PodSecurityPolicy  K8S-PSP-0007  To reduce risk of accessing files outside of an allowed paths, it's best to make them read only
Summary:
10 suggestions found
■ 3 info - ■ 5 warning - ■ 2 error
Service Level Objectives identify what you should care about on your system. They are what good looks like for the users of your system. If an SLO is underperforming, it will be impacting your users in some way.
The Reliably CLI allows you to define SLOs for Availability and Latency.
An Availability SLO allows you to specify a target availability percentage for a Service.
A Latency SLO allows you to specify a threshold latency for a service and a target percentage. The target is the percentage of responses that should complete within that threshold latency.
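To make the two SLO types concrete, here is a minimal sketch of how the underlying percentages could be computed from a list of observed requests. The `Request` record, the function names, and the sample data are hypothetical and only for illustration; this is not how the Reliably CLI itself gathers or evaluates data.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """One observed request (hypothetical record shape, for illustration)."""
    ok: bool            # True if the request succeeded
    latency_ms: float   # response time in milliseconds


def availability_pct(requests):
    """Percentage of requests that succeeded."""
    if not requests:
        return 100.0  # assumption: no traffic counts as fully available
    good = sum(1 for r in requests if r.ok)
    return 100.0 * good / len(requests)


def latency_pct(requests, threshold_ms):
    """Percentage of requests that finished under the latency threshold."""
    if not requests:
        return 100.0  # same assumption as above
    fast = sum(1 for r in requests if r.latency_ms < threshold_ms)
    return 100.0 * fast / len(requests)


def meets_target(actual_pct, target_pct):
    """An SLO is met when the measured percentage reaches the target."""
    return actual_pct >= target_pct


# Made-up sample: three successes, one failure, one slow response.
sample = [
    Request(ok=True, latency_ms=120.0),
    Request(ok=True, latency_ms=280.0),
    Request(ok=True, latency_ms=450.0),
    Request(ok=False, latency_ms=90.0),
]

availability = availability_pct(sample)            # 3 of 4 succeeded
latency = latency_pct(sample, threshold_ms=300)    # 3 of 4 under 300 ms

print(f"availability: {availability:.2f}% (target 99% met: {meets_target(availability, 99)})")
print(f"latency:      {latency:.2f}% (target 99% met: {meets_target(latency, 99)})")
```

Both SLO types reduce to the same comparison: a measured percentage of "good" events against a target percentage.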
For more details on an SLO report, see the Reliably documentation on How the Reliably CLI works.
Report time: Wed, 19 May 2021 10:46:07 UTC
|Name|Current Objective|Time Window|Type|Trend|
|---|---|---|---|---|
|95% of requests successful over last 1 hour|95%|1 hour|availability|✓ ✓|
|98% of requests faster than 350ms over last 1 day|98%|1 day|latency|✕ ✕|
When you define an SLO for your system, you include a target percentage for that SLO. An example target could be 95%. That leaves 5%, which is the error budget for your SLO.
When you define an SLO with the Reliably CLI you specify a target percentage for the Availability or Latency for that SLO.
The expectation is that, over a time window, the responses for the Service will be within that target percentage.
An example could be 99.5% available over a period of 7 days. The target availability is less than 100% so this leaves a margin for error. This can be considered the Error Budget.
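As a worked illustration of the 99.5% over 7 days example, the sketch below turns a target percentage into an error budget, both as a percentage and as an amount of time within the window. It assumes the common time-based reading of an error budget; the function names are hypothetical, and the Reliably CLI may derive the time figures in its reports differently.

```python
from datetime import timedelta


def error_budget_pct(target_pct):
    """The share of the window that is allowed to miss the target."""
    return 100.0 - target_pct


def error_budget_time(target_pct, window):
    """The error budget expressed as a duration within the time window."""
    return window * (error_budget_pct(target_pct) / 100.0)


def remaining_budget(budget, consumed):
    """Budget left after consumption, never below zero (an assumption)."""
    return max(budget - consumed, timedelta(0))


window = timedelta(days=7)
budget_pct = error_budget_pct(99.5)            # 0.5 percent
budget = error_budget_time(99.5, window)       # about 50 minutes 24 seconds

# Hypothetical consumption figures, for illustration only.
left_after_10m = remaining_budget(budget, timedelta(minutes=10))
left_after_2h = remaining_budget(budget, timedelta(hours=2))

print(f"error budget: {budget_pct}% of {window} = {budget}")
print(f"remaining after 10 minutes consumed: {left_after_10m}")
print(f"remaining after 2 hours consumed:    {left_after_2h}")
```

Once the consumed time exceeds the budget, nothing remains, which is the situation an underperforming SLO describes.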
The Error Budget metrics are:
|availability|95% of requests successful over last 1 hour|5.00%|18s|0s|18s|
|latency|98% of requests faster than 350ms over last 1 day|2.00%|2m52s|2h11m35s (+2h8m43s)|0s|
Generated by the Reliably CLI Version 0.16.0 (2021-05-19)
Declare SLOs as code and get actionable reports right where you need them: in the CLI, in your continuous delivery pipeline, or as a PDF to share with your teammates or manager.
Are you sure all your Kubernetes manifests or clusters follow best practices? The Reliably CLI scans them for potential vulnerabilities and comes up with fixes you can implement right away.
Make reliability a part of your development process by automating SLO reports or Kubernetes scans with your favorite CI/CD tool. You can even use those reports as gates to block a pipeline in certain conditions.
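One way to picture a report used as a gate is a small step that fails the pipeline when any SLO is below its target. The sketch below is an assumption-laden illustration: the record shape (`name`, `current_pct`, `target_pct`) and the `gate_exit_code` function are hypothetical and do not reflect the Reliably CLI's actual output format or options. The sample values are taken from the http-api report shown earlier on this page.

```python
def gate_exit_code(slo_results):
    """Return 0 when every SLO meets its target, 1 otherwise.

    Each result is a dict with hypothetical keys:
    'name', 'current_pct', and 'target_pct'.
    """
    failing = [r for r in slo_results if r["current_pct"] < r["target_pct"]]
    for r in failing:
        print(f"SLO not met: {r['name']} "
              f"({r['current_pct']:.2f}% < {r['target_pct']}%)")
    return 1 if failing else 0


# Values copied from the sample report for the http-api service.
results = [
    {"name": "99% availability over 1 hour", "current_pct": 100.00, "target_pct": 99},
    {"name": "99% of requests under 300ms", "current_pct": 73.91, "target_pct": 99},
]

exit_code = gate_exit_code(results)
print(f"pipeline step would exit with code {exit_code}")
# In a real pipeline step, finish with: import sys; sys.exit(exit_code)
```

Most CI/CD systems treat a non-zero exit code as a failed step, so returning 1 is enough to block the stages that follow.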
Reliably loves and supports open source. The CLI is developed as an open source project with an Apache-2.0 License, so you can understand exactly what happens under the hood, or even contribute.
Easily and efficiently fetch raw data from your cloud services provider or monitoring solution.
Seamlessly connect to your continuous delivery pipeline.
Make SLOs accessible to everyone, by displaying reports in GitHub PRs, or in continuous delivery pipelines. Never lose time telling people to look at dashboards anymore!
Validate your reliability efforts with powerful and flexible chaos engineering experiments. Made by the same people as Reliably, the popular open source Chaos Toolkit has proven to be a solution of choice for thousands of reliability experts. And with the upcoming Reliably Chaos Toolkit extension, it's even easier to run experiments only when your SLOs are healthy.
You might sometimes have the feeling the development teams follow a different agenda than yours. You care about ROI and user happiness, while they work on features, issues, and availability. SLOs are a way to quantify what is important to you, in a language that is shareable across the whole organization.
You know what your users care about, but you're not sure how it relates to measurable metrics? With Reliably SLOs you can declare what is important to you, and find out later how to measure it. Or make it your reliability team's mission.