Objectives

Objectives are a building block of your reliability effort. They help you monitor different aspects of your services and systems, from latency and error rate to the number of open PRs and issues.

An objective describes how you expect your system to behave in “normal” situations: it defines what “good” looks like.

To learn more about how to use objectives, read our blog post Understanding the power of objectives or the Service Level Objectives chapter of the Google SRE book.

Using indicators to measure objectives

An indicator measures one aspect of your service. Some common indicators are:

  • latency (the time it takes for a request to be completed)
  • availability (the percentage of the time during which a service is usable)
  • error rate (the fraction of all requests that returned an error)
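As an illustration, the indicators above can be computed from raw request data. The sketch below is a minimal Python example, not how Reliably computes them: the `Request` record and the function names are hypothetical, and availability is approximated from health-check probes taken at a fixed interval, which is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """One served request (hypothetical record shape)."""
    latency_ms: float  # latency: time taken to complete the request
    is_error: bool     # True if the request returned an error


def error_rate(requests: list[Request]) -> float:
    """Fraction of all requests that returned an error."""
    if not requests:
        raise ValueError("no requests to measure")
    return sum(r.is_error for r in requests) / len(requests)


def availability(probes: list[bool]) -> float:
    """Percentage of time the service was usable.

    Assumes each probe is a health check taken at a fixed interval,
    so every sample stands for an equal slice of time.
    """
    if not probes:
        raise ValueError("no probes to measure")
    return 100.0 * sum(probes) / len(probes)
```

For example, if one request out of four returns an error, `error_rate` returns 0.25; if three out of four probes succeed, `availability` returns 75.0.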

Indicators are usually collected on the server side, with tools like Prometheus, or from the metrics exposed by your cloud provider's API.

An objective only makes sense over a defined time window. For example, you can set objectives for how a service performs over a one-hour, one-day, or one-week window.
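The sketch below shows one way to restrict measurements to a rolling time window that ends at a given moment. It is a hypothetical Python helper for illustration only: `events_in_window` and the `(timestamp, value)` event shape are assumptions, and calendar-aligned windows (for example, midnight to midnight) would need a different start time.

```python
from datetime import datetime, timedelta, timezone
from typing import TypeVar

T = TypeVar("T")

# Window sizes mentioned above.
ONE_HOUR = timedelta(hours=1)
ONE_DAY = timedelta(days=1)
ONE_WEEK = timedelta(weeks=1)


def events_in_window(
    events: list[tuple[datetime, T]],
    window: timedelta,
    now: datetime,
) -> list[T]:
    """Return the values of events whose timestamp lies in (now - window, now]."""
    start = now - window
    return [value for ts, value in events if start < ts <= now]


now = datetime(2024, 1, 8, 12, 0, tzinfo=timezone.utc)
latency_events = [
    (now - timedelta(minutes=30), 120.0),  # 30 minutes ago
    (now - timedelta(hours=5), 300.0),     # 5 hours ago
    (now - timedelta(days=3), 80.0),       # 3 days ago
]
```

With these events, the one-hour window keeps only the 120 ms request, the one-day window also keeps the 300 ms request, and the one-week window keeps all three.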

Example

You have an indicator attached to a service that reports its latency: for each request, you get a value in milliseconds. You want your service to be fast, so you set a threshold of 250 milliseconds. That number can be chosen arbitrarily, or based on what you already know about how an existing service has performed in the past.

In this case, “good” is defined as: Indicator ≤ 250 ms.

An objective for this service can then be phrased as “I want 95% of all requests to this service to complete in 250 milliseconds or less”. It is computed as the percentage of requests that completed in 250 milliseconds or less, out of all requests made during the specified time window.
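The computation described above can be sketched as follows. This is a minimal Python illustration that assumes you already have the latency of every request in the time window; `objective_ratio` and `objective_met` are hypothetical names, not part of Reliably.

```python
def objective_ratio(latencies_ms: list[float], threshold_ms: float = 250.0) -> float:
    """Percentage of requests whose latency is at or under the threshold."""
    if not latencies_ms:
        raise ValueError("no requests in the time window")
    # A request is "good" when Indicator <= threshold.
    good = sum(1 for latency in latencies_ms if latency <= threshold_ms)
    return 100.0 * good / len(latencies_ms)


def objective_met(
    latencies_ms: list[float],
    target_percent: float = 95.0,
    threshold_ms: float = 250.0,
) -> bool:
    """True if at least target_percent of requests were good."""
    return objective_ratio(latencies_ms, threshold_ms) >= target_percent


latencies = [120, 250, 310, 90, 180, 200, 240, 230, 260, 100]
```

With these ten latencies, eight requests are good, so the ratio is 80% and the 95% objective is not met. A request taking exactly 250 ms counts as good, matching the Indicator ≤ 250 ms definition.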

Defining Objectives in Reliably

Deciding what to attach an objective to, and how to measure it, can be hard or tedious. Reliably helps you get up and running quickly by scanning your system and creating objectives and indicators for you.

Reliably currently supports the following providers:

  • AWS (API Gateway, Lambda, Elastic Load Balancer)
  • Google Cloud (Cloud Functions and Cloud Run)
  • Kubernetes (Pods)
  • Datadog (Import the SLOs defined in Datadog)
  • GitHub (Issues and Pull Requests)
  • Dynatrace (Import the SLOs defined in Dynatrace)
  • Custom (build your own custom objective)