Use Reliably Integrations

This guide goes into the details of creating and using Reliably Integrations.

A Reliably Plan executes activities implementing the scenario described by the Reliably Experiments of the plan. It is often pertinent to let perform tasks while the execution takes place, to cover scenarios such as:

  • sending events to your observability platform
  • controlling the execution so that it terminates in case of system’s specific condition
  • make the run visible via platforms like Slack

and many other use cases that allow you to integrate Reliably into your operations.

The Basics

Create an Integration

Unlike Reliably Environments, Reliably Integrations work with any Reliably Deployment type.

To create an integration, go to the New integration page and start filling the form as needed.

Provide a name that you can easily recall what it covers later on. Then fill the requested parameters. Integrations are then available to be used in Reliably Plans.

Delete an Integration

You can delete integrations at will but only when all Reliably Plan that use them have also been deleted first. This prevents mistakes where a Plan tries to run and cannot find its integration.

Configure Integrations

This section will review how to configure all integrations provided by Reliably.

Slack

The Reliably Integration for Slack creates a Slack thread in a channel of your choice. That thread contains all events of the execution of the experiment in near real-time. Messages contain the result of each activity so you can review where the action is at and its status as it runs.

FieldValue
nameThe integration name. Make it specific so you can easily differentiate it when creating many of the same type.
channelThe Slack channel where to send messages to. The Slack app must be invited into that channel first.
tokenThe Slack token of the Slack application enabled in the Slack workspace for the integration.

To enable the Reliably Integration for Slack, you need to first create a Slack application and enable it in your Slack Workspace. Please, read more about how this is done on the Slack documentation.

The application should have the following scopes:

  • channels:read
  • chat:write
  • file:write

We suggest you name it Reliably to make it clear to your users.

Honeycomb

The Reliably Integration for Honeycomb uses the Open Telemetry protocol to send traces to Honeycomb while the experiment is running.

FieldValue
nameThe integration name. Make it specific so you can easily differentiate it when creating many of the same type.
traces endpointThe Honeycomb Open Telemetry traces endpoint. Usually https://api.honeycomb.io/v1/traces.
api keyThe Honeycomb API key using the format x-honeycomb-team=your-api-key.

Dynatrace

The Reliably Integration for Dynatrace uses the Open Telemetry protocol to send traces to Dynatrace while the experiment is running. Please review Dynatrace documentation for more information.

FieldValue
nameThe integration name. Make it specific so you can easily differentiate it when creating many of the same type.
traces endpointThe Dynatrace Open Telemetry traces endpoint. Usually https://{your-environment-id}.live.dynatrace.com/api/v2/oltp/v1/traces.
api tokenThe Dynatrace API token with the scope to send traces: openTelemetryTrace.ingest.

Google Cloud Platform

The Reliably Integration for GCP uses the Open Telemetry protocol to send traces to GCP while the experiment is running.

FieldValue
nameThe integration name. Make it specific so you can easily differentiate it when creating many of the same type.
service accountThe GCP service account which has the roles/cloudtrace.agent role to send OpenTelemetry traces.

Grafana

The Reliably Integration for Grafana uses the Open Telemetry protocol to send traces to Grafana Tempo while the experiment is running. Please review Grafana OTLP documentation for more information.

FieldValue
nameThe integration name. Make it specific so you can easily differentiate it when creating many of the same type.
traces endpointThe Grafana Open Telemetry traces endpoint. Usually https://otlp-gateway-<zone>.grafana.net/otlp/v1/traces.
usernameThe Grafana instance ID from the details page.
passwordThe Grafana access policy token.

OpenAI ChatGPT

The OpenAI ChatGPT Integration automatically pulls information about your experiment once it’s finished. The goal is to offer context of what other aspect of the system you should be paying attention to.

FieldValue
nameThe integration name. Make it specific so you can easily differentiate it when creating many of the same type.
modelThe ChatGPT model to use.
keyYour OpenAI key.
orgYour OpenAI organization identifier.

We recommnd you do not enable this integration on plans with a recurring schedule. This would generate likely the same output and would consume quite a lot of your tokens. Instead use it one plans using the schedule now.

Reliably Pre-checks

The Reliably Pre-checks Integration offers a mechanism to verify some pre-conditions before the experiment is allows to run. If these pre-conditions are not met, the execution is interrupted and terminated.

This integration is useful if you need to ensure operational properties of your system so the experiment does not run in conditions that could lead to potential issues of the system.

If you are familiar with the steady-state hypothesis concept of the experiment itself, you may feel these pre-checks are redundant. In fact, they use the same approach but have a different purpose. The hypothesis of the experiment is part of the experiment’s definition itself and uses to inform us how a specific change, or turbulence, can impact this hypothesis. The pre-checks are operational gates that decide whether it’s safe to execute the experiment and does not make any assumption about the experiment itself.

FieldValue
nameThe integration name. Make it specific so you can easily differentiate it when creating many of the same type.
pre-check endpointThe URL called during pre-checks to make the decision of interrupting the execution before it starts. The endpoint must return a JSON payload of the form {"ok": true/false, "reason": "Reason when ok property is false"}. When the ok field is false, the execution is interrupted. Any response with a status code different from 200 will be considered as an interruption as well.
tokenIf the endpoint is protected by a bearer token, provide it as well
credentials fileIf the endpoint is protected and requires a service account (GCP) or a credentials file (AWS), provide its content

Reliably Safeguards

The Reliably Safeguards Integration offers a mechanism to verify the operational safety of executing the experiment as it occurs. If the conditions are not met, the execution is interrupted and terminated.

This integration is useful if you need to ensure operational properties of your system so the experiment does not run in conditions that could lead to potential issues of the system while it executes.

FieldValue
nameThe integration name. Make it specific so you can easily differentiate it when creating many of the same type.
safeguard endpointThe URL called during the execution of the experiment to make the decision of interrupting the execution. The endpoint must return a JSON payload of the form {"ok": true/false, "reason": "Reason when ok property is false"}. When the ok field is false, the execution is interrupted. Any response with a status code different from 200 will be considered as an interruption as well.
frequencyThe frequency at which the endpoint should be called.
tokenIf the endpoint is protected by a bearer token, provide it as well
credentials fileIf the endpoint is protected and requires a service account (GCP) or a credentials file (AWS), provide its content

Reliably Auto-pause

The Reliably Auto-pause Integration offers a mechanism to pause the execution after each activity. Pauses can be resumed manually from the execution page.

Pauses are useful when you want to investigate the system as the experiment takes place.

FieldValue
nameThe integration name. Make it specific so you can easily differentiate it when creating many of the same type.
strategyThe strategy of pauses: before or after each activities in the method

Pauses lasts as long as your Billing Plan offers.