Your First Reliably Plan
This guide will take you through your first journey using Reliably. You do not need to know anything about Chaos Engineering to get started.
2 minutes to get started
- Select one of Reliably’s
starters. We suggest you
start with a simple one like
Verify latency
- Fill its form and click the
Create and run
button. Leave theContributions and tags
section as it is for now - Select the default
[Reliably Cloud] cloud
deployment - Set the schedule to the value
now
- Click the
Create plan
That’s it! Well done.
Shortly, the plan will start and the experiment’s execution will be visible from the executions list.
Explanation
Reliably’s has the goal to help with the adoption of Chaos Engineering and verification into a team, or organization. To achieve that, its core concepts are straightforward and relatable.
Reliably uses the Chaos Engineering concept of Experiment
which is a way to
declare a scenario of verification of your system. A Reliably Starter
is a
template for common experiments that you can easily put in place in your
environment without having to learn how they are designed.
A Reliably Plan
orchestrates the execution of an experiment. To perform this
function, a plan requires a Reliably Deployment
. A deployment merely instructs
where to run the experiment. A plan also needs a Reliably Schedule
which
tells Reliably if the plan runs now and/or periodically, through a CRON format.
Optionally, a plan may also take a Reliably Environment
which represents a
bag of environment variables and secrets to pass to the execution at runtime.
The plan can also take reliably Integrations
which provide additional
behaviors during the execution of the experiment.
As soon as the run starts it can be reviewed via the Reliably Execution
page.
Going further
Run your own Experiment
While Reliably provides ready-to-run starters, it’s likely your system requires bespoke experiments to fully address its unique properties.
Nothing simpler. Let’s say you want to run this experiment:
{
"title": "Losing an AZ in a classic ELB should not impact our users",
"description": "While EKS helps with our resilience, we need to understand how the failure of an entire AZ at the ELB level may harm our users.",
"configuration": {
"aws_region": "eu-central-1"
},
"method": [
{
"type": "action",
"name": "Simulate AZ Failure for a classic LB",
"pauses": {
"after": 10
},
"provider": {
"type": "python",
"module": "azchaosaws.elb.actions",
"func": "fail_az",
"arguments": {
"az": "eu-central-1c",
"dry_run": false,
"tags": [
{
"Key": "kubernetes.io/service-name",
"Value": "default/frontend-external"
}
]
}
}
}
],
"rollbacks": [
{
"type": "action",
"name": "Recover AZ Failure for CLB",
"provider": {
"type": "python",
"module": "azchaosaws.elb.actions",
"func": "recover_az"
}
}
]
}
Go to the new experiment page and
paste it (or load it from your computer). Leave aside the contributions for
now, we will explain them later. Click the Import
button.
This experiment is now available on the experiments page for all users of this organization.
Select it in the list and click on the Run experiment
button. Follow then the
same process as with the starter above to trigger the plan.
Parametrized your Experiment
Quite often, users will reuse a single experiment but change its parameters
at runtime. To achieve this, you can use Reliably Environments
. They represent
a bag of clear text environment variables and/or encrypted secrets.
In the example above:
{
"title": "Losing an AZ in a classic ELB should not impact our users",
"description": "While EKS helps with our resilience, we need to understand how the failure of an entire AZ at the ELB level may harm our users.",
"configuration": {
"aws_region": {
"type": "env",
"key": "AWS_REGION",
"default": "eu-central-1"
},
"fail_az_tag_key": {
"type": "env",
"key": "TAG_KEY_1",
"default": "kubernetes.io/service-name"
},
"fail_az_tag_value": {
"type": "env",
"key": "TAG_KEY_2",
"default": "default/frontend-external"
},
},
"method": [
{
"type": "action",
"name": "Simulate AZ Failure for a classic LB",
"pauses": {
"after": 10
},
"provider": {
"type": "python",
"module": "azchaosaws.elb.actions",
"func": "fail_az",
"arguments": {
"az": "eu-central-1c",
"dry_run": false,
"tags": [
{
"Key": "${fail_az_tag_key}",
"Value": "${fail_az_tag_value}"
}
]
}
}
}
],
"rollbacks": [
{
"type": "action",
"name": "Recover AZ Failure for CLB",
"provider": {
"type": "python",
"module": "azchaosaws.elb.actions",
"func": "recover_az"
}
}
]
}
We modified the experiment to let the values be passed as environment variables
at runtime. To create an environment that contains these values, go to the
environments page and click
New environment
.
Give it a meaningful name and add three environment variables:
AWS_REGION
with valueeu-central-1
TAG_KEY_1
with valuekubernetes.io/service-name
TAG_KEY_2
with valuedefault/frontend-external
Notice that in the experiment, all our entries have default values already. This means this experiment can still be used without the environment variables being set. However, when set, they take precedence over the hardcoded defaults.
Next, when creating a plan, you can simply select the right environment to run this experiment with different conditions.
Connect Reliably to your World
You have embarked on an extraordinary journey to learn about your system and how you, as a team, respond to changes brought to it.
With this in mind, you mustn’t consider Reliably as the end of a line, instead, its power lies in the fact it can surface knowledge into your usual operational stack.
To achieve this, you can bet on Reliably Integrations
. Reliably comes
with a set of integrations that push data to places where you already are such
as Slack or observability platforms.
Go to the integrations page and click
on the New integration
button. Selects one that suits your operations and
configure it.
When creating a plan, you can enable integrations for the plan and let them do their magic as the experiment runs.
Investigate during an Execution
Reliably is all about helping you, as a team, nurture a learning capability. To this extent, Reliably allows you to put an execution on pause so that you can go and investigate the system’s response to the experiment.
Go to a running experiment and
click the Pause execution
button. Reliably will do its best to put the
execution on pause right after the current activity. When that happens, you can
wait for the pause to timeout or click the Resume execution
button.
On a Free plan, the pause cannot be longer than 5mn before it times out.
A pause can only happen after the current activity. If that activity is very long, your pause will not take place immediately.
Learn more
- Read the Reliably reference to understand Reliably’s architecture