What is verification?

Verification is the process of running controlled chaos engineering experiments on your systems and services in order to test their capabilities and how they react to different events. It is a method for confirming hypotheses, exploring complex infrastructures, and revealing otherwise undiscoverable behaviors.

Identifying weaknesses before they manifest is critical to good service reliability.

Why is verification important?

Modern infrastructures are so complex that, even in medium-sized organizations, no single person knows how each service relates to every other. Understanding the impact that a failure, or even a performance issue, on one service can have on your overall infrastructure is downright impossible.

There are two ways of discovering this impact. Either wait for something to happen and see for yourself (hoping you’re not the on-call person), or create and run chaos engineering experiments to discover it in a controlled environment, at a moment when degraded performance can be tolerated (for example, when your SLOs have been overperforming for some time).
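One way to judge whether "degraded performance can be tolerated" is to look at how much of an objective's error budget is left. The sketch below is only an illustration of that arithmetic: the function names, the event counts, and the 50% threshold are assumptions made for this example, not part of Reliably or Chaos Toolkit.

```python
def error_budget_remaining(target: float, good: int, total: int) -> float:
    """Return the fraction of the error budget still unspent (1.0 = untouched).

    target is the objective as a ratio (for example 0.99 for 99%),
    good is the number of successful events, total the number of all events.
    """
    if total == 0:
        return 1.0  # no traffic yet, so nothing has been spent
    allowed_bad = (1.0 - target) * total  # failures the objective tolerates
    actual_bad = total - good             # failures actually observed
    if allowed_bad == 0:
        # A 100% target leaves no budget at all.
        return 1.0 if actual_bad == 0 else 0.0
    return 1.0 - actual_bad / allowed_bad


def can_run_experiment(remaining: float, threshold: float = 0.5) -> bool:
    """Hypothetical policy: experiment only while enough budget is left."""
    return remaining >= threshold


# Example: a 99% objective over 100,000 requests with 250 failures.
remaining = error_budget_remaining(target=0.99, good=99_750, total=100_000)
print(f"budget remaining: {remaining:.0%}, run experiment: {can_run_experiment(remaining)}")
```

With a 99% target, 1,000 failures out of 100,000 requests are tolerated, so 250 observed failures leave roughly three quarters of the budget, which is a reasonable moment to experiment under this hypothetical policy.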

How does verification work?

Reliably relies on the open-source Chaos Toolkit to run chaos engineering experiments. Chaos Toolkit was created and is maintained by members of the Reliably team, with contributions and extensions from wonderful teams working at companies such as Capital One, Oracle, and Lego.

Over time, we’ve heard from first-time chaos engineering practitioners that they were struggling to define their first experiments. “What to experiment on?” and “What should an experiment do?” were frequent blockers.

Reliably helps you take this first step by allowing you to automatically create simple verifications for existing objectives.

The form to create a new verification. Type in a name, select an existing objective, and click the create button.

You can then download the verification or run it from the provided URL as a Chaos Toolkit experiment.

If it impacts your service, you will notice it in your objective results graph.

An objective results graph with the value dropping when a verification is run.
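For readers who have not seen a Chaos Toolkit experiment before, the sketch below builds a minimal one in Python and serializes it to JSON. It follows the general shape of a Chaos Toolkit experiment (a steady-state hypothesis made of probes, then a method made of actions), but it is not the file Reliably generates: the title, the probed URL, and the placeholder `echo` action are all assumptions chosen for illustration.

```python
import json


def build_experiment(title: str, probe_url: str, expected_status: int = 200) -> dict:
    """Build a minimal Chaos Toolkit-style experiment as a plain dict."""
    return {
        "version": "1.0.0",
        "title": title,
        "description": "Illustrative experiment; not generated by Reliably.",
        # The hypothesis is checked before and after the method runs.
        "steady-state-hypothesis": {
            "title": "The service answers normally",
            "probes": [
                {
                    "type": "probe",
                    "name": "service-must-respond",
                    "tolerance": expected_status,  # expected HTTP status code
                    "provider": {"type": "http", "url": probe_url, "timeout": 3},
                }
            ],
        },
        # Placeholder action: a real experiment would perturb the system here.
        "method": [
            {
                "type": "action",
                "name": "placeholder-perturbation",
                "provider": {
                    "type": "process",
                    "path": "echo",
                    "arguments": "replace me with a real action",
                },
            }
        ],
        "rollbacks": [],
    }


def to_json(experiment: dict) -> str:
    """Serialize the experiment so it can be saved to a file."""
    return json.dumps(experiment, indent=2)


# Hypothetical health-check URL, used only for this example.
experiment = build_experiment("Checkout stays available", "https://example.com/health")
print(to_json(experiment))
```

Saving the printed JSON to a file such as `experiment.json` and passing it to the Chaos Toolkit CLI (`chaos run experiment.json`) is the usual way to execute an experiment of this kind.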

The benefits of verification

Verifications allow you to surface behaviors of your system that would otherwise be undiscoverable. They thus allow the people responsible for running day-to-day operations to better understand how services work together.

Chaos engineering experiments, by giving the opportunity to experiment with fixing production on a regular basis, make each team member more confident in navigating future incidents. They make teams more efficient and help reduce on-call anxiety.

Verification is also a powerful feedback loop that lets your team find out whether the current set of objectives is the most beneficial for the team’s performance. A team that has become very good at putting out fires might be able to keep its objectives above target all the time, to the detriment of innovation and individual mental health. Such a team is in fact sensitive to any change in external conditions, like an increase in traffic or a marketing request for new features.

FAQ

Who should use verifications?
We're tempted to say "everyone"! When you have your infrastructure covered, with objectives defined and monitored, running verifications seems a natural step, given all the benefits your organization will get from it.
What are the benefits of engineering chaos with Reliably?
Testing your services and systems by engineering chaos helps identify vulnerabilities through controlled experiments that introduce unpredictable behavior, so you can see how your systems respond. It makes teams more efficient and individual contributors more confident.
Can I get a free trial?
Yes, sign up for free (no credit card required) and get access to features such as service level objectives, scorecards, and alerts. You can cancel any time and you won't be charged anything.