Is 99.999% uptime realistic? We cover why you should care, and how you can achieve it.
What Does ‘Five Nines Of Availability’ Mean?
You’ve most likely come across the idea before, even if it wasn’t called the ‘five nines of availability’. The term refers to a target of 99.999% uptime, or service availability.
This level of service provision is a common business goal or SLA. It is sometimes referred to as the ‘five nines of reliability’, or just the five nines. While reliability and availability may be calculated slightly differently, the principle behind five nines is the same.
Do You Really Need Five Nines?
It’s true that high levels of service availability are important. ‘Five nines’ is a term that is increasingly tossed around boardrooms and development teams.
But there are other factors that can outweigh the allure of 99.999% availability – like the quality of the service when it is available.
For customers, great UX, excellent customer service, and value for money can all make a bigger difference than a few minutes more maintenance time. And for businesses, the cost difference between providing 99.9% (three nines) and 99.999% (five nines) can be in the hundreds of millions.
But you can’t chuck availability and reliability targets in the trash. They’re still important to your customers, and can even prompt innovative ways to improve efficiency in your business.
Why Reliability Matters To Your Customers
If a system or service goes down, that leaves end users unable to do their tasks. No one wants to wait to access an application, or reorganize their day based on (un)availability.
Today, customers expect high levels of service whenever they need it. Whether it’s a personal banking app, a retail website, or a smart factory, 24/7 on-demand access is crucial to keep things running. Don’t forget that there are organizations out there for whom five nines is an imperative – like emergency services.
As we mentioned above, for most businesses it doesn’t have to be the be-all and end-all, but it might spell the difference between someone making a purchase or bouncing off a website. And at the end of the day, predictable, reliable availability tends to increase customer loyalty.
Is Five Nines Achievable?
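99.999% availability equates to just 5 minutes 15 seconds of downtime per year (0.001% of the roughly 525,600 minutes in a year). But despite five nines targets, 59% of Fortune 500 companies experience a minimum of 1.6 hours of downtime each week. That works out to around 83 hours a year, or roughly 99% availability. That’s a massive difference.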
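To make the arithmetic concrete, here is a minimal Python sketch that converts a number of nines into a yearly downtime allowance. It assumes a 365-day year and an availability of exactly N nines; the function names are illustrative only and do not come from any particular library.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: a 365-day year


def downtime_budget_seconds(nines: int) -> float:
    """Allowed downtime per year, in seconds, for an availability of N nines."""
    unavailability = 10 ** -nines  # e.g. 5 nines -> 0.00001
    return SECONDS_PER_YEAR * unavailability


def format_duration(seconds: float) -> str:
    """Render a duration as hours, minutes and whole seconds."""
    minutes, secs = divmod(round(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}h {minutes}m {secs}s"


if __name__ == "__main__":
    for nines in (2, 3, 4, 5):
        budget = format_duration(downtime_budget_seconds(nines))
        print(f"{nines} nines: {budget} of downtime per year")
```

Under these assumptions, three nines allows a little under nine hours of downtime a year, while five nines allows just over five minutes.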
We’ve previously talked about the importance of reducing downtime, but five nines is not always a realistic target.
Hitting it can be expensive and impractical, in some industries more than others. So, the first question to ask is whether or not five nines is even the right target for you to set. Given your business size, existing infrastructure, and your product or service offering, would four or even three nines be more realistic?
Once you’ve set the correct target, then there are absolutely ways to achieve it. Here are some approaches that can help you accelerate your journey to service reliability.
Create An Accurate Risk Profile
The best place to start is to gain an accurate risk and performance profile of your business. That way, you’ll be able to identify applications or instances that are root causes of failure and need to be fixed, retired, or backed by redundancy. You’ll also be able to separate ‘normal’ errors (like gateway timeouts) from larger, discrete incidents, helping you to prioritize where to start. With all of this information, you can start to build a transformation roadmap.
Invest In Automation
Introducing automation not only reduces the likelihood of manual error – it also unlocks opportunities for other smart tools like continuous network analysis, AI, and machine learning. These can all help to anticipate and solve problems before they result in downtime.
With an up-to-date risk profile and increased automation, you will likely find opportunities to gain efficiencies across your existing investments.
If two pieces of technology can do the same thing, why not retire the less reliable one? Consolidating processes onto fewer, more secure tools can not only reduce the likelihood of downtime, but can also result in cost savings for the business.
Testing And Monitoring
It’s incredibly important to make sure that both your response plan and everyday operations are up to scratch. Do you have a tried and tested disaster recovery plan in place? Do you have a test environment to assess the performance of your existing technology in different scenarios? Do you have continuous monitoring in place to catch bugs or issues before they affect your users?
Availability & SLOs
It’s hard to demonstrate improvement if you don’t have something to measure it against. This is why businesses often resort to five nines as a default metric. But it’s important to make sure you’re using the most valuable service level objective (SLO) that measures the right thing for you, rather than sticking with five nines because it’s what everyone else does.
And it’s good to remember that you don’t have to have the same SLO for everything. For less important services, three nines might be sufficient. Use your risk profile to identify priority areas and realistic SLOs.
Five nines might be your aspirational goal, but set achievable interim goals, too.
Use your current landscape to set benchmarks and use the approaches above to create improvement models. Eventually, you’ll start being able to proactively manage and prevent risk, getting you closer to achieving five nines.
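As a rough illustration of benchmarking against an SLO, the Python sketch below computes measured availability from a list of outage durations and shows how much of the downtime allowance is left for the period. The function names, the 30-day window, and the sample outage figures are all hypothetical, chosen only to show the calculation.

```python
def measured_availability(outage_minutes: list[float], period_minutes: float) -> float:
    """Fraction of the period during which the service was up."""
    return 1 - sum(outage_minutes) / period_minutes


def budget_remaining(outage_minutes: list[float], period_minutes: float, slo: float) -> float:
    """Minutes of downtime still allowed in the period before the SLO is missed."""
    allowed = period_minutes * (1 - slo)
    return allowed - sum(outage_minutes)


if __name__ == "__main__":
    period = 30 * 24 * 60   # hypothetical 30-day window, in minutes
    outages = [12.0, 7.5]   # hypothetical outage durations, in minutes
    slo = 0.999             # three nines, as a fraction

    print(f"Measured availability: {measured_availability(outages, period):.4%}")
    print(f"Downtime allowance left: {budget_remaining(outages, period, slo):.1f} minutes")
```

A negative remaining allowance would mean the SLO has already been missed for that period, which is a useful signal that the target or the service needs attention.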
Ultimately, it is crucial to focus on delivering a good service and experience to customers. It shouldn’t just be about trying to hit an arbitrary metric – but improving reliability and availability is a good place to start.