It can be a big can of worms, but tackling IT downtime can be the first step to major cost savings. Here’s everything you ever wanted to know about downtime but were afraid to ask.
What is unplanned application downtime?
It’s a fairly self-explanatory term: unplanned IT downtime is any disruption to IT services that is not anticipated or scheduled by the business. Unplanned downtime can have many causes: hacking or malware attacks, hardware or software faults, or even natural disasters.
When downtime occurs, it leaves people unable to access services or do their work. Minor events during non-work hours might only inconvenience IT teams, but larger events can impact staff and customers, too. The bigger the issue, the harder it can be to solve, and the more people that are affected.
Is downtime the same as an outage?
The short answer? No.
The longer – and more useful – answer is that an outage is the suspension or unavailability of services or operations, while downtime is about the time that customers and employees lose as a result. In a best-case scenario, it’s possible to resolve an outage without actually causing downtime.
A disruption that results in partial suspension of services is sometimes called a brownout.
Planned downtime vs unplanned downtime
Put simply, planned downtime is scheduled. Unplanned downtime is not.
Planned downtime might be things like hardware repairs, essential maintenance, or major software updates.
Unplanned downtime is, unsurprisingly, more costly than planned downtime. But both cause disruption and unwanted expenditure.
How much does unplanned downtime cost?
In 2014, Gartner estimated that IT downtime can cost organizations up to $5,600 per minute – and that number has only risen since.
In 2018, the average annual cost of planned downtime was $5.6 million for just 830 minutes. And unplanned downtime can cost up to 35% more. That’s $7.6 million – bringing the total cost of all annual downtime to $13.2 million.
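To see how those figures fit together, here’s a quick back-of-the-envelope check in Python. The inputs are the 2018 averages quoted above; this is just arithmetic on the article’s numbers, not independent data:

```python
# Sanity-check the 2018 downtime cost figures quoted above.
planned_cost = 5.6e6    # average annual cost of planned downtime (USD)
planned_minutes = 830   # minutes of planned downtime per year

cost_per_minute = planned_cost / planned_minutes
print(f"Planned downtime cost per minute: ${cost_per_minute:,.0f}")

# Unplanned downtime runs up to 35% higher than planned downtime.
unplanned_cost = planned_cost * 1.35
print(f"Unplanned downtime cost: ${unplanned_cost / 1e6:.2f}M")

total = planned_cost + unplanned_cost
print(f"Total annual downtime cost: ${total / 1e6:.2f}M")
```

Running this reproduces the rounded figures in the text: roughly $6,747 per planned minute, $7.6 million of unplanned downtime, and $13.2 million in total.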
Meanwhile, 18% of enterprises surveyed in 2020 reported that their downtime costs were in excess of $5 million per hour.
That’s a lot of numbers – but it’s clear that the cost of downtime is only increasing as businesses’ IT architectures and operational ecosystems become ever more complex:

| | 2014 | 2018 | 2020 |
| --- | --- | --- | --- |
| Average cost of downtime per minute | $5,600 | $6,747 | $83,333 |
A target of 99.95% uptime equates to about 5 minutes of unplanned downtime a week. But for 59% of Fortune 500 companies, the reality is more like 1.6 hours a week. That equates to a massive $8 million per week lost to unplanned downtime.
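The relationship between an uptime percentage and the downtime it allows is simple to compute. A minimal sketch, using the figures above:

```python
# Convert an uptime target into the downtime it allows over a period.
def downtime_budget(uptime_pct: float, period_minutes: float) -> float:
    """Minutes of downtime permitted over a period at a given uptime target."""
    return period_minutes * (1 - uptime_pct / 100)

WEEK_MINUTES = 7 * 24 * 60  # 10,080 minutes in a week

# A 99.95% uptime target leaves roughly 5 minutes of downtime a week.
print(f"{downtime_budget(99.95, WEEK_MINUTES):.1f} minutes/week")

# Conversely, 1.6 hours of weekly downtime implies roughly 99.05% uptime.
implied_uptime = 100 * (1 - (1.6 * 60) / WEEK_MINUTES)
print(f"{implied_uptime:.2f}% uptime")
```

The same function works for monthly or annual budgets – just swap the period length.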
But it’s important to remember that the costs associated with unplanned downtime aren’t purely financial. Depending on the level of disruption, downtime can affect a business’s reputation, as well as customer loyalty.
It’s not just external consequences, but internal, too – employee churn can easily increase if day-to-day work is repeatedly disrupted. And for highly regulated industries like Financial Services, there might even be legal ramifications, litigation, or other penalties.
Unplanned application downtime examples
Take a look below to see some examples of the impact that unplanned downtime can have in your industry and others.
Manufacturing
Manufacturers are increasingly investing in smart factories and OT/IT alignment. But when your manufacturing equipment is connected across a network, it’s not just the software that’s affected by IT downtime. Without failsafes in place, machinery can stop being operational, too.
If this happens, it can cause disruption across the entire supply chain. And not only that, but downtime due to equipment failure or similar faults can cause health and safety issues, risking employee welfare and legal action.
Financial Services
Financial Services is one of the most high-risk sectors when it comes to the ramifications of downtime. Even the shortest outage can cost millions. And with added levels of security, it takes longer to get back up and running – particularly if the cause is cyberterrorism.
As a result, there’s a huge amount of pressure for Financial Services institutions to increase their reliability. But it comes with a lot of challenges to overcome, like ever-changing regulations and overcoming legacy technology.
Retail
On the most basic level, if a retailer’s website goes down, sales are directly affected. A single one-hour outage cost Amazon an estimated $100 million.
And it’s not just loss of traffic that can cost millions. Too many customers overwhelming your servers can have similar consequences. Retailers need to ensure that they have capabilities in place to cope with any eventuality, whatever the cause of the downtime.
Government
It goes without saying that lots of government agencies deal with critical services. Citizens expect 24-hour service, especially for emergencies.
Many governments are increasing e-services, which is great for accessibility, but they can introduce issues of security and risks of downtime. So a secure, reliable infrastructure is key.
Education
The ramifications of IT downtime on the Education sector can be huge, especially in Higher Education.
In the wake of COVID-19, even more educational resources are now hosted online, including lectures, classes, and digital learning tools – usually accessible via a centralized learning platform like Blackboard. This means that if something goes wrong, thousands of students can be affected, causing administrative havoc, re-sits, and distress for staff and students alike.
Media and entertainment
As with retail, even a short amount of downtime can pour money down the drain – particularly for social media.
For media and entertainment giants like Meta, if one server goes down then multiple platforms are affected. Their infamous outage across Facebook, Instagram and WhatsApp cost over $100 million – not to mention lost media spend.
Preventative measures to avoid unplanned downtime
While it’s obviously ideal to avoid any kind of downtime, sometimes things do go wrong. Outages are inevitable eventually, whether the cause is human error or technology malfunction. So it’s crucial to be able to deal with them when they occur.
First, it’s imperative to have an up-to-date incident response plan in place, and to make sure your teams are informed and well-trained to respond quickly to issues.
Next, businesses need to do everything in their power to learn from incidents and stop them from happening again. Postmortems are essential to prevent recurrent issues.
Then, it’s about strengthening your existing IT architecture. There are a number of tools and methodologies you can deploy:
- Observability: with a complete view of all your operations, you can monitor every system and application in real time, so issues are spotted and resolved quickly.
- Predictive maintenance: invest in systems that anticipate and prevent problems, rather than picking up the pieces after the fact. Predictive maintenance can also reduce the planned downtime needed for large software updates.
- Chaos engineering: improves system resilience by deliberately injecting controlled failures into IT systems to simulate unexpected disruptions – and fixing the weaknesses those experiments expose.
- Site reliability engineering (SRE): applies DevOps and software engineering methodologies to operational infrastructure, helping ensure your ecosystem is as reliable as possible and can withstand disruption.
- Customer reliability engineering (CRE): works with your customers to help them deploy SRE methodologies in their own businesses. It brings an added level of service reliability – and an extra layer of defense – to your IT landscape.
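As a rough illustration of the first item on that list – the real-time monitoring that observability is built on – here’s a minimal availability probe in Python. The URL and threshold are invented for the example; real monitoring stacks do far more, but the core loop is the same:

```python
# Toy availability probe: poll a health endpoint and decide when to alert.
# SERVICE_URL and FAILURE_THRESHOLD are hypothetical example values.
import urllib.error
import urllib.request

SERVICE_URL = "https://example.com/health"  # hypothetical health endpoint
FAILURE_THRESHOLD = 3                       # alert after N consecutive failures

def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers with an HTTP 2xx within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, TimeoutError, OSError):
        return False

def should_alert(recent_results: list, threshold: int = FAILURE_THRESHOLD) -> bool:
    """Return True once the most recent `threshold` checks have all failed."""
    return len(recent_results) >= threshold and not any(recent_results[-threshold:])

# A scheduler (cron, a monitoring agent, etc.) would call is_healthy() on an
# interval, append each result, and page the on-call engineer when
# should_alert() fires.
```

Keeping the alert decision in a separate pure function makes the threshold logic easy to test in isolation, which is exactly the kind of discipline SRE practices encourage.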
Together, these approaches can help you to tackle the root causes of downtime, and strengthen your response when it does happen. With downtime as costly as it’s been shown to be, it’s clear that reducing it as much as possible should be a priority.