How to prepare for a major IT failure

23/01/26 Wavenet

When a major IT failure strikes, organisations don’t get a second chance to prepare. Decisions must be made quickly, and IT information needs to be trusted, accurate, and accessible. This is where business continuity (BC) software can play a crucial role in strengthening IT service continuity (ITSC). In this article we consider what you can do to prepare in advance of an IT failure occurring and then look at how BC software can help.

Defining IT service continuity

It’s helpful to clarify some key terms:

“IT critical incident”
The highest severity level of IT incident. This typically requires the recovery of multiple IT systems and is managed by the core IT management team rather than a single Incident Manager. A priority 1 or IT major incident might escalate to be an IT critical incident.

“IT Service Continuity” (“ITSC”)
An umbrella term covering the preparation for, prevention of, and response to IT critical incidents. It encompasses concepts which may be more familiar, such as “IT resilience” and “IT disaster recovery”.

“ITSC owner”
The person in the organisation who is responsible for understanding, documenting and recording ITSC requirements. This may be the Head of IT (or another person in the IT business function), or possibly the Operational Resilience Manager / Business Continuity Manager. If there is no person assigned this responsibility during business-as-usual at your organisation, identify who would have this responsibility during an IT critical incident.

“Business-as-usual” (“BAU”)
Day-to-day operations, outside of an IT critical incident.

The growing complexity of IT service continuity

Whether or not they use dedicated business continuity software, organisations tend to have varying levels of understanding of their IT service continuity requirements. At the most basic level, this may be no more than a list of the five to ten key IT systems that require priority recovery during an IT critical incident. As organisations grow, so too does the number of systems, interdependencies, and the overall complexity of recovery requirements.

One of the biggest challenges for ITSC owners is the difficulty of obtaining, validating, and maintaining this information. Stakeholders across the business have competing priorities, and ITSC owners themselves may be focused on more immediate operational demands. Crucially, when an IT critical incident occurs, there is no opportunity to catch up on this foundational work. If ITSC requirements have not been thoroughly understood and documented during BAU, the resulting gaps can lead to inefficient incident management and significantly higher impacts on the business.

Why BAU preparation matters during an IT critical incident

During an IT critical incident, there is no opportunity to undertake these principal steps effectively:

  • Perform a business impact analysis (BIA).
  • Reassess the true criticality of IT systems.
  • Define or validate recovery time objectives (RTOs) and recovery point objectives (RPOs).
  • Identify or confirm business and system dependencies.

During an incident, the ITSC owner must rely entirely on what has already been recorded, regardless of how complete or accurate it is. This information will quickly be scrutinised by senior stakeholders, including executive leadership, making clarity and confidence essential.
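
One check that is far easier to run during BAU is whether recorded RTOs are consistent with recorded dependencies: a system cannot realistically be recovered sooner than something it depends on. The Python sketch below illustrates the idea. The system names, RTO values and the `rto_conflicts` function are hypothetical examples, not taken from any particular BC software.

```python
# Hypothetical ITSC records: RTO in hours and the systems each one depends on.
systems = {
    "payments": {"rto_hours": 4, "depends_on": ["database", "network"]},
    "database": {"rto_hours": 8, "depends_on": ["network"]},
    "network": {"rto_hours": 2, "depends_on": []},
}


def rto_conflicts(records):
    """Return (system, dependency) pairs where the dependency has a longer RTO."""
    conflicts = []
    for name, record in records.items():
        for dep in record["depends_on"]:
            if records[dep]["rto_hours"] > record["rto_hours"]:
                conflicts.append((name, dep))
    return conflicts


for system, dep in rto_conflicts(systems):
    print(f"{system} has a shorter RTO than its dependency {dep}")
```

In this made-up data set, the payments system is expected back within 4 hours but relies on a database with an 8-hour RTO, which is exactly the kind of gap that is better found in advance than during an incident.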

Are you prepared for an IT incident?

Here are the key questions to ask to sense-check your ability to manage a major IT failure:

  • Are BIAs conducted to determine business recovery requirements, including IT?
  • Are RTOs and RPOs based on recent and reliable BIAs?
  • Are IT dependencies clearly mapped and understood?
  • Is the priority order for IT recovery well defined and agreed?
  • Is this information up to date?
  • Are you able to quickly and easily access this information?

If you can answer all of these questions with a “yes”, that’s great. You’ve done the fundamental groundwork that will pay dividends when you’re faced with an IT failure. If not, it’s important to invest the time to do the above steps and get prepared.
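
To illustrate why mapped dependencies matter for the recovery order, here is a minimal Python sketch that derives a technically valid order from a dependency map using the standard library's `graphlib` module (Python 3.9 or later). The system names and the `recovery_order` function are hypothetical, and the sketch deals only with technical sequencing: the agreed business priorities from your BIAs would still need to be applied on top.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each system lists the systems it depends on.
dependencies = {
    "payments": {"database", "network"},
    "database": {"network"},
    "email": {"network"},
    "network": set(),
}


def recovery_order(dep_map):
    """Return systems in an order where every dependency comes before its dependants."""
    return list(TopologicalSorter(dep_map).static_order())


order = recovery_order(dependencies)
print(order)
```

If the map contains a circular dependency, `static_order()` raises `graphlib.CycleError`, which is itself a useful finding to resolve during BAU.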

Keeping information up to date

One of the most frustrating scenarios we see as a business continuity provider is when an organisation has a solid plan mapped out but has not reviewed or updated it for some time. Organisations don’t stand still. So much changes within an organisation, across the business, personnel, processes and technology, that old plans are unlikely to be fully effective and can even hinder a recovery.
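
A simple way to keep reviews visible is to record a last-reviewed date against each plan and flag anything that has gone too long without attention. The sketch below is a hypothetical illustration in Python. The plan names, the dates and the 365-day threshold are assumptions, and your own review cycle may well be shorter.

```python
from datetime import date, timedelta

# Hypothetical record of when each plan or data set was last reviewed.
last_reviewed = {
    "payments recovery plan": date(2025, 11, 3),
    "database recovery plan": date(2024, 2, 14),
}


def overdue_reviews(reviews, today, max_age_days=365):
    """Return names of items last reviewed more than max_age_days before today."""
    cutoff = today - timedelta(days=max_age_days)
    return [name for name, reviewed in reviews.items() if reviewed < cutoff]


print(overdue_reviews(last_reviewed, today=date(2026, 1, 23)))
```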

Keeping information accessible

Do your key IT people know where your IT Service Continuity information is stored? Is it readily accessible to all those who would need it, whenever they are likely to need it (which could be out of hours)? Is the information captured in BC software, or is it distributed across documents, often spreadsheets, that may be difficult to maintain and validate? If the latter, ITSC owners and stakeholders should consider whether the data can truly be trusted.

The hidden risks of spreadsheet-based ITSC data

Consider the following questions if your ITSC data is kept in spreadsheets or other documents, rather than BC software:

  • Are spreadsheet formulas functioning correctly?
  • Has all data been entered accurately, considering any known weaknesses in the spreadsheet design (e.g., fields easily overwritten)?
  • Are links and references current and understood?
  • Are macros up to date and documented?
  • Was the spreadsheet created by the current ITSC owner or by a predecessor no longer available to provide clarification?
  • How confident are users that they understand the spreadsheets and their limitations?
  • During an IT critical incident, how easy is it to extract ITSC requirements and communicate them clearly to key stakeholders?
  • How does this confidence change if the ITSC owner is unavailable and another individual must locate, interpret, and use the spreadsheets?

Given that there is little to no time during a critical incident to resolve issues like these, ITSC owners risk presenting inaccurate or untrusted data to recovery teams and executives.
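
If spreadsheets have to stay for now, some basic data-entry problems can at least be caught during BAU. The following Python sketch checks a CSV export for RTO and RPO values that are missing or not numeric. The column names, the sample rows and the `invalid_rows` function are assumptions made for illustration, and a check like this says nothing about whether formulas, links or macros are behaving correctly.

```python
import csv
import io

# Hypothetical CSV export of a spreadsheet; column names are assumptions.
exported = """system,rto_hours,rpo_hours
payments,4,1
database,,4
email,two days,24
"""


def invalid_rows(csv_text, numeric_columns=("rto_hours", "rpo_hours")):
    """Return (system, column) pairs whose value is missing or not a number."""
    problems = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column in numeric_columns:
            value = (row[column] or "").strip()
            try:
                float(value)
            except ValueError:
                problems.append((row["system"], column))
    return problems


print(invalid_rows(exported))
```

Here the database row has no RTO at all and the email row holds free text where a number is expected, so both would be flagged for follow-up with the system owner.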

The value of business continuity software for ITSC

Business continuity software makes it easier to record and update potentially complex information regarding IT dependencies, RTOs and RPOs. Dedicated BC software provides a stable, structured environment for maintaining ITSC data. Rather than relying on potentially fragile documents, information is held in a platform designed specifically to support continuity and recovery planning. It’s also easier to make relevant information available to everyone who needs it.

This delivers clear benefits during an IT critical incident, but the advantages also extend well into BAU operations.

Key benefits include:

Reduced administrative burden

BC software automates many tasks that are labour-intensive and error-prone in spreadsheets, freeing ITSC owners to focus on planning, strategy, and exercising.

Improved decision-making

IT teams gain better visibility of dependencies and priorities, supporting more informed responses to day-to-day incidents as well as major outages.

Easier data consolidation and reporting

BC software can consolidate and report on large amounts of data much faster than copying and pasting spreadsheets together. This can give organisations a faster and more accessible insight into how IT systems support critical services, helping to guide smarter, more targeted investment.

Building more resilient IT services

Ultimately, IT service continuity depends on preparation. When requirements are clearly understood, regularly maintained, and stored in a trusted system, organisations are far better equipped to respond effectively under pressure.

By using dedicated business continuity software to manage ITSC requirements, organisations strengthen both their BAU capabilities and their ability to manage IT critical incidents, creating a more resilient, confident, and responsive operation overall.

Note:

References to “Business continuity software / BC software” above reflect the author’s knowledge and use of Wavenet’s Shadow-Planner, and are not a statement of the capability and functionality of any other BC software products produced by other organisations.

About the Author:

David Davies MBCI

David has a highly focused skillset secured through over 25 years’ experience in IT service continuity and over 20 years’ experience in business continuity management.

David joined Wavenet from Barclaycard where he was a key member of the business continuity relationship management team, responsible for embedding IT service continuity across the organisation. He had previously held several other business continuity and IT service continuity positions for organisations including IBM Global Services.

David has become a highly respected consultant in the industry, initially with Jermyn Consulting and then with Wavenet. Over this time, he has delivered BCM and ITSC professional services to 100+ organisations.

He has delivered advice and solutions to organisations in the financial services, healthcare, housing, manufacturing, retail, technology, and transport sectors.

In 2019 David was named 'Continuity and Resilience Consultant of the Year' at the Business Continuity Institute's European Awards and was shortlisted in the 'Advisor of the Year' category at the CIR annual Business Continuity Awards.

