Roughly 40-60% of midsize businesses never reopen after a disaster. With the right disaster recovery and business continuity plan in place, however, the damage can be minimized. In this whitepaper, we’ll examine the common threats businesses face and how to create an effective disaster recovery plan to withstand them.
Disaster recovery plans used to be a luxury only large businesses could afford. However, in an era of growing cybersecurity concerns and increasingly common natural disasters, an effective disaster recovery plan is critical to the survival of every midsize business. Ninety percent of businesses that experience a disaster and don’t resume operations within five days go out of business within the first year. As a result, it is critical that businesses develop and maintain a disaster recovery protocol that makes it possible, and easy, to get back up and running within minutes instead of days or weeks.
An effective recovery plan is always on standby, ready to go into effect. It can be updated incrementally, without time-intensive system revisions, so it scales with business growth. It is cost-efficient and suited to the organization’s needs and resources. As a result, it helps minimize revenue loss and restore operations as quickly as possible.
But how do you build an effective disaster recovery plan? Here’s a high-level overview of 10 steps that will help you get started.
#1: Complete a Comprehensive Business Impact Analysis (BIA)
Business impact analysis (BIA) is a systematic process to determine and evaluate the potential effects of an interruption to critical business operations as a result of a disaster, accident or emergency. A BIA is an essential component of an organization's disaster recovery and business continuity plans. It includes an exploratory component to reveal any vulnerabilities and a planning component to develop strategies for minimizing risk. The result is a business impact analysis report, which describes the potential risks specific to the organization studied.
One of the basic assumptions behind BIA is that every component of the organization is reliant upon the continued functioning of every other component, but that some are more crucial than others and require a greater allocation of funds in the wake of a disaster. For example, a business may be able to continue more or less normally if the cafeteria were to close, but would likely come to a complete halt if essential IT capabilities were to crash.
No formal standards exist for a BIA, and the methodology can vary by organization. A BIA is generally a multi-phase process that includes the following steps:
- Gathering information
- Evaluating the collected information
- Preparing a report to document the findings
- Presenting the results to senior management
A detailed questionnaire or survey is commonly developed to identify critical business processes, resources, relationships and other information that will be essential in assessing the potential impact of a disruptive event. An education session may be conducted for key personnel with knowledge of the business. Information can be collected in a variety of ways, including in-person interviews and automated surveys. Follow-up interviews may be necessary.
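Once the survey data is in, one simple way to evaluate it is to rank processes by how long they can tolerate an outage and what each hour of downtime costs. The Python sketch below is illustrative only: the `BusinessProcess` fields, the ranking rule and the dollar figures are hypothetical assumptions, not a standard BIA method.

```python
from dataclasses import dataclass


@dataclass
class BusinessProcess:
    name: str
    hourly_cost: float          # hypothetical: estimated loss per hour of outage, in dollars
    max_tolerable_hours: float  # hypothetical: longest outage the process can survive


def rank_by_criticality(processes):
    """Least outage-tolerant first; ties broken by higher hourly cost."""
    return sorted(processes, key=lambda p: (p.max_tolerable_hours, -p.hourly_cost))


def outage_cost(process, hours):
    """Estimated cost of an outage lasting `hours`."""
    return process.hourly_cost * hours


# Hypothetical survey results
processes = [
    BusinessProcess("Cafeteria", hourly_cost=50, max_tolerable_hours=72),
    BusinessProcess("Order processing", hourly_cost=12_000, max_tolerable_hours=4),
    BusinessProcess("Email", hourly_cost=3_000, max_tolerable_hours=8),
]

for p in rank_by_criticality(processes):
    print(f"{p.name}: tolerates {p.max_tolerable_hours}h, 24h outage costs ${outage_cost(p, 24):,.0f}")
```

As in the cafeteria example, a process that can survive days offline sinks to the bottom of the list.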
#2: Set Recovery Goals
Before writing a recovery plan, you must assess the potential threats your facilities might face and their impact on operations.
Large, region-devastating disasters, like tornadoes, floods and hurricanes, make the headlines, but they aren’t the only threats to your operation. Hardware failure, hacking, ransomware, human error and power outages are constant, and arguably more damaging, threats that can compromise unprotected data. These events can corrupt computing environments, damage hardware, interrupt connections to service providers and destroy data.
After assessing potential threats and their probability, set your RTO (Recovery Time Objective): the maximum acceptable length of time to restore a system or service after a disruption. It should be dictated by how quickly you need things back up and running. In each scenario, what assets and services are likely to be lost? Which are the most important to restore first? Keep in mind that a shorter RTO usually requires more resources to achieve.
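Deciding which services to restore first also means checking that your restore order is realistic. The sketch below assumes restores happen one at a time, in order of tightest RTO, and that you have an estimated restore time for each service; the `Service` fields and sample figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    rto_hours: float      # target: maximum acceptable time to restore
    restore_hours: float  # hypothetical estimate of how long the restore takes


def plan_restores(services):
    """Restore services sequentially, tightest RTO first, and flag any RTO misses.

    Returns a list of (name, hours elapsed when restored, RTO met?) tuples.
    """
    elapsed = 0.0
    plan = []
    for svc in sorted(services, key=lambda s: s.rto_hours):
        elapsed += svc.restore_hours
        plan.append((svc.name, elapsed, elapsed <= svc.rto_hours))
    return plan


services = [
    Service("File shares", rto_hours=10, restore_hours=6),
    Service("ERP", rto_hours=4, restore_hours=3),
    Service("Email", rto_hours=8, restore_hours=2),
]

for name, done_at, met in plan_restores(services):
    print(f"{name}: restored at hour {done_at:g}, RTO met: {met}")
```

A missed RTO, like the file shares in this example, means either adding resources to speed recovery or agreeing on a longer RTO.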
#3: Select A Data Backup Strategy
After setting goals for recovery, you must now select the right data backup strategy. Data backup is critical to any disaster recovery plan. Without the right preparations, a single storm or cyberattack can wipe out all of the data a business needs to function. When selecting a backup strategy, there are two keys to look for: redundancy and security.
Redundancy means backing up your data in more than one way and location. Properly managing the location and type of backup ensures that no single point of failure or disaster can debilitate your business.
Security means that your data is protected from disasters and other threats, remaining accessible to your organization and inaccessible to everyone else.
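One way to make "more than one way and location" checkable is to record every backup copy and flag any dataset whose copies share a single location or a single type of media. The record fields, media labels and the two-location, two-media threshold below are assumptions for illustration, not a standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    dataset: str
    media: str     # illustrative labels, e.g. "disk", "tape", "cloud"
    location: str  # site where the copy physically lives


def single_points_of_failure(copies, datasets):
    """Return datasets whose backups are not spread across 2+ locations and 2+ media types."""
    at_risk = []
    for ds in datasets:
        mine = [c for c in copies if c.dataset == ds]
        locations = {c.location for c in mine}
        media = {c.media for c in mine}
        if len(locations) < 2 or len(media) < 2:
            at_risk.append(ds)
    return at_risk


copies = [
    BackupCopy("accounting", "disk", "HQ"),
    BackupCopy("accounting", "cloud", "vendor-east"),
    BackupCopy("crm", "disk", "HQ"),
    BackupCopy("crm", "disk", "HQ-annex"),
]
datasets = ["accounting", "crm", "hr"]  # "hr" has no backups at all
print(single_points_of_failure(copies, datasets))
```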
There are two main ways to back up data: physical and cloud.
Physical
Physical drives are simple to use and, at small volumes, easy to scale, but they’re just as vulnerable to disaster as the data they’re backing up. It’s also difficult to keep the backup drive separate from the original source while maintaining backup frequency. The two devices typically need to be physically connected to run a backup, yet they should be stored in separate locations, because when disaster strikes, a physical backup on the same site as the original isn’t much of a backup.
For midsize operations, allocating an additional facility – just for storing drives – often isn’t feasible. For larger organizations, it becomes difficult to scale, as large quantities of data will require mountains of hard drives.
Cloud
Cloud backups upload and store data in the cloud, offsite and separate from your network. This shifts much of the burden of redundancy and security to an external provider. Cloud backups can be a very cost-effective solution. They also make it easy to run automatic backups, which reduces the demand on staff to manually back up mountains of data every week.
Cloud backups are also much easier to scale than physical storage. Instead of purchasing and maintaining new drives, the SaaS nature of cloud backups makes changing storage capacity as easy as upgrading or downgrading a subscription plan.
#4: Implement Selected Data Backup Strategy
After selecting your backup method, you must implement it. Here are common options for each path.
Physical
For physical storage, one way of creating redundancy is to run dual data centers with data mirrored and synchronized between them. This minimizes downtime, enables a quick recovery and gives organizations total control over their data. But it’s also very expensive and resource-intensive, so it’s usually viable only for large companies.
An alternative is internal recovery. This involves using two or more of the organization’s own facilities to store copies of data from all sites, so that each site’s data is also held at an offsite location. In the event of a disaster, the data can be sent to the surviving facilities to restore the functions and services of the downed location. This approach has benefits and challenges similar to mirroring and synchronization: it provides a high degree of control but requires more resources to run.
Cloud
For cloud backups, there are generally two approaches:
- Store backups on a specialized appliance onsite and then securely transmit the data to the vendor’s data center. During transmission, the data is deduplicated: duplicate files are removed so only one copy is sent and stored.
- Send the backups directly from your facilities to the vendor’s data center, deduplicating the data along the way.
The main difference between these two options is the onsite appliance, which adds another layer of redundancy. For some organizations, however, that extra layer may be unnecessary.
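Deduplication generally works by fingerprinting content with a cryptographic hash and sending or storing each unique piece only once. The sketch below does this at the file level using SHA-256 from Python’s standard `hashlib` module; it illustrates the idea and is not any vendor’s implementation (many products deduplicate smaller blocks rather than whole files). The file paths and contents are hypothetical.

```python
import hashlib


def deduplicate(files):
    """Keep one copy of each unique file content.

    files: mapping of file path -> file contents (bytes).
    Returns (store, index): store maps content hash -> bytes (one copy each),
    index maps every original path -> the hash of its contents.
    """
    store = {}
    index = {}
    for path, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        store.setdefault(digest, data)  # only the first copy of each content is kept
        index[path] = digest            # every path still resolves to its content
    return store, index


def restore(store, index, path):
    """Rebuild a file's contents from the deduplicated store."""
    return store[index[path]]


files = {
    "reports/q1.xlsx": b"quarterly numbers",
    "archive/q1-copy.xlsx": b"quarterly numbers",
    "notes.txt": b"meeting notes",
}
store, index = deduplicate(files)
print(len(files), "files ->", len(store), "unique objects to transmit")
```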
#5: Establish Data Backup Frequency
After implementation, the next step is to set a backup frequency and RPO (Recovery Point Objective): the maximum amount of data, measured in time, that the business can tolerate losing. For example, an RPO of four hours means you can afford to lose no more than the last four hours of data. Backups could run hourly, daily, weekly or continuously. But be wary: the longer the period between backups, the more vulnerable you are to critical data loss. Data should therefore be backed up at least as frequently as your RPO requires, so that any data loss stays manageable.
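The relationship between backup frequency and RPO comes down to simple arithmetic: with periodic backups, the worst case is a disruption just before the next backup runs, so the data at risk is roughly one full backup interval. The sketch below applies that rule to a hypothetical four-hour RPO and ignores how long each backup takes to complete.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    # A disruption just before the next scheduled backup loses everything written
    # since the last one (ignoring how long the backup itself takes to run).
    return backup_interval


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    return worst_case_data_loss(backup_interval) <= rpo


rpo = timedelta(hours=4)  # hypothetical business tolerance
schedules = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}
for name, interval in schedules.items():
    print(f"{name}: worst-case loss {worst_case_data_loss(interval)}, meets RPO: {meets_rpo(interval, rpo)}")
```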
Whichever approach and frequency you choose, it is critical to know where your backups are stored. If you back up your data to the cloud through a vendor, find out where the vendor’s facilities are and how they store your information. This helps ensure you don’t partner with a company whose facilities lack adequate redundancy or are located in areas vulnerable to the same natural disasters you’re trying to protect yourself against.
#6: Identify Roles & Responsibilities
An effective recovery plan clearly specifies who is accountable for each task. Every task, no matter how small, needs an assigned owner and a timeline.
Larger companies often have dedicated recovery staff to handle these responsibilities in-house. For midsize companies, partnering with a managed service provider is often the most efficient option from a cost and resource point of view.
#7: Define The Communication Strategy
A disaster recovery plan only works if there is a protocol for reliable communication. For example, a plan that relies on company landlines to contact key personnel and vendors will fail when the phone lines fail. A diversified communication strategy, one that uses a combination of company phones, mobile devices, email, and online messaging, is the most reliable. Similarly, an unclear plan that leaves personnel unsure of whom to contact will lead to bottlenecks and prolonged outages.
A comprehensive contact list is a great way to accomplish this. It could include key personnel, vendors, suppliers, property owners, utility providers and technical support. Like the recovery plan, the contact list should be accessible to all stakeholders during a facility failure. This helps ensure that the right people are in place to execute the recovery plan.
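A contact list is easier to keep reliable if it is stored as structured data and checked automatically. The sketch below flags any contact who can be reached through only one type of channel, or only by landline, reflecting the diversified strategy described above. The `Contact` structure, channel names, sample entries and the two-channel rule are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Contact:
    name: str
    role: str
    channels: dict = field(default_factory=dict)  # channel type -> number or address


def poorly_reachable(contacts, minimum_channels=2):
    """Names of contacts reachable through fewer than `minimum_channels` channel types,
    or only through a landline (which fails when the phone lines fail)."""
    flagged = []
    for c in contacts:
        non_landline = set(c.channels) - {"landline"}
        if len(c.channels) < minimum_channels or not non_landline:
            flagged.append(c.name)
    return flagged


contacts = [
    Contact("Dana Lee", "IT lead", {"mobile": "555-0101", "email": "dana@example.com", "chat": "@dana"}),
    Contact("Facilities desk", "property owner", {"landline": "555-0102"}),
    Contact("ISP support", "connectivity vendor", {"landline": "555-0103", "email": "noc@example.net"}),
]
print(poorly_reachable(contacts))
```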
#8: Create A Comprehensive Equipment and Application Inventory
Another element of an effective recovery plan is a comprehensive inventory of the hardware, software and data your business uses. Rank that information and those assets by how critical they are. This helps ensure that all essential data is backed up and that anything not backed up is genuinely expendable.
Each equipment inventory entry should include critical asset information such as make, model, function, ownership and cost. Auditing the inventory on a set schedule keeps it an accurate, relevant guide to which assets are in play. For software and applications, keep copies and additional licenses so they can be easily reinstalled on new equipment and used by staff working remotely during the recovery process.
And in all cases, ensure that you have documented primary (and secondary) contacts for all critical vendors, along with the agreed-upon method of contact in an emergency.
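An inventory kept as structured records can be checked for the two gaps described above: critical assets with no backup, and entries that are overdue for audit. In the sketch below, the `Asset` fields follow the attributes listed above, while the yes/no criticality flag, the 90-day audit interval and the sample assets are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Asset:
    name: str
    make: str
    model: str
    function: str
    owner: str
    cost: float
    critical: bool  # position in your criticality hierarchy, simplified to yes/no
    backed_up: bool
    last_audited: date


def inventory_gaps(assets, today, audit_every=timedelta(days=90)):
    """Return (critical assets with no backup, assets overdue for audit)."""
    unprotected = [a.name for a in assets if a.critical and not a.backed_up]
    overdue = [a.name for a in assets if today - a.last_audited > audit_every]
    return unprotected, overdue


assets = [
    Asset("ERP server", "VendorA", "X100", "ERP database host", "IT", 15_000, True, True, date(2024, 1, 10)),
    Asset("File server", "VendorB", "FS-2", "Shared documents", "IT", 6_000, True, False, date(2024, 4, 20)),
    Asset("Lobby kiosk", "VendorC", "K1", "Visitor check-in", "Facilities", 900, False, False, date(2024, 4, 1)),
]
print(inventory_gaps(assets, today=date(2024, 5, 1)))
```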
#9: Test, Test, and Test Again!
Will your disaster recovery plan see you through ransomware attacks, hardware failures and natural disasters, or will you be caught flat-footed? If you can’t answer that question with an unequivocal “we’re ready!”, you should be investing more time and resources in disaster recovery testing.
Too many organizations implement backup and disaster recovery solutions and assume they’re prepared to face any eventuality. After all, vendors often promise data recovery in minutes. It cannot be overemphasized how important it is to validate your solutions and processes: testing is how you identify points of failure and fix them before a real disaster strikes.
Assuming you’ve already fully documented your infrastructure, application dependencies, data flows, costs of downtime and SLAs, what’s next? We recommend starting small and then building each subsequent DR test on the successes (and failures!) of your prior tests.
And keep in mind that when it comes to DR testing, you’ll need to test BOTH your DR solution AND your people (never underestimate the importance of the human element!). While automated disaster recovery tests serve an important purpose, they only test the technical component of your DR plan. In the event of a real disaster, your people will also need to work quickly and confidently to restore uptime. Conducting both tabletop tests and simulated technical tests will help ensure your people are prepared to execute against your documented policies and procedures.
But how frequently should you test your DR plan? That will depend on your business: what works for a local advertising agency won’t work for a regional financial institution. That said, we recommend running a full test at least once a year. And if you’re required to comply with stringent regulations like PCI DSS, we recommend more frequent testing. Remember: the more often you put your people through their paces, the more prepared they’ll be to respond in the face of disaster. And with regular turnover of IT staff, regular tests are critical for bringing new team members up to speed.
Regardless of whether you’re running a short drill to test discrete applications or you’re running a full-scale test, you’ll need to fully document your DR testing plan before you begin. Consider the following:
- How long it’s been since you’ve DR tested your critical applications, and which should be included in your next test (assuming it isn’t a full run)
- What changes have occurred to your IT infrastructure that may necessitate updates to your plan, which must then be validated through DR testing
- And you’ll want to define:
  - Who will be involved?
  - What, specifically, will you be testing?
  - What are the goals for your test?
  - What are your expected (acceptable) results? In other words, what constitutes “success”?
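Keeping the test plan as structured data makes it easy to see which applications are overdue for testing and whether the plan answers each of the questions above before test day. In the sketch below, the `DRTestPlan` fields, the 365-day threshold (matching the once-a-year recommendation) and the sample dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DRTestPlan:
    participants: list      # who will be involved
    scope: list             # what, specifically, will be tested
    goals: list             # what the test should demonstrate
    success_criteria: dict  # step -> maximum acceptable minutes

    def is_ready(self):
        # A plan without people, scope, goals or a definition of success isn't ready to run.
        return all([self.participants, self.scope, self.goals, self.success_criteria])


def due_for_testing(last_tested, today, max_age_days=365):
    """Applications whose last DR test is older than max_age_days, oldest first."""
    stale = [(app, d) for app, d in last_tested.items() if (today - d).days > max_age_days]
    return [app for app, _ in sorted(stale, key=lambda pair: pair[1])]


last_tested = {"ERP": date(2023, 2, 1), "Email": date(2024, 3, 15), "CRM": date(2022, 11, 30)}
apps = due_for_testing(last_tested, today=date(2024, 6, 1))
plan = DRTestPlan(
    participants=["IT lead", "DBA", "observer"],
    scope=apps,
    goals=["Restore from offsite backup within RTO"],
    success_criteria={"failover": 60, "restore": 240},
)
print(apps, plan.is_ready())
```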
When running your DR test, it is crucial to task one person with observing and documenting the test; this should be their sole purpose on test day. During the test, this person will document any hiccups and record the time it takes to complete each step in your documented disaster recovery procedure. At a minimum, they should note the following:
- time required to failover, restore uptime, recover data, and failback
- unexpected technical failures
- the human response to unexpected surprises
- instances where people encountered a lack of clarity in the DR plan, which slowed their progress and created anxiety amongst the team
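If the observer keeps a structured log rather than free-form notes, step timings can be compared directly against your RTO and against previous tests. The sketch below is a hypothetical `DRTestLog` class, not a specific tool; the step names and note categories are assumptions loosely mapped to the items listed above, and a simulated clock stands in for real time so the example is repeatable.

```python
import time


class DRTestLog:
    """Observer's log for a DR test: times each step and records issues."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock  # injectable so the example below is deterministic
        self.steps = {}     # step name -> [start, end]
        self.notes = []     # (category, description) pairs

    def start(self, step):
        self.steps[step] = [self.clock(), None]

    def finish(self, step):
        self.steps[step][1] = self.clock()

    def note(self, category, description):
        self.notes.append((category, description))

    def durations(self):
        """Elapsed time for every completed step, in the clock's units."""
        return {s: end - start for s, (start, end) in self.steps.items() if end is not None}


# Simulated clock readings in minutes, so the example doesn't depend on real time
readings = iter([0, 45, 45, 200])
log = DRTestLog(clock=lambda: next(readings))
log.start("failover")
log.finish("failover")
log.start("restore")
log.note("plan clarity", "Runbook step 4 did not say which backup set to restore")
log.finish("restore")
print(log.durations(), log.notes)
```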
#10: Devise An Update Schedule
All the testing in the world, however, serves little purpose if you don’t leverage the insight you’ve gained to address vulnerabilities in your DR plan. Did the disaster recovery test reveal any holes? If so, it’s time to gather key stakeholders to determine your acceptable level of risk and how you can reduce the impact of data loss and downtime.
With regular disaster recovery tests and continuous improvements to your plan, you’ll be well prepared to weather the storm.
CONCLUSION
Developing and implementing a disaster recovery plan is crucial to minimizing downtime and shielding your business from data loss. We hope the steps outlined above help you jump-start building a better disaster recovery and business continuity plan for your business.
AUTHOR: Wade Richmond is the founder and CEO of CISO ToGo, a cybersecurity firm specializing in the needs of small and medium-sized businesses. Wade has 33 years of experience in IT, including Chief Information Security Officer roles at large enterprises such as BJ’s Wholesale Clubs, Ahold USA, Sensata Technologies, GTECH Corporation, Citizens Financial Group and CVS Pharmacies. In these positions, he was responsible for providing leadership and direction to all cybersecurity and IT risk efforts associated with information technology applications, communications and computing services. To learn more, please visit www.CISO-ToGo.com.