Best Practices for Ensuring Strong Business Continuity
Business Continuity And Disaster Recovery Best Practices From The Availability Trenches
As a Marine, I learned the hard way how difficult it is to think straight under pressure. Now, after two decades in the trenches of business continuity and disaster recovery (BC/DR), I find that some of the same lessons apply to information technology! Since becoming a business technologist in the mid-1990s, I’ve been ambushed by unexpected problems, had to “march” at double-time to meet our business objectives, and sweated it out waiting to see if a plan of attack was going to work like our marching orders said it would!
So, I’d like to save you some of the blood, sweat, and tears I’ve experienced along the way by sharing with you these six valuable lessons-learned from the trenches. These are lessons that may seem like common sense… but you’d be surprised how many people ignore them and fail.
Let’s say you have your primary data center, including all your applications and data, in Atlanta, Georgia. I don’t recommend building your secondary data center in a town just 15 miles away. A single event could easily disable both data centers.
Though a “safe” distance depends on geography and other factors, a general rule is this: maintain a full copy of your mission-critical data at least 150 miles away, along with the IT resources needed to use that data for business continuance and recovery. That will usually be sufficient to ensure both data centers are not affected by a single disaster, and your company will be able to continue doing business even if a serious outage or loss occurs.
Ideally, the two data centers would also be on separate power grids. In spite of this general rule, there may be specific use cases where it’s perfectly appropriate to locate your primary and secondary data centers near each other. Just do your homework or consult with an expert before making any final decisions.
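To make the distance rule concrete, here is a minimal sketch that estimates the great-circle distance between two sites with the haversine formula and compares it to the 150-mile guideline. The coordinates are approximate, and the Charlotte and Marietta sites are hypothetical examples chosen for illustration; straight-line distance is only one input, so treat the result as a first sanity check rather than a site-selection decision.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def great_circle_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def far_enough(primary, secondary, minimum_miles=150.0):
    """True if the two (lat, lon) sites are at least minimum_miles apart."""
    return great_circle_miles(*primary, *secondary) >= minimum_miles


# Approximate coordinates; the secondary sites are hypothetical.
ATLANTA = (33.749, -84.388)
CHARLOTTE = (35.227, -80.843)  # a candidate a few hundred miles away
MARIETTA = (33.953, -84.550)   # a candidate in a nearby town

for name, site in [("Charlotte", CHARLOTTE), ("Marietta", MARIETTA)]:
    miles = great_circle_miles(*ATLANTA, *site)
    verdict = "ok" if far_enough(ATLANTA, site) else "too close"
    print(f"{name}: about {miles:.0f} miles from Atlanta ({verdict})")
```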
An untested plan is a failed plan. If you’ve never tested your plan realistically, it almost certainly will fail because people won’t know which end is up when the chaos hits. There are many kinds of tests, and you will want to leverage them all. There are tests that focus on one particular process. There are tests that assume a limited type of disaster. And unfortunately these days, there are tests where we have to imagine our primary site has been turned into a big, smoking crater.
Testing accomplishes multiple goals: it verifies whether your recovery procedures are correct (or perhaps more importantly, incorrect!), and it makes people familiar with the procedures so they can function in a crisis situation. Tests are practice runs for your plan – and practice makes perfect.
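One way to turn a test into something measurable is to record how long recovery actually took and how much data was lost, then compare those numbers to the recovery time and recovery point objectives for each system. The sketch below assumes you capture those measurements by hand during a drill; the `DrillResult` class, the system names, and all the figures are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class DrillResult:
    system: str
    recovery_time: timedelta  # measured time to restore service in the drill
    data_loss: timedelta      # measured gap between last good copy and the outage
    rto: timedelta            # Recovery Time Objective for this system
    rpo: timedelta            # Recovery Point Objective for this system

    def missed_objectives(self):
        """Return a list of the objectives this drill failed to meet."""
        missed = []
        if self.recovery_time > self.rto:
            missed.append("RTO")
        if self.data_loss > self.rpo:
            missed.append("RPO")
        return missed


# Hypothetical drill results, for illustration only.
drills = [
    DrillResult("order-entry", timedelta(hours=3), timedelta(minutes=10),
                rto=timedelta(hours=4), rpo=timedelta(minutes=15)),
    DrillResult("payroll", timedelta(hours=9), timedelta(hours=2),
                rto=timedelta(hours=8), rpo=timedelta(hours=24)),
]

for drill in drills:
    missed = drill.missed_objectives()
    status = "met all objectives" if not missed else "missed " + ", ".join(missed)
    print(f"{drill.system}: {status}")
```

A drill that misses an objective is still a useful drill: it tells you which procedure to fix before a real event does.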
Let’s say you have a very clear idea of your critical business processes, the associated applications and their SLAs, your infrastructure and data sources, and what steps are necessary to recover everything within your Recovery Point and Recovery Time Objectives. In fact, your plan is a beautiful thing: documented, tested, and proven!
Now, picture yourself six months down the line. You have deployed a new application system, moving applications off a physical server platform and into the cloud. Guess what? All your plans and tests related to that area of the business have become outdated and irrelevant overnight. If you have a disaster, all your hard work will have been for nothing because you won’t be able to recover anything (at least not without a lot of late nights and hair pulling). Unless, of course, you practice rigorous change management and keep your plan in alignment with your production environment.
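A simple way to catch this kind of drift is to compare the list of systems your plan covers against the list of systems actually running in production. The sketch below assumes you can export both lists as plain names; the function name and the system names are hypothetical, and a real check would pull them from your own inventory or change management records.

```python
def plan_drift(planned, production):
    """Compare systems covered by the BC/DR plan with systems in production."""
    planned, production = set(planned), set(production)
    return {
        # Running in production but missing from the plan: no recovery procedure.
        "unprotected": sorted(production - planned),
        # Still in the plan but no longer in production: stale documentation.
        "stale": sorted(planned - production),
    }


# Hypothetical inventories, for illustration only.
plan_systems = ["billing-db", "crm-app", "file-server"]
production_systems = ["billing-db", "crm-app", "cloud-order-api"]

drift = plan_drift(plan_systems, production_systems)
print("Not covered by the plan:", drift["unprotected"])
print("Stale plan entries:", drift["stale"])
```

Running a comparison like this as part of every change request keeps the plan from silently falling behind production.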
You could have the best BC/DR plan in the world, but if people can’t get to the documents and run-books in a time of crisis, it’s all useless. So if, like many companies, your plans are documented in the form of PDF, Word, Excel, and/or Visio files, you need to make sure all of this is organized and accessible to your team even if the primary infrastructure is destroyed. Never underestimate the importance of version control, either: keeping only the most up-to-date documents in the system avoids confusion. Having an inaccessible or inconsistent plan is almost as bad as having no plan at all.
Whether you simply store these documents in the cloud or build your BC/DR plans in a living disaster recovery planning system, ensuring unhindered access to them during a serious event is critical.
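As a rough illustration of keeping an off-site copy of the plan consistent with the current version, the sketch below builds a SHA-256 manifest of every file in a folder and reports which documents differ between two locations. It assumes both copies are reachable as local folders (for example, a synced cloud drive); the function names are hypothetical, and this is not a substitute for proper version control.

```python
import hashlib
from pathlib import Path


def build_manifest(folder):
    """Map each file's relative path to the SHA-256 hash of its contents."""
    folder = Path(folder)
    return {
        path.relative_to(folder).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(folder.rglob("*"))
        if path.is_file()
    }


def out_of_sync(primary_manifest, offsite_manifest):
    """List documents that are missing from one copy or differ between copies."""
    names = set(primary_manifest) | set(offsite_manifest)
    return sorted(
        name for name in names
        if primary_manifest.get(name) != offsite_manifest.get(name)
    )
```

Calling `out_of_sync(build_manifest(primary_folder), build_manifest(offsite_folder))` returns an empty list when the two copies match, and the names of the run-books that need attention when they do not.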
Too often, BC/DR is the purview of a single IT person or department at a company. If any critical person gets sick, leaves the company, or – heaven forbid – is rendered unavailable in a disaster, the company is left with tons of recovery documentation no one knows how to execute.
The solution is straightforward: Train and involve several people and departments on your BC/DR plan. And, if at all possible, train at least one of those people or teams outside your primary data center region. That way, if a widespread problem or incident renders the people near the primary data center unavailable, the team outside the region can step in to fill the role. A managed recovery program can also help meet this need, whereby the processes and procedures for application recovery are taught to another team of people in another region (or even outsourced to a third party).
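The staffing rule above can be checked mechanically. The sketch below assumes you keep a simple record of who is trained on each recovery procedure and where they are based; the function name, procedure names, people, and regions are all hypothetical.

```python
def coverage_gaps(trained, primary_region, minimum_people=2):
    """Find procedures with too few trained people or none outside the primary region.

    `trained` maps a procedure name to a list of (person, region) pairs.
    """
    gaps = {}
    for procedure, people in trained.items():
        issues = []
        if len(people) < minimum_people:
            issues.append(f"only {len(people)} trained")
        if all(region == primary_region for _, region in people):
            issues.append("nobody outside the primary region")
        if issues:
            gaps[procedure] = issues
    return gaps


# Hypothetical training records, for illustration only.
training = {
    "database failover": [("Ana", "Atlanta"), ("Raj", "Dallas")],
    "network cutover": [("Ana", "Atlanta"), ("Lee", "Atlanta")],
    "payroll restore": [("Sam", "Atlanta")],
}

for procedure, issues in coverage_gaps(training, primary_region="Atlanta").items():
    print(f"{procedure}: {'; '.join(issues)}")
```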
What Is a Business Continuity Plan?
A business continuity plan (BCP) is an operational document, outlining how an enterprise will operate in case of an unplanned disaster. A business continuity strategy specifies disaster recovery approaches for recovering IT infrastructure, servers, applications, network connections, and any other resources required to run business operations. In addition, it provides a larger set of instructions for all teams on their responsibilities and actions towards regaining normal operations.
What Is the Primary Goal of Business Continuity Planning?
The goal of business continuity planning is to ensure the rapid recovery of your operations, as well as minimization of operational downtime and data losses. Having a systemized approach to business continuity management also helps to ensure the rapid resumption of services after an unplanned event – be it a natural disaster, global pandemic, or minor operational disruption (e.g., accidental data loss).
Given the current uncertain business climate, implementing a business continuity plan is crucial for ensuring greater operational resilience and protecting your company against internal and external volatility.
Why Is Business Continuity Important?
This year many businesses recognized the importance of business continuity planning when they were unexpectedly forced to shift to remote work and enable remote access to a large volume of business applications, services, and data centers. For many, the crisis presented a new opportunity to speed up the implementation of advanced technologies and adopt new digital products.
However, a new challenge has arisen: with greater reliance on digital products, data storage, and supporting IT infrastructure, business leaders need to ensure business continuity across a wider range of assets.
Given that the hourly cost of an infrastructure failure reaches $400,000 for 41% of organizations, and tops 5000mln on average for another 45%, further digitalization without proper continuity planning can increase operational risk rather than reduce it.
The scope of a business continuity plan also covers data backups and protection, another crucial aspect of maintaining business-as-usual operations and avoiding regulatory penalties.
As many operations restart across industries, taking proactive business continuity planning steps is crucial for ensuring that the new hybrid IT environments are as secure, strong, and resilient as possible.