Disaster Recovery and Continuity:
What Every Business Owner Needs to Know
As seen in Small Business
Technology Magazine, March 2006
In
the aftermath of Hurricane Katrina, Tulane University hoped
to recover bins of mail containing tuition checks that were
left behind in the evacuation. Then they got the bad news:
the bins were underwater. Because their financial systems
were also damaged, the school had no way to send out new
bills.
Disasters
are more than natural phenomena, like hurricanes, earthquakes,
or tidal waves. Major disruptions caused by technological
failures can be just as devastating. One major securities
firm lost revenue at the rate of $25,000 per hour while
a seemingly simple failure of a disk array was repaired.
Though
technology can neither prevent natural disasters nor cause them,
it can certainly help businesses prevent technical disasters.
Preparing for disasters or other unplanned events that disrupt
"business as usual" can be critical to a company's survival.
Continuity
and disaster recovery are important because all businesses
are vulnerable, and disasters can be more than mere inconveniences.
According to a study conducted by research firm Gartner,
40 percent of all companies that experience a catastrophic
event or prolonged computer outage never resume operations.
Of those that get up and running, one in three goes out
of business within two years. Thus, 60 percent of businesses
affected by major disasters disappear within two years.
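The 60 percent figure follows from combining the two numbers quoted above. A quick check of the arithmetic, sketched in Python; the variable names are illustrative and not taken from the study:

```python
from fractions import Fraction

# Figures as quoted above from the Gartner study.
never_resume = Fraction(40, 100)                 # companies that never resume operations
resume = 1 - never_resume                        # companies that get up and running again
fail_within_two_years = resume * Fraction(1, 3)  # one in three of those later goes under

total_lost = never_resume + fail_within_two_years
print(f"{float(total_lost):.0%}")  # prints 60%
```

In other words, 40 percent never reopen, and a further 20 percent (one third of the remaining 60 percent) close within two years.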
No
matter where one's business is located, there are too many
disaster scenarios to plan for individually.
Though some are weather-related, others are caused
by problems within a particular building or by power outages,
which can occur anywhere, anytime.
Rather
than plan for each potential scenario, every business owner
should consider the recovery time objective. In other words,
in the event something were to happen, how quickly does
the problem need to be remedied? One second? One day? One
week? Clearly, the recovery time objective will be different
for each aspect of your business. Perhaps the accounting
systems need to come back online within two days, whereas
production systems need to be restored immediately.
Business
owners should also consider their recovery point objective.
Which is more important: loss of time or loss of data? At
what point do the systems and information need to be recovered?
If the last transaction were lost, for example, is it recoverable
in some other way? Would backup systems from the previous
night be good enough?
The
recovery time objective and the recovery point objective
need to be considered for each aspect of your business:
production, finance, human resources, and so on. Defining Recovery
Time Objective and Recovery Point Objective for each area
of your business will drive planning, and everything else
will fall more easily into place.
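One way to make these objectives concrete is to write them down per business area and check them against the backup schedule. The sketch below is only an illustration: the `RecoveryObjective` class, the `backup_meets_rpo` helper, and every value except the two-day accounting recovery time are assumptions, and it simplifies by treating the worst-case data loss as the time between backups.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjective:
    area: str
    rto: timedelta  # how quickly the function must be back in service
    rpo: timedelta  # how much recent data the business can afford to lose


def backup_meets_rpo(objective: RecoveryObjective, backup_interval: timedelta) -> bool:
    """Simplifying assumption: worst-case data loss equals the time between backups."""
    return backup_interval <= objective.rpo


# Hypothetical values; only the two-day accounting RTO comes from the text.
objectives = [
    RecoveryObjective("accounting", rto=timedelta(days=2), rpo=timedelta(hours=24)),
    RecoveryObjective("production", rto=timedelta(hours=1), rpo=timedelta(minutes=5)),
    RecoveryObjective("human resources", rto=timedelta(days=3), rpo=timedelta(days=2)),
]

nightly = timedelta(hours=24)

# List the areas with the tightest recovery time first.
for obj in sorted(objectives, key=lambda o: o.rto):
    verdict = "sufficient" if backup_meets_rpo(obj, nightly) else "not sufficient"
    print(f"{obj.area}: RTO {obj.rto}, nightly backup {verdict}")
```

With these example numbers, a nightly backup would be good enough for accounting and human resources but not for production, which is the kind of answer the "previous night" question above is meant to surface.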
The
key elements of business continuity and disaster recovery
are relatively simple: What needs to be recovered?
When does it need to be recovered?
Who conducts the recovery work? Who else is impacted?
What do other employees do? Where do they report to work?
Determining how the recovery is accomplished, which is both
a people and a process discussion, drives the technology solution.
For
most businesses, the top priority is to recover online applications:
ERP applications, accounting applications, customer relationship
management, financial systems and production management
systems. Other essential services include e-mail access,
voice services, and the internet, along with the company's
intranet and website.
Offline
essentials also need to be considered, including forms,
licenses, and contracts. Depending on the business, restoring
key operations like production, supply chains or shipping
capabilities is also essential.
Once
business owners define what must be recovered, they then
need to consider when it needs to be recovered. In most
businesses, customer and employee interface functions and
communications are imperative. Without them, maintaining
relationships and recovering from the crisis become far
more difficult.
Getting
e-mail, web sites, and voice communications up and running
therefore comes first. For most businesses, restoring the essential
business functions that keep orders coming in or basic services
running is next on the list of priorities. Then come traditional
back-office or internal business functions, such as human
resources and benefits.
So
who will be responsible for recovery planning and testing?
Who will do the recovery work, and what will the channels
of communications be? The answers will, of course, vary
from business to business. However, the most effective response
involves representatives from every department in planning
and testing, as well as in any ongoing communications
that take place during a disaster.
The
same way that periodic fire drills help employees safely
exit a building in the event of a fire, technology disaster
drills ensure that every individual in the company knows
what his or her responsibilities are in case of such an
event. The plan needs to be widely disseminated to all employees
so that everyone has a clear understanding of how to help
save the company from ruin.
The
best way to prepare for disaster recovery, in broad strokes,
is to establish an automatic fail-over hot site, where critical
functions instantaneously flip over and become available
at an alternate location. Few small businesses, however,
are able to justify this expense. Thankfully, there are
other alternatives available to meet Recovery Time Objectives
and Recovery Point Objectives.
The
best, most cost-effective disaster recovery solutions implement
strategies organized by tiers that consider cost along with
the Recovery Time Objective and the Recovery Point Objective.
Tiered strategy:
| Tier | Cost | RPO | RTO | Solution | No. of Apps |
| 0 | Significant | <5 min | <1 hr | Dedicated hot site, real-time recovery fail-over | 7 |
| 1 | 40% Premium | 1 hr | 2-3 hrs | Mirrored storage, dedicated servers | 6 |
| 2 | 20% Premium | 8 hrs | 24 hrs | Storage vaulted, transaction log, warm site | 4 |
| 3 | 10% Premium | 14 hrs | 48 hrs | Storage vaulted, warm site | 3 |
| 4 | Standard IT | 32 hrs | 72 hrs | Tape storage, cold site | 100 |
The bottom tier might consist of standard
information technology with a tape backup solution, for
example, with a Recovery Point Objective of 32 hours, or
about four business days of work. That means that the last
four business days of data would have to be recovered some
other way. With a Recovery Time Objective of 72 hours, the
solution would be to rely on tape storage in a cold site,
which might cover 100% of basic business applications.
More
important functions may be served by vaulted storage
in a warm site, or by the same solution with a transaction
log added. More critical applications
that require a one-hour Recovery Point Objective and a
two- to three-hour Recovery Time Objective might require
a 40% premium in cost. These would typically use mirrored
storage and dedicated servers, which might apply to about
six applications.
The
few applications that really do require a Recovery Point
Objective of less than five minutes and a Recovery Time
Objective of less than one hour could carry a significant cost
premium for a dedicated hot site with real-time recovery
fail-over.
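Matching an application to a tier can be sketched as a simple lookup. The code below is an illustration under stated assumptions, not a prescribed method: the `Tier` type and `cheapest_tier` function are hypothetical names, the "<5 min" and "<1 hr" entries for Tier 0 are treated as upper bounds, the "2-3 hrs" entry for Tier 1 is taken as three hours, and higher tier numbers are assumed to be cheaper, as the cost column suggests.

```python
from datetime import timedelta
from typing import NamedTuple, Optional


class Tier(NamedTuple):
    number: int
    cost: str
    rpo: timedelta  # data-loss window the tier can deliver
    rto: timedelta  # recovery time the tier can deliver
    solution: str


# Values transcribed from the tiered strategy table (see assumptions above).
TIERS = [
    Tier(0, "Significant", timedelta(minutes=5), timedelta(hours=1),
         "Dedicated hot site, real-time recovery fail-over"),
    Tier(1, "40% Premium", timedelta(hours=1), timedelta(hours=3),
         "Mirrored storage, dedicated servers"),
    Tier(2, "20% Premium", timedelta(hours=8), timedelta(hours=24),
         "Storage vaulted, transaction log, warm site"),
    Tier(3, "10% Premium", timedelta(hours=14), timedelta(hours=48),
         "Storage vaulted, warm site"),
    Tier(4, "Standard IT", timedelta(hours=32), timedelta(hours=72),
         "Tape storage, cold site"),
]


def cheapest_tier(required_rpo: timedelta, required_rto: timedelta) -> Optional[Tier]:
    """Return the least expensive tier that satisfies both objectives, or None."""
    # Walk from the cheapest tier (highest number) toward the most expensive.
    for tier in sorted(TIERS, key=lambda t: t.number, reverse=True):
        if tier.rpo <= required_rpo and tier.rto <= required_rto:
            return tier
    return None


# An application that can lose a day of data and be down for two days.
choice = cheapest_tier(timedelta(hours=24), timedelta(hours=48))
print(choice.number, choice.solution)  # prints: 3 Storage vaulted, warm site
```

Starting from the cheapest tier and moving up only when an objective is missed keeps the expensive hot site for the few applications that truly need it.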
So
what is the best way to implement disaster recovery solutions?
First, it is essential to create a plan that explores the
technology options for each set of Recovery Point Objective
and Recovery Time Objective requirements as outlined in
the aforementioned tiered strategy.
Next
comes installation of the technology and procedures. After
that, a test run is required for each application and business
function. Staff must be cross-trained in how to perform
those recovery operations. Finally, tests need to be performed
at least annually to ensure that the backup procedures will
still work, that all the technologies are still compatible
and that the employees are still familiar with the procedures
they need to follow.
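Keeping track of the annual tests can be as simple as a dated list. The following is a minimal sketch; the `overdue_tests` function, the application names, and the dates are all made up for illustration, and "annually" is approximated as 365 days.

```python
from datetime import date, timedelta

MAX_TEST_AGE = timedelta(days=365)  # "at least annually", approximated as 365 days


def overdue_tests(last_tested: dict[str, date], today: date) -> list[str]:
    """Return, in alphabetical order, the applications whose last recovery test is too old."""
    return sorted(
        name for name, tested in last_tested.items()
        if today - tested > MAX_TEST_AGE
    )


# Hypothetical test log.
last_tested = {
    "e-mail": date(2005, 1, 10),
    "accounting": date(2005, 11, 2),
    "order entry": date(2004, 6, 30),
}

print(overdue_tests(last_tested, today=date(2006, 3, 1)))
# prints ['e-mail', 'order entry']
```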
It
is also essential to maintain lines of transportation and
communication, since it is almost impossible to maintain
a chain of command when communications are severed.
Alternative sites for disaster recovery, with the appropriate
amenities and security, must also be established.
According
to an IBM survey, the most common complaint regarding business
continuity and disaster recovery plans is that business
owners rarely present a big-picture view of what needs to
be recovered and in what priority. The next pitfall is a
lack of executive commitment.
Insufficient
resources and a desire to play the odds are partly to blame,
of course, but disasters can strike at any time, in any
guise. When they do occur, the results are usually devastating.
The best way to define Recovery Time Objectives and Recovery
Point Objectives for each business application and function
within an organization is through a tiered strategy that
allows businesses to restore and recover the critical applications
first, while other applications can be recovered at a lower
cost and at a later time.
The
training, documentation procedures, transportation, and
communications available to your team will mean the difference
between success and failure should you need to mount a disaster
recovery effort. It can happen to you. It pays to be prepared.
Pragmatix,
Inc
Bill Abram is President and founder of Pragmatix, Inc.,
Elmsford, NY, a leading information technology company founded
in 1992, that builds custom database and web-enabled applications.
He can be reached at 914-345-9444 or via e-mail at billa@pragmatix.com.