First and foremost, plans must be action oriented. The action tasks must contain only the information needed at the time of the event. The plan must be concise and must direct the sequence and execution of recovery tasks, but it must not be so voluminous that it becomes overwhelming. Phrases like “The plan has the following objectives” and “This plan assumes that no two sites will be impacted at the same time…” are good clues that a plan is not action oriented.
Plans must address all seven stages of recovery.
Stage 1 is React to the Event, during which the incident is first reported, life safety issues are addressed, disaster impact is assessed, recovery teams are notified, personnel are mobilized and deployed, and the recovery plan is officially activated.
Stage 2 is Respond to the Situation during which the organization mounts its planned response to the damage and impact caused by the event.
Stage 3 is Recover the IT and Work Applications, during which basic hardware and software, file systems and data, and business and communications infrastructure are restored at the alternate sites.
Stage 4 addresses the Restoration of Applications, data, and databases to a synchronized point in time.
Stage 5 then deals with Resumption of Business processing, coordinating with departmental business resumption plans to re-enter lost data and to input bridging transactions as the foundation for resumed business processing at the alternate site.
Stage 6 addresses the complexities of the Return to the Impacted Site, which effectively reverses the activities of the first five stages to reclaim the home site.
Stage 7 Renormalizes Operations, returning them to pre-event operating levels.
Plans must be usable at the time of a disaster. A 1,000-page recovery plan is impractical in the chaotic environment of a disaster. With many plans, it literally takes hours to find the section that directs activities ATOD (At Time of Disaster).
Plans must be written to the proper level of detail. They must assume that qualified staff is available to perform the recovery but that company experts are not available. They must also explain how to perform each task as opposed to simply stating that the task must be performed.
Plans must control the sequence in which tasks are performed. Plans that are simply a compilation of separate “To Do Lists” cannot effectively guide the timeline ATOD.
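One way to keep a plan in control of sequencing is to record, for each task, which tasks must finish before it can start, and to derive the order from those dependencies instead of from separate lists. The Python sketch below uses hypothetical task names loosely based on the seven stages and the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to group tasks into waves whose members may run in parallel. It is an illustration under those assumptions, not a prescribed plan format.

```python
from graphlib import TopologicalSorter

# Hypothetical recovery tasks, each mapped to the tasks that must finish first.
TASK_PREREQUISITES = {
    "report_incident": set(),
    "address_life_safety": {"report_incident"},
    "assess_impact": {"address_life_safety"},
    "notify_recovery_teams": {"assess_impact"},
    "activate_control_center": {"assess_impact"},
    "activate_plan": {"notify_recovery_teams", "activate_control_center"},
    "restore_infrastructure": {"activate_plan"},
    "restore_databases": {"restore_infrastructure"},
}


def recovery_waves(prerequisites):
    """Group tasks into waves; every task in a wave has all prerequisites done.

    Raises graphlib.CycleError (a ValueError subclass) if the dependencies
    are circular, meaning the plan cannot be executed as written.
    """
    sorter = TopologicalSorter(prerequisites)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for a stable printout
        waves.append(ready)
        sorter.done(*ready)  # mark the wave complete to unlock the next one
    return waves


for number, wave in enumerate(recovery_waves(TASK_PREREQUISITES), start=1):
    print(f"Wave {number}: {', '.join(wave)}")
```

Here `activate_control_center` and `notify_recovery_teams` land in the same wave because both depend only on `assess_impact`, so the plan can direct them to proceed concurrently.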
Plans must contain no “filler” that does not explicitly support the recovery effort. Examples of filler to avoid include descriptions of various recovery situations, descriptions of the budget process, and descriptions of the plan's purpose and why the project was approved.
Plans must be maintainable. Most plans are not maintainable because of their architecture. The plan must not embed variables (e.g. names, phone numbers, configuration settings) in its text. Any variables should instead be segregated into the supporting procedures and inventories (appendices), which can be easily maintained.
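As an illustration of keeping variables out of the plan text, the sketch below stores a plan step as a template that only names its variables, while the values live in a separate appendix structure. It uses Python's `string.Template`; the step wording, variable names, and appendix contents (person, phone number, site) are all hypothetical. Because `substitute()` raises `KeyError` for any variable the appendix lacks, a gap in the appendix surfaces as soon as the step is rendered, for example during routine maintenance.

```python
from string import Template

# The plan step names its variables; it never embeds the values themselves.
PLAN_STEP = Template(
    "Call $network_team_lead at $network_team_phone and request failover "
    "of the WAN circuit to $alternate_site."
)

# Hypothetical appendix entries, maintained separately from the plan text.
CONTACT_APPENDIX = {
    "network_team_lead": "J. Doe",
    "network_team_phone": "555-0100",
    "alternate_site": "Alternate Site B",
}


def render_step(step: Template, appendix: dict) -> str:
    """Fill a plan step from the appendix.

    Template.substitute raises KeyError if the appendix lacks a variable,
    so an incomplete appendix is detected when the step is rendered.
    """
    return step.substitute(appendix)


print(render_step(PLAN_STEP, CONTACT_APPENDIX))
```

When a contact changes, only the appendix entry is edited; the plan step itself stays untouched.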
Methodologies are vitally important to disaster recovery planning, but they should not be part of the plan itself. For example, while the Business Impact Analysis used to determine critical applications is important to the program as a whole, there is no value in describing that process in an action-oriented recovery plan, because it has no relevance at the time of disaster. Similarly, Maintenance and Testing procedures do not belong in a recovery action plan. Though both processes are important, their documentation should not distract from the procedures that are needed ATOD.
Plans must not only define the recovery teams that are required but must also explicitly coordinate when those teams are required at each of the sites that must be supported ATOD (there may be five or six separate sites to staff). Plans must also define and coordinate which individual team members can fill roles on more than one team without negatively affecting the recovery process or timeline.
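Whether one person can serve on two teams can be checked mechanically once each assignment carries a time window. The following sketch assumes each assignment records a member, a team, a site, and a window measured in hours after plan activation; the people, teams, and hours are hypothetical. A member whose two assignments overlap in time is flagged, since that person cannot serve both teams at once.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Assignment:
    member: str
    team: str
    site: str
    start_hour: float  # hours after plan activation (assumed unit)
    end_hour: float


# Hypothetical roster entries.
ASSIGNMENTS = [
    Assignment("Pat", "Damage Assessment", "Home Site", 0, 4),
    Assignment("Pat", "Server Recovery", "Alternate Site", 6, 30),
    Assignment("Lee", "Control Center Staff", "Control Center", 0, 72),
    Assignment("Lee", "Network Recovery", "Alternate Site", 8, 24),
]


def conflicts(assignments):
    """Return pairs of assignments that place one member on two teams at once."""
    found = []
    for a, b in combinations(assignments, 2):
        # Half-open windows: one ending at hour 4 and one starting at hour 4 do not overlap.
        overlap = a.start_hour < b.end_hour and b.start_hour < a.end_hour
        if a.member == b.member and overlap:
            found.append((a, b))
    return found


for first, second in conflicts(ASSIGNMENTS):
    print(f"{first.member}: {first.team} @ {first.site} overlaps "
          f"{second.team} @ {second.site}")
```

Pat's assignments are sequential (hours 0 to 4, then 6 to 30), so Pat can fill both roles; Lee's Network Recovery window falls inside the Control Center window and is flagged.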
Plans must facilitate a detailed impact assessment. Unless the event is an obvious smoke-and-rubble scenario, it may be unclear whether it qualifies as a disaster.
Plans must include activation of a Control Center from which the entire recovery effort can be managed, on the assumption that the home site is uninhabitable.
Tasks necessary to sustain the business are essential. Long-term forms and supplies, status updates for vendors and customers, reinstatement of business processes, personnel support functions, rerouting mail, insurance claims, initial press communications, and emergency acquisitions are just a few of the many action tasks needed in the plan.
Plans should be scenario-specific to support different impact scenarios (e.g. component-level failure, data center failure, business unit failure) and event-agnostic (e.g. fire, hurricane, power outage) in order to avoid unnecessary complexity.
Plans must direct multiple response levels, ranging from notification to monitoring to alert to disaster declaration, in order to maximize the chance of preventing a disaster or responding preemptively.
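The response levels can be modeled as an ordered set so that, as new assessments arrive, the plan's response only ratchets upward unless someone deliberately lowers it. This sketch assumes a hypothetical rule that picks the level by comparing the estimated outage with the application's recovery time objective (RTO); both the rule and its thresholds are illustrative and not drawn from any standard.

```python
from enum import IntEnum


class ResponseLevel(IntEnum):
    NOTIFICATION = 1
    MONITORING = 2
    ALERT = 3
    DISASTER_DECLARATION = 4


# Illustrative thresholds (an assumption, not a standard): the fraction of the
# recovery time objective (RTO) that the estimated outage would consume.
THRESHOLDS = [
    (1.0, ResponseLevel.DISASTER_DECLARATION),
    (0.5, ResponseLevel.ALERT),
    (0.25, ResponseLevel.MONITORING),
]


def assess_level(estimated_outage_hours: float, rto_hours: float) -> ResponseLevel:
    """Map an outage estimate to a response level using the thresholds above."""
    if rto_hours <= 0:
        raise ValueError("rto_hours must be positive")
    ratio = estimated_outage_hours / rto_hours
    for threshold, level in THRESHOLDS:  # checked from most to least severe
        if ratio >= threshold:
            return level
    return ResponseLevel.NOTIFICATION


def escalate(current: ResponseLevel, assessed: ResponseLevel) -> ResponseLevel:
    """Only raise the level automatically; lowering it is a deliberate decision."""
    return max(current, assessed)


level = ResponseLevel.NOTIFICATION
for estimate in (1, 2, 5, 3, 10):  # successive outage estimates, RTO of 8 hours
    level = escalate(level, assess_level(estimate, rto_hours=8))
    print(estimate, level.name)
```

The 3-hour estimate on its own would only warrant monitoring, but the level stays at ALERT because an earlier assessment already reached it.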
For commercial alternate sites, plans must assess whether the subscribed hardware is actually available (other subscribers may be affected by the same or simultaneous disasters) and direct the appropriate response if the anticipated hardware is not available.
All plans must integrate seamlessly with all other plans, including business unit plans and other sub-plans such as the crisis communication plan, product liability plan, management succession plan, etc.
Plans must provide for a comprehensive time-of-event communications process that utilizes all necessary tools such as call trees, automated notification systems, chat channels, instant messaging, temporary email, inbound voice recordings, etc. in order to ensure the fastest, most reliable internal communications.
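Of the tools listed, the call tree is among the easiest to verify ahead of time. The sketch below assumes a call tree stored as a mapping from each caller to the people they phone (all names hypothetical). A breadth-first walk gives the round in which each person is reached, which approximates elapsed time only if each round's calls happen in parallel (an assumption), and comparing the result with a staff roster exposes anyone the tree never reaches.

```python
from collections import deque

# Hypothetical call tree: each caller maps to the people they phone.
CALL_TREE = {
    "Recovery Manager": ["Team Lead A", "Team Lead B"],
    "Team Lead A": ["Member A1", "Member A2"],
    "Team Lead B": ["Member B1"],
}

# Hypothetical staff roster; "Member C1" was never added to the tree.
ROSTER = ["Recovery Manager", "Team Lead A", "Team Lead B",
          "Member A1", "Member A2", "Member B1", "Member C1"]


def notification_rounds(tree, root):
    """Breadth-first walk: round n means n calls separate the person from root."""
    rounds = {root: 0}
    queue = deque([root])
    while queue:
        caller = queue.popleft()
        for callee in tree.get(caller, []):
            if callee not in rounds:  # skip anyone already reached via another branch
                rounds[callee] = rounds[caller] + 1
                queue.append(callee)
    return rounds


def unreachable(tree, root, roster):
    """Roster members the call tree never reaches."""
    return sorted(set(roster) - set(notification_rounds(tree, root)))


rounds = notification_rounds(CALL_TREE, "Recovery Manager")
print("Rounds needed:", max(rounds.values()))
print("Not in call tree:", unreachable(CALL_TREE, "Recovery Manager", ROSTER))
```

A check like this could run whenever the roster or the tree changes, so that gaps are found before an event rather than during one.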
Plans must address the different responses needed at the time of the event, depending on the failover status of high-availability and continuous-availability systems.