A Case for All-Risk Incident Management

The disaster recovery/business continuity industry has changed dramatically in the last few years.
Technologies that were recently bleeding-edge are now fully stress-tested and ready to meet increasing demands for 24x7x365 continuous availability. As a result, many organizations are re-evaluating their disaster recovery/business continuity (DR/BC) capabilities. This technology re-evaluation is often accompanied by a re-evaluation of overall program scope. Global business practices, national and international financial challenges, constantly increasing competition, industry consolidation and the constant chase for improved profitability raise the bar for today’s continuity programs by generating new requirements for much higher levels of availability, recoverability and operability. They also broaden the nature and scope of the events those programs must cover.

Most companies now realize that achieving an acceptable return on investment for their continuity dollars means finding and implementing approaches different from those used in traditional DR/BC programs, even those in use as recently as the past few years. And most agree that any new approach must support one critical objective: integrating systems availability, business continuity, disaster recovery and incident management processes into a seamless program that protects the organization from all business incidents on a daily basis, not just during a traditional “disaster” event.

We refer to this kind of optimal, highly leveraged program as All-Risk Incident Management™.
And we believe it is the natural next step in the maturity of the disaster recovery/business continuity industry and in the maturity of an organization’s ability to truly insulate itself and conduct business in the face of nearly any adverse event. Consider the industry’s focus at its inception some 30 years ago. Disaster recovery fundamentally meant IT recovery and, at the time, IT recovery usually meant mainframe recovery. It didn’t take long for people to realize that recovering a computer without solving the associated people, workplace and connectivity issues accomplished very little. So someone invented business continuity. While DR/BC was clearly more than the sum of its parts, the new discipline still left more potential issues unaddressed than it solved.

Take, for example, one of today’s most prevalent concerns: pandemic operations. A case could be made to address pandemic operations under the umbrella of the existing business continuity program. But consider further. Business continuity plans are focused on creating a workplace where people can be reassigned when their regular workplace becomes unavailable. A pandemic plan, on the other hand, seeks to keep people isolated from each other to prevent contagion, even when the workplace is fully functional. How long does it take before the two plans start encroaching on one another until neither represents an effective solution? Now add other common all-risk initiatives to the mix: crisis communications, management succession, supply chain continuity, manufacturing continuity, life safety, virus response, labor relations, product liability, crisis management and a dozen others. Clearly, the breadth of this range of incidents, and the need for each of these “plans” to integrate seamlessly for a cohesive enterprise-wide response, would strain the mechanics of a traditional disaster recovery or business continuity model.

However, the good news is that with a few manageable, but critical, adjustments of methodology, an All-Risk Incident Management framework can be established that will concurrently support plans not only for the above risks, but for all other risks that the organization might face.
And it will support them on a modular basis, when they are needed and where they are needed. Additionally, existing DR and BC plans can be retrofitted into the new framework with minimal effort. The result is a highly responsive, enterprise-wide ability for senior management to respond to any incident of any type with a coordinated, cohesive response across business units, across locations and across the globe—while at the same time knowing that the subject matter experts with their “feet on the street” are focused on the specific topic of their plan. It’s the best of both worlds.

An All-Risk Incident Management capability of this nature requires a framework that addresses four elements that are new to traditional DR/BC planning. First is a structured process to conduct a situational assessment when Threshold Triggers™ are exceeded. A Threshold Trigger can be any metric that, once exceeded, indicates the occurrence (or likely occurrence) of a business-adverse event. Threshold Triggers may be as obvious as traditional DR/BC events like fire, flood or hurricane. Or they may be as subtle as an unexplained drop in the price of the company’s stock, or as unlikely as the loss of a senior manager in a car crash. The one thing all of these events have in common is that they portend a negative environment that the organization needs to respond to. The list of Threshold Triggers can be as narrow or as broad as you like, and as long or as short as needed. They must, however, be easily monitored and readily measured. Once a Threshold Trigger is exceeded, the incident management team must conduct a Situation Assessment to determine whether the incident has impacted, or might impact, the organization and which areas of the organization might be affected. Based on this assessment, the appropriate all-risk plans are activated for the appropriate locations.
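As a rough illustration only, the trigger-to-activation flow described above could be sketched as follows. The metric names, limits, plan names and location are hypothetical, and the list of impacted locations stands in for the outcome of the team's Situation Assessment, which remains a human judgment.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdTrigger:
    metric: str              # hypothetical metric name
    limit: float             # value that must be exceeded for the trigger to fire
    plans: tuple[str, ...]   # component plans linked to this trigger (assumed mapping)


def fired_triggers(triggers, readings):
    """Return the triggers whose current reading is strictly above the limit."""
    return [t for t in triggers
            if t.metric in readings and readings[t.metric] > t.limit]


def activation_list(fired, impacted_locations):
    """Pair every plan linked to a fired trigger with every impacted location."""
    return sorted({(plan, loc)
                   for t in fired
                   for plan in t.plans
                   for loc in impacted_locations})


# Illustrative data only; none of these values come from the paper.
triggers = [
    ThresholdTrigger("stock_drop_pct", 10.0, ("crisis communications",)),
    ThresholdTrigger("river_level_m", 4.0, ("business continuity", "life safety")),
]
readings = {"stock_drop_pct": 12.0, "river_level_m": 3.1}

fired = fired_triggers(triggers, readings)
# The impacted locations would come from the Situation Assessment.
print(activation_list(fired, ["Chicago"]))
```

Keeping each trigger as a simple metric with a numeric limit reflects the requirement that triggers be easily monitored and readily measured.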

The second necessary element is a manageable cross-plan command and control infrastructure. Even when DR/BC programs are supplemented with plans for other risks, there is seldom (if ever) a truly workable process to ensure unified command and control across business units, locations, plans and departments. This infrastructure must be able to quickly and accurately determine who was anticipated to be in control (per pre-incident assumptions) and who is actually available to be in control (per post-incident reality). The process must also be able to modify the planned command and control hierarchy quickly and efficiently based on available staff and skills.
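One simple way to picture the gap between planned and actual control is a succession list per role, resolved against whoever is reachable after the incident. The sketch below is an assumption about how such a process might be modeled, not a description of any particular product; the roles and names are invented, and skills matching is left out for brevity.

```python
def resolve_command(planned, available):
    """Assign each role to the first planned candidate who is actually available.

    planned:   dict mapping role -> ordered list of candidates (pre-incident assumption)
    available: set of people confirmed reachable (post-incident reality)
    A person is assigned to at most one role; unfilled roles map to None.
    """
    assigned = {}
    taken = set()
    for role, candidates in planned.items():
        choice = next(
            (p for p in candidates if p in available and p not in taken),
            None,
        )
        assigned[role] = choice
        if choice is not None:
            taken.add(choice)
    return assigned


# Hypothetical succession lists and availability.
planned = {
    "incident commander": ["Avery", "Blake"],
    "communications lead": ["Blake", "Casey"],
    "facilities lead": ["Avery"],
}
available = {"Blake", "Casey"}

print(resolve_command(planned, available))
```

Roles that resolve to `None` are the ones the incident management team would need to fill by hand, for example by drawing on staff from another location.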

Next, the all-risk model requires that each component plan contains predefined, bi-directional communication milestones for upward (to the incident management team) and sideways (to the other component plans) communications. With this approach, the team executing each component plan knows exactly when to communicate to incident management. Conversely, incident management knows exactly when to expect communications. When the information flow is disrupted from either direction, the other side knows (based on planned milestone timing) that a planned communiqué was missed and then can proactively resolve the issue.
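The milestone idea lends itself to a small sketch: if every expected communication has a due time, either side can detect a missed one without waiting to be told. The milestone names, timings and five-minute grace period below are illustrative assumptions.

```python
from datetime import datetime, timedelta


def missed_milestones(milestones, received, now, grace=timedelta(minutes=5)):
    """Return the names of milestones that are overdue and not yet reported.

    milestones: dict mapping milestone name -> due datetime
    received:   set of milestone names already reported
    grace:      assumed tolerance before a milestone counts as missed
    """
    return sorted(
        name
        for name, due in milestones.items()
        if name not in received and now > due + grace
    )


# Hypothetical schedule for one component plan, relative to plan activation.
activated = datetime(2024, 1, 1, 9, 0)
milestones = {
    "damage assessment": activated + timedelta(minutes=30),
    "staff accounted for": activated + timedelta(minutes=60),
    "alternate site open": activated + timedelta(hours=4),
}
received = {"damage assessment"}

# Ninety minutes in, one report is overdue and one is not yet due.
print(missed_milestones(milestones, received, activated + timedelta(minutes=90)))
```

The same check can be run by the incident management team for upward reports and by each component plan team for the sideways reports it expects from other plans.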

The final element is the incident management team itself.
They become the nexus for non-operational oversight. This means that they “trust” the teams from the component plans to execute those plans effectively. Their role is “non-operational” relative to business processes and focuses on brokering limited internal and external resources across locations and across plans. They ensure that the organization’s core beliefs and principles are maintained in the face of the adverse incident. The best way to understand the role of the incident management team is with the acronym R.E.A.C.T.I.O.N.S™: React–Respond–Assess–Contain–Trigger–Investigate–Orchestrate–Normalize–Steady State
