Objective
A company’s data is its most important asset. As business processes become increasingly dependent on computer systems, data integrity becomes a critical factor in the ability to conduct business. For a bank or brokerage house, where a system running on yesterday’s data is often worse than no system at all, losing a day’s worth of transactions is catastrophic. Automated production lines, just-in-time manufacturing and automated distribution impose similar zero-loss demands on industries that were traditionally less sensitive. Contingency planners must understand precisely how much data loss, if any, is acceptable, and then find cost-effective solutions that limit losses to that level.
Issues
Gone are the days when all data could be backed up on the graveyard shift at a quiesced point in time and easily synchronized across all applications. Today, some data loss is unavoidable with all but the most sophisticated backup solutions, and every company must address it as the foundation of its continuity planning.
Consider the complexities of heterogeneous platforms, multiple locations, distributed processing and dissimilar applications. Then factor in that data must often be treated as a single synchronized logical entity, even when it resides on heterogeneous platforms in geographically dispersed locations. Finally, recognize that 24/7/365 availability, by definition, eliminates the backup window in the traditional sense. Taken together, these factors define the challenge facing contingency planners. As tolerance for data loss approaches zero, backup changes from a batch process into a real-time requirement, and costs escalate significantly.
Solution
Conceptually, reducing data loss is simple: back up your data and get it off-site more frequently. In practice, however, this becomes increasingly difficult as the tolerance for loss approaches zero, and it requires new and creative ways to use every available window for backup.
The first step towards reducing data loss is to understand how much loss is acceptable. The industry term for this is the Recovery Point Objective (RPO): the point in time to which systems and their data must be restored. For example, an RPO of 24 hours means that systems and data will be restored to a point no more than 24 hours before the interruption. In the worst case, 24 hours of processed data is irrevocably lost unless it can be recreated manually.
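As a rough illustration of the arithmetic behind an RPO, the sketch below assumes a simplified model in which copies are taken at a fixed interval and each copy needs a fixed lag to reach the alternate site. Under that assumption, the worst case occurs when the interruption strikes just before the next copy arrives off-site, so the potential loss is the interval plus the lag. The function names are hypothetical and not part of any product.

```python
from datetime import timedelta


def worst_case_loss(backup_interval, offsite_lag=timedelta(0)):
    """Worst-case data loss under a simplified model (assumption, not a product API).

    Copies are taken every `backup_interval` and each one reaches the alternate
    site `offsite_lag` later. If the outage hits just before a copy arrives, the
    newest usable copy is one full interval plus the lag old.
    """
    return backup_interval + offsite_lag


def meets_rpo(backup_interval, rpo, offsite_lag=timedelta(0)):
    """True if the backup schedule can never lose more data than the RPO allows."""
    return worst_case_loss(backup_interval, offsite_lag) <= rpo


# Nightly backups, available off-site immediately, against a 24-hour RPO.
print(meets_rpo(timedelta(hours=24), rpo=timedelta(hours=24)))
# The same nightly backups, trucked off-site 4 hours later.
print(meets_rpo(timedelta(hours=24), rpo=timedelta(hours=24), offsite_lag=timedelta(hours=4)))
```

Under this model, a nightly backup that reaches the vault four hours later cannot meet a 24-hour RPO, which is why both backup frequency and the speed of getting copies off-site matter.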
Understanding your capacity to accept data loss is one of the most difficult areas of continuity planning, both because of the sheer number of variables and because of the tendency to accept traditional batch backup methodologies that guarantee data loss. Getting past the systems view of backup requires an understanding of the underlying business processes. The best way to achieve this is with an Iterative Business Process Decomposition (IBPD), which accurately correlates business processes to data availability requirements and produces a process-specific statement of backup requirements. This is critical because few companies can afford to use advanced backup and replication strategies for all of their data; they must decide where to apply advanced solutions based on specific business requirements.
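The following is a minimal sketch, not the IBPD method itself, of one idea behind correlating business processes to data requirements: a dataset shared by several processes must meet the strictest loss tolerance among them. The process names, datasets and tolerances are invented for illustration, and the 24-hour batch threshold is an assumption.

```python
from datetime import timedelta

# Hypothetical example: each business process maps to the data loss it can
# tolerate and the datasets it depends on. Names and values are illustrative.
PROCESSES = {
    "wire_transfers": (timedelta(0), {"ledger", "customer_master"}),
    "monthly_statements": (timedelta(hours=24), {"ledger", "statement_archive"}),
    "marketing_reports": (timedelta(days=7), {"customer_master", "campaign_history"}),
}


def dataset_rpos(processes):
    """Give each dataset the strictest (smallest) tolerance of any process that uses it."""
    rpos = {}
    for tolerance, datasets in processes.values():
        for name in datasets:
            rpos[name] = min(tolerance, rpos.get(name, tolerance))
    return rpos


def needs_advanced_protection(rpos, batch_rpo):
    """List datasets whose RPO is tighter than a batch backup cycle can deliver."""
    return sorted(name for name, rpo in rpos.items() if rpo < batch_rpo)


rpos = dataset_rpos(PROCESSES)
print(needs_advanced_protection(rpos, batch_rpo=timedelta(hours=24)))
# ['customer_master', 'ledger']
```

Datasets flagged this way are the candidates for advanced replication; the rest can remain on conventional batch backup.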
Once your tolerance is defined, you must develop your data availability policies and procedures. WTG’s Data Availability Architectures address the five aspects of data availability (data selection, data synchronization, data integrity, data accessibility and data protection) to provide a complete solution for reducing data loss.
Data selection entails an application-level review of all data to identify critical datasets and databases. WTG recommends and uses automated tools to facilitate this effort. Depending on the tool used, we also iteratively eliminate duplicate data created by cross-platform and cross-application processing and by batch propagation, making more efficient use of backup windows and resources.
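One common way to spot duplicate data is to compare content digests: byte-identical copies, such as an extract that a batch job propagates unchanged to another system, produce the same SHA-256 digest. This is a minimal sketch, not a description of any particular tool; the dataset names are hypothetical and contents are held in memory for brevity. Data that was transformed while propagating will not match and would need application-level analysis.

```python
import hashlib
from collections import defaultdict


def find_duplicate_datasets(datasets):
    """Group dataset names whose contents are byte-for-byte identical.

    `datasets` maps a dataset name to its contents as bytes. Identical
    contents produce identical SHA-256 digests, so grouping by digest finds
    copies that only need to be backed up once.
    """
    groups = defaultdict(list)
    for name, content in datasets.items():
        groups[hashlib.sha256(content).hexdigest()].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


example = {
    "orders_extract": b"order-1,order-2\n",
    "warehouse_copy_of_orders": b"order-1,order-2\n",
    "billing": b"invoice-9\n",
}
print(find_duplicate_datasets(example))
# [['orders_extract', 'warehouse_copy_of_orders']]
```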
The next aspect is data synchronization. Its significance depends on the scope of your backup requirements and on how much cross-platform, cross-system or cross-application processing occurs. For example, a shop that backs up all data at the end of the nightly cycle, and accepts a day’s data loss by restoring to the previous night, has no synchronization problems. Conversely, a company that takes application-level backups throughout the day must also account for the state and timing of related backup data. Custom restoration jobs must then be designed to take all of these backups, which represent different physical points in time, and restore them to a single logical point before processing can resume. This is a significant undertaking that requires specialized recovery expertise combined with production application expertise to implement effectively, even with automated tools and schedulers.
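The sketch below illustrates restoring backups taken at different physical times to one logical point. It assumes a model in which each dataset has backup copies plus a change log that can roll it forward from any of those copies up to the end of the log available at the recovery site. Under that assumption, the latest common point is the earliest log end among the datasets, provided each dataset has a backup at or before that time. The data model, names and timestamps are hypothetical; real restoration jobs also depend on each application’s own recovery utilities.

```python
from datetime import datetime

# Hypothetical model: each dataset maps to (backup_times, log_end). A dataset can be
# restored to any time T with some backup_time <= T <= log_end by restoring that
# backup and rolling its change log forward to T.


def common_recovery_point(datasets):
    """Return the latest time every dataset can be restored to, or None if none exists."""
    target = min(log_end for _, log_end in datasets.values())
    # Moving T earlier never helps a dataset that lacks a backup at or before T,
    # so if any dataset fails here there is no common point at all.
    for backups, _ in datasets.values():
        if not any(b <= target for b in backups):
            return None
    return target


def restore_plan(datasets, target):
    """Pick, for each dataset, the newest backup at or before target (shortest roll-forward)."""
    return {
        name: max(b for b in backups if b <= target)
        for name, (backups, _) in datasets.items()
    }


datasets = {
    "ledger": ([datetime(2024, 1, 1, 2, 0), datetime(2024, 1, 1, 14, 0)], datetime(2024, 1, 1, 16, 30)),
    "customer_master": ([datetime(2024, 1, 1, 3, 0)], datetime(2024, 1, 1, 15, 45)),
}
point = common_recovery_point(datasets)
print(point)  # 2024-01-01 15:45:00
print(restore_plan(datasets, point))
```

Every dataset is rolled forward to the same logical time, even though the ledger data loses less than it could have on its own; that sacrifice is the price of consistency across applications.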
The third aspect, data integrity, is the primary component of controlling data loss and corruption. If data is backed up and stored off-site only weekly, a week’s worth of data can be lost. Conversely, if data is mirrored in real time at an alternate location, the potential for data loss is theoretically zero. Data corruption is possible regardless of backup frequency and must be prevented with a system of checks and balances, or the entire process is jeopardized. The challenge is that no single advanced backup solution currently works across all platforms, and the cost of advanced solutions usually limits them to the most critical data. WTG is able to craft high data availability solutions on all major platforms, including OS/390, RS/6000, AS/400, Windows NT, Unix, Sun and HP, using various combinations of hardware, software and operating system features.
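One simple check of the kind described above is to record a checksum while a backup is written and to verify the stored copy against it before relying on it for recovery. The sketch assumes file-based backups and uses SHA-256 from Python’s standard hashlib module; the function names are hypothetical. It catches corruption introduced while copying, transmitting or storing the backup, but not corruption already present in the source, which would be copied faithfully and needs application-level checks.

```python
import hashlib


def backup_with_digest(source_path, dest_path, chunk_size=1 << 20):
    """Copy source to dest, hashing exactly the bytes written; return the SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(source_path, "rb") as src, open(dest_path, "wb") as dst:
        for chunk in iter(lambda: src.read(chunk_size), b""):
            digest.update(chunk)
            dst.write(chunk)
    return digest.hexdigest()


def verify_copy(path, recorded_digest, chunk_size=1 << 20):
    """Re-read a stored copy and confirm it still matches the digest recorded at backup time."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == recorded_digest
```

Hashing the bytes as they are written, rather than hashing the source separately, ensures the recorded digest describes the copy that was actually made, even if the live source changes afterwards.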
The fourth aspect, data accessibility, requires consideration of the physical location of data, its logical location (whether there is a single logical copy of the data or several) and the physical characteristics of the storage media itself.
The final aspect is data protection, which fundamentally means moving data away from the primary site to a secured alternate location. Many vendors and technologies support advanced techniques, such as electronic vaulting and remote mirroring, that back up data and store it remotely in a single step.
Scope
The strategies and techniques described in this solution apply equally at every level of implementation: a single business process, an individual application, a single server, a platform, a whole site or the entire enterprise.
Proven Results
WTG has helped companies in many industries implement cost-effective high data availability architectures. At one of the top U.S. banks, we identified a critical flaw in the data availability architecture that would have completely prevented a successful recovery, even though the bank had already been testing its recovery capability for more than five years. Working together and using the techniques described in this solution, we implemented a completely new data availability policy and proved its validity in the bank’s first successful all-application end-user test.