UP Paper 1414 US-T-HDOWN
Business Continuity Planning for Disasters Is Just Good Planning
Roberts, William, EMC
Actual disasters make up less than 1% of all business continuity threats and risks. Fully 87% of all outages are planned downtime, and the rest is unplanned downtime not associated with any disaster. Planned downtime includes events such as backup operations that need the system offline, extracts from a data warehouse, application or operating system patch installation and reboot, and application and data restores. Unplanned downtime can come from events such as database corruption, log file overruns, component failure, and human error.

As we in the IT community plan for disasters such as Katrina and 9/11, we can also plan for planned and unplanned outages. Planning and testing for these can significantly reduce the downtime associated with running a data center. Measures to plan for include:
• Virtualization of applications and operating systems, so that workloads can move seamlessly from one server to another while the first server is under maintenance.
• Redundant servers available in case of unplanned downtime.
• Redundancy of practically everything in the data center: ensuring, for instance, that the two power leads supporting the storage subsystem come from two completely different power grids.
• Replication of data off site to allow operations to be stood up immediately in case of an outage, and to provide reporting capability and backup.
• Using copies or clones for backup, data warehouse extractions, and restoring corrupted databases.

Planning means more than providing for the above, having service level agreements (SLAs) in place, and having alternate data centers. People-dependent processes do not suffice; automate as much as possible to avoid human error. Next, test the systems for vulnerabilities. Then test them again on a regular basis, at least once a quarter. Testing will show where the problems are, point to how they should be addressed, and give everyone confidence that they understand what to do in case of a problem.
The first time to test your backups or your recovery site is not when you have a problem. Putting in place best practices can lead toward the goal of zero downtime.
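
As one illustration of automating a backup test instead of relying on a people-dependent process, the sketch below compares a restored copy against its source by checksum. It is a minimal example under stated assumptions, not a description of any particular product: the function names (`file_digest`, `tree_digests`, `verify_restore`), the directory layout, and the choice of SHA-256 are all hypothetical, and a real restore test would also need to cover databases, permissions, and application startup.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root: Path) -> dict:
    """Map each file's path (relative to root) to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_restore(source: Path, restored: Path) -> list:
    """Return the relative paths that are missing or differ in the restored copy.

    An empty list means every source file was restored byte for byte.
    """
    src = tree_digests(source)
    dst = tree_digests(restored)
    return sorted(name for name, digest in src.items() if dst.get(name) != digest)


if __name__ == "__main__":
    # Throwaway directories stand in for production data and a restore target.
    with tempfile.TemporaryDirectory() as tmp:
        source, restored = Path(tmp, "source"), Path(tmp, "restored")
        source.mkdir()
        restored.mkdir()
        (source / "orders.db").write_bytes(b"order data")
        (restored / "orders.db").write_bytes(b"order data")
        (source / "audit.log").write_bytes(b"log data")  # deliberately not restored
        print(verify_restore(source, restored))
```

A check like this could be run by a scheduler after each quarterly restore test, so that a failed restore is reported by the system and does not depend on someone remembering to look.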