March 3, 2008
When it comes to disaster planning, the objective is to prepare for the unexpected and have a proven and tested plan on standby.
Ever lost an important document, report, or file? If it was stored on your computer, perhaps you forgot to save it or accidentally deleted it. Or maybe the power went out before you could save it. Chances are this has happened to anyone who has ever worked on a computer.
At its most basic level, disaster recovery is all about planning for when things go wrong. In the healthcare arena, it means protecting and mobilizing a hospital in the event of an emergency.
Disaster recovery planning has been around for at least three decades, and it has undoubtedly become more popular since the mid-90s. Angie Singer Keating, vice president of compliance and security at Reclamere, an IT asset management company in Tyrone, Pa., says the real push for disaster recovery plans (DRPs) across all industries began after the 1995 Oklahoma City bombing. Now, besides terrorism, weather is also a concern. “It has been my experience that hospitals and health systems have a great sense of urgency in creating their DRP, and my guess is that those located in the South, in major metropolitan areas, or along the coastlines have an even greater sense of urgency,” she explains.
Despite all that has occurred during the past 10 to 15 years, complacency is inevitable. According to 2003 Joint Commission literature about disaster preparedness, “It does not take long for complacency to settle in. Eighteen months after the September 11th attacks and the subsequent, insidious, selected, and deliberate dispersion of anthrax spores, there are clear signs that the focus of American attention has long since moved on. The sense of urgency to prepare has now become a wait-and-see sense.”
Along with avoiding the wait-and-see approach, healthcare organizations should continue to test their DRP. “If an organization thinks that drafting a DRP is something to be done and put on a shelf, it will not work. The plan cannot go untested or unmonitored. It must be tested, revisited often, and updated to match current technology for it to be effective in the event of a disaster,” says Thomas Kristofco, president of Business Continuity Concepts in Hollidaysburg, Pa. At its simplest, a DRP or business continuity plan (BCP) is an exercise in preparedness, he says.
Start With Preparation
Hospitals tend to focus on catastrophic disasters such as Hurricane Katrina and overlook smaller mishaps such as downed networks and power outages. In either case, on-the-ball hospitals can enact their DRP without the panic or confusion that often results when preparation is lacking.
To help healthcare facilities cope, The Joint Commission has a set of standards in place to provide a framework for comprehensive emergency management. Within this framework, it breaks planning into four phases: mitigation, preparedness, response, and recovery.
The first step toward a DRP, and toward functional departmental DRP documents, is to conduct a risk assessment and business impact analysis. This process shows the organization how it is exposed to loss and disruption, quantifies the effects of a possible disaster, and establishes the priorities for the planning process. The final BCP can then be drafted, detailing how the hospital will provide uninterrupted patient care and restore critical departmental functions in the event of a disaster. “Preplanning for a disaster is much cheaper than scrambling to recover from an unforeseen event, and it shortens the time between the incident and the onset of recovery,” says Kristofco.
For the BCP to be effective, hospital officials should consider the many types of interruptions their facility could experience and have an appropriate recovery plan for each event.
For example, in the event of a primary power outage, does the hospital have a standby generator to provide power to the entire facility, including the data center? According to Louie Caschera, chief information officer at CareTech Solutions in Troy, Mich., hospitals should consider a high-availability system with redundant processors to keep critical application systems running. Redundant networking components should also be built into the network infrastructure so that the failure of a single component does not cause a communication outage, and multiple heating, ventilation, and air conditioning units should be considered so that cooling continues if one unit fails.
Each of these considerations is aimed at maintaining critical systems to ensure that patient care is never compromised. A vital role of the DRP is its ability to protect and recover patient data whether or not the hospital functions in a paperless environment. “To minimize data loss in a paper environment, many hospitals using paper medical records are migrating them to electronic storage via document imaging systems,” says Caschera. “While not full-fledged electronic medical record systems, document imaging systems offer paper-based hospitals many more options to view, store, and recover records quickly.
“An IT DRP is a subset of a BCP and works best when IT issues are considered in the context of all other hospital operational issues impacted in a disaster via a BCP. Developing a DRP without first developing a BCP can lead to a failed DRP,” he adds.
Cost is a major driving factor behind any DRP. Recovery time objectives are driven by how long the hospital can afford to be without access to critical applications, such as clinical and enterprise resource planning systems, before patient care is affected or financial losses begin to mount. “As a general rule, the shorter a hospital’s recovery time objective, the greater the DRP’s cost,” says Caschera.
Preparedness and Response
Once the hospital identifies its critical systems that require a DRP, measures can be enacted to prevent system failures and downtime. Two such technologies to minimize failure are server virtualization and storage replication.
Virtualized server technology allows applications to quickly and automatically capture and make use of more system resources on demand. It also automatically makes use of another virtualized server if the first server fails, explains Caschera.
Advanced technology such as server virtualization can “level the playing field” to minimize the number of discrete platforms that need to be maintained in the recovery solution. “For instance, a hospital may have 80 dedicated server platforms running normal production operations. In the recovery environment, you could likely host the standby servers on 10 higher powered servers using virtualization,” says Eric Foote, chief technical architect of CareTech Solutions. “During the event, a hospital may encounter some performance degradation with heavily loaded virtualized servers, but this can be remedied by adding more computer resources to the recovery cluster, and this can be accomplished in hours, not days.”
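Foote’s 80-to-10 example is, at bottom, a capacity calculation. The sketch below shows that arithmetic under stated assumptions: each standby server’s load is expressed in a common, hypothetical capacity unit, all virtualization hosts are identical, and a fixed percentage of each host is held back as headroom. The function name and parameters are illustrative, and the result is only a lower bound, since a real placement of virtual machines onto hosts can need more.

```python
import math


def hosts_needed(vm_loads, host_capacity, headroom_pct=20):
    """Lower bound on identical virtualization hosts for a set of standby servers.

    vm_loads: hypothetical per-server demand, in arbitrary capacity units.
    host_capacity: capacity of one host, in the same units.
    headroom_pct: percentage of each host kept free (an assumption).
    Uses total demand versus usable capacity; it does not do bin packing.
    """
    usable = host_capacity * (100 - headroom_pct) / 100
    if any(load > usable for load in vm_loads):
        raise ValueError("a single server exceeds one host's usable capacity")
    return math.ceil(sum(vm_loads) / usable)


# 80 lightly loaded servers, hosts ten times as powerful, 20% headroom
print(hosts_needed([1] * 80, host_capacity=10, headroom_pct=20))  # 10
```

Lowering the headroom packs the same servers onto fewer hosts at the price of the performance degradation Foote mentions; adding hosts to the recovery cluster is the remedy he describes.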
A common DRP implementation challenge is justifying the cost of mirroring critical infrastructure, says Foote. Most clinical environments have many repositories for housing information (eg, clinical information systems, picture archiving and communication systems [PACS], labs, etc). Each system typically has its own unique set of challenges and requirements that drive up the cost of implementing the recovery plan. “Attempting to achieve these results without virtualization technology would be needlessly cost prohibitive,” says Caschera.
Clinical data repositories are growing exponentially with the addition of electronic medical record, PACS, and clinical documentation initiatives. In fact, it is not uncommon to have critical data repositories that surpass 10 terabytes. Data protection and storage replication technologies allow hospitals to have redundant copies of data for protection and quick recovery, but traditional tape protection is not a viable recovery medium when recovery time objectives (RTOs) and recovery point objectives (RPOs) dip below 24 hours, according to Foote. “Real-time data replication is a requirement for any system of size to have an RTO of less than 24 hours. With these time frames, cost quickly becomes a factor when dealing with large, expensive, proprietary storage arrays,” he explains.
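A quick calculation shows why tape struggles with a sub-24-hour RTO for a 10-terabyte repository. The sketch assumes a single sustained read stream at a hypothetical 120 MB per second and decimal units (1 TB = 1,000,000 MB); it ignores tape mounts, seeks, verification, and application restart time, all of which would only add to the total.

```python
def restore_hours(data_tb, throughput_mb_per_s):
    """Hours needed to stream data_tb terabytes at a sustained rate.

    Decimal units are assumed (1 TB = 1,000,000 MB). The throughput is an
    assumed sustained rate, not a measured or vendor-quoted figure.
    """
    seconds = data_tb * 1_000_000 / throughput_mb_per_s
    return seconds / 3600


hours = restore_hours(10, 120)
print(f"{hours:.1f} hours")  # 23.1 hours
```

Under these assumptions, simply reading the data back consumes about 23 of the 24 hours, leaving almost nothing for rebuilding servers and bringing applications online, which is consistent with Foote’s point that real-time replication becomes a requirement at that scale.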
Data can be replicated within the same data center or to a remote data center for off-site protection, and lost data can then be retrieved through Web access or from CD archives. “We are starting to see many alternative solutions coming to light such as LeftHand Networks’ SAN/iQ iSCSI solution, which dramatically lowers entry costs and eases communication charges by leveraging standard TCP/IP interconnects,” Foote says. “Hewlett-Packard’s Medical Archive Solution [MAS, grid-based storage] allows multiple clinical imaging systems to share a common redundant storage array to reduce unique device counts in production and DR [disaster recovery] solutions.”
RPOs and RTOs define a hospital’s level of protection and recovery. The former is the maximum acceptable age of the data restored after a failure, which in practice means how long before the failure the hospital last copied its data to a backup source. The latter is the maximum time a system can be down after a failure occurs, that is, the entire time allowed to bring the system back up.
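To make the two definitions concrete, the sketch below checks a single incident against both objectives. The function name and the example times are hypothetical: data loss is measured from the last good copy to the moment of failure (compared with the RPO), and downtime from the failure until the system is back up (compared with the RTO).

```python
from datetime import datetime, timedelta


def meets_objectives(failure_time, last_copy_time, restored_time, rpo, rto):
    """Return (rpo_met, rto_met) for one incident.

    data_loss: age of the recovered data, i.e. the gap between the last
    good copy and the failure.
    downtime: time from the failure until the system is back up.
    """
    data_loss = failure_time - last_copy_time
    downtime = restored_time - failure_time
    return data_loss <= rpo, downtime <= rto


# Hypothetical incident: nightly copy at 2 am, failure at 2 pm, back up at 8 pm
last_copy = datetime(2008, 3, 3, 2, 0)
failure = datetime(2008, 3, 3, 14, 0)
restored = datetime(2008, 3, 3, 20, 0)
print(meets_objectives(failure, last_copy, restored,
                       rpo=timedelta(hours=4), rto=timedelta(hours=8)))
# (False, True)
```

Here the system came back within its 8-hour RTO, but a nightly copy cannot satisfy a 4-hour RPO: 12 hours of data would be lost, which is why tighter RPOs push hospitals toward more frequent copies or continuous replication.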
“When hospitals set their RPOs and RTOs, they define how quickly data can be recovered and how old data will be when it is brought back online and is able to be accessed,” says Caschera. “It will also dictate the type of technology hospitals must implement to achieve their recovery objectives.”
“RTOs, RPOs, and functionality prioritization within [BCPs] set the goals of any sound DRP,” says Foote.
Understanding the interdependencies of the systems that deliver the functionality described in the BCP, and assessing the architecture of the protected applications, determines the infrastructure needed to meet the RTOs and RPOs. From this assessment, a gap analysis is performed to determine what changes need to occur in the production environment to close or minimize the gaps. Depending on the risks and costs associated with completely or partially closing the gaps, a plan is set forth to implement the changes, explains Foote.
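The gap analysis Foote describes can be sketched as a comparison between each system’s target objectives from the BCP and what its current infrastructure can actually deliver. Everything below is illustrative: the record layout, the system names, and the hour figures are hypothetical, and ordering by the combined gap is just one simple way to rank remediation work, not a prescribed method.

```python
from dataclasses import dataclass


@dataclass
class SystemAssessment:
    name: str
    target_rto_h: float   # objective taken from the BCP
    target_rpo_h: float
    current_rto_h: float  # what the current infrastructure can deliver
    current_rpo_h: float


def gap_report(systems):
    """Return (name, rto_gap_h, rpo_gap_h) for systems that miss a target.

    A gap is how many hours current capability exceeds the objective;
    systems meeting both targets are left out. Largest combined gap first.
    """
    report = []
    for s in systems:
        rto_gap = max(0.0, s.current_rto_h - s.target_rto_h)
        rpo_gap = max(0.0, s.current_rpo_h - s.target_rpo_h)
        if rto_gap or rpo_gap:
            report.append((s.name, rto_gap, rpo_gap))
    report.sort(key=lambda r: r[1] + r[2], reverse=True)
    return report


systems = [
    SystemAssessment("clinical information system", 4, 1, 48, 24),
    SystemAssessment("PACS", 12, 4, 24, 24),
    SystemAssessment("lab", 24, 24, 8, 1),
]
for row in gap_report(systems):
    print(row)
# ('clinical information system', 44.0, 23.0)
# ('PACS', 12.0, 20.0)
```

Each remaining row is a candidate for the cost-versus-risk decision Foote mentions: close the gap completely, narrow it, or accept it.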
Minimizing Failure: Data Protection and Access
Data protection and access are two different yet equally important considerations in any DRP. Data protection is often the easier of the two to implement and support. For example, a typical 500-bed hospital environment likely has 40 discrete critical systems. Data center infrastructure for these systems probably numbers in the 50-server range, with perhaps 4 to 5 terabytes of data. Viewed from the user perspective, the hospital would likely have roughly 3,000 users, 2,000 desktop PCs, and 500 printers. If a catastrophic event occurred at or around the hospital, the clinicians would need to access the clinical information from alternative hardware, perhaps at another hospital, at their homes, or at a disaster recovery site. The task of configuring 40 critical applications on nonstandard devices is nearly impossible in short order and is certainly not achievable within a 24-hour RTO, according to Foote.
To achieve this access, the delivery mechanism must be portable and flexible enough to facilitate rapid deployment over any communication medium, including the Internet, and on any device. A thin-client solution such as Citrix Presentation Server fulfills this requirement. Foote says delivering clinical applications in a thin-client model may be a bit more expensive up front, but from both an operational perspective and a recovery scenario, hospitals will find the solution is much more flexible and robust.
Whether updating or drafting a DRP or BCP, it’s essential to uncover the areas where the hospital is vulnerable, create solutions to close the gaps, understand how systems and departments rely on each other, and test the plan. “Any disaster recovery plan or business continuity plan is only considered useful when it is revisited often, updated accordingly, and your team is ready to enact it and be successful should a disaster occur,” Kristofco says. “Technology is constantly changing, and your DRP/BCP should reflect both what resources are available and what is financially feasible to achieve the hospital’s RTOs and RPOs.”
— Kim M. Norton is a New Jersey-based freelance writer specializing in healthcare-related topics for various trade and consumer publications.
Keeping Lenox Safe
For the last 25 years, Alpha Systems, a document scanning and electronic document management software provider headquartered in southeastern Pennsylvania, and Lenox Hill Hospital in Manhattan’s Upper East Side have had a working relationship to secure the hospital’s patient records. When Lenox Hill first signed on with Alpha Systems, it was using microfiche for its records. “Today, preserving our medical records is obviously easier with our digitized paper and clinical records,” says Jerry Rudyk, the hospital’s supervisor of storage and retrieval. “In the event we need a record that we do not have in our system, we can use a backup CD from Alpha to re-create the record. The turnaround time is less than 24 hours, but if necessary, they can put the record on a secure Web site for us to access in an emergency.”
On a monthly basis, Alpha Systems scans approximately 30 million pages to create digitized medical records along with a backup to be stored at a remote location, or warm site. “We provide all of the services necessary to create a centralized electronic record that can easily be backed up and re-created in the event of a disaster. The combination of our scanning services and E2E [electronic to electronic] feeds can produce a complete e-record within hours of discharge,” explains Alpha Systems President and Chief Operating Officer Brett Griffith. Should the hospital want immediate access to its records, they can be made available through a secure Web site, also known as a hot site.
How quickly the hospital needs to have records available in the event of a disaster will dictate whether it goes with the hot or warm site. The age of the records, activity levels, and number of people who need access also factor in, explains Griffith. “After the recent onslaught of hurricanes in the South and flooding emergencies in the Northeast, we realized that our customers need a fail-proof backup storage and disaster recovery system so that medical records don’t get destroyed, whether due to fire or water damage. Even access to critical medical records is a concern during [information system] maintenance,” he adds.
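The hot-versus-warm decision can be sketched as a rule keyed to required access time. The one figure taken from the sidebar is the warm-site path’s turnaround of less than 24 hours via backup CD; the function name and the record sets are hypothetical, and a real decision would also weigh record age, activity levels, and the number of people needing access.

```python
# Turnaround for re-creating a record from a backup CD, per the sidebar
WARM_SITE_TURNAROUND_H = 24


def assign_sites(record_sets):
    """Split (name, required_access_hours) pairs between hot and warm sites.

    Record sets needed faster than the warm-site turnaround go to the hot
    site (secure Web access); the rest can rely on the warm-site backup.
    """
    hot, warm = [], []
    for name, required_h in record_sets:
        (hot if required_h < WARM_SITE_TURNAROUND_H else warm).append(name)
    return hot, warm


hot, warm = assign_sites([
    ("active inpatient", 1),
    ("archived 1990s charts", 72),
    ("recent outpatient", 12),
])
print(hot)   # ['active inpatient', 'recent outpatient']
print(warm)  # ['archived 1990s charts']
```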