Leveraging Risk Management During Disaster Recovery

August 2016

By Mike Bassett
For The Record
Vol. 28 No. 8 P. 14

IT and risk management may have competing interests at times, but a close partnership between the two can ease a crisis.

In 2013, Boulder Community Hospital in Colorado experienced a near-catastrophic systems failure when its EHR crashed and remained down for 10 days. Although the hospital continued to provide all care-related services, staff could not access patient information in the EHR, which caused, among other things, delays in scheduling patient tests and receiving lab results.

It was a "pretty intense time," says Linda Minghella, who at the time was the hospital's vice president and CIO. During the outage, the hospital activated its incident command system to help keep operations running as smoothly as possible. Experts say this is exactly the point at which IT and risk management groups should be working closely together.

"Risk management is always there, but during the course of an emergency you're so busy trying to manage the incident you might not think about risk managers and how to leverage them," says Minghella, founder and CEO of Blue River Digital Health, a health care consulting firm. "But during the outage, we clearly saw the benefit of leveraging risk management.

"The hospital is very conscious of patient safety and it was really helpful to have risk management in incident command to help us assess potential risks and how we could reduce those risks during the event."

Risk Management and IT
Assessing risk associated with a potential disaster such as an EHR failure should begin with a business impact analysis and an examination of potential losses, says Reggie Pool, JD, CIPT, CI, director of information governance and compliance for Consilio, LLC. "Find out what systems—such as managing patient data—need to be protected the most because of their use within the organization, and then prioritize in terms of the importance to the organization," he says. "And that drives the strategy they will work with IT to implement.

"Because you are looking at potential disaster events and having the possibility of not having access to some of your electronic health data, you are really looking for specific single points of failure that you can mitigate and then putting forth an appropriate plan of action with your IT groups," he adds. "Unfortunately, this is something that doesn't always happen. But it is something that should happen before the fact."

Pool says the period "after the fact" is where "the rubber meets the road" for IT and risk managers' ability to work together. The challenge, according to Pool, is that the two departments may have separate objectives. During an EHR failure, for example, the IT department will focus on the technical work of getting the system back up and running as quickly as possible.

"Whenever you have this kind of urgency, there's always the tendency to want to cut corners and move things along faster without really making sure you are considering other risk areas, so there is this balancing act between the idea that we need to get these systems up and running and concerns about patient health," Pool says, adding that patient information must be evaluated and potential privacy and security risks weighed.

To be in the best position to handle a disaster, Pool recommends health care organizations have a HIPAA-compliant response plan in place and ready to put into action. "[Do you know] what kind of specific event can occur? When do you know something is actually going to be a big enough event to require your disaster response plan to go into effect?" he asks.

The latter question was an issue during the outage at Boulder Community Hospital. Several times early in the event, the system appeared to be about to come back online; each time it did not, the hospital's decision to put its long-term plan into effect was delayed further. "We found out how important it is to have a predetermined timeframe for putting our plan for a long-term outage into effect," Minghella says.

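Having a response strategy in place is not only imperative but also mandatory: the HIPAA Security Rule requires covered entities to establish a contingency plan that includes a disaster recovery plan. Still, devising a plan of action is only a first step.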

Upon taking over as senior director of technology for the Tiger Institute for Health Innovation and University of Missouri Health Care, Michael Bragg conducted a thorough examination of the system's disaster recovery plan. "I felt my most sacred responsibility was to make sure that I not only knew how to operate the system but [also] how to recover the system," he says. "I wanted to know what we had in place.

"It wasn't a bad plan, but it wasn't something I was entirely happy or comfortable with because it wasn't something that everyone understood," he says. "So the lesson for me is to make sure that everyone knows there is a plan, that people have access to it even when your network is offline—we use encrypted thumb drives so that all our IT leadership has everything in hand—and that the plan is simple enough and clear enough so that everyone understands what they are supposed to do."

The Value of Testing
Once a clear plan has been established, it's a good idea to put it through its paces. "Make sure you know how to test it," Pool says. "Where you see people fail is when they haven't rehearsed how they are going to get through the plan itself. Are they just relying on the fact that they have a plan, because if that's what's happening they won't really know how to react when they are faced with an actual event?"

Testing should not be a one-time occurrence either. "Make sure the plan is kept up to date," Pool says. "Check to make sure data are moved appropriately, backups are occurring correctly, and that systems can be switched and restored to their appropriate recovery points."
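One routine check behind Pool's advice is confirming that each system's most recent good backup is recent enough to meet its recovery point, that is, the maximum amount of data loss the organization has agreed it can tolerate. The Python sketch below illustrates that idea under stated assumptions: the `SystemBackup` record, the system names, and the recovery point values are all hypothetical and are not drawn from any hospital's actual plan.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SystemBackup:
    """Backup status for one system (names and values are hypothetical)."""
    name: str
    last_successful_backup: datetime
    recovery_point_objective: timedelta  # maximum tolerable data-loss window


def systems_missing_rpo(backups, now):
    """Return names of systems whose newest good backup is older than their RPO.

    If one of these systems failed at `now`, restoring from its last backup
    would lose more data than the organization has said it can tolerate.
    """
    return [
        b.name
        for b in backups
        if now - b.last_successful_backup > b.recovery_point_objective
    ]


# Hypothetical example: the lab system's last backup is 30 hours old,
# which exceeds its assumed 24-hour recovery point objective.
now = datetime(2016, 8, 1, 12, 0)
backups = [
    SystemBackup("ehr", datetime(2016, 8, 1, 11, 30), timedelta(hours=1)),
    SystemBackup("lab", datetime(2016, 7, 31, 6, 0), timedelta(hours=24)),
]
print(systems_missing_rpo(backups, now))  # ['lab']
```

A check like this only confirms that backups exist on schedule; as Pool notes, the plan still needs structured tests showing that systems can actually be switched over and restored.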

Typically, incremental testing occurs during planned events such as EHR backups and restore procedures. "You're doing that on a regular basis anyway, but more structured disaster testing is really important," Minghella says.

Before the extended outage, Boulder Community Hospital had an IT disaster plan that Minghella calls "immature" in the sense that it hadn't been rigorously tested. In the aftermath, the hospital began testing more robust scenarios with "pseudo" outages involving all the major stakeholders. "The last step, and always the most difficult to test, is to have a simulated outage to see how people respond," she says.

All in This Together
Pool says there is always the potential for conflict between departments such as IT and risk management that may have different objectives: IT is likely to want the system back to full health as soon as possible, while risk management must weigh compliance concerns. Nevertheless, Minghella points out that, much like other hospital stakeholders, IT leaders and risk managers share a common goal: providing appropriate patient care. "We both have the best interests of the organization, and the welfare of the patient, in mind," she says. "We have to depend on each other in terms of the knowledge and skills that IT brings to the table and the responsibilities under which risk management works."

Patient welfare "needs to be the driver for everything," Pool says, adding that a delicate balance must be struck between meeting department objectives and providing the best possible care.

"From a consultative aspect, we come in and help organizations understand their risk profiles, do a business impact analysis, and help put those plans together," he says. "Where we see issues and roadblocks is where you have a kind of 'analysis paralysis' where risk management worries about protecting information and can stop programs from going forward. A medium has to be struck between being able to give the right level of service and providing the appropriate level of protection for sensitive information."

The question becomes how risk management and IT communicate, Pool says. "Are you creating those functional and technical requirements in a way that IT can understand what needs to be accomplished—and how it can be accomplished? And what are their requirements around response times and other system requirements?" he asks.

Pool says IT and risk management must work together to set disaster recovery priorities and understand how systems operate, including the types of data being managed and how they are being used. Each party must also be aware of the urgency to restore systems and the consequences of failing to do so in a timely manner.

"Even more than that, they need to understand what happens in situations where you can't get the systems back up and running," Pool says. "What are the fallback positions? When do they have to go to those fallback positions and start using paper?"

Much, if not all, of this should be ironed out in a formal setting. "You need to have an effective committee set up where everyone knows their responsibilities," Pool says. "It should discuss any appropriate activities such as new systems that are going online or old systems that are being retired.

"And the work needs to be ongoing, so [the committee] has to figure out how it is going to function on a regular basis and set policies and processes, and make sure these things are being validated."

Emergency Preparedness
According to Bragg, emergency preparedness is another area in which risk management and IT have begun working together. Just a few years ago, he says, there was little interaction between the two groups because their functions were completely different. As technology use has grown and become more commonplace, however, IT and risk management have become much more closely aligned.

"And that's for a couple of reasons," Bragg says. "First, the risk management folks are relying on us [IT] to provide the technology that provides them with communications tools. It used to be that we had things like calling trees and other ways of contacting staff in emergencies. But now we have cloud-based tools on our iPhones that allow us to do notifications very quickly. Also, every manager in our hospital has access to our emergency operations plan in an app on their phone. The risk management folks have really benefited from the technology in that way."

A key component of tackling an emergency situation is getting the word out that something has gone awry and providing status updates. "One of the things that is very difficult in a large health care environment is letting everyone know what is going on in the middle of an emergency or disaster," Bragg says. "What we are doing is leveraging [risk management] to help us communicate with our end users and other people impacted by the emergency, so we can let them know what's happening and what we need from them."

At University of Missouri Health Care, the decision to declare an event an emergency follows a defined path. According to Bragg, IT sets a period of time it considers reasonable for troubleshooting an issue and determining whether it is fixable or an actual disaster. "If at this point we discover this is not normal or immediately fixable, or with a little bit more information discover we have an even bigger problem, then we'll engage the emergency preparedness people specifically to make an initial notification and then stand up the dual command centers."
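The time-boxed escalation Bragg describes, and the predetermined timeframe Minghella wished Boulder Community Hospital had set, can be expressed as a simple decision rule. The Python sketch below is illustrative only: the `next_action` function, the `Action` values, and the 30-minute window in the example are hypothetical assumptions, since neither organization's actual thresholds are given.

```python
from datetime import datetime, timedelta
from enum import Enum


class Action(Enum):
    KEEP_TROUBLESHOOTING = "keep troubleshooting"
    RESOLVED = "resolved"
    ESCALATE = "notify emergency preparedness and stand up command centers"


def next_action(started, now, troubleshooting_window, fixed, bigger_problem_found):
    """Decide IT's next step under a hypothetical time-boxed escalation rule.

    troubleshooting_window is agreed in advance; its value here is an
    assumption, not a figure from either hospital's plan.
    """
    if fixed:
        return Action.RESOLVED
    # New information showing a bigger problem triggers escalation immediately.
    if bigger_problem_found:
        return Action.ESCALATE
    # Once the agreed window runs out without a fix, escalate rather than
    # waiting on repeated "almost back online" signals.
    if now - started >= troubleshooting_window:
        return Action.ESCALATE
    return Action.KEEP_TROUBLESHOOTING


start = datetime(2016, 3, 1, 8, 0)
window = timedelta(minutes=30)  # hypothetical window
print(next_action(start, start + timedelta(minutes=10), window,
                  fixed=False, bigger_problem_found=False).name)  # KEEP_TROUBLESHOOTING
print(next_action(start, start + timedelta(minutes=45), window,
                  fixed=False, bigger_problem_found=False).name)  # ESCALATE
```

Fixing the window before an incident is the point of the design: the escalation decision no longer depends on judgment calls made under pressure while the system appears to be recovering.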

Clinical leaders and emergency operations head one command center from which they communicate with clinicians, while IT operates a second to work on the technical issue. "By colocating the command centers, we are able to communicate very quickly between two conference bridges and keep everyone posted on both sides," Bragg says.

The approach was put to the test earlier this year, when a network outage took some time to resolve. "It appeared to be an issue with DNS [Domain Name System] resolution and we couldn't quite figure it out," Bragg says. "What we figured out was that there was a computer that was trying to do point-to-point communication outside of its tunnel and basically began broadcasting to the entire network, and that overloaded some of our main switches."

Happenstance led to the creation of the dual command centers. "We had senior hospital leadership standing literally outside of the door of where the senior IT leadership was running the bridge to resolve the network issue," Bragg says. "And it worked really well. It was nice to be able to tell immediately from communications with clinicians when we were finding an issue and then resolving it instead of either having to walk the floors or make phone calls to the departments."

— Mike Bassett is a freelance writer based in Holliston, Massachusetts.