Keeping Tabs on Data - For The Record Magazine

October 2019

Keeping Tabs on Data
By Sarah Elkins
For The Record
Vol. 31 No. 9 P. 14

A survey uncovered that while organizations felt confident they knew the location of their data, their slack monitoring practices suggest otherwise.

In February 2019, Integris Software conducted a survey that sought to learn how mid- to large-sized US companies manage data privacy. The resulting study, the Integris Software 2019 Healthcare Data Privacy Maturity Study, released in April, is a clarion call for privacy officers to get to know their data a little better.

According to Kristina Bergman, founder and CEO of Integris Software, the anecdotal feedback on the subject was that many in the industry didn’t know what they didn’t know. The survey sought a sense of the state of things across all industries, not just health care. Of the 258 respondents, 46 were from the health care space. The result was a comprehensive view into the relative maturity of data privacy practices at health care organizations when compared with other industries.

While the key takeaways were not particularly surprising to Bergman, many in health care would do well to take a hard look at their privacy practices and ask themselves, “Is what we say we’re doing what we’re actually doing?” If Integris’ study is any indication, the answer is “maybe not.”

In this article, For The Record asks a couple of privacy leaders to weigh in on the results of the study and comment on its more significant findings. Lisa Wallace is technical director at MicroSolved, an information security company serving diverse clients in financial, utility, industrial control, and legal spaces, as well as health care. And Chris Bowen, CISSP, CCSP, CIPP/US, CIPT, is chief privacy and security officer and founder of ClearDATA, a health care–specific cloud computing and information security services provider.

Wallace, whose clients span many different industries, says, “Health care is what I would call middle of the pack or behind the curve” when it comes to effective privacy protections. For her, the results of the study are no great surprise.

Bowen notes the unique challenges facing those in the space. “Data sprawl is rampant in any industry but especially in health care. You have one health care transaction and your data are in 100 different places,” he says.

Similarly, Bergman says, “Data are built to flow like water. They are built to be highly redundant, have lots of copies, and be everywhere.” And, not unlike floodwaters, data can accumulate, move faster, and collect more debris than people realize. The Integris study warns that this is happening in many health care organizations right under the noses of privacy officers who think everything is under control.

Updating Data Inventories
Chief among the findings was that 70% of health care respondents were “very” or “extremely confident” in knowing where sensitive data reside within their organizations. This seems to fly in the face of the fact that 50% of the respondents update their inventory of personal data only once a year or even less often, which points to an epidemic of overconfidence among privacy officers and managers.

Both Wallace and Bowen were vehement in their position that updating the data inventory once per year or even less frequently is a recipe for serious privacy issues.

“If they’re not updating that data schema more often than once a year, then they’re definitely overconfident,” Wallace says.

Bowen agrees, adding, “I would greatly question the efficiency and effectiveness of that inventory.”

It raises the question: How frequently should data inventory be updated? The short answer is: It depends. The main variable is whether the data are at rest or in motion.

For data at rest, Bergman recommends monthly scans to “see whether the data are demonstrably changing.” For example, the marketing department might purchase a marketing list that introduces new data. Data in motion, on the other hand, should be under continuous evaluation because they might be flowing into the organization from anywhere.
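As a rough illustration of the monthly cadence for data at rest, the sketch below flags sources whose last scan is overdue. The source names, the `stale_sources` helper, and the 31-day reading of “monthly” are assumptions made for this example, not part of the Integris study or any vendor tool, and a periodic check like this does not cover data in motion.

```python
from datetime import date, timedelta

# Assumption: "monthly" is read as 31 days; tune to local policy.
MAX_AGE_AT_REST = timedelta(days=31)

def stale_sources(last_scanned: dict[str, date], today: date) -> list[str]:
    """Return the names of at-rest data sources whose last scan is overdue."""
    return sorted(
        name
        for name, scanned in last_scanned.items()
        if today - scanned > MAX_AGE_AT_REST
    )

# Hypothetical inventory: source name -> date of last scan.
inventory = {
    "ehr_database": date(2019, 9, 1),
    "marketing_lists": date(2019, 6, 1),
}
overdue = stale_sources(inventory, today=date(2019, 9, 15))
print(overdue)
```

Here only the marketing list source would be reported, because its last scan is more than a month old.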

Bergman offers the example of strings of text from a customer support chat form. In this platform, patients often enter sensitive data such as credit card information or protected health information (PHI).
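To make the chat example concrete, here is a minimal sketch of how free-text messages might be screened for credit card numbers. It pairs a simple digit pattern with the Luhn checksum that payment card numbers satisfy. The function names and the sample message are hypothetical, and real discovery tools use far broader detection than this.

```python
import re

# Candidate card numbers: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")

def luhn_valid(digits: str) -> bool:
    """Apply the Luhn checksum to a string of digits."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_card_numbers(message: str) -> list[str]:
    """Return digit strings in the message that look like valid card numbers."""
    hits = []
    for match in CARD_CANDIDATE.finditer(message):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits

# Hypothetical chat message using a well-known test card number.
print(find_card_numbers("Please bill 4111 1111 1111 1111 for my visit."))
```

The checksum step matters because it filters out long digit runs, such as order or tracking numbers, that only resemble card numbers.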

In motion or not, Bergman says the bottom line is that “no human being can possibly know what’s in that data set.”

“In the best of all possible worlds, that schema would be a living entity. It shouldn’t be a box you check off once per year,” Wallace says.

“If you have the ability to update your inventory in real time, you’ve achieved what I call privacy official nirvana,” Bowen says. “In order to do that, you have to use a combination of approaches: a basic questionnaire combined with e-discovery tools, combined with integrated APIs [application programming interfaces] to help deliver an inventory that’s actionable.”

In short, an effective data evaluation cannot be accomplished manually.

More surprising than the infrequency of data inventory are the tools being used to accomplish the task. “Even worse [than infrequent inventory], the most common tool used to update that data map is spreadsheets,” Bergman says.

Bowen confirms the scenario is quite common. “Even in the midst of some great tooling that’s out there, the Excel spreadsheet still happens to be the go-to method,” he says, noting that the ubiquitous reliance on spreadsheets is directly related to the maturity of the organization’s privacy platform.

According to Bowen, some verticals within health care are evolving faster than others. “I would label the payers and the pharmaceutical companies probably at the top of the stack along with SAAS [software as a service organizations],” he says.

For those organizations looking for a starting point in their quest to beef up their data inventory strategy, Wallace recommends focusing on the frequency aspect. “The best thing they can do is set up a lifecycle more often than annually—maybe it begins quarterly if they’re beginning from the ground up—to collate those data sources. See what data are coming, see where they are going, and see if they actually need to go where they’re going,” she says.

Who’s in Charge?
As privacy regulations become more stringent with the passage of legislation such as the California Consumer Privacy Act (CCPA), set to go into effect in January 2020, the technology solutions necessary to maintain compliance have become more sophisticated. Therefore, in this shifting landscape, many are left wondering, who’s ensuring privacy compliance within health care organizations?

“What we find in most organizations is that there’s a cross-functional council that addresses things like GDPR [the European Union’s General Data Protection Regulation] and CCPA and privacy and security overall. That council typically includes the chief technology officer, the chief information security officer, chief privacy officer, the head of governance risk and compliance, and other business leads,” Bergman says. “That council in the companies that, I think, are doing this well have different work streams to tackle different aspects of compliance as it relates to all of these different regulations. We see those groups working together with slightly different views that create a holistic solution.”

Bergman points to the adage so often heard in health care compliance: You need everyone at the table to get it right.

Wallace recalls a conversation with a top-level leader for a large health system in Michigan who said, “I don’t believe this is a security problem. I believe all of this falls under compliance, and we’re only concerned if it impacts patient safety.”

“That’s a viewpoint I find to be a bit frightening,” Wallace says.

Bowen says data privacy is the privacy official’s job, but “a lot of times you see the privacy official cordoned off from the security function, away from the data function, and that really needs to be tightened up. In my organization, my title is chief privacy and security officer. You need both.”

Perhaps the biggest shift on the security front has been in the placement of accountability, both on corporate entities and individual officers. “We’re at that inflection point now where you can no longer solve privacy with policies and procedures—and some may argue you never could, but it was good enough. We’re now at the stage where companies are accountable for the data they have,” Bergman says.

“Putting the job on someone who doesn’t have the authority or accountability is the wrong approach,” Bowen says. In other words, passing the buck down the chain of command is a tactic that won’t hold up under the wave of new state regulations coming in the wake of CCPA.

Accessing Data Sources
In the Integris study, more than one-half of the respondents reported needing to access 50 or more data sources “to get a defensible picture of where their sensitive data resides.”

Wallace carries that thought a bit further, suggesting that many organizations are unaware of how many data sources they are accessing. “Even the smaller organizations, when you get down to actually looking at the nuts and bolts of their security program, they may say they interface with only five or six” when in reality that number is much higher.

Bowen says 50 data sources is a conservative estimate. “Your typical covered entity, we have found, uses around 900 applications,” he notes.

Such findings raise the possibility that the survey respondents may not have a clear picture of how many data sources they should be evaluating.

Beyond that, there’s the problem of “surprise data” flowing into organizations from unsuspected sources. Bergman recounts a particularly distressing discovery after an Integris client acquired a smaller company. A routine scan of the organization’s data cluster uncovered adult content and data related to individuals’ sexual orientation and behavioral indicators.

“The CTO’s [chief technology officer] jaw was hanging down,” Bergman recalls. “He said, ‘We’re a B2B [business-to-business] company. There’s no reason we should have that kind of data. Where did it come from?’ It ended up it came from the acquisition. The company had very dirty data.”

Wallace illustrates the inherent complexity of endlessly connected data sources. “That’s a cog with 50 different spoke sets. My spoke may be perfectly secure, but what about the other 14 it’s interacting with?” she says.

Surprise data may be flowing into an organization’s data lake from any number of places. Besides the acquisition scenario, Bergman says other culprits include “data sharing agreements that aren’t properly connected to the data that are being transferred [and] people purchasing data sets and dumping those inside the organization [in a place] that everybody has access to.”

In many cases, cutting down on the flow of surprise data is a matter of integrating inventory into the APIs that connect data sources. “It’s not that difficult, but a lot of folks just don’t do it,” Bowen says.

What’s Sensitive?
To confuse matters, the definition of what constitutes sensitive data is a moving target. New regulations such as GDPR and CCPA are broadening the definition of sensitive data. Determining what that is requires a little deduction.

“For instance, one of our customers has travel profiles stored within their system,” Bergman says. “When someone says, ‘My preference is to have kosher food or halal food,’ well, that indicates religion. So we tag that as religious information.”

Similarly, human resources systems that track an individual’s vacation days can reveal religious affiliation.
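The kind of deduction Bergman describes can be pictured as a small rule table that maps innocuous-looking values to the sensitive category they imply. The sketch below is only an illustration: the field name, the rule table, and the `tag_record` helper are invented for this example and do not describe how Integris or any other product works.

```python
# Hypothetical rules: field name -> {value -> inferred sensitive category}.
INFERENCE_RULES = {
    "meal_preference": {"kosher": "religion", "halal": "religion"},
}

def tag_record(record: dict[str, str]) -> dict[str, str]:
    """Return the fields of a record that imply a sensitive category."""
    tags = {}
    for field, value in record.items():
        category = INFERENCE_RULES.get(field, {}).get(value.strip().lower())
        if category:
            tags[field] = category
    return tags

# Hypothetical travel profile.
profile = {"traveler": "J. Doe", "meal_preference": "Kosher"}
print(tag_record(profile))
```

A profile with a meal preference of “vegetarian” would come back untagged under these rules, which is the point: the sensitivity lies in what a value implies, not in the field itself.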

According to CCPA and similar laws, “household data” is considered sensitive. However, the definition of household data has yet to be determined. “We won’t know until case law bears it out,” Bergman says.

For now, experts such as Wallace are advising clients to “look at CCPA whether or not they do business with California because it’s the standard a lot of states are looking to,” adding that none of the coming regulations “should be any less onerous than California.”

Bowen has been monitoring the EU law that has been in effect for over a year. “If you’ve prepared for GDPR, you will likely be ahead of the curve when it comes to complying with CCPA. Now, even though CCPA doesn’t outright say, ‘Hey, go create your data inventory,’ it’s definitely part of it,” he says.

The Five Common Data Source Types
Only 17% of survey respondents were able to access sensitive data across the five common data source types: structured data, unstructured data, semistructured data, cloud-based applications, and data in motion. Accessing data, particularly unstructured data, is a challenge for many organizations because the technology is relatively new.

“More than six years ago, that technology didn’t exist,” Bergman says.

“Unstructured [data] is so much more difficult to track and understand—the study shows that,” Bowen says. “If you can take that unstructured data, the notes in the patient record, extract it, and automatically put some artificial intelligence around it to show what is PHI and what is not, then you give people the tools that will actually help them track it.”

Yet, as sophisticated as the solutions to manage privacy are becoming, the methods hackers are successfully using to penetrate organizations are still quite rudimentary. Bowen notes that most of the hacking incidents this year have been access and disclosure incidents related to e-mail.

“We see a lot of PHI being sent in e-mail, which is ridiculous. We see developers still pulling down untrusted data sources or code depositories from GitHub that then launch some kind of attack on the infrastructure. We still see the reaction to phishing, which is prolific and brings down the biggest of the titans,” he says.

Wallace offers one small consolation: The organizations that are slower to adopt appropriate protections are also slower to adopt features such as chatbots that collect the sort of unstructured data that can quickly become a liability.

Recommendations
How can health care organizations get a better grip on their stash of data? “The first step is just know what you have. The old days of saying ‘I didn’t know, therefore I’m not responsible’ don’t apply anymore,” Bergman says.

“Compliance is not security,” says Wallace, who calls for a more symbiotic relationship between the two concepts.

Bowen takes the privacy conversation back to square one: “More than anything, remember it’s the patient we’re thinking about. We’re not thinking about job security or quarterly goals so much. We’re talking about how to protect the patient.”

— Sarah Elkins is a freelance writer based in West Virginia.