
April 2019

Mitigating Risks Associated With Research Data
By Elizabeth S. Goar
For The Record
Vol. 31 No. 4 P. 22

Deidentifying data and protecting patient health information are just two of the hurdles that researchers and health care organizations must overcome.

When the Feinstein Institute was hit with a $3.9 million penalty after a laptop theft exposed data about several thousand study participants, it was one of just a handful of research-related breaches to hit the news. In fact, it was the only one among more than 30 breaches resulting in fines by the Office for Civil Rights between 2008 and 2016, when Feinstein entered its resolution agreement.

But rarity does not mean researchers and health care organizations can be lax when it comes to data integrity—or the ethical management of protected patient information.

“Most people are happy for their data and samples to be used for health research, particularly when it creates public benefits to other patients, society in general, or future generations,” says Stephanie Crabb, cofounder and principal of Immersive LLC. “But people expect that their data are protected with adequate safeguards and that procedures employed are trustworthy.”

Regulating Trust
Crabb says the concept of “trustworthy procedures,” particularly around data handling, is an important one when it comes to researchers and data utilization, in large part because “people really don’t understand the differences in identifiable and deidentified data.

“Most people understand and expect that their data are deidentified in the research use case, but they are unaware of the perils of reidentification in this era of big data,” Crabb says. “The majority of new health data is being created from consumer-adopted digital health solutions like wearables, sensors, monitoring systems, and health apps. These sources add more unique and personal data to the health care information ecosystem. Simply stated, the more data an organization possesses, acquires, and can aggregate, the more likely it is that a person can be reidentified.

“Research, and what is possible, is being redefined by and through data,” Crabb continues. “While human subject, random clinical trials are plentiful, there is research being performed today with data alone—research that never requires human interaction with the data’s owner. What constitutes ‘trustworthy procedures’ in these scenarios is not clearly addressed in current regulation.”

Data used for research purposes are protected under HIPAA, which has specific requirements when it comes to the use of patient data within the context of research. Specifically, covered entities may use or disclose only limited data sets and properly deidentified data for the purposes of research—a complicated process, according to Robyn Stambaugh, MS, RHIA, AHIMA’s director of HIM practice excellence. “You have to make sure that any research access is appropriate and that the patient cannot reasonably be identified,” she says.

Beyond use of deidentified data to protect human subjects and the privacy of their health information, researchers and health care organizations “have to have the patient’s authorization and informed consent to participate or have an approved alteration or a waived authorization by an Institutional Review Board [IRB] or Privacy Board,” Stambaugh says, adding that most hospitals “are very aware of the necessary components and structures they must have in place and what they have to abide by” when it comes to data access and compliance.

Sean Sullivan, JD, a health care attorney with Alston & Bird, notes that to gain IRB or Privacy Board approval, researchers must demonstrate there is minimal risk of breach, show that the research cannot reasonably be conducted without the waiver, and show that it cannot reasonably be conducted without the specific data requested.

This requires researchers to “develop the protocol and describe how each [criterion] will be met and show the IRB that they will have appropriate administrative safeguards in place to protect PHI [protected health information],” he says. “They also should conduct security risk assessments, which means periodically reviewing their safeguards and policies to ensure they are meeting the requirements of HIPAA and identify any vulnerabilities based on risk.”

Most research is also governed by the Common Rule (45 CFR Part 46, Subpart A) and/or the FDA’s human subject protection regulations (21 CFR Parts 50 and 56), which include protections to help ensure the privacy of subjects and the confidentiality of information. HIPAA builds upon these regulations while creating equal standards of privacy protection for research that is—and is not—governed by the rule.

For example, HIPAA permits data to be used without consent if they are used solely to prepare a research protocol or for similar preparatory purposes such as study design and feasibility assessment. It also spells out how researchers may use PHI from deceased patients, how data may be used with individual authorizations, and how data use agreements govern limited data set disclosure.

“There is always patient authorization, which often [will] be contained within a packet of other consents,” Sullivan says. “But where you see patient authorizations most often is when the treatment they are getting is specific to research. They’re generally not done after the fact.”

In cases where there is neither consent nor IRB approval, some researchers “decide to work with limited data sets that are deidentified,” Sullivan says. “There should still be safeguards in place, but there are fewer requirements [because] a lot of the specific direct identifiers—names, addresses, Social Security numbers—are removed. This is sometimes a way around” other requirements.
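The identifier removal Sullivan describes can be pictured with a minimal Python sketch. It is an illustration only, not a compliance tool: the field names and the sample record are hypothetical, and the set of fields removed here is far shorter than the list of direct identifiers the HIPAA Privacy Rule actually specifies.

```python
# Hypothetical field names, chosen for illustration. The Privacy Rule's own
# list of direct identifiers is longer than this set.
DIRECT_IDENTIFIERS = {"name", "street_address", "ssn", "phone", "email"}


def strip_direct_identifiers(record):
    """Return a copy of the record without the fields listed above."""
    return {
        field: value
        for field, value in record.items()
        if field not in DIRECT_IDENTIFIERS
    }


# Made-up sample record; no real patient data.
patient = {
    "name": "Jane Doe",
    "street_address": "1 Main St",
    "ssn": "000-00-0000",
    "zip_code": "33601",
    "admit_date": "2018-11-02",
    "diagnosis_code": "I21.9",
}

limited = strip_direct_identifiers(patient)
print(limited)
```

The fields that remain (a ZIP code, a date, a diagnosis code) can still narrow down who a record belongs to, which is why safeguards and data use agreements remain relevant even after direct identifiers are gone.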

Big Data Raises New Questions
The regulatory environment can only go so far in today’s digital age, where hospitals and health systems are already mining social media and other online sources “under the auspices of ‘improving the patient experience’ to determine what services to offer, how to drive patient loyalty, how to attract new patients, etc,” Crabb says. “When an organization starts to marry these data derived from online sources with data from their EHR and payer claims data, for example, does this start to border on uses of personal data that should be subject to some ethical oversight? Should the standard be even higher for research?”

Marketing executives and researchers alike justify these online data mining methods by citing the public nature of people’s posts and the broadly written terms of agreement that the public readily accepts but rarely reads. “Do these arguments pass ethical scrutiny?” Crabb questions. “Have organizations formally vetted these use cases of online data?”

More data and more powerful analytics, including self-service analytics, allow investigators to “prospect” more freely, to both test and formulate questions. This cycle of query and information collection often occurs without people knowing that research is underway, assuming it is. The line between where exploratory work or preresearch ends and traditional research begins can be blurry.

Therefore, Crabb recommends that the industry revisit consent and develop models that reflect and respond to the realities of today’s data and analytics landscape and the way the research community is consuming and using data, as well as emerging trends in consumer privacy protection and information rights.

“Ethical researchers must strive to explain their data management procedures in ways that participants can understand,” she says. “These are topics that should be addressed in formal guidance and policy.”

Mitigating Risks
Even with so many new questions surrounding the utilization of big data for research, there are ways hospitals and health care organizations can mitigate their risk of running afoul of both regulatory and ethical considerations. As is typically the case when it comes to health care data, the secret sauce is made up of governance, policies and procedures, education, and enforcement.

At the most basic, policies should allow organizations to “be doubly sure that the information is deidentified and that they’re not giving any more information than is absolutely necessary,” says Nicole Miller, PhD, a consultant with Miller and Miller Associates, LLC, which provides comprehensive HIM consulting services in the ambulatory care space. “Researchers need to be cognizant of this as well—there is only so much information they’re entitled to, so they should plan for it.

“Researchers need to not demand information that they can’t have, but it falls on the facilities that have the information to protect it.”

Neil Kudler, MD, chief medical officer of VertitechIT, suggests that the issues confronting organizations and researchers in today’s electronic era are no different from those of the paper era. What is needed is a way to respect the privacy of patient health information, ensure the integrity of security systems, and abide by the basic principles of informed consent.

“Those principles really have less to do with the technologies in place than they do with having the ethical constructs at the outset of an engagement,” Kudler says.

Governance, which Kudler says requires support from the highest levels, is the most important component of data management. Executive leadership should provide overarching objectives and goals that are manifested in the design of the data management program governing how information is used by both clinical and academic research teams.

From there, “You’ve got operations that are explicitly designed as a roadmap to execute on a data management program,” Kudler says. “There would be policies and procedures just like any other entity or organization, but with particular attention not simply on the strategy, but on the roadmap so that strategy is executed with a very clear understanding of the regulatory landscape and what the compliance issues are around data, particularly around release of information and the sharing of data even within an organization.”

When communicated across the organization, these types of well-defined structures will go a long way toward establishing effective maintenance of the privacy and security of data used for research. There are also a host of electronic and administrative systems that can assist in ensuring that privacy is maintained, Kudler says.

It is here that HIM can play a significant role. Miller points out that HIM professionals know the regulations and how to maintain compliance. As such, they should be part of any team responsible for data governance and any applicable policies and procedures.

“So many times, IT thinks that because it’s computerized, they’re in control and HIM gets left out of the loop unless we really fight to be at the table,” Miller says. “It’s important because we know what the regulations say we need to do. … If someone from HIM is not the top person, they need to at least be involved.”

Stambaugh concurs, and goes a step further. HIM’s role has always been that of an essential organizational stakeholder, she says, something that is evidenced by the department’s overall role in acquiring and protecting PHI.

“We have such a unique combination of skills in our knowledge base. We’re champions of protecting health information but also leaders in the mitigation of privacy risk for the organization. In terms of research in particular,” Stambaugh says, “its success is highly dependent on the quality of information and data that they’re looking at. HIM professionals support our organization’s research efforts by ensuring the collection and analysis of data provides trustworthy information while protecting it through the process.”

— Elizabeth S. Goar is a freelance writer based in Tampa, Florida.


According to Stephanie Crabb, cofounder and principal of Immersive LLC, utilization of the full breadth of information available in today’s health care data ecosystem gives rise to ethical questions in several areas.

Exploratory and Preresearch
When researchers are unsure of exactly what they’re looking for, they may instead use data for “preresearch” and/or exploratory purposes. Regardless, they should have a sense of direction and purpose. If they do not, that should most certainly raise ethical concerns.

“Participants should understand the data sets being explored and ultimately studied in terms they can understand,” Crabb says. “Again, we may need to develop an alternative model(s) of consent to accomplish this, to give people the opportunity to opt out at any point in the research process, and to give them better control over their health data.”

Participant Selection
The availability of online data makes it tempting to use them for identifying ideal study participants. The argument is compelling. An ideal match to a study’s target participant profile creates efficiencies and elevates integrity.

“But just because we can mine these rich sources to support participant selection, should we? After all, there is nothing anonymous about lifting someone’s cancer story from Facebook to target them for a clinical trial. We can, but should we?” Crabb asks. “This is but one of the ethical considerations around participant selection, big data, and social media. Another relates to bias—or representativeness—and reliability. Though large numbers of people contribute content online, we cannot assume that the online community is a reflection of the population at large or that self-reported data online are reliable.

“An organization’s ethics around participant selection should be clearly articulated.”

With big data easy to obtain, researchers must be careful to collect only the information that is necessary or required for their study. For example, collecting an entire profile of social media posts to investigate one topic may inadvertently reveal other wholly unrelated pieces of information such as mental health issues. This data “overreach” could violate the ethical expectation that research is to have the least possible impact on participants.

In the case of social media, timestamp and geolocation information is routinely embedded in posts and images, amplifying the risk of reidentification, depending on how the data are being used.

“Investigators and sponsoring organizations should exercise diligence, collecting only pertinent data from the get-go or be equipped to remove extraneous data before doing anything with a target dataset,” Crabb explains. “Again, ethics policies and procedures should be clear on this topic.”
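Collecting only pertinent data can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the post structure, the field names, and the choice of which fields count as pertinent are all hypothetical and do not reflect any particular platform's data format.

```python
# Hypothetical: this imagined study needs only the post ID and its text.
PERTINENT_FIELDS = ("post_id", "text")


def minimize_post(post, keep=PERTINENT_FIELDS):
    """Keep only the fields the study needs; everything else is dropped,
    including timestamps, geolocation, and account handles."""
    return {field: post[field] for field in keep if field in post}


# Made-up post; the structure is an assumption for illustration.
raw_post = {
    "post_id": 17,
    "text": "Started a new treatment today.",
    "created_at": "2019-01-05T08:30:00Z",
    "geo": {"lat": 27.95, "lon": -82.46},
    "author_handle": "@example",
}

clean_post = minimize_post(raw_post)
print(clean_post)
```

Using an allow list of fields to keep, rather than a block list of fields to remove, means that any new or unexpected field is excluded by default.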

Informed Consent
A cornerstone of research ethics, informed consent is rarely bypassed unless it would fundamentally compromise the study. Besides exploratory and preresearch, there is another type of research—observational studies—that makes use of “routinely collected data.” It, too, raises new considerations around consent.

“Think about studies around sudden cardiac arrest, stroke, [or] trauma where traditional informed consent cannot be obtained, where data are readily employed, and where big data may have value. These types of studies have their place on the research continuum and may very well be highly contributory to personalized medicine and effective population health,” Crabb says.

The digital format of data makes them durable and easy to retain. “With advances in analytics, datasets can be used, compared, and combined in ways that may differ greatly from their original purpose,” Crabb says. “Data’s longevity, lineage, and unanticipated uses can compromise researchers’ ability to guarantee participants’ privacy and anonymity.

“Careful attention must be given to what data are collected and shared, where data come from, and whether those data are likely to be combined with other data in ways that compromise privacy and anonymity,” she continues. “Add genomics, big data banks, and advanced analytics to the mix and the expectations of privacy and anonymity are significantly impacted.”
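One rough way to see how combining data sets erodes anonymity is to count how many records share the same combination of indirectly identifying attributes. The Python sketch below is not a method described by Crabb; it is a simplified group-size check in the spirit of k-anonymity, and the attribute names and records are made up for illustration.

```python
from collections import Counter

# Hypothetical attributes that do not name anyone on their own.
BASE_KEYS = ("zip_code", "birth_year", "sex")


def smallest_group_size(records, keys):
    """Size of the smallest group of records sharing the same values for
    the given keys. A result of 1 means at least one record is unique."""
    groups = Counter(tuple(record[key] for key in keys) for record in records)
    return min(groups.values(), default=0)


# Made-up records; "device" stands in for data added from another source.
records = [
    {"zip_code": "33601", "birth_year": 1960, "sex": "F", "device": "watch"},
    {"zip_code": "33601", "birth_year": 1960, "sex": "F", "device": "ring"},
    {"zip_code": "33602", "birth_year": 1985, "sex": "M", "device": "watch"},
    {"zip_code": "33602", "birth_year": 1985, "sex": "M", "device": "watch"},
]

before = smallest_group_size(records, BASE_KEYS)
after = smallest_group_size(records, BASE_KEYS + ("device",))
print(before, after)  # prints "2 1"
```

With the base attributes alone, every record has at least one look-alike. Adding a single extra attribute from another source makes two of the four records unique, which is the kind of effect to check for before data sets are combined.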

Advanced Analytics and Algorithms
Algorithms are highly complex, making it difficult for researchers to explain to participants how their methods work and what data are being used. While that’s a significant hurdle, the real challenge, Crabb says, is that the algorithms themselves may be intellectual property.

“The algorithms may be what enable one clinician or a provider organization to deliver a care innovation or quality of care that their peers cannot. They become the very stuff of competitive advantage. And in today’s value-based, pay-for-performance environment, this is a big deal. Herein lies a new potential conflict in the research mission in the era of data analytics,” she says.