Deidentification Done Right
By Sandra Nunn, MA, RHIA, CHP
For The Record
Vol. 30 No. 6 P. 22
HIM professionals must balance the contradictory goals of using and sharing PHI while protecting privacy, a task being complicated by the emergence of artificial intelligence.
Health care began as a cottage industry, often to provide care for the poor. Many hospitals were arms of religious institutions wanting to provide care for the sick and dying. Other entities were constructed by employers to care for employees in certain professions, eg, the railroad hospitals.
Those days are gone. Today, health care organizations are often parts of huge systems or cogs in consortiums. Other entities such as pharmacies are part of nationwide chains.
All of these organizations participate in the sharing and even the selling of patient data under the guise of deidentifying the contents. Both the health care business and patients have been happily complacent, assuming all was well because the privacy and security walls were firmly in place. However, cracks are now showing in the public's willingness to blindly share its information and to trust current safeguards such as HIPAA and the Common Rule.
The recent Congressional testimony of Facebook CEO Mark Zuckerberg may have opened the public's eyes to the risks involved in exposing personal data to the internet. In a New York Times article spotlighting David "Doc" Searls, creator of ProjectVRM, a program at Harvard University's Berkman Klein Center for Internet & Society, author Nellie Bowles noted, "After years of largely disregarding their warnings about exactly what companies like Facebook were doing—that is, collecting enormous amounts of information on its users and making it available to third parties with little to no oversight—the general public suddenly seemed to care what they were saying."
As threats to the privacy of patient information rise, HIM professionals who have made it their career to focus on protecting patient data should be hopeful about their long-term prospects. At a recent meeting sponsored by Albuquerque Business First, local firms were advised on how to meet the upcoming General Data Protection Regulation (GDPR) enacted by the European Union. Aspects of the GDPR affect US organizations that conduct any kind of international business, a group that includes AHIMA.
The regulations reflect how seriously Europeans take data privacy. Now, it appears US citizens may be interested in more stringent regulations as well.
"It is none too soon for patients to become more aware of the uses and disclosures of their patient data," says Michelle Bean, PhD, MSHA, HIM director at the CHRISTUS St. Vincent Regional Medical Center in Santa Fe, New Mexico. Bean, who completed her PhD studies with a focus in privacy and security, notes that patients have "no idea how their information may be employed when they sign consent for treatment forms allowing for the exchange of patient data for treatment, payment, or health care operations."
She acknowledges that St. Vincent's, like other hospitals in New Mexico, shares deidentified patient information with external organizations for purposes such as benchmarking and data analysis.
Deidentification Provisions and Methods
In the webinar "Data Use — Methods for Deidentification of PHI Under HIPAA," Shefali Mookencherry, MPH, MSMIS, RHIA, CHPS, HCISPP, the service line leader for privacy, security, and disaster recovery at Impact Advisors, explains the general process by which deidentified information is created. In addition, she provides a method to determine an acceptable level of patient identification risk involving the likelihood of reidentification in the data released for research, benchmarking, and commercial employment.
Mookencherry details the 18 HIPAA-protected individual identifiers that must be removed to deidentify patient data, emphasizing #18, which the law describes as "any other unique identifying number, characteristic, or code." She points out the importance of ZIP codes, which are often used in efforts to reidentify patients. Under the safe harbor provision, only the first three digits of a ZIP code may be retained, and HIPAA guidance identifies 17 restricted three-digit ZIP code areas with 20,000 or fewer inhabitants. Mookencherry says these codes must be shielded by replacing the three leading digits with zeroes (000), which theoretically prevents criminal-minded data analysts from targeting patients and/or their ailments within these small communities.
Mookencherry lists the following as other examples of unique identifying numbers, codes, or characteristics:
• clinical trial record numbers;
• barcodes embedded by EMR and ePrescribing systems into patient records and their medications; and
• patient occupations.
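The ZIP code rule described above can be sketched in a few lines of Python. This is a minimal illustration, not a compliance tool: the function name and the contents of RESTRICTED_ZIP3 are placeholders chosen for the example, and an organization would need to load the actual restricted three-digit areas from current HHS guidance and Census data.

```python
# Placeholder values for illustration only. This is NOT the authoritative
# list of restricted three-digit ZIP areas; load that from current guidance.
RESTRICTED_ZIP3 = {"036", "059", "102"}


def generalize_zip(zip_code: str, restricted=RESTRICTED_ZIP3) -> str:
    """Reduce a ZIP or ZIP+4 code to its first three digits.

    If the three-digit area is on the restricted (low-population) list,
    return "000" instead of the real prefix.
    """
    digits = "".join(ch for ch in zip_code if ch.isdigit())
    if len(digits) < 5:
        raise ValueError(f"not a valid ZIP code: {zip_code!r}")
    prefix = digits[:3]
    return "000" if prefix in restricted else prefix


if __name__ == "__main__":
    for z in ("87505", "87505-1234", "03601"):
        print(z, "->", generalize_zip(z))
```

Keeping the restricted list as a parameter rather than hard-coding it makes it easier to update when population data change.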
Deidentification creates a dilemma for HIM professionals, who, according to Mookencherry, must balance "the contradictory goals of using and sharing PHI [protected health information] while protecting privacy." The answer is multileveled, she says, noting that there is no "single technique, but a collection of approaches, algorithms, and tools that can be applied to various kinds of data with differing levels of effectiveness."
Typically, entities subject to HIPAA regulations employ one of two strategies to deidentify patient data prior to sharing them. If they can afford to hire what Health and Human Services describes as "a person with appropriate knowledge of and experience with generally accepted statistical and scientific principles and methods for rendering information not individually identifiable," hospitals, physician offices, health insurance systems, and pharmacy chains can opt for the expert determination method.
However, this method, which requires subject matter experts, can be pricey. HIM professionals participating in such a project must maintain careful documentation of the vetting of the experts and the subsequent application of their knowledge to any patient data released for reuse.
In conjunction with HIM expertise in the legal requirements of HIPAA and the Common Rule, an expert can assess variables, including database size, data sensitivity, the potential use of the data (for example, AIDS research), the requesting entity's track record (has it been responsible for prior data breaches or reidentified data?), and the amount of work involved with deidentifying the data.
The concept of replicability plays a key role in the process. For example, if the data of a patient with an unusual diagnosis are shared repeatedly over a period of years, it is more likely that patient can be reidentified. The likelihood of patient data being shared with a number of other recipients also must be considered. For example, radiology images merged in systems such as IBM's Watson carry higher odds of patient reidentification.
Another factor is distinguishability. In her webinar, Mookencherry describes a combination of three data elements—date of birth, gender, and ZIP code—as being able to render a unique ID for more than 50% of the US population. Therefore, caution must be exercised when releasing patient data containing these and other similar elements.
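One way to gauge distinguishability before a release is to count how many records are the only ones with a given combination of elements. The sketch below assumes records are simple dictionaries; the function name, field names, and sample values are invented for illustration and are not drawn from the webinar.

```python
from collections import Counter


def unique_fraction(records, quasi_identifiers):
    """Return the share of records whose combination of the given
    fields appears exactly once in the data set."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    if not keys:
        return 0.0
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(keys)


# Made-up sample data, for illustration only.
records = [
    {"dob": "1970-01-01", "gender": "F", "zip3": "875"},
    {"dob": "1970-01-01", "gender": "F", "zip3": "875"},
    {"dob": "1982-05-17", "gender": "M", "zip3": "875"},
    {"dob": "1990-11-30", "gender": "F", "zip3": "871"},
]

if __name__ == "__main__":
    print(unique_fraction(records, ("dob", "gender", "zip3")))
    print(unique_fraction(records, ("gender",)))
```

A high fraction suggests the chosen elements should be generalized or withheld. A low fraction on one sample does not by itself show that a release is safe.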
Working with the organization's HIPAA team, a subject matter expert can provide a risk assessment and adjust what data can safely be released to minimize risk but retain the data utility needed by the recipient.
The safe harbor method is the most common strategy used to prevent patient reidentification. It requires the removal of all 18 identifiers, with the understanding that no residual information that can identify an individual, such as employer names and listings of household members, remains in the data.
Keeping in mind that the HIPAA rule is based on structured data—ie, data that are computable and usually reside in relational database fields—HIM professionals must consider compromising data that may reside in unstructured text that is more vulnerable to search and recognition. Therefore, HIM staff schooled in privacy and security principles must take into account how much health information can or cannot be reidentified by potential recipients. If the risk is deemed too great—it should be no more than "very small"—steps must be taken to mitigate the potential threat.
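To illustrate why unstructured text needs separate attention, the sketch below masks a few identifier-like patterns in free text with regular expressions. The patterns and labels are assumptions made for this example. They cover only a handful of US-style formats and will miss names, addresses, and many other identifiers, so this should be read as a starting point rather than a deidentification method.

```python
import re

# Hypothetical patterns for illustration; real clinical notes contain
# many more identifier formats than these three.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}


def scrub(text: str) -> str:
    """Replace each matched pattern with a bracketed label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    note = "Seen 03/14/2018, call 505-555-0100."
    print(scrub(note))
```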
The arrival of artificial intelligence (AI) has some industry experts questioning HIPAA's relevance. "HIPAA needs to be updated in view of technological changes, particularly AI. HIPAA has not had a serious update since it began in 1996," Mookencherry says. She proposes new roles for HIM professionals such as AI developers, who would consider how AI might be used as a tool to combat privacy and security risks. Mookencherry says HIM professionals can adapt to AI's ability to think and change on its own and stresses that HIM consultants can provide the ethical, legal, and patient safety framework needed to govern future AI steps into health care.
Mookencherry, who believes HIPAA can be applied to AI, says business associate or data use agreements may be adapted to future privacy and security challenges. She notes that regulatory agencies such as the National Institute of Standards and Technology and the Office for Civil Rights are beginning to form taskforces and consortiums regarding AI. The fact that "there will continue to be a lot of data sharing" means removal of the original 18 identifiers may be inadequate and certainly will require other removals or modifications, Mookencherry says.
A former distance learning program coordinator for AHIMA, Mookencherry says, "HIM needs to take it to the next step" in terms of safeguarding patient information. She believes the future is bright for HIM students interested in the privacy and security domain, noting that "big-time growth is available."
HIPAA does not specify which methods or combination of methods covered entities may employ to deidentify PHI. Historical methods have included abbreviating some data elements, making it unlikely they can be associated with the same data element in another database. Another method is to use approximations for identifiers such as patient age, which may still yield data satisfactory to the recipients' requirements.
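The age approximation mentioned above might look like the following sketch. The function name and the five-year default band are assumptions for illustration. Banding ages under 90 is an optional extra safeguard, while grouping ages over 89 into a single "90 or older" category reflects the safe harbor provision.

```python
def generalize_age(age: int, band_width: int = 5) -> str:
    """Map an exact age to a band such as "40-44".

    Ages of 90 and above are collapsed into a single "90+" category,
    and no band is allowed to extend past 89.
    """
    if age < 0:
        raise ValueError("age cannot be negative")
    if band_width < 1:
        raise ValueError("band_width must be at least 1")
    if age >= 90:
        return "90+"
    low = (age // band_width) * band_width
    high = min(low + band_width - 1, 89)
    return f"{low}-{high}"


if __name__ == "__main__":
    for a in (42, 88, 93):
        print(a, "->", generalize_age(a))
```

Wider bands lower the reidentification risk but also reduce the data's usefulness, which is the trade-off recipients and HIM staff need to agree on.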
However, HIPAA experts (HIM professionals and legal staff), subject matter experts, and IT staff members may be unable to handle the challenge of deidentifying patient information in the face of AI. As the technology advances, HIM, the legal profession, and information technologists will need to reeducate and collaborate to continue to carry out their mission to protect patient data.
In the UCLA Law Review, Professor Paul Ohm, JD, coined the term "robust anonymization assumption" to describe the widespread belief in the power of anonymization. HIPAA's primary purpose is to balance individual privacy with the necessary flow of health care data.
HIPAA, which uses the term deidentification rather than anonymization, defines deidentified information as follows: "Health information that does not identify an individual and with respect to which there is no reasonable basis to believe that the information can be used to identify an individual is not individually identifiable health information."
Ohm points out a problem with the safe harbor method's enumeration of a static list of identifiers on the assumption that "other information in a health record poses no basis for reidentification," noting that the list has not been updated since 2000. Safe harbor's coverage of biometric identifiers does not expressly include genetic data. Ohm points out that "at the time of the Privacy Rule's promulgation, high-throughput genetic sequencing did not yet exist." In the intervening years, such data have become increasingly available and are likely to become increasingly identifiable as reference databases of genetic information are created.
In the article "Computer Security, Privacy, and DNA Sequencing: Compromising Computers With Synthesized DNA, Privacy Leaks, and More," authors Peter Ney, Karl Koscher, Lee Organik, Luis Ceze, and Tadayoshi Kohno address these concerns. "We believe it is prudent to understand current security challenges in the DNA sequencing pipeline before mass adoption," they wrote.
In an evaluation of the "general security hygiene of common DNA processing programs," the authors found "concrete evidence of poor security practices used throughout the field." They propose development of a "broad framework and guidelines to safeguard security and privacy in DNA synthesis, sequencing, and processing."
This is a challenge that must be met by those in HIM specializing in privacy and security.
AI, ANI, AGI
The insufficiency of HIPAA, the Common Rule, and other laws designed to protect privacy becomes evident with the study of AI, ANI (artificial narrow intelligence), and AGI (artificial general intelligence).
What we are experiencing today in terms of AI is actually ANI, or a narrow use of AI to extend human capabilities through techniques such as natural language processing, which is used to capture a physician's voice and archive it into an EHR. However, the goals for AI are much more aggressive and pose serious threats to privacy and security. For example, Elon Musk recently announced that he is investing heavily in a new company called Neuralink with the goal of one day creating a direct cortical interface between computers and the human brain. This signals a movement from using AI as a human extender to AGI, or sentient machines.
Amir Husain, author of The Sentient Machine: The Coming Age of Artificial Intelligence, suggests that AI is the only tool capable of handling health care's growing cybersecurity concerns. "Artificial intelligence and natural language processing are coming closer to the reasoning that occurs in the mind of a human security researcher. But because machine learning is so much faster at processing information, it is able to recognize complex patterns across massive quantities of data almost instantly," he wrote.
What does this mean in terms of the competencies privacy and security professionals will need to succeed going forward? Certainly, experts in HIPAA regulations and the provisions of the Common Rule will not be able to rest on their laurels.
In that vein, the University of Washington in Seattle created the Security and Privacy Research Lab, a part of the Paul G. Allen School of Computer Science and Engineering. In this setting, graduate students explore topics such as random access to large-scale DNA data storage. Associated students in the University of Washington's iSchool, which includes informatics, and the HIM department should be well prepared to understand what the data storage options of the future are likely to be.
As health professionals and machines continue to partner at increasing rates, HIM competencies must reflect that development. In fact, AI's effects on privacy and security are a whole career domain unto themselves.
— Sandra Nunn, MA, RHIA, CHP, is a contributing editor at For The Record and principal of KAMC Consulting in Albuquerque, New Mexico.