Nebulous Notes Hinder Cancer Research
By Susan Chapman
For The Record
Vol. 31 No. 5 P. 14
Ongoing efforts aim to use data from medical records to gain insight into cancer care. However, researchers can find this information difficult to access.
It’s an exciting time in cancer research, much of it sparked by heretofore unheard-of access to valuable data. However, high expectations must be tempered by the knowledge that impressive hurdles remain.
Diving Into Medical Records
Genetic data can be linked to medical records at a high level, for example by categorizing particular groups of patients such as those with estrogen-receptor positivity or the BRCA gene.
“However, as knowledge has expanded, the large volume of information captured in sequence data about the patients and cancers cannot be easily stored in the EHR,” notes Andrew S. Kanter, MD, MPH, chief medical officer at Intelligent Medical Objects. “There is a big gap between what is being captured in the medical record initially; the important phenotypes can be captured in the record using clinical interface terminology, but the actual sequence data themselves cannot.
“The gene sequencing data that are currently being stored on a Genomic Archiving and Communication System (GACS) are incredibly granular and are a huge bucket of data,” Kanter continues. “They have to be identified, interpreted, and transferred to the EHR to be useful to clinicians.”
While there have been attempts to link the GACS server and the EHR through machine learning, the challenge remains to transfer high-level information that clinicians are aware of into normal language that can be captured by the EHR. “This process is extremely difficult for the GACS server,” Kanter says.
“The process of connecting genetic data to medical records really depends on the data itself,” says Janet Reynolds, BA, CTR, president of the National Cancer Registrars Association. “When you remove a tumor and find out that it has a mutation that is treatable, you have that information. However, you may have that in cancer registry data but not in medical records because there is not always a call for it to be in the record. If it’s all about the tumor testing, that is part of the medical record and pathology. And there are not codes for all those mutations because additional actionable mutations are identified every year.”
Reynolds notes that in addition to the ongoing evolution of cancer research, patient privacy is a strong consideration because genetic data can affect patient insurability. “Patient-specific testing is private, and tumor-specific testing may or may not appear in the medical record if the identified mutation is not supported by discretely coded fields to facilitate research, rather than buried in narrative text.”
Jennifer Ruhl, MSHCA, RHIT, CCS, CTR (NCI SEER), a certified tumor registrar in the quality control section of the Surveillance, Epidemiology, and End Results (SEER) program, believes it’s possible to link genetic information with the EHR in theory, but that the technology is not yet able to accomplish that goal.
“Other departments are able to link to the EHR in terms of claims data and labs. Artificial intelligence medicine is already connecting medical records and pathology reports,” Ruhl says. “SEER … uses the record layout that all standard setters use so that everything comes back in the same format. This record layout is maintained by the North American Association of Central Cancer Registries (NAACCR). Currently, the format is a fixed-width flat file, but there is a proposal to implement XML format, which is the more common format used today. Once that happens, it will be much easier to create more linkages. Some of the larger hospitals may already have these connections, but it’s kind of a mixed bag. Overall, the technology and processes are not there yet.”
Current and Future Developments in Data Gathering
Reynolds says the requirements for what data must be collected are in a constant state of flux. “Once something becomes interesting to researchers and epidemiologists, we begin to collect the information,” she explains. “In 2018, we saw the emergence of a lot of new fields that have to do with big data and impact the information we collect. Once our submissions are made, other information will be able to be identified. We still collect information on tobacco usage and [BMI], which are patient-actionable factors that are known to prevent cancer. Those who do big data research use that information to find actionable items to help prevent cancers and combine that data with the stages of cancer people have.”
Other methods of data collection include natural language processing, which pulls information from charts and deciphers what data are collected. “But even pulling data from pathology and radiology reports is going to take some time,” Reynolds says. “Cancer registrars and researchers will need to curate the data to determine whether they are usable. It’s challenging because of the nuancing that is required to fully understand the information.”
Kanter believes medical professionals are becoming more adept at capturing translations from the genomic sequence information. “There is tremendous progress at the point of care in both incorporating and engaging the clinicians to use technology in the first place,” he says. “Care providers are much more likely to interact with the health information system now and be clear on what they are capturing—for instance, high-level phenotypes.
“Anatomic pathology results and other information are being used to stage cancer—these are value sets that can be used to look for data to help the computer stage a person as being at risk or not. This is high-level categorization of treatment or prognosis, which has been taking place for about 100 years. Those decisions can be done better with existing data in the record, ultimately appearing as interface terminology in the medical record. There are groupings and the rolling up of data in the medical record, which are new. Through those processes, there have been improvements in decision support and workflows. That said, we haven’t gotten into genomics with those processes yet.”
Open-source EHR software has been especially beneficial to facilities that lack funding, such as individual clinics. Other open-source technologies have also made their mark. “For instance, SEER offers SEER*RSA, which provides information about cancer staging,” Ruhl says. “We also offer other features, such as the newly implemented site-specific data items (SSDIs). That information is available freely to vendors so that they can pull information into their software registries. In particular, our open-source information provides a lot of information on coding. NAACCR owns the SSDIs, but SEER is the repository, and the NAACCR SSDI work group offers guidance for coding.”
Other openly available resources include the CAnswer Forum, where registrars can post questions. (Individuals other than cancer registrars must obtain permission to access the forum.) Registry Plus and the Cancer Registry also have a great deal of information available for cancer registrars. Registrars can also contact physicians if they need assistance.
“With all of these sources, patient privacy is maintained, as there are no patient identifiers once the data are submitted to SEER,” Ruhl says.
SEER*Stat, which requires an account, stores all the data. The most recent data, those of 2016, became available last month. “In that database, you can access the data collected by registrars. Again, you do not get patient identifiers,” Ruhl says. “Cancer registrars in the hospital or central registries have access to specific patient identifiers, such as names, addresses, etc, but that information is not transmitted to SEER. So when we access the information in SEER*Stat, we are not able to determine the specific individuals who were diagnosed with cancer.”
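The separation Ruhl describes, in which identifiers stay with the hospital or central registry and only the remaining fields are submitted, can be pictured with a minimal Python sketch. The field names and the `deidentify` helper are hypothetical and chosen for illustration; they are not taken from any actual registry layout, and real de-identification involves far more than dropping a few fields.

```python
# Hypothetical identifier fields; actual registry layouts define their own data items.
IDENTIFIER_FIELDS = {"name", "address", "phone"}


def deidentify(record):
    """Return a copy of the record without direct patient identifiers."""
    return {key: value for key, value in record.items()
            if key not in IDENTIFIER_FIELDS}


# Illustrative record as a hospital registrar might hold it.
hospital_record = {
    "name": "Jane Doe",
    "address": "123 Main St",
    "primary_site": "breast",
    "stage": "IV",
    "year_of_diagnosis": 2016,
}

# Only the de-identified copy would leave the registry; the original is unchanged.
submitted = deidentify(hospital_record)
print(submitted)
```

The researcher working with the submitted copy can still count cases by site, stage, and year, but has nothing that points back to a specific individual.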
Researchers searching for specific case information can make a custom data request. “They can actually do additional research but they never get the patient’s name,” Ruhl says. “Additionally, the [Centers for Disease Control and Prevention’s] National Program of Cancer Registries offers a database that allows people to do research on cancer statistics.”
While there are a lot of different statistical programs that allow different presentations of statistics, “a lot of times we can’t collect information because it’s not documented clearly,” Ruhl says. “For instance, a pathologist puts in a ‘mangled’ report. Poor documentation, missing documentation, those are problems that any researcher will encounter.
“You can sometimes have a patient who is diagnosed who then disappears. Some are diagnosed on a death certificate or during an autopsy. Or they are diagnosed and die right away. You may also have a patient who goes through the full treatment course and then survives for a couple of years. You have a hodgepodge of information. This explains a lot of our unknowns in the cancer registry data. You really don’t know what you’re going to encounter. We can get great information from the larger hospitals, but not such useful information from smaller hospitals, which oftentimes do not have the equipment to provide full treatment. Because of this, patients will move to larger hospitals for the treatment they need, and then we are not always able to track them through the medical record.”
Some cancers are not diagnosed until the disease has progressed significantly. For example, Ruhl recalls a case in which a woman noticed a breast lump, did not take action, and arrived several months later at the emergency department with difficulty breathing. “This turned out to be stage IV breast cancer. Even though the woman’s physicians were aggressive in their treatment, she died six months later,” she says.
Other examples include patients who lacked insurance for initial treatment but were later able to receive Medicaid, and well-to-do patients who did not believe in seeing doctors and later presented with stage IV cancer.
Cancer research efforts can struggle because detail is often lost when medical record data are exchanged. “Clinical Document Architecture (CDA), the format in which hospitals give you your record, tells you what you’re reading and the codes you have to use, but it doesn’t necessarily include all the detailed codes, for example, that you have stage III breast cancer,” Kanter says. “The way most EHRs use CDAs basically dumbs things down by not capturing all that detail. One reason that machine learning and artificial intelligence systems fail is that the data they are being trained on or are triggering is incomplete. It looks like all breast cancers are the same; in reality, they aren’t.”
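A small Python sketch may help show why this loss of detail matters to anyone training a model on exchanged data. The records, field names, and the `export_coarse` function are all hypothetical; they stand in for an exchange that carries only the high-level diagnosis and are not a representation of CDA itself.

```python
# Hypothetical, simplified records; fields and values are illustrative only.
detailed_records = [
    {"diagnosis": "breast cancer", "stage": "I", "er_status": "positive"},
    {"diagnosis": "breast cancer", "stage": "III", "er_status": "negative"},
    {"diagnosis": "breast cancer", "stage": "IV", "er_status": "positive"},
]


def export_coarse(record):
    """Mimic an exchange that keeps only the high-level diagnosis."""
    return {"diagnosis": record["diagnosis"]}


def distinct_groups(records):
    """Count how many distinguishable kinds of record are present."""
    return len({tuple(sorted(record.items())) for record in records})


exchanged = [export_coarse(record) for record in detailed_records]

print(distinct_groups(detailed_records))  # 3: each patient is distinguishable
print(distinct_groups(exchanged))         # 1: every breast cancer looks the same
```

Once the stage and receptor status are gone, no downstream system can recover them, which is the sense in which the data look as if all breast cancers are the same.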
The Fast Healthcare Interoperability Resources (FHIR) standard allows patient data to move between information systems in a compact manner. “You can think of it like the box that the data are being shipped in. It doesn’t say on the box that things in the box are as detailed as they need to be,” Kanter explains. “When you’re talking about gene variants, you can’t determine what they are when you open the box. The way data are currently tagged, we don’t know how to interpret what is in the box. There are various ways in which people are interpreting the gene sequence. A really interesting thing is that the patient has a gene and the cancer cell has a gene. But is there a medical record for the cancer cells? For example, the BRCA gene is in the patient. If the patient gets breast cancer, that gene becomes more important for the patient’s relatives, not the patient.
“Another patient gene that is important has to do with how a patient metabolizes a chemotherapy drug. If I give a high dose to someone who doesn’t metabolize that medication well, I could give them a toxic dose. Within the cancer itself, there may be gene sequences which can help determine the best treatment or imply different prognoses. Keeping track of all these genes can be quite challenging, and this level of information isn’t something that is going to be easily automated in the near future.”
There are efforts underway to standardize how gene sequences are recorded. For example, using the variant nomenclature of the Human Genome Variation Society (HGVS), it’s possible to connect the EHR with sequence information by interpreting the HGVS designation and using other standards such as FHIR and CDS Hooks (an open-source clinical decision support standard).
This development has far-reaching, significant effects. “If you have a gene that is recorded in this HGVS format and you are prescribed a particular drug that is impacted by that gene, you can warn the physician that the drug should be avoided or used in a smaller dose because the person does not have the mechanism to break it down,” Kanter says.
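The kind of check Kanter describes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used by any particular EHR or by the study discussed next: the `DRUG_GENE_WARNINGS` table and the `check_prescription` function are hypothetical, the single entry is included only as an example and is not clinical guidance, and a real service would sit behind a standard such as CDS Hooks and draw on a curated knowledge base.

```python
from typing import Iterable, Optional

# Hypothetical lookup table standing in for a curated pharmacogenomics
# knowledge base. Keys pair an HGVS-style variant string with a drug name.
# The entry is illustrative only and is not clinical guidance.
DRUG_GENE_WARNINGS = {
    ("NM_000110.4:c.1905+1G>A", "fluorouracil"):
        "Variant may impair breakdown of this drug; "
        "consider avoiding it or using a smaller dose.",
}


def check_prescription(patient_variants: Iterable[str], drug: str) -> Optional[str]:
    """Return a warning if any recorded variant affects the prescribed drug."""
    for variant in patient_variants:
        warning = DRUG_GENE_WARNINGS.get((variant, drug.lower()))
        if warning:
            return warning
    return None


# Variants recorded for a hypothetical patient, in HGVS-style notation.
recorded = ["NM_000110.4:c.1905+1G>A"]
print(check_prescription(recorded, "Fluorouracil"))
print(check_prescription(recorded, "tamoxifen"))
```

The lookup only works because the variant is stored as a standardized string; if the same finding were buried in narrative text, there would be nothing for the prescription check to match against.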
A recent study examined these types of interactions. “A Pharmacogenomics Clinical Decision Support Service Based on FHIR and CDS Hooks” looked at pharmacogenomics (PGx), the study of drug-gene interactions, to determine whether PGx could “one day become as much a commodity in EHRs as drug-drug and drug-allergy checking.”
Researchers “found that PGx CDS based on FHIR and CDS Hooks appears to represent a promising means of genomics-EHR integration. More real-world testing along with a set of use-case driven GACS interface requirements will push us closer to the US National Human Genome Research Institute vision of a plug-in PGx app.”
“There are also ways to capture patient outcomes and events in patient-speak,” Kanter says. “The devices that people are using to interact with the health care system, such as smartphones and tablets, allow patients to input information on potentially adverse events, which can then be put back into the EHR to help, or warn the clinician that something unexpected has happened. And this can happen even before the person returns to the clinic.
“Another interesting implication of how including patient-generated information could work is that perhaps there is a rule for the physician to order a test for a patient within a certain period of time. That patient can then interject something into the record which impacts that rule to say, ‘I had it done somewhere else,’ ‘I was sick,’ or ‘I’m having worrisome symptoms.’”
Kanter notes that cancer trials have numerous formal monitoring systems in place that allow patients to interact with technology. “There are innovative organizations that give people a prescription for an app that provides a much more interactive experience,” he says. “They currently are the minority, though.”
Ultimately, cancer researchers’ ability to access important data through the EHR, while progressing and showing great promise, still has a long way to go.
— Susan Chapman is a Los Angeles–based freelance writer.