
Winter 2022

Modernizing the Aggregation, Usability of Cancer Data
By Elizabeth S. Goar
For The Record
Vol. 34 No. 1 P. 12

The good news is that there may be multifaceted solutions available to solve multipronged challenges.

Since 1992, a population-based surveillance system of cancer registries established by the Centers for Disease Control and Prevention’s National Program of Cancer Registries has been collecting and sharing detailed information about cancer cases to identify trends and treatments. However, it has struggled to keep pace with the explosive growth in the volume and scope of critical data—a challenge exacerbated by the limitations of EHRs and other clinical information systems that serve as the source of that information.

While cancer registrars extract a significant amount of information from EHRs, including patient demographics, cancer types, treatments, and follow-ups, “the biggest hurdle we face is the lack of documentation provided in the medical record,” says Suzanne Neve, RHIA, CTR, director of cancer registry with Medical Records Associates.

Compounding the problem is the need for registrars to manually comb through the entire medical record to identify the nuggets of data they need, all of which “goes into the cancer registry data manually by the [cancer registrar],” Neve says.

In fact, documentation and automation are issues shared by both population-based and hospital-based cancer registries—obstacles that impact both clinician usability and the ability to capture longitudinal data, according to Robert Miller, MD, FACP, FASCO, FAMIA, medical director of CancerLinQ, a not-for-profit subsidiary of the American Society of Clinical Oncology (ASCO).

“When someone is first diagnosed, a registry will get information about the patient, about the cancer, the stage of cancer, the type of the tumor cells, and the histology of the cancer. You may get the initial treatment. But usually there are no updates beyond the initial treatment,” he says. “Electronic health records are not particularly well designed for clinical care. They’re designed more for billing and administrative purposes. [In terms of] actually understanding the journey of the patient who has cancer, EHRs don't do a good job.”

Multipronged Challenges
The two primary types of cancer registries are population based and hospital based. Both receive and collect data about cancer patients, but for different purposes. While population-based registries record all cases in a defined geographic population for epidemiologic and public health purposes, hospital-based registries maintain data on a facility’s or health system’s specific patient population with a focus on improving outcomes. They both, however, struggle with the same data issues.

“Part of the challenge is that [data abstracting] is still a largely manual process, so the data that are getting into them is often not real time. It’s delayed by many months and, in some cases, a year or more,” Miller says. “The registries are well aware of this and they’re working on it, but we need more automated data flows between these various electronic systems. Then we'll have more information about what's going on in populations as well as in individual hospitals.”

The challenges confronting cancer registries in eliminating the painstaking manual processes involved in abstracting are multifaceted. One is identifying all the information in the patient’s medical record that should be abstracted to a registry. Along with patient demographics, targeted data could include the histologic type (eg, small cell carcinoma vs adenocarcinoma), behavior code (eg, benign, in situ, or malignant), TNM staging, genomic data, and social determinants of health.

The challenge, according to Celeste Adams, PharmD, solutions engineer for Health Language at Wolters Kluwer, is that these data must be aggregated and normalized from multiple EHRs and other clinical systems.

“Unstructured notes such as pathology and radiology reports, progress notes, and operative notes are difficult to extract data from manually,” she says, adding that phone calls and faxes are still common retrieval methods, resulting in unstandardized data.

“Once data are collected, they are often shared electronically, but we need a more efficient method of curating the data in the first place,” Adams says.

Another challenge is that even before data can be curated, they must make their way into the EHR—and do so in a usable format. One problem is the lack of interoperability that leads to information still being delivered via phone or fax, and/or in a pdf format. The other is that EHRs themselves are not keeping pace with the evolving types of data oncology needs for advanced therapies, including genomic data that drives molecular medicine and immunotherapy.

According to Miller, “EHRs don't do a very good job right now, at least in their native forms, of capturing structured genetic information. What happens, unfortunately, is that because a lot of [genetic] testing is done at third-party reference labs outside of the health system, the way that data comes back into the EHR is typically as a pdf. Or it sometimes comes back as a fax, which, God help us, still exists. It’s generally noncomputable; it can't go into any electronic health record in a way that it can be easily retrieved or queried or sorted. One of the biggest challenges with EHRs is that important oncology data don’t start in the chain as structured.”

Neve says the constantly evolving cancer registry rules and regulations exacerbate the issues surrounding extracting cancer data from EHRs. For example, she points to the overhaul of codes and instructions that took effect with 2018 cases “that left many registries across the country unable to abstract and submit cases to their states for over a year, therefore leaving cancer registries with large backlogs [and] making it extremely difficult for EMR developers to keep up with this demand.

“The biggest hurdles that we face are the yearly changes to the industry standards. This would make it difficult to extrapolate the relevant information in a timely manner.”

Multifaceted Solutions
As is often the case when discussing solutions to data collection and sharing challenges, standards are integral. For example, Adams notes that usable formats for interoperability vary by institution and EHR. Additionally, data reported to registries must be codified and structured in specific formats, and the required coding systems generally do not align with how EHRs capture data.

She points to a registry requirement to use ICD-O-3 for reporting histologic type and behavior codes. While rarely captured in EHRs, ICD-O-3 coding is “critical for registry reporting, interoperability (internal and external), and in designing appropriate treatment plans that consider not only the site of the tumor but the specific histology. At the same time, it is vital to capture the most specific ICD-10-CM code for billing and other administrative purposes,” Adams says.
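To make the structure concrete: an ICD-O-3 morphology code combines a four-digit histology code and a one-digit behavior code separated by a slash, so 8041/3 denotes small cell carcinoma, NOS, with behavior 3 (malignant, primary site). The Python sketch below is illustrative only and is not drawn from any registry software; it simply splits such a code into its two parts, and the function name `parse_morphology` is hypothetical.

```python
# Illustrative sketch only: split an ICD-O-3 morphology code such as "8041/3"
# into its histology and behavior parts. parse_morphology is a hypothetical name.

# ICD-O-3 behavior codes (the single digit after the slash).
BEHAVIOR = {
    "0": "benign",
    "1": "uncertain whether benign or malignant",
    "2": "in situ",
    "3": "malignant, primary site",
    "6": "malignant, metastatic site",
    "9": "malignant, uncertain whether primary or metastatic",
}


def parse_morphology(code: str) -> tuple:
    """Return (four-digit histology code, behavior description)."""
    histology, sep, behavior = code.strip().partition("/")
    # Expect exactly four digits, a slash, then a known behavior digit.
    if not sep or len(histology) != 4 or not histology.isdigit():
        raise ValueError(f"not an ICD-O-3 morphology code: {code!r}")
    if behavior not in BEHAVIOR:
        raise ValueError(f"unknown behavior code: {behavior!r}")
    return histology, BEHAVIOR[behavior]


print(parse_morphology("8041/3"))
```

Calling `parse_morphology("8041/3")` returns `('8041', 'malignant, primary site')`. A real abstracting tool would also validate the histology against the published ICD-O-3 tables, which this sketch does not attempt.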

Also problematic is that information regarding tumor stage and grade often comes from radiology or pathology reports. These “are unstructured and commonly require manual intervention to extract data,” which then “must be transformed [because] while standards do exist, they are not mature,” Adams says.

ASCO is working toward a solution in part through CancerLinQ, which currently works with about 110 cancer centers and oncology practices nationwide to pull data from their EHRs, clean them up, and return everything in a usable format. The service includes access to reports and dashboards that help practices track performance using clinical quality metrics approved by ASCO—some of which are endorsed by the National Quality Forum—as well as compare performance against national benchmarks.

An important aspect of CancerLinQ is working with various standards organizations such as HL7 “to try to define and support well-developed, useful, and clinically relevant data standards,” Miller says. “We are taking a multipronged approach, but there is a gap right now” that CancerLinQ is filling until such time as standards and EHR capabilities are in place to meet the unique needs of oncology and cancer registries.

“We don’t aspire to replace the electronic health record,” he says. “We have a multipronged approach where we … are trying to define what the right standards are, which is some of the work we’ve been doing with HL7 and others. Once it’s clear that EHR customers, the hospitals and health systems, need data delivered in a certain way, it will become incumbent upon the other members of the ecosystem—the laboratories that produce genetic information and the various machines like radiation that connect to the electronic health record—to make greater use of those standards.”

The goal, Miller says, is for data to filter into the EHR using a set of common data standards, “so that everyone is speaking the same language. Then we can come in and do some additional translation from a specialty standpoint. EHRs need to get better at capturing structured data of all types and [in terms of] data that are brought in externally, there needs to be as much attention to that as to what happens during care that's rendered in the office.”

Customization Strategies
Beyond standards, some tweaks to EHR design could make a world of difference to the registrars tasked with abstracting cancer data. Neve says that in an ideal world, the registry’s text, demographic, and surgery fields would be completed automatically by importing data directly from the medical record.

The best approach, she says, would be the addition of a section to the EHR that is dedicated to cancer registry information—something akin to the section in Epic systems where tumor board and survivorship information can be entered.

To avoid redundant documentation, a check box within the note could be programmed to trigger an automatic upload to a special folder that “will contain all the necessary documentation, labs, op notes, pathology notes, history and physical pertaining to that patient’s cancer,” Neve says.

She also suggests that a hospital’s medical records committee is in the best position to guide any improvements to the collection and sharing of cancer data, noting that “they are the voices behind the EMR and drive the overall completion of each patient record by the medical staff. The only way to ensure that the adequate and proper information is extrapolated from the EMR is to ensure that the data going into the record are complete and accurate.”

Natural language processing (NLP) also could play a significant role in improving the capture and abstracting of cancer data, according to Adams. Pathology reports, progress notes, operative notes, and radiology reports are often entered as free text and therefore require either manual abstraction or automated processing such as NLP or clinical NLP (cNLP), which reduces the manual effort required.
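As a rough illustration of why free text needs processing before a registry can use it, the Python sketch below pulls TNM stage groups out of a report with a single regular expression. It is a deliberately simplistic, rule-based stand-in and not how cNLP products such as Health Language’s work; the pattern and the function name `extract_tnm` are assumptions made for this example, and it will miss many real-world phrasings (for example, prefixes written on the N or M category, as in “pN0”).

```python
import re

# Toy rule, not a clinical NLP system: matches stage strings such as
# "pT2 N0 M0", "ypT1b N1 M0" or "T3N2M1". The pattern is an assumption.
TNM = re.compile(
    r"\b(?P<prefix>[acpry]{0,2})T(?P<t>is|[0-4X][a-d]?)\s*"
    r"N(?P<n>[0-3X][a-c]?)\s*"
    r"M(?P<m>[01X][a-c]?)\b"
)


def extract_tnm(report: str) -> list:
    """Return one dict of prefix/T/N/M values per TNM group found in the text."""
    return [match.groupdict() for match in TNM.finditer(report)]


note = (
    "FINAL DIAGNOSIS: Invasive ductal carcinoma, 2.1 cm. "
    "Pathologic stage: pT2 N0 M0."
)
print(extract_tnm(note))
```

On the sample note, the function returns a single match with prefix `p` and categories T2, N0, and M0; anything phrased differently would still need a human abstractor or a far more capable NLP pipeline.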

Adams cites MD Anderson Cancer Center, which collaborated with Health Language to develop a terminology strategy to improve the quality and the depth of clinical data. It involves NLP with specialized terminologies, high-volume semantic mapping for nonstandardized source terms, and authoring and maintaining proprietary terminologies or ontologies.

By focusing on advanced terminology services, MD Anderson improved several dimensions of data quality. For example, specialized oncology terms are now available in the organization’s EHR for access at the point of care. This allows capture of the site and histology of the cancers in standardized terminology (ICD-O-3), reducing the need for manual abstraction for registry reporting purposes. Data from disparate sources are also harmonized to provide a single view for data insights.
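The semantic mapping Adams describes can be pictured, in greatly simplified form, as normalizing a nonstandard source term and looking it up in a table of standard codes. In the Python sketch below, the synonym table `SOURCE_TO_ICDO3` and the functions `normalize` and `map_term` are hypothetical; production terminology services curate far larger mappings and use more robust matching, and this is not a description of MD Anderson’s or Health Language’s implementation.

```python
import re
from typing import Optional

# Hypothetical, hand-built synonym table mapping source terms to
# ICD-O-3 morphology codes; real terminology services are far larger.
SOURCE_TO_ICDO3 = {
    "small cell carcinoma": "8041/3",
    "small cell ca": "8041/3",
    "sclc": "8041/3",
    "adenocarcinoma": "8140/3",
    "adenoca": "8140/3",
    "ductal carcinoma in situ": "8500/2",
    "dcis": "8500/2",
}


def normalize(term: str) -> str:
    """Lower-case, replace punctuation with spaces, and collapse whitespace."""
    term = re.sub(r"[^a-z0-9 ]", " ", term.lower())
    return " ".join(term.split())


def map_term(term: str) -> Optional[str]:
    """Return the ICD-O-3 code for a source term, or None if it is unmapped."""
    return SOURCE_TO_ICDO3.get(normalize(term))


for source in ["Small-Cell Ca.", "SCLC", "DCIS", "mesothelioma"]:
    print(source, "->", map_term(source))
```

Here `map_term("Small-Cell Ca.")` and `map_term("SCLC")` both resolve to 8041/3, while an unrecognized term returns `None` so it can be routed to a human reviewer rather than guessed.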

Another example of leveraging technology to improve cancer data analytics is COTA Healthcare, which provides real-world oncology data and analytics to provider and payer organizations. The company created a “digital code” for each patient’s cancer journey to group comparable patients and identify unwarranted variation in treatment plans, outcomes, and costs by leveraging a combination of the following:

• standardized codes from HemOnc to represent treatment regimens in data;

• the Observational Medical Outcomes Partnership Common Data Model to take data from different sources and translate them into a common format for analytics; and

• cNLP from Wolters Kluwer Health Language to standardize ontologies and manage updates to ICD-10 codes.

This, notes Adams, is just one of several use cases for NLP in the oncology space. She says that cNLP and NLP play a significant role in extracting data from EHRs for registry reporting and clinical decision support, in outcome reporting for research, and in tracking toxicities, symptoms, and side effects.

“Through functionality that enables extraction, conversion, and mapping of free text to industry standards, cNLP ensures interoperability and meaningful exchange of data assets,” Adams says.

— Elizabeth S. Goar is a freelance writer based in Wisconsin.