Mining Untapped Data

March 2013

By David Yeager
For The Record
Vol. 25 No. 5 P. 10

Several roadblocks are preventing healthcare organizations from becoming information boomtowns.

Like any sort of mining operation, data mining requires significant infrastructure and spadework before anything of value can be uncovered. But as fans of the hit TV series Gold Rush know, once the operation is functional, the investment can pay for itself many times over. Although there still is plenty of work to do in the healthcare sphere, data-mining technology can pay dividends for providers.

Many in the healthcare industry view data mining as a potentially paradigm-shifting technology. It could not only provide decision support to clinicians for tasks such as diagnosis, choosing treatment options, and predicting prognosis but also allow hospitals and other providers to predict long-term trends such as staffing and inventory needs, demographic changes, and market shifts. It also may lead to the development of tools that patients can use to monitor their own health behaviors.

Unlike statistical analysis, which relies strictly on numeric data and uses data sampling to test hypotheses, data mining analyzes numeric, categorical, and multimedia data, such as CTs and MRIs, and works on complete data sets to reveal underlying patterns. Not surprisingly, data-mining algorithms are more complex than their statistical analysis counterparts, often filling an entire page compared with a single line for statistical analysis. Data mining also generates significantly more data.

“Recently, I data mined just a 6-MB file,” says Illhoi Yoo, PhD, an associate professor of health informatics at the University of Missouri School of Medicine in Columbia. “I got an out-of-memory error [indicating lack of system memory] with a 64-GB workstation after running two full days. You never get such an error with statistical analysis.”

Yoo, lead author of a study on data mining in healthcare published in the August 2012 issue of the Journal of Medical Systems, says the three most widely used data-mining algorithms classify, cluster, and associate data. Classification is used to group data into predefined categories that can help define diagnoses and prognoses based on symptoms and health conditions. Clustering is used to group data objects so that objects within a cluster have many similarities, and objects from different clusters have few similarities, such as with biological taxonomies that group plants and animals by their attributes. Clustering is useful for exploratory studies that have a large amount of undefined data, such as microarray studies of DNA and gene expression.
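The difference between the first two approaches can be made concrete with a minimal sketch. It is not taken from Yoo's study: the toy measurements, the risk labels, and the function names are all hypothetical, and a nearest-centroid rule and basic k-means stand in for whatever classification and clustering algorithms a real project would choose. Classification starts from categories that already exist; clustering is given no labels and groups the records by similarity alone.

```python
from statistics import mean

# Hypothetical toy records: (systolic blood pressure, fasting glucose).
# The labels are illustrative assumptions, not clinical thresholds.
labeled = {
    "low_risk": [(110, 85), (118, 90), (115, 88)],
    "high_risk": [(150, 140), (160, 155), (155, 150)],
}

def centroid(points):
    """Average point of a group, computed axis by axis."""
    return tuple(mean(axis) for axis in zip(*points))

def sq_dist(a, b):
    """Squared Euclidean distance between two records."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify(record, labeled_groups):
    """Classification: assign a record to one of the predefined categories."""
    centroids = {label: centroid(pts) for label, pts in labeled_groups.items()}
    return min(centroids, key=lambda label: sq_dist(record, centroids[label]))

def cluster(points, k, rounds=10):
    """Clustering: group unlabeled records by similarity (basic k-means)."""
    centers = list(points[:k])  # naive initialization: the first k points
    groups = []
    for _ in range(rounds):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        # Keep the old center if a group ended up empty.
        centers = [centroid(g) if g else centers[i] for i, g in enumerate(groups)]
    return groups

unlabeled = [(112, 86), (158, 149), (116, 91), (152, 142)]
print(classify((157, 148), labeled))  # a new record gets an existing label
print(cluster(unlabeled, k=2))        # unlabeled records fall into two groups
```

The classifier can only answer with a label it was given, whereas the clustering output is a set of unnamed groups that someone with domain knowledge still has to interpret.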

Association is used to discover hidden patterns in large databases, such as sales patterns or relationships among items that are purchased. For healthcare purposes, association may be able to uncover relationships among symptoms, health conditions, and diseases, allowing researchers to develop evidence-based hypotheses about how illnesses and their complications are formed.
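Association mining is usually described in terms of two measures, support (how often a set of items appears together) and confidence (how often one item accompanies another). The sketch below illustrates both on a handful of invented patient records. The conditions, the record contents, and the function names are assumptions made for illustration, and the brute-force pair search stands in for the more efficient algorithms used on large databases.

```python
from itertools import combinations

# Hypothetical patient records: each is a set of recorded conditions.
records = [
    {"hypertension", "diabetes", "obesity"},
    {"hypertension", "diabetes"},
    {"hypertension", "obesity"},
    {"diabetes", "obesity"},
    {"hypertension", "diabetes", "kidney_disease"},
]

def support(itemset, records):
    """Fraction of records that contain every item in itemset."""
    return sum(itemset <= r for r in records) / len(records)

def confidence(antecedent, consequent, records):
    """Among records with the antecedent, the fraction that also have the consequent."""
    return support(antecedent | consequent, records) / support(antecedent, records)

def frequent_pairs(records, min_support):
    """Every pair of conditions that co-occurs in at least min_support of records."""
    items = sorted(set().union(*records))
    result = {}
    for pair in combinations(items, 2):
        s = support(set(pair), records)
        if s >= min_support:
            result[pair] = s
    return result

print(frequent_pairs(records, min_support=0.5))
print(confidence({"hypertension"}, {"diabetes"}, records))
```

A pair that clears the support threshold is only a candidate relationship. As the article notes, such patterns give researchers hypotheses to test, not conclusions.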

Yoo’s study contains examples of data mining being used to prevent healthcare fraud by developing predictive models that detect unusual claims data patterns, improve Medicare reimbursements by finding underdiagnosed patients when their conditions are less costly to treat, and reduce healthcare costs by identifying and classifying at-risk patients who may benefit from targeted interventions and disease prevention plans. The study also notes many potential uses for data mining in medical research, such as predicting survival rates for medical conditions or a patient’s risk of developing cancer. It also can identify potential drug interactions and relationships between certain medical conditions.

Despite its immense potential, data mining generally is considered to be an underutilized resource. There are several reasons the technology remains largely untapped.

Staking a Claim
A frequent hurdle is access. The rules that currently govern protected health information (PHI) make it difficult to share data. Joe Alea, chief technology officer and senior vice president of development for Curaspan Health Group, says parts of HIPAA need to be modified to better reflect today’s healthcare environment.

“We live in a world where it’s still acceptable to send PHI via the fax machine, which is very unsecure, when there are better, HIPAA-compliant options to send data,” says Alea. “It’s difficult to have the full range of data exchange that you really want unless you’re sharing data on a HIPAA-compliant platform.”

Even if HIPAA concerns are addressed, proprietary data formats in some data storage products can limit access. Alea says companies that make data storage products are sometimes reluctant to share complete data sets. This can make organizationwide data mining difficult, but it’s an even bigger problem when patients use several providers with different types of data storage systems. Better integration among these systems would make data mining more useful and efficient, he notes.

Within an organization, the question of who should be allowed to mine data can be the difference between small and large gains. Often, data mining is strictly the province of the IT department, but organizations that exclude clinicians and researchers may be missing a big opportunity. Yoo says data mining researchers frequently are denied access to data because of privacy and legal concerns. Although privacy and security concerns must be addressed, Francois Ajenstat, director of product management for Tableau Software, says more eyes on the data will increase productivity.

“If you have more people looking at a data set, with different sets of experiences, a variety of skill sets and knowledge, well, maybe the collective organization can actually drive a better result,” he says. “If only the data scientists—experts in statistics or experts in high-end programming—have access to the data, you’re limiting the potential impact that you might get. You might get one really, really amazing insight on those data, but imagine if you could have 100 doctors or 1,000 doctors mine those data for their need, for their research. You would have broader collective good. So freedom of data, which might be scary at first, is actually more empowering and will lead to better decisions. Anybody that could answer questions using data should be empowered to have access to those data.”

A Fool’s Gold?
In recent years, an increasing number of healthcare providers have transitioned to EMRs. At the same time, data storage costs continue to decrease. These factors have resulted in an exponential growth in data volumes, with no end in sight.

With all these data to store—and more piling up every day—the questions of how to store them and whether short- and long-term data should be stored in different places are becoming hot-button topics. While there are valid points on both sides, for data-mining purposes the “where” is less important than the “how.” To realize the data’s full value, tools that allow end users to answer questions are needed.

“Think of a nurse or a doctor. They understand the data, but they don’t necessarily have the tools or the means to be able to access those data and to really make sense of it,” Ajenstat says. “And so there’s this dichotomy: There are more data, yet the people who need access to them can’t get to them.”

The types of tools that clinicians need would allow them to access the entire patient record at the point of care. But the data need to be more than just available; they need to be in a format that is easily understood. Visualization technologies that allow users to quickly and easily see patient information, such as preexisting conditions, lab reports, medications, and when they were last treated, can help improve patient care.

Even with better tools, though, medical providers need to pay more attention to data quality. The ongoing data windfall may seem like a boon to data mining, but as is often the case, quality trumps quantity. A great deal of information that is being collected right now is unstructured, making it difficult to mine.

“This is the fundamental problem with healthcare right now,” Alea says. “What happens is because of all of this aggregation of patient data, whether [they’re being used in] real time or long term, it doesn’t really happen very well because data structuring doesn’t happen. Also, from a patient standpoint, there’s a gap of data, so there’s not a clear picture of what’s happening around the entire continuity of care on the patient, which means you have information that lags. The piecemealing of the patient leads to errors. You have diminished productivity.”

The disconnect between clinical and billing data poses a significant hurdle. Since healthcare providers need compensation for their services, data-capture systems often emphasize the billing side. Translating billing data to clinical insight requires a lot of work.

“The biggest barrier is data quality,” Yoo says. “This is because current EMR systems are designed and developed [mainly] for billing purposes rather than clinical purposes. This means, for healthcare data mining, raw data cannot be directly used. Instead, a lot of data extraction [from text such as clinical notes] and transformation are required. This process requires strong domain knowledge and is very labor intensive and time consuming, meaning that even if there are tons of raw data, actual data that can be used for data mining are a few hundreds of records at most.”
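The extraction and transformation step Yoo describes can be pictured with a very small sketch. It assumes an invented clinical note and a few hand-written patterns; the note text, the field names, and the `extract` function are hypothetical. Real notes vary far more than this, which is part of why the process demands domain knowledge and so much labor.

```python
import re

# Hypothetical free-text note; the wording and field names are illustrative only.
note = "Pt reports fatigue. BP 150/95 mmHg. HbA1c 8.2%. Started metformin 500 mg."

# One hand-written pattern per field we want to pull out of the text.
PATTERNS = {
    "systolic": re.compile(r"BP\s+(\d{2,3})/\d{2,3}"),
    "diastolic": re.compile(r"BP\s+\d{2,3}/(\d{2,3})"),
    "hba1c": re.compile(r"HbA1c\s+(\d+(?:\.\d+)?)\s*%"),
}

def extract(note_text):
    """Turn a free-text note into a flat record; missing fields become None."""
    record = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(note_text)
        record[field] = float(match.group(1)) if match else None
    return record

print(extract(note))
```

Every new phrasing, such as "blood pressure 150 over 95", needs another pattern, so the number of records that end up usable for mining can be far smaller than the volume of raw text suggests.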

To make a big leap in the quality of care, Alea says a comprehensive, patient-centered data model is needed. Rather than scanning information into an EMR, providers need to digitally enter it into their systems in a standardized way so it can be shared among disparate technologies. In addition, methods for analyzing unstructured data, such as PDFs of lab reports and doctor notes, in a structured setting will increase the store of mineable data. Improving data structure also will reduce the number of medical record errors and potentially allow healthcare providers to make faster decisions, lower readmission rates, and reduce lengths of stay.

The Payoff
Despite these hurdles, many in the healthcare industry are using data mining to improve patient care and streamline business processes. David Wiggin, program director for healthcare and life sciences at Teradata, says providers and health plans are using data mining to engage patients in new ways. In some cases, they’re using the results to provide more transparency about patient medical histories. In others, they’re directing patients toward better care and cost outcomes.

“From an analytics perspective, one of the most interesting things that our customers are doing is engaging with their members in much the same way that the retail industry has engaged customers and future customers through multiple channels, targeting the channel of preference for the customer,” Wiggin says. “So health plans and provider organizations are starting to engage with consumers regarding their health and health behaviors, and that’s a game changer.”

Whether the communication is in the form of direct mail, e-mail, text, tweet, or a Facebook ad, Wiggin says data will become an important part of healthcare organizations’ efforts to improve patient satisfaction and attract customers. These types of grassroots efforts may dramatically reduce costs and improve public health. Wiggin believes the convergence of cloud, mobile, social, and big data trends in HIT may open the door to true reform of the healthcare delivery system.

But to what extent? Is it possible that the Internet-driven productivity gains seen in other industries in the mid to late ’90s will soon be realized in healthcare? Wiggin says big changes are coming, most likely sooner rather than later.

“I think it’s just beginning to be under way, but it is coming to healthcare,” he says. “If you think about banking and the simplicity and the convenience of ATMs, and managing banking 24/7, that’s table stakes today. It is becoming table stakes for healthcare, and I think analytics and operational analytics are going to play a key role in improving the delivery of healthcare in this country.”

— David Yeager is a freelance writer and editor based in Royersford, Pennsylvania.