Speech Recognition: Hype vs. Reality

February 16, 2009

By Brenda J. Hurley, CMT, AHDI-F
For The Record
Vol. 21 No. 4 P. 20

A joint MTIA/AHDI task force hopes to make it easier for healthcare organizations to separate fact from fable.

If you are considering speech recognition technology (SRT), it’s hard to know where to turn for guidance. Each vendor promises that its solution is the best, but after hearing far too many colleagues report less-than-optimal satisfaction with their choices, your instincts tell you to proceed with caution.

If this sounds familiar or if SRT adoption has just seemed too confusing or worrisome to attempt, you could benefit from the recent work done by representatives from the Medical Transcription Industry Association (MTIA) and the Association for Healthcare Documentation Integrity (AHDI). The organizations formed a task force that drafted a white paper to address the risks and benefits of SRT use in healthcare documentation. While the drafted white paper is only the first step in this enormous project, the task force has used it as a basis for developing a survey to gather additional data, with the aim of publishing a guide on speech recognition adoption for healthcare documentation.

George Catuogno, the founding chair of the Work Group on Speech Recognition Adoption and Impact and the white paper’s primary author, recalls the motivation for originating the joint project. “The goal of this project was to provide understanding about speech recognition technology based on information that could be validated and subsequently used to help drive standards and metrics to help consumers make more objective assessments,” he says. “For example, if the technology offers efficiencies that lead to cost savings, how are those efficiencies measured, and are they standard industrywide to help consumers make objective and comparative assessments? Many consumers may fall short in their understanding of the impact the technology has operationally and what risks it presents. The technology is clearly valuable and impacting. The better understanding consumers have about these and other issues and the better ability they have to ask the right questions and make objective assessments, the better they can manage decisions relating to the technology and mitigate their risk.”

The project identified and studied the following factors:

• standardized terminology to aid communication;

• the role of human intelligence (such as medical transcriptionists [MTs]);

• productivity for both back-end and front-end SRT;

• documentation quality;

• documentation turnaround time;

• staffing;

• training, support, and other operational functions;

• when back-end or front-end SRT is the better choice for deployment;

• economic impact on consumers;

• compensation recommendations for producers;

• which metric standards would aid various assessments;

• variables that impact metrics-based assessments;

• elements to gauge return on investment, including hidden costs;

• risk management; and

• the relationship between SRT and the electronic health record (EHR).

Three workgroups were established with varying representation from stakeholder groups (technology providers, transcription service providers, healthcare organizations, and clinicians) to focus on logistics and risk management, economics and trends, and metrics and glossary.

Because it was initiated by the two associations (the AHDI and MTIA) that represent the medical transcription perspective, skeptics may reason that the project will be biased. Not so, says Catuogno. “This initiative was born of the Medical Transcription Industry Association. However, to make it unbiased and credible, we invited constituents from the key stakeholder sectors of healthcare to achieve balanced input in the body of work,” he explains. “As not every issue is black and white, several issues were met with differing perspectives that ultimately may be presented from more than one point of view. This is not intended to be a position paper but rather a document widely accepted by the cross-constituency.”

Indeed, when there were controversial issues presented, different perspectives were included to better understand all sides of the argument. Examples of those 360-degree perspectives include the role of human intelligence in blended solutions (editing documents drafted with back-end SRT), the role of the MT in an SRT world, the advantages and disadvantages of both front-end and back-end SRT, the impact on time required by the healthcare provider for information capture, and best practices in quality.

Some findings were predictable but nevertheless offered a fresh look at common SRT adoption problems. Topping the list was poor dictation habits, hardly a shocker considering that it is also high on the problem list for traditional transcription.

Whether it feeds SRT or traditional transcription, the healthcare documentation process truly begins with dictation. This factor is far too often taken for granted, yet it can affect the money spent on completing reports, the effectiveness of the technology used, the timeliness of report availability, reimbursement, and even the quality of information within the final report.

Frequently, MTs are given dictations cluttered with background noise (side conversations between the dictator and others, cell phone or portable phone static, beepers going off) and with whispers and mumbles that require deciphering, all delivered at speeds that would make auctioneers jealous. When asked, many physicians simply admit that they hate to dictate. Too often, that is obvious in their casual technique and in the lack of time devoted to ensuring a clear and concise dictated message.

Dictation is not a perfect process. Several years ago, it was suggested that saving the dictation without transcribing it would be an ideal cost-saving solution, one that would eliminate the need for MTs. That sounded like a great idea until it became apparent that listening to dictation was not easy (for the reasons described above) and was not an efficient way for healthcare providers to get the information they needed. For the first time, many heard “real” dictation recorded by other healthcare professionals. They were amazed that they could not understand dictation performed by their own colleagues, even though they conversed with those colleagues frequently and felt they understood them perfectly in person. Dictation is asynchronous and allows no opportunity to ask questions or seek clarification from the dictator as to what was intended.

Speech recognition is not a silver bullet for poor dictation habits. Whether the dictation is interpreted by a speech recognition engine or by the ear of an MT, applying dictation best practices will provide a greater opportunity for accurate and timely documentation through improved translation accuracy (the measure comparing what was dictated against what a speech recognition engine generated as a first draft document).
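
As a rough illustration of how a translation accuracy score might be computed, the Python sketch below compares a verbatim transcript of the dictation with the engine's first draft, word by word. It is only one possible approach: the function names are hypothetical, the scoring rule (one minus the word-level edit distance divided by the number of dictated words) is an assumption rather than a metric defined by the task force, and real platforms may also weigh punctuation, formatting, and critical medical terms.

```python
def word_edit_distance(reference, draft):
    """Count the word insertions, deletions, and substitutions
    needed to turn the reference word list into the draft word list."""
    prev = list(range(len(draft) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, draft_word in enumerate(draft, start=1):
            cost = 0 if ref_word == draft_word else 1
            curr.append(min(prev[j] + 1,          # word missing from the draft
                            curr[j - 1] + 1,      # extra word in the draft
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def translation_accuracy(dictated, drafted):
    """Hypothetical score: 1 minus word errors per dictated word, floored at 0."""
    ref = dictated.lower().split()
    hyp = drafted.lower().split()
    if not ref:
        return 1.0 if not hyp else 0.0
    errors = word_edit_distance(ref, hyp)
    return max(0.0, 1.0 - errors / len(ref))


if __name__ == "__main__":
    dictated = "the patient denies chest pain"
    drafted = "the patient denies chess pain"
    print(f"{translation_accuracy(dictated, drafted):.0%}")
```

In this example, one misrecognized word out of five dictated words lowers the score, which is the kind of gap the MTE is expected to close during editing.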

A detailed review of quality assessment (QA) principles and practices, specifically those QA challenges with SRT, must be accomplished to ensure that best practices are followed for all documents, whether generated by traditional transcription or through front-end or back-end SRT. For example, routine QA reviews are often overlooked or considered unnecessary for documents produced from front-end SRT because they were created solely by the healthcare provider aided by SRT. It’s important to develop data to confirm or deny the accuracy of this assumption.

Benchmarks also need to be established to measure and quantify parameters related to the final quality of speech recognition-drafted and medical transcription editor (MTE)-corrected documents to find a reasonable balance for both quality and productivity expectations. Common variables to consider when developing metrics for quality and productivity include the author, dictator skills, translation accuracy score, work type, document format, account handling requirements, SRT platform, MTE experience, and training effectiveness for the MTE.

The transition from MT to MTE is not always easy, because the craft of editing speech-recognized drafts is different from that of traditional transcription. The role of the MTE requires a specialized skill set. Traditional transcription requires MTs to correct their own mistakes, while speech-recognized editing requires that skill as well as the correction of errors already present in the draft text. Complacency is another problem with speech recognition editing, as it is easy to trust what was translated or for the brain to be tricked into believing that what it read is what was said. The MTE needs a disciplined eye/ear/brain function and additional navigational skills to perform the technical functions related to editing documents on speech recognition platforms.

Because of these factors, not all MTs make good MTEs. Some highly skilled MTs who are extremely proficient in traditional transcription may not be the best candidates for this transition for various reasons, including frustration with the new skill set and/or the failure to achieve improved efficiency. Those MTs who can no longer withstand the physical demands of transcribing for long hours may find that speech-recognized editing offers them an opportunity to extend their careers.

Training is an important success factor for all SRT users. In front-end SRT, the dictator should learn how to use voice commands for removing or deleting word phrases, how to insert text, the proper correction technique for recognition errors, formatting and cursor placement within the document, how to create macros, and how to command “normals” within the text. For back-end SRT, the MTE should learn the proper correction technique for recognition errors, efficient editing techniques, and managing vocabularies (updating word lists).

It is likely that anyone who has an interest in this topic will derive something significant from the MTIA/AHDI project. Catuogno saw two key pearls emerge. First was the role played by MTs. “What we really uncovered was the importance of human intelligence in the form of the individual [who] performs the documentation,” he says. “Deploying SRT without the medical transcriptionist is like running a household with only one parent. Yes, you can do it, but at what cost? There are a number of findings in our study to date that reveal an important role for the medical transcriptionist, not only in speech recognition use efficacy but also in overcoming EHR adoption relative to clinician usability.”

The project also gave Catuogno further insight into SRT’s influence, both good and bad. “Speech recognition translations are not perfect. It enhances productivity but introduces a different learning skill on the part of the individual making corrections of draft documents. When reading text and listening to audio, the eye may trick that individual into believing what was heard and seen is correct when in fact it is not,” he says. “To effectively correct draft documents, the skill of coordinating an extrasensory perception must be honed to avoid introducing errors that would not be introduced through traditional manual documentation practices. It should be noted too, however, that while SRT may introduce this risk of a new type of accuracy error, it also introduces an offsetting aid to the individual making corrections by appropriately translating many words that might otherwise be documented incorrectly through the manual processes. It’s a give-and-take scenario, so to note the risk without the benefit or vice versa would be to tell only half of the story.”

Indeed, the importance of human intelligence required to appropriately use SRT in the healthcare documentation process has often been overlooked. Advanced technology may be able to help with the task, but it fails in the area of critical thinking.

The aim of the MTIA/AHDI project is to offer insights that will help consumers make more informed adoption decisions and better prepare for implementation. However, the research and findings in the drafted white paper revealed a clear need for more widely represented thought leadership, data collection, analytics, and case studies. The white paper is merely a starting point for an ensuing speech recognition adoption guide. Currently, the task force is surveying various stakeholder groups to gather more data. Readers with qualifying expertise, data, or interest in participating in case studies are invited to come forward as contributors. (Visit www.surveymonkey.com/s.aspx?sm=i8iCjKXavWiYdeYi6EzEaQ_3d_3d to complete the survey. Others interested in contributing in any form may contact george.catuogno@sten-tel.com.)

For more than two decades, SRT has been expected to change the documentation process for healthcare providers, with some suggesting it may even eliminate the need for MTs. Lots of promises have been made along its journey, so many in fact that those who evaluate SRT systems find it increasingly difficult to filter the hype from the reality. When results are published and the recommendations embraced, the probability of success from implementing and utilizing SRT will no doubt be greatly improved. Admittedly, some challenges may be difficult, but the benefits will truly be worth the effort.

— Brenda J. Hurley, CMT, AHDI-F, is director of operations at SoftScript.