Realistic Expectations Key to Speech Recognition Success

February 2015

By Susan Chapman
For The Record
Vol. 27 No. 2 P. 16

The technology has its drawbacks, but adding an experienced transcriptionist to the equation can improve its prospects.

No technology is infallible, but that often doesn't prevent users from expecting perfection. Whether it's an EMR or computer-assisted coding, health care professionals frequently expect the tool to solve all their problems. The same goes for speech recognition.

The Issues
Speech recognition can be an efficient tool for health care organizations and practitioners to meet their workflow needs. However, some medical transcription service organizations are reporting problems with backend transcription, including missing blocks of text and other errors.

"My company works with M*Modal through a platform provider," says Linda M. Sullivan, CEO of NEMT. "The technology had been improving. However, recently we noticed a change in how we receive transcripts. Physicians who had been doing well with speech recognition had to be taken out of the loop by our transcriptionists. Blocks of text were missing from their dictation. The explanation was that the audio was inaudible, but we relistened to the audio files, and they seemed fine."

Juergen Fritsch, chief scientist at M*Modal, notes that overall the technology has matured significantly and is highly accurate when used correctly. "You don't have the old issues that you had five or 10 years ago," he says. Still, Fritsch admits the technology is not without flaws. "If we're working directly with transcriptionists, it is sometimes better to leave the text out than to send back incorrect text when the audio file is inaudible or hard to decipher," he says.

Fritsch says because transcriptionists are more in tune with what their physicians are trying to say, the technology does not fill in the blanks so as not to point transcriptionists in the wrong direction. M*Modal can apply a confidence measure, and the technology self-assesses the quality of the document it creates. "If the audio is not clear and below a certain confidence measure, then we leave the document blank rather than send something that's incorrect," he says. "Conversely, physicians can get annoyed by this."
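
The idea of withholding low-confidence text can be illustrated with a minimal sketch. This is not M*Modal's implementation, which the article does not describe; the `Segment` type, the `render_draft` function, the 0.6 threshold, and the `[____]` placeholder are all hypothetical, chosen only to show the general approach.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A recognized chunk of dictation with a hypothetical confidence score."""
    text: str
    confidence: float  # assumed scale: 0.0 (no confidence) to 1.0 (certain)


def render_draft(segments, threshold=0.6, placeholder="[____]"):
    """Keep text at or above the threshold; leave a blank for the rest.

    The threshold and placeholder are illustrative assumptions, not vendor values.
    """
    parts = []
    for seg in segments:
        if seg.confidence >= threshold:
            parts.append(seg.text)
        else:
            # Withhold uncertain text instead of risking a wrong guess.
            parts.append(placeholder)
    return " ".join(parts)


draft = render_draft([
    Segment("The patient denies chest pain", 0.94),
    Segment("and takes metoprolol daily", 0.41),
])
print(draft)  # The patient denies chest pain [____]
```

In a sketch like this, the blank tells the editor to return to the audio for that passage, which is the trade-off Fritsch describes: fewer misleading guesses, but more text for someone to fill in.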

M*Modal also uses a color-coding system to indicate terms that may be incorrectly transcribed. "This gives physicians the opportunity to focus on those words and see if they need to edit or correct them," Fritsch says. "Some physicians like it and find it useful, and others see it as a distraction." M*Modal offers this capability as a user preference.

"In business-to-business settings, we basically provide the underlying technology, but the intermediary applies it as it sees fit, and the organization doesn't necessarily show all available information to the user," Fritsch says.

Claudia Tessier, RHIA, MEd, who has been working with speech recognition software for many years, notes that in the 1980s, the industry believed the technology would replace transcription in about five years. "There was the naïve expectation that this was the solution for documentation," she says. "But it's neither quick nor easy. Documentation is more than spoken and written words. The whole record needs to be accurate and complete regardless of how it's done."

"One block of dictation or even one character can change the whole meaning of the record," says Sherry Roth, PA-C, DF-AAPA, CHDS, AHDI-F. "If we don't correct errors, we have done the patient a disservice. There can be an error in the record that can be propagated and potentially lead to serious medical errors."

Health care consultant Karen Davis points out the importance of educating physicians about speech recognition's capabilities. "From the physicians' standpoint, they believe they can buy speech recognition software and that's all they need to do to have an accurate record. They underestimate what transcriptionists do," she says. "They buy a package and don't evaluate how it enhances their workflow. They tend to put any change in workflow back on transcriptionists, who are now taking less money to do something more difficult."

Davis adds that physicians are not in a position to analyze outcomes, while noting that a 98% accuracy rate means that roughly one in 50 words is not being recognized. "Not knowing that, the physicians place the burden on the transcriptionists, from whom the doctors believe they can get perfect accuracy and formatting with a small amount of effort," she says.
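
The arithmetic behind Davis' figure is simple to check. The short sketch below works it through; the function names and the 500-word report length are illustrative assumptions, not figures from Davis.

```python
def words_per_error(accuracy_percent):
    """Average number of words per misrecognized word at a given accuracy."""
    error_percent = 100 - accuracy_percent
    return 100 / error_percent


def expected_errors(word_count, accuracy_percent):
    """Expected number of misrecognized words in a report of word_count words."""
    return word_count * (100 - accuracy_percent) / 100


print(words_per_error(98))       # 50.0 -> one error in every 50 words
print(expected_errors(500, 98))  # 10.0 -> about 10 errors in a 500-word report
```

Seen this way, a rate that sounds nearly perfect still leaves several corrections in every report of ordinary length.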

Tessier says speech recognition works best when both review and editing components are included. "It's not a stand-alone solution," she says. "Some systems work better with certain populations than others. Some environments are better prepared to use the technology successfully. The success of speech recognition is in part due to the software and some of it depends on the work culture. There can be resistance from physicians dictating and also from transcriptionists who may feel threatened that their jobs are in jeopardy."

The Transcriptionist's Role
Shifting to an editor role can be challenging for transcriptionists, says Sullivan, noting that the skill sets are quite different. "Many transcriptionists do not make the transition," she says. "And editors make about half the amount of money as traditional transcriptionists. They are not being paid to type, yet—particularly with the issue of missing text—they have to re-create entire paragraphs by relistening to the audio and typing the missing paragraphs themselves."

Tessier says transcriptionists must become integrated into the speech recognition process. "If a software system proves itself to have the potential it purports to have, then we have to find a way to adapt to it," she says. "There is an increasing willingness to adopt speech recognition in the health care professional community. Many clinicians prefer it to point and click, and the transcriptionist's role is diminishing. Good speech recognition can more quickly accomplish our goal of improved documentation, which means we can use the minds and experience of the transcriptionists with the capability to enhance what the software can do."

Tessier points out that transcriptionists can use their knowledge of medicine, terminology, and grammar to complement speech recognition. "The combination of the talented transcriptionist with the good speech recognition software can be an enormous benefit," she notes.

Despite those talents, transcriptionists have been phased out at a growing number of organizations. "Some facilities have eliminated transcriptionists altogether and now depend on their QA [quality assurance] staff to catch the errors," Davis says. "There are a lot of sales pitches that include unrealistic return-on-investment statistics because of the promise that transcriptionists are no longer necessary. But the reality needs to be somewhere in the middle, and people make a whole lot of mistakes in a speech recognition environment before they realize that."

Why Errors Occur
How and where audio is captured plays a leading role in whether speech recognition is successful. For example, dictation captured over the phone can produce a lower-quality sample. Noise conditions in general can be another problem. "If you're in a quiet office as opposed to being on a noisy floor, that can be an issue," Fritsch says. "If you know that it will impact the speech recognition results, then you'll pick a quiet place. The technology doesn't work as well with background noise."

Many customer complaints stem from a lack of basic training, he says, and many of the underlying problems, such as poor microphone placement, are easy to fix. For example, a physician who places the microphone too far away or at the wrong angle, or who changes the distance during dictation, will experience poor results. Another issue is "cross talk," which occurs when a physician dictates while colleagues are doing the same nearby.

"There is a need to educate everyone on what the technology can and cannot do," Fritsch says. "If you have a user who thinks about what he's saying ahead of time and then records, he'll get nearly perfect accuracy all the time. If you get someone who thinks while talking, changes things, and creates nongrammatical sentences, then the results will be different. The technology is prepared to hear well-formed English sentences. Thinking about what you want to say before you speak gets a better result."

Roth concurs: "Because I know what goes on on the transcription end, I know what to do in order to get good speech recognition documents. The program is very faithful to what I say because I think before I record. On the other hand, if you have people who don't know what they're going to say and record as they think, have heavy accents, are eating, are in a noisy environment, or mumble, the results are less than ideal."

Improving Results
For speech recognition to be at its most efficient, it must be an integrated process featuring a clinician who speaks clearly in a noise-free environment, a transcriptionist who recognizes and corrects errors, and a suitable EMR. "If you don't have all those factors working together, then you have a problem," Tessier says. "I think that both training clinicians to use the software to its maximum capacity and training the speech recognition system to recognize the voice of the physician are vital."

Davis believes organizations must evaluate their report types and determine physician attitudes toward handling documentation requirements. "Some physicians would rather see more patients than edit for an hour each day," she says. "A lot of packages out there are more template-driven to feed straight into an EMR, which means that their narratives can get lost and must be edited to work with the record."

Roth says well-educated transcriptionists can help reduce the number of EMR errors. To encourage continuing education, some industry organizations advocate a type of licensure or certification to compel transcriptionists to stay current on medication names, equipment, diagnoses, and other important information.

"Transcriptionists definitely need to keep up with the times," she says. "If you want to work for a transcription company, you have to be able to work with the technology and understand that you have to listen differently from how you would when doing traditional transcription. For instance, as I edit, I read reports aloud. If it doesn't match the recording, then I know to fix it. Your eyes can very easily slide over something. You can think you see what you hear if you don't say it aloud."

Davis says many programs don't save voice files, meaning that if a physician edits a transcription one month later, there is no audio file to compare the written transcript against. "A lot of speech recognition software programs are not set up to save voice files like in traditional dictation," she says. "It's a space issue, based on saving files to a computer. A program like that is not cloud-based. Consequently, once it thinks you've finished, edited, and signed off, the voice file is erased.

"Because of this critical issue and others, facilities need to be sure that the software has cloud-based, roaming voice profiles and computer-assisted physician documentation," she continues. "The packages that allow you to send to transcription and editing are the way to go, and facilities have to have an editing plan in place for critical errors."

Fritsch says speaker training is another area health care organizations can beef up to increase the chances of a successful speech recognition implementation. "The more training the better, but the basics can be learned in as little as 30 to 60 minutes," he says. "We are making a really hard push to make people aware of that. It's not sufficient for people to just buy the technology and put it on their computer; they need to get training to show them what they can and cannot do and how to dictate properly. When errors occur, it is not the fault of the technology or the user. It has much more to do with expectation setting and proper training.

"You have a lot of users out there who are nontechnical and may not be interested in how the technology works, so they need to be trained on how they can get the best results and why speech recognition can be so much more efficient for their workflow than working without it."

Fritsch says being reasonable about what to expect from speech recognition enhances its efficiency. "There's an incorrect expectation that speech recognition will work the way it does on Star Trek, especially in the medical community," he says. "By that I mean, some physicians are trying to do everything by just using their voices. However, some tasks are just better done with a keyboard and mouse. It's questionable if speech recognition is better for entering vital signs, for example, as opposed to a keyboard and mouse. My recommendation is to use the technology for the right thing. Don't use it for everything. Use a hammer when you need a hammer. There are times when it's appropriate for speech and then when other input modalities are more helpful."

— Susan Chapman is a Los Angeles-based writer.