February 2016

Front- vs Back-End Speech Recognition: Which Fits Better?
By Elizabeth S. Roop
For The Record
Vol. 28 No. 2 P. 14

Making an educated decision about which type will get results at your organization requires weighing several factors.

With the research firm KLAS reporting that 90% of hospitals plan to expand deployment and peer60 finding that 57% of hospitals are considering immediate adoption or are planning to make the move within the next two years, the speeding train that is speech recognition (SR) is seemingly unstoppable—that is, except for two major barriers: documentation quality and physician satisfaction.

The latter has turned out to be the more intractable issue, though it has not slowed the train in any meaningful way. In "Front-End Speech 2014: Functionality Doesn't Trump Physician Resistance," KLAS researchers found that 50% of provider organizations consider clinician adoption the top barrier to expanding use of front-end SR, with most objecting to the impact it has on their workflows.

Nonetheless, nine out of 10 facilities plan to expand the use of front-end SR, sending a clear signal to skeptical physicians that continued resistance will be futile.

The reason is simple: The benefits to the organization as a whole far outweigh the issues clinicians have with the technology. It doesn't have to be a battle of wills, however. By evaluating whether front- or back-end SR, or a hybrid of the two, best meets clinical and organizational needs and then focusing on properly integrating it into the clinical workflow, it's possible to achieve high levels of quality and physician satisfaction.

"You have to consider the workflow in addition to the technology. Health care is as guilty of this as any industry," says John Gobron, CEO of Aventura, which provides situational awareness technology to the health care industry (see sidebar). "You get cool technology and put it with mandated EHR use and it results in a really messy user experience if workflow isn't considered."

The Workflow Conundrum
In fact, workflow can influence which type of SR a physician prefers, specifically where the physician is in the clinical workflow, according to Reid Coleman, MD, chief medical informatics officer of evidence-based medicine for Nuance Communications. An internist, Coleman says physicians tend to prefer front-end recognition when they are involved in the more cognitive aspects of documentation, such as writing an encounter note. With front-end, the physician dictates, self-edits, and signs the note, which immediately becomes part of the patient's record. It also allows the physician to navigate to different parts of the note, essentially "using the note as part of the thought process."

"So in a situation where I am trying to assess what is wrong with a patient and how best to treat and communicate with other caregivers, front-end is great," he says. "On the other hand, if I'm doing a procedure, that note is almost always the same. It's very structured, so the physician is speaking as fast as he can and will then have that transcribed by someone who can look at it and correct anything the [natural language] engine got wrong. Back-end is ideal for this."

Back-end SR converts dictation into electronic text, which a medical transcriptionist (MT) then edits, seeking clarification and any missing information from the physician, who must then sign the completed document. MTs tend to prefer it because it is easier and faster, and especially because it allows them to focus on higher-end activities such as editing.

Merit Sowards, RHIA, HIM quality specialist with Sarasota Memorial Health Care System and operations manager of Gilbert Medical Transcription Service, agrees that while MTs prefer back-end SR, "physicians are torn. They would like their dictated reports in real time; however, some are just so inundated that they don't have time to edit their work. … They want to document and have it be accurate."

The editing piece of the equation is also the biggest workflow disrupter for physicians, Sowards adds. "Unless they actually stop what they're doing to dictate and edit, front-end may hinder their workflow more than anything. They may be reading their notes, but it really makes a difference because they are talking about multiple patients and in different stages of treatment."

Despite the potential workflow disruption, Sowards notes that front-end SR has an advantage in terms of the speed with which the information is available in the patient's chart for use in ongoing care decisions. Even with back-end recognition, she adds, the turnaround time (TAT) is much faster—40 minutes for a progress report vs two hours with traditional dictation and transcription.

Tim Ruff, director of solutions management at M*Modal, concurs. Front-end makes final notes available instantaneously, while back-end takes longer because the note must be edited before the physician can sign off on it for entry into the patient record.

"In terms of speed, front-end speech recognition allows users to document directly into their EHR, making final notes available instantaneously. The back-end speech workflow sends the note to editors/medical transcriptionists for review, etc, and there is usually a lag time depending upon TATs," Ruff says. "The use of front-end speech recognition technology can provide the speed, convenience, and efficiency clinicians need to document the record completely and directly into the EHR without detracting from patient care. At the same time, it enables provider organizations to save on transcription costs and helps them realize their EHR investment."

A Question of Quality
However, the speed with which front-end recognition delivers the finalized note into the patient's chart can often come at the expense of quality. "It might not be as accurate," Sowards says. "With back-end … they have the MT on the back end filling in the pieces."

The suggestion that front-end, because it lacks the benefit of a review by an MT, has a higher error rate than traditional transcription has been supported by several recent studies. "Error Rates in Physician Dictation: Quality Assurance and Medical Record Production," published in the International Journal of Health Care Quality Assurance, found that doctors made significant errors in dictation, which, for inpatient records, led to 153 critical errors with speech editing/recognition compared with just 20 with transcription, and 403 major speech editing/recognition errors compared with 82 for transcription.

"The 'once-and-done' design philosophy of EHR/EMR tools overlooks the quality assurance role of medical transcriptionists (among others such as clinical documentation specialists). Removing a QA [quality assurance] step in the workflow thus can have important repercussions on documentation quality," the researchers concluded.

In "Error Rates in Breast Imaging Reports: Comparison of Automatic Speech Recognition and Dictation Transcription," published in the American Journal of Roentgenology, researchers identified at least one major error in 23% of SR-generated reports, compared with just 4% in transcribed reports. Further, after adjusting for academic rank, native language, and imaging modality, reports generated with SR were eight times as likely to contain major errors as conventional dictation transcription reports. This led researchers to conclude that "careful editing of reports generated with [SR] is crucial to minimizing error rates in breast imaging reports."
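As a rough check on the scale of those figures, the reported percentages can be turned into crude, unadjusted comparisons. The sketch below is an illustration only, not the study's method: the function names are hypothetical, and because the article does not say which statistical measure produced the adjusted "eight times" figure, both a simple risk ratio and an odds ratio are shown.

```python
def risk_ratio(p_exposed: float, p_reference: float) -> float:
    """Ratio of two proportions: how many times more common the event is."""
    return p_exposed / p_reference


def odds_ratio(p_exposed: float, p_reference: float) -> float:
    """Ratio of the odds implied by two proportions."""
    odds_exposed = p_exposed / (1.0 - p_exposed)
    odds_reference = p_reference / (1.0 - p_reference)
    return odds_exposed / odds_reference


# Share of reports with at least one major error, as quoted in the article.
SR_MAJOR_ERROR_RATE = 0.23
TRANSCRIBED_MAJOR_ERROR_RATE = 0.04

# Crude (unadjusted) comparisons; the study's figure was adjusted for
# academic rank, native language, and imaging modality.
crude_rr = risk_ratio(SR_MAJOR_ERROR_RATE, TRANSCRIBED_MAJOR_ERROR_RATE)
crude_or = odds_ratio(SR_MAJOR_ERROR_RATE, TRANSCRIBED_MAJOR_ERROR_RATE)

print(f"Crude risk ratio: {crude_rr:.2f}")
print(f"Crude odds ratio: {crude_or:.2f}")
```

On the raw percentages alone, the risk ratio comes out near 5.75 and the odds ratio near 7.2, both somewhat below the adjusted figure the researchers reported.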

Ruff notes that front-end SR has evolved considerably over the years, and today provides a fast, highly accurate final product. "That said, just like with transcription, physicians are not absolved from the responsibility to review documentation before it is signed, regardless of how it originates," he says. "In both the world of transcription and front-end speech recognition, physicians are ultimately responsible for documentation content. … By moving the editing process closer to the time of creation, you can actually argue the case that 'critical errors' in documentation are more likely to be caught by physicians who review as expected since the patient's condition and documentation intent are fresh in the physician's mind."

And while front-end SR is the type most often held out as an example when quality is questioned, back-end is not without its own accuracy issues. According to the AHIMA practice brief, "Speech Recognition in the Electronic Health Record," without extremely good recognition accuracy and appropriate editing tools, documents produced by back-end SR may require more time to edit from the synchronized audio file than if they were just transcribed.

The practice brief attributes this in part to the server-based speech engine's limited ability to understand complex commands. For example, "period," "new paragraph," "new line," and "comma" would all be recognized, but template fields would not. Thus, "a document with no punctuation, no formatting, and 90% to 95% accuracy requires extensive editing."

Meeting in the Middle
The key to deploying an SR strategy that hits the mark on both quality and physician satisfaction is to take workflows into consideration and weigh the pros and cons of each form against the human costs associated with its use. For example, Coleman points out, "Ultimately, front-end is less expensive than back-end because anytime you use fewer people, you save money. But a doctor's time is more expensive than anyone else's in the facility."

Based on that logic, it would seem that back-end SR may be the more cost-effective approach to automating transcription. However, Coleman says Nuance clients have found that front-end SR actually gives clinicians more time, which is why his advice to a facility considering adding or expanding SR technology is that "they probably want both. They should start with front-end, where physicians have to type their progress notes for themselves. Very few facilities have transcriptionists dedicated to progress notes. They are high volume and short, so they are very expensive to transcribe. [Front-end SR] makes it faster and better for clinicians to create these daily notes."

From there, gradually expand front-end SR to documentation that requires the fastest TAT, such as consult notes or discharge summaries. Once physicians are acclimated to front-end, introduce back-end SR to take over the longer, more tedious documentation such as procedure notes.

"Those are very lengthy notes, very much the same structure, so they are the kind of thing that doctors are happy to dictate and have someone else type," Coleman says. "Ultimately, the quality of the note is the responsibility of the clinician who signs it. … So the real question is: Which one is more likely to have a clinician review the note? We think, without hard evidence, that front-end is more conducive to clinician review."

He also points to the use of macros and templates to simplify dictation and the use of cloud-based systems to enable dictation and review from anywhere at any time as additional options to increase both quality and satisfaction.

From the M*Modal perspective, the best results will come from combining front-end SR with clinical documentation improvement into a single process. For example, by automating the identification of the most common documentation deficiencies and specificity queries in notes created via speech, typing, and/or templates, SR can provide the platform to improve the capture of complexity, acuity, and severity levels in real time and within the documentation workflow before the note is even saved in the EHR.

"Quality is more than just making words appear verbatim in a document," Ruff says, noting that M*Modal's approach is to ensure that the record is complete and accurate from the outset by enabling clarification at the earliest possible stage. "We have combined documentation creation with front-end speech recognition and CDI into one physician-friendly workflow for higher quality, efficiency, satisfaction, and sustainability. This elevates front-end SR into a closed-loop documentation system that ensures physicians create higher-quality documentation right from the start, thereby minimizing the disruptive, delaying, and costly physician retrospective query process."

For Sowards, back-end SR is better positioned to improve both quality and satisfaction because it includes the third-party editing step. She says MTs are able to identify and correct errors and fill information gaps that would otherwise slip through to become part of a patient's record.

"There is a huge value to what the transcriptionist does," Sowards says. "That is their primary job—to make sure the accuracy is there, the note is uniform, and it makes sense. If there is truly an issue with the documentation, or if the patient doesn't make the note, the transcriptionist will catch those things. If the physician does it with front-end and is free-texting, it could potentially put the information in the wrong patient's chart or on the wrong visit. At least with back-end, there are two sets of eyes going forward. … Without that second set of eyes, there is a great risk of medical error."

— Elizabeth S. Roop is a Tampa, Florida-based freelance writer specializing in health care and HIT.


Few dispute that limiting workflow disruptions is a central strategy for boosting clinician adoption of and satisfaction with speech recognition (SR) technology. But what if workflow integration were taken up a notch by serving SR to clinicians in a format tailored to where they are and how they need to access the technology at that point?

Situational awareness technology does just that by providing SR application options for each clinician with the tap of a badge based on where they are in the facility. If they sign into a system located in an area where dictation is commonly undertaken, the technology will serve up the clinician's complete desktop with the SR application already launched and logged in with all preferred settings in place. If the clinician happens to be in a patient's room, that desktop will likely feature just the SR icon that the physician can click to be taken to the application.
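The location-based behavior described above can be sketched as a simple lookup. This is a hypothetical illustration, not Aventura's implementation: the location labels, the `DesktopProfile` fields, and the fallback for locations the article does not mention are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesktopProfile:
    """What a clinician sees after tapping a badge (hypothetical fields)."""
    full_desktop: bool    # the clinician's complete desktop is served
    sr_prelaunched: bool  # SR application already launched and logged in


# Hypothetical location labels; a real deployment would define its own.
DICTATION_AREA = "dictation_area"
PATIENT_ROOM = "patient_room"

PROFILES = {
    # Dictation areas: complete desktop with SR launched and logged in.
    DICTATION_AREA: DesktopProfile(full_desktop=True, sr_prelaunched=True),
    # Patient rooms: a lighter desktop with just the SR icon to click.
    PATIENT_ROOM: DesktopProfile(full_desktop=False, sr_prelaunched=False),
}

# Assumption: unknown locations fall back to the lighter patient-room layout.
DEFAULT_PROFILE = PROFILES[PATIENT_ROOM]


def desktop_for(location: str) -> DesktopProfile:
    """Pick the desktop layout for the workstation's location."""
    return PROFILES.get(location, DEFAULT_PROFILE)
```

Calling `desktop_for(DICTATION_AREA)` returns the full desktop with SR already running, while `desktop_for(PATIENT_ROOM)` returns the lighter layout. Keeping the rules in a table rather than in branching logic would let a facility add locations without changing the selection function.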

"Doctors can walk up to any device, identify themselves, and get a customized desktop," says Aventura CEO John Gobron. "We are big advocates of documenting immediately. The more time that goes by, the more you forget, the more likely you are to make a mistake. There is a big push to get SR closer to patient encounters so physicians can do documentation right away."

The solution, Aventura for Speech Recognition, simultaneously authenticates physicians and launches the facility's speech application along with the rest of their desktop. It also automates and accelerates the log-out process, eliminating the risk of losing data if a system is shut down prematurely.

"Often, profiles take a very long time to load. We can speed that up from minutes to seconds," Gobron says, adding that the technology also will automatically log clinicians out of the system so they will be able to access their same profile, configured just the way they like it, from another computer or device. "From a time perspective, we generally see anywhere from a 15- to a 51-second improvement on a per-encounter basis. If you're saving a minute every time you use a machine, that's significant. It's about a 40% time savings, on average."