April 25, 2005
The Case for Speech Recognition
By Matt Revis
For The Record
Vol. 17 No. 9 P. 20
A sturdy hardware platform, a comprehensive pilot program, and professional training can help ensure a tidy return on investment.
Traditionally viewed as simply a means of dictating text into a personal computer, today’s speech-recognition software can play a far more significant role in the healthcare environment. In addition to pure dictation, speech-recognition software can be used to manage e-mail, streamline repetitive tasks on the PC, reduce transcription and charting costs, speed up information turnaround, and protect employees from repetitive stress injuries (RSIs).
The software can be integrated with most electronic medical record (EMR) applications to make those programs more effective and easier to use. Rapid hardware advancements and improvements in the technology itself have increased its utility, accuracy, speed, and ease of use. This has brought the cost of ownership to an affordable level for medical offices and clinics of any size, medical departments within a healthcare organization, and even entire hospitals. When properly implemented, speech-recognition software can increase productivity for every employee who works with a computer.
As with any technology, the deployment of a speech-recognition program should be carefully planned to achieve the full benefit of the software and maximize the return on investment. This article provides an overview of how the software works, what medical offices or departments can do with speech recognition, examples of savings, and recommendations for implementation.
Why Use Speech-Recognition Software?
The reason is simple—most people can speak much faster than
they can type. A relatively fast typist who can type 50 net words
per minute (1) can produce a 300-word e-mail in six minutes. Using
speech-recognition software, a person dictating 140 to 160 words
per minute without any errors can produce the same 300-word e-mail
in roughly two minutes. This does not include the additional time
the person can save by using voice commands to open the e-mail program,
look up an e-mail address in the contact management software,
and send the e-mail.
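The arithmetic behind this comparison can be sketched in a few lines of Python. The function names are illustrative, and the 55 gross words per minute and 5 errors per minute used to arrive at 50 net words per minute are assumed figures, not numbers from this article.

```python
def net_wpm(gross_wpm: float, errors_per_minute: float) -> float:
    """Net speed: gross words per minute minus errors made per minute."""
    return gross_wpm - errors_per_minute


def minutes_to_produce(word_count: int, words_per_minute: float) -> float:
    """Time needed to produce a document at a steady rate."""
    return word_count / words_per_minute


# Assumed typist: 55 gross wpm with 5 errors per minute gives 50 net wpm.
typing_speed = net_wpm(55, 5)

typing_minutes = minutes_to_produce(300, typing_speed)  # typing the 300-word e-mail
slow_dictation = minutes_to_produce(300, 140)           # dictating at 140 wpm
fast_dictation = minutes_to_produce(300, 160)           # dictating at 160 wpm

print(f"Typing: {typing_minutes:.1f} min")
print(f"Dictation: {fast_dictation:.1f} to {slow_dictation:.1f} min")
```

Changing the word count or the speeds shows how the gap widens for longer documents.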
How Does Speech-Recognition Software Work?
Speech-recognition software uses the human voice as the main communication
mechanism between the user and the computer. While relatively simple
to use, speech-recognition software is sophisticated technology
that uses “language modeling” to recognize and differentiate
among the millions of human utterances that make up any language.(2)
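To make the language modeling idea concrete, here is a toy sketch in Python. It is not how any commercial product works internally; the tiny corpus, the `pick_word` helper, and the simple word-pair counting are all illustrative assumptions. It shows how statistics about word sequences can settle a choice between words that sound alike.

```python
from collections import Counter

# Hypothetical training text; a real language model is built from far more data.
corpus = [
    "the patient has pain in the right knee",
    "please write the prescription today",
    "the right knee shows swelling",
    "write the discharge summary",
]

# Count how often each word follows another (word pairs, or "bigrams").
bigrams = Counter()
for sentence in corpus:
    words = sentence.split()
    bigrams.update(zip(words, words[1:]))


def pick_word(previous_word: str, candidates: list[str]) -> str:
    """Choose the candidate seen most often after previous_word."""
    return max(candidates, key=lambda word: bigrams[(previous_word, word)])


# "right" and "write" sound identical; the preceding word decides between them.
after_the = pick_word("the", ["right", "write"])
after_please = pick_word("please", ["right", "write"])
print(after_the, after_please)
```

A production system combines statistics like these with an acoustic model and a much larger vocabulary, as described in note 2.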
The software enables users to input text and data into virtually any Microsoft Windows-based application by voice, as well as to navigate the computer desktop with little or no use of their hands. Users speak naturally into a noise-canceling microphone connected to the computer.(3) The software “recognizes” the spoken words, converts them into text, and displays them on the screen for review.
Most speech-recognition programs also allow users to speak a standard command that prompts the computer to perform an action. For example, the user says, “Start WordPerfect.” The more advanced speech-recognition programs also enable users to create customized commands (macros), such as “Send an e-mail to Doug Z,” which will open an e-mail addressed to Doug Z.
Configuring the software during set-up is referred to as “enrollment.” After installing the program, each user must read aloud from a choice of prepared texts for approximately five minutes. Based on the dictation the application captures, the software analyzes how the user pronounces each word and stores the data to prepare a unique user profile for that individual.
As an individual uses the software and corrects recognition errors, the software becomes increasingly accurate by learning his or her particular speaking style. Most medical recognition programs enable users to add new words or customize the vocabulary for their particular practice or specialty. Using specialty vocabularies can improve accuracy even further. Some speech-recognition software programs include a medical vocabulary—incorporating diseases, medications, procedures, and acronyms in addition to the standard business vocabulary—and can automatically recognize and format prescriptions and patient encounters. For certain programs, specialty medical vocabularies can also be created in-house or purchased from third-party sources.
How Is Speech Recognition Used to Replace Traditional Transcription?
There are many different ways to implement a speech-recognition
solution. Most organizations have individuals dictate directly
into their own PC and view the transcription as it occurs so they
can correct any errors. Another method lets users dictate into a
handheld digital recorder and then download the audio file, or have
an assistant download it, onto a PC. Instead of transcribing from
scratch, the assistant listens to the recorded dictation while reading
the recognized text on screen and makes corrections or edits as necessary.
Speech Recognition Uses
Many different types of healthcare workers can benefit from using
speech recognition. Individual uses for speech recognition can vary
for each employee depending on their responsibilities, workflow,
preferences, and other applications they use as part of their daily
routine. Today, speech recognition is successfully used by a wide
array of healthcare professionals, including doctors, nurses, physician
assistants, pharmacists, administrators, and transcriptionists.
Dictation is the most versatile and widespread use for speech-recognition software. Some individuals can’t or prefer not to type because they are untrained as typists, have a disability, or wish to prevent RSIs. Many practices have decreased the number of support staff and require physicians to generate their own records. Even doctors who typically dictate documents for others to transcribe may use speech recognition occasionally, such as when they need to produce a document on the spot or after hours or when they are responding to e-mail.
Doctors who wish to maintain their traditional workflow can dictate into a handheld recorder (4) or save their recorded dictation (5) with their documents for someone else to transcribe or correct at a later time. This can substantially reduce the turnaround time over traditional transcription. If transcription is produced in-house, using speech-recognition software frees up support staff for more productive tasks. If transcription is outsourced for correction, it can significantly reduce an organization’s overhead costs.
Navigate the Windows Desktop by Voice
Speech-recognition software enables users to “command and
control” the computer desktop simply by using their voice.
Virtually any menu item or dialog box can be controlled for hands-free
operation. Users can edit and format their work, launch applications
and open files, cut and paste, and insert standard blocks of text
or even their scanned signature.
Create, Manage, and Send E-mail
Managing e-mail takes up an increasing amount of everyone’s
day. Speech-recognition software can be customized so users can
create, navigate, respond to, and send e-mail, all by voice, using
their preferred e-mail program. In addition, some speech-recognition
programs contain text-to-speech technology that allows users to
have their e-mail documents read aloud, which enables them to complete
other tasks while reviewing their e-mail.
Mastering the Mundane
Repetitive tasks, such as data entry or form filling, can be accelerated
using speech. In many cases, users who are unfamiliar with complex
software programs are more comfortable “telling” the
computer what to do than trying to master the interface. Macros
can be created to enable users to go from field to field by voice,
or to perform a sequence of keystrokes or mouse movements. The software
can even be configured so a patient’s EMR can be created and
edited using only voice commands.
Create a Paperless Office
Many practices seek to convert all their paper documents into electronic
files to facilitate a secure archive and provide remote access to
staff or patients. Most Windows-based applications can be navigated
by voice using speech-recognition software. The software can help
facilitate the move to a paperless office by making it easier for
anyone to create, format, dictate into, search, and manage electronic
documents by voice.
Increase Productivity Outside the Office
Healthcare professionals can increase their productivity during
travel time or whenever they are away from the office by dictating
into a portable handheld recorder for transcription later. In addition,
some software programs enable users to easily export their user
file via the network or portable storage device for use on another
computer or laptop so they can use speech recognition anywhere—at
the office, at home, or even on the road.
Work on the Web by Voice
Speech-recognition software enables users to search the Web, access
information, and navigate Web pages by speaking URLs and links.
EMR Applications
Many EMR applications can be more effective and easier to use when
deployed in conjunction with a speech-recognition solution. Searches,
queries, and form filling are all faster to perform by voice than
using a keyboard. Charting, prescription writing, aftercare instructions,
order entry, database searches, document assembly/automation, and
patient record management software programs are all highly conducive
to control by speech. Tasks such as text and data entry can be completed
by voice in most of the programs without any customization. Other
functions can easily be performed using macros or by speech-enabling
the application using a software development kit.
Avoid RSIs
Musculoskeletal disorders (MSDs), including RSIs, are the single
largest category of job-related injury in the United States. According to the
Occupational Safety and Health Administration (OSHA), 1.8 million
U.S. workers experience work-related MSDs annually.(6)
RSIs, which are often incurred by employees working at computers, are the most common MSD. RSIs occur when muscles or tendons are repeatedly overused or forced into an unnatural position. Keyboarding, clicking, and maneuvering the mouse strain and damage muscles and tendons in the fingers, hands, wrists, and arms.
The widespread use of computers in the workplace has contributed to the ubiquity of RSI pain and discomfort. OSHA has identified repetition, such as using a keyboard and/or mouse steadily for more than four hours daily, as a risk factor that could cause an RSI or MSD. “Intensive computer use accounts for a significant number of MSDs each year, and occupational computer use is growing,” according to OSHA reports.(7)
While most RSI sufferers are able to find appropriate treatment and return to their positions, workers with severe cases can become permanently disabled, unable to use their hands to operate a computer or to return to their jobs.
Speech-recognition software can minimize or eliminate the keyboarding and mouse movements that, through excessive repetition, damage and strain muscles, tendons, and nerves. By giving employees who use computers intensively access to speech-recognition software, you can prevent an injury before problems arise or help employees return to work sooner, reducing workers’ compensation, medical, and replacement labor costs. A recent study on RSIs in the workplace puts the average cost of this type of injury at $20,000 per affected employee.
Assisting with ADA Compliance Strategies
Title I of the Americans with Disabilities Act (ADA) of 1990 prohibits
employers from discriminating against qualified individuals with
disabilities. The workforce includes many qualified individuals
with disabilities who can productively use computers when equipped
with speech-recognition software and supporting hardware and software.
Hiring and retaining qualified workers with disabilities is not
only a smart employment practice for most employers, it’s
the law.
Since speech-recognition software can help employers hire and retain qualified workers with RSIs and other disabilities, this technology plays an important role in employers’ ADA compliance strategies.(8)
Return on Investment
Speech-recognition software can help healthcare organizations save
a significant amount of money. The benefits can be realized in a
provider as small as a solo practitioner’s office all the
way to a hospital with several hundred doctors and nurses on staff.
Typically, a single doctor or nurse who utilizes an outside transcription service spends between $10,000 and $30,000 per year having dictation transcribed, depending on the individual’s workload. For example, a private practice doctor in San Diego replaced outsourced transcription with a speech-recognition solution and saved more than $10,000 per year by eliminating the need for transcription. In addition, he now has time to see more patients each day because he completes the paperwork for each patient during the visit.
The savings potential in larger organizations can be tremendous. A large medical group in Seattle saved $90,000 the first year it deployed speech recognition and $240,000 the next year as it rolled out the solution to all its doctors and eliminated the need for an in-house transcription staff.
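A back-of-the-envelope calculation along these lines can be sketched in Python. The savings figures are the ones quoted above; the upfront cost is a hypothetical placeholder, since this article does not state what either organization spent, and the helper names are illustrative.

```python
def payback_months(upfront_cost: float, annual_savings: float) -> float:
    """Months of savings needed to recover the upfront cost."""
    return 12 * upfront_cost / annual_savings


def simple_roi(upfront_cost: float, total_savings: float) -> float:
    """Net gain expressed as a multiple of the upfront cost."""
    return (total_savings - upfront_cost) / upfront_cost


# Savings reported for the Seattle medical group over its first two years.
seattle_two_year_savings = 90_000 + 240_000

# Hypothetical cost of software, hardware, and training for a solo practice.
ASSUMED_UPFRONT_COST = 2_500

# Solo practitioner saving $10,000 per year, as in the San Diego example.
solo_payback = payback_months(ASSUMED_UPFRONT_COST, 10_000)
solo_first_year_roi = simple_roi(ASSUMED_UPFRONT_COST, 10_000)

print(f"Seattle two-year savings: ${seattle_two_year_savings:,}")
print(f"Solo practice payback: {solo_payback:.1f} months")
print(f"Solo practice first-year ROI: {solo_first_year_roi:.0%}")
```

Substituting an organization's actual quotes for the assumed cost gives a quick first estimate before a pilot supplies measured numbers.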
Basics of Implementing a Speech-Recognition Solution
Successful implementation of a speech-recognition software program
requires careful attention to hardware, user training, and customization.
Some healthcare organizations manage their own speech recognition
installation, customization, and training, but most prefer to outsource
this work to the software manufacturer, a system integrator, or
a speech recognition value-added reseller (VAR).
Hardware Recommendations
Most organizations develop a standard hardware platform for speech
recognition users, with alternative options for employees who use
speech recognition on a laptop, dictate into a handheld digital
recorder, or have special needs. System requirements for speech-recognition
software vary by software manufacturer. Minimum needs will also
vary by the type and number of applications that users deploy. Most
speech-recognition programs run on PC systems, although some Macintosh-based
products are available.
Although speech-recognition programs will automatically adjust to the processor and memory of your computer to provide the best combination of accuracy and speed possible, most users will be happier with systems that exceed the software manufacturer’s minimum requirements. Speech-recognition software is processor-intensive, and in general, the faster the processor, the better the performance. Users who wish to have multiple applications running at the same time will also benefit from having more RAM on their system than the minimum.
A computer’s sound card is another factor that can affect performance. Speech-recognition programs require a sound card that will accurately process the electrical charges that your voice creates when you speak into the microphone. Static or electrical interference will make it difficult or impossible to achieve good speech recognition accuracy. Because of this, speech programs require a high-quality 16-bit sound card. Check with the software manufacturer to verify which sound cards are certified to work with the program.
The software performance can also be affected by the quality of the microphone. Speech recognition requires a high-quality, high-level speech signal. Noise-canceling microphones help block out high ambient noise levels. Most speech-recognition programs are sold with a high-quality, noise-canceling headset microphone that is specifically tuned to the software. Users who do not like wearing a headset may prefer an array microphone; others may opt for a wireless headset. Combined dictation/telephone headsets are also available. Most laptop users achieve high performance with a regular headset microphone, but users who are unable to achieve satisfactory sound quality from their laptop’s built-in sound hardware may wish to use a USB (universal serial bus) microphone that processes their voice signal before sending it to the computer. Check with the software manufacturer to verify which microphones are certified to work with its program.
User Expectations and Training
Setting realistic expectations has a critical impact on the success
or failure of a speech-recognition program. Although the software
itself is easy to install and operate, users who are not accustomed
to dictating their thoughts may need practice. Most physicians who
are familiar with dictation will find it easy to adopt speech-recognition
software. However, they may be used to mumbling or garbling words
and expecting the transcriptionist to interpret what they are saying.
The quality of the “human sound signal” is just as important
as the sound card’s quality.
Although users can begin dictating and using the software after completing their initial five-minute enrollment session, most people increase their productivity when they receive training. Training speeds the learning curve, instills confidence in users, reduces support costs, promotes the success of a pilot program, and maximizes return on investment.
Program Customization
Users who are dictating documents just for others to transcribe
and correct may not need program customization, but virtually everyone
can benefit from customizing the product to complete routine tasks
faster.
Customization may be as simple as the creation of a macro that inserts your name and title at the end of a letter when you say “my signature” or as complex as a macro that executes a series of keyboard commands and mouse strokes with a spoken command. Macro creation tools are typically included in high-end speech-recognition software systems. Although simple macros are easy for users to create, in most cases firms will achieve better results if an IT (information technology) staff member or a speech recognition consultant works with each user to analyze their workflow and customize the program to their needs.
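Conceptually, a simple text macro is a lookup from a spoken phrase to stored text. The Python sketch below illustrates that idea only; it is not the macro tool of any particular product, and the `MACROS` table, the `expand_macros` helper, and the sample name and department are hypothetical.

```python
# Hypothetical macro table: spoken phrase -> text to insert.
MACROS = {
    "my signature": "Jane Doe, MD\nDepartment of Internal Medicine",
    "normal exam": "Physical examination within normal limits.",
}


def expand_macros(recognized_phrases: list[str]) -> list[str]:
    """Replace any phrase that matches a macro name with its stored text."""
    return [
        MACROS.get(phrase.lower().strip(), phrase)
        for phrase in recognized_phrases
    ]


# Ordinary dictation passes through; the macro phrase is swapped for its text.
letter = expand_macros(["Thank you for the referral.", "My signature"])
print("\n".join(letter))
```

More complex macros replace the stored text with a sequence of actions, which is where help from IT staff or a consultant tends to pay off.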
Creating a custom vocabulary including patient, staff, and other physicians’ names will increase accuracy. Many speech-recognition programs permit custom vocabularies and macros to be exported and shared by multiple users, which decreases the time and cost associated with customization. Individual users can increase their accuracy by running a feature contained in most speech programs that analyzes the user’s written documents to learn their writing style and the words they use most often.
Conducting a Pilot
The majority of healthcare organizations find it valuable to conduct
an on-site evaluation with a small number of users before deploying
a full-scale speech-recognition program. The vendor or a VAR can
help set up a pilot, but it is important that you determine your
own criteria for evaluating productivity and participant satisfaction
before the pilot begins.
For best results, select four to eight computer-savvy employees who want to use speech recognition and are likely to have the time to use the software on a daily basis during the pilot period. A typical pilot, from initial assessment through final evaluation, lasts one to three months. Before the pilot begins, someone from IT or the training department, the vendor, a consultant, or a VAR should sit down with each participant to analyze his or her daily routine. This analysis allows custom vocabularies and macros to be developed to enhance productivity. After the software has been customized for each participant’s needs, group or one-on-one training should be provided.
Conclusion
A growing number of large healthcare organizations, hospitals, clinics,
and solo medical practices have adopted speech-recognition software
programs to increase productivity, reduce costs, and protect against
RSIs. Although implementing a speech-recognition program requires
careful planning, the cost and time savings can be substantial.
— Matt Revis is the senior product marketing manager for dictation products at ScanSoft. He has an MBA from Columbia Business School and has been working in speech technology marketing for five years.
Resources
1. Net words per minute are determined by measuring a person’s
average gross speed in words per minute and subtracting the number
of errors made.
2. How do speech-recognition software programs understand speech?
Speech-recognition software programs are based on statistical probability. The software analyzes an incoming stream of sounds and interprets those sounds as commands and dictation. This process of interpretation is called speech recognition, and its success is measured by the percentage of correct interpretations or recognition accuracy.
The software relies on three sources of information to achieve high recognition accuracy:
• Acoustic model — a mathematical model of the sound patterns used by the speaker’s language.
• Vocabulary — a list of words the program can recognize. Each word in the vocabulary has a text representation and pronunciation.
• Language model — statistical information associated with a vocabulary that describes the likelihood of words and sequences of words occurring in the user’s speech.
When you create and train a user profile, you start with a standard set of models and then customize them for the way you speak (acoustic model) and the way you use words (vocabulary and associated language model). The software employs your customized user files to determine the words you spoke.
3. The quality and type of noise-canceling microphone is a critical success factor in implementing speech recognition.
4. The handheld recorder is typically a digital recorder. Not all recorders work with all speech-recognition software programs. Check with the software manufacturer to confirm whether a recorder is approved for use with its product.
5. Some speech-recognition programs enable users to save their recorded dictation with their text file so they or a third party can correct or edit the file while listening to or periodically checking the original dictation. Check with the software manufacturer to confirm whether this feature is available.
6. OSHA Fact Sheet. Ergonomics By the Numbers.
7. OSHA Ergonomics Program. Federal Register. 2000;65(220):68343.
8. The information contained in this article does not constitute legal advice. If you have any questions regarding the Americans with Disabilities Act or any other law, you should contact a qualified attorney.