
April 25, 2005

The Case for Speech Recognition
By Matt Revis
For The Record

Vol. 17 No. 9 P. 20

A sturdy hardware platform, a comprehensive pilot program, and professional training can help ensure a tidy return on investment.

Traditionally viewed as simply a means of dictating text into a personal computer, today’s speech-recognition software can play a far more significant role in the healthcare environment. In addition to pure dictation, speech-recognition software can be used to manage e-mail, streamline repetitive tasks on the PC, reduce transcription and charting costs, speed up information turnaround, and protect employees from repetitive stress injuries (RSIs).

The software can be integrated with most electronic medical record (EMR) applications to make those programs more effective and easier to use. Rapid hardware advancements and improvements in the technology itself have increased its utility, accuracy, speed, and ease of use. These gains have brought the cost of ownership to an affordable level for medical offices and clinics of any size, medical departments within healthcare organizations, and even entire hospitals. When properly implemented, speech-recognition software can increase productivity for every employee who works with a computer.

As with any technology, the deployment of a speech-recognition program should be carefully planned to achieve the full benefit of the software and maximize the return on investment. This article provides an overview of how the software works, what medical offices or departments can do with speech recognition, examples of savings, and recommendations for implementation.

Why Use Speech-Recognition Software?
The reason is simple—most people can speak much faster than they can type. A relatively fast typist who can type 50 net words per minute (1) can produce a 300-word e-mail in six minutes. Using speech-recognition software, a person dictating 140 to 160 words per minute without any errors can produce the same 300-word e-mail in roughly two minutes. This does not include the additional time the person can save by using voice commands to open the e-mail program, look up an e-mail address from their contact management software programs, and send the e-mail by voice.
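The arithmetic behind this comparison can be sketched in a few lines of Python. The function name is hypothetical, and the 150-words-per-minute figure is simply the midpoint of the 140-to-160 range quoted above, chosen for illustration.

```python
def minutes_to_produce(word_count, words_per_minute):
    """Return the minutes needed to produce word_count words at a steady rate."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    return word_count / words_per_minute


EMAIL_WORDS = 300  # length of the example e-mail

typing_minutes = minutes_to_produce(EMAIL_WORDS, 50)      # 50 net words per minute by keyboard
dictation_minutes = minutes_to_produce(EMAIL_WORDS, 150)  # assumed midpoint of 140 to 160 by voice

print(f"Typing: {typing_minutes:.1f} min, dictation: {dictation_minutes:.1f} min")
```

At these rates the typed e-mail takes six minutes and the dictated one takes two, which matches the figures above. The sketch ignores correction time, so it describes the error-free case only.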

How Does Speech-Recognition Software Work?
Speech-recognition software uses the human voice as the main communication mechanism between the user and the computer. While relatively simple to use, speech-recognition software is sophisticated technology that uses “language modeling” to recognize and differentiate among the millions of human utterances that make up any language.(2)

The software enables users to input text and data into virtually any Microsoft Windows-based application by voice, as well as to navigate the computer desktop with little or no use of their hands. Users speak naturally into a noise-canceling microphone connected to the computer.(3) The software “recognizes” the spoken words, converts them into text, and displays them on the screen for review.

Most speech-recognition programs also allow users to speak a standard command that prompts the computer to perform an action. For example, the user says, “Start WordPerfect.” The more advanced speech-recognition programs also enable users to create customized commands (macros), such as “Send an e-mail to Doug Z,” which will open an e-mail addressed to Doug Z.

Configuring the software during set-up is referred to as “enrollment.” After installing the program, each user must read aloud from a choice of prepared texts for approximately five minutes. Based on the dictation the application captures, the software analyzes how the user pronounces each word and stores the data to prepare a unique user profile for that individual.

As an individual uses the software and corrects recognition errors, the software becomes increasingly accurate by learning his or her particular speaking style. Most medical recognition programs enable users to add new words or customize the vocabulary for their particular practice or specialty. Using specialty vocabularies can improve accuracy even further. Some speech-recognition software programs include a medical vocabulary—incorporating diseases, medications, procedures, and acronyms in addition to the standard business vocabulary—and can automatically recognize and format prescriptions and patient encounters. For certain programs, specialty medical vocabularies can also be created in-house or purchased from third-party sources.

How is Speech Recognition Used to Replace Traditional Transcription?
There are many different ways to implement a speech-recognition solution. Most people choose to have individuals dictate directly into their own PC and view the transcription as it occurs to correct any errors. Another method allows users to dictate into a handheld digital recorder for the user or an assistant to download onto a PC. Instead of transcribing from scratch, the assistant will download the audio file, listen to the recorded dictation while reading the text on screen, and make corrections or edits as necessary.

Speech Recognition Uses
Many different types of healthcare workers can benefit from using speech recognition. Individual uses for speech recognition can vary for each employee depending on their responsibilities, workflow, preferences, and other applications they use as part of their daily routine. Today, speech recognition is successfully used by a wide array of healthcare professionals, including doctors, nurses, physician assistants, pharmacists, administrators, and transcriptionists.

Dictation is the most versatile and widespread use for speech-recognition software. Some individuals can’t or prefer not to type because they are untrained as typists, have a disability, or wish to prevent RSIs. Many practices have decreased the number of support staff and require physicians to generate their own records. Even doctors who typically dictate documents for others to transcribe may use speech recognition occasionally, such as when they need to produce a document on the spot or after hours or when they are responding to e-mail.

Doctors who wish to maintain their traditional workflow can dictate into a handheld recorder (4) or save their recorded dictation (5) with their documents for someone else to transcribe or correct at a later time. This can substantially reduce the turnaround time over traditional transcription. If transcription is produced in-house, using speech-recognition software frees up support staff for more productive tasks. If transcription is outsourced for correction, it can significantly reduce an organization’s overhead costs.

Navigate the Windows Desktop by Voice
Speech-recognition software enables users to “command and control” the computer desktop simply by using their voice. Virtually any menu item or dialog box can be controlled for hands-free operation. Users can edit and format their work, launch applications and open files, cut and paste, and insert standard blocks of text or even their scanned signature.

Create, Manage, and Send e-Mail
Managing e-mail takes up an increasing amount of everyone’s day. Speech-recognition software can be customized so users can create, navigate, respond to, and send e-mail, all by voice, using their preferred e-mail program. In addition, some speech-recognition programs contain text-to-speech technology that allows users to have their e-mail documents read aloud, which enables them to complete other tasks while reviewing their e-mail.

Mastering the Mundane
Repetitive tasks, such as data entry or form filling, can be accelerated using speech. In many cases, users who are unfamiliar with complex software programs are more comfortable “telling” the computer what to do than trying to master the interface. Macros can be created to enable users to go from field to field by voice, or to perform a sequence of keystrokes or mouse movements. The software can even be configured so a patient’s EMR can be created and edited using only voice commands.

Create a Paperless Office
Many practices seek to convert all their paper documents into electronic files to facilitate a secure archive and provide remote access to staff or patients. Most Windows-based applications can be navigated by voice using speech-recognition software. The software can help facilitate the move to a paperless office by making it easier for anyone to create, format, dictate into, search, and manage electronic documents by voice.

Increase Productivity Outside the Office
Healthcare professionals can increase their productivity during travel time or whenever they are away from the office by dictating into a portable handheld recorder for transcription later. In addition, some software programs enable users to easily export their user file via the network or portable storage device for use on another computer or laptop so they can use speech recognition anywhere—at the office, at home, or even on the road.

Work on the Web by Voice
Speech-recognition software enables users to search the Web, access information, and navigate Web pages by speaking URLs and links.

EMR Applications
Many EMR applications can be more effective and easier to use when deployed in conjunction with a speech-recognition solution. Searches, queries, and form filling are all faster to perform by voice than with a keyboard. Charting, prescription writing, aftercare instructions, order entry, database searches, document assembly/automation, and patient record management software programs are all highly conducive to control by speech. Tasks such as text and data entry can be completed by voice in most of the programs without any customization. Other functions can easily be performed using macros or by speech-enabling the application using a software development kit.

Avoid RSIs
Musculoskeletal disorders (MSDs), including RSIs, are the single largest category of job-related injury in the United States. According to the Occupational Safety and Health Administration (OSHA), 1.8 million U.S. workers experience work-related MSDs annually.(6)

RSIs, which are often incurred by employees working at computers, are the most common MSD. RSIs occur when muscles or tendons are repeatedly overused or forced into an unnatural position. Keyboarding, clicking, and maneuvering the mouse strain and damage muscles and tendons in the fingers, hands, wrists, and arms.

The widespread use of computers in the workplace has contributed to the ubiquity of RSI pain and discomfort. OSHA has identified repetition, such as using a keyboard and/or mouse steadily for more than four hours daily, as a risk factor that could cause an RSI or MSD. “Intensive computer use accounts for a significant number of MSDs each year, and occupational computer use is growing,” according to OSHA reports.(7)

While most RSI sufferers are able to find appropriate treatment and return to their positions, workers with severe MSDs often face permanent disability that prevents them from returning to their jobs. Some are never able to use their hands to operate a computer again.

Speech-recognition software can minimize or eliminate the keyboarding and mouse movements that damage and strain muscles, tendons, and nerves through excessive repetition. By giving employees who use computers intensively access to speech-recognition software, you can prevent an injury before problems arise or help employees return to work sooner, reducing workers’ compensation, medical, and replacement labor costs. A recent study on RSIs in the workplace puts the average cost of this type of injury at $20,000 per affected employee.

Assisting with ADA Compliance Strategies
Title I of the Americans with Disabilities Act (ADA) of 1990 prohibits employers from discriminating against qualified individuals with disabilities. The workforce includes many qualified individuals with disabilities who can productively use computers when equipped with speech-recognition software and supporting hardware and software. Hiring and retaining qualified workers with disabilities is not only a smart employment practice for most employers, it’s the law.

Because speech-recognition software can help employers hire and retain qualified workers with RSIs and other disabilities, this technology plays an important role in employers’ ADA compliance strategies.(8)

Return on Investment
Speech-recognition software can help healthcare organizations save a significant amount of money. The benefits can be realized in a provider as small as a solo practitioner’s office all the way to a hospital with several hundred doctors and nurses on staff.

Typically, a single doctor or nurse who uses an outside transcription service spends between $10,000 and $30,000 per year to have dictation transcribed, depending on the individual’s workload. For example, a private practice doctor in San Diego replaced outsourced transcription with a speech-recognition solution and saved more than $10,000 per year. In addition, he now has time to see more patients each day because he completes the paperwork for each patient during the visit.

The savings potential in larger organizations can be tremendous. A large medical group in Seattle saved $90,000 the first year it deployed speech recognition and $240,000 the next year as it rolled out the solution to all its doctors and eliminated the need for an in-house transcription staff.

Basics of Implementing a Speech-Recognition Solution
Successful implementation of a speech-recognition software program requires careful attention to hardware, user training, and customization. Some healthcare organizations manage their own speech recognition installation, customization, and training, but most prefer to outsource this work to the software manufacturer, a system integrator, or a speech recognition value-added reseller (VAR).

Hardware Recommendations
Most organizations develop a standard hardware platform for speech recognition users, with alternative options for employees who use speech recognition on a laptop, dictate into a handheld digital recorder, or have special needs. System requirements for speech-recognition software vary by software manufacturer. Minimum needs will also vary by the type and number of applications that users deploy. Most speech-recognition programs run on PC systems, although some Macintosh-based products are available.

Although speech-recognition programs will automatically adjust to the processor and memory of your computer to provide the best combination of accuracy and speed possible, most users will be happier with systems that exceed the software manufacturer’s minimum requirements. Speech-recognition software is processor-intensive, and in general, the faster the processor, the better the performance. Users who wish to have multiple applications running at the same time will also benefit from having more RAM on their system than the minimum.

A computer’s sound card is another factor that can affect performance. Speech-recognition programs require a sound card that will accurately process the electrical charges that your voice creates when you speak into the microphone. Static or electrical interference will make it difficult or impossible to achieve good speech recognition accuracy. Because of this, speech programs require a high-quality 16-bit sound card. Check with the software manufacturer to verify which sound cards are certified to work with the program.

The software performance can also be affected by the quality of the microphone. Speech recognition requires a high-quality, high-level speech signal. Noise-canceling microphones help block out high ambient noise levels. Most speech-recognition programs are sold with a high-quality, noise-canceling headset microphone that is specifically tuned to the software. Users who do not like wearing a headset may prefer an array microphone; others may opt for a wireless headset. Combined dictation/telephone headsets are also available. Most laptop users achieve high performance with a regular headset microphone, but users who are unable to achieve satisfactory sound quality from their laptop’s built-in sound hardware may wish to use a USB (universal serial bus) microphone that processes their voice signal before sending it to the computer. Check with the software manufacturer to verify which microphones are certified to work with its program.

User Expectations and Training
Setting realistic expectations has a critical impact on the success or failure of a speech-recognition program. Although the software itself is easy to install and operate, users who are not accustomed to dictating their thoughts may need practice. Most physicians who are familiar with dictation will find it easy to adopt speech-recognition software. However, they may be used to mumbling or garbling words and expecting the transcriptionist to interpret what they are saying. The quality of the “human sound signal” is just as important as the sound card’s quality.

Although users can begin dictating and using the software after completing their initial five-minute enrollment session, most people increase their productivity when they receive training. Training speeds the learning curve, instills confidence in users, reduces support costs, promotes the success of a pilot program, and maximizes return on investment.

Program Customization
Users who are dictating documents just for others to transcribe and correct may not need program customization, but virtually everyone can benefit from customizing the product to complete routine tasks faster.

Customization may be as simple as the creation of a macro that inserts your name and title at the end of a letter when you say “my signature” or as complex as a macro that executes a series of keystrokes and mouse movements with a spoken command. Macro creation tools are typically included in high-end speech-recognition software systems. Although simple macros are easy for users to create, in most cases firms will achieve better results if an IT (information technology) staff member or a speech recognition consultant works with each user to analyze their workflow and customize the program to their needs.
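Conceptually, a simple text macro is a lookup from a spoken phrase to an action. The Python sketch below illustrates the idea only. The names, the sample signature, and the dispatch logic are hypothetical and do not reflect the macro tools of any particular speech-recognition product.

```python
def insert_signature(document):
    """Append a fixed name and title block to the end of the document."""
    return document + "\n\nJane Doe, MD\nChief of Medicine"  # hypothetical signature


# Hypothetical macro table: recognized phrase -> action applied to the document
MACROS = {"my signature": insert_signature}


def apply_utterance(document, utterance):
    """Run a macro if the utterance names one; otherwise treat it as dictated text."""
    action = MACROS.get(utterance.strip().lower())
    if action is not None:
        return action(document)
    return f"{document} {utterance}" if document else utterance


letter = apply_utterance("", "Thank you for the referral.")
letter = apply_utterance(letter, "My signature")
print(letter)
```

Here the first utterance is entered as ordinary dictation, while the second matches a macro name and triggers the signature block instead of being typed out.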

Creating a custom vocabulary including patient, staff, and other physicians’ names will increase accuracy. Many speech-recognition programs permit custom vocabularies and macros to be exported and shared by multiple users, which decreases the time and cost associated with customization. Individual users can increase their accuracy by running a feature contained in most speech programs that analyzes the user’s written documents to learn their writing style and the words they use most often.

Conducting a Pilot
Most healthcare organizations find it valuable to conduct an on-site evaluation with a small number of users before deploying a full-scale speech-recognition program. The vendor or a VAR can help set up a pilot, but it is important that you determine your own criteria for evaluating productivity and participant satisfaction before the pilot begins.

For best results, select four to eight computer-savvy employees who want to use speech recognition and are likely to have the time to use the software on a daily basis during the pilot period. A typical pilot, from initial assessment through final evaluation, lasts one to three months. Before the pilot begins, someone from IT or the training department, the vendor, consultant, or VAR should sit down with each participant to analyze his or her daily routine. Doing so allows custom vocabularies and macros to be developed to enhance productivity. After the software has been customized for each participant’s needs, group or one-on-one training should be provided.

Conclusion
A growing number of large healthcare organizations, hospitals, clinics, and solo medical practices have adopted speech-recognition software programs to increase productivity, reduce costs, and protect against RSIs. Although implementing a speech-recognition program requires careful planning, the cost and time savings can be substantial.

— Matt Revis is the senior product marketing manager for dictation products at ScanSoft. He has an MBA from Columbia Business School and has been working in speech technology marketing for five years.

Resources
1. Net words per minute are determined by measuring a person’s average gross speed in words per minute and subtracting the number of errors made.
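As a rough illustration of this definition, the calculation might be sketched in Python as follows. The function name and the sample test figures are hypothetical, and the sketch assumes errors are averaged per minute over the timed test, which is one common reading of “subtracting the number of errors made.”

```python
def net_words_per_minute(words_typed, errors, minutes):
    """Return gross speed minus the error rate, both measured per minute."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    gross_wpm = words_typed / minutes
    errors_per_minute = errors / minutes  # assumption: errors are averaged per minute
    return max(gross_wpm - errors_per_minute, 0.0)


# Hypothetical five-minute test: 275 words typed with 25 errors
net_speed = net_words_per_minute(275, 25, 5)
print(net_speed)
```

In this example, a gross speed of 55 words per minute less 5 errors per minute gives the 50 net words per minute used in the article’s typing comparison.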

2. How do speech-recognition software programs understand speech?

Speech-recognition software programs are based on statistical probability. The software analyzes an incoming stream of sounds and interprets those sounds as commands and dictation. This process of interpretation is called speech recognition, and its success is measured by the percentage of correct interpretations or recognition accuracy.

The software relies on three sources of information to achieve high recognition accuracy:

Acoustic model — a mathematical model of the sound patterns used by the speaker’s language.

Vocabulary — a list of words the program can recognize. Each word in the vocabulary has a text representation and pronunciation.

Language model — statistical information associated with a vocabulary that describes the likelihood of words and sequences of words occurring in the user’s speech.

When you create and train a user profile, you start with a standard set of models and then customize them for the way you speak (acoustic model) and the way you use words (vocabulary and associated language model). The software employs your customized user files to determine the words you spoke.
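The way these three sources combine can be sketched with a toy example in Python. This is a deliberately simplified illustration, not how any commercial product is implemented. The sample text, the acoustic scores, and the function names are hypothetical, and a simple word-pair (bigram) count stands in for the language model.

```python
from collections import Counter

# Hypothetical sample text standing in for documents the user has written
corpus = ("the patient has a fever the patient has a cough "
          "the patient was seen today").split()

vocabulary = set(corpus)                          # words the program can recognize
unigram_counts = Counter(corpus)                  # how often each word occurs
bigram_counts = Counter(zip(corpus, corpus[1:]))  # how often each word pair occurs


def bigram_probability(previous_word, word):
    """Estimate P(word | previous_word) with add-one smoothing."""
    return (bigram_counts[(previous_word, word)] + 1) / (
        unigram_counts[previous_word] + len(vocabulary)
    )


def pick_word(previous_word, acoustic_scores):
    """Choose the in-vocabulary candidate with the best combined score."""
    candidates = {w: s for w, s in acoustic_scores.items() if w in vocabulary}
    return max(
        candidates,
        key=lambda w: candidates[w] * bigram_probability(previous_word, w),
    )


# Hypothetical acoustic scores for three similar-sounding candidates
best = pick_word("patient", {"has": 0.45, "was": 0.55, "haze": 0.60})
print(best)
```

In this example “haze” is discarded because it is not in the vocabulary, and “has” wins over “was” despite a slightly lower acoustic score because it follows “patient” more often in the sample text. That is the sense in which the language model captures the way a user uses words.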

3. The quality and type of noise-canceling microphone is a critical success factor in implementing speech recognition.

4. The handheld recorder is typically a digital recorder. Not all recorders work with all speech-recognition software programs. Check with the software manufacturer to confirm whether a recorder is approved for use with their product.

5. Some speech-recognition programs enable users to save their recorded dictation with their text file so they or a third party can correct or edit the file while listening to or periodically checking the original dictation. Check with the software manufacturer to confirm whether this feature is available.

6. OSHA Fact Sheet. Ergonomics By the Numbers.

7. OSHA Ergonomics Program. Federal Register. 2000;65(220):68343.

8. The information contained in this article does not constitute legal advice. If you have any questions regarding the Americans with Disabilities Act or any other law, you should contact a qualified attorney.