Joe Barr



Getting the Mandrake version of ViaVoice running on Red Hat 7.3

Are you a Red Hat 7.3 user wanting ViaVoice Dictation? Go rustle up a copy of Mandrake 8.0 and roll up your sleeves.

(LinuxWorld) — After finishing last week's column, I tinkered and toyed with the commercial version of ViaVoice Dictation for Linux for another day or two. I never got it working correctly. I continue to work on it with help from IBM. At some point I hope to report it can be made to work on current distributions. For now, at least, the better choice is the version bundled with Mandrake's 8.0 PowerPack Edition.

You can still buy Mandrake 8.0, by the way, but only by telephone or fax, and not on the Mandrake Web site. I ordered mine for $35 plus $13 for second-day air. Here is how to get the Mandrake version running on Red Hat 7.3.

  1. Read Volker Kuhlmann's page from start to finish
  2. Uninstall the commercial version of ViaVoice Dictation if present (note the comments about the uninstall on Kuhlmann's page)
  3. From the Mandrake commercial CD No. 2, install:
    1. IBMJava2-JRE-1.3-2.0
    2. ViaVoice_runtime-3.1-0.0
    3. ViaVoice_runtime_US_LangPack-3.1-0.0
    4. ViaVoice_TTS_rtk-5.1-1.2
    5. ViaVoice_Dictation-1.1-0.0
  4. Add JAVA_PATH=/opt/IBMJava2-13/jre to your environment
  5. Apply Volker Kuhlmann's patch
  6. Run vvuser
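
As a rough sketch of steps 3 and 4, the short Python script below prints the install commands in the order listed above, plus a line that sets JAVA_PATH. The package names and the JRE path come from the list; the ".i386.rpm" file suffix and the /mnt/cdrom mount point are assumptions, so check them against what is actually on your CD. It only prints the commands, leaving you to review them and run them as root yourself.

```python
import shlex

# Package names are taken from the list above. The ".i386.rpm" suffix and the
# mount point used below are assumptions; check the real file names on the CD.
PACKAGES = [
    "IBMJava2-JRE-1.3-2.0",
    "ViaVoice_runtime-3.1-0.0",
    "ViaVoice_runtime_US_LangPack-3.1-0.0",
    "ViaVoice_TTS_rtk-5.1-1.2",
    "ViaVoice_Dictation-1.1-0.0",
]


def install_commands(cd_dir, suffix=".i386.rpm"):
    """Return one 'rpm -ivh' command line per package, in the listed order."""
    return [
        "rpm -ivh " + shlex.quote(f"{cd_dir}/{name}{suffix}")
        for name in PACKAGES
    ]


def java_path_line(jre_dir="/opt/IBMJava2-13/jre"):
    """Return a Bourne-shell line that sets JAVA_PATH as in step 4."""
    return f"export JAVA_PATH={shlex.quote(jre_dir)}"


if __name__ == "__main__":
    for cmd in install_commands("/mnt/cdrom"):  # hypothetical mount point
        print(cmd)
    print(java_path_line())
```

The export line can go in your shell startup file so that JAVA_PATH is set whenever you run ViaVoice Dictation.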

Now you should be where an intelligently packaged product would be after installation: ready to start using the product. If you're not, go back over the entire process above, making sure each step is completed correctly before moving to the next.

Once you've finished the enrollment process by running vvstartuserguru, you have created a model for ViaVoice Dictation to use in deciphering and transcribing your spoken word. The model is only as accurate and complete as the training session itself, both in the quality of your reading and the word usage in the training material. Make an error in punctuation, and you have just taught ViaVoice Dictation how to do something wrong.

The software is not the only thing facing a learning curve. You are too, unless it is already natural for you to pause and speak the punctuation marks that context usually provides. Inflection may make the words "First time for you" a question when you're in line at a parachute jump school, but not with ViaVoice Dictation. You need to say "First time for you {pause} Question mark {pause}" to get that same meaning in your text.
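
To make the idea concrete, here is a toy Python sketch of how pause-separated speech segments might be turned into punctuated text. It is not how ViaVoice Dictation works internally. Apart from "Question mark", the entries in the mapping are hypothetical placeholders rather than the product's real vocabulary.

```python
# Hypothetical mapping for illustration only; apart from "question mark",
# these entries are not taken from the ViaVoice documentation.
SPOKEN_PUNCTUATION = {
    "question mark": "?",
    "period": ".",
    "comma": ",",
    "exclamation point": "!",
}


def transcribe(segments):
    """Join pause-separated segments, turning spoken punctuation into symbols."""
    out = ""
    for segment in segments:
        key = segment.strip().lower()
        if key in SPOKEN_PUNCTUATION:
            # Attach the symbol directly to the preceding word.
            out = out.rstrip() + SPOKEN_PUNCTUATION[key]
        else:
            out = (out + " " + segment.strip()).strip()
    return out


# Each list item stands for what was said between two pauses.
print(transcribe(["First time for you", "Question mark"]))
```

Without the pause, "question mark" would be part of the first segment and would come out as ordinary words, which is the mistake the pauses are meant to prevent.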

There is much more you can say while dictating than the text you want transcribed and its punctuation marks. There are hundreds of commands for moving around in a document or formatting text. There are so many, in fact, that a special window (the "What Can I Say" window) exists to itemize them for you, either in context or all at once. Your choice.

Commands are spoken as discrete rather than continuous speech. This means simply that you need to pause before and after each one. Otherwise ViaVoice Dictation might mistake your command for text. There are too many commands to itemize here, but I think it is safe to say that if you can do it from a keyboard you can do it from the microphone: cut, find, correct, paste, insert, delete, capitalize, or spell out a word. You can also move the cursor to the next or previous word, line, paragraph, or page. Or even turn the microphone off while you talk on the telephone.

The "What Can I Say" function is one of the things that I never got working with the commercial version. That really makes it much more difficult to learn how to use ViaVoice Dictation efficiently.

Accuracy, accuracy, accuracy!

Naturally, you want the accuracy of ViaVoice Dictation to improve over the starting point achieved through the initial training session. An IBM employee told me on the ViaVoice Dictation mailing list that repeating the training session — even if you select a different text to read — is not necessarily the best way to improve ViaVoice Dictation's accuracy.

Instead, he suggested that correcting errors in dictation is the best way to improve performance. He pointed out that Correction always updates the language model, which helps predict the probability of one word following another, but sometimes it updates the acoustic model as well. If you are prompted to record a new word or phrase while correcting text, the acoustic model is being updated.
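
For readers wondering what "the probability of one word following another" looks like in practice, here is a minimal Python sketch of a word-pair (bigram) counter. It is a toy model written for illustration, not IBM's implementation, and the class and method names are invented for this example. It shows only the language-model half of the story; the acoustic model is not represented.

```python
from collections import defaultdict


class BigramModel:
    """Toy language model: counts how often one word follows another."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def learn(self, text):
        """Add every adjacent word pair in the text to the counts."""
        words = text.lower().split()
        for prev, nxt in zip(words, words[1:]):
            self.counts[prev][nxt] += 1

    def probability(self, prev, nxt):
        """Estimate P(nxt | prev) from the counts seen so far."""
        followers = self.counts.get(prev.lower())
        if not followers:
            return 0.0
        return followers.get(nxt.lower(), 0) / sum(followers.values())


model = BigramModel()
model.learn("send a note to aunt nadine")
model.learn("send a note to aunt nadia")
before = model.probability("aunt", "nadine")

# A correction feeds the right wording back into the model.
model.learn("a note to aunt nadine")
after = model.probability("aunt", "nadine")
print(before, after)
```

After the correction, "nadine" is the more likely word to follow "aunt", which is the kind of shift that makes the same error less likely next time.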

To enter the Correction mode, you can click on Correction from the toolbar or you can simply pause and say Correction. This brings up the Correction popup window you see below. From there, you have several options to help you bring what's been transcribed more in line with what you've said.

Correction Window

Simply put the cursor over the word (or highlight the phrase) that you want to correct. From the toolbar at the bottom of the Correction window, you can format the word by capitalizing, making it all upper case, or all lower case, or spelling it out. A list of alternative possibilities for the selected word/phrase also appears in the Correction window. You can make the correction simply by clicking on the correct choice, if it is present. If not, you can speak the correction.

If all else fails, you can type the word in the Correction window to replace the selected text. This is where you might be asked to speak the word so that it can be added to the acoustic model. Once it is added, the chances of it being misinterpreted again are much lower than before.

Yes, correction is time-consuming and — at first at least — slower than whatever methods you've used before. Nevertheless, it works, and it works by improving the model so you won't have to be making the same corrections time and time again. Please note that if you use a specialized vocabulary in one area of your life, it may be worthwhile to create a separate speech model. Transcribing surgical reports takes a whole 'nother vocabulary than does a note to Aunt Nadine.

By default, ViaVoice Dictation saves your documents in a form that contains both the audio and the text. This can eat up a lot of real estate on your drive. Once you are finished with a document, you can save it as text only and get rid of the large overhead required to store the audio. Of course, when you do that, you lose the ability to play it back in your own voice.

Unless and until IBM commits to supporting ViaVoice Dictation for Linux, I cannot recommend it. It could be a very nice product offering for Linux users if IBM brought it up to snuff and on par with IBM's new Windows and Mac OS X versions, but my gut feeling is that IBM will simply drop the Linux version. I hope I'm wrong.

If you are a committed Linux user running an RPM-based distribution with a need for ViaVoice Dictation, I recommend that you beg, borrow, or steal a copy of the Mandrake PowerPack Edition 8.0 instead of trying the IBM commercial offering.

Joe Barr is a freelance journalist covering Linux, open source and network security. His 'Version Control' column has been a regular feature of Linux.SYS-CON.com since its inception. As far as we know, he is the only living journalist whose works have appeared both in phrack, the legendary underground zine, and IBM Personal Systems Magazine.
