Listen to me Write!


Tag Archives: speech recognition

A Small Contribution for Nonexpert Speech Recognition Users

Sorry that I have not been posting any updates. I have been having a horrible time due to a flare-up of my RSI injury and the subsequent medication and adjustment. At present I'm on indefinite leave from my university studies and I just don't know when I will be able to continue them. The combination of pain and sedation really doesn't give me much chance of study and research.

I have just found a very interesting post in the forums at SpeechComputing.com.

Let me share the full post from this contributor, as he has quite a story to tell which in many respects is similar to my situation and to where I'm headed…

A Small Contribution for Nonexpert Speech Recognition Users

via A Small Contribution for Nonexpert Speech Recognition Users | Speech Computing.

Below you will find my personal survival guide for navigating personal computers through speech recognition. I have compiled it over years as some sort of personal blog, taking note of useful software and tricks as they came along.

I am posting it in the hope that other folks who are suddenly forced to abandon using keyboard and mouse will realize that there is hope, and so others may benefit from little tricks that took me forever to figure out. A lot of these are available elsewhere online, but I thought it might be useful to collect them together so that they are easily available for new or less experienced users. Some of these I came up with myself, although I would not be surprised if others before me have also documented them.

Wherever possible I have tried to link to the original source of the helpful material. I am grateful to the speech recognition user community for their active and useful presence online. The various topics are presented in order of importance in my opinion. I will not be able to maintain and update this regularly, but I do plan to continue collecting interesting hints and tips, and if these additions reach critical mass I will try my best to repost.

My personal story is that as a result of round-the-clock coding since a very young age I am no longer able to use my hands to control a keyboard, mouse, iPhone/iPad, etc. So, I am forced to rely on speech recognition exclusively. The positive message that I would like to convey is that if you invest in conquering the admittedly very steep learning curve, you will be able to do the vast majority of the things that you need on a desktop, and even be faster at some of them. A top-of-the-line machine with all the necessary software should cost you less than $3000 and if your employer will not cover this cost you might be able to get financial assistance elsewhere.

I have no commercial interests of any kind in any of the programs or suggestions mentioned below.

You will note that some of the useful tricks below rely on free third-party software. To the extent that you can, please donate to the authors of the software.

Good luck to everyone!

https://docs.google.com/document/d/11igSgg23TKOunwB7MilWdyUmazJlrbgT6cdq9JwjS2s/pub

Update: I did ask the author for permission to post this here and his response was:

that’s no problem at all. I hope you find it helpful, and good luck.
Submitted by DragonSpeechRookie on Tue, 03/19/2013 – 02:27.

Zotero Forums – accessibility: voice-recognition

In trying to get Zotero working with Word 2003 I've been researching what others have said and done about it. Here is a long, and at times rambling, thread relating to Dragon NaturallySpeaking and Zotero. Note that this thread originally started in 2009, so it discusses older versions of both programs.

Zotero Forums – accessibility: voice-recognition.

NatLink – what is it to Dragon NaturallySpeaking?

So the question I had was: just what is this NatLink? I have installed it and it seems to work well, but what exactly is it?

I found this amongst the messages on SpeechComputing.com:

NatLink is a platform built on top of DNS that allows writing extremely powerful voice commands (more powerful than what you can do with Advanced Scripting) by writing entire Python programs. It is pretty much unusable directly unless you're a programmer.

Building on top of NatLink are:

Vocola 2: implements a very simple and concise language for writing voice commands that handles 95% of the commands you might want.

Unimacro: a series of ready-to-use powerful grammars for things like switching tasks, opening folders, and editing lines.

Dragonfly: a higher-level, more object-oriented interface to NatLink. Somewhat usable by nonprogrammers using cut-and-paste programming.

Both Vocola and Dragonfly can be used with Windows Speech Recognition as well. There is a somewhat dated comparison between Vocola 2 and Unimacro at

http://qh.antenna.nl/unimacro/features/unimacroand…

that you may find useful. Note that you can call Unimacro actions from Vocola 2 if you have both installed.

via Natlink, Python, etc. please outline the features of these addons-they sound useful | Speech Computing.
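
To give you a rough idea of what Dragonfly looks like in practice, here is a tiny sketch of my own (the spoken phrases "insert greeting" and "save that" are made-up examples for illustration, not commands from any of the packages above):

```python
# A minimal Dragonfly sketch (my own example, not taken from the projects
# quoted above). It maps two made-up spoken phrases to simple actions.
from dragonfly import Grammar, MappingRule, Text, Key

class ExampleRule(MappingRule):
    mapping = {
        "insert greeting": Text("Hello, world!"),  # types the text
        "save that": Key("c-s"),                   # presses Ctrl+S
    }

# A grammar groups rules and registers them with the speech engine
# (Dragon via NatLink, or Windows Speech Recognition).
grammar = Grammar("example commands")
grammar.add_rule(ExampleRule())
grammar.load()
```

As the quote says, Vocola 2 gets you this sort of command with far less ceremony, which is probably where nonprogrammers should start.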

Copying Dragon to another computer

Last night I copied the user profile over from my home PC to this laptop, then installed Dragon here. This morning I ran Dragon and tried to install my profile, but it spat the dummy and wouldn't recognise it. Back to the drawing board to see if I can find another way to get this copied system working the same as my home PC.

Always such fun working with programs, isn't it? Sigh.

New blog on the future of speech applications | Speech Computing

New blog on the future of speech applications | Speech Computing.

New Blog on the Future of Speech Applications

AVIOS announces a new blog devoted to the future of applications using speech technologies.

The Applied Voice Input Output Society (AVIOS) announces a new blog devoted to the future of applications using speech technologies. Each week a new article will be written and posted by a speech technology expert. The public is invited to submit questions, comments, observations, and additional predictions on each weekly topic. The blog can be viewed by setting your browser to http://www.avios.org.

What will the world of speech technology look like in the next five years? Weekly articles will include:

· Accessing In-depth information by voice: City Companion by Deborah Dahl

· Multimodal User Interfaces by Matt Yuschik, CitiBank CTO R&D

· Speech-enabled owner’s manual for the car by Tom Schalk

· Language learning by Bill Scholz, President, AVIOS

· “Do-it-yourself” apps by James A. Larson, Co-Program Chair, SpeechTEK

· Stress detection using speech analysis by Nava A. Shaked, CEO, BBT Ltd.

· Speech analytical tools by Loren Wilde, CTO, Wilder Communications, Inc.

· The dream of a personal assistant by Roberto Pieraccini, Director and CEO, The International Computer Science Institute (ICSI)

· The future of spoken language interaction with computers by Alexander Rudnicky, Carnegie Mellon University

· Making speech-based interaction truly natural by Sara H. Basson, Worldwide Program Director – Services Innovation Lab IBM – TJ Watson Research Center

Visit the blog each week to read about and participate in discussions about future applications of speech technology.

The Applied Voice Input Output Society (AVIOS) is a not-for-profit private foundation founded in 1981. AVIOS provides a forum for promoting practical applications of advanced speech technology, such as speech recognition, text-to-speech synthesis, and speaker authentication, along with supporting technologies such as natural language interpretation and knowledge representation.

Chasing the Tale of the Dragon

The theme of today was very similar to yesterday: plenty of pain, not much work.

Earlier today I rang the Disability Office at Curtin University to seek help with Dragon NaturallySpeaking. I have been getting all sorts of bad responses when trying to use Dragon to navigate the Internet through my Firefox browser, and I asked if they had somebody who could help me sort it out. Thankfully there was somebody able to help; he rang me back later in the day and we set a time for him to take over my machine (using TeamViewer 8) and correct the default settings of Dragon back to something more usable and user-friendly. We worked together for almost an hour and now I can use Dragon inside a browser with a great deal more success.

Certainly the time spent mousing has been cut down considerably, and there has been a similar reduction in the frustration I had been going through. He also had me go through a training exercise where I read the text Dragon presented on the screen, so that it could process the way that I speak and improve the accuracy of its responses. I'm still getting some silly answers, but most of the time it is picking up what I want to say.

I also managed to get some interesting works from the Elizabeth branch of the Playford City Library today. They had been on hold through the Salisbury library, but because all of the South Australian libraries are linked I was able to arrange to pick them up at the Elizabeth library, or officially the Playford City Library.

As I'm doing Assignment 2 on wikis and Wikipedia, I found these to be interesting reference sources:
"Good Faith Collaboration: The Culture of Wikipedia" by Joseph Michael Reagle Jr. ISBN 978-0-262-01447-2
"Wiki: Web Collaboration" by Anja Ebersbach, Markus Glaser, Richard Heigl, Alexander Warta. ISBN 978-3-540-35150-4

This will give me something to read while I'm on the plane on Sunday heading to Canberra for linux.conf.au – LCA 2013. I am proud to say I have been selected as a representative under the Regional Development Program, which means they're paying to get me there and to look after me while I'm at the conference. There isn't any way I could have afforded to go otherwise 🙂
Monday and Tuesday are what they call Miniconfs, with a wide range of speakers and topics ranging from programming your Arduino to getting governments and local authorities to use open source software. Wednesday to Friday will be the main conference. One of the highlights for me will be listening to Tim Berners-Lee present the keynote speech on the Friday morning. For those of you not aware, Tim Berners-Lee invented the World Wide Web, which you are using right now, back in the early 90s.
I will be flying back to Adelaide on Saturday the 2nd.

Dear Reader, if you are attending the conference or you live in Canberra I would love to catch up with you to say G’day. Just drop me a line to set up a meeting place.

Time out for bad behavior

Took today off. Can’t study in pain, been doing too much mousing around. Trying to do research but Dragon not cooperative.

Found out how to use the MouseGrid tool. It puts a numbered nine-box (3×3) grid on the screen and you call the number of the box containing the button you want to click. It then draws a smaller grid inside that box, gradually isolating the button. When it is focused on the button you simply say "click". How can this be efficient?
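
Out of curiosity I did a rough back-of-the-envelope calculation (my own numbers; the screen and button sizes are just assumptions) of how many spoken numbers it takes to home in on one button:

```python
# Rough sketch: each MouseGrid step splits the current region into a 3x3
# grid, so the region shrinks by a factor of 3 per spoken number.
import math

screen_width = 1920   # assumed screen width in pixels
button_width = 20     # assumed width of a smallish button in pixels

# grid steps needed before a cell is no wider than the button
steps = math.ceil(math.log(screen_width / button_width, 3))
print(steps)          # -> 5 spoken numbers, then "click"
```

Five or so numbers plus "click" for one button press goes a long way to explaining why it feels so slow compared to a real mouse.
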
Dragon didn't want to accept commands as much today; it kept hearing them as text and opening the dictation window. GGRRRRRR!

Maybe tomorrow, if the pain has gone, I can get some work done. I am so far behind, which just adds to the stress, which enhances the pain, in a vicious circle. Painkillers make me groggy, which stops me using the computer. Round and round we go …
Saturday is a big Australia Day for me: my daughter (12) goes to the community breakfast and Citizenship Ceremony, then in the afternoon we will be in the city joining thousands of others in the Multicultural Parade down Adelaide's King William St. We're walking as part of the local Japanese community, I in my red Matsuri Happi coat, she in a young girl's kimono or the lighter-weight yukata.
Happy Australia Day to all my readers.

Sunday I fly to Canberra to attend the Linux Australia conference. Will post more on that later.

OpenSource Speech Recognition

Two of the OpenSource Speech Recognition systems that I am aware of are XVoice and CMU Sphinx.

XVoice is a project built around the old software and code of IBM's ViaVoice. I was part of its mailing list for quite a few years, but the project is essentially dead as they cannot find a current voice engine to power their software. The last update to the software was in 2007.

CMU Sphinx is a project under the auspices of Carnegie Mellon University, but it is still a long way from being considered an adequate speech recognition system.
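
If you want to try Sphinx for yourself, the easiest route I know of is the pocketsphinx Python bindings. This is only a rough sketch, assuming you have installed the pocketsphinx package and have a working microphone:

```python
# Minimal sketch using the pocketsphinx Python bindings (assumes
# "pip install pocketsphinx" and a working microphone; the package
# ships with a default US English model).
from pocketsphinx import LiveSpeech

# LiveSpeech listens on the default audio device and yields one
# recognised phrase per detected utterance.
for phrase in LiveSpeech():
    print(phrase)
```

Even a quick test like this shows why I say it is still a long way behind Dragon for everyday dictation; it is really aimed at developers building speech into their own applications.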

Linux users have another means of running speech recognition: run Wine, which lets Windows programs run on Linux, and then install a version of Dragon NaturallySpeaking inside it. Not really effective, but it is a possible alternative.

There are other open source speech recognition projects out there, such as the Julius system, which works in Japanese. There are speech models for Julius in other languages, including English, but basically you need to build your own model, which limits the usability of this system to developers and very experienced coders. Julius is the most up to date of the open source systems, with its latest release on August 1, 2012.

old IBM voice systems

old IBM voice systems, a photo by ilox101 on Flickr.

These were given to me around 1995 by Les Legge, an old friend from the days of the CompuServe Pacific Forum. At the time Les was high up in IBM and also Editor of the Everyday Electronics magazines.
These programs were both fun to work with, but recognition rates were poor, mostly because of the low system resources available at the time.
