maemo.org - Talk - View Single Post - Poll - do you want voice dialing on the N900

Wikiwide	2010-11-03 , 02:45
Posts: 2,000 | Thanked: 3,345 times | Joined on Jun 2010 @ N900: Battery low. N950: torx 4 re-used once and fine; SIM port torn apart	#155

Originally Posted by WereCatf

Indeed, it could be trained. But then there is the problem that you'd need a lot of training to reach 80% accuracy or more, and at that point it'd start using quite a lot of memory.

For example, there is the Dragon NaturallySpeaking application for PCs. It can be used to dictate free speech to text in, for example, Word, but even with training it never reaches 100% accuracy. After enough training it reaches sufficient accuracy for most people, I suppose, but it starts taking several hundred megabytes of memory.

As such, I doubt it would be feasible on a device as limited as the N900. One way to get around the memory and performance hits would be to do the recognition on a server and just stream the microphone input there, but that'd create some lag between the input and output, and it still probably wouldn't be feasible over 3G.

Quick reply...
No server, please!

I hope there is a way for the program to store its training not as high-quality audio, but as light-weight patterns. Repeat the same word ten or twenty times with different intonations, and let it store only the most basic acoustic pattern. And when it sees that a word you pronounce fits several patterns, let it ask for additional samples of those words, so that the patterns become more exact and it doesn't stumble on them again.
And several hundred megabytes isn't a lot for a device with 32 GB, though I would really prefer to use less space.
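To make the idea above a bit more concrete, here is a rough Python sketch. It is only an illustration under heavy assumptions: every spoken sample is assumed to be already reduced to a short list of numbers (a feature vector) by some front end that is not shown, and the names PatternStore, train, recognize and margin are made up for this example. Only one averaged pattern per word is kept, and a word is reported as ambiguous when another pattern is almost as close as the best one.

```python
import math


def mean_vector(samples):
    """Average several equal-length feature vectors into one pattern."""
    count = len(samples)
    return [sum(column) / count for column in zip(*samples)]


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class PatternStore:
    """Hypothetical store: one light-weight pattern per word, no audio."""

    def __init__(self, margin=0.2):
        self.templates = {}   # word -> averaged feature vector
        self.margin = margin  # assumed threshold for "fits several patterns"

    def train(self, word, samples):
        # Keep only the average of the repetitions, not the recordings.
        self.templates[word] = mean_vector(samples)

    def recognize(self, features):
        # Rank all stored patterns by distance to the new utterance.
        ranked = sorted(
            (distance(features, template), word)
            for word, template in self.templates.items()
        )
        if not ranked:
            return None, []
        best_distance, best_word = ranked[0]
        # Other words that are nearly as close: candidates for re-training.
        ambiguous = [
            word for dist, word in ranked[1:]
            if dist - best_distance < self.margin
        ]
        return best_word, ambiguous


store = PatternStore(margin=0.2)
store.train("call", [[1.0, 0.0], [1.2, 0.0]])
store.train("home", [[0.0, 1.0], [0.0, 1.2]])

word, ambiguous = store.recognize([1.0, 0.1])
if ambiguous:
    print("Please repeat these words a few more times:", [word] + ambiguous)
else:
    print("Recognized:", word)
```

Stored this way, each word costs a handful of numbers rather than a recording, which is the whole point. A real recognizer would also need to cope with words spoken at different speeds, which this sketch ignores.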

Ideally, there would be one pattern for each letter, and nothing else.

Esperanto has exactly one sound for exactly one letter, if I'm not mistaken.

Esperanto speech recognition might be noticeably more accurate than English even before training, with default patterns for each letter. Though I haven't seen any simple Esperanto-based speech recognition software yet.
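Here is a small Python sketch of what "one pattern per letter" would mean for the program's dictionary. It assumes the standard 28-letter Esperanto alphabet and treats every letter as exactly one sound, which is the idealized picture from this post (real pronunciation still varies between speakers). The function name letter_patterns is made up for the example.

```python
import unicodedata

# The 28 letters of the Esperanto alphabet (no q, w, x or y).
ESPERANTO_ALPHABET = unicodedata.normalize("NFC", "abcĉdefgĝhĥijĵklmnoprsŝtuŭvz")
ESPERANTO_LETTERS = frozenset(ESPERANTO_ALPHABET)


def letter_patterns(word):
    """Return the sequence of per-letter patterns a recognizer would look for.

    Assumes one letter = one sound, so no pronunciation dictionary is needed.
    """
    normalized = unicodedata.normalize("NFC", word.lower())
    patterns = []
    for letter in normalized:
        if letter not in ESPERANTO_LETTERS:
            raise ValueError(f"{letter!r} is not an Esperanto letter")
        patterns.append(letter)
    return patterns


print(len(ESPERANTO_ALPHABET), "patterns cover the whole language")
print(letter_patterns("telefono"))
```

With English, the same function would be useless, because the spelling of a word does not tell you how it sounds.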

And it would solve one more problem: the program wouldn't react to ordinary human conversation unless people were speaking Esperanto, and in that case they would most likely remember that the computer understands Esperanto, too.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Interesting links here:

http://www.freepatentsonline.com/EP1217609.html
http://www.freepatentsonline.com/y2002/0198712.html
http://www.freepatentsonline.com/y2002/0198715.html
http://www.hpl.hp.com/techreports/2001/HPL-2001-182.pdf
(Hewlett-Packard, 2001-2002)
http://www.bartneck.de/publications/...jRoMan2009.pdf
http://www.bartneck.de/publications/...ila/index.html
http://www.bartneck.de/publications/...aluationROILA/

It seems that Hewlett Packard's CPL turned out to be difficult for humans to learn...

So now yet another artificial language comes from the Netherlands, one that tries to balance machine recognition and human learning.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Personally, I would prefer Esperanto: it has a better vocabulary and is designed by humans for humans.

Let ROILA be used for machine-to-machine interaction.
