Posts: 1,559 | Thanked: 1,786 times | Joined on Oct 2009 @ Boston
#21
Originally Posted by dhd View Post
This is a great idea, and I would be willing to serve as a mentor for it. I am the author of PocketSphinx and the GStreamer and Python bindings for it, and was also a main developer on Flite for quite some time.

Perhaps Maemo.org and CMU Sphinx could co-sponsor it? See our GSoC page at: http://cmusphinx.sourceforge.net/wiki/summerofcode2010
That is wonderful news. I have updated the GSoC wiki proposal with you as a mentor and Sphinx as a cosponsor.

I find a new potential use for this every day. Yesterday I was on the go all day and would have liked to review an article between meetings 4 and 5, a 35-minute bike ride. I already had the webpage open in MicroB.

With Pocket Jeeves, I would have hit the headset button at a stop light and told Jeeves "read web page". He would have read out the titles of the pages open in MicroB, I could have indicated which one I wanted, and then he would have read it to me...

BTW, I see that one solution proposed in the brainstorm is to reverse engineer the Google speech recognition interface. That's an interesting project, but the proposal should be updated to clarify that it only works with an active data connection. One of Maemo's strong selling points for many people is that it doesn't live in the cloud (Google's or otherwise).
__________________

Unofficial PR1.3/Meego 1.1 FAQ

***
Classic example of arbitrary Nokia decision making. Couldn't just fallback to the no brainer of tagging with lat/lon if network isn't accessible, could you Nokia?
MAME: an arcade in your pocket
Accelemymote: make your accelerometer more joy-ful

Last edited by Flandry; 2010-03-08 at 18:07.
 

The Following 2 Users Say Thank You to Flandry For This Useful Post:
Posts: 3 | Thanked: 5 times | Joined on Mar 2010
#22
Originally Posted by Flandry View Post
With Pocket Jeeves, I would have hit the headset button at a stop light and told Jeeves "read web page". He would have read out the titles of the pages open in MicroB, I could have indicated which one I wanted, and then he would have read it to me...
I should caution you that while this is possible, it's fairly ambitious, and the results may not be as great as you'd like. Reading web pages, in particular, is pretty hard to do, at least in a way that will make sense to the listener.

Nonetheless I think that if we can just get the framework for this built properly, it's something that people can improve over time.
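To give a feel for why reading a page aloud sensibly is hard: even a naive extraction pass already has to decide what *not* to speak. A minimal sketch using only Python's stdlib `html.parser` (the skip list is my own guess at what counts as page chrome, not anything from this thread):

```python
from html.parser import HTMLParser

class ReadableText(HTMLParser):
    """Collect text worth speaking aloud, skipping script/style/nav chrome."""
    SKIP = {"script", "style", "nav", "header", "footer"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # how many skipped elements we are inside

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())

def readable_text(html):
    """Return the speakable text of an HTML document as one string."""
    p = ReadableText()
    p.feed(html)
    return " ".join(p.chunks)
```

Even this leaves all the hard problems open (reading order, tables, links, sidebars), which is exactly the point: the framework matters more than any first attempt.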
 

The Following User Says Thank You to dhd For This Useful Post:
Posts: 3 | Thanked: 5 times | Joined on Mar 2010
#23
Also, I assume everyone has seen these videos on the N800 already:

http://www.youtube.com/watch?v=DV-WNQFe-LM
http://www.youtube.com/watch?v=OEUeJb6Pwt4

But if not, they should give you some indication of what was possible even on the old OMAP2 processors. I don't have an N900, but extrapolating from the BeagleBoard, the N900 should be able to do speech recognition about twice as fast.

The real work is in tying into the maemo desktop and building grammars and dictionaries, plus getting the API and UI right...
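For anyone wondering what "building grammars" looks like in practice: PocketSphinx can consume JSGF grammars, and a command grammar can start very small. A hand-written sketch (the command set here is invented for illustration, not anything shipped with Sphinx):

```
#JSGF V1.0;

grammar voicecmd;

public <command> = <music> | <call>;

<music> = music ( play | stop | next song | previous song );
<call>  = call contact <name>;
<name>  = john | mary | alex;
```

The dictionary side is then a matter of making sure every word appearing in the grammar has a phonetic entry the recognizer knows.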
 

The Following User Says Thank You to dhd For This Useful Post:
Posts: 1,559 | Thanked: 1,786 times | Joined on Oct 2009 @ Boston
#24
Here's part of what is needed, already implemented:
http://talk.maemo.org/showthread.php?t=34982
 
Posts: 4 | Thanked: 3 times | Joined on Apr 2010
#25
Farook, amazing idea. I would like to contribute to this project and be a part of it.
 
Posts: 18 | Thanked: 72 times | Joined on Sep 2008
#26
I've actually been thinking about doing something like this myself. I have a minor proof-of-concept hacked together, but the one thing I can't figure out how to do myself is detecting when the headset button is pressed. I figure it must be some D-Bus signal that I haven't located yet; can anyone with knowledge of Maemo internals point me to the right place for that?
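I haven't found that signal either, but two discovery angles might help. One is to eavesdrop on the bus with `dbus-monitor --system "type='signal'"` (and likewise `--session`) while pressing the button and watch what fires. The other: the button may surface as a plain kernel input device rather than anything on D-Bus at all. A pure-stdlib sketch for checking the latter (that the device would be named something like "headset" is my assumption, not something from this thread):

```python
def list_input_devices(text):
    """Parse the contents of /proc/bus/input/devices into (name, handlers) pairs.

    On the device itself you would read the real file and look for an
    entry whose name suggests the headset/hook button.
    """
    devices = []
    for block in text.strip().split("\n\n"):
        name = handlers = ""
        for line in block.splitlines():
            if line.startswith("N: Name="):
                name = line.split("=", 1)[1].strip().strip('"')
            elif line.startswith("H: Handlers="):
                handlers = line.split("=", 1)[1].strip()
        devices.append((name, handlers))
    return devices

# On an actual N900 you would do:
# with open("/proc/bus/input/devices") as f:
#     print(list_input_devices(f.read()))
```

If it does show up there, you can read key events straight from the corresponding /dev/input/eventN node instead of going through D-Bus.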
 
Posts: 18 | Thanked: 72 times | Joined on Sep 2008
#27
I'm beginning some exploratory work on developing an AUI for the N900. I may at some point want testers to give me an idea of how the interface feels to them, but for now, what would really help me is this: what kind of functionality would you like it to have? Specifically, what actions would you like to be able to take using voice control? Having a set of clear use cases will make it much easier to focus my development.
 

The Following 2 Users Say Thank You to Cirne For This Useful Post:
Posts: 73 | Thanked: 66 times | Joined on May 2011
#28
This idea seems pretty much dead. What a shame.

A speech engine would be very attractive. One function is commands; another is text input. Speech recognition software exists for GNU/Linux. Could this not be used as the basis for scripting a system that reacts to basic commands and takes the appropriate action? For example, with 'music next song' or 'music stop', a script would first look at the different options under music and recognise that the command concerns the music player, and then perform the action 'stop' or 'next song'. Other examples could be 'PDF read page 19', or 'browser, find on page' with the search phrase dictated via speech-to-text, or 'speech to text, start' with the dictated text inserted where the marker is. A few more options: 'open browser' (listening for the open command first, then the action), 'call contact John', or 'call number 12345678'.
As you can see, I am no programmer. I can imagine it would be quite a lot of work to script a speech recognition tool and a speech-to-text tool, but in the end it would bring much joy and many advantages.

Organising it in a tree structure like the one described would make the first job a bit easier, and would make it easier for people to program specific actions and syntaxes to fit the main commands.

Just my opinion.
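The tree structure described above can be prototyped with nothing more than nested dicts, one level per spoken word. A sketch (the command set and the returned action strings are invented for illustration):

```python
# Each level of the tree consumes one spoken word; leaves are actions.
COMMANDS = {
    "music": {
        "stop": lambda: "stopping playback",
        "next": lambda: "skipping to next song",
    },
    "call": {
        "contact": lambda name: f"calling contact {name}",
    },
}

def dispatch(words):
    """Walk the command tree word by word; leftover words become arguments."""
    node = COMMANDS
    words = list(words)
    while words and isinstance(node, dict):
        head = words.pop(0)
        if head not in node:
            return f"unknown command: {head}"
        node = node[head]
    if callable(node):
        return node(*words)
    return "incomplete command"
```

For example, dispatch("music stop".split()) walks music → stop and runs that action, while dispatch("call contact john".split()) passes the leftover word "john" to the leaf as an argument. Adding a new command is just adding a branch to the dict, which is exactly the "easier programming for the people who would program specific actions" point above.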
 

The Following User Says Thank You to zeebra For This Useful Post:
Posts: 293 | Thanked: 163 times | Joined on Jan 2012 @ beijing-islamabad
#29
Anyone into this? I just had an idea to propose on the forum, but guess what, it is already here: something like Siri or Nuance speech recognition. Then I saw this video, which puts quite an emphasis on PocketSphinx: http://www.youtube.com/watch?v=cOf1XQyxyHU
It's something written in Python? Could it be possible in any way on Maemo 5? Just an idea though!
 

The Following User Says Thank You to imo For This Useful Post:

Tags
aui, handsfree, text-to-speech, voice command, voice control, voice recognition
